<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.opensourceecology.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Maltfield</id>
	<title>Open Source Ecology - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.opensourceecology.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Maltfield"/>
	<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/wiki/Special:Contributions/Maltfield"/>
	<updated>2026-04-24T11:16:00Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.13</generator>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Google_Workspace&amp;diff=319738</id>
		<title>Google Workspace</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Google_Workspace&amp;diff=319738"/>
		<updated>2026-02-03T17:23:25Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: paragraph formatting&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;As a legally-registered NGO (non-profit in the US), Open Source Ecology has a free Google Workspace account.&lt;br /&gt;
&lt;br /&gt;
Note that Google Workspace is also known as:&lt;br /&gt;
&lt;br /&gt;
# Google Apps (or gapps) and&lt;br /&gt;
# G Suite (or gsuite)&lt;br /&gt;
&lt;br /&gt;
= Why? =&lt;br /&gt;
&lt;br /&gt;
Google Workspace lets us create Google accounts with a username on the &amp;lt;code&amp;gt;@opensourceecology.org&amp;lt;/code&amp;gt; domain. For example, when OSE users manage email, we can do so from a gmail-like UI. While we have access to numerous apps in Google Workspace, OSE specifically makes heavy use of the following apps:&lt;br /&gt;
&lt;br /&gt;
# Google Mail&lt;br /&gt;
# Google Calendar&lt;br /&gt;
# Google Docs&lt;br /&gt;
# Google Drive&lt;br /&gt;
# Google Meet&lt;br /&gt;
# Google Groups&lt;br /&gt;
# Google Slides&lt;br /&gt;
# etc&lt;br /&gt;
&lt;br /&gt;
=Google Groups=&lt;br /&gt;
&lt;br /&gt;
OSE uses (internal-only) Google Groups for creating one-to-many email lists (a designated email account that reaches the inbox of many people at OSE).&lt;br /&gt;
&lt;br /&gt;
We use Google Groups for this because of the following limitations:&lt;br /&gt;
&lt;br /&gt;
# Google doesn&#039;t support the concept of &amp;quot;shared accounts&amp;quot;&amp;lt;ref&amp;gt;https://support.google.com/a/answer/33330?hl=en&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google may lock you out of your account if its anomaly detection system thinks the account is being shared&amp;lt;ref&amp;gt;https://support.google.com/a/answer/6002699?hl=en&amp;amp;ref_topic=2759193&amp;amp;sjid=814912251894340756-EU#zippy=%2Cwhen-does-google-consider-a-sign-in-attempt-suspicious&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google won&#039;t let you turn off its &amp;quot;suspicious login&amp;quot; feature, which can lock you out of your own account -- even when its system is faulty and you entered the correct password&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-disable-login-challenge-security-method-permanently-000007696&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google doesn&#039;t let you forward mail from one account to many accounts&lt;br /&gt;
&lt;br /&gt;
If you want to create a one-to-many email address (e.g. &amp;lt;code&amp;gt;tractor-team@opensourceecology.org&amp;lt;/code&amp;gt;) with many recipients, the way to do this in Google Workspace is to create a &amp;quot;Google Group&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== System Alerts ==&lt;br /&gt;
&lt;br /&gt;
Alerts sent to a service-specific address must actually reach a person. For example, in September 2024, OSE nearly lost all of its backup data (on [[Backblaze]]) due to a few missed payments (totaling less than $10) because our bank falsely flagged the transactions as &amp;quot;suspicious&amp;quot; and blocked them. The issue was exacerbated by the fact that our Backblaze-specific email address (which received many &amp;quot;payment failed&amp;quot; alerts) was not being forwarded to the inbox of Marcin (or anyone else).&lt;br /&gt;
&lt;br /&gt;
For security reasons, it&#039;s always better to use services that &#039;&#039;don&#039;t&#039;&#039; use shared logins. If possible, create one user account per person and grant that user account access to the OSE account. Unfortunately, this isn&#039;t possible with many services -- and we&#039;re forced to use one shared account.&lt;br /&gt;
&lt;br /&gt;
For more flexibility and security, rather than signing up for a service directly with a shared Google Group address (such as &amp;lt;code&amp;gt;some-google-group-list@opensourceecology.org&amp;lt;/code&amp;gt;), we create a new, service-specific user account for that service. Then you can (1) forward all of that account&#039;s mail to a Google Group and (2) grant other users access to that account&#039;s mail.&lt;br /&gt;
&lt;br /&gt;
To set up email forwarding, log in to Gmail as the &amp;lt;code&amp;gt;service-specific-shared-account@opensourceecology.org&amp;lt;/code&amp;gt; account. Click the settings &amp;quot;gear icon&amp;quot; in the top-right of the webpage, then &amp;quot;See all settings&amp;quot;. Click the &amp;quot;Forwarding and POP/IMAP&amp;quot; tab. Under the &amp;quot;Forwarding&amp;quot; section, enter the email address of the Google Group. Make sure to select the radio button that says &amp;quot;Forward a copy of incoming mail to ...&amp;quot; and leave the drop-down set to &amp;quot;keep ... copy in the inbox&amp;quot;. This ensures that, even if the Google Group is moved or deleted in the future, all of the mail for this specific account is retained in Gmail. Finally, click &amp;quot;Save Changes&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
To grant Marcin or anyone else access to this new service-specific account&#039;s mail, log in to Gmail as that account. Click the settings &amp;quot;gear icon&amp;quot; in the top-right of the webpage, then &amp;quot;See all settings&amp;quot;. Click the &amp;quot;Accounts&amp;quot; tab. Under the &amp;quot;Grant access to your account&amp;quot; section, click &amp;quot;Add an account&amp;quot; and enter the email address of the person (e.g. Marcin) who should be able to read and send mail on behalf of this account.&lt;br /&gt;
&lt;br /&gt;
{{Warning|Please note that &amp;quot;reset password&amp;quot; functionality usually works by sending a link to a user&#039;s email address, so we should assume that &#039;&#039;&#039;anyone either on the Google Groups list or under the &amp;quot;Grant access to your account&amp;quot; list will be able to log in&#039;&#039;&#039; to these services, &#039;&#039;&#039;even if they don&#039;t have the account password&#039;&#039;&#039;. So please only ever put trusted users on these lists.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
= Why can&#039;t I login? =&lt;br /&gt;
&lt;br /&gt;
The best way to avoid lockout issues on Google is to use a [https://tech.michaelaltfield.net/2026/02/03/single-site-browser-firejail-proxychains/ persistent single-site browser]. For more info, see: &lt;br /&gt;
&lt;br /&gt;
* https://tech.michaelaltfield.net/2026/02/03/single-site-browser-firejail-proxychains/&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Google employs an infamously faulty anomaly detection system&amp;lt;ref&amp;gt;https://support.google.com/a/answer/6002699?hl=en&amp;amp;ref_topic=2759193&amp;amp;sjid=814912251894340756-EU#zippy=%2Cwhen-does-google-consider-a-sign-in-attempt-suspicious&amp;lt;/ref&amp;gt; that may incorrectly flag a login as &amp;quot;suspicious&amp;quot; and lock you out of your own account -- even when you entered the correct password on the first try. Google is aware of the issue and refuses to let Google Workspace administrators (or individual users) disable this broken &amp;quot;feature&amp;quot; for their accounts, even when it causes more harm than good&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-disable-login-challenge-security-method-permanently-000007696&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If this happens, try enabling 2FA (with TOTP) on your account. With 2FA enabled, Google &#039;&#039;should&#039;&#039; no longer lock you out of your own account when you enter the correct password.&lt;br /&gt;
&lt;br /&gt;
Of course, you need to log in before you can add 2FA to your account. To bypass the lockout, ask an OSE member with Admin access to Google Workspace to temporarily turn off &amp;quot;2-Step Verification&amp;quot; (which is a distinct Google concept from &amp;quot;two-factor authentication&amp;quot;) as follows:&lt;br /&gt;
&lt;br /&gt;
# Log into the admin.google.com panel&lt;br /&gt;
# Click Directory -&amp;gt; Users&lt;br /&gt;
# Click on your username&lt;br /&gt;
# Click on the &amp;quot;Security&amp;quot; tab&lt;br /&gt;
# Scroll down to &amp;quot;Login challenge&amp;quot; and click the &amp;quot;TURN OFF FOR 10 MINS&amp;quot; button&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-turn-off-2-step-verification-for-specific-users-000007496&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within those 10 minutes, you should be able to log in and set up 2FA with TOTP to prevent this from happening again.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
{{reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category: IT Infrastructure]]&lt;br /&gt;
[[Category: Software]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Google_Workspace&amp;diff=319737</id>
		<title>Google Workspace</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Google_Workspace&amp;diff=319737"/>
		<updated>2026-02-03T17:22:43Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: add link to article with more background info, and a solution to prevent google lockouts&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;As a legally-registered NGO (non-profit in the US), Open Source Ecology has a free Google Workspace account.&lt;br /&gt;
&lt;br /&gt;
Note that Google Workspace is also known as:&lt;br /&gt;
&lt;br /&gt;
# Google Apps (or gapps) and&lt;br /&gt;
# G Suite (or gsuite)&lt;br /&gt;
&lt;br /&gt;
= Why? =&lt;br /&gt;
&lt;br /&gt;
Google Workspace lets us create Google accounts with a username on the &amp;lt;code&amp;gt;@opensourceecology.org&amp;lt;/code&amp;gt; domain. For example, when OSE users manage email, we can do so from a gmail-like UI. While we have access to numerous apps in Google Workspace, OSE specifically makes heavy use of the following apps:&lt;br /&gt;
&lt;br /&gt;
# Google Mail&lt;br /&gt;
# Google Calendar&lt;br /&gt;
# Google Docs&lt;br /&gt;
# Google Drive&lt;br /&gt;
# Google Meet&lt;br /&gt;
# Google Groups&lt;br /&gt;
# Google Slides&lt;br /&gt;
# etc&lt;br /&gt;
&lt;br /&gt;
=Google Groups=&lt;br /&gt;
&lt;br /&gt;
OSE uses (internal-only) Google Groups for creating one-to-many email lists (a designated email account that reaches the inbox of many people at OSE).&lt;br /&gt;
&lt;br /&gt;
We use Google Groups for this because of the following limitations:&lt;br /&gt;
&lt;br /&gt;
# Google doesn&#039;t support the concept of &amp;quot;shared accounts&amp;quot;&amp;lt;ref&amp;gt;https://support.google.com/a/answer/33330?hl=en&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google may lock you out of your account if its anomaly detection system thinks the account is being shared&amp;lt;ref&amp;gt;https://support.google.com/a/answer/6002699?hl=en&amp;amp;ref_topic=2759193&amp;amp;sjid=814912251894340756-EU#zippy=%2Cwhen-does-google-consider-a-sign-in-attempt-suspicious&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google won&#039;t let you turn off its &amp;quot;suspicious login&amp;quot; feature, which can lock you out of your own account -- even when its system is faulty and you entered the correct password&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-disable-login-challenge-security-method-permanently-000007696&amp;lt;/ref&amp;gt;&lt;br /&gt;
# Google doesn&#039;t let you forward mail from one account to many accounts&lt;br /&gt;
&lt;br /&gt;
If you want to create a one-to-many email address (e.g. &amp;lt;code&amp;gt;tractor-team@opensourceecology.org&amp;lt;/code&amp;gt;) with many recipients, the way to do this in Google Workspace is to create a &amp;quot;Google Group&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== System Alerts ==&lt;br /&gt;
&lt;br /&gt;
Alerts sent to a service-specific address must actually reach a person. For example, in September 2024, OSE nearly lost all of its backup data (on [[Backblaze]]) due to a few missed payments (totaling less than $10) because our bank falsely flagged the transactions as &amp;quot;suspicious&amp;quot; and blocked them. The issue was exacerbated by the fact that our Backblaze-specific email address (which received many &amp;quot;payment failed&amp;quot; alerts) was not being forwarded to the inbox of Marcin (or anyone else).&lt;br /&gt;
&lt;br /&gt;
For security reasons, it&#039;s always better to use services that &#039;&#039;don&#039;t&#039;&#039; use shared logins. If possible, create one user account per person and grant that user account access to the OSE account. Unfortunately, this isn&#039;t possible with many services -- and we&#039;re forced to use one shared account.&lt;br /&gt;
&lt;br /&gt;
For more flexibility and security, rather than signing up for a service directly with a shared Google Group address (such as &amp;lt;code&amp;gt;some-google-group-list@opensourceecology.org&amp;lt;/code&amp;gt;), we create a new, service-specific user account for that service. Then you can (1) forward all of that account&#039;s mail to a Google Group and (2) grant other users access to that account&#039;s mail.&lt;br /&gt;
&lt;br /&gt;
To set up email forwarding, log in to Gmail as the &amp;lt;code&amp;gt;service-specific-shared-account@opensourceecology.org&amp;lt;/code&amp;gt; account. Click the settings &amp;quot;gear icon&amp;quot; in the top-right of the webpage, then &amp;quot;See all settings&amp;quot;. Click the &amp;quot;Forwarding and POP/IMAP&amp;quot; tab. Under the &amp;quot;Forwarding&amp;quot; section, enter the email address of the Google Group. Make sure to select the radio button that says &amp;quot;Forward a copy of incoming mail to ...&amp;quot; and leave the drop-down set to &amp;quot;keep ... copy in the inbox&amp;quot;. This ensures that, even if the Google Group is moved or deleted in the future, all of the mail for this specific account is retained in Gmail. Finally, click &amp;quot;Save Changes&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
To grant Marcin or anyone else access to this new service-specific account&#039;s mail, log in to Gmail as that account. Click the settings &amp;quot;gear icon&amp;quot; in the top-right of the webpage, then &amp;quot;See all settings&amp;quot;. Click the &amp;quot;Accounts&amp;quot; tab. Under the &amp;quot;Grant access to your account&amp;quot; section, click &amp;quot;Add an account&amp;quot; and enter the email address of the person (e.g. Marcin) who should be able to read and send mail on behalf of this account.&lt;br /&gt;
&lt;br /&gt;
{{Warning|Please note that &amp;quot;reset password&amp;quot; functionality usually works by sending a link to a user&#039;s email address, so we should assume that &#039;&#039;&#039;anyone either on the Google Groups list or under the &amp;quot;Grant access to your account&amp;quot; list will be able to log in&#039;&#039;&#039; to these services, &#039;&#039;&#039;even if they don&#039;t have the account password&#039;&#039;&#039;. So please only ever put trusted users on these lists.&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
= Why can&#039;t I login? =&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Google employs an infamously faulty anomaly detection system&amp;lt;ref&amp;gt;https://support.google.com/a/answer/6002699?hl=en&amp;amp;ref_topic=2759193&amp;amp;sjid=814912251894340756-EU#zippy=%2Cwhen-does-google-consider-a-sign-in-attempt-suspicious&amp;lt;/ref&amp;gt; that may incorrectly flag a login as &amp;quot;suspicious&amp;quot; and lock you out of your own account -- even when you entered the correct password on the first try. Google is aware of the issue and refuses to let Google Workspace administrators (or individual users) disable this broken &amp;quot;feature&amp;quot; for their accounts, even when it causes more harm than good&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-disable-login-challenge-security-method-permanently-000007696&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If this happens, try enabling 2FA (with TOTP) on your account. With 2FA enabled, Google &#039;&#039;should&#039;&#039; no longer lock you out of your own account when you enter the correct password.&lt;br /&gt;
&lt;br /&gt;
Of course, you need to log in before you can add 2FA to your account. To bypass the lockout, ask an OSE member with Admin access to Google Workspace to temporarily turn off &amp;quot;2-Step Verification&amp;quot; (which is a distinct Google concept from &amp;quot;two-factor authentication&amp;quot;) as follows:&lt;br /&gt;
&lt;br /&gt;
# Log into the admin.google.com panel&lt;br /&gt;
# Click Directory -&amp;gt; Users&lt;br /&gt;
# Click on your username&lt;br /&gt;
# Click on the &amp;quot;Security&amp;quot; tab&lt;br /&gt;
# Scroll down to &amp;quot;Login challenge&amp;quot; and click the &amp;quot;TURN OFF FOR 10 MINS&amp;quot; button&amp;lt;ref&amp;gt;https://knowledge.workspace.google.com/kb/how-to-turn-off-2-step-verification-for-specific-users-000007496&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within those 10 minutes, you should be able to log in and set up 2FA with TOTP to prevent this from happening again.&lt;br /&gt;
&lt;br /&gt;
The best way to avoid lockout issues on Google is to use a [https://tech.michaelaltfield.net/2026/02/03/single-site-browser-firejail-proxychains/ persistent single-site browser]. For more info, see: &lt;br /&gt;
&lt;br /&gt;
* https://tech.michaelaltfield.net/2026/02/03/single-site-browser-firejail-proxychains/&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
{{reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category: IT Infrastructure]]&lt;br /&gt;
[[Category: Software]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=OSE_Piping_Workbench&amp;diff=311653</id>
		<title>OSE Piping Workbench</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=OSE_Piping_Workbench&amp;diff=311653"/>
		<updated>2025-09-16T21:39:49Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: prevent error if ~/.FreeCAD doesn&amp;#039;t exist yet&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Hint|See Workbench Source Code at &#039;&#039;&#039;[[PVC_Pipe_and_Fittings_Library#OSE_Piping_Workbench]]&#039;&#039;&#039;}}&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
The OSE Piping Workbench is a FreeCAD workbench with pipes and fittings. It creates pipes and fittings using the FreeCAD Part workbench and [https://github.com/oddtopus/flamingo Flamingo].&lt;br /&gt;
&lt;br /&gt;
[[File:OsePiningWorkbenchScreenshot.png | 512px]]&lt;br /&gt;
&lt;br /&gt;
= Installation =&lt;br /&gt;
On a Linux system:&lt;br /&gt;
 $ mkdir -p ~/.FreeCAD/Mod&lt;br /&gt;
 $ cd ~/.FreeCAD/Mod&lt;br /&gt;
 $ git clone https://github.com/rkrenzler/ose-piping-workbench.git&lt;br /&gt;
&lt;br /&gt;
[[File:check.png]] Command line instructions work on Ubuntu 16.04&lt;br /&gt;
&lt;br /&gt;
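Hint: For those new to Linux, always remember that Linux is case-sensitive (&amp;lt;code&amp;gt;~/.FreeCAD&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;~/.freecad&amp;lt;/code&amp;gt; are different directories).&lt;br /&gt;
&amp;lt;code&amp;gt;mkdir -p ~/.FreeCAD/Mod&amp;lt;/code&amp;gt; creates the Mod directory inside the FreeCAD user directory. This directory might already exist, and that is fine: with &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt;, mkdir does not report an error for an existing directory.&lt;br /&gt;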
&lt;br /&gt;
=Pipes=&lt;br /&gt;
&lt;br /&gt;
The dimensions of the PVC pipes can be found at [[PVC_Pipe]].&lt;br /&gt;
See also Wikipedia on [https://en.wikipedia.org/wiki/Nominal_Pipe_Size Nominal Pipe Size (NPS)].&lt;br /&gt;
&lt;br /&gt;
A pipe is described by its outer diameter &#039;&#039;&#039;OD&#039;&#039;&#039;, its wall thickness &#039;&#039;&#039;Thk&#039;&#039;&#039; and its height&amp;lt;ref&amp;gt;We use height instead of length in order to make a pipe similar to a FreeCAD cylinder. This particular choice of pipe dimensions makes it more compatible with pipes from the Flamingo workbench.&amp;lt;/ref&amp;gt; &#039;&#039;&#039;H&#039;&#039;&#039;.&lt;br /&gt;
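&lt;br /&gt;
As a worked example of how these three dimensions relate, the inner diameter follows from ID = OD - 2 * Thk. The sketch below is not code from the workbench: the class PipeDims and its method material_volume are hypothetical names, and the field names simply mirror OD, Thk and H from the text. The example values are for 1-inch Schedule 40 PVC pipe (OD 1.315 in, wall 0.133 in).&lt;br /&gt;
&lt;br /&gt;
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeDims:
    """Hypothetical container for the pipe dimensions named in the text.

    OD  -- outer diameter
    Thk -- wall thickness
    H   -- height (the pipe length, measured like a FreeCAD cylinder)
    All values must use the same length unit.
    """

    OD: float
    Thk: float
    H: float

    def __post_init__(self):
        # The wall must have positive thickness and leave a bore open,
        # and the pipe must have a positive height.
        if not (self.Thk > 0 and self.OD > 2 * self.Thk and self.H > 0):
            raise ValueError(f"invalid pipe dimensions: {self}")

    @property
    def ID(self) -> float:
        # Inner diameter: the wall is subtracted on both sides.
        return self.OD - 2 * self.Thk

    def material_volume(self) -> float:
        # Ring cross-section area, pi/4 * (OD^2 - ID^2), times the height.
        return math.pi / 4 * (self.OD ** 2 - self.ID ** 2) * self.H


if __name__ == "__main__":
    # 1-inch Schedule 40 PVC pipe, 12 inches long (values in inches).
    pipe = PipeDims(OD=1.315, Thk=0.133, H=12.0)
    print(round(pipe.ID, 3))  # 1.049
    print(round(pipe.material_volume(), 2))
```
&lt;br /&gt;
A frozen dataclass is used so that a set of dimensions, once validated, cannot be changed into an invalid combination later.&lt;br /&gt;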
&lt;br /&gt;
To create a pipe, click [[File:CreatePipe.svg]] in OSE piping workbench. Select pipe dimensions and click &amp;quot;OK&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
[[File:create-pipe-screenshot.png| 512px]]&lt;br /&gt;
&lt;br /&gt;
To add new dimensions, edit the CSV file &#039;&#039;&#039;pipe.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Elbows=&lt;br /&gt;
&lt;br /&gt;
An elbow is described by an angle alpha, the outer pipe diameter POD, the inner pipe diameter PID, and the dimensions H, J, and M.&lt;br /&gt;
&lt;br /&gt;
To create an elbow, click [[File:CreateElbow.svg]] in OSE piping workbench. &lt;br /&gt;
&lt;br /&gt;
[[File:create-elbow-screenshot.png|512px]]&lt;br /&gt;
[[File:create-elbow-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add new elbows, adjust &#039;&#039;&#039;elbow.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Sweep Elbows=&lt;br /&gt;
&lt;br /&gt;
A sweep elbow is a special elbow with a larger radius in the bent part. It is described by the outer pipe diameter POD, the pipe thickness PThk, G, H, and M.&lt;br /&gt;
To create a sweep elbow, click [[File:CreateSweepElbow.svg]] in OSE piping workbench.&lt;br /&gt;
&lt;br /&gt;
[[File:create-sweep-elbow-screenshot.png|512px]]&lt;br /&gt;
[[File:create-sweep-elbow-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add new sweep elbows, adjust &#039;&#039;&#039;sweep-elbow.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Couplings=&lt;br /&gt;
&lt;br /&gt;
A (general) coupling is described by the dimensions POD, POD1, PID, PID1, L, M, M1, and N. The dimensions POD1 and PID1 are not from the official specifications.&lt;br /&gt;
They are derived from the pipe size and schedule. In a reducer coupling, the pipe dimensions on one side (POD and PID) differ from those on the other side (POD1 and PID1).&lt;br /&gt;
&lt;br /&gt;
To create a coupling, click [[File:CreateCoupling.svg]] in OSE piping workbench. &lt;br /&gt;
&lt;br /&gt;
[[File:create-coupling-screenshot.png|512px]]&lt;br /&gt;
[[File:create-coupling-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add new couplings, adjust &#039;&#039;&#039;coupling.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Bushings=&lt;br /&gt;
{{Hint|Correction needed: the bushing flange should be hexagonal rather than octagonal, since bushings, like bolts, are hexagonal.}}&lt;br /&gt;
A bushing is described by dimensions N, L and pipe dimensions. As pipe dimensions we use POD, PID1, and POD1.&lt;br /&gt;
&lt;br /&gt;
To create a bushing, click [[File:CreateBushing.svg]] in OSE piping workbench. &lt;br /&gt;
&lt;br /&gt;
[[File:create-bushing-screenshot.png|512px]]&lt;br /&gt;
[[File:create-bushing-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add a new bushing to the part list, adjust &#039;&#039;&#039;bushing.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Tees=&lt;br /&gt;
&lt;br /&gt;
A tee is described by parameters G, G1, H, H1, M, M1, and pipe dimensions. As pipe dimensions we use POD, POD1, PID, and PID1.&lt;br /&gt;
&lt;br /&gt;
To create a tee, click [[File:CreateTee.svg]] in OSE piping workbench.&lt;br /&gt;
&lt;br /&gt;
[[File:create-tee-screenshot.png|512px]]&lt;br /&gt;
[[File:create-tee-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add a new tee to the part list, adjust &#039;&#039;&#039;tee.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Crosses=&lt;br /&gt;
&lt;br /&gt;
A cross is described by parameters G, G1, H, H1, L, L1, M, M1, and pipe dimensions. As pipe dimensions we use POD, POD1, PThk, and PThk1.&lt;br /&gt;
&lt;br /&gt;
To create a cross, click [[File:CreateCross.svg]] in OSE piping workbench.&lt;br /&gt;
&lt;br /&gt;
[[File:create-cross-screenshot.png|512px]]&lt;br /&gt;
[[File:create-cross-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add a new cross to the part list, adjust &#039;&#039;&#039;cross.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Corners=&lt;br /&gt;
&lt;br /&gt;
A corner is described by dimensions G, H, M and pipe dimensions. As pipe dimensions we use POD and PID.&lt;br /&gt;
&lt;br /&gt;
To create a corner, click [[File:CreateCorner.svg]] in OSE piping workbench. &lt;br /&gt;
&lt;br /&gt;
[[File:create-corner-screenshot.png|512px]]&lt;br /&gt;
[[File:create-corner-cad-screenshot.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
To add a new corner to the part list, adjust &#039;&#039;&#039;corner.csv&#039;&#039;&#039; in the &#039;&#039;tables&#039;&#039; directory within the workbench directory.&lt;br /&gt;
&lt;br /&gt;
=Customization=&lt;br /&gt;
The dimensions of the fittings are saved in [https://en.wikipedia.org/wiki/Comma-separated_values CSV files].&lt;br /&gt;
If you want to add new dimensions or change existing ones, modify these CSV files.&lt;br /&gt;
&lt;br /&gt;
The CSV files are in ~/.FreeCAD/Mod/ose-piping-workbench/tables. The columns are separated by commas (&amp;quot;,&amp;quot;). Always keep this format.&lt;br /&gt;
&lt;br /&gt;
To modify CSV files with LibreOffice Calc follow these steps:&lt;br /&gt;
&lt;br /&gt;
# Open the CSV file in LibreOffice Calc. Calc must correctly detect the column separator &amp;quot;Comma&amp;quot;. If it does not, check &amp;quot;Comma&amp;quot; manually. Click OK.&amp;lt;p&amp;gt; [[File:calc-imports-csv.png]]&amp;lt;/p&amp;gt;&lt;br /&gt;
# Now you can add, remove and modify dimensions of the fittings. Each row of the table must contain a &#039;&#039;&#039;unique&#039;&#039;&#039; part number and dimensions. You do not need to specify every dimension. To find out which dimensions are mandatory for a particular part, click the button for this part in OSE-piping-workbench. The dialog will tell you which dimensions are mandatory.&amp;lt;p&amp;gt;[[File:piping-workbench-mandatory-dimensions.png]]&amp;lt;/p&amp;gt;&lt;br /&gt;
# Save the CSV file. Calc will ask you which format to use.&amp;lt;p&amp;gt; [[File:calc-store-csv.png]]&amp;lt;/p&amp;gt; Select &amp;quot;Use Text CSV Format&amp;quot;.&lt;br /&gt;
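&lt;br /&gt;
If you edit these files often, a small standalone check can catch two mistakes the steps above warn about: a duplicated part number, and a file saved with the wrong separator (for example semicolons), which shows up as missing columns. This is a minimal sketch using the Python standard-library csv module; it is not part of the workbench. The column name PartNumber and the example columns POD and PID are assumptions for illustration, so check the header row of the actual file in the tables directory and the mandatory dimensions shown by the workbench dialog.&lt;br /&gt;
&lt;br /&gt;
```python
import csv
import io
import sys
from collections import Counter

# Assumed name of the part-number column; check the real header row.
PART_NUMBER_COLUMN = "PartNumber"


def check_fitting_table(text, required_columns):
    """Return a list of problems found in a comma-separated fitting table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    header = reader.fieldnames or []
    rows = list(reader)

    # A file saved with another separator collapses into a single column,
    # so it shows up here as missing columns.
    problems = [f"missing column: {c}" for c in required_columns if c not in header]

    # Every row must carry a unique part number.
    if PART_NUMBER_COLUMN in header:
        counts = Counter(row[PART_NUMBER_COLUMN] for row in rows)
        problems += [f"duplicate part number: {pn}" for pn, n in counts.items() if n > 1]

    # Mandatory dimensions must not be empty. Row numbers count the header
    # as row 1; DictReader skips completely blank lines.
    for row_no, row in enumerate(rows, start=2):
        for col in required_columns:
            if col in header and not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: empty {col}")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_fitting_csv.py FILE.csv COLUMN [COLUMN ...]
    if len(sys.argv) > 2:
        with open(sys.argv[1], newline="", encoding="utf-8") as f:
            for problem in check_fitting_table(f.read(), sys.argv[2:]):
                print(problem)
```
&lt;br /&gt;
The file is opened with newline set to an empty string, as the csv module documentation recommends, so that quoted fields are parsed correctly.&lt;br /&gt;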
&lt;br /&gt;
=Programming=&lt;br /&gt;
* [https://www.freecadweb.org/wiki/Scripted_objects FreeCAD scripted object]&lt;br /&gt;
* It should be possible to represent the object with &amp;quot;classic&amp;quot; FreeCAD forms like cylinders, spheres, sweeping objects ...&lt;br /&gt;
* It should be possible to use solids.&lt;br /&gt;
* The main purpose is to create tools for moving, rotations, and fittings.&lt;br /&gt;
&lt;br /&gt;
=Documentation=&lt;br /&gt;
&lt;br /&gt;
==Programming==&lt;br /&gt;
* [https://www.freecadweb.org/wiki/Scripted_objects FreeCAD scripted objects]&lt;br /&gt;
* [https://forum.freecadweb.org/viewtopic.php?f=8&amp;amp;t=27641&amp;amp;sid=0f829d3bd056ec5add5407879796451a Forum entry on freecadweb.org]&lt;br /&gt;
&lt;br /&gt;
== Remarks about the coupling code ==&lt;br /&gt;
&lt;br /&gt;
To create a simple coupling or a reducer, we internally use a more general coupling.&lt;br /&gt;
This general coupling is described by 9 dimensions: POD, PID, POD1, PID1, X1, X2, N, M, M1. The dimensions POD, PID, POD1, and PID1 are derived from the pipe sizes.&lt;br /&gt;
They are abbreviations of &#039;&#039;&#039;P&#039;&#039;&#039;ipe &#039;&#039;&#039;O&#039;&#039;&#039;uter &#039;&#039;&#039;D&#039;&#039;&#039;iameter and &#039;&#039;&#039;P&#039;&#039;&#039;ipe &#039;&#039;&#039;I&#039;&#039;&#039;nner &#039;&#039;&#039;D&#039;&#039;&#039;iameter.&lt;br /&gt;
The dimensions X1 and X2 are not official dimension names. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:coupling-calculations.png]]&lt;br /&gt;
&lt;br /&gt;
The offset a1 is calculated such that the thinnest part of the middle section is not thinner than the walls of either socket.&lt;br /&gt;
Lengths a2, a3, a4 and angle b1 are derived from the dimensions and are only used to calculate a1.&lt;br /&gt;
&lt;br /&gt;
=Useful links=&lt;br /&gt;
* An example of fittings with dimensioned drawings produced by [https://www.aetnaplastics.com/site_media/media/attachments/aetna_product_aetnaproduct/204/PVC%20Sch%2040%20Fittings%20Dimensions.pdf Aetna Plastics].&lt;br /&gt;
* [https://forum.freecadweb.org/viewtopic.php?f=8&amp;amp;t=27641&amp;amp;sid=0f829d3bd056ec5add5407879796451a Forum entry on freecadweb.org]&lt;br /&gt;
&lt;br /&gt;
* [https://youtu.be/1FBudfRcQv4 Using Flamingo to move parts]&lt;br /&gt;
&lt;br /&gt;
=Discussion=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;div id=&amp;quot;disqus_thread&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;script&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/**&lt;br /&gt;
*  RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS.&lt;br /&gt;
*  LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables*/&lt;br /&gt;
/*&lt;br /&gt;
var disqus_config = function () {&lt;br /&gt;
this.page.url = PAGE_URL;  // Replace PAGE_URL with your page&#039;s canonical URL variable&lt;br /&gt;
this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page&#039;s unique identifier variable&lt;br /&gt;
};&lt;br /&gt;
*/&lt;br /&gt;
(function() { // DON&#039;T EDIT BELOW THIS LINE&lt;br /&gt;
var d = document, s = d.createElement(&#039;script&#039;);&lt;br /&gt;
s.src = &#039;https://ose-piping-workbench.disqus.com/embed.js&#039;;&lt;br /&gt;
s.setAttribute(&#039;data-timestamp&#039;, +new Date());&lt;br /&gt;
(d.head || d.body).appendChild(s);&lt;br /&gt;
})();&lt;br /&gt;
&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;noscript&amp;gt;Please enable JavaScript to view the &amp;lt;a href=&amp;quot;https://disqus.com/?ref_noscript&amp;quot;&amp;gt;comments powered by Disqus.&amp;lt;/a&amp;gt;&amp;lt;/noscript&amp;gt;&lt;br /&gt;
                            &amp;lt;/html&amp;gt;&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311600</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311600"/>
		<updated>2025-09-09T17:02:31Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: fix italic syntax&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices two levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. OSE workflow assumes that we work with the part tree (especially the very useful feature of hiding and un-hiding parts for build instructional purposes), and that we reduce file size as much as possible to make complex files quick to open and easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
The first type of simplification, described in the Level of Detail section below, simplifies the actual features of a part. The second type, described in the Part Tree Simplification section, simplifies the part tree itself during the design phase.&lt;br /&gt;
&lt;br /&gt;
=Identifying problem objects=&lt;br /&gt;
&lt;br /&gt;
If you have a large/slow FreeCAD file, you&#039;ll first want to identify &#039;&#039;which&#039;&#039; object is causing the problem.&lt;br /&gt;
&lt;br /&gt;
There is a distinction between two sizes:&lt;br /&gt;
&lt;br /&gt;
# The (compressed) on-disk size of the .FCStd file&lt;br /&gt;
# The (uncompressed) in-memory size, or MemSize, of each object&lt;br /&gt;
&lt;br /&gt;
The two are &#039;&#039;sometimes&#039;&#039; correlated, but it&#039;s possible to have a &amp;lt;1 MB .FCStd file that is completely unusable because of a very large MemSize. This can happen, for example, if you make a very simple sketch and then an enormous three-dimensional array of that sketch (e.g. for a mesh object). Such a file compresses to a very small size on disk, but expands to a very large (uncompressed) MemSize when opened, which can crash FreeCAD.&lt;br /&gt;
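&lt;br /&gt;
One way to see this difference for yourself: an .FCStd file is a ZIP archive (containing Document.xml plus the stored shape data), so Python&#039;s standard zipfile module can list each entry&#039;s compressed and uncompressed size. The sketch below is a minimal, hypothetical helper (listFCStdEntries is not part of FreeCAD) that runs with plain Python outside FreeCAD. Keep in mind that the uncompressed size of an archive entry is only a rough proxy for an object&#039;s MemSize once the file is loaded.&lt;br /&gt;
&lt;br /&gt;
```python
import sys
import zipfile


def listFCStdEntries(path):
    """Return (name, compressed_bytes, uncompressed_bytes) for every entry
    in a .FCStd archive, sorted by uncompressed size, largest first."""
    with zipfile.ZipFile(path) as archive:
        rows = [(info.filename, info.compress_size, info.file_size)
                for info in archive.infolist()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def printFCStdEntries(path):
    """Print a small table comparing compressed and uncompressed sizes."""
    print('Compressed (bytes) | Uncompressed (bytes) | Entry')
    for name, packed, unpacked in listFCStdEntries(path):
        print('{:,}'.format(packed).rjust(18), '|',
              '{:,}'.format(unpacked).rjust(20), '|', name)


if __name__ == '__main__':
    # Usage (script name is a placeholder): python list_fcstd.py MyAssembly.FCStd
    for fcstdPath in sys.argv[1:]:
        printFCStdEntries(fcstdPath)
```
&lt;br /&gt;
An entry whose uncompressed size is far larger than its compressed size is a good first suspect.&lt;br /&gt;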
&lt;br /&gt;
Fortunately, FreeCAD exposes a Python Console to the user, where you can paste custom code to interact with the document&#039;s objects. The snippet below will:&lt;br /&gt;
&lt;br /&gt;
# Iterate through every object in the [https://wiki.freecad.org/Document_structure FreeCAD Document&#039;s Tree], plus the document itself,&lt;br /&gt;
# Get the [https://github.com/FreeCAD/FreeCAD/blob/6ab8589a03b498b237f8ba88c6ae4692bb3adba6/src/Mod/TemplatePyMod/DocumentObject.py#L117-L119 MemSize] of each object,&lt;br /&gt;
# Sort the objects by their MemSize, and&lt;br /&gt;
# Print the sorted list, largest first.&lt;br /&gt;
&lt;br /&gt;
To use this, first open the [https://wiki.freecad.org/index.php?title=Python_Console Python Console] in FreeCAD by clicking &#039;&#039;&#039;View -&amp;gt; Panels -&amp;gt; Python Console&#039;&#039;&#039;. Then &#039;&#039;&#039;paste the following snippet&#039;&#039;&#039; into the Python Console and &#039;&#039;&#039;press Enter&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def printMem():&lt;br /&gt;
    objs = list(FreeCAD.ActiveDocument.Objects)&lt;br /&gt;
    objs.append(FreeCAD.ActiveDocument)              # add doc to list&lt;br /&gt;
    objs.sort(reverse=True, key=lambda x: x.MemSize) # max mem is first&lt;br /&gt;
    &lt;br /&gt;
    hdr = &amp;quot;MemSize (bytes) | Object Label\n&amp;quot;&lt;br /&gt;
    hLine = &amp;quot;-&amp;quot;*len(hdr) + &amp;quot;\n&amp;quot;&lt;br /&gt;
    linesList = [&amp;quot;\n&amp;quot;, hLine, hdr, hLine]&lt;br /&gt;
    for obj in objs:&lt;br /&gt;
        linesList.append(&amp;quot;{:&amp;gt;15,d} | {}\n&amp;quot;.format(obj.MemSize, obj.Label))&lt;br /&gt;
    linesList.append(hLine)&lt;br /&gt;
    s = &amp;quot;&amp;quot;.join(linesList)&lt;br /&gt;
    print(s)&lt;br /&gt;
&lt;br /&gt;
printMem()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that it may take several seconds to finish the calculation.&lt;br /&gt;
&lt;br /&gt;
For more information on (and an example of) using the above snippet to find the MemSize of every object in your FreeCAD file&#039;s tree, please see [https://www.eco-libre.org/big-freecad-file-size/ Troubleshooting Large FreeCAD File Sizes].&lt;br /&gt;
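&lt;br /&gt;
If the printed list is long, it can also help to see which &#039;&#039;kinds&#039;&#039; of objects dominate. The following is a minimal sketch of a hypothetical helper (memByType and printMemByType are not FreeCAD functions) that sums MemSize per TypeId. It only assumes that each object exposes MemSize and TypeId attributes, as FreeCAD document objects do.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict


def memByType(objects):
    """Sum MemSize per TypeId and return (typeId, totalBytes, count)
    tuples, largest total first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for obj in objects:
        totals[obj.TypeId] += obj.MemSize
        counts[obj.TypeId] += 1
    rows = [(typeId, totals[typeId], counts[typeId]) for typeId in totals]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def printMemByType(objects):
    """Print the per-type totals as a small table."""
    for typeId, total, count in memByType(objects):
        print('{:,}'.format(total).rjust(15), '|', count, 'objects |', typeId)


# In FreeCAD's Python Console:
#   printMemByType(FreeCAD.ActiveDocument.Objects)
```
&lt;br /&gt;
Grouping by type is a design choice: a single oversized mesh or array may show up as one type with a disproportionate total, which points you toward the workbench where the simplification should happen.&lt;br /&gt;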
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD in which every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Exploded-part animations can also be made using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. You can then drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting the folder. However, grouping does not reduce file size.&lt;br /&gt;
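&lt;br /&gt;
The grouping step can also be scripted from the Python Console. The sketch below is a hypothetical helper (groupByLabelPrefix is not a built-in) that creates an App::DocumentObjectGroup, the object type behind a folder in the tree, and adds every object whose Label starts with a given prefix to it. It assumes your parts follow a consistent naming scheme and are not already inside other groups.&lt;br /&gt;
&lt;br /&gt;
```python
def groupByLabelPrefix(doc, prefix, groupName):
    """Hypothetical helper: create a folder (App::DocumentObjectGroup) in the
    tree and add every object whose Label starts with prefix to it.
    Like Create Group in the GUI, this only reorganizes the tree; it does
    not reduce file size."""
    group = doc.addObject('App::DocumentObjectGroup', groupName)
    for obj in list(doc.Objects):
        # Skip the new group itself, whose label may also match the prefix.
        if obj.Name != group.Name and obj.Label.startswith(prefix):
            group.addObject(obj)
    return group


# Example in FreeCAD's Python Console (the label prefix is an assumption):
#   groupByLabelPrefix(FreeCAD.ActiveDocument, 'WallModule', 'WallModules')
```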
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document does not seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller, because its dependencies (such as sketches) are not copied along with it.&lt;br /&gt;
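&lt;br /&gt;
The compound-and-paste steps can also be approximated in a script. The following is a minimal sketch, not an established OSE tool. The helper compoundToNewDocument is hypothetical, and it assumes that the objects you pass in have a Shape, as Part objects do. It builds one compound with Part.makeCompound and stores it as a plain, non-parametric Part::Feature in a new document, so no sketches or other dependencies come along. It does not account for placements of parent containers such as App::Part. We have not measured whether the result is as small as a manual copy-paste.&lt;br /&gt;
&lt;br /&gt;
```python
def compoundToNewDocument(objects, name='Simplified'):
    """Hypothetical helper: combine the shapes of the given objects into one
    compound and store it as a plain Part::Feature in a new document,
    leaving sketches and other dependencies behind. Run inside FreeCAD."""
    import FreeCAD  # preloaded in FreeCAD's Python Console
    import Part

    # Copy each shape so the new document does not share geometry with the old one.
    shapes = [obj.Shape.copy() for obj in objects]
    compound = Part.makeCompound(shapes)

    newDoc = FreeCAD.newDocument(name)
    feature = newDoc.addObject('Part::Feature', name)
    feature.Shape = compound  # non-parametric: no sketch or feature history
    newDoc.recompute()
    return newDoc


# Example: select the parts to collapse in the tree, then run
#   import FreeCADGui
#   compoundToNewDocument(FreeCADGui.Selection.getSelection(), 'WallModule')
```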
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste the compound without its dependencies into a new file. You will then have the smallest possible file, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files, which can still be manipulated readily. Once a file reaches an unmanageable size, we can move on to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstraction of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, there is no limit to the complexity of design that this process can handle. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in real time.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a CAD file for a valve from [[McMaster-Carr]], and it is a few MB because it includes details like threads. The problem arises when we have an assembly of many such parts: this easily leads to files of 100 MB or even gigabytes if one doesn&#039;t pay attention to file size. Such files are unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general rule, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
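&lt;br /&gt;
The size arithmetic above is easy to check for your own assembly. The sketch below assumes that part sizes simply add up, which is only an approximation because of compression and shared data. Its thresholds are a simplified reading of the rules of thumb in this section, and the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def assemblySizeMB(partCount, partSizeKB):
    """Estimate assembly size in MB, assuming part sizes simply add up
    (1 MB = 1000 kB for round numbers)."""
    return partCount * partSizeKB / 1000


def rateFileSize(sizeMB):
    """Simplified reading of the rules of thumb on this page."""
    if sizeMB > 50:
        return 'unusable'
    if sizeMB >= 10:
        return 'near or past the practical limit (10-20 MB)'
    return 'workable'


# 200 simplified parts at ~10 kB each, versus 200 detailed parts at ~2 MB each
for partSizeKB in (10, 2000):
    size = assemblySizeMB(200, partSizeKB)
    print(partSizeKB, 'kB parts:', size, 'MB,', rateFileSize(size))
```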
&lt;br /&gt;
We save these small files, both as individual files and as assemblies of individual files, in the OSE [[Part Library]]. This way, even when we need to create a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*As shown in the presentation above, the minimum FreeCAD file size for a cube-shaped part is about 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should take only a few bytes: l, w, h. A byte is 8 bits, and 2 bytes (16-bit depth) give 65,536 divisions, so each dimension can be stored in 2 bytes. Thus, a cube should be 6 bytes large. If we add angle and position, we have 18 bytes. Therefore, the size of FreeCAD files could be reduced by at least 100x if files were stored in their most efficient form, because the current minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
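&lt;br /&gt;
The 18-byte estimate in the notes above can be checked with Python&#039;s struct module. This is only an illustration of the arithmetic, not a proposed file format. It assumes that each dimension and position is an integer in millimetres, that each angle is an integer in whole degrees, and that every value fits in an unsigned 16-bit integer (0 to 65,535).&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Nine unsigned 16-bit values: length, width, height, x, y, z position,
# and three rotation angles. '!' = network byte order, standard sizes,
# no padding; 'H' = unsigned 16-bit integer (2 bytes).
BOX_FORMAT = '!9H'

FREECAD_MIN_BYTES = 2800  # ~2.8 kB minimum cube file from the presentation above


def packBox(dims, position, angles):
    """Pack a box as 9 values of 2 bytes each (18 bytes total)."""
    return struct.pack(BOX_FORMAT, *dims, *position, *angles)


def unpackBox(data):
    """Recover (dims, position, angles) from an 18-byte record."""
    values = struct.unpack(BOX_FORMAT, data)
    return values[0:3], values[3:6], values[6:9]


record = packBox((1200, 600, 150), (0, 0, 0), (0, 0, 90))
print(len(record), 'bytes')                                # 18 bytes
print(round(FREECAD_MIN_BYTES / len(record)), 'x smaller')  # 156 x smaller
```
&lt;br /&gt;
With the roughly 2.8 kB minimum from the presentation, this works out to a ratio of about 156x, which is consistent with the estimate of at least 100x.&lt;br /&gt;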
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (granular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311599</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311599"/>
		<updated>2025-09-09T17:00:13Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: fix link syntax&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices two levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful feature of hiding and un-hiding parts for build instructionals), and that we reduce file size as much as possible so that complex files are quick to open and easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
The first level simplifies the actual features of a part; this is covered in the Level of Detail section below. The second level simplifies the part tree itself during the design phase; this is covered in the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Identifying problem objects=&lt;br /&gt;
&lt;br /&gt;
If you have a large/slow FreeCAD file, you&#039;ll first want to identify &#039;&#039;which&#039;&#039; object is causing the problem.&lt;br /&gt;
&lt;br /&gt;
It is important to distinguish between two sizes:&lt;br /&gt;
&lt;br /&gt;
# The (compressed) on-disk size of the .FCStd file&lt;br /&gt;
# The (uncompressed) in-memory size, or MemSize, of each object&lt;br /&gt;
&lt;br /&gt;
The two are &#039;&#039;sometimes&#039;&#039; correlated, but it&#039;s possible to have a &amp;lt;1 MB .FCStd file that is completely unusable because of a very large MemSize. This can happen, for example, if you make a very simple sketch and then an enormous three-dimensional array of that sketch (e.g. for a mesh object). Such a file compresses to a very small size on disk, but expands to a very large (uncompressed) MemSize when opened, which can crash FreeCAD.&lt;br /&gt;
&lt;br /&gt;
Fortunately, FreeCAD exposes a Python Console to the user, where you can paste custom code to interact with the document&#039;s objects. The snippet below will:&lt;br /&gt;
&lt;br /&gt;
# Iterate through every object in the [https://wiki.freecad.org/Document_structure FreeCAD Document&#039;s Tree], plus the document itself,&lt;br /&gt;
# Get the [https://github.com/FreeCAD/FreeCAD/blob/6ab8589a03b498b237f8ba88c6ae4692bb3adba6/src/Mod/TemplatePyMod/DocumentObject.py#L117-L119 MemSize] of each object,&lt;br /&gt;
# Sort the objects by their MemSize, and&lt;br /&gt;
# Print the sorted list, largest first.&lt;br /&gt;
&lt;br /&gt;
To use this, first open the [https://wiki.freecad.org/index.php?title=Python_Console Python Console] in FreeCAD by clicking &#039;&#039;&#039;View -&amp;gt; Panels -&amp;gt; Python Console&#039;&#039;&#039;. Then &#039;&#039;&#039;paste the following snippet&#039;&#039;&#039; into the Python Console and &#039;&#039;&#039;press Enter&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def printMem():&lt;br /&gt;
    objs = list(FreeCAD.ActiveDocument.Objects)&lt;br /&gt;
    objs.append(FreeCAD.ActiveDocument)              # add doc to list&lt;br /&gt;
    objs.sort(reverse=True, key=lambda x: x.MemSize) # max mem is first&lt;br /&gt;
    &lt;br /&gt;
    hdr = &amp;quot;MemSize (bytes) | Object Label\n&amp;quot;&lt;br /&gt;
    hLine = &amp;quot;-&amp;quot;*len(hdr) + &amp;quot;\n&amp;quot;&lt;br /&gt;
    linesList = [&amp;quot;\n&amp;quot;, hLine, hdr, hLine]&lt;br /&gt;
    for obj in objs:&lt;br /&gt;
        linesList.append(&amp;quot;{:&amp;gt;15,d} | {}\n&amp;quot;.format(obj.MemSize, obj.Label))&lt;br /&gt;
    linesList.append(hLine)&lt;br /&gt;
    s = &amp;quot;&amp;quot;.join(linesList)&lt;br /&gt;
    print(s)&lt;br /&gt;
&lt;br /&gt;
printMem()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that it may take several seconds to finish the calculation.&lt;br /&gt;
&lt;br /&gt;
For more information on (and an example of) using the above snippet to find the MemSize of every object in your FreeCAD file&#039;s tree, please see [https://www.eco-libre.org/big-freecad-file-size/ Troubleshooting Large FreeCAD File Sizes].&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD in which every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Exploded-part animations can also be made using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. You can then drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting the folder. However, grouping does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document does not seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller, because its dependencies (such as sketches) are not copied along with it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste the compound without its dependencies into a new file. You will then have the smallest possible file, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files, which can still be manipulated readily. Once a file reaches an unmanageable size, we can move on to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstraction of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, there is no limit to the complexity of design that this process can handle. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in real time.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a CAD file for a valve from [[McMaster-Carr]], and it is a few MB because it includes details like threads. The problem arises when we have an assembly of many such parts: this easily leads to files of 100 MB or even gigabytes if one doesn&#039;t pay attention to file size. Such files are unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general rule, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
&lt;br /&gt;
We save these small files, both as individual files and as assemblies of individual files, in the OSE [[Part Library]]. This way, even when we need to create a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*As shown in the presentation above, the minimum FreeCAD file size for a cube-shaped part is about 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should take only a few bytes: l, w, h. A byte is 8 bits, and 2 bytes (16-bit depth) give 65,536 divisions, so each dimension can be stored in 2 bytes. Thus, a cube should be 6 bytes large. If we add angle and position, we have 18 bytes. Therefore, the size of FreeCAD files could be reduced by at least 100x if files were stored in their most efficient form, because the current minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (granular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311598</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311598"/>
		<updated>2025-09-09T16:58:52Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: fix bold syntax&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices two levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful feature of hiding and un-hiding parts for build instructionals), and that we reduce file size as much as possible so that complex files are quick to open and easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
The first level simplifies the actual features of a part; this is covered in the Level of Detail section below. The second level simplifies the part tree itself during the design phase; this is covered in the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Identifying problem objects=&lt;br /&gt;
&lt;br /&gt;
If you have a large/slow FreeCAD file, you&#039;ll first want to identify &#039;&#039;which&#039;&#039; object is causing the problem.&lt;br /&gt;
&lt;br /&gt;
It is important to distinguish between two sizes:&lt;br /&gt;
&lt;br /&gt;
# The (compressed) on-disk size of the .FCStd file&lt;br /&gt;
# The (uncompressed) in-memory size, or MemSize, of each object&lt;br /&gt;
&lt;br /&gt;
The two are &#039;&#039;sometimes&#039;&#039; correlated, but it&#039;s possible to have a &amp;lt;1 MB .FCStd file that is completely unusable because of a very large MemSize. This can happen, for example, if you make a very simple sketch and then an enormous three-dimensional array of that sketch (e.g. for a mesh object). Such a file compresses to a very small size on disk, but expands to a very large (uncompressed) MemSize when opened, which can crash FreeCAD.&lt;br /&gt;
&lt;br /&gt;
Fortunately, FreeCAD exposes a Python Console to the user, where you can paste custom code to interact with the document&#039;s objects. The snippet below will:&lt;br /&gt;
&lt;br /&gt;
# Iterate through every object in the [https://wiki.freecad.org/Document_structure FreeCAD Document&#039;s Tree], plus the document itself,&lt;br /&gt;
# Get the [https://github.com/FreeCAD/FreeCAD/blob/6ab8589a03b498b237f8ba88c6ae4692bb3adba6/src/Mod/TemplatePyMod/DocumentObject.py#L117-L119 MemSize] of each object,&lt;br /&gt;
# Sort the objects by their MemSize, and&lt;br /&gt;
# Print the sorted list, largest first.&lt;br /&gt;
&lt;br /&gt;
To use this, first open the [https://wiki.freecad.org/index.php?title=Python_Console Python Console] in FreeCAD by clicking &#039;&#039;&#039;View -&amp;gt; Panels -&amp;gt; Python Console&#039;&#039;&#039;. Then &#039;&#039;&#039;paste the following snippet&#039;&#039;&#039; into the Python Console and &#039;&#039;&#039;press Enter&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def printMem():&lt;br /&gt;
    objs = list(FreeCAD.ActiveDocument.Objects)&lt;br /&gt;
    objs.append(FreeCAD.ActiveDocument)              # add doc to list&lt;br /&gt;
    objs.sort(reverse=True, key=lambda x: x.MemSize) # max mem is first&lt;br /&gt;
    &lt;br /&gt;
    hdr = &amp;quot;MemSize (bytes) | Object Label\n&amp;quot;&lt;br /&gt;
    hLine = &amp;quot;-&amp;quot;*len(hdr) + &amp;quot;\n&amp;quot;&lt;br /&gt;
    linesList = [&amp;quot;\n&amp;quot;, hLine, hdr, hLine]&lt;br /&gt;
    for obj in objs:&lt;br /&gt;
        linesList.append(&amp;quot;{:&amp;gt;15,d} | {}\n&amp;quot;.format(obj.MemSize, obj.Label))&lt;br /&gt;
    linesList.append(hLine)&lt;br /&gt;
    s = &amp;quot;&amp;quot;.join(linesList)&lt;br /&gt;
    print(s)&lt;br /&gt;
&lt;br /&gt;
printMem()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that it may take several seconds to finish the calculation.&lt;br /&gt;
&lt;br /&gt;
For more information on (and an example of) using the above snippet to find the MemSize of every object in your FreeCAD file&#039;s tree, please see [https://www.eco-libre.org/big-freecad-file-size/ Troubleshooting Large FreeCAD File Sizes].&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD in which every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Exploded-part animations can also be made using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. You can then drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting the folder. However, grouping does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document does not seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller, because its dependencies (such as sketches) are not copied along with it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste the compound without its dependencies into a new file. You will then have the smallest possible file, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files, which can still be manipulated readily. Once a file reaches an unmanageable size, we can move on to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstraction of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, there is no limit to the complexity of design that this process can handle. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in real time.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a CAD file for a valve from [[McMaster-Carr]], and it is a few MB because it includes details like threads. The problem arises when we have an assembly of many such parts: this easily leads to files of 100 MB or even gigabytes if one doesn&#039;t pay attention to file size. Such files are unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general rule, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
&lt;br /&gt;
We save these small files, both as individual files and as assemblies of individual files, in the OSE [[Part Library]]. This way, even when we need to create a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*As shown in the presentation above, the minimum FreeCAD file size for a cube-shaped part is about 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should take only a few bytes: l, w, h. A byte is 8 bits, and 2 bytes (16-bit depth) give 65,536 divisions, so each dimension can be stored in 2 bytes. Thus, a cube should be 6 bytes large. If we add angle and position, we have 18 bytes. Therefore, the size of FreeCAD files could be reduced by at least 100x if files were stored in their most efficient form, because the current minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (granular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311597</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311597"/>
		<updated>2025-09-09T16:58:17Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: fix syntax of code&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices two levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful feature of hiding and un-hiding parts for build instructionals), and that we reduce file size as much as possible so that complex files are quick to open and easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
The file simplification below refers to simplifying the actual features of a part - the Level of Detail section below. Another type of simplification can be done on the part tree to simplify the part tree during the design phase. This is the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Identifying problem objects=&lt;br /&gt;
&lt;br /&gt;
If you have a large/slow FreeCAD file, you&#039;ll first want to identify &#039;&#039;which&#039;&#039; object is causing the problem.&lt;br /&gt;
&lt;br /&gt;
There are two distinct sizes:&lt;br /&gt;
&lt;br /&gt;
# The (compressed) on-disk size of the .FCStd file&lt;br /&gt;
# The (uncompressed) in-memory size (MemSize) of each object&lt;br /&gt;
&lt;br /&gt;
The two are &#039;&#039;sometimes&#039;&#039; correlated, but it&#039;s possible to have a &amp;lt;1 MB .FCStd file that is completely unusable because of a very large MemSize. This can happen, for example, if you make a very simple sketch and then an enormous three-dimensional array of it (e.g. for a mesh object). That would compress to a very small file size but expand to a very large (uncompressed) MemSize, crashing FreeCAD.&lt;br /&gt;
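&lt;br /&gt;
As a rough illustration of the on-disk side, note that a .FCStd file is a zip archive, so Python&#039;s standard zipfile module can compare its compressed and uncompressed member sizes. The sketch below uses hypothetical helpers (fcstd_sizes and make_demo_archive are not part of FreeCAD) and an in-memory archive with one highly repetitive member, so it runs without FreeCAD. The uncompressed size of the archive members is only a proxy; it is not the same number as the MemSize that FreeCAD reports for loaded objects.&lt;br /&gt;
&lt;br /&gt;
```python
import io
import zipfile


def fcstd_sizes(source):
    """Return (compressed_bytes, uncompressed_bytes) summed over all zip members.

    source may be a path to a .FCStd file or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed


def make_demo_archive(repeats):
    """Build a hypothetical stand-in archive whose single member is very repetitive."""
    payload = b"Part::Box Length=10 Width=10 Height=10\n" * repeats
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("Document.xml", payload)
    buf.seek(0)  # rewind so the archive can be read back
    return buf


compressed, uncompressed = fcstd_sizes(make_demo_archive(100_000))
print(f"compressed (on disk): {compressed:,} bytes")
print(f"uncompressed:         {uncompressed:,} bytes")
```

To check a real file, pass its path instead, e.g. fcstd_sizes(&amp;quot;MyAssembly.FCStd&amp;quot;) (a hypothetical file name).&lt;br /&gt;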
&lt;br /&gt;
Fortunately, FreeCAD exposes a Python console to the user, where you can paste custom code to interact with the objects in a document. The snippet below will:&lt;br /&gt;
&lt;br /&gt;
# Iterate through every object in the [https://wiki.freecad.org/Document_structure FreeCAD document&#039;s tree] (plus the document itself),&lt;br /&gt;
# Get the [https://github.com/FreeCAD/FreeCAD/blob/6ab8589a03b498b237f8ba88c6ae4692bb3adba6/src/Mod/TemplatePyMod/DocumentObject.py#L117-L119 MemSize] of each object,&lt;br /&gt;
# Sort the list of objects by their size, and&lt;br /&gt;
# Print the list of objects (largest first).&lt;br /&gt;
&lt;br /&gt;
To use this, first open the [https://wiki.freecad.org/index.php?title=Python_Console Python Console] in FreeCAD by clicking &#039;&#039;&#039;View -&amp;gt; Panels -&amp;gt; Python Console&#039;&#039;&#039;. Then &#039;&#039;&#039;paste the following snippet&#039;&#039;&#039; into the Python Console and &#039;&#039;&#039;press Enter&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def printMem():&lt;br /&gt;
    objs = list(FreeCAD.ActiveDocument.Objects)&lt;br /&gt;
    objs.append(FreeCAD.ActiveDocument)              # add doc to list&lt;br /&gt;
    objs.sort(reverse=True, key=lambda x: x.MemSize) # max mem is first&lt;br /&gt;
    hdr = &amp;quot;MemSize (bytes) | Object Label\n&amp;quot;&lt;br /&gt;
    hLine = &amp;quot;-&amp;quot;*len(hdr) + &amp;quot;\n&amp;quot;&lt;br /&gt;
    linesList = [&amp;quot;\n&amp;quot;, hLine, hdr, hLine]&lt;br /&gt;
    for obj in objs:&lt;br /&gt;
        linesList.append(&amp;quot;{:&amp;gt;15,d} | {}\n&amp;quot;.format(obj.MemSize, obj.Label))&lt;br /&gt;
    linesList.append(hLine)&lt;br /&gt;
    s = &amp;quot;&amp;quot;.join(linesList)&lt;br /&gt;
    print(s)&lt;br /&gt;
&lt;br /&gt;
printMem()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that it may take several seconds to finish the calculation.&lt;br /&gt;
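&lt;br /&gt;
If you only want the few worst offenders, the same idea can be written as a small reusable function. This is a hypothetical helper (largest_objects is not a FreeCAD API); it relies only on each object having MemSize and Label attributes, so the sketch below exercises it with made-up stand-in objects. Inside FreeCAD you would call largest_objects(FreeCAD.ActiveDocument.Objects, 5) instead; unlike the snippet above, it does not add the document itself to the list.&lt;br /&gt;
&lt;br /&gt;
```python
from types import SimpleNamespace


def largest_objects(objects, top_n=10):
    """Return (MemSize, Label) pairs for the top_n objects with the largest MemSize."""
    ranked = sorted(objects, key=lambda obj: obj.MemSize, reverse=True)
    return [(obj.MemSize, obj.Label) for obj in ranked[:top_n]]


def total_mem(objects):
    """Sum of MemSize over the given objects, in bytes."""
    return sum(obj.MemSize for obj in objects)


# Stand-in objects with made-up sizes, so the sketch runs outside FreeCAD.
fake_objects = [
    SimpleNamespace(Label="Sketch", MemSize=4_096),
    SimpleNamespace(Label="Array", MemSize=52_428_800),
    SimpleNamespace(Label="Pad", MemSize=20_480),
]

for size, label in largest_objects(fake_objects, top_n=2):
    print(f"{size:>15,d} | {label}")
print(f"{total_mem(fake_objects):>15,d} | (total)")
```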
&lt;br /&gt;
For more information on (and an example of) using the above code snippet to find the MemSize of every object in your FreeCAD file&#039;s tree, please see [https://www.eco-libre.org/big-freecad-file-size/ Troubleshooting Large FreeCAD File Sizes].&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD where every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Also, exploded part animations can be done using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase, it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. Then you can drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting that folder. This does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document doesn&#039;t seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste without dependencies into a new file. You will then have the smallest file possible, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files that can be manipulated readily. Once a file reaches an unmanageable size, we can move to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstract version of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, the complexity of design that this process can handle has no limit. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in realtime.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a file for a valve from [[McMaster-Carr]], and the file is a few MB because it has details like threads. The problem arises when we have an assembly of many parts: this easily leads to files of 100 MB or even several GB if one doesn&#039;t pay attention to file size. This is rather unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general practice, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
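&lt;br /&gt;
The arithmetic above can be sketched as a quick budget check. The thresholds are the rules of thumb from this section (around 1 MB is fast, 10-20 MB is the practical limit, above 50 MB is unusable); treating them as hard cut-offs, using decimal units (1 MB = 1,000,000 bytes), and the function names are all assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
KB = 1_000
MB = 1_000_000


def assembly_size(part_sizes):
    """Approximate an assembly size as the sum of its part file sizes (bytes)."""
    return sum(part_sizes)


def rate_assembly(total_bytes):
    """Classify an assembly size using the rules of thumb on this page."""
    if total_bytes > 50 * MB:
        return "unusable"
    if total_bytes > 20 * MB:
        return "beyond the practical limit"
    if total_bytes > 1 * MB:
        return "workable"
    return "fast"


placeholders = assembly_size([10 * KB] * 200)  # 200 simplified parts of ~10 kB
detailed = assembly_size([2 * MB] * 200)       # the same parts at ~2 MB each
print(placeholders, rate_assembly(placeholders))
print(detailed, rate_assembly(detailed))
```

Summing part sizes ignores any overhead the assembly file itself adds, so treat the result as a rough estimate.&lt;br /&gt;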
&lt;br /&gt;
We save these small files as individual files, and assemblies of individual files, in the OSE [[Part Library]]. Thus, even when building a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*In the presentation above, the minimum FreeCAD file size for a simple cube is 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should need only a few bytes: l, w, h. 8 bits are a byte, and 2 bytes (16-bit depth) give 65,536 divisions. So each dimension can be stored in 2 bytes, and a cube should take 6 bytes. If we add angle and position, we have 18 bytes. Thus, the memory size of FreeCAD files could in principle be reduced by at least 100x if files were stored in their most efficient form, because the minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
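&lt;br /&gt;
The byte count in the last note can be checked with Python&#039;s standard struct module. This is an idealized sketch, assuming every value is quantized to an unsigned 16-bit integer (0 to 65,535) over some chosen range; it ignores units, precision, and everything else a real CAD file stores (topology, properties, view data), so it is an idealized minimum rather than a proposed file format. The function name pack_box is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# "!" = network byte order, standard sizes, no padding; "H" = unsigned 16-bit integer.
DIMS_FORMAT = "!3H"  # length, width, height
BOX_FORMAT = "!9H"   # 3 dimensions + 3 position values + 3 angles


def pack_box(length, width, height, x, y, z, rx, ry, rz):
    """Pack a box as nine quantized 16-bit values (hypothetical encoding).

    struct.pack raises struct.error if any value falls outside 0..65535.
    """
    return struct.pack(BOX_FORMAT, length, width, height, x, y, z, rx, ry, rz)


divisions = 2 ** 16
dims_only = struct.pack(DIMS_FORMAT, 1000, 500, 250)
full_box = pack_box(1000, 500, 250, 0, 0, 0, 0, 0, 0)
ratio = 2_800 / len(full_box)  # compared with the ~2.8 kB minimal cube file noted above

print(divisions, len(dims_only), len(full_box), round(ratio))
```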
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (grainular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311596</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311596"/>
		<updated>2025-09-09T16:56:47Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: added section for calculating MemSize of every object&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices 2 levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful ability to hide and unhide parts for build instructionals), and that we reduce file size as much as possible so that complex files open quickly and are easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
One type of simplification reduces the actual features of a part; this is covered in the Level of Detail section below. The other type simplifies the part tree during the design phase; this is covered in the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Identifying problem objects=&lt;br /&gt;
&lt;br /&gt;
If you have a large/slow FreeCAD file, you&#039;ll first want to identify &#039;&#039;which&#039;&#039; object is causing the problem.&lt;br /&gt;
&lt;br /&gt;
There are two distinct sizes:&lt;br /&gt;
&lt;br /&gt;
# The (compressed) on-disk size of the .FCStd file&lt;br /&gt;
# The (uncompressed) in-memory size (MemSize) of each object&lt;br /&gt;
&lt;br /&gt;
The two are &#039;&#039;sometimes&#039;&#039; correlated, but it&#039;s possible to have a &amp;lt;1 MB .FCStd file that is completely unusable because of a very large MemSize. This can happen, for example, if you make a very simple sketch and then an enormous three-dimensional array of it (e.g. for a mesh object). That would compress to a very small file size but expand to a very large (uncompressed) MemSize, crashing FreeCAD.&lt;br /&gt;
&lt;br /&gt;
Fortunately, FreeCAD exposes a Python console to the user, where you can paste custom code to interact with the objects in a document. The snippet below will:&lt;br /&gt;
&lt;br /&gt;
# Iterate through every object in the [https://wiki.freecad.org/Document_structure FreeCAD document&#039;s tree] (plus the document itself),&lt;br /&gt;
# Get the [https://github.com/FreeCAD/FreeCAD/blob/6ab8589a03b498b237f8ba88c6ae4692bb3adba6/src/Mod/TemplatePyMod/DocumentObject.py#L117-L119 MemSize] of each object,&lt;br /&gt;
# Sort the list of objects by their size, and&lt;br /&gt;
# Print the list of objects (largest first).&lt;br /&gt;
&lt;br /&gt;
To use this, first open the [https://wiki.freecad.org/index.php?title=Python_Console Python Console] in FreeCAD by clicking &#039;&#039;&#039;View -&amp;gt; Panels -&amp;gt; Python Console&#039;&#039;&#039;. Then &#039;&#039;&#039;paste the following snippet&#039;&#039;&#039; into the Python Console and &#039;&#039;&#039;press Enter&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def printMem():&lt;br /&gt;
    objs = list(FreeCAD.ActiveDocument.Objects)&lt;br /&gt;
    objs.append(FreeCAD.ActiveDocument)              # add doc to list&lt;br /&gt;
    objs.sort(reverse=True, key=lambda x: x.MemSize) # max mem is first&lt;br /&gt;
    hdr = &amp;quot;MemSize (bytes) | Object Label\n&amp;quot;&lt;br /&gt;
    hLine = &amp;quot;-&amp;quot;*len(hdr) + &amp;quot;\n&amp;quot;&lt;br /&gt;
    linesList = [&amp;quot;\n&amp;quot;, hLine, hdr, hLine]&lt;br /&gt;
    for obj in objs:&lt;br /&gt;
        linesList.append(&amp;quot;{:&amp;gt;15,d} | {}\n&amp;quot;.format(obj.MemSize, obj.Label))&lt;br /&gt;
    linesList.append(hLine)&lt;br /&gt;
    s = &amp;quot;&amp;quot;.join(linesList)&lt;br /&gt;
    print(s)&lt;br /&gt;
&lt;br /&gt;
printMem()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that it may take several seconds to finish the calculation.&lt;br /&gt;
&lt;br /&gt;
For more information on (and an example of) using the above code snippet to find the MemSize of every object in your FreeCAD file&#039;s tree, please see [https://www.eco-libre.org/big-freecad-file-size/ Troubleshooting Large FreeCAD File Sizes].&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD where every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Also, exploded part animations can be done using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase, it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. Then you can drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting that folder. This does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document doesn&#039;t seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste without dependencies into a new file. You will then have the smallest file possible, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files that can be manipulated readily. Once a file reaches an unmanageable size, we can move to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstract version of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, the complexity of design that this process can handle has no limit. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in realtime.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a file for a valve from [[McMaster-Carr]], and the file is a few MB because it has details like threads. The problem arises when we have an assembly of many parts: this easily leads to files of 100 MB or even several GB if one doesn&#039;t pay attention to file size. This is rather unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general practice, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
&lt;br /&gt;
We save these small files as individual files, and assemblies of individual files, in the OSE [[Part Library]]. Thus, even when building a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*In the presentation above, the minimum FreeCAD file size for a simple cube is 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should need only a few bytes: l, w, h. 8 bits are a byte, and 2 bytes (16-bit depth) give 65,536 divisions. So each dimension can be stored in 2 bytes, and a cube should take 6 bytes. If we add angle and position, we have 18 bytes. Thus, the memory size of FreeCAD files could in principle be reduced by at least 100x if files were stored in their most efficient form, because the minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (grainular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311183</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311183"/>
		<updated>2025-08-27T18:51:51Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: add link to SE&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices 2 levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful ability to hide and unhide parts for build instructionals), and that we reduce file size as much as possible so that complex files open quickly and are easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
One type of simplification reduces the actual features of a part; this is covered in the Level of Detail section below. The other type simplifies the part tree during the design phase; this is covered in the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD where every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Also, exploded part animations can be done using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase, it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. Then you can drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting that folder. This does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document doesn&#039;t seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste without dependencies into a new file. You will then have the smallest file possible, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files that can be manipulated readily. Once a file reaches an unmanageable size, we can move to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstract version of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, the complexity of design that this process can handle has no limit. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in realtime.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a file for a valve from [[McMaster-Carr]], and the file is a few MB because it has details like threads. The problem arises when we have an assembly of many parts: this easily leads to files of 100 MB or even several GB if one doesn&#039;t pay attention to file size. This is rather unworkable, as the computer bogs down to very slow operations.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be around 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 parts at 10 kB each remains at only 2 MB. As a general practice, files above 50 MB are unusable, and the practical limit is 10-20 MB. But if a file is kept down to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
&lt;br /&gt;
We save these small files as individual files, and assemblies of individual files, in the OSE [[Part Library]]. Thus, even when building a very large assembly, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*In the presentation above, the minimum FreeCAD file size for a simple cube is 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10k. Files with about&lt;br /&gt;
*A cube should need only a few bytes: l, w, h. 8 bits are a byte, and 2 bytes (16-bit depth) give 65,536 divisions. So each dimension can be stored in 2 bytes, and a cube should take 6 bytes. If we add angle and position, we have 18 bytes. Thus, the memory size of FreeCAD files could in principle be reduced by at least 100x if files were stored in their most efficient form, because the minimum file size is on the order of kilobytes, not bytes. Just sayin&#039;.&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (grainular file size view)] question on FreeCAD Forums&lt;br /&gt;
* [https://engineering.stackexchange.com/questions/63647/why-is-my-freecad-file-so-large Why is my FreeCAD file so large?] question on Engineering Stack Exchange&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311182</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311182"/>
		<updated>2025-08-27T18:51:07Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: fix syntax&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices 2 levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially its very useful ability to hide and unhide parts for build instructionals), and that we reduce file size as much as possible so that complex files open quickly and are easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
One type of simplification reduces the actual features of a part; this is covered in the Level of Detail section below. The other type simplifies the part tree during the design phase; this is covered in the Part Tree Simplification section.&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD where every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Also, exploded part animations can be done using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase, it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. Then you can drag and drop parts into that folder. This makes it easy to keep track of parts, and to select a bunch of parts at once by selecting that folder. This does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either by using Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse a bunch of parts into one. However, Make Compound does not reduce file size further; in fact, a compound of a bunch of simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document doesn&#039;t seem to reduce the file size. When you copy a compound or a part with a sketch, you will typically see this dialog:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted object will be smaller.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste without dependencies into a new file. You will then have the smallest file possible, under one item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The limit here is a few thousand part files that can be manipulated readily. Once a file reaches an unmanageable size, we can move to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they don&#039;t contain all the detail. The simple version is an abstract version of the original file. Thus, in large-scale team workflows, part tree simplification and level-of-detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, the complexity of design that this process can handle has no limit. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in realtime.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a file for a valve from [[McMaster-Carr]], and it is a few MB because it includes details such as threads. The problem comes in when we have an assembly of many parts: this easily leads to files of 100 MB or even several GB if one does not pay attention to file size. Such files are unworkable, as the computer bogs down and operations become very slow.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be about 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 such parts at 10 kB each is only 2 MB in total. As a general practice, files above 50 MB are unusable, and the practical limit is 10-20 MB. If files are kept to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
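&lt;br /&gt;
The size arithmetic above can be checked with a short Python sketch. It assumes that every simplified part adds the same number of bytes, which ignores per-document overhead, and the function names are hypothetical. The 20 MB and 50 MB thresholds are the rule-of-thumb limits stated above.&lt;br /&gt;
&lt;br /&gt;
```python
# Rough size budget for an assembly of simplified placeholder parts.
# Assumption: every part adds the same number of bytes; real FreeCAD files
# also carry per-document overhead, so this is only an estimate.
KB = 1000  # decimal units, matching the 10 kB / 2 MB figures in the text
MB = 1000 * KB


def assembly_size_bytes(part_count: int, bytes_per_part: int) -> int:
    """Estimated total size if every part contributes the same amount."""
    return part_count * bytes_per_part


def usability(size_bytes: int) -> str:
    """Classify a file size with the rule-of-thumb limits from this page."""
    if size_bytes > 50 * MB:
        return "unusable"
    if size_bytes > 20 * MB:
        return "above the practical limit"
    return "within the practical limit"


simplified = assembly_size_bytes(200, 10 * KB)  # 200 placeholders of 10 kB
detailed = assembly_size_bytes(200, 2 * MB)     # 200 detailed 2 MB parts
print(simplified, usability(simplified))  # 2000000 within the practical limit
print(detailed, usability(detailed))      # 400000000 unusable
```

The same 200-part assembly is about 200 times larger when every part keeps McMaster-style detail, which is why the detailed version lands far beyond the 50 MB limit.&lt;br /&gt;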
&lt;br /&gt;
We save these small files as individual files, and assemblies of individual files, in the OSE [[Part Library]]. This way, even when we build very large assemblies, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*In the presentation above, the minimum FreeCAD file size for a simple cube is 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10 kB. Files with about&lt;br /&gt;
*A cube should need only a few bytes: length, width and height. A byte is 8 bits, and 2 bytes (16-bit depth) give 65,536 distinct values, so each dimension can be stored in 2 bytes. Thus, a cube should take 6 bytes. If we add position and orientation (three coordinates and three angles), we have 18 bytes. Because the minimum FreeCAD file size is on the order of kilobytes, not bytes, FreeCAD files could be at least 100x smaller if they were stored in their most efficient form. Just sayin&#039;.&lt;br /&gt;
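&lt;br /&gt;
The byte count in the last note can be reproduced with Python&#039;s standard &amp;lt;code&amp;gt;struct&amp;lt;/code&amp;gt; module. This only illustrates the arithmetic and is not a format that FreeCAD reads or writes. It assumes that each of nine values (length, width, height, three position coordinates and three rotation angles) is quantized to an unsigned 16-bit integer.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Each quantity is an unsigned 16-bit integer ("H"); "!" selects standard
# big-endian sizes, so no alignment padding is inserted between fields.
CUBE_FORMAT = "!3H"         # length, width, height
PLACED_CUBE_FORMAT = "!9H"  # + x, y, z position + three rotation angles

LEVELS_16_BIT = 2 ** 16     # distinct values per 2-byte field

print(LEVELS_16_BIT)                        # 65536
print(struct.calcsize(CUBE_FORMAT))         # 6
print(struct.calcsize(PLACED_CUBE_FORMAT))  # 18

# Pack one example cube (units are arbitrary quantization steps).
record = struct.pack(PLACED_CUBE_FORMAT, 1000, 500, 250, 0, 0, 0, 0, 0, 0)
print(len(record))                          # 18

# Ratio between the ~2.8 kB minimum FreeCAD file and the 18-byte record.
print(round(2800 / len(record)))            # 156
```

A ratio of about 156 is consistent with the at-least-100x estimate in the note above.&lt;br /&gt;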
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (granular file size view)] FreeCAD Forums&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311181</id>
		<title>File Simplification</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=File_Simplification&amp;diff=311181"/>
		<updated>2025-08-27T18:50:55Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: add link to freecad forums&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
With FreeCAD, OSE practices two levels of file simplification. In both cases, the goals are to reduce file size and to simplify the part tree. The OSE workflow assumes that we work with the part tree (especially the very useful feature of hiding and unhiding parts for build instructional purposes), and that we reduce file size as much as possible so that complex files are quick to open and easy to manipulate without bogging down the computer. This is especially important when large teams are collaborating.&lt;br /&gt;
&lt;br /&gt;
The first type of simplification, covered in the Level of Detail section below, simplifies the actual features of a part. The second type, covered in the Part Tree Simplification section, simplifies the part tree during the design phase.&lt;br /&gt;
&lt;br /&gt;
=Part Tree Simplification=&lt;br /&gt;
When doing design work with multiple modules of similar parts, such as the Seed Eco-Home wall modules, it is useful to collapse the part tree into a single item.&lt;br /&gt;
&lt;br /&gt;
OSE usually creates detailed CAD in which every single part (such as each of the tens of parts in a wall module) appears as an individual item in the Part Tree. This is useful for making instructionals, where parts can be hidden and unhidden to allow for step-by-step build sequences. Exploded part animations can also be made using the [[Exploded Assembly Workbench]].&lt;br /&gt;
&lt;br /&gt;
However, in the design phase it is challenging to keep track of dozens of parts, so it is useful to collapse the part tree into a more manageable form. This can be done either by removing information from the CAD file or by retaining it. To retain all information, right-click on a part tree heading and choose Create Group, which creates a folder. Then drag and drop parts into that folder. This makes it easy to keep track of parts and to select many parts at once by selecting the folder. Grouping does not reduce file size.&lt;br /&gt;
&lt;br /&gt;
To reduce file size, we can remove sketches, either with Create Simple Copy in the Part Workbench or by selecting a sketch and deleting it. We can also use Make Compound to collapse many parts into one. However, Make Compound does not reduce file size further; in fact, a Compound of many simple parts takes more memory than the simple copies themselves. To reduce the file size of a compound, copy it (Ctrl-C) and paste it (Ctrl-V) into a new document. Pasting into the same document does not seem to reduce the file size. When you copy a compound or a part with a sketch, FreeCAD typically asks whether to also copy its dependencies:&lt;br /&gt;
&lt;br /&gt;
[[File:dependenciescopy.png|300px]]&lt;br /&gt;
&lt;br /&gt;
Select No, and the pasted copy will be smaller.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;To summarize: remove sketches to reduce memory, make a compound to collapse all parts into one, and then copy-paste without dependencies into a new file. The result is the minimum-size file possible, under a single item in the part tree. This format keeps the overall assembly file in a team workflow as small as possible, allowing for large-scale design. The practical limit here is a few thousand part files that can still be manipulated readily. Once a file reaches an unmanageable size, we can go on to file simplification in terms of Level of Detail, described in the next section. This is like making thumbnails of pictures: you can work with them, but they do not contain all the detail. The simple version is an abstract version of the original file. Thus, in large-scale team workflows, part tree simplification and level of detail simplification can be pursued ad infinitum, abstracting the design further and further so that complex assemblies can be created. In principle, the complexity of design that this process can handle has no limit. Therefore, even the largest design problems can be solved in a day, with thousands or even millions of people collaborating in real time.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=Level of Detail=&lt;br /&gt;
&lt;br /&gt;
We work with CAD files at different levels of detail. For example, we can download a file for a valve from [[McMaster-Carr]], and it is a few MB because it includes details such as threads. The problem comes in when we have an assembly of many parts: this easily leads to files of 100 MB or even several GB if one does not pay attention to file size. Such files are unworkable, as the computer bogs down and operations become very slow.&lt;br /&gt;
&lt;br /&gt;
The solution is to create very small part files that represent the original: instead of, say, 2 MB, each would be about 10 kB. Such a file is just a placeholder that shows relatively accurate dimensions (important for analyzing part interference and fit), but in the crudest way possible. For example, an assembly of 200 such parts at 10 kB each is only 2 MB in total. As a general practice, files above 50 MB are unusable, and the practical limit is 10-20 MB. If files are kept to around 1 MB, navigation is lightning fast and no time is wasted.&lt;br /&gt;
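&lt;br /&gt;
The size arithmetic above can be checked with a short Python sketch. It assumes that every simplified part adds the same number of bytes, which ignores per-document overhead, and the function names are hypothetical. The 20 MB and 50 MB thresholds are the rule-of-thumb limits stated above.&lt;br /&gt;
&lt;br /&gt;
```python
# Rough size budget for an assembly of simplified placeholder parts.
# Assumption: every part adds the same number of bytes; real FreeCAD files
# also carry per-document overhead, so this is only an estimate.
KB = 1000  # decimal units, matching the 10 kB / 2 MB figures in the text
MB = 1000 * KB


def assembly_size_bytes(part_count: int, bytes_per_part: int) -> int:
    """Estimated total size if every part contributes the same amount."""
    return part_count * bytes_per_part


def usability(size_bytes: int) -> str:
    """Classify a file size with the rule-of-thumb limits from this page."""
    if size_bytes > 50 * MB:
        return "unusable"
    if size_bytes > 20 * MB:
        return "above the practical limit"
    return "within the practical limit"


simplified = assembly_size_bytes(200, 10 * KB)  # 200 placeholders of 10 kB
detailed = assembly_size_bytes(200, 2 * MB)     # 200 detailed 2 MB parts
print(simplified, usability(simplified))  # 2000000 within the practical limit
print(detailed, usability(detailed))      # 400000000 unusable
```

The same 200-part assembly is about 200 times larger when every part keeps McMaster-style detail, which is why the detailed version lands far beyond the 50 MB limit.&lt;br /&gt;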
&lt;br /&gt;
We save these small files as individual files, and assemblies of individual files, in the OSE [[Part Library]]. This way, even when we build very large assemblies, we can handle complex files of hundreds of parts without any visible slowdown of the computer. To read more about our workflow for merging files together, see [[Merge Workflow]].&lt;br /&gt;
&lt;br /&gt;
=Working Doc=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;iframe src=&amp;quot;https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/embed?start=false&amp;amp;loop=false&amp;amp;delayms=3000&amp;quot; frameborder=&amp;quot;0&amp;quot; width=&amp;quot;960&amp;quot; height=&amp;quot;569&amp;quot; allowfullscreen=&amp;quot;true&amp;quot; mozallowfullscreen=&amp;quot;true&amp;quot; webkitallowfullscreen=&amp;quot;true&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/presentation/d/11CXpjC2phOyV40SSsKuAET6jEAIGK036BmqTNzkxwmM/edit#slide=id.g22c1dd84ad_1_132 edit]&lt;br /&gt;
&lt;br /&gt;
=Notes=&lt;br /&gt;
*In the presentation above, the minimum FreeCAD file size for a simple cube is 2.8 kB.&lt;br /&gt;
*Thus, the simplest useful files start at about 10 kB. Files with about&lt;br /&gt;
*A cube should need only a few bytes: length, width and height. A byte is 8 bits, and 2 bytes (16-bit depth) give 65,536 distinct values, so each dimension can be stored in 2 bytes. Thus, a cube should take 6 bytes. If we add position and orientation (three coordinates and three angles), we have 18 bytes. Because the minimum FreeCAD file size is on the order of kilobytes, not bytes, FreeCAD files could be at least 100x smaller if they were stored in their most efficient form. Just sayin&#039;.&lt;br /&gt;
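&lt;br /&gt;
The byte count in the last note can be reproduced with Python&#039;s standard &amp;lt;code&amp;gt;struct&amp;lt;/code&amp;gt; module. This only illustrates the arithmetic and is not a format that FreeCAD reads or writes. It assumes that each of nine values (length, width, height, three position coordinates and three rotation angles) is quantized to an unsigned 16-bit integer.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Each quantity is an unsigned 16-bit integer ("H"); "!" selects standard
# big-endian sizes, so no alignment padding is inserted between fields.
CUBE_FORMAT = "!3H"         # length, width, height
PLACED_CUBE_FORMAT = "!9H"  # + x, y, z position + three rotation angles

LEVELS_16_BIT = 2 ** 16     # distinct values per 2-byte field

print(LEVELS_16_BIT)                        # 65536
print(struct.calcsize(CUBE_FORMAT))         # 6
print(struct.calcsize(PLACED_CUBE_FORMAT))  # 18

# Pack one example cube (units are arbitrary quantization steps).
record = struct.pack(PLACED_CUBE_FORMAT, 1000, 500, 250, 0, 0, 0, 0, 0, 0)
print(len(record))                          # 18

# Ratio between the ~2.8 kB minimum FreeCAD file and the 18-byte record.
print(round(2800 / len(record)))            # 156
```

A ratio of about 156 is consistent with the at-least-100x estimate in the note above.&lt;br /&gt;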
&lt;br /&gt;
=External Links=&lt;br /&gt;
* [https://forum.freecad.org/viewtopic.php?p=844168#p844168 Why is my FreeCAD file so large? (granular file size view)] FreeCAD Forums&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=308008</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=308008"/>
		<updated>2025-05-31T19:39:21Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: apr 30&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Wed Apr 30, 2025=&lt;br /&gt;
# This morning we&#039;re going to replace /dev/sda on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-30_replace_hetzner2_sda&lt;br /&gt;
# unfortunately my computer was off when I woke up&lt;br /&gt;
# worse, my personal keepass db appeared to be corrupt&lt;br /&gt;
# I had to restore from my most recent on-boot backup of my keepass, which means I have ~3 weeks of data loss&lt;br /&gt;
# so I&#039;m starting this change about half an hour late due to ^ that&lt;br /&gt;
# first-off, I logged into hetzner and the wiki to make damn sure I have those creds before continuing&lt;br /&gt;
# last week I had asked hetzner support to ensure they had a stock of the replacement drive we needed&lt;br /&gt;
## they responded asking me to update an existing ticket, but idk how to even view existing tickets. when I click &amp;quot;support&amp;quot; after logging-in, it just sends me to create a new ticket&lt;br /&gt;
## probably too late, but I responded (by email) to this response&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; you have another ticket open and running parallel to this one.&lt;br /&gt;
&lt;br /&gt;
Can you please tell me where I can see my existing open tickets in your&lt;br /&gt;
website?&lt;br /&gt;
&lt;br /&gt;
When I click on &amp;quot;Support&amp;quot; after logging-in, I am only given an option to&lt;br /&gt;
create new tickets. I can&#039;t see any existing tickets here&lt;br /&gt;
&lt;br /&gt;
 * https://robot.hetzner.com/support/index&lt;br /&gt;
&lt;br /&gt;
If your request about an additional drive refers to REDACTED then&lt;br /&gt;
please copy / paste you request from this ticket into Ticket#&lt;br /&gt;
2025042403016013, as it will then land in the data center where your server&lt;br /&gt;
is online and the DC Support staff can respond to you according. Thank-you.&lt;br /&gt;
&lt;br /&gt;
Yes, this is regarding REDACTED&lt;br /&gt;
&lt;br /&gt;
Sorry, I can&#039;t paste it into any existing ticket because [a] existing&lt;br /&gt;
tickets are not visible when I login and [b] you didn&#039;t tell me how to&lt;br /&gt;
access the existing tickets..&lt;br /&gt;
&lt;br /&gt;
We need the disk for a scheduled change in a couple hours.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
Michael&lt;br /&gt;
&lt;br /&gt;
On Mon, Apr 28, 2025 at 5:03 AM Support - Hetzner Online GmbH &amp;lt;&lt;br /&gt;
support@hetzner.com&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Dear Mr Altfield&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank-you for your request.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I notice that you are not referring to a specific server ID number in this&lt;br /&gt;
&amp;gt; ticket and that you have another ticket open and running parallel to this&lt;br /&gt;
&amp;gt; one.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If your request about an additional drive refers to REDACTED then&lt;br /&gt;
&amp;gt; please copy / paste you request from this ticket into Ticket#&lt;br /&gt;
&amp;gt; REDACTED, as it will then land in the data center where your server&lt;br /&gt;
&amp;gt; is online and the DC Support staff can respond to you according. Thank-you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Kind regards&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Robin Rabe&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Sales &amp;amp; Product Advic&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well it was worth trying; let&#039;s proceed in hopes they have stock.&lt;br /&gt;
# I logged-into hetzner2 and confirmed that it completed its daily reboot just 48 minutes ago, so we can proceed without worry it&#039;ll reboot again for ~23 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:29:46 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# uptime&lt;br /&gt;
 11:29:48 up 48 min,  5 users,  load average: 1.29, 1.27, 1.01&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
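The reboot-window reasoning above (the daily reboot happened 48 minutes ago, so the next one is roughly 23 hours away) can be sketched in Python. This minimal sketch assumes that the reboot runs on a fixed 24-hour cycle and that &amp;lt;code&amp;gt;uptime&amp;lt;/code&amp;gt; uses the short &amp;lt;code&amp;gt;up N min&amp;lt;/code&amp;gt; form seen here. Longer uptimes are printed differently (with hours or days), and the hypothetical &amp;lt;code&amp;gt;minutes_up&amp;lt;/code&amp;gt; helper rejects them.&lt;br /&gt;
&lt;br /&gt;
```python
import re
from datetime import timedelta

REBOOT_PERIOD = timedelta(hours=24)  # assumption: exactly one reboot per day


def minutes_up(uptime_line: str) -> int:
    """Extract N from an uptime line of the form '... up N min, ...'."""
    match = re.search(r"up\s+(\d+)\s+min", uptime_line)
    if match is None:
        raise ValueError("unsupported uptime format: " + uptime_line)
    return int(match.group(1))


line = " 11:29:48 up 48 min,  5 users,  load average: 1.29, 1.27, 1.01"
elapsed = timedelta(minutes=minutes_up(line))
remaining = REBOOT_PERIOD - elapsed
print(remaining)  # 23:12:00
```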
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
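The backup check above (a daily archive with today&#039;s date stamp exists in the B2 bucket) can be sketched in Python over captured &amp;lt;code&amp;gt;rclone ls&amp;lt;/code&amp;gt; output. That output has one object per line: a size in bytes followed by the path. This hypothetical helper only confirms that a non-empty object with today&#039;s stamp exists. It does not prove that the archive is complete or that it decrypts.&lt;br /&gt;
&lt;br /&gt;
```python
from datetime import date


def todays_backups(rclone_ls_output: str, today: date) -> list:
    """Return (size, name) pairs whose name contains today's YYYYMMDD stamp."""
    stamp = today.strftime("%Y%m%d")
    matches = []
    for line in rclone_ls_output.splitlines():
        # Size is right-aligned with leading spaces; split() discards them.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        size, name = int(parts[0]), parts[1]
        if stamp in name:
            matches.append((size, name))
    return matches


listing = "20133744108 daily_hetzner3_20250430_080904.tar.gpg\n"
found = todays_backups(listing, date(2025, 4, 30))
print(found)
# A zero-byte upload would still be a failed backup, so check sizes too.
print(bool(found) and all(size > 0 for size, _ in found))  # True
```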
# I confirmed again that /dev/sdb is PASSED and /dev/sda is FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
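The PASSED/FAILED verdict above can be extracted from &amp;lt;code&amp;gt;smartctl -H&amp;lt;/code&amp;gt; text with a minimal Python sketch like the following. The helper name is hypothetical. It only recognizes the ATA-style overall-health line shown here, and other device types may word their health line differently.&lt;br /&gt;
&lt;br /&gt;
```python
import re


def smart_health(smartctl_h_output: str) -> str:
    """Return the verdict word from smartctl -H, e.g. "PASSED" or "FAILED"."""
    match = re.search(
        r"SMART overall-health self-assessment test result:\s*(\w+)",
        smartctl_h_output,
    )
    if match is None:
        raise ValueError("no overall-health line found")
    return match.group(1)  # \w+ stops before the trailing "!" of "FAILED!"


sdb = "SMART overall-health self-assessment test result: PASSED\n"
sda = (
    "SMART overall-health self-assessment test result: FAILED!\n"
    "Drive failure expected in less than 24 hours. SAVE ALL DATA.\n"
)
print(smart_health(sdb))  # PASSED
print(smart_health(sda))  # FAILED
```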
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
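The 4% figure quoted above is the normalized VALUE column of attribute 202 (Percent_Lifetime_Remain). Its RAW_VALUE (96 on sdb, 100 on sda) appears to count percent used instead. On sda the VALUE (000) is at or below THRESH (001), which is why smartctl shows FAILING_NOW. The following minimal Python sketch reads these columns from captured &amp;lt;code&amp;gt;smartctl -A&amp;lt;/code&amp;gt; output. The helper name is hypothetical, and it assumes the classic ATA attribute table layout shown here.&lt;br /&gt;
&lt;br /&gt;
```python
def smart_attributes(smartctl_a_output: str) -> dict:
    """Map ATTRIBUTE_NAME to (VALUE, THRESH, RAW_VALUE) from smartctl -A.

    Assumes the ATA table layout: ID#, name, flag, VALUE, WORST, THRESH,
    TYPE, UPDATED, WHEN_FAILED, then the raw value (which may contain spaces).
    """
    attrs = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have 10+ columns.
        if len(fields) >= 10 and fields[0].isdigit():
            value, thresh = int(fields[3]), int(fields[5])
            attrs[fields[1]] = (value, thresh, " ".join(fields[9:]))
    return attrs


sdb_rows = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96
"""
sda_rows = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100
"""
for disk, text in (("sdb", sdb_rows), ("sda", sda_rows)):
    attrs = smart_attributes(text)
    value, thresh, raw = attrs["Percent_Lifetime_Remain"]
    hours = attrs["Power_On_Hours"][2]
    # A normalized VALUE at or below THRESH is treated as a failing attribute.
    print(disk, "life left:", str(value) + "%", "hours:", hours,
          "failing:", thresh >= value)
```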
# and I confirmed again the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
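The serial comparison above can be sketched in Python over the captured &amp;lt;code&amp;gt;udevadm&amp;lt;/code&amp;gt; output. Checking the serial, rather than trusting the device name, matters because kernel names like sda are not guaranteed to stay the same across boots. The helper name is hypothetical, and the expected serial is the one shown above.&lt;br /&gt;
&lt;br /&gt;
```python
def udev_properties(udevadm_output: str) -> dict:
    """Parse KEY=VALUE lines from udevadm info --query=property."""
    props = {}
    for line in udevadm_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an "=" separator
            props[key.strip()] = value.strip()
    return props


captured = """\
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C
ID_SERIAL_SHORT=154410FA336C
"""
expected_short_serial = "154410FA336C"  # serial recorded in the CHG ticket

props = udev_properties(captured)
if props.get("ID_SERIAL_SHORT") != expected_short_serial:
    raise SystemExit("serial mismatch: do not touch this disk")
print("serial matches:", props["ID_SERIAL"])
```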
# ok, I&#039;m removing sda from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
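In &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt;, [2/1] means that the array expects two members and has one active. In [_U], each U is an active member and each underscore is a missing one, so all three arrays are now running degraded on sdb alone. The following minimal Python sketch flags degraded arrays from that text. The function name is hypothetical, and it reads only the status brackets, not resync or recovery progress.&lt;br /&gt;
&lt;br /&gt;
```python
import re

MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[2]
      33521664 blocks super 1.2 [2/1] [_U]

md2 : active raid1 sdb3[2]
      209984640 blocks super 1.2 [2/1] [_U]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 sdb2[2]
      523712 blocks super 1.2 [2/1] [_U]
"""

ARRAY_RE = re.compile(r"^(md\d+) : ")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> dict:
    """Map each degraded md device to its member-status string, e.g. '_U'."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        name = ARRAY_RE.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if expected > active:
                result[current] = status.group(3)
    return result


print(degraded_arrays(MDSTAT))  # {'md0': '_U', 'md2': '_U', 'md1': '_U'}
```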
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# last time it took about 1 hour for them to respond saying the new disk was installed. I&#039;ll come back in about an hour&lt;br /&gt;
# ...&lt;br /&gt;
# I got an email at 13:20 UTC (08:20 my time), saying the drive was replaced&lt;br /&gt;
# ugh, they gave us a drive with 18,623 hours of use. It only has 32% of its life left&lt;br /&gt;
# I replied to the support ticket within 2 minutes telling them to replace it again with a drive that has &amp;lt;1,000 hours of use&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got a response back from hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we got the drive free, but that&#039;s still nearly a waste of my time. I replied and asked them how long it would take for them to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m going to go ahead and provision it&lt;br /&gt;
# I tried to update the wiki, but it looks like I got logged-out and I can&#039;t login again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the disk isn&#039;t full, and I&#039;m not getting read-only I/O errors like last time (when they removed both drives by mistake)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  157G   31G  84% /&lt;br /&gt;
/dev/md1        486M  383M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like they gave us another 500G disk; I bet they just don&#039;t stock the 250G anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_18301DC6A088&lt;br /&gt;
ID_SERIAL_SHORT=18301DC6A088&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
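# In /proc/mdstat, an underscore in the status field (the [_U] above) marks a missing mirror member. The following is a minimal sketch of a check that lists degraded arrays by parsing that text format. The md_degraded name is hypothetical. Because it only looks for underscores, it cannot tell a failed disk apart from one that is still rebuilding.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: print each md array whose status field&lt;br /&gt;
# (e.g. [_U]) contains an underscore, i.e. a missing member.&lt;br /&gt;
# Usage: md_degraded [mdstat-file]   (defaults to /proc/mdstat)&lt;br /&gt;
md_degraded() {&lt;br /&gt;
  awk &#039;&lt;br /&gt;
    /^md[0-9]+ :/ { dev = $1 }      # remember which array we are in&lt;br /&gt;
    /\[U*_[U_]*]/ { print dev }     # its status line has a hole&lt;br /&gt;
  &#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;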
# I made a backup of the partition tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250430_134343 ~&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# du -sh ${chg_dir}/*&lt;br /&gt;
0       /var/tmp/chg.20250430_134343/sda_parttable_mbr.bak&lt;br /&gt;
4.0K    /var/tmp/chg.20250430_134343/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
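# A partition-table dump can come out empty, as sda&#039;s did here. The following is a minimal sketch of flagging empty dumps before relying on one for a restore. The check_backups name is hypothetical. The restore line in the comment simply feeds a dump back to sfdisk on stdin.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: report which partition-table dumps in a change&lt;br /&gt;
# directory are empty (and therefore useless for a restore).&lt;br /&gt;
check_backups() {&lt;br /&gt;
  local f&lt;br /&gt;
  for f in &amp;quot;$1&amp;quot;/*_parttable_mbr.bak; do&lt;br /&gt;
    [ -e &amp;quot;$f&amp;quot; ] || continue   # the glob matched nothing&lt;br /&gt;
    if [ -s &amp;quot;$f&amp;quot; ]; then&lt;br /&gt;
      echo &amp;quot;ok: $f&amp;quot;&lt;br /&gt;
    else&lt;br /&gt;
      echo &amp;quot;EMPTY: $f&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Restoring a non-empty dump onto its disk would look like:&lt;br /&gt;
#   sfdisk /dev/sdb &amp;lt; &amp;quot;${chg_dir}/sdb_parttable_mbr.bak&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;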
# the sda dump is empty (0 bytes), which makes sense because the new disk has no partition table yet&lt;br /&gt;
# I copied the sdb partition table to sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sda: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sda1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
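# A minimal sketch of double-checking the copy: take fresh sfdisk -d dumps of both disks and compare them with the device names normalized. The same_layout name and the dump filenames are hypothetical. The helper assumes the plain-text dump format that sfdisk -d produces.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: succeed if two sfdisk -d dumps describe the same&lt;br /&gt;
# layout once every /dev/sdX name is replaced with a placeholder.&lt;br /&gt;
same_layout() {&lt;br /&gt;
  local a b&lt;br /&gt;
  a=$(sed &#039;s#/dev/sd[a-z][a-z]*#DISK#g&#039; &amp;quot;$1&amp;quot;)&lt;br /&gt;
  b=$(sed &#039;s#/dev/sd[a-z][a-z]*#DISK#g&#039; &amp;quot;$2&amp;quot;)&lt;br /&gt;
  [ &amp;quot;$a&amp;quot; = &amp;quot;$b&amp;quot; ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage on the live system (as root):&lt;br /&gt;
#   sfdisk -d /dev/sda &amp;gt; &amp;quot;${chg_dir}/sda_after.dump&amp;quot;&lt;br /&gt;
#   sfdisk -d /dev/sdb &amp;gt; &amp;quot;${chg_dir}/sdb_after.dump&amp;quot;&lt;br /&gt;
#   same_layout &amp;quot;${chg_dir}/sda_after.dump&amp;quot; &amp;quot;${chg_dir}/sdb_after.dump&amp;quot; &amp;amp;&amp;amp; echo match&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;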
# and told the kernel to re-read sda&#039;s partition table&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# blockdev --rereadpt /dev/sda&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I added the three partitions of the new disk to their RAID arrays; note that this time I added /boot first, then /, then swap. I expect them to sync in that order (of priority)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm: added /dev/sda2&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
mdadm: added /dev/sda3&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm: added /dev/sda1&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
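# The three adds above can be written as a loop that keeps the /boot, /, swap priority order explicit. This is a minimal sketch, not what was run here. The array/partition pairs are the ones from this server. The DRY_RUN switch, which only prints the commands, is a hypothetical convenience rather than an mdadm feature.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: add the replacement disk&#039;s partitions to their&lt;br /&gt;
# arrays in priority order (/boot, then /, then swap).&lt;br /&gt;
# If DRY_RUN is set to anything non-empty, the commands are only printed.&lt;br /&gt;
add_members() {&lt;br /&gt;
  local pair&lt;br /&gt;
  for pair in md1:sda2 md2:sda3 md0:sda1; do&lt;br /&gt;
    ${DRY_RUN:+echo} mdadm &amp;quot;/dev/${pair%%:*}&amp;quot; -a &amp;quot;/dev/${pair#*:}&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Preview first, then run for real (as root):&lt;br /&gt;
#   DRY_RUN=1 add_members&lt;br /&gt;
#   add_members&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;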
# cool, that worked. /boot is already done, and it&#039;s syncing root (/) now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# date -u&lt;br /&gt;
Wed Apr 30 13:48:43 UTC 2025&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.1% (19231872/209984640) finish=16.5min speed=192161K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and installed grub on sda. I&#039;ll probably do this again after all the partitions finish syncing, but I think it should work this time because the /boot array was synced first and has already finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as noted in the docs, those warnings can be safely ignored&lt;br /&gt;
# replication is finished; I guess these 500G Micron disks have better I/O throughput than our old 200G Crucial disks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Wed Apr 30 14:07:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 21.2% (7124992/33521664) finish=2.2min speed=191533K/sec&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wed Apr 30 14:12:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
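# Timestamped snapshots like the ones above can be collected with a simple polling loop. The following is a minimal sketch that stops on its own once no array is resyncing or recovering. The watch_md name, its file argument, and the 300-second default interval are assumptions for illustration.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: print a UTC timestamp and the md status every&lt;br /&gt;
# N seconds (default 300) until no array is resyncing or recovering.&lt;br /&gt;
# Usage: watch_md [mdstat-file] [interval-seconds]&lt;br /&gt;
watch_md() {&lt;br /&gt;
  local mdstat=${1:-/proc/mdstat} interval=${2:-300}&lt;br /&gt;
  while :; do&lt;br /&gt;
    date -u&lt;br /&gt;
    cat &amp;quot;$mdstat&amp;quot;&lt;br /&gt;
    echo&lt;br /&gt;
    grep -Eq &#039;resync|recovery&#039; &amp;quot;$mdstat&amp;quot; || break   # nothing left to sync&lt;br /&gt;
    sleep &amp;quot;$interval&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;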
# I&#039;m going to double-tap the grub install before giving it a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
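# A rough way to confirm that both disks carry a GRUB boot sector is to look for the &amp;quot;GRUB &amp;quot; string that GRUB 2&#039;s boot.img places in the MBR. This is a minimal heuristic sketch. The has_grub_mbr name is hypothetical. Finding the string does not prove that the disk will actually boot.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Heuristic: GRUB 2&#039;s boot.img (the MBR stage) contains the ASCII&lt;br /&gt;
# string &amp;quot;GRUB &amp;quot;. Check the first 512 bytes of a disk or image file.&lt;br /&gt;
has_grub_mbr() {&lt;br /&gt;
  head -c 512 &amp;quot;$1&amp;quot; | grep -aq &#039;GRUB &#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage on the live system (as root):&lt;br /&gt;
#   for disk in /dev/sda /dev/sdb; do&lt;br /&gt;
#     if has_grub_mbr &amp;quot;$disk&amp;quot;; then echo &amp;quot;$disk: GRUB&amp;quot;; else echo &amp;quot;$disk: none&amp;quot;; fi&lt;br /&gt;
#   done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;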
# and I rebooted it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Wed Apr 30 11:28:26 2025 from REDACTED&lt;br /&gt;
[maltfield@opensourceecology ~]$ uptime&lt;br /&gt;
 14:17:14 up 1 min,  1 user,  load average: 0.85, 0.24, 0.08&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, it came back.&lt;br /&gt;
# cool, raid looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[3]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[3]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and SMART isn&#039;t yelling about failed disks anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
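# The two smartctl -H checks above can be looped over every disk. The following is a minimal sketch. The smart_passed name is hypothetical. It only understands the ATA-style &amp;quot;overall-health self-assessment&amp;quot; line shown above, because SAS disks word their health status differently.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: read smartctl -H output on stdin and succeed&lt;br /&gt;
# only if the overall-health self-assessment result is PASSED.&lt;br /&gt;
smart_passed() {&lt;br /&gt;
  grep -q &#039;self-assessment test result: PASSED&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage on the live system (as root):&lt;br /&gt;
#   for disk in /dev/sd?; do&lt;br /&gt;
#     smartctl -H &amp;quot;$disk&amp;quot; | smart_passed || echo &amp;quot;$disk: SMART health not PASSED&amp;quot;&lt;br /&gt;
#   done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;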
# I&#039;m marking this CHG as completed successfully&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin asked me how much longer the 4% disk will last; I replied&lt;br /&gt;
# so it says it&#039;s been online for 52,235 hours, and it has 4% remaining&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52235&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       30&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   049   000    Old_age   Always       -       35 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601655717236&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904918036&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11850643256&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
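# The following is a minimal sketch of pulling single attributes out of smartctl -A output by attribute ID, using the table above as input. The smart_attr name is hypothetical. It prints the normalized VALUE (4th column) and the RAW_VALUE (10th column). For attribute 202 above, those are 004 and 96.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: read smartctl -A output on stdin and print the&lt;br /&gt;
# normalized VALUE and the RAW_VALUE of the attribute with ID $1.&lt;br /&gt;
smart_attr() {&lt;br /&gt;
  awk -v id=&amp;quot;$1&amp;quot; &#039;$1 == id { print $4, $10 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage on the live system (as root):&lt;br /&gt;
#   smartctl -A /dev/sdb | smart_attr 9     # Power_On_Hours&lt;br /&gt;
#   smartctl -A /dev/sdb | smart_attr 202   # Percent_Lifetime_Remain&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;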
# that&#039;s 52,235/96 = about 544.11 hours per percent&lt;br /&gt;
# so I guess we have about 4*544.11 = 2,176.46 hours, or about 90 days, left&lt;br /&gt;
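# Here is the same arithmetic as a small shell function. It assumes that wear grows linearly with power-on hours, which is a big assumption because SMART wear values are vendor-specific. The est_hours_left name is hypothetical. With 52,235 hours and 96% used, it reports about 2,176 hours, or roughly 90 days.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: linear estimate of the hours left until the disk&lt;br /&gt;
# reports 0% life remaining.&lt;br /&gt;
# $1 = power-on hours so far, $2 = percent of rated life already used (&amp;gt; 0)&lt;br /&gt;
est_hours_left() {&lt;br /&gt;
  awk -v h=&amp;quot;$1&amp;quot; -v used=&amp;quot;$2&amp;quot; &#039;BEGIN {&lt;br /&gt;
    left = h / used * (100 - used)   # hours per percent x percent left&lt;br /&gt;
    printf &amp;quot;%.0f hours (~%.1f days)\n&amp;quot;, left, left / 24&lt;br /&gt;
  }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# est_hours_left 52235 96   # prints: 2176 hours (~90.7 days)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;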
&amp;lt;pre&amp;gt;&lt;br /&gt;
Likely longer than 90 days. Plus or Minus a very large uncertainty.&lt;br /&gt;
&lt;br /&gt;
SMART data keys are a standard, but the values are very vendor-specific. So they vary *a lot*. And, for the &amp;quot;life left&amp;quot; percent -- obviously something can be said about designed obsolescence. Or at least it&#039;s in the interest of the vendor to tell you to replace a drive earlier than needed. I have no idea how long you&#039;ve been running on two disks that have 0% &amp;quot;life left&amp;quot;. In any case, I wouldn&#039;t be cheap about disks; it&#039;s not worth the risk.&lt;br /&gt;
&lt;br /&gt;
Anyway, the disk with 4% &amp;quot;life left&amp;quot; says that it&#039;s been online for 52,235 hours. So dividing that by 96% and then multiplying by 4 suggests that you have maybe 2,176 hours before it says it has 0% &amp;quot;life left&amp;quot;. But it might also depend on the read/write frequency/pattern used by the previous customer. So take it with a big grain of salt.&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/30/25 15:53, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; 4% of life left means how many days? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 27, 2025=&lt;br /&gt;
# Tom created a GitHub account https://github.com/tgriff-ose&lt;br /&gt;
# I invited this new account to become a member of the official OSE GitHub org, and sent them an email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ve invited you to join the official OSE GitHub org:&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/orgs/OpenSourceEcology&lt;br /&gt;
&lt;br /&gt;
Please check your GitHub notifications and accept the invite.&lt;br /&gt;
&lt;br /&gt;
PS: If you haven&#039;t yet, can you please enable 2FA on your GitHub account?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/26/25 22:42, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Account name: tgriff-ose&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Apr 27, 2025, 03:24 by REDACTED@disroot.org:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; GitHub is owned by Microsoft, and it&#039;s free (as in beer) to create an account.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Could you please create a free GitHub account?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt;&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt;&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; On 4/26/25 21:06, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; I don&#039;t have a github account, as it&#039;s a Microsoft thing requiring a paid account. I don&#039;t intent to support them.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Is there any other way to access the ansible repo?&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; -- &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Tom Griffing &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin confirmed that he has not received a bill from AWS since January, so it appears we finally deleted all of the Glacier crap&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I have not received another bill since January, so it looks like there is&lt;br /&gt;
nothing owed.&lt;br /&gt;
MJ&lt;br /&gt;
&lt;br /&gt;
On Sat, Apr 26, 2025 at 6:28 PM Michael Altfield &amp;lt;REDACTED&amp;gt;&lt;br /&gt;
wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Speaking of aws, can you confirm that your bill for last month was $0?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# I updated my wiki and osedev work logs for April so far&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 26, 2025=&lt;br /&gt;
# Marcin authorized me to add Tom to our ops Google Groups mailing list and to give him access to our shared OSE KeePass file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Fri, Apr 25, 2025, 12:43 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; (re-sending without encryption)&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On 4/25/25 12:41, Michael Altfield wrote:&lt;br /&gt;
&amp;gt;&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Do you authorize:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 1. Giving Tom access to the shared OSE keepass file&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 2. Adding Tom to the ops mailing list (this would allow him to password&lt;br /&gt;
&amp;gt;&amp;gt; reset many of our important accounts)&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Please let me know if you authorize the above.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom sent me his GPG public key, which I can use to add him to the (PGP-encrypted) Wazuh alert emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ gpg&lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
gpg: Go ahead and type your message ...&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
pub   rsa4096 2025-04-26 [SC]&lt;br /&gt;
	  13300901348A985115679165FB137A633FD1EB4C&lt;br /&gt;
uid           Tom Griffing (OSE PGP Key 4-25-2025) &amp;lt;REDACTED@tutanota.com&amp;gt;&lt;br /&gt;
sub   rsa4096 2025-04-26 [E]&lt;br /&gt;
user@ose:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
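# Before trusting a key that arrived by email, it&#039;s worth confirming its fingerprint with the owner over another channel. Fingerprints are often written in groups of four (like the one in my signature), so comparing them by eye is error-prone. The following is a minimal sketch of a comparison that ignores spacing and case. The same_fpr helper is hypothetical and is not a gpg feature.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper: compare two OpenPGP fingerprints, ignoring&lt;br /&gt;
# whitespace and letter case (&amp;quot;0465 E42F ...&amp;quot; vs &amp;quot;0465e42f...&amp;quot;).&lt;br /&gt;
same_fpr() {&lt;br /&gt;
  local a b&lt;br /&gt;
  a=$(printf &#039;%s&#039; &amp;quot;$1&amp;quot; | tr -d &#039;[:space:]&#039; | tr &#039;[:lower:]&#039; &#039;[:upper:]&#039;)&lt;br /&gt;
  b=$(printf &#039;%s&#039; &amp;quot;$2&amp;quot; | tr -d &#039;[:space:]&#039; | tr &#039;[:lower:]&#039; &#039;[:upper:]&#039;)&lt;br /&gt;
  [ -n &amp;quot;$a&amp;quot; ] || return 1   # an empty fingerprint never matches&lt;br /&gt;
  [ &amp;quot;$a&amp;quot; = &amp;quot;$b&amp;quot; ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example with the fingerprint printed above:&lt;br /&gt;
#   same_fpr &amp;quot;1330 0901 348A 9851 1567 9165 FB13 7A63 3FD1 EB4C&amp;quot; \&lt;br /&gt;
#            13300901348A985115679165FB137A633FD1EB4C &amp;amp;&amp;amp; echo match&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;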
# I added Tom to the wazuh recipients, per https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /var/tmp/gpg&lt;br /&gt;
pushd /var/tmp/gpg&lt;br /&gt;
# write multi-line to file for documentation copy &amp;amp; paste&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
EOF&lt;br /&gt;
gpg --homedir /var/ossec/.gnupg --import /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
popd&lt;br /&gt;
&lt;br /&gt;
# add Tom&#039;s email (one that matches an email on a UID of his key above) to the space-delimited &amp;quot;recipients&amp;quot; variable&lt;br /&gt;
vim /var/ossec/sent_encrypted_alarm.settings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
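The comment above says the recipient email should match a UID on the imported key. Below is a minimal sketch of a sanity check for that. It uses a hypothetical helper named uid_has_email and a placeholder address (tom@example.org). It reads the machine-readable output of gpg --with-colons, where field 10 of each uid record holds the user ID:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gpg --with-colons --list-keys` output on stdin and
# succeed only if some uid record (field 10 is the user ID) contains the email.
# This is a plain substring match, so tom@example.org would also match
# atom@example.org; good enough as a sanity check, not as a strict parser.
uid_has_email() {
	local email="$1"
	awk -F: -v e="$email" '
		$1 == "uid" { if (index($10, e) > 0) found = 1 }
		END { exit !found }
	'
}

# Usage (tom@example.org is a placeholder, not Tom's real address):
#   gpg --homedir /var/ossec/.gnupg --with-colons --list-keys | uid_has_email tom@example.org
```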
# and I sent him an email asking him to confirm that it&#039;s working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
Can you please confirm that you&#039;re now receiving alerts from wazuh?&lt;br /&gt;
&lt;br /&gt;
Wazuh is our HIDS (Host-Based Intrusion Detection System). It&#039;s a fork of OSSEC, which is both a HIDS and a FIM (File Integrity Monitor). Because it sometimes sends sensitive information (eg diffs of config files that contain passwords), it&#039;s important that we encrypt its email notifications end-to-end with PGP.&lt;br /&gt;
&lt;br /&gt;
And because someone who compromises the server could &amp;quot;clean up&amp;quot; after themselves, these (off-server) alerts are critical to post-compromise investigations.&lt;br /&gt;
&lt;br /&gt;
For more info, see:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
 * https://en.wikipedia.org/wiki/OSSEC&lt;br /&gt;
 * https://documentation.wazuh.com/current/getting-started/index.html&lt;br /&gt;
&lt;br /&gt;
Out-of-the-box, Wazuh has a ton of features, but where we probably use it the most is its ingestion of apache&#039;s mod_security WAF alerts and its tie-in to Wazuh&#039;s Active Response. If an IP is found doing something bad (eg multiple consecutive 403 responses, such as a brute-force attack on wordpress [or ssh]), then the IP will get temporarily blocked by the firewall for 10 minutes. If it does it again shortly after the ban is lifted, it&#039;ll be banned for 12 hours. If again, 1 day. Then 2 days. Then 4 days. And the max ban, for the 5th repeat offense, is 8 days.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible/blob/master/hetzner3/roles/maltfield.wazuh/templates/ossec.conf.j2#L256-L271&lt;br /&gt;
&lt;br /&gt;
It also has rootkit detection, and lots of other useful alerts that &amp;quot;just work&amp;quot; out of the box.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you&#039;re now receiving encrypted wazuh alerts.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried to add Tom to our ops google groups email list, but it said I wasn&#039;t allowed to add members outside of our google workspace&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error occurred&lt;br /&gt;
1 user is outside of your organization. Based on your group or organization settings, you can only add organization users to this group. Contact your group owner or domain administrator for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I checked our list of users. It appears that Tom doesn&#039;t have an @opensourceecology.org account in gsuite&lt;br /&gt;
# I found the setting to change that here https://admin.google.com/ac/managedsettings/864450622151/GROUPS_SHARING_SETTINGS_TAB&lt;br /&gt;
## https://support.google.com/a/thread/63692725/&lt;br /&gt;
## https://support.google.com/a/answer/167097&lt;br /&gt;
# I checked the box that said &amp;quot;Group owners can allow external members&amp;quot;&lt;br /&gt;
## curiously the subline said &amp;quot;Organization admins can always add external members&amp;quot; – but I&#039;m a damn org admin, and I couldn&#039;t add him :/&lt;br /&gt;
# I tried to add him again, but I got the same error&lt;br /&gt;
# this time I went to the group settings https://groups.google.com/a/opensourceecology.org/g/REDACTED/settings&lt;br /&gt;
# I found the &amp;quot;allow external members&amp;quot; and changed it from &amp;quot;off&amp;quot; to &amp;quot;on&amp;quot; and clicked &amp;quot;save changes&amp;quot;&lt;br /&gt;
## this wasn&#039;t possible before; first I had to change the workspace-wide settings to allow me to change the group-specific settings. now it&#039;s changed.&lt;br /&gt;
# this time it worked.&lt;br /&gt;
# I sent an email to our ops google group, asking Tom to reply if he saw it&lt;br /&gt;
# ...&lt;br /&gt;
# I checked-in on hetzner2 to make sure it rebooted this morning&lt;br /&gt;
# looks like the cron is set to reboot at 10:40 UTC every day, and – indeed – uptime says it&#039;s been online for a bit less than 13 hours. And its last boot time was today at 10:41:25&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# uptime&lt;br /&gt;
 23:30:25 up 12:49,  7 users,  load average: 1.02, 0.98, 0.74&lt;br /&gt;
[root@opensourceecology ~]# journalctl | head&lt;br /&gt;
-- Logs begin at Sat 2025-04-26 10:41:25 UTC, end at Sat 2025-04-26 23:30:26 UTC. --&lt;br /&gt;
Apr 26 10:41:25 localhost systemd-journal[129]: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.2G available → current limit 3.1G).&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuset&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpu&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuacct&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Linux version 3.10.0-1160.119.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Jun 4 14:43:51 UTC 2024&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: e820: BIOS-provided physical RAM map:&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009c7ff] usable&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x000000000009c800-0x000000000009ffff] reserved&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Sat Apr 26 23:31:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we&#039;ll have ~2 minutes of downtime every day in the very early morning in the US. I can live with that.&lt;br /&gt;
# and grub clearly is fixed&lt;br /&gt;
# oh, also the RAID looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
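Here is a minimal sketch of the same RAID health check as a reusable function. The name mdstat_healthy is hypothetical and is not part of OSE&#039;s tooling. The check relies on the member-status field in /proc/mdstat. A healthy 2-disk RAID1 shows [UU]. A failed or missing member shows up as an underscore, eg [U_]:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSE's tooling): report whether every md
# array in an mdstat-format file has all of its members up.
# Healthy RAID1 member status looks like [UU]; a failed or missing member is
# shown as an underscore, eg [U_] or [_U].
mdstat_healthy() {
	local mdstat_file="${1:-/proc/mdstat}"
	# Any bracketed member-status field containing an underscore means degraded.
	if grep -Eq '\[U*_[U_]*]' "$mdstat_file"; then
		echo "DEGRADED"
		return 1
	fi
	echo "OK"
}

# Usage: mdstat_healthy                 (reads /proc/mdstat)
#        mdstat_healthy saved-mdstat.txt
```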
# I asked Tom for his GitHub account profile username, so I can grant him write access to our OSE ansible repo&lt;br /&gt;
# I added Tom&#039;s new ssh key to his authorized_keys file on hetzner2&lt;br /&gt;
# I sent Tom an email asking to confirm his access to hetzner2&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 25, 2025=&lt;br /&gt;
# I woke up this morning and discovered the wiki was offline&lt;br /&gt;
# I tried to ssh into the server; it&#039;s not responding&lt;br /&gt;
# I figured I&#039;d log into the hetzner wui, but – uhh – the credentials are in keepass and live on the server&lt;br /&gt;
# I had mitigated this by giving Marcin a copy of the keepass file on his veracrypt drive, but he has since changed the hetzner password (a month or two ago), and we don&#039;t have a newer local copy&lt;br /&gt;
# I sent an email to Marcin asking him to login to hetzner wui and boot hetzner2. if it doesn&#039;t come-up, then I&#039;ll have to get the password from him so I can load it in the wui from a rescue disk&lt;br /&gt;
# oh, I did find the new hetzner password in my personal keepass&lt;br /&gt;
# I logged-in, and I found the server was listed as being on. But I can&#039;t ping it. I gave it an &amp;quot;automatic hardware reset&amp;quot; from the wui&lt;br /&gt;
# I&#039;ll give it a few minutes before trying the rescue system&lt;br /&gt;
# their rescue systems are much nicer for their cloud product than their dedicated server product&lt;br /&gt;
# it looks like I have two options&lt;br /&gt;
## rescue boot mode: where I&#039;m given ssh access&lt;br /&gt;
## vnc&lt;br /&gt;
# the problem with the rescue boot is that – if this is a grub issue – I wouldn&#039;t be able to &amp;quot;see&amp;quot; the error&lt;br /&gt;
# I enabled VNC and gave the server a reboot&lt;br /&gt;
# I was able to connect via vnc, but it was the damn installation wizard for almalinux. I quit the installation, and the vnc session died.&lt;br /&gt;
# damn, I guess vnc won&#039;t let me see the boot process, after all&lt;br /&gt;
# instead I tried the &amp;quot;rescue system&amp;quot;&lt;br /&gt;
# that didn&#039;t work; I can&#039;t access ssh on either of the IP addresses&lt;br /&gt;
# the docs say to activate the rescue system and then reboot it; that&#039;s what I did https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system/&lt;br /&gt;
# this time I fully shut down the server, and then I enabled the rescue system (while it&#039;s off)&lt;br /&gt;
# I went back to the Reset tab, and it&#039;s still off. So I booted it&lt;br /&gt;
# somehow I was able to login from my ose vm using my personal ssh key, but with user root&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ ssh -v root@138.201.84.223&lt;br /&gt;
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024&lt;br /&gt;
debug1: Reading configuration data /home/user/.ssh/config&lt;br /&gt;
debug1: Reading configuration data /etc/ssh/ssh_config&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 21: Applying options for *&lt;br /&gt;
debug1: Connecting to 138.201.84.223 [138.201.84.223] port 22.&lt;br /&gt;
debug1: Connection established.&lt;br /&gt;
...&lt;br /&gt;
Linux rescue 6.12.19 #1 SMP Fri Mar 14 05:34:52 UTC 2025 x86_64&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
  Welcome to the Hetzner Rescue System.&lt;br /&gt;
&lt;br /&gt;
  This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel.&lt;br /&gt;
  You can install software like you would in a normal system.&lt;br /&gt;
&lt;br /&gt;
  To install a new operating system from one of our prebuilt images, run &#039;installimage&#039; and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
  Important note: Any data that was not written to the disks will be lost during a reboot.&lt;br /&gt;
&lt;br /&gt;
  For additional information, check the following resources:&lt;br /&gt;
	Rescue System:           https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system&lt;br /&gt;
	Installimage:            https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage&lt;br /&gt;
	Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images&lt;br /&gt;
	other articles:          https://docs.hetzner.com/robot&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Rescue System (via Legacy/CSM) up since 2025-04-25 17:24 +02:00&lt;br /&gt;
&lt;br /&gt;
Hardware data:&lt;br /&gt;
&lt;br /&gt;
   CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8)&lt;br /&gt;
   Memory:  64153 MB (Non-ECC)&lt;br /&gt;
   Disk /dev/sda: 250 GB (=&amp;gt; 232 GiB) &lt;br /&gt;
   Disk /dev/sdb: 512 GB (=&amp;gt; 476 GiB) &lt;br /&gt;
   Total capacity 709 GiB with 2 Disks&lt;br /&gt;
&lt;br /&gt;
Network data:&lt;br /&gt;
   eth0  LINK: yes&lt;br /&gt;
		 MAC:  90:1b:0e:94:07:c4&lt;br /&gt;
		 IP:   138.201.84.223&lt;br /&gt;
		 IPv6: 2a01:4f8:172:209e::2/64&lt;br /&gt;
		 Intel(R) PRO/1000 Network Driver&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to mount the root drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 0/2 pages [0KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt&lt;br /&gt;
root@rescue ~ # ls /mnt&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # ls /mnt/home&lt;br /&gt;
b2user  crupp  hart     lberezhny  marcin      stagingsync  wp&lt;br /&gt;
cmota   Flipo  jthomas  maltfield  not-apache  tgriffing&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what the point of this is; I can&#039;t fix it if I can&#039;t watch it boot and see what&#039;s breaking&lt;br /&gt;
# ok, at the bottom of the docs, hetzner lists another option: the vKVM Rescue System https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/&lt;br /&gt;
# it specifically says that&#039;s for debugging boot issues&lt;br /&gt;
# last thing before I try that: I downloaded a local copy of the keepass files from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner2$ rsync -av --progress root@138.201.84.223:/mnt/etc/keepass ./etc-keepass-20250525&lt;br /&gt;
receiving incremental file list&lt;br /&gt;
created directory ./etc-keepass-20250525&lt;br /&gt;
keepass/&lt;br /&gt;
keepass/passwords.kdbx&lt;br /&gt;
		 46,142 100%   44.00MB/s    0:00:00 (xfr#1, to-chk=6/8)&lt;br /&gt;
keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#2, to-chk=5/8)&lt;br /&gt;
keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#3, to-chk=4/8)&lt;br /&gt;
keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
		 33,726 100%  143.20kB/s    0:00:00 (xfr#4, to-chk=3/8)&lt;br /&gt;
keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
		 34,238 100%   71.75kB/s    0:00:00 (xfr#5, to-chk=2/8)&lt;br /&gt;
keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
		 45,406 100%   94.55kB/s    0:00:00 (xfr#6, to-chk=1/8)&lt;br /&gt;
keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
		 27,102 100%   56.31kB/s    0:00:00 (xfr#7, to-chk=0/8)&lt;br /&gt;
&lt;br /&gt;
sent 161 bytes  received 196,407 bytes  35,739.64 bytes/sec&lt;br /&gt;
total size is 195,794  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner2$ du -sh etc-keepass-20250525/keepass/*&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
28K	etc-keepass-20250525/keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this time I followed the same steps as for the rescue system, except I chose &amp;quot;vKVM&amp;quot; instead of &amp;quot;Linux&amp;quot; in the &amp;quot;Operating System&amp;quot; dropdown&lt;br /&gt;
# strange, it gave me an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Public key authentication is not available for the selected operating system.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I unselected my ssh key, and chose &amp;quot;no key&amp;quot; instead&lt;br /&gt;
# it gave me a URL and a password. I booted the server, but the URL didn&#039;t load (&amp;quot;Unable to connect&amp;quot; error)&lt;br /&gt;
# ok, it took a few minutes to come up, and it had a self-signed cert&lt;br /&gt;
# I bypassed the cert error, and entered the username and password into the basic auth popup. It failed! Could I really have been MITM&#039;d?&lt;br /&gt;
# I immediately shut down the server from the wui, and I tried again.&lt;br /&gt;
# this time I was able to login – both from ssh and in the wui.&lt;br /&gt;
# as soon as it opened, I saw the error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
No more network devices&lt;br /&gt;
&lt;br /&gt;
Booting from Hard Disk...&lt;br /&gt;
.&lt;br /&gt;
error: symbol &#039;grub_calloc&#039; not found.&lt;br /&gt;
Entering rescue mode...&lt;br /&gt;
grub rescue&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I wonder if this is grub or grub2. Previously, I didn&#039;t have a &amp;quot;grub-install&amp;quot; binary, so I assumed the hetzner docs had a mistake and ran &amp;quot;grub2-install&amp;quot; instead, which reported success (with a warning that the docs said was safe to ignore)&lt;br /&gt;
# curiously, the opposite is true for the ssh session in vkvm: I have grub-install but not grub2-install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@vKVM-rescue ~ # which grub-install&lt;br /&gt;
/usr/sbin/grub-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
root@vKVM-rescue ~ # which grub2-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here are the docs in question https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# I don&#039;t want to fuck with grub without first taking a backup of these disks. But, uh, it looks like I can&#039;t access the RAID from inside this vkvm setup&lt;br /&gt;
# yeah, that&#039;s one of the limitations listed for VKVM https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/#raid-controllers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Configured units are passed through as SCSI devices to the VM. However it is not possible to access the controller. Please use the regular Hetzner Rescue System for this purpose.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I shut down vKVM and booted the server into the regular rescue mode&lt;br /&gt;
# it took a few minutes to get back into the old rescue system, but here I can use the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS&lt;br /&gt;
loop0     7:0    0   3.4G  1 loop  &lt;br /&gt;
sda       8:0    0 476.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
sdb       8:16   0 232.9G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
root@rescue ~ # mkdir /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt/md2&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created a dir for these backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chown root:root /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chmod 0700 /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first I made a backup of the /boot partition (md1) from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # rsync -av --progress /mnt/md1 /mnt/md2/var/tmp/20250425-grub-fail/md1.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
...&lt;br /&gt;
md1/grub2/locale/zh_TW.mo&lt;br /&gt;
		 30,882 100%   31.38kB/s    0:00:00 (xfr#345, to-chk=0/355)&lt;br /&gt;
md1/lost+found/&lt;br /&gt;
&lt;br /&gt;
sent 399,450,301 bytes  received 6,709 bytes  159,782,804.00 bytes/sec&lt;br /&gt;
total size is 399,330,989  speedup is 1.00&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I figured I&#039;d back up the two disk partitions (sda2 and sdb2) directly, but I couldn&#039;t even mount them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sda2&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sdb2&lt;br /&gt;
root@rescue ~ # mount /dev/sda2 /mnt/sda2&lt;br /&gt;
mount: /mnt/sda2: unknown filesystem type &#039;linux_raid_member&#039;.&lt;br /&gt;
	   dmesg(1) may have more information after failed mount system call.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this command from the docs, which I had skipped before because it said that the next command (grub-install) was enough on its own; sure enough, it didn&#039;t work https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-mkdevicemap -n&lt;br /&gt;
grub-mkdevicemap: error: cannot open /boot/grub/device.map.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I investigated this before, and I thought I had determined that we&#039;re using grub2, not grub1&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # ls /mnt/md1/&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, shit, even the grub-install command is v2 https://askubuntu.com/questions/107486/how-to-know-the-version-of-grub&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install --version&lt;br /&gt;
grub-install (GRUB) 2.06-13+deb12u1&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, this indicates we&#039;re not using lilo https://askubuntu.com/questions/24459/how-do-i-find-out-which-boot-loader-i-have&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2/etc/ | grep lilo&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can dd straight from the disk to read the MBR. And, yeah, it appears we are using grub via the MBR... and this boot code is stored on the individual disks, not on the raid device&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # dd if=/dev/md1 bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sda bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
214fb5736d1e5ad63e515dc2fffe44bd928cd8dab2c019dc11fb9fcaef5ea90dbf51f1ac507ab1cfbbe74ff&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
DA/jjF&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sdb bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
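The MBR check above can be wrapped in a small function, sketched here with a hypothetical name (mbr_has_grub). It only looks for the GRUB string in the first 512-byte sector. So it shows that GRUB boot code is present on a disk, but not that the code is the right version for the modules in /boot:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: look for GRUB boot code in the first sector (the MBR)
# of a disk or disk image, eg /dev/sda. Point it at the member disks, not the
# md device: dd on /dev/md1 above showed no boot code at all.
mbr_has_grub() {
	local dev="$1"
	# Read only the first 512 bytes; grep -a treats the binary data as text.
	if dd if="$dev" bs=512 count=1 2>/dev/null | grep -aq 'GRUB'; then
		echo "$dev: GRUB found in MBR"
	else
		echo "$dev: no GRUB string in MBR"
		return 1
	fi
}

# Usage (as root): for disk in /dev/sda /dev/sdb; do mbr_has_grub "$disk"; done
```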
# idk what to do; I tried grub-install again, but it gave me this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install /dev/sda&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # grub-install /dev/sdb&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried creating a chroot of our real raid disks first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # chroot-prepare /mnt/md2&lt;br /&gt;
root@rescue ~ # chroot /mnt/md2&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
root@rescue / # mount /dev/md1 /boot&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then tried the grub install again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue / # grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / #&lt;br /&gt;
&lt;br /&gt;
root@rescue / # grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I exited the chroot and shutdown the rescue system&lt;br /&gt;
# I activated the vKVM rescue system, and booted it again&lt;br /&gt;
# when I connected to the KVM wui, I was shown a password prompt. So I think booting works!&lt;br /&gt;
# I rebooted it from the ssh session&lt;br /&gt;
# and now I can ssh into the real system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Thu Apr 24 23:12:44 2025 from 146.70.199.15&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and now the wiki loads too&lt;br /&gt;
# I did another reboot test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ sudo su -&lt;br /&gt;
[sudo] password for maltfield: &lt;br /&gt;
Last login: Thu Apr 24 16:25:15 UTC 2025 on pts/0&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
Last login: Fri Apr 25 16:29:21 2025 from 185.204.1.184&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# idk; my takeaway is that one or more of these assumptions is correct&lt;br /&gt;
## grub-install needs to be run *after* the RAID sync is finished&lt;br /&gt;
## grub-install needs to be run on *both* the new *and* the old disk&lt;br /&gt;
## grub-install needs to be run inside a chroot on the rescue system&lt;br /&gt;
# anyway, we&#039;re stable again&lt;br /&gt;
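For next time, here is a hedged recap of the sequence that finally got hetzner2 booting again, written as a script for the Hetzner rescue system. It is a sketch, not a tested runbook. It assumes /dev/md2 is the CentOS 7 root, /dev/md1 is /boot, and sda and sdb are the two RAID1 members (as in the lsblk output above). The run helper and DRY_RUN variable are hypothetical additions so that, by default, the script only prints the commands:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hedged recap (not a tested runbook) of the steps that fixed hetzner2's boot,
# to be run as root from the Hetzner rescue system after the RAID has synced.
# Assumptions: /dev/md2 = CentOS 7 root, /dev/md1 = /boot, members sda + sdb.
set -eu

ROOT_MD=/dev/md2
BOOT_MD=/dev/md1
MNT=/mnt/md2
DISKS="/dev/sda /dev/sdb"

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
	if [ "${DRY_RUN:-1}" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

reinstall_grub() {
	run mkdir -p "$MNT"
	run mount "$ROOT_MD" "$MNT"
	# chroot-prepare is the Hetzner rescue-system helper used in the log above
	run chroot-prepare "$MNT"
	run chroot "$MNT" mount "$BOOT_MD" /boot
	for disk in $DISKS; do
		# Use the installed OS's own grub2-install (from inside the chroot),
		# not the rescue system's Debian grub-install, on every member disk.
		run chroot "$MNT" grub2-install "$disk"
	done
}

# Usage: reinstall_grub              (dry run: prints the commands)
#        DRY_RUN=0 reinstall_grub    (actually runs them)
```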
# I got an email from Marcin saying Tom could help with the migrations. I sent Tom some wiki articles so he can get caught up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ll try to get you ssh access on hetzner2 soon. In the meantime, please read the following articles:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner2&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started preparing draft &amp;quot;change tickets&amp;quot; for migrating each of the websites from hetzner2 to hetzner3. Note that some of these are not fully tested, so you&#039;ll want to execute them manually and make corrections as-needed.&lt;br /&gt;
&lt;br /&gt;
Please also read through these:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_microfactory_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_oswh&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_phplist_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
(There&#039;s also one CHG for the forum that I think needs to be made)&lt;br /&gt;
&lt;br /&gt;
The next item TODO is to finish the migration plan for these websites:&lt;br /&gt;
&lt;br /&gt;
 1. www.opensourceecology.org (osemain)&lt;br /&gt;
 2. www.openbuildinginstitute.org (obi)&lt;br /&gt;
&lt;br /&gt;
We decided that there would be 2 simultaneous versions of obi:&lt;br /&gt;
&lt;br /&gt;
1. A static site scraped with curl on hetzner3&lt;br /&gt;
2. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
And we decided that there would be 3 simultaneous versions of osemain:&lt;br /&gt;
&lt;br /&gt;
1. The live/current site on hetzner2&lt;br /&gt;
2. A static site scraped with curl on hetzner3&lt;br /&gt;
3. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
To have multiple sites with the same domain on the same server, we bought a second IPv4 address (FeF isn&#039;t set up with IPv6). This week I just finished updating the hetzner3 server to persist this new IPv4 address.&lt;br /&gt;
&lt;br /&gt;
The next item for you would be to update our ansible to push out new vhosts (in nginx, varnish, and apache) for the static sites that are bound to the second IPv4 address using the same hostname.&lt;br /&gt;
&lt;br /&gt;
Please read through the ansible playbook and roles (most importantly for nginx, varnish, and apache) to understand how they&#039;re provisioned.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
Since you have access to hetzner3, you can also poke around (read-only please) the configs for these three web services to understand how ansible provisions them.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve updated and pushed out the new vhosts with ansible, you&#039;ll need to update the migration plans:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_obi_to_hetzner3&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
And then you&#039;ll want to go through each migration plan to create a temporary &amp;quot;snapshot&amp;quot; of all the sites on hetzner3, where Marcin &amp;amp; Catarina can do a thorough verification of each site (by updating /etc/hosts) before we do the *real* migration -- which is nearly the same as the &amp;quot;snapshot&amp;quot; except we actually migrate DNS.&lt;br /&gt;
&lt;br /&gt;
Please let me know when you&#039;ve finished reading the above articles.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/24/25 22:16, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I need to reset my ssh key on hetzner2. Can you use the same as on 3 or best to generate a new one?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I spoke with Marcin and I think I can help with the admin, as I have time available.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you give a run-down of its status and what needs to be done for completing the migration to hetzner3?&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to log in to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged-in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
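# a sketch of the same two pre-checks as a reusable gate (the function names are hypothetical, and I&#039;m assuming backup.settings defines RCLONE and B2_BUCKET_NAME, as the commands above imply). Every md array must show no missing member, and a backup stamped with today&#039;s date must exist in the bucket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: refuse to start a disk CHG unless the RAID is healthy and today&#039;s backup exists&lt;br /&gt;
&lt;br /&gt;
# succeeds only if no array in the mdstat file is degraded; a healthy status looks like&lt;br /&gt;
# &amp;quot;[2/2] [UU]&amp;quot; and a degraded one like &amp;quot;[2/1] [U_]&amp;quot; (a missing file also counts as unhealthy)&lt;br /&gt;
raid_all_healthy() {&lt;br /&gt;
	local mdstat=&amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
	! grep -qE &#039;\[[U_]*_[U_]*\]&#039; &amp;quot;$mdstat&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# succeeds if a file containing today&#039;s YYYYMMDD stamp is listed in the B2 bucket&lt;br /&gt;
todays_backup_exists() {&lt;br /&gt;
	source /root/backups/backup.settings&lt;br /&gt;
	${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep -q &amp;quot;$(date &#039;+%Y%m%d&#039;)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: raid_all_healthy &amp;amp;&amp;amp; todays_backup_exists &amp;amp;&amp;amp; echo &amp;quot;OK to proceed&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;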
# I tried to remove the first partition from the RAID, but it said I can&#039;t?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that if the RAID is healthy, you first have to mark the partition as failed with &#039;--fail&#039; https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized I have an issue in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I listed this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this (remove rather than add, and sdb3 belongs to md2, not md1)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to execute this, to force the RAID into a failure state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
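# the fail-then-remove sequence here, sketched as a loop (hypothetical function name; the md:partition pairs are the ones from this CHG and must be re-checked against /proc/mdstat before use). mdadm won&#039;t hot-remove an active, healthy member, so each partition is marked faulty with --fail before --remove&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: detach one disk&#039;s partitions from their RAID1 arrays before a hotswap&lt;br /&gt;
&lt;br /&gt;
# usage: detach_disk_from_raid md0:sdb1 md1:sdb2 md2:sdb3&lt;br /&gt;
detach_disk_from_raid() {&lt;br /&gt;
	local pair md part&lt;br /&gt;
	for pair in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
		md=&amp;quot;/dev/${pair%%:*}&amp;quot;    # text before the colon, e.g. md0&lt;br /&gt;
		part=&amp;quot;/dev/${pair##*:}&amp;quot;  # text after the colon, e.g. sdb1&lt;br /&gt;
		mdadm --manage &amp;quot;$md&amp;quot; --fail &amp;quot;$part&amp;quot; || return 1&lt;br /&gt;
		mdadm --manage &amp;quot;$md&amp;quot; --remove &amp;quot;$part&amp;quot; || return 1&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# afterwards, /proc/mdstat should show each array degraded ([2/1] [U_]) with no sdb member&lt;br /&gt;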
# ok, I was able to remove it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had received an email from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
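# device names can be re-assigned after a hotswap, so the serial is the reliable way to tell the disks apart. A sketch that lists every whole disk next to its serial, using the same udevadm query as above (hypothetical helper names; assumes the disks show up as /dev/sd?)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: map kernel disk names to serial numbers&lt;br /&gt;
&lt;br /&gt;
# prints the ID_SERIAL value from &amp;quot;udevadm info --query=property&amp;quot; output on stdin&lt;br /&gt;
serial_from_props() {&lt;br /&gt;
	sed -n &#039;s/^ID_SERIAL=//p&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# prints one &amp;quot;sdX SERIAL&amp;quot; line per whole disk&lt;br /&gt;
list_disk_serials() {&lt;br /&gt;
	local dev serial&lt;br /&gt;
	for dev in /dev/sd?; do&lt;br /&gt;
		[ -b &amp;quot;$dev&amp;quot; ] || continue  # skip if the glob matched nothing&lt;br /&gt;
		serial=$(udevadm info --query=property --name &amp;quot;$dev&amp;quot; | serial_from_props)&lt;br /&gt;
		printf &#039;%s %s\n&#039; &amp;quot;${dev#/dev/}&amp;quot; &amp;quot;${serial:-unknown}&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;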
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; I/O errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and tried to make partition backups, anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but the re-insertion of the wrong drive pushed /dev/sda to /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at least partition this new drive&lt;br /&gt;
# but this new drive isn&#039;t the right size. it&#039;s 512G while our old disk was 250G. I guess it&#039;s better to have too-big of a disk than too-small of a disk, but we won&#039;t be able to use that extra disk space. I&#039;m going to assume that they just didn&#039;t have 250G disks in-stock anymore.&lt;br /&gt;
# anyway, I tried to back up the partitions, but that wouldn&#039;t work since we&#039;re read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
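# a quick way to confirm the read-only theory before rebooting (sketch; hypothetical function name): check whether &amp;quot;ro&amp;quot; appears in the mount options (4th field) that /proc/mounts lists for /. The kernel may have remounted the root filesystem read-only after the I/O errors in dmesg above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: succeed if the given mount point is currently mounted read-only&lt;br /&gt;
# usage: is_mounted_ro / &amp;amp;&amp;amp; echo &amp;quot;/ is read-only&amp;quot;&lt;br /&gt;
is_mounted_ro() {&lt;br /&gt;
	local target=&amp;quot;$1&amp;quot; mounts=&amp;quot;${2:-/proc/mounts}&amp;quot;&lt;br /&gt;
	# take the last entry for this mount point (later mounts shadow earlier ones)&lt;br /&gt;
	awk -v t=&amp;quot;$target&amp;quot; &#039;$2 == t { opts = $4 } END { print opts }&#039; &amp;quot;$mounts&amp;quot; |&lt;br /&gt;
		tr &#039;,&#039; &#039;\n&#039; | grep -qx &#039;ro&#039;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;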
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot, and pray that grub is intact enough to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it can&#039;t even reboot it&#039;s so broken.&lt;br /&gt;
# I triggered a reset on the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shutdown all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. we have sda and sdb again, and sda is our original sda – as we wanted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partition tables; it&#039;s not surprising that the sdb dump is empty, since the new disk has no partition table yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
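# if a later step ever writes the wrong table to the wrong disk, these dumps can be fed back to sfdisk (the same way &#039;sfdisk -d /dev/sda | sfdisk /dev/sdb&#039; is used below). A sketch with a guard against restoring from an empty dump like sdb_parttable_mbr.bak; the function name is hypothetical&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: restore a partition table from an &amp;quot;sfdisk --dump&amp;quot; backup&lt;br /&gt;
# usage: restore_parttable ${chg_dir}/sda_parttable_mbr.bak /dev/sda&lt;br /&gt;
restore_parttable() {&lt;br /&gt;
	local backup=&amp;quot;$1&amp;quot; disk=&amp;quot;$2&amp;quot;&lt;br /&gt;
	# refuse to restore from a missing or empty dump&lt;br /&gt;
	[ -s &amp;quot;$backup&amp;quot; ] || { echo &amp;quot;missing or empty dump: $backup&amp;quot; &amp;gt;&amp;amp;2; return 1; }&lt;br /&gt;
	sfdisk &amp;quot;$disk&amp;quot; &amp;lt; &amp;quot;$backup&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;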
# I copied the partition table from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
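# the geometry sfdisk printed for /dev/sdb (62260 cylinders, 255 heads, 63 sectors/track) is enough to double-check the disk size, because cylinders * heads * sectors/track gives the number of 512-byte sectors. A small sketch of that arithmetic (hypothetical function name; it only counts whole cylinders, so it slightly under-counts the real capacity)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: convert a CHS geometry (512-byte sectors) into bytes and GiB&lt;br /&gt;
chs_to_gib() {&lt;br /&gt;
	local cyl=&amp;quot;$1&amp;quot; heads=&amp;quot;$2&amp;quot; spt=&amp;quot;$3&amp;quot;&lt;br /&gt;
	local bytes=$(( cyl * heads * spt * 512 ))&lt;br /&gt;
	# bash only does integer math, so let awk do the division by 1024^3&lt;br /&gt;
	printf &#039;%d bytes = %s GiB\n&#039; &amp;quot;$bytes&amp;quot; \&lt;br /&gt;
		&amp;quot;$(awk -v b=&amp;quot;$bytes&amp;quot; &#039;BEGIN { printf &amp;quot;%.1f&amp;quot;, b / 1073741824 }&#039;)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# chs_to_gib 62260 255 63   =&amp;gt; 512105932800 bytes = 476.9 GiB&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so the new disk is ~512 GB (the 477G that lsblk reported), about twice the size of the old 250 GB (232.9G) disk. The copied partition table ends at sector 488395120, so only the first ~233 GiB of the new disk will be used&lt;br /&gt;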
# that looked good, other than the warning that the DOS MBR will not boot this disk because no partition is marked bootable; I&#039;ll check later what LILO is and whether this matters for grub on the RAID&lt;br /&gt;
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk to the RAID, and it shows that it&#039;s starting to sync now. excellent&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, it looks like it&#039;s not syncing each partition of the RAID at the same time. it&#039;s doing md0 now and then it&#039;ll do the others after, I guess&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d sync&#039;d /boot first. I don&#039;t think I can install grub until that&#039;s sync&#039;d. maybe?&lt;br /&gt;
# it says it&#039;s moving at about 1159K/s, i.e. roughly 1 MB per sec. 32 GB * 1024 = 32,768 MB, so that&#039;s 32,768 seconds / 60 = 546 minutes / 60 = ~9 hours. Just for swap!&lt;br /&gt;
# assuming we have the same speed for the rest of the disk, that&#039;s 250 GB * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds. 256,000 seconds / 60 = 4,266.67 minutes / 60 = 71.11 hours. I guess we just have to accept the risk and hope that the old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
# I tried to go ahead and install grub on the new disk, but I got a &amp;quot;command not found&amp;quot; error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it should be &#039;grub2-install&#039;; I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# at that speed (~58 MB/s), syncing the whole 250 GB would take 250*1024/58 = 4,413.79 seconds / 60 = ~73.6 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
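# the ETA arithmetic above can be scripted from an mdstat progress line. The numbers in parentheses are done/total 1K-blocks and speed= is in K/sec, so (total - done) / speed gives the seconds left. A sketch (hypothetical function name); mdstat&#039;s own finish= is the kernel&#039;s estimate and can differ slightly from this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# SKETCH: print minutes remaining for each progress line read from stdin&lt;br /&gt;
# usage: grep -E &#039;recovery|resync&#039; /proc/mdstat | mdstat_eta_minutes&lt;br /&gt;
mdstat_eta_minutes() {&lt;br /&gt;
	# pull out &amp;quot;done total speed&amp;quot;; lines without a progress figure (e.g. resync=DELAYED) are skipped&lt;br /&gt;
	sed -nE &#039;s/.*\(([0-9]+)\/([0-9]+)\).*speed=([0-9]+)K\/sec.*/\1 \2 \3/p&#039; |&lt;br /&gt;
		awk &#039;{ printf &amp;quot;%.1f\n&amp;quot;, ($2 - $1) / $3 / 60 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. for the 42.7% line above: (33521664 - 14334208) / 47845 = ~401 seconds, or ~6.7 minutes, versus mdstat&#039;s finish=6.6min&lt;br /&gt;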
# backups are still running; I&#039;ll let them finish before starting up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the raid replication, and it shows md0 (swap) and md1 (boot) are both done. Horray! Now we just need to finish root (/), which is 9.8% done and going at 60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
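# the [2/1] [U_] vs [2/2] [UU] fields are the quick way to read this: an underscore marks a mirror member that is missing or still rebuilding. A small sketch (the function name is hypothetical) that counts arrays still in that state&lt;br /&gt;

```shell
# Count md arrays whose status field (e.g. [U_] or [_U]) shows a missing member.
# Reads the file given as $1 (default /proc/mdstat); pass "-" to read stdin.
count_degraded() {
    grep -cE '\[U*_[U_]*\]' "${1:-/proc/mdstat}"
}

# Example: the md2 status line from the 14:05 snapshot above (still resyncing)
printf '%s\n' '      209984640 blocks super 1.2 [2/1] [U_]' | count_degraded -
```

# on the server, running count_degraded with no arguments should print 0 once every array shows [UU]&lt;br /&gt;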
# I gave the grub install a double-tap now that it&#039;s synced with the first disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups say they&#039;re 9% uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to just kill the backup script and upload the file manually without the bwlimit, so it&#039;ll go out faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
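# for a scripted version of this check, the line to look for is the overall-health result, which smartctl prints as PASSED or FAILED! as shown above. A minimal sketch that extracts it per disk; the function name is hypothetical, and it assumes smartctl output in the same format as above&lt;br /&gt;

```shell
# Print "DEVICE: RESULT" from `smartctl -H` output read on stdin.
smart_health() {
    awk -v dev="$1" -F': ' '/overall-health self-assessment test result/ { print dev ": " $2 }'
}

# Usage on the server (needs root and smartmontools):
#   for d in /dev/sda /dev/sdb; do smartctl -H "$d" | smart_health "$d"; done
printf '%s\n' 'SMART overall-health self-assessment test result: PASSED' | smart_health /dev/sdb
```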
# while the old disk says it has used 100% of its lifetime, the new disk says it has used, uhh, 96% of its life? That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
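# the two attributes this log compares can be pulled out of smartctl -A with awk. The columns are ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE. On these drives, attribute 202 appears to report lifetime used in RAW_VALUE and lifetime remaining in the normalized VALUE (96 raw / 004 normalized, i.e. about 4% left). That reading is an assumption based on the output above, since SMART attributes are vendor-specific; the function name is hypothetical&lt;br /&gt;

```shell
# Summarize wear from `smartctl -A` output on stdin:
#   ID 9   Power_On_Hours: RAW_VALUE in hours, also shown as years (hours / 24 / 365)
#   ID 202 Percent_Lifetime_Remain: RAW_VALUE = % used, VALUE = % remaining (on these drives)
smart_wear() {
    awk '
        $1 == 9   { printf "power_on_hours=%d (%.2f years)\n", $10, $10 / 24 / 365 }
        $1 == 202 { printf "lifetime_used=%d%% lifetime_remaining=%d%%\n", $10, $4 }
    '
}

# The two relevant lines from /dev/sdb above
printf '%s\n' \
    '  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083' \
    '202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96' \
    | smart_wear
```

# usage on the server: smartctl -A /dev/sdb | smart_wear. Fed the old disk&#039;s lines instead, it reports 8.96 years and 100% used / 0% remaining&lt;br /&gt;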
# Shame. I was hoping for at least something &amp;lt;50%. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive reports Power_On_Hours = 78516, and 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive reports Power_On_Hours = 52083, and 52083/24/365 = 5.95 years. Well, that&#039;s better, I guess.&lt;br /&gt;
# oh wow, the power cycle count is crazy low: our old disk was only power-cycled 50 times, and the new one only 33 times.&lt;br /&gt;
# also the SMART data for both of these drives has different keys (not just values). apparently it&#039;s very vendor-specific, so some of these comparisons are apples-to-oranges&lt;br /&gt;
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check-in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked-off a marketing campaign (for the apprenticeship), which now links to information on a broken website – where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged-into the wiki, the cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the front pages of all the websites, and the apprenticeship info page that he recently published &amp;amp; promoted, are still online for users who are *not* logged-into the site, even while he sees that error&lt;br /&gt;
# but if the backend is broken for &amp;gt;24 hours, the cached copies expire, and the cache will cache the errors (not the content)&lt;br /&gt;
# as a short-term hack, I asked Marcin if he&#039;d like me to set up a daily reboot of hetzner2 at 10:40 UTC (a good buffer after the backups finish uploading)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
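# one way to test this theory from a logged-out client is to request a page without cookies and look at the Age response header, which Varnish sets to how long (in seconds) the object has been in cache; a non-zero Age suggests the page came from cache rather than the backend. This is a minimal sketch that assumes the nginx in front of Varnish passes that header through; the function name is hypothetical&lt;br /&gt;

```shell
# Guess whether an HTTP response came from cache, based on its Age header.
# Reads raw response headers (e.g. from `curl -sI URL`) on stdin.
# Assumption: Varnish sets Age to the cached object's age in seconds (0 on a miss).
cache_status() {
    tr -d '\r' | awk '
        tolower($1) == "age:" { age = $2 + 0 }
        END {
            if (age) print "HIT (Age: " age "s)"
            else print "MISS (no or zero Age header)"
        }
    '
}

# Usage from a logged-out client (curl sends no cookies unless told to):
#   curl -sI https://wiki.opensourceecology.org/wiki/Category:CHGs | cache_status
printf 'HTTP/1.1 200 OK\r\nAge: 3600\r\n\r\n' | cache_status
```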
# ...&lt;br /&gt;
# I checked-in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like I started it just after 13:32 and it finished just before 15:20, so it took roughly 1 hour 48 minutes, just under 2 hours. Great!&lt;br /&gt;
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
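# a quick check of the resync duration from the timestamps noted above (started just after 13:32 UTC, done just before 15:20 UTC). This sketch assumes both times fall on the same day; the helper name is hypothetical&lt;br /&gt;

```shell
# Print the elapsed time between two same-day HH:MM timestamps.
elapsed_hm() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        d = (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
        printf "%dh%02dm\n", int(d / 60), d % 60
    }'
}

elapsed_hm 13:32 15:20
```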
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the price is listed as &amp;quot;at cost&amp;quot; and it says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine. That&#039;s compared to the 78,516 hours of /dev/sda and 52,083 hours of our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
# but it&#039;s a bit concerning that it says it might not be in-stock. I&#039;m going to message them and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
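# those local times follow from fixed UTC offsets in late April: Germany on CEST (UTC+2), and Ecuador and FeF both at UTC-5, which matches the 10:40 UTC = 05:40 FeF conversion in the earlier email. A small arithmetic sketch (hypothetical function name) that takes the offset as an argument instead of looking up time zone rules, so it does not handle DST or date changes&lt;br /&gt;

```shell
# Convert a UTC HH:MM time to local time, given a whole-hour offset from UTC.
# Wraps around midnight; does not track date changes or DST rules.
utc_to_local() {
    echo "$1 $2" | awk '{
        split($1, t, ":")
        h = (t[1] + $2 + 24) % 24
        printf "%02d:%02d\n", h, t[2]
    }'
}

utc_to_local 11:00 +2   # Germany (CEST)
utc_to_local 11:00 -5   # Ecuador and FeF
utc_to_local 10:40 -5   # daily reboot time at FeF
```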
# Marcin returned the email confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to set up the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we don&#039;t have ansible for hetzner2; I did this manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# tomorrow morning I should check the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
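# for the uptime part of that check, uptime -s (procps-ng) prints when the system last booted. The sketch below only covers the comparison: it takes a timestamp in that format and reports whether it falls between 10:40 and 10:49, assuming the server clock is on UTC (as the date output earlier in this log shows). The function name and the 10-minute window are hypothetical choices&lt;br /&gt;

```shell
# Check whether a boot timestamp in `uptime -s` format ("YYYY-MM-DD HH:MM:SS")
# falls between 10:40 and 10:49, i.e. shortly after the 10:40 cron reboot.
# Assumes the server clock is on UTC.
rebooted_on_schedule() {
    case "$1" in
        *" 10:4"[0-9]":"*) echo "ok: booted at $1" ;;
        *) echo "check: booted at $1"; return 1 ;;
    esac
}

# Usage on hetzner2:
#   rebooted_on_schedule "$(uptime -s)"
rebooted_on_schedule "2025-04-25 10:41:07"   # made-up sample timestamp
```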
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add the --verbose flag, but that&#039;s not especially helpful either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, though, the new IPv4 address is listed in `ip a`&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came up, it had lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, but I got errors again. Curiously, though, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet isn&#039;t very helpful because the damn interfaces file format seems to have changed so many times over the years; there&#039;s lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find these Hetzner-specific docs on adding a second IP; their approach (up/down hooks that run ip addr) is totally different from what I&#039;ve read elsewhere https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the system came up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I used netcat to test TCP connectivity on the new IP. Most ports are closed, and nginx is already listening on most of the remaining ones (on all IPs), except 4443, so I used that port&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
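# here is a possible follow-up that I did not run: a small check to re-verify both IPv4 addresses after future reboots. It&#039;s a hypothetical sketch. It assumes the one-line-per-address output of ip -o -4 addr show and our interface name enp0s31f6&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# hypothetical helper: succeed if the IPv4 address given as $1 appears in the&lt;br /&gt;
# output of ip -o -4 addr show (read from stdin), whatever its prefix length&lt;br /&gt;
addr_present() {&lt;br /&gt;
  awk -v want=&amp;quot;$1&amp;quot; &#039;{ split($4, a, &amp;quot;/&amp;quot;); if (a[1] == want) found = 1 } END { exit !found }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# check both IPv4 addresses on the main interface&lt;br /&gt;
for ip4 in 144.76.164.201 144.76.164.195; do&lt;br /&gt;
  if ip -o -4 addr show dev enp0s31f6 | addr_present &amp;quot;$ip4&amp;quot;; then&lt;br /&gt;
    echo &amp;quot;OK: $ip4 is configured&amp;quot;&lt;br /&gt;
  else&lt;br /&gt;
    echo &amp;quot;MISSING: $ip4&amp;quot;&lt;br /&gt;
  fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# matching on the address alone, and ignoring the prefix, is deliberate: the two addresses are configured with different prefix lengths (/27 vs /32)&lt;br /&gt;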
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the scheduled date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess that eventually, if the disk failed badly enough that RAID replication broke, we would have gotten alerts. But it would be good to get alerts *before* that happens.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a SMART plugin for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# crap, I just checked hetzner3&#039;s munin, and I realized that varnish is missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the script and the plugin symlinks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I did a diff of the varnish5_ script on my server and on ose&#039;s server. I found 2 new lines at the top of the file on hetzner3&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;ansible managed&amp;quot; comment comes before the shebang. A shebang only counts on the very first line, so the plugin gets run as a shell script instead of perl&lt;br /&gt;
# we can see the resulting errors in a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
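# to catch this kind of regression after future ansible pushes, here&#039;s a hypothetical check that I did not run. It lists any plugin file whose first two bytes aren&#039;t #!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# hypothetical sanity check: print every regular file whose first two bytes&lt;br /&gt;
# are not #! (the kernel only honors a shebang at the very start of the file)&lt;br /&gt;
check_shebang() {&lt;br /&gt;
  for f in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    [ -f &amp;quot;$f&amp;quot; ] || continue&lt;br /&gt;
    if [ &amp;quot;$(head -c 2 &amp;quot;$f&amp;quot;)&amp;quot; != &#039;#!&#039; ]; then&lt;br /&gt;
      echo &amp;quot;missing shebang: $f&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
check_shebang /usr/share/munin/plugins/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# on hetzner3 as it is right now, this should at least flag varnish5_, whose first line is the &amp;quot;# Ansible managed&amp;quot; comment&lt;br /&gt;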
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
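# for reference, here is a sketch of how the top of the fixed template might look. It assumes the role renders the header from Ansible&#039;s ansible_managed variable, whose default value is the &amp;quot;Ansible managed&amp;quot; text seen above. The only thing that really matters is that the shebang stays on the first line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# {{ ansible_managed }}&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;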
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that we need (at minimum) this section in the munin plugin config, so I added it to hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
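# side note: the smartctl_exit_status.value 8 above is worth decoding. Assuming the plugin reports smartctl&#039;s raw exit code, smartctl(8) documents that code as a bitmask. A value of 8 means only bit 3 is set (the SMART health check returned DISK FAILING), which fits the sdb disk that&#039;s scheduled for replacement. Here is a small hypothetical helper (not something I ran) to decode these values&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/sh&lt;br /&gt;
# hypothetical helper: print which smartctl exit-status bits are set&lt;br /&gt;
# (bit meanings paraphrased from the RETURN VALUES section of smartctl(8))&lt;br /&gt;
decode_smartctl_status() {&lt;br /&gt;
  s=$1&lt;br /&gt;
  bit=0&lt;br /&gt;
  for msg in \&lt;br /&gt;
    &#039;command line did not parse&#039; \&lt;br /&gt;
    &#039;device open failed&#039; \&lt;br /&gt;
    &#039;a SMART or other ATA command failed, or a checksum error&#039; \&lt;br /&gt;
    &#039;SMART status check returned DISK FAILING&#039; \&lt;br /&gt;
    &#039;prefail attributes at or below threshold&#039; \&lt;br /&gt;
    &#039;some attributes were at or below threshold in the past&#039; \&lt;br /&gt;
    &#039;device error log contains errors&#039; \&lt;br /&gt;
    &#039;device self-test log contains errors&#039;&lt;br /&gt;
  do&lt;br /&gt;
    # test the lowest remaining bit, then shift right by one&lt;br /&gt;
    if [ $((s % 2)) -eq 1 ]; then&lt;br /&gt;
      echo &amp;quot;bit $bit: $msg&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
    s=$((s / 2))&lt;br /&gt;
    bit=$((bit + 1))&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
decode_smartctl_status 8&lt;br /&gt;
# prints: bit 3: SMART status check returned DISK FAILING&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;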
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but if I restart the service first and then run it, it (uhh) kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it exits without an error, but the only value is U (unknown); no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical SMART charts would be great, what I really want is email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I backed up the original smartd config and replaced it with a single DEVICESCAN line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but that didn&#039;t work; no email came when I restarted the service (even when I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I checked the postfix logs: google accepted one message (status=sent), but the other was deferred with a postfix &amp;quot;bounce or trace service failure&amp;quot; (dsn=4.3.0)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the Google Groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see if SMART info was now being charted&lt;br /&gt;
## on hetzner2, there are no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty; each has only 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; across all time ranges. This is expected, since the plugin can&#039;t parse the nvme smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess no munin logs on SMART for this dying server&lt;br /&gt;
# I also confirmed that varnish logs are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty in my screen session. It really is angry..&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of emails being sent. phpList is *supposed* to send in small batches, but I wonder if, once messages fail and get added to the postfix queue, they&#039;re re-sent without any batching..&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
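# Note that, in the output above, local_, lmtp_, virtual_, relay_, and smtp_destination_rate_delay all inherit $default_destination_rate_delay. So raising the default also adds a delay between repeated local deliveries (for example, to root). A narrower alternative is to throttle only the SMTP client that talks to remote servers like Gmail. This is a sketch, not what was deployed here:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# main.cf sketch: delay only outbound SMTP deliveries, 2 seconds between messages to the same destination&lt;br /&gt;
smtp_destination_rate_delay = 2s&lt;br /&gt;
&lt;br /&gt;
# apply the change and confirm the effective values&lt;br /&gt;
postfix reload&lt;br /&gt;
postconf smtp_destination_rate_delay default_destination_rate_delay&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;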
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
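# For a rough sense of how the two throttles interact, here is simple arithmetic on the numbers above. It assumes postfix applies the 2s delay per destination domain and that nothing else slows delivery:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
phpList: MAILQUEUE_THROTTLE = 1               1 message per second    = 3,600 messages/hour&lt;br /&gt;
postfix: default_destination_rate_delay = 2s  1 message per 2 seconds = 1,800 messages/hour, per destination domain&lt;br /&gt;
&lt;br /&gt;
14,111 recipients at 3,600/hour = ~3.9 hours (about 3h 55m)&lt;br /&gt;
14,111 recipients at 1,800/hour (worst case: all at one domain, e.g. gmail.com) = ~7.8 hours (about 7h 50m)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# In other words, for any single large domain, postfix&#039;s 2s delay (not phpList&#039;s throttle) would likely be the limiting factor.&lt;br /&gt;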
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of security for ssh (to force keys). But that breaks sudo, which requires the user&#039;s password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to have a user password set (while still disabling password logins in sshd) than to set NOPASSWD in sudoers&lt;br /&gt;
# disabled passwords are indicated by the &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just manually edited /etc/shadow with vim to remove the exclamation point from the tgriffing entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
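# Note that an empty second field means the tgriffing account now has an empty password (passwd -S reports NP) until Tom sets one with passwd. Whether that permits a password-less local login depends on PAM (e.g. the nullok option). A possible alternative to hand-editing /etc/shadow is to have root set a temporary password and then check the account status. This is a sketch, not what I ran:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# set a temporary password interactively (root is not asked for the old one)&lt;br /&gt;
passwd tgriffing&lt;br /&gt;
&lt;br /&gt;
# show the account status; the second field is L (locked), NP (no password), or P (usable password)&lt;br /&gt;
passwd -S tgriffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;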
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) are MBR partition scheme (not GPT) – indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
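# Since the hetzner2 disks use MBR (dos) labels, one common way to copy the partition layout onto a replacement disk is sfdisk. This is a sketch only, assuming sdb is the disk being replaced and sda is the healthy one. Double-check the device names before running it, since the second command overwrites the partition table on the target:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the healthy disk to a file&lt;br /&gt;
sfdisk -d /dev/sda &amp;gt; /root/sda.sfdisk&lt;br /&gt;
&lt;br /&gt;
# write the same layout to the new (empty) disk&lt;br /&gt;
sfdisk /dev/sdb &amp;lt; /root/sda.sfdisk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;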
# A quick spot-check shows that our backups usually finish at 09:55 – one time as late as 10:07. That&#039;s UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God that&#039;s early, but better to do this early in Germany time..&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so there&#039;s one more recovery mode we can try before the modes become potentially destructive: 3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests this infinite loop is caused by the default of innodb_purge_threads=1, and it says we should set this to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set innodb_force_recovery back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that once the restart isn&#039;t blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html (note that&#039;s the MySQL 8.4 page; the defaults and allowed values differ on our 5.5.68-MariaDB)&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
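# for reference, roughly what the added my.cnf lines looked like for the start that worked (reconstructed from the notes above, not copied from the file; the startup log below confirms level 2). per the recovery docs, level 2 on its own already keeps purge threads from running, so it&#039;s not clear which of the two changes did the trick&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
innodb_force_recovery = 2&lt;br /&gt;
# suggested by the serverfault/spiceworks threads above to get past the endless&lt;br /&gt;
# &amp;quot;Waiting for the background threads to start&amp;quot; loop&lt;br /&gt;
innodb_purge_threads = 0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;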
# the logs are being spammed with these last 5 lines a bunch; I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
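# a sketch of a slightly more precise filter than grep -vi OK, which hides every line containing &amp;quot;ok&amp;quot; in any case (including table names, and the status line mysqlcheck usually prints after a warning). this prints each label : message line of a mysqlcheck log prefixed with the table it belongs to, so a warning&#039;s final verdict stays visible. it assumes mysqlcheck&#039;s usual layout (a db.table line followed by label : message lines); mysqlcheck_problems is a hypothetical helper name&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print each label : message line of a mysqlcheck log, prefixed with its table&lt;br /&gt;
mysqlcheck_problems() {&lt;br /&gt;
  awk &#039;&lt;br /&gt;
    /^[a-z]+ +: / { print table &amp;quot;: &amp;quot; $0; next }   # note/warning/error/status lines&lt;br /&gt;
    { table = $1 }                                         # any other line starts with db.table&lt;br /&gt;
  &#039; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
# usage: mysqlcheck_problems mysqlcheck.20250418.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;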
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list. And that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
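# the exit status of mysqldump ... | gzip is gzip&#039;s, not mysqldump&#039;s (unless set -o pipefail is on), so a dump that died halfway can still leave a valid-looking .sql.gz. a sketch of a quick sanity check, assuming the dump was made with mysqldump&#039;s default comments enabled, which makes its last line start with &amp;quot;-- Dump completed&amp;quot;; check_dump is a hypothetical helper name, and it decompresses the file twice, so expect it to take a bit on a 1.3G dump&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# succeed only if the gzip stream is intact and the dump ran to completion&lt;br /&gt;
check_dump() {&lt;br /&gt;
  gzip -t &amp;quot;$1&amp;quot; || return 1                                   # truncated/corrupt gzip stream&lt;br /&gt;
  gzip -dc &amp;quot;$1&amp;quot; | tail -n 1 | grep -q &#039;^-- Dump completed&#039;   # mysqldump&#039;s closing comment&lt;br /&gt;
}&lt;br /&gt;
# usage: check_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &amp;amp;&amp;amp; echo dump looks complete&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;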
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
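# if our backups are full mysqldump --all-databases files, restoring only the three dropped databases means pulling just their sections out of the dump. a sketch, relying on mysqldump marking each database with a &amp;quot;-- Current Database: `name`&amp;quot; comment; it also keeps the header before the first database (session settings such as SET NAMES). extract_db, list_dump_dbs, and backup.sql.gz are hypothetical names&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print one database&#039;s part of a mysqldump --all-databases dump, plus the dump header&lt;br /&gt;
extract_db() {&lt;br /&gt;
  awk -v db=&amp;quot;$1&amp;quot; &#039;&lt;br /&gt;
    BEGIN { keep = 1 }                                     # keep the header before the first database&lt;br /&gt;
    /^-- Current Database: / { keep = ($0 == &amp;quot;-- Current Database: `&amp;quot; db &amp;quot;`&amp;quot;) }&lt;br /&gt;
    keep                                                   # print while inside the wanted database&lt;br /&gt;
  &#039;&lt;br /&gt;
}&lt;br /&gt;
# list which databases a dump contains&lt;br /&gt;
list_dump_dbs() {&lt;br /&gt;
  grep &#039;^-- Current Database: &#039;&lt;br /&gt;
}&lt;br /&gt;
# usage: zcat backup.sql.gz | extract_db osemain_db | mysql -uroot -p&amp;quot;$mysqlPass&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;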
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
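# since we keep flipping innodb_force_recovery between levels by hand in vim, a sketch of a small helper that rewrites the existing line and keeps a timestamped backup first (like my.cnf.20250417). it assumes /etc/my.cnf and GNU sed -i, and only edits an existing innodb_force_recovery line, commented out or not (it won&#039;t add one); set_force_recovery is a hypothetical name&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# set innodb_force_recovery to the given level; level 0 comments the line out instead&lt;br /&gt;
set_force_recovery() {&lt;br /&gt;
  local level=$1 cnf=${2:-/etc/my.cnf}&lt;br /&gt;
  cp -p &amp;quot;$cnf&amp;quot; &amp;quot;$cnf.$(date +%Y%m%d_%H%M%S)&amp;quot;   # timestamped backup before editing&lt;br /&gt;
  if [ &amp;quot;$level&amp;quot; -eq 0 ]; then&lt;br /&gt;
    sed -i &#039;s/^innodb_force_recovery/#innodb_force_recovery/&#039; &amp;quot;$cnf&amp;quot;&lt;br /&gt;
  else&lt;br /&gt;
    sed -i &amp;quot;s/^#*innodb_force_recovery.*/innodb_force_recovery = $level/&amp;quot; &amp;quot;$cnf&amp;quot;&lt;br /&gt;
  fi&lt;br /&gt;
}&lt;br /&gt;
# usage: set_force_recovery 2 &amp;amp;&amp;amp; systemctl restart mariadb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;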
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time just as 1. This time it did start, but this loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, even though it *says* it&#039;s started&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
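# systemd reports the unit as active because its main PID is mysqld_safe, which keeps restarting mysqld after every crash (see &amp;quot;mysqld_safe mysqld restarted&amp;quot; in the log above). a sketch of a more honest check, assuming mysqladmin is on the PATH: it polls mysqladmin ping, which the MySQL docs say exits 0 whenever the server is up (even if the login is refused), and only reports success after 5 answers in a row, since the log shows a crash-looping mysqld can reach &amp;quot;ready for connections&amp;quot; in the same second the purge assertion takes it down; wait_for_mysqld is a hypothetical name&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# succeed once mysqld has answered pings 5 seconds in a row; give up after N seconds (default 60)&lt;br /&gt;
wait_for_mysqld() {&lt;br /&gt;
  local i streak=0&lt;br /&gt;
  for i in $(seq 1 &amp;quot;${1:-60}&amp;quot;); do&lt;br /&gt;
    if mysqladmin ping &amp;gt;/dev/null 2&amp;gt;&amp;amp;1; then&lt;br /&gt;
      streak=$((streak + 1))&lt;br /&gt;
    else&lt;br /&gt;
      streak=0                       # a crash-and-restart resets the count&lt;br /&gt;
    fi&lt;br /&gt;
    if [ &amp;quot;$streak&amp;quot; -ge 5 ]; then&lt;br /&gt;
      echo &amp;quot;mysqld is answering&amp;quot;&lt;br /&gt;
      return 0&lt;br /&gt;
    fi&lt;br /&gt;
    sleep 1&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;mysqld is not answering reliably; see /var/log/mariadb/mariadb.log&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
  return 1&lt;br /&gt;
}&lt;br /&gt;
# usage: systemctl restart mariadb &amp;amp;&amp;amp; wait_for_mysqld&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;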
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
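# for reference, the CHECK TABLE / REPAIR TABLE route from the myisam-table-close page above would look roughly like this, run while the table still exists. REPAIR TABLE only handles MyISAM, Aria, ARCHIVE, and CSV tables (not InnoDB), and it modifies the table, so take a dump first; repair_table is a hypothetical helper that reuses the $mysqlPass variable from the commands above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# run CHECK TABLE and then REPAIR TABLE on one db.table&lt;br /&gt;
repair_table() {&lt;br /&gt;
  mysql -uroot -p&amp;quot;$mysqlPass&amp;quot; -e &amp;quot;CHECK TABLE $1; REPAIR TABLE $1;&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
# usage: repair_table osemain_db.wp_options&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;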
# well, fuck, now I don&#039;t know why it won&#039;t start. And it doesn&#039;t tell me why. The good news is that I was able to get a db dump. maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
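# a less typo-prone sketch of the same purge (the manual session above hit 3dp_db d3ddb and obi_stabing_db): it lists every database except the three system ones and prints, rather than runs, a DROP DATABASE statement for each, so the list can be reviewed first. -N is mysql&#039;s --skip-column-names; print_drop_statements and drop.sql are hypothetical names&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print (don&#039;t run) DROP DATABASE for every non-system database&lt;br /&gt;
print_drop_statements() {&lt;br /&gt;
  mysql -uroot -p&amp;quot;$mysqlPass&amp;quot; -N -e &#039;SHOW DATABASES&#039; |&lt;br /&gt;
    grep -Ev &#039;^(information_schema|mysql|performance_schema)$&#039; |&lt;br /&gt;
    sed &#039;s/.*/DROP DATABASE `&amp;amp;`;/&#039;&lt;br /&gt;
}&lt;br /&gt;
# usage: print_drop_statements &amp;gt; drop.sql; less drop.sql; mysql -uroot -p&amp;quot;$mysqlPass&amp;quot; &amp;lt; drop.sql&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;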
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
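# before I purge those three system-level DBs (information_schema, mysql, and performance_schema), I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, two of them are missing: the dump has a CREATE DATABASE for `mysql`, but not for information_schema or performance_schema&lt;br /&gt;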
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
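# a minimal bash sketch of the same check, assuming the mysqldump CREATE DATABASE format shown above: it parses the database names out of a dump on stdin and prints any expected names that are missing. The function names are hypothetical helpers, not part of our backup scripts&lt;br /&gt;

```shell
#!/usr/bin/env bash
# list_dump_databases: read a mysqldump stream on stdin and print the name
# inside the backticks of each "CREATE DATABASE ... `name` ..." line
list_dump_databases() {
  sed -n 's/^CREATE DATABASE [^`]*`\([^`]*\)`.*/\1/p'
}

# missing_from_dump NAME...: print each NAME that has no CREATE DATABASE
# line in the dump read from stdin
missing_from_dump() {
  local found db
  found="$(list_dump_databases)"
  for db in "$@"; do
    if ! printf '%s\n' "$found" | grep -qxF -- "$db"; then
      printf '%s\n' "$db"
    fi
  done
}
```

# e.g. zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | missing_from_dump information_schema mysql performance_schema&lt;br /&gt;
# given the listing above, that should print information_schema and performance_schema&lt;br /&gt;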
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be OK to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m just going to manually dump these three anyway. Or try to.&lt;br /&gt;
# well, I was only able to back up one of the three (mysql)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
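# the mysql dump looks good (716K); the information_schema and performance_schema files are only 4.0K because those two dumps failed&lt;br /&gt;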
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
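# to make a failed dump like the information_schema one above obvious: in `mysqldump ... | gzip -c &amp;gt; file` the shell creates the file before mysqldump runs, and the pipeline exit status comes from gzip, so a failure still leaves a small .gz behind. A minimal bash sketch (hypothetical helper, relying on bash `pipefail`) that deletes the output file when any stage of the pipe fails&lt;br /&gt;

```shell
#!/usr/bin/env bash
# gz_or_discard OUTFILE CMD [ARGS...]: run CMD, gzip its stdout into OUTFILE,
# and delete OUTFILE (returning non-zero) if CMD or gzip fails
gz_or_discard() {
  local out="$1"
  shift
  # pipefail (set in a subshell, so it does not leak into the caller) makes
  # the pipeline fail when CMD fails, not only when gzip fails
  if ( set -o pipefail; "$@" | gzip -c > "$out" ); then
    return 0
  fi
  rm -f -- "$out"
  return 1
}
```

# e.g. gz_or_discard mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz mysqldump -uroot -p$mysqlPass mysql&lt;br /&gt;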
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
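# the same move-aside steps as a minimal bash sketch (hypothetical helper; assumes GNU coreutils for `--reference`, and that mariadb is already stopped). It refuses to clobber an existing dated copy, and copies owner, group, and mode from the old dir instead of hard-coding mysql:mysql 0755&lt;br /&gt;

```shell
#!/usr/bin/env bash
# move_aside_and_recreate DIR: rename DIR to DIR.YYYYMMDD, then create a new,
# empty DIR with the same owner, group, and mode as the original
# (assumes GNU coreutils and that nothing is using DIR)
move_aside_and_recreate() {
  local dir="${1%/}"
  local backup
  backup="${dir}.$(date '+%Y%m%d')"
  # never overwrite a dated copy that already exists
  if [ -e "$backup" ]; then
    return 1
  fi
  mv -- "$dir" "$backup" || return 1
  mkdir -- "$dir" || return 1
  chown --reference="$backup" -- "$dir" || return 1
  chmod --reference="$backup" -- "$dir"
}
```

# e.g. systemctl stop mariadb; move_aside_and_recreate /var/lib/mysql&lt;br /&gt;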
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that also wiped the mysql root password (the user accounts lived in the old `mysql` database), so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I reset the root password by temporarily starting mysqld with --skip-grant-tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
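# start mysqld with authentication disabled; --skip-networking keeps it off TCP while it is unprotected&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
# connect without a password, then run the SQL below at the MariaDB prompt&lt;br /&gt;
mysql -u root&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# back in the shell: list background jobs to find mysqld_safe&lt;br /&gt;
jobs -l&lt;br /&gt;
# kill mysqld_safe&lt;br /&gt;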
&amp;lt;/pre&amp;gt;&lt;br /&gt;
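# to avoid pasting the real password by hand, a minimal bash sketch (hypothetical helpers) that prints the same UPDATE for a given password, escaping it for a single-quoted SQL string. It assumes the default sql_mode, where backslash is an escape character inside string literals. The password only travels through pipes, not command-line arguments&lt;br /&gt;

```shell
#!/usr/bin/env bash
# sql_quote STRING: escape STRING for use inside a single-quoted SQL literal
# (backslashes doubled first, then single quotes doubled)
sql_quote() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# reset_root_sql PASSWORD: print the same statements used in the recovery
# above, with the password quoted; meant to be piped into `mysql -u root`
# while mysqld is running with --skip-grant-tables
reset_root_sql() {
  printf "UPDATE mysql.user SET password=PASSWORD('%s') WHERE User='root';\n" "$(sql_quote "$1")"
  printf 'FLUSH PRIVILEGES;\n'
}
```

# e.g. source /root/backups/backup.settings; reset_root_sql &amp;quot;$mysqlPass&amp;quot; | mysql -u root&lt;br /&gt;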
# now I can log in again, and I see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script, but let&#039;s just drop test manually and restore from backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest – including even our corrupt databases – and see if it works or breaks&lt;br /&gt;
# that took about 11.5 minutes (11m36s) to import the 1.3G gzipped dump, which expanded to ~6.8G in /var/lib/mysql&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whoa, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`. That would be better for him than logging into the Hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot while other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s debian, but sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead logs. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the abrupt shutdown caused corruption, and mariadb wasn&#039;t able to recover from it due to a bug in mariadb. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is rebuilding. But, of course, there&#039;s risk in not doing it.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses, and replace that one second&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
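# a minimal bash sketch for pulling just the serials out of that udevadm output (hypothetical helpers). Alternatively, `ls -l /dev/disk/by-id/` lists symlinks named after each disk&#039;s model and serial that point at ../../sdX, which is another quick way to map a serial to a device name&lt;br /&gt;

```shell
#!/usr/bin/env bash
# disk_serial: read `udevadm info --query=property` output on stdin and
# print the ID_SERIAL_SHORT value
disk_serial() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# list_disk_serials DEV...: print "DEV SERIAL" for each device name (e.g. sda)
list_disk_serials() {
  local dev
  for dev in "$@"; do
    printf '%s %s\n' "$dev" "$(udevadm info --query=property --name "$dev" | disk_serial)"
  done
}
```

# e.g. list_disk_serials sda sdb, which per the output above should give sda 154410FA336C and sdb 154410FA4520&lt;br /&gt;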
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX`&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir only contains empty subdirectories, so we&#039;re not booting via EFI; I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub.cfg references the RAID array (by its mduuid), not either physical drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
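# to double-check which array grub&#039;s `set root=&#039;mduuid/...&#039;` points at, here&#039;s a minimal bash sketch. It assumes (I haven&#039;t verified this on hetzner2) that grub&#039;s mduuid is just mdadm&#039;s array UUID with the colons stripped; the function name is made up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: convert grub&#039;s mduuid into the colon-separated form mdadm prints&lt;br /&gt;
mduuid_to_mdadm() {&lt;br /&gt;
  # drop an optional &amp;quot;mduuid/&amp;quot; prefix&lt;br /&gt;
  local hex=${1#mduuid/}&lt;br /&gt;
  # same 32 hex digits, regrouped as four colon-separated blocks of 8 (bash substrings)&lt;br /&gt;
  echo &amp;quot;${hex:0:8}:${hex:8:8}:${hex:16:8}:${hex:24:8}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
mduuid_to_mdadm mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&lt;br /&gt;
# prints 7141f546:f6e3f596:2a80bdc6:4c4f6d4a&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then grep the output of `mdadm --detail --scan` for that string to see which /dev/mdX it is&lt;br /&gt;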
# right, so if I understand this correctly: we&#039;re not changing grub&#039;s config. &#039;grub-install&#039; just writes grub&#039;s boot code onto the new drive (its MBR), and that boot code then reads the existing config from /boot on the RAID. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first, just because &#039;sda&#039; seems like it would be the primary. I know it probably isn&#039;t, but, anyway.&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
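# a rough bash sketch of what I expect the post-swap steps for sdb to look like, loosely following the hetzner doc. It&#039;s hypothetical: it assumes sda survives, the new drive shows up as /dev/sdb again, the MBR partition table (grub.cfg loads part_msdos) can be copied as-is, and partitions 1/2/3 map to md0/md1/md2, which I still need to confirm in /proc/mdstat. It only prints the commands unless DRY_RUN=0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print each command instead of running it, unless DRY_RUN=0&lt;br /&gt;
run() {&lt;br /&gt;
  if [ &amp;quot;${DRY_RUN:-1}&amp;quot; != 0 ]; then&lt;br /&gt;
    echo &amp;quot;+ $*&amp;quot;&lt;br /&gt;
  else&lt;br /&gt;
    &amp;quot;$@&amp;quot;&lt;br /&gt;
  fi&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# hypothetical: $1 = surviving disk, $2 = freshly swapped-in disk&lt;br /&gt;
replace_disk_steps() {&lt;br /&gt;
  local good=$1 new=$2 n&lt;br /&gt;
  # copy the partition table from the surviving disk onto the new one&lt;br /&gt;
  run sh -c &amp;quot;sfdisk -d $good | sfdisk $new&amp;quot;&lt;br /&gt;
  # re-add each new partition to its mirror; md then resyncs it from the good disk&lt;br /&gt;
  for n in 0 1 2; do&lt;br /&gt;
    run mdadm /dev/md$n --add &amp;quot;${new}$((n + 1))&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
  # write grub&#039;s boot code to the new disk (CentOS 7 names it grub2-install)&lt;br /&gt;
  run grub2-install &amp;quot;$new&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
replace_disk_steps /dev/sda /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;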
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
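# before parking more multi-GB files in /var/tmp, a quick bash sketch for keeping an eye on how full / is. The function name and the 90% threshold are my own arbitrary choices&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print the Use% (as a bare number) of the filesystem holding a path&lt;br /&gt;
fs_use_pct() {&lt;br /&gt;
  # -P forces POSIX output: one line per filesystem, Use% always in column 5&lt;br /&gt;
  df -P &amp;quot;${1:-/}&amp;quot; | awk &#039;NR == 2 { sub(/%/, &amp;quot;&amp;quot;, $5); print $5 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
if [ &amp;quot;$(fs_use_pct /)&amp;quot; -gt 90 ]; then&lt;br /&gt;
  echo &amp;quot;WARNING: / is over 90% full; time to clean up /var/tmp&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;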
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched ose infra since 6 days ago&lt;br /&gt;
# the wiki is still on hetzner2, which runs EOL CentOS 7, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new entries in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new shows up in the journal, but a bunch gets written to the actual log file that the journal points to, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 27 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up the more-recent data from the tables that aren&#039;t corrupt and only restore the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about every week because it keeps breaking since some months ago (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
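# for the RAID side, a minimal bash sketch of a degraded-array check. It relies on /proc/mdstat showing one letter per mirror member, e.g. [UU] when both are up and an underscore (like [U_]) for a missing one; the function name is made up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: exit 0 if any array in mdstat (or the given file) is degraded&lt;br /&gt;
md_degraded() {&lt;br /&gt;
  # match a [...] made only of U and _ that contains at least one _&lt;br /&gt;
  grep -qE &#039;\[[U_]*_[U_]*]&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usage: `md_degraded; echo $?` prints 0 if something is degraded; `mdadm --detail /dev/md2` then shows which member dropped out&lt;br /&gt;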
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1; apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. so the idea is to dump everything in recovery mode and drop what is corrupt. then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that dumps from recovery modes 1, 2, or 3 are relatively safe: at worst, only the data on the corrupted individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so a page is just the chunk of rows InnoDB moves between disk and memory in one go (not unwritten data). With recovery modes 1 to 3 we&#039;d only lose the rows sitting on the corrupt pages, so I *think* it should be OK to trust the rest of the dump?&lt;br /&gt;
# ok, I think I have enough to proceed, at least for recovery modes 1, 2, and 3.&lt;br /&gt;
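# a minimal bash sketch for flipping recovery mode on and off. Assumptions: /etc/my.cnf pulls in /etc/my.cnf.d (the CentOS 7 default, not yet checked on hetzner2), and the drop-in file name is made up. It refuses levels above 3, per the notes above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: write a [mysqld] drop-in that sets innodb_force_recovery&lt;br /&gt;
set_force_recovery() {&lt;br /&gt;
  local level=$1 dir=${2:-/etc/my.cnf.d}&lt;br /&gt;
  case $level in&lt;br /&gt;
    [0-3]) ;;&lt;br /&gt;
    *) echo &amp;quot;refusing innodb_force_recovery=$level here; do that on a copy elsewhere&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
       return 1 ;;&lt;br /&gt;
  esac&lt;br /&gt;
  printf &#039;[mysqld]\ninnodb_force_recovery = %s\n&#039; &amp;quot;$level&amp;quot; &amp;gt; &amp;quot;$dir/force-recovery.cnf&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# plan: `set_force_recovery 1`, then `systemctl start mariadb`; if it still crashes, stop it and retry with 2, then 3. Once it&#039;s up, take a `mysqldump --all-databases` into /var/tmp/dbFail.20250417/ before anything else, and set it back to 0 when done&lt;br /&gt;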
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
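# to run the same health check on several disks at once, here&#039;s a rough Python sketch. It isn&#039;t something hetzner or smartmontools ships, and the function names are made up. It just pulls the verdict out of the &amp;quot;SMART overall-health self-assessment test result:&amp;quot; line, which both outputs above share for the SATA and NVMe drives. It avoids capture_output so it should also run on CentOS 7&#039;s Python 3.6, and it assumes it runs as root with smartmontools installed.&lt;br /&gt;
```python
import subprocess

# The line smartctl -H prints for both ATA (SATA) and NVMe drives, e.g.
#   SMART overall-health self-assessment test result: FAILED!
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def overall_health(smartctl_output):
    """Return the verdict (e.g. "PASSED" or "FAILED!"), or None if absent."""
    for line in smartctl_output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def check_disks(devices):
    """Run smartctl -H on each device (needs root) and collect the verdicts."""
    results = {}
    for dev in devices:
        # smartctl's exit status is a bitmask that is non-zero for a failing
        # disk, so we read stdout instead of using check=True
        proc = subprocess.run(["smartctl", "-H", dev],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True)
        results[dev] = overall_health(proc.stdout)
    return results
```
# e.g. check_disks([&amp;quot;/dev/sda&amp;quot;, &amp;quot;/dev/sdb&amp;quot;]) on hetzner2 should map both disks to &amp;quot;FAILED!&amp;quot;, while the nvme0n1/nvme1n1 disks on hetzner3 should map to &amp;quot;PASSED&amp;quot;.&lt;br /&gt;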
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new, or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to either hot swap while the system is running or shut it down first. I&#039;m going to choose shutdown; that&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but I think it&#039;s a hardware raid. My understanding is that as soon as it comes-up, it&#039;ll begin copying the data from the surviving disk to the new one. But, christ, if both disks are fucked, then which disk should I ask them to replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm..it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the attribute table in hetzner&#039;s docs, nor in the wikipedia table that it&#039;s sourced from https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
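# here&#039;s a rough Python sketch for pulling the failing rows out of the &amp;quot;smartctl -A&amp;quot; table above, so I don&#039;t have to eyeball it. The helper names are hypothetical. As far as I can tell, smartctl flags an attribute FAILING_NOW when its normalized VALUE drops to or below a non-zero THRESH. A THRESH of 000 is never flagged, which is why attribute 180 isn&#039;t, even at VALUE 000. The sketch assumes the SATA attribute layout shown above (the NVMe output is different), and it treats everything after the WHEN_FAILED column as the raw value.&lt;br /&gt;
```python
import subprocess


def parse_ata_attributes(smartctl_output):
    """Parse the attribute table of "smartctl -A" for an ATA/SATA drive."""
    rows = []
    in_table = False
    for line in smartctl_output.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row; the attribute rows follow it
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields:
            break  # the blank line after the last row ends the table
        if not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),  # normalized current value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "when_failed": fields[8],
            # the raw value may contain spaces, e.g. "36 (Min/Max 24/54)"
            "raw": " ".join(fields[9:]),
        })
    return rows


def failing_attributes(rows):
    """Rows whose VALUE is at or below a non-zero THRESH."""
    return [r for r in rows if r["thresh"] != 0 and r["thresh"] >= r["value"]]


def ata_attributes(device):
    """Run smartctl -A on a SATA device (needs root) and parse the table."""
    out = subprocess.run(["smartctl", "-A", device], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return parse_ata_attributes(out)
```
# for either disk on hetzner2, I&#039;d expect failing_attributes(ata_attributes(&amp;quot;/dev/sda&amp;quot;)) to return only attribute 202, Percent_Lifetime_Remain. On these Crucial drives its raw value of 100 seems to be the percent of rated life *used*, with the normalized 000 being what&#039;s left. So it looks like a wear-out warning rather than a sign of bad sectors.&lt;br /&gt;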
# the second hetzner doc refers to nvme. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
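# wow, the smartctl output looks very different for the NVMe drives on hetzner3 (Debian) than for the SATA SSDs on hetzner2 (CentOS). The NVMe drives report a standard health log instead of the vendor-specific ATA attribute table&lt;br /&gt;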
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows we&#039;ve used 6% and 7% of the rated lifetime (&amp;quot;Percentage Used&amp;quot;) of the drives on hetzner3, whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
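# for comparing wear across both servers, here&#039;s a rough Python sketch that reads the NVMe health log printed above. The helper names are hypothetical. It treats a drive as worn if any of these is true: the critical warning byte is non-zero, &amp;quot;Percentage Used&amp;quot; has reached 100%, or &amp;quot;Available Spare&amp;quot; is at or below its threshold. Note that hetzner3 prints numbers with &amp;quot;.&amp;quot; as the thousands separator (eg 52.125.092), so the sketch strips both &amp;quot;.&amp;quot; and &amp;quot;,&amp;quot; from integer fields. That would be wrong for any field with real decimals.&lt;br /&gt;
```python
import re


def parse_nvme_health(smartctl_output):
    """Turn the "Key:   value" lines of the NVMe health log into a dict."""
    info = {}
    for line in smartctl_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def leading_int(text):
    """Integer at the start of a field, e.g. "52.125.092 [26,6 TB]" or "6%"."""
    match = re.match(r"[\d.,]+", text)
    if match is None:
        raise ValueError("no leading number in %r" % text)
    # assumption: "." and "," are only used as thousands separators here
    return int(re.sub(r"[.,]", "", match.group(0)))


def wear_summary(info):
    return {
        "percentage_used": leading_int(info["Percentage Used"]),
        "available_spare": leading_int(info["Available Spare"]),
        "spare_threshold": leading_int(info["Available Spare Threshold"]),
        "critical_warning": int(info["Critical Warning"], 16),
    }


def looks_worn(summary):
    return (summary["critical_warning"] != 0
            or summary["percentage_used"] >= 100
            or summary["spare_threshold"] >= summary["available_spare"])
```
# with the hetzner3 output above, wear_summary should report 6% and 7% used and 100% spare against a 10% threshold, so looks_worn should be False for both drives.&lt;br /&gt;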
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
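# this shows that it&#039;s a Linux software raid (md) after all, and that it&#039;s healthy. Two Us (eg &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt;) means both mirrors are up. Bad would be a U and an underscore for a missing member (eg &amp;lt;code&amp;gt;[U_]&amp;lt;/code&amp;gt;)&lt;br /&gt;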
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
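# here&#039;s a rough Python sketch of that same check against /proc/mdstat, so it could be scripted. The helper names are hypothetical. It only understands active arrays laid out like ours: an &amp;quot;mdN : active raid1 members...&amp;quot; line followed by an indented line ending in the [UU]-style status. It calls an array degraded if its status string contains an underscore.&lt;br /&gt;
```python
import re

# e.g. "md1 : active raid1 sdb2[1] sda2[0]"
ARRAY_LINE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# the member status string, e.g. "[UU]" or "[U_]"
STATUS = re.compile(r"\[([U_]+)\]")


def parse_mdstat(text):
    arrays = {}
    current = None
    for line in text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            current = match.group(1)
            # strip the role number and any flags: "sdb2[1](F)" becomes "sdb2"
            members = sorted(re.sub(r"\[\d+\].*", "", dev)
                             for dev in match.group(4).split())
            arrays[current] = {"state": match.group(2),
                               "level": match.group(3),
                               "members": members,
                               "status": None}
            continue
        status = STATUS.search(line)
        if current and status and arrays[current]["status"] is None:
            arrays[current]["status"] = status.group(1)
    return arrays


def degraded(arrays):
    """Names of arrays whose status shows a missing member ("_")."""
    return sorted(name for name, info in arrays.items()
                  if info["status"] and "_" in info["status"])
```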
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition back into its RAID array, then install grub on the new drive. And, of course, meanwhile we&#039;ll be running on one disk, so if we fuck-up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back-up.&lt;br /&gt;
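# to make that less error-prone when I do get to it, here&#039;s a rough Python sketch that derives the per-array mdadm commands from /proc/mdstat. The helper is hypothetical, and it only builds a list of commands; it doesn&#039;t run anything. It follows the steps in hetzner&#039;s software-raid doc, but it ASSUMES three things: MBR partition tables (for GPT, that doc uses sgdisk instead of sfdisk), identical partition numbering on both disks, and CentOS 7&#039;s grub2-install. It also skips the steps for marking the old partitions as failed and removing them before the swap.&lt;br /&gt;
```python
import re


def md_members(mdstat_text):
    """Map each md array to its member partitions, e.g. {"md0": ["sda1", "sdb1"]}."""
    result = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+) : \S+ \S+ (.*)$", line)
        if match:
            result[match.group(1)] = sorted(
                re.sub(r"\[\d+\].*", "", dev) for dev in match.group(2).split())
    return result


def readd_plan(mdstat_text, healthy="sda", replaced="sdb"):
    """Commands to copy the partition table and re-add the new disk (not run)."""
    # 1. copy the partition table from the surviving disk (assumption: MBR)
    cmds = ["sfdisk -d /dev/{0} | sfdisk /dev/{1}".format(healthy, replaced)]
    # 2. add the matching partition of the new disk back into each array;
    #    derived from the healthy member, so it still works once sdb is gone
    for md, members in sorted(md_members(mdstat_text).items()):
        for part in members:
            if part.startswith(healthy):
                new_part = replaced + part[len(healthy):]
                cmds.append("mdadm /dev/{0} -a /dev/{1}".format(md, new_part))
    # 3. put a bootloader on the new disk too (CentOS 7 calls it grub2-install)
    cmds.append("grub2-install /dev/{0}".format(replaced))
    return cmds
```
# with our current /proc/mdstat, readd_plan should return the sfdisk line, one mdadm line per array (md0/sdb1, md1/sdb2, md2/sdb3), and finally grub2-install /dev/sdb.&lt;br /&gt;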
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
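# here&#039;s the same lookup as a rough Python sketch, in case we end up swapping more drives. The helper names are hypothetical. It just splits udevadm&#039;s KEY=value lines, and ID_SERIAL_SHORT is the bare serial number that we&#039;d give to hetzner.&lt;br /&gt;
```python
import subprocess


def parse_udev_properties(text):
    """Turn udevadm's "KEY=value" lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def disk_serial(name):
    """Serial number of a block device such as "sda" (uses udevadm)."""
    out = subprocess.run(
        ["udevadm", "info", "--query=property", "--name", name],
        stdout=subprocess.PIPE, universal_newlines=True, check=True).stdout
    return parse_udev_properties(out).get("ID_SERIAL_SHORT")
```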
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked-off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
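# to check the results later, &amp;quot;smartctl -l selftest /dev/sdX&amp;quot; prints the self-test log, with the newest entry (&amp;quot;# 1&amp;quot;) first. Here&#039;s a rough Python sketch that treats the newest test as passed only if its status reads &amp;quot;Completed without error&amp;quot;. The helper names are hypothetical, and it assumes the ATA log layout, where each entry line starts with &amp;quot;#&amp;quot; and the entry number.&lt;br /&gt;
```python
import re
import subprocess

# assumed entry format: "# 1  Short offline  Completed without error  00% ..."
ENTRY = re.compile(r"^#\s*(\d+)\s+(.*)$")


def selftest_entries(log_text):
    """(entry number, rest of line) pairs, newest (#1) first."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return sorted(entries)


def latest_passed(log_text):
    entries = selftest_entries(log_text)
    return bool(entries) and "Completed without error" in entries[0][1]


def selftest_log(device):
    """Run smartctl -l selftest on a device (needs root)."""
    return subprocess.run(["smartctl", "-l", "selftest", device],
                          stdout=subprocess.PIPE,
                          universal_newlines=True).stdout
```
# e.g. latest_passed(selftest_log(&amp;quot;/dev/sda&amp;quot;)). The long test&#039;s own estimate was only 5 minutes, so the result may be there well before tomorrow.&lt;br /&gt;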
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# an fsck probably should be happening on boot, but I couldn&#039;t find any sign of it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
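# here&#039;s the same tune2fs check as a rough Python sketch (hypothetical helper names), e.g. for both md1 and md2. One thing it makes visible: a &amp;quot;Maximum mount count&amp;quot; of -1 means mount-count-based forced checks are disabled, and md2&#039;s &amp;quot;Check interval&amp;quot; of 0 disables the time-based ones. That would explain why /boot was last checked in 2016.&lt;br /&gt;
```python
import subprocess


def parse_tune2fs(text):
    """Turn "Key:   value" lines from tune2fs -l into a dict."""
    info = {}
    for line in text.splitlines():
        # split on the first colon only; values like times contain colons too
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def fs_summary(info):
    return {
        "state": info.get("Filesystem state"),
        "errors_behavior": info.get("Errors behavior"),
        "mount_count": int(info["Mount count"]),
        # -1 means mount-count-based forced checks are disabled
        "max_mount_count": int(info["Maximum mount count"]),
        "last_checked": info.get("Last checked"),
    }


def needs_attention(summary):
    """Anything other than a plain "clean" state is worth a closer look."""
    return summary["state"] != "clean"


def tune2fs_summary(device):
    """Run tune2fs -l on a device such as /dev/md2 (needs root)."""
    out = subprocess.run(["tune2fs", "-l", device], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return fs_summary(parse_tune2fs(out))
```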
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
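# for next time, here&#039;s a rough Python sketch of the same my.cnf edit. The helper is hypothetical and uses no MySQL library. It puts innodb_force_recovery right after the [mysqld] header, as in the diff above, replaces any earlier value, and refuses anything outside 0-6. Per the MySQL docs linked above, values of 4 or greater can permanently corrupt data files, and the setting should go back to 0 once the dump is done. It assumes a plain my.cnf without duplicate [mysqld] sections.&lt;br /&gt;
```python
def set_force_recovery(cnf_text, level):
    """Return my.cnf text with innodb_force_recovery = level under [mysqld]."""
    if level not in range(7):
        raise ValueError("innodb_force_recovery must be between 0 and 6")
    setting = "innodb_force_recovery = {0}".format(level)
    out = []
    section = None
    inserted = False
    for line in cnf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped
            out.append(line)
            if section == "[mysqld]" and not inserted:
                out.append(setting)  # right after the header, like the diff
                inserted = True
            continue
        key = stripped.split("=", 1)[0].strip()
        if section == "[mysqld]" and key == "innodb_force_recovery":
            continue  # drop the old value; the new one is already in place
        out.append(line)
    if not inserted:
        out += ["[mysqld]", setting]  # no [mysqld] section yet: add one
    return "\n".join(out) + "\n"
```
# usage would be something like reading /etc/my.cnf, writing set_force_recovery(text, 2) back, and then restarting mariadb.&lt;br /&gt;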
# it didn&#039;t come-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads to start&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time-out, but I&#039;m just going to leave it and come-back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that his previous documentation about backups, luks, veracrypt, pgp, general cybersec, etc was scattered across a ton of different articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he will be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he would need 5 years from now in a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. When I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server, so I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
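A note on the numbers above: the ModSecurity deny message in the log reports a request body no files limit of 1000000 bytes. In ModSecurity v2, that limit comes from SecRequestBodyNoFilesLimit. That is a separate directive from SecRequestBodyLimit (13107200, a limit that also counts uploaded files), and a grep for BodyLimit does not match it. The following is a minimal sketch with two hypothetical helpers (not existing OSE scripts). One lists the standard body-size directives of all four layers (nginx, Apache, ModSecurity, PHP); the other converts bytes to MiB. The paths in the usage comment are assumed CentOS 7 defaults.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for tracking down a 413 Request Entity Too Large.

# List request-body size limits found under the given files/directories.
# Matches the standard directive names for each layer:
#   nginx: client_max_body_size    PHP: post_max_size
#   Apache: LimitRequestBody       ModSecurity v2: SecRequestBodyLimit and
#                                  SecRequestBodyNoFilesLimit
# SecRequestBodyLimitAction is deliberately not matched (it is not a size).
body_limits() {
    grep -rHiE '^[[:space:]]*(client_max_body_size|post_max_size|LimitRequestBody|SecRequestBody(NoFiles)?Limit)[[:space:]=]' "$@"
}

# Convert a byte count to MiB (1 MiB = 1048576 bytes), two decimals.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# Usage (paths assumed): body_limits /etc/nginx /etc/php.ini /etc/httpd
```

With these helpers, bytes_to_mib 1000000 prints 0.95. In other words, the edit was rejected at just under 1 MiB rather than at 12.5 MiB.&lt;br /&gt;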
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox so she can rebuild osemain on hetzner3 using some new theme that&#039;s not broken on the latest version of wordpress, php, etc&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to find out which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=308007</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=308007"/>
		<updated>2025-05-31T19:36:36Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: apr 27&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 27, 2025=&lt;br /&gt;
# Tom created a GitHub account https://github.com/tgriff-ose&lt;br /&gt;
# I invited this new account to become a member of the official OSE GitHub org, and sent them an email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ve invited you to join the official OSE GitHub org:&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/orgs/OpenSourceEcology&lt;br /&gt;
&lt;br /&gt;
Please check your GitHub notifications and accept the invite.&lt;br /&gt;
&lt;br /&gt;
PS: If you haven&#039;t yet, can you please enable 2FA on your GitHub account?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/26/25 22:42, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Account name: tgriff-ose&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Apr 27, 2025, 03:24 by REDACTED@disroot.org:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; GitHub is owned by Microsoft, and it&#039;s free (as in beer) to create an account.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Could you please create a free GitHub account?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt;&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt;&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; On 4/26/25 21:06, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; I don&#039;t have a github account, as it&#039;s a Microsoft thing requiring a paid account. I don&#039;t intent to support them.&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Is there any other way to access the ansible repo?&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; -- &lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; Tom Griffing &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin confirmed that he has not received a bill from AWS for some time, so it appears we did finally delete all of the glacier crap&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I have not received another bill since January, so it looks like there is&lt;br /&gt;
nothing owed.&lt;br /&gt;
MJ&lt;br /&gt;
&lt;br /&gt;
On Sat, Apr 26, 2025 at 6:28 PM Michael Altfield &amp;lt;REDACTED&amp;gt;&lt;br /&gt;
wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Speaking of aws, can you confirm that your bill for last month was $0?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# I updated my wiki and osedev work logs for April so-far&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 26, 2025=&lt;br /&gt;
# Marcin authorized me to add Tom to our ops google groups mailing list and to give him access to our shared ose keepass&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Fri, Apr 25, 2025, 12:43 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; (re-sending without encryption)&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On 4/25/25 12:41, Michael Altfield wrote:&lt;br /&gt;
&amp;gt;&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Do you authorize:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 1. Giving Tom access to the shared OSE keepass file&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 2. Adding Tom to the ops mailing list (this would allow him to password&lt;br /&gt;
&amp;gt;&amp;gt; reset many of our important accounts)&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Please let me know if you authorize the above.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom sent me his gpg public key, which I can use to add him to the wazuh emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ gpg&lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
gpg: Go ahead and type your message ...&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
pub   rsa4096 2025-04-26 [SC]&lt;br /&gt;
	  13300901348A985115679165FB137A633FD1EB4C&lt;br /&gt;
uid           Tom Griffing (OSE PGP Key 4-25-2025) &amp;lt;REDACTED@tutanota.com&amp;gt;&lt;br /&gt;
sub   rsa4096 2025-04-26 [E]&lt;br /&gt;
user@ose:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added Tom to the wazuh recipients, per https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /var/tmp/gpg&lt;br /&gt;
pushd /var/tmp/gpg&lt;br /&gt;
# write multi-line to file for documentation copy &amp;amp; paste&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
EOF&lt;br /&gt;
gpg --homedir /var/ossec/.gnupg --import /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
popd&lt;br /&gt;
&lt;br /&gt;
# add tom&#039;s email (that matches an email on a UID of his key above) to the space-delimited &amp;quot;recipients&amp;quot; variable&lt;br /&gt;
vim /var/ossec/sent_encrypted_alarm.settings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
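Before relying on the import, it may be worth confirming two things: that the key in the wazuh keyring has the fingerprint shown earlier (13300901348A985115679165FB137A633FD1EB4C), and that the address was added to the recipients list. A minimal sketch with two hypothetical helpers follows. It assumes the GnuPG --with-colons output format, in which field 10 of the first fpr record is the primary key fingerprint. It also assumes the space-delimited recipients format described in the comment above.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Print the primary-key fingerprint from gpg --with-colons output on stdin.
# The first fpr record belongs to the primary key; field 10 is the fingerprint.
first_fingerprint() {
    awk -F: '$1 == "fpr" { print $10; exit }'
}

# Succeed if the space-delimited list in $1 contains the exact address $2.
has_recipient() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: gpg --homedir /var/ossec/.gnupg --with-colons --fingerprint 13300901348A985115679165FB137A633FD1EB4C | first_fingerprint
```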
# and I sent him an email asking him to confirm that it&#039;s working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
Can you please confirm that you&#039;re now receiving alerts from wazuh?&lt;br /&gt;
&lt;br /&gt;
Wazuh is our HIDS (Host-Based Intrusion Detection System). It&#039;s a fork of the HIDS and FIM (File Integrity Monitor) OSSEC. Because it sometimes sends sensitive information (eg diffs of config files with passwords), it&#039;s important that we encrypt its email notifications end-to-end with PGP.&lt;br /&gt;
&lt;br /&gt;
And because someone who compromises the server could &amp;quot;clean up&amp;quot; after themselves, these (off-server) alerts are critical to post-compromise investigations.&lt;br /&gt;
&lt;br /&gt;
For more info, see:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
 * https://en.wikipedia.org/wiki/OSSEC&lt;br /&gt;
 * https://documentation.wazuh.com/current/getting-started/index.html&lt;br /&gt;
&lt;br /&gt;
Out-of-the-box, Wazuh has a ton of features, but probably where we use it the most is its ingestion of apache&#039;s mod_security WAF and its tie-in to Wazuh&#039;s Active Response. If an IP is found doing something bad (eg multiple consecutive 403 responses, such as a brute-force attack on wordpress [or ssh]), then the IP will get temp blocked by the firewall for 10 minutes. If it does it again shortly after the ban is lifted, it&#039;ll be banned for 12 hours. If again, 1 day. Then 2 days. Then 4 days. And the max ban for 5x repeat offenses is 8 days&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible/blob/master/hetzner3/roles/maltfield.wazuh/templates/ossec.conf.j2#L256-L271&lt;br /&gt;
&lt;br /&gt;
It also has rootkit detection, and lots of other useful alerts that &amp;quot;just work&amp;quot; out of the box.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you&#039;re now receiving encrypted wazuh alerts.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
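The escalating block times described in the email above amount to a lookup table indexed by offense count. The following hypothetical sketch only restates the schedule from the email (10 minutes, then 12 hours, 1, 2, 4, and 8 days). The authoritative values are in the linked ossec.conf.j2, where OSSEC-derived active response expresses repeat-offender timeouts as a comma-separated list of minutes (the repeated_offenders option).&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Block length in minutes per offense, as described in the email:
# first offense 10 min, then 12 h, 1 d, 2 d, 4 d, and a cap of 8 d.
BAN_LADDER_MINUTES=(10 720 1440 2880 5760 11520)

# Print the block length (minutes) for the Nth offense, N starting at 1.
# Offenses past the end of the ladder stay at the maximum.
ban_minutes() {
    local i=$(( $1 - 1 ))
    local last=$(( ${#BAN_LADDER_MINUTES[@]} - 1 ))
    if (( i > last )); then
        i=$last
    fi
    echo "${BAN_LADDER_MINUTES[$i]}"
}
```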
# I tried to add Tom to our ops google groups email list, but it said I wasn&#039;t allowed to add members outside of our google workspace&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error occurred&lt;br /&gt;
1 user is outside of your organization. Based on your group or organization settings, you can only add organization users to this group. Contact your group owner or domain administrator for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I checked our users; it appears that Tom doesn&#039;t have an @opensourceecology.org account in gsuite, so he counts as an external member&lt;br /&gt;
# I found the setting to change that here https://admin.google.com/ac/managedsettings/864450622151/GROUPS_SHARING_SETTINGS_TAB&lt;br /&gt;
## https://support.google.com/a/thread/63692725/&lt;br /&gt;
## https://support.google.com/a/answer/167097&lt;br /&gt;
# I checked the box that said &amp;quot;Group owners can allow external members&amp;quot;&lt;br /&gt;
## curiously the subline said &amp;quot;Organization admins can always add external members&amp;quot; – but I&#039;m a damn org admin, and I couldn&#039;t add him :/&lt;br /&gt;
# I tried to add him again, but I got the same error&lt;br /&gt;
# this time I went to the group settings https://groups.google.com/a/opensourceecology.org/g/REDACTED/settings&lt;br /&gt;
# I found the &amp;quot;allow external members&amp;quot; and changed it from &amp;quot;off&amp;quot; to &amp;quot;on&amp;quot; and clicked &amp;quot;save changes&amp;quot;&lt;br /&gt;
## this wasn&#039;t possible before: first I had to change the workspace-wide setting, which then allowed me to change the group-specific setting. Now it&#039;s changed.&lt;br /&gt;
# this time it worked.&lt;br /&gt;
# I sent an email to our ops google group, asking Tom to reply if he saw it&lt;br /&gt;
# ...&lt;br /&gt;
# I checked in on hetzner2 to make sure it rebooted this morning&lt;br /&gt;
# looks like the cron is set to reboot at 10:40 UTC every day, and – indeed – uptime says it&#039;s been online for a bit less than 13 hours. And its last boot time was today at 10:41:25&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# uptime&lt;br /&gt;
 23:30:25 up 12:49,  7 users,  load average: 1.02, 0.98, 0.74&lt;br /&gt;
[root@opensourceecology ~]# journalctl | head&lt;br /&gt;
-- Logs begin at Sat 2025-04-26 10:41:25 UTC, end at Sat 2025-04-26 23:30:26 UTC. --&lt;br /&gt;
Apr 26 10:41:25 localhost systemd-journal[129]: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.2G available → current limit 3.1G).&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuset&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpu&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuacct&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Linux version 3.10.0-1160.119.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Jun 4 14:43:51 UTC 2024&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: e820: BIOS-provided physical RAM map:&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009c7ff] usable&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x000000000009c800-0x000000000009ffff] reserved&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Sat Apr 26 23:31:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
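# a minimal sketch of automating the check above, assuming GNU date. The function name booted_since_last_window is hypothetical, not an existing OSE script. It subtracts the uptime from the current time to get the boot time, and succeeds only if that boot happened at or after the most recent daily 10:40 UTC cron window&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper (not an existing OSE script). Requires GNU date for -d.
# Usage: booted_since_last_window NOW_EPOCH UPTIME_SECONDS HOUR MINUTE
#   (HOUR and MINUTE as two-digit UTC strings, e.g. 10 40)
# Succeeds if the boot time (NOW - UPTIME) is at or after the most recent
# daily HOUR:MINUTE UTC window, i.e. the cron reboot actually happened.
booted_since_last_window() {
  now="$1"; uptime_s="$2"; hour="$3"; minute="$4"
  day=$(date -u -d "@$now" +%F)
  window=$(date -u -d "$day $hour:$minute" +%s)
  # before today's window, the most recent window was yesterday (UTC has no DST)
  if [ "$window" -gt "$now" ]; then
    window=$((window - 86400))
  fi
  boot=$((now - uptime_s))
  [ "$boot" -ge "$window" ]
}

# example on hetzner2 (cron line: 40 10 * * * root /sbin/reboot):
#   booted_since_last_window "$(date +%s)" "$(cut -d. -f1 /proc/uptime)" 10 40 || echo "no reboot since the last window"
```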
# so it looks like we&#039;ll have ~2 minutes of downtime every day in the very early morning in the US. I can live with that.&lt;br /&gt;
# and grub clearly is fixed&lt;br /&gt;
# oh, also the RAID looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
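# the [UU] fields above are what show the RAID is healthy: each U is an in-sync member, and an underscore marks a missing or failed one (e.g. [U_]). A minimal sketch of checking that automatically with awk; md_degraded_arrays is a hypothetical name&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: print every md array whose member-status field
# (e.g. [UU]) contains an underscore, i.e. a missing or failed member.
# Reads /proc/mdstat by default, or the file given as $1.
# Returns 1 if any array is degraded, 0 otherwise.
md_degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1; next }   # array header, e.g. md2 : active raid1 ...
    match($0, /\[[U_]+\]/) {            # status line, e.g. ... [2/2] [UU]
      status = substr($0, RSTART, RLENGTH)
      if (index(status, "_") > 0) { print name; found = 1 }
    }
    END { exit (found ? 1 : 0) }
  ' "${1:-/proc/mdstat}"
}

# example: md_degraded_arrays || echo "RAID degraded on $(hostname)"
```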
# I asked Tom for his GitHub account profile username, so I can grant him write access to our OSE ansible repo&lt;br /&gt;
# I added Tom&#039;s new ssh key to his authorized_keys file on hetzner2&lt;br /&gt;
# I sent Tom an email asking to confirm his access to hetzner2&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 25, 2025=&lt;br /&gt;
# I woke up this morning and discovered the wiki was offline&lt;br /&gt;
# I tried to ssh into the server; it&#039;s not responding&lt;br /&gt;
# I figured I&#039;d log into the hetzner wui, but, uhh, the credentials are in keepass, which lives on the server&lt;br /&gt;
# I had mitigated this by giving Marcin a copy of the keepass file on his veracrypt drive, but he has since changed the password (a month or two ago), and we don&#039;t have a new local copy&lt;br /&gt;
# I sent an email to Marcin asking him to log in to the hetzner wui and boot hetzner2. If it doesn&#039;t come up, then I&#039;ll have to get the password from him so I can load it in the wui from a rescue disk&lt;br /&gt;
# oh, I did find the new hetzner password in my personal keepass&lt;br /&gt;
# I logged-in, and I found the server was listed as being on. But I can&#039;t ping it. I gave it an &amp;quot;automatic hardware reset&amp;quot; from the wui&lt;br /&gt;
# I&#039;ll give it a few minutes before trying the rescue system&lt;br /&gt;
# their rescue systems are much nicer for their cloud product than their dedicated server product&lt;br /&gt;
# it looks like I have two options&lt;br /&gt;
## rescue boot mode: where I&#039;m given ssh access&lt;br /&gt;
## vnc&lt;br /&gt;
# the problem with the rescue boot is that – if this is a grub issue – I wouldn&#039;t be able to &amp;quot;see&amp;quot; the error&lt;br /&gt;
# I enabled VNC and gave the server a reboot&lt;br /&gt;
# I was able to connect via vnc, but it was the damn installation wizard for almalinux. I quit the installation, and the vnc session died.&lt;br /&gt;
# damn, I guess vnc won&#039;t let me see the boot process, after all&lt;br /&gt;
# instead I tried the &amp;quot;rescue system&amp;quot;&lt;br /&gt;
# that didn&#039;t work; I can&#039;t access ssh on either of the IP addresses&lt;br /&gt;
# the docs say to activate the rescue system and then reboot it; that&#039;s what I did https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system/&lt;br /&gt;
# this time I fully shut down the server, and then I enabled the rescue system (while it&#039;s off)&lt;br /&gt;
# I went back to the Reset tab, and it&#039;s still off. So I booted it&lt;br /&gt;
# somehow I was able to login from my ose vm using my personal ssh key, but with user root&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ ssh -v root@138.201.84.223&lt;br /&gt;
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024&lt;br /&gt;
debug1: Reading configuration data /home/user/.ssh/config&lt;br /&gt;
debug1: Reading configuration data /etc/ssh/ssh_config&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 21: Applying options for *&lt;br /&gt;
debug1: Connecting to 138.201.84.223 [138.201.84.223] port 22.&lt;br /&gt;
debug1: Connection established.&lt;br /&gt;
...&lt;br /&gt;
Linux rescue 6.12.19 #1 SMP Fri Mar 14 05:34:52 UTC 2025 x86_64&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
  Welcome to the Hetzner Rescue System.&lt;br /&gt;
&lt;br /&gt;
  This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel.&lt;br /&gt;
  You can install software like you would in a normal system.&lt;br /&gt;
&lt;br /&gt;
  To install a new operating system from one of our prebuilt images, run &#039;installimage&#039; and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
  Important note: Any data that was not written to the disks will be lost during a reboot.&lt;br /&gt;
&lt;br /&gt;
  For additional information, check the following resources:&lt;br /&gt;
	Rescue System:           https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system&lt;br /&gt;
	Installimage:            https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage&lt;br /&gt;
	Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images&lt;br /&gt;
	other articles:          https://docs.hetzner.com/robot&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Rescue System (via Legacy/CSM) up since 2025-04-25 17:24 +02:00&lt;br /&gt;
&lt;br /&gt;
Hardware data:&lt;br /&gt;
&lt;br /&gt;
   CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8)&lt;br /&gt;
   Memory:  64153 MB (Non-ECC)&lt;br /&gt;
   Disk /dev/sda: 250 GB (=&amp;gt; 232 GiB) &lt;br /&gt;
   Disk /dev/sdb: 512 GB (=&amp;gt; 476 GiB) &lt;br /&gt;
   Total capacity 709 GiB with 2 Disks&lt;br /&gt;
&lt;br /&gt;
Network data:&lt;br /&gt;
   eth0  LINK: yes&lt;br /&gt;
		 MAC:  90:1b:0e:94:07:c4&lt;br /&gt;
		 IP:   138.201.84.223&lt;br /&gt;
		 IPv6: 2a01:4f8:172:209e::2/64&lt;br /&gt;
		 Intel(R) PRO/1000 Network Driver&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to mount the root drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 0/2 pages [0KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt&lt;br /&gt;
root@rescue ~ # ls /mnt&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # ls /mnt/home&lt;br /&gt;
b2user  crupp  hart     lberezhny  marcin      stagingsync  wp&lt;br /&gt;
cmota   Flipo  jthomas  maltfield  not-apache  tgriffing&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what the point of this is; I can&#039;t fix it if I can&#039;t watch it boot and see what&#039;s breaking&lt;br /&gt;
# ok, at the bottom of the docs, hetzner lists another option: the vKVM Rescue System https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/&lt;br /&gt;
# it specifically says that&#039;s for debugging boot issues&lt;br /&gt;
# last thing before I try that: I downloaded a local copy of the keepass files from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner2$ rsync -av --progress root@138.201.84.223:/mnt/etc/keepass ./etc-keepass-20250525&lt;br /&gt;
receiving incremental file list&lt;br /&gt;
created directory ./etc-keepass-20250525&lt;br /&gt;
keepass/&lt;br /&gt;
keepass/passwords.kdbx&lt;br /&gt;
		 46,142 100%   44.00MB/s    0:00:00 (xfr#1, to-chk=6/8)&lt;br /&gt;
keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#2, to-chk=5/8)&lt;br /&gt;
keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#3, to-chk=4/8)&lt;br /&gt;
keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
		 33,726 100%  143.20kB/s    0:00:00 (xfr#4, to-chk=3/8)&lt;br /&gt;
keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
		 34,238 100%   71.75kB/s    0:00:00 (xfr#5, to-chk=2/8)&lt;br /&gt;
keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
		 45,406 100%   94.55kB/s    0:00:00 (xfr#6, to-chk=1/8)&lt;br /&gt;
keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
		 27,102 100%   56.31kB/s    0:00:00 (xfr#7, to-chk=0/8)&lt;br /&gt;
&lt;br /&gt;
sent 161 bytes  received 196,407 bytes  35,739.64 bytes/sec&lt;br /&gt;
total size is 195,794  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner2$ du -sh etc-keepass-20250525/keepass/*&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
28K	etc-keepass-20250525/keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this time I followed the same steps as for the rescue system, except I chose &amp;quot;vKVM&amp;quot; instead of &amp;quot;Linux&amp;quot; in the &amp;quot;Operating System&amp;quot; dropdown&lt;br /&gt;
# strange, it gave me an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Public key authentication is not available for the selected operating system.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I unselected my ssh key, and chose &amp;quot;no key&amp;quot; instead&lt;br /&gt;
# it gave me a URL and a password. I booted the server, but the URL didn&#039;t load (&amp;quot;Unable to connect&amp;quot; error)&lt;br /&gt;
# ok, it took a few minutes and had a self-signed cert&lt;br /&gt;
# I bypassed the cert error, and entered the username and password into the basic auth popup. It failed! Could I really have been MITM&#039;d?&lt;br /&gt;
# I immediately shut down the server from the wui, and I tried again.&lt;br /&gt;
# this time I was able to login – both from ssh and in the wui.&lt;br /&gt;
# as soon as it opened, I saw the error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
No more network devices&lt;br /&gt;
&lt;br /&gt;
Booting from Hard Disk...&lt;br /&gt;
.&lt;br /&gt;
error: symbol &#039;grub_calloc&#039; not found.&lt;br /&gt;
Entering rescue mode...&lt;br /&gt;
grub rescue&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I wonder if this is grub or grub2. Previously there was no &amp;quot;grub-install&amp;quot; binary on the system, so I assumed the hetzner docs had an error and ran &amp;quot;grub2-install&amp;quot; instead, which reported success (there was a warning that the docs said was safe to ignore)&lt;br /&gt;
# curiously, the opposite is true for the ssh session in vKVM: I have grub-install but not grub2-install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@vKVM-rescue ~ # which grub-install&lt;br /&gt;
/usr/sbin/grub-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
root@vKVM-rescue ~ # which grub2-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the docs in question https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# I don&#039;t want to fuck with the grub without first taking a backup of these disks. But, uh, it looks like I can&#039;t access the RAID from inside this vkvm setup&lt;br /&gt;
# yeah, that&#039;s one of the limitations listed for VKVM https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/#raid-controllers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Configured units are passed through as SCSI devices to the VM. However it is not possible to access the controller. Please use the regular Hetzner Rescue System for this purpose.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I shut down vKVM and booted the server into the regular rescue mode&lt;br /&gt;
# it took a few minutes to get back into the old rescue system, but here I can use the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS&lt;br /&gt;
loop0     7:0    0   3.4G  1 loop  &lt;br /&gt;
sda       8:0    0 476.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
sdb       8:16   0 232.9G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
root@rescue ~ # mkdir /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt/md2&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created a dir for these backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chown root:root /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chmod 0700 /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
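# side note: the mkdir, chown, and chmod above can be collapsed into one coreutils install -d call. A minimal sketch; make_private_dir is a hypothetical name, and the root:root ownership is only requested when running as root&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: create DIR with mode 0700 in one step (coreutils install).
# As root it also sets root:root ownership, matching the chown above.
make_private_dir() {
  dir="$1"
  if [ "$(id -u)" -eq 0 ]; then
    install -d -m 0700 -o root -g root "$dir"
  else
    install -d -m 0700 "$dir"
  fi
}

# example: make_private_dir /mnt/md2/var/tmp/20250425-grub-fail
```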
# first I made a backup of the /boot array (md1) onto the root array (md2)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # rsync -av --progress /mnt/md1 /mnt/md2/var/tmp/20250425-grub-fail/md1.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
...&lt;br /&gt;
md1/grub2/locale/zh_TW.mo&lt;br /&gt;
		 30,882 100%   31.38kB/s    0:00:00 (xfr#345, to-chk=0/355)&lt;br /&gt;
md1/lost+found/&lt;br /&gt;
&lt;br /&gt;
sent 399,450,301 bytes  received 6,709 bytes  159,782,804.00 bytes/sec&lt;br /&gt;
total size is 399,330,989  speedup is 1.00&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I figured I&#039;d make a backup of the two underlying disk partitions (sda2 and sdb2) directly, but I couldn&#039;t even mount them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sda2&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sdb2&lt;br /&gt;
root@rescue ~ # mount /dev/sda2 /mnt/sda2&lt;br /&gt;
mount: /mnt/sda2: unknown filesystem type &#039;linux_raid_member&#039;.&lt;br /&gt;
	   dmesg(1) may have more information after failed mount system call.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this command (from the docs), which I skipped before because it said that the next command (grub-install) was enough; sure enough, it didn&#039;t work https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-mkdevicemap -n&lt;br /&gt;
grub-mkdevicemap: error: cannot open /boot/grub/device.map.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I investigated this before, and I thought I had concluded that we&#039;re using grub2, not grub1&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # ls /mnt/md1/&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, shit, even the grub-install command is v2 https://askubuntu.com/questions/107486/how-to-know-the-version-of-grub&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install --version&lt;br /&gt;
grub-install (GRUB) 2.06-13+deb12u1&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, this indicates we&#039;re not using lilo https://askubuntu.com/questions/24459/how-do-i-find-out-which-boot-loader-i-have&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2/etc/ | grep lilo&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can dd the first 512 bytes straight from each disk to read the MBR. And, yeah, it appears we are booting grub via the MBR, and this is stored on the raw disks, not on the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # dd if=/dev/md1 bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sda bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
214fb5736d1e5ad63e515dc2fffe44bd928cd8dab2c019dc11fb9fcaef5ea90dbf51f1ac507ab1cfbbe74ff&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
DA/jjF&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sdb bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
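# a minimal sketch of the same MBR check as a reusable function, assuming GNU od and grep (mbr_check is a hypothetical name). Before looking for the GRUB string that grub&#039;s boot.img leaves in the first sector, it confirms the 0x55AA boot signature in the last two bytes of that sector&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: inspect the first 512-byte sector of a disk (or image file).
# Prints "grub" if it has a valid MBR signature and contains the GRUB marker string,
# "other" if it has a signature but no GRUB marker, and prints
# "no-mbr-signature" and returns 1 if the 0x55AA bytes at offset 510-511 are missing.
mbr_check() {
  dev="$1"
  sig=$(dd if="$dev" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
  if [ "$sig" != "55aa" ]; then
    echo "no-mbr-signature"
    return 1
  fi
  # -a: treat the binary sector as text so grep still searches it
  if dd if="$dev" bs=512 count=1 2>/dev/null | grep -aq 'GRUB'; then
    echo "grub"
  else
    echo "other"
  fi
}

# example (as root): for d in /dev/sda /dev/sdb; do echo "$d: $(mbr_check "$d")"; done
```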
# idk what to do; I tried the grub-install again, but it gives me this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install /dev/sda&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # grub-install /dev/sdb&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this time, I first created a chroot into our real root filesystem on the raid (md2)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # chroot-prepare /mnt/md2&lt;br /&gt;
root@rescue ~ # chroot /mnt/md2&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
root@rescue / # mount /dev/md1 /boot&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then tried the grub install again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue / # grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / #&lt;br /&gt;
&lt;br /&gt;
root@rescue / # grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
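# for the record, the sequence that finally worked can be sketched as a script. This is a minimal sketch under several assumptions. It is run as root from the Hetzner rescue system. md2 is / and md1 is /boot, as on hetzner2. The manual bind mounts of /dev, /proc, and /sys are my stand-in for Hetzner&#039;s chroot-prepare helper, whose exact behavior I have not verified. reinstall_grub_via_chroot and DRY_RUN are hypothetical names; with DRY_RUN=1 it only prints the commands&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical script: reinstall grub2 to the MBR of each disk from inside a chroot
# of the installed system, so the system's own grub2-install is used (the rescue
# system's grub-install failed with a missing i386-pc modinfo.sh).
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: reinstall_grub_via_chroot ROOT_MD BOOT_MD DISK...
reinstall_grub_via_chroot() {
  root_md="$1"; boot_md="$2"; shift 2
  target=/mnt/md2
  run mkdir -p "$target" || return 1
  run mount "$root_md" "$target" || return 1
  # assumption: roughly what chroot-prepare sets up
  for fs in dev proc sys; do
    run mount --bind "/$fs" "$target/$fs" || return 1
  done
  run mount "$boot_md" "$target/boot" || return 1
  for disk in "$@"; do
    run chroot "$target" grub2-install "$disk" || return 1
  done
}

# example: DRY_RUN=1 reinstall_grub_via_chroot /dev/md2 /dev/md1 /dev/sda /dev/sdb
```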
# I exited the chroot and shut down the rescue system&lt;br /&gt;
# I activated the vKVM rescue system, and booted it again&lt;br /&gt;
# when I connected to the KVM wui, I was shown a password prompt. So I think booting works!&lt;br /&gt;
# I rebooted it from the ssh session&lt;br /&gt;
# and now I can ssh into the real system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Thu Apr 24 23:12:44 2025 from 146.70.199.15&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and now the wiki loads too&lt;br /&gt;
# I did another reboot test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ sudo su -&lt;br /&gt;
[sudo] password for maltfield: &lt;br /&gt;
Last login: Thu Apr 24 16:25:15 UTC 2025 on pts/0&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
Last login: Fri Apr 25 16:29:21 2025 from 185.204.1.184&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# idk; my takeaway is that one or more of these assumptions is correct&lt;br /&gt;
## grub-install needs to be run *after* the RAID sync is finished&lt;br /&gt;
## grub-install needs to be run on *both* the new *and* the old disk&lt;br /&gt;
## grub-install needs to be run inside a chroot on the rescue system&lt;br /&gt;
# anyway, we&#039;re stable again&lt;br /&gt;
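# regarding the first assumption above, here is a minimal sketch of checking that no resync or recovery is still running before calling grub2-install. md_sync_in_progress is a hypothetical name; it greps mdstat-formatted text. mdadm&#039;s --wait option is an alternative that blocks until such activity finishes&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: succeed (and print the matching lines) if /proc/mdstat,
# or the file given as $1, shows a resync, recovery, reshape, or check in progress,
# i.e. a progress line containing something like "recovery = 5.0%".
md_sync_in_progress() {
  grep -E '(resync|recovery|reshape|check) *=' "${1:-/proc/mdstat}"
}

# example: wait for the rebuild to finish before re-running grub2-install
#   while md_sync_in_progress >/dev/null; do sleep 60; done
```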
# I got an email from Marcin saying Tom could help with the migrations. I sent Tom some wiki articles to get him caught up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ll try to get you ssh access on hetzner2 soon. In the meantime, please read the following articles:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner2&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started preparing draft &amp;quot;change tickets&amp;quot; for migrating each of the websites from hetzner2 to hetzner3. Note that some of these are not fully tested, so you&#039;ll want to execute them manually and make corrections as-needed.&lt;br /&gt;
&lt;br /&gt;
Please also read-through these:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_microfactory_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_oswh&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_phplist_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
(There&#039;s also one CHG for the forum that I think needs to be made)&lt;br /&gt;
&lt;br /&gt;
The next item TODO is to finish the migration plan for these websites:&lt;br /&gt;
&lt;br /&gt;
 1. www.opensourceecology.org (osemain)&lt;br /&gt;
 2. www.openbuildinginstitute.org (obi)&lt;br /&gt;
&lt;br /&gt;
We decided that there would be 2 simultaneous versions of obi:&lt;br /&gt;
&lt;br /&gt;
1. A static site scraped with curl on hetzner3&lt;br /&gt;
2. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
And we decided that there would be 3 simultaneous versions of osemain:&lt;br /&gt;
&lt;br /&gt;
1. The live/current site on hetzner2&lt;br /&gt;
2. A static site scraped with curl on hetzner3&lt;br /&gt;
3. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
To have multiple sites with the same domain on the same server, we bought a second IPv4 address (FeF isn&#039;t set up with IPv6). This week I just finished updating the hetzner3 server to persist this new IPv4 address.&lt;br /&gt;
&lt;br /&gt;
The next item for you would be to update our ansible to push out new vhosts (in nginx, varnish, and apache) for the static sites that are bound to the second IPv4 address using the same hostname.&lt;br /&gt;
&lt;br /&gt;
Please read through the ansible playbook and roles (most importantly for nginx, varnish, and apache) to understand how they&#039;re provisioned.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
Since you have access to hetzner3, you can also poke around (read-only please) the configs for these three web services to understand how ansible provisions them.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve updated and pushed-out the new vhosts with ansible, you&#039;ll need to update the migration plan&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_obi_to_hetzner3&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
And then you&#039;ll want to go-through each migration plan to create a temp &amp;quot;snapshot&amp;quot; of all the sites on hetzner3, where Marcin &amp;amp; Catarina can do a thorough verification of each site (by updating /etc/hosts) before we do the *real* migration -- which is nearly the same as the &amp;quot;snapshot&amp;quot; except we actually migrate DNS.&lt;br /&gt;
&lt;br /&gt;
Please let me know when you&#039;ve finished reading the above articles.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/24/25 22:16, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I need to reset my ssh key on hetzner2. Can you use the same as on 3 or best to generate a new one?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I spoke with Marcin and I think I can help with the admin, as I have time available.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you give a run-down of its status and what needs to be done for completing the migration to hetzner3?&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to log in to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged-in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
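# the backup check above can be made stricter by also requiring a minimum size, so a truncated upload doesn&#039;t pass. A minimal sketch, assuming the SIZE NAME output format of rclone ls shown above; todays_backup and the 1 GB threshold in the example are hypothetical&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper: read "rclone ls"-style lines ("SIZE NAME") on stdin and
# print the first file whose name contains STAMP (e.g. 20250424) and whose size
# is at least MIN_BYTES. Returns 1 if there is no such file.
todays_backup() {
  stamp="$1"; min_bytes="$2"
  awk -v stamp="$stamp" -v min="$min_bytes" '
    index($2, stamp) > 0 {
      if ($1 + 0 >= min + 0) { print $2; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  '
}

# example (threshold is an assumption, not an OSE policy):
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | todays_backup "$(date +%Y%m%d)" 1000000000
```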
# I tried to remove sdb&#039;s first partition from the RAID, but it refused (Device or resource busy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that a member of a healthy RAID can&#039;t be hot-removed directly; it first has to be marked as failed with &#039;--fail&#039; https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized I have an issue in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I listed this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to execute this, to mark each sdb partition as failed (faulty) in its array&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
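# The fail-then-remove sequence above could be wrapped in a small loop like the following. This is a hypothetical sketch. The md-to-partition pairs are hard-coded to this server&#039;s layout (md0 = partition 1, md1 = partition 2, md2 = partition 3), and the loop only prints the mdadm commands unless DRY_RUN=0 is set.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: detach every partition of one disk from its md array.
# Each member is first marked faulty (--fail), because mdadm refused to
# hot-remove an active member ("Device or resource busy"), then removed.
detach_disk_members() {
  local disk="$1"            # kernel name of the disk to pull, e.g. sdb
  local run="" pair md part
  # Print the commands instead of running them unless DRY_RUN=0.
  if [ "${DRY_RUN:-1}" != "0" ]; then run="echo"; fi
  # md array : partition number, matching this server's layout
  for pair in md0:1 md1:2 md2:3; do
    md="${pair%%:*}"
    part="${pair##*:}"
    $run mdadm --manage "/dev/${md}" --fail "/dev/${disk}${part}" || return 1
    $run mdadm --manage "/dev/${md}" --remove "/dev/${disk}${part}" || return 1
  done
}
```

# after a real run, cat /proc/mdstat should show every array as [2/1] [U_] with no sdb member, as in the output above.&lt;br /&gt;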
# ok, I was able to remove it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had received an email from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
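# Kernel names like sda and sdc can move around when a disk is disconnected and re-inserted, so it may be safer to map disks by serial number. The following is a hypothetical sketch built on the same udevadm query as above. The function names are made up, and lsblk -dn -o NAME simply lists the top-level block devices.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for identifying disks by serial instead of by sdX name.

# Read "udevadm info --query=property" output on stdin and print the
# ID_SERIAL_SHORT value (prints nothing if the property is absent).
serial_from_props() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Print "NAME SERIAL" for every top-level block device lsblk reports.
list_disk_serials() {
  local name
  for name in $(lsblk -dn -o NAME); do
    printf '%s %s\n' "$name" \
      "$(udevadm info --query=property --name "$name" | serial_from_props)"
  done
}
```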
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; I/O errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and tried to make partition backups, anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but briefly disconnecting and re-inserting the other drive pushed it from /dev/sda to /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at least partition this new drive&lt;br /&gt;
# but this new drive isn&#039;t the right size. it&#039;s 512G (477G as lsblk reports it) while our old disk was 250G. I guess it&#039;s better to have too big a disk than too small a disk, but we won&#039;t be able to use the extra space. I&#039;m going to assume that they just didn&#039;t have 250G disks in stock anymore.&lt;br /&gt;
# anyway, I tried to back up the partition tables, but that didn&#039;t work since the filesystem is read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
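# The replacement disk is a different size than the one it replaced, so a size check before copying the partition table could catch a disk that is too small. This is a hypothetical sketch. It relies on blockdev --getsize64, which prints a device&#039;s size in bytes.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical size check before cloning a partition table from SRC to DST.

# Succeed if DST_BYTES is at least SRC_BYTES (both plain byte counts).
disk_fits() {
  local src_bytes="$1" dst_bytes="$2"
  [ "$dst_bytes" -ge "$src_bytes" ]
}

# Compare two block devices, e.g. check_replacement_size /dev/sda /dev/sdb
check_replacement_size() {
  local src="$1" dst="$2" src_bytes dst_bytes
  src_bytes="$(blockdev --getsize64 "$src")" || return 1
  dst_bytes="$(blockdev --getsize64 "$dst")" || return 1
  if disk_fits "$src_bytes" "$dst_bytes"; then
    echo "OK: $dst ($dst_bytes bytes) can hold a copy of $src ($src_bytes bytes)"
  else
    echo "TOO SMALL: $dst ($dst_bytes bytes) is smaller than $src ($src_bytes bytes)"
    return 1
  fi
}
```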
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot and pray that grub is intact to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it can&#039;t even reboot it&#039;s so broken.&lt;br /&gt;
# I triggered a reset on the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shutdown all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. we have sda and sdb again, and sda is our original sda – as we wanted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partition tables; it&#039;s not surprising that the sdb file is empty, since the new disk has no partition table yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied the partition table from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
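# To confirm the copy, the two sfdisk dumps could be compared after masking the device names. This is a hypothetical sketch, and normalize_dump and same_layout are made-up names. The label-id and device lines only appear in newer sfdisk dump formats, so dropping them is just a precaution.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical check that two disks carry the same partition layout.

# Replace /dev/sdX names with /dev/DISK and drop per-disk identifier lines,
# so that dumps of two different disks can be compared directly.
normalize_dump() {
  sed -e 's#/dev/[a-z]*\([0-9]*\)#/dev/DISK\1#g' \
      -e '/^label-id:/d' \
      -e '/^device:/d'
}

# Succeed if "sfdisk --dump" shows the same partitions on both disks.
same_layout() {
  local a b
  a="$(sfdisk --dump "$1" | normalize_dump)"
  b="$(sfdisk --dump "$2" | normalize_dump)"
  [ -n "$a" ] || return 1
  [ "$a" = "$b" ]
}
```

# e.g. same_layout /dev/sda /dev/sdb, run after the blockdev --rereadpt step.&lt;br /&gt;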
# that looked good, other than the warning that the DOS MBR won&#039;t boot this disk since no partition is marked bootable (active); I&#039;ll check later what LILO is and whether this matters for grub on the RAID&lt;br /&gt;
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk to the RAID, and it shows that it&#039;s starting to sync now. excellent&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, it looks like it&#039;s not syncing each partition of the RAID at the same time. it&#039;s doing md0 now and then it&#039;ll do the others after, I guess&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d sync&#039;d /boot first. I don&#039;t think I can install grub until that&#039;s sync&#039;d. maybe?&lt;br /&gt;
# it says it&#039;s moving at about 1 MB/s (1159K/sec in the output above). 32G * 1024 = 32,768 MB, so that&#039;s 32,768 seconds / 60 = 546 minutes / 60 = 9.1 hours. Just for swap!&lt;br /&gt;
# assuming we have the same speed for the rest of the disk, that&#039;s 250G * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds / 60 = 4,266.67 minutes / 60 = 71.11 hours. I guess we just have to accept the risk and hope that the old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
# I tried to go ahead and install grub on the new disk, but I got a &#039;command not found&#039; error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it should be &#039;grub2-install&#039;; I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# at that speed (~58 MB/s), the whole 250G would take 250 * 1024 / 58 = 4,413.79 seconds / 60 = 73.6 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
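# The back-of-the-envelope ETA math above can also be done straight from the numbers on the recovery line of /proc/mdstat. Sizes there are in 1K blocks, and the speed is in K/sec. This is a hypothetical sketch, and the function names are made up. Its result should land close to the finish= value that mdstat prints.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating md rebuild time from /proc/mdstat.

# Read mdstat text on stdin and print "SYNCED TOTAL SPEED" for lines like
#   recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec
parse_recovery_line() {
  sed -n 's/.*recovery = .*(\([0-9]*\)\/\([0-9]*\)).*speed=\([0-9]*\)K\/sec.*/\1 \2 \3/p'
}

# Whole minutes left, given SYNCED and TOTAL in 1K blocks and SPEED in K/sec.
rebuild_eta_minutes() {
  local synced_kib="$1" total_kib="$2" speed_kps="$3"
  echo $(( (total_kib - synced_kib) / speed_kps / 60 ))
}
```

# e.g. rebuild_eta_minutes $(cat /proc/mdstat | parse_recovery_line), which only works while a single array is recovering.&lt;br /&gt;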
# backups are still running; I&#039;ll let them finish before starting up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the raid replication, and it shows md0 (swap) and md1 (boot) are both done. Hooray! Now we just need to finish root (/), which is 9.8% done and going at about 60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave the grub install a double-tap now that /boot (md1) is synced to the new disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
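# Either disk may end up being the only one left, so the grub step could be generalized to every disk backing /boot (md1). This is a hypothetical sketch. It assumes that the member rows of mdadm --detail end with the partition path, and it strips the partition number to get the disk. It only prints the grub2-install commands unless DRY_RUN=0 is set.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: put BIOS grub boot code on every disk behind /boot.

# Read "mdadm --detail /dev/mdN" output on stdin and print the parent disk
# of each member row, e.g. a row ending in /dev/sda2 yields /dev/sda.
member_disks() {
  sed -n 's#.*\(/dev/sd[a-z]*\)[0-9][0-9]*[[:space:]]*$#\1#p' | sort -u
}

# Run (or, by default, just print) grub2-install for each disk given.
install_grub_on_disks() {
  local run="" disk
  if [ "${DRY_RUN:-1}" != "0" ]; then run="echo"; fi
  for disk in "$@"; do
    $run grub2-install "$disk" || return 1
  done
}
```

# e.g. DRY_RUN=0 install_grub_on_disks $(mdadm --detail /dev/md1 | member_disks)&lt;br /&gt;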
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups say they&#039;re 9% uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to just kill the backup script and manually upload the archive without the bwlimit, so it&#039;ll go out faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
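# Once the manual rclone copy finishes, one quick sanity check is to compare the size that rclone ls reports for the archive with the local file size. This is a hypothetical sketch, and the function name is made up. It assumes file names without spaces and GNU stat.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical post-upload check: does the remote copy have the local size?

# Read "rclone ls" output (one "SIZE PATH" per line) on stdin and succeed if
# the entry whose path equals the basename of LOCAL_FILE has the same size.
remote_size_matches() {
  local local_file="$1" name remote_size local_size
  name="$(basename "$local_file")"
  remote_size="$(awk -v n="$name" '$2 == n { print $1 }')"
  local_size="$(stat -c %s "$local_file")" || return 1
  [ -n "$remote_size" ] || return 1
  [ "$remote_size" = "$local_size" ]
}
```

# e.g. /bin/sudo -u b2user /bin/rclone ls b2:ose-server-backups | remote_size_matches /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg. rclone also has a check subcommand that compares a source with a destination, which may be a better fit.&lt;br /&gt;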
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
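# The same health check could be run across every disk in one go. This is a hypothetical sketch that just pulls the verdict word out of the smartctl -H output shown above. The function names are made up.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical SMART summary: one "DEVICE VERDICT" line per disk.

# Read "smartctl -H" output on stdin and print the verdict word
# (e.g. PASSED or FAILED); prints nothing if the line is missing.
smart_verdict() {
  sed -n 's/^SMART overall-health self-assessment test result: *\([A-Z]*\).*/\1/p'
}

# Print "DEVICE VERDICT" for each device given, e.g. /dev/sda /dev/sdb.
smart_summary() {
  local dev verdict
  for dev in "$@"; do
    verdict="$(smartctl -H "$dev" | smart_verdict)"
    printf '%s %s\n' "$dev" "${verdict:-UNKNOWN}"
  done
}
```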
# while the old disk says it&#039;s reached 100% of its lifecycle, the new disk says it&#039;s at – uhh – 96% of its life? That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Shame. I was hoping for a disk with at least 50% of its life remaining. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
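# for reference, here&#039;s a hedged sketch of pulling a single SMART attribute (e.g. 202, Percent_Lifetime_Remain) out of `smartctl -A` and comparing its normalized VALUE with THRESH, which is why /dev/sda shows FAILING_NOW (VALUE 000 vs THRESH 001) while /dev/sdb doesn&#039;t yet (VALUE 004). `smart_attr` is a hypothetical helper, and treating THRESH 000 as &amp;quot;no threshold&amp;quot; is an assumption based on attribute 180 above&lt;br /&gt;
```shell
#!/bin/sh
# Summarise one SMART attribute from the text of `smartctl -A DEVICE` on stdin.
# Columns follow the table header above: $4 = VALUE (normalized),
# $6 = THRESH, $10 = RAW_VALUE. The attribute is matched by ID, so the
# differing row order between the two drives does not matter.
# Sketch assumption: a THRESH of 000 means "no threshold" (attribute 180 above
# sits at VALUE 000 / THRESH 000 without being flagged); otherwise the
# attribute is failing once VALUE has dropped to THRESH or below.
smart_attr() {
  awk -v id="$1" '
    $1 == id {
      value = $4 + 0; thresh = $6 + 0; status = "ok"
      if (thresh > 0) { if (thresh >= value) status = "FAILING" }
      printf "id=%s value=%d thresh=%d raw=%s status=%s\n", id, value, thresh, $10, status
    }'
}

# Example: smartctl -A /dev/sdb | smart_attr 202
```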
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
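# the restart above could also be scripted as a loop that stops at the first unit that doesn&#039;t come up. This is a minimal sketch; `start_stack` and the `SYSTEMCTL` override (only there so the loop can be dry-run with a stub) are hypothetical, and the order simply mirrors the commands above&lt;br /&gt;
```shell
#!/bin/sh
# Start the web stack in the same order as above (mariadb first, nginx last)
# and stop at the first unit that fails to start or is not active afterwards.
# Unit names are the ones used on hetzner2; run as root.
# SYSTEMCTL is a hypothetical override so the loop can be dry-run with a stub.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

start_stack() {
  for unit in mariadb httpd varnish nginx; do
    if ! $SYSTEMCTL start "$unit"; then
      echo "failed to start $unit"
      return 1
    fi
    if ! $SYSTEMCTL is-active --quiet "$unit"; then
      echo "$unit is not active"
      return 1
    fi
  done
  echo "all units active"
}
```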
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive said it had &amp;quot;Power_On_Hours&amp;quot; of 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive says Power_On_Hours = 52083/24/365 = 5.95 years. Well that&#039;s better, I guess.&lt;br /&gt;
# oh wow, the power cycle counts are crazy low: our old disk was only power-cycled 50 times, and the new one only 33 times&lt;br /&gt;
# also, SMART attributes are very vendor-specific, so some of these comparisons may be apples-to-oranges (both drives here report the same attribute IDs, but /dev/sdb lists 180 and 210 at the end)&lt;br /&gt;
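# the hours-to-years arithmetic above as a tiny sh/awk helper (hypothetical name `hours_to_years`), using the same 24 h/day and 365 days/year&lt;br /&gt;
```shell
#!/bin/sh
# Convert a SMART Power_On_Hours RAW_VALUE into years, using the same
# 24 h/day and 365 days/year as the arithmetic above.
hours_to_years() {
  awk -v h="$1" 'BEGIN { printf "%.2f\n", h / 24 / 365 }'
}

# Examples:
#   hours_to_years 78516   # old /dev/sda    -> 8.96
#   hours_to_years 52083   # "new" /dev/sdb  -> 5.95
# or straight from the drive (attribute 9, RAW_VALUE in column 10):
#   hours_to_years "$(smartctl -A /dev/sda | awk '$1 == 9 { print $10 }')"
```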
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked-off a marketing campaign (for the apprenticeship), which now links to information on a broken website – where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged into the wiki, your cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the frontpage of all the websites and the recently-published apprenticeship info page that he&#039;s published &amp;amp; promoted are still online when he sees that error – for users who are *not* logged-into the site&lt;br /&gt;
# but if the backend is broken for more than 24 hours, the cached pages expire, and then varnish will cache the errors instead of the content&lt;br /&gt;
# as a short-term hack, I asked Marcin if he&#039;d like me to set up a daily reboot of hetzner2 at 10:40 UTC (a good buffer after the backups finish uploading)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
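# to test the claim in that email without a second device, one could inspect the response headers of an anonymous request. Below is a minimal sketch (hypothetical `cache_verdict` helper) that assumes Varnish&#039;s standard X-Varnish and Age headers are passed through to the client on hetzner2, which is an assumption&lt;br /&gt;
```shell
#!/bin/sh
# Classify an HTTP response as a Varnish cache HIT or MISS from its headers
# (read on stdin). On a hit, Varnish's X-Varnish header carries two
# transaction IDs (this request's and the one that filled the cache), and
# Age shows how long the object has been cached. Whether hetzner2's front-end
# passes these headers through to clients is an assumption.
cache_verdict() {
  awk '
    { sub(/\r$/, ""); line = tolower($0) }
    line ~ /^x-varnish:/ { hit = (split(line, f, " ") - 1 >= 2) }
    line ~ /^age:/       { split(line, f, " "); age = f[2] + 0 }
    END { printf "%s age=%d\n", (hit ? "HIT" : "MISS"), age }'
}

# Example: fetch a page anonymously (no cookies), as a logged-out visitor
# would; the page is chosen only as an example:
#   curl -sI https://wiki.opensourceecology.org/wiki/Main_Page | cache_verdict
```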
# ...&lt;br /&gt;
# I checked in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like I started it just after 13:32 and it finished just before 15:20, so it took just under 2 hours (about 1h48m). Great!&lt;br /&gt;
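# the mdstat polling above can be summarized per array with a small awk filter. This is a sketch (hypothetical `mdstat_progress` helper) that only understands the /proc/mdstat layout shown above: the [UU]/[U_] status and the recovery/resync progress line&lt;br /&gt;
```shell
#!/bin/sh
# Report the state of one md array from /proc/mdstat-style text on stdin,
# e.g. "md2 [U_] recovery=96.5% finish=2.5min" while rebuilding, or
# "md2 [UU]" once both mirrors are in sync. If the input holds several
# timestamped snapshots (like the log above), the last one wins.
mdstat_progress() {
  awk -v md="$1" '
    $1 == md { found = 1; state = ""; pct = ""; eta = ""; next }
    found == 1 {
      if (NF == 0) { found = 2; next }          # blank line ends the stanza
      for (i = NF; i > 0; i--) {
        if ($i ~ /^\[[U_]+\]$/) state = $i     # e.g. [UU] or [U_]
        if ($i == "recovery" || $i == "resync") pct = $(i + 2)
        if ($i ~ /^finish=/) eta = substr($i, 8)
      }
    }
    END {
      if (!found) { print md " not found"; exit 1 }
      if (pct != "") print md " " state " recovery=" pct " finish=" eta
      else print md " " state
    }'
}

# Example: poll every 5 minutes during a rebuild
#   while sleep 300; do date -u; cat /proc/mdstat | mdstat_progress md2; done
```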
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the price is listed as &amp;quot;at cost&amp;quot; and it says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine. That&#039;s compared to the 78,516 hours of /dev/sda and 52,083 hours of our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
# but it&#039;s a bit concerning that it says it might not be in stock. I&#039;m going to message them and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
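# the time-zone conversions in that email can be double-checked with GNU date. A minimal sketch (hypothetical `show_window` helper) using POSIX TZ rules, so it doesn&#039;t depend on tzdata being installed; FeF is assumed to follow US Central time, consistent with the &amp;quot;10:40 UTC = 05:40 FeF time&amp;quot; conversion above&lt;br /&gt;
```shell
#!/bin/sh
# Print a UTC maintenance time in the time zones quoted in the email above.
# POSIX TZ rules are used so this works without tzdata; with tzdata the
# equivalent zone names would be Europe/Berlin, America/Guayaquil and (for
# FeF, assumed to follow US Central time) America/Chicago. Needs GNU date.
show_window() {
  for zone in 'Germany=CET-1CEST,M3.5.0,M10.5.0/3' \
              'Ecuador=ECT5' \
              'FeF=CST6CDT,M3.2.0,M11.1.0'; do
    label=${zone%%=*}
    tz=${zone#*=}
    printf '%s %s\n' "$label" "$(TZ="$tz" date -d "$1" +%H:%M)"
  done
}

# Example:
#   show_window '2025-04-30 11:00 UTC'   # Germany 13:00, Ecuador 06:00, FeF 06:00
```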
# Marcin replied, confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to setup the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we don&#039;t have ansible for hetzner2; I did this manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# tomorrow morning I should check the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
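# for that morning check, a small sketch (hypothetical `booted_in_window` helper) that tests whether the last boot time falls shortly after 10:40. It assumes hetzner2&#039;s clock is in UTC, and the 15 minutes of slack is arbitrary. Note that `journalctl --list-boots` may only show the current boot if the journal isn&#039;t persistent, so `uptime -s` and `last -x reboot` are simpler sources&lt;br /&gt;
```shell
#!/bin/sh
# Check that a boot time ("YYYY-MM-DD HH:MM[:SS]", e.g. from `uptime -s`)
# falls between 10:40 and 10:55, i.e. the 10:40 cron reboot plus some slack
# for shutdown and boot. Assumes the clock is UTC; the slack is arbitrary.
booted_in_window() {
  hhmm=${1#* }                 # drop the date part
  h=${hhmm%%:*}; h=${h#0}      # strip one leading zero so "09" is not octal
  rest=${hhmm#*:}
  m=${rest%%:*}; m=${m#0}
  mins=$((h * 60 + m))
  [ "$mins" -ge 640 ] || return 1    # 10:40
  [ "$mins" -le 655 ]                # 10:55
}

# Example, the morning after:
#   if booted_in_window "$(uptime -s)"; then echo "rebooted on schedule"; fi
#   last -x reboot | head -n 3       # wtmp view of recent reboots
```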
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add a verbose flag, but that&#039;s not especially helpful either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, though, the new IPv4 address is listed in `ip a`&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
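# that would be consistent with the &amp;quot;RTNETLINK answers: File exists&amp;quot; error: `ip addr add` refuses an address that&#039;s already assigned, and the verbose ifup output above shows it trying to add 144.76.164.201, which `ip a` already lists. A minimal sketch (hypothetical `has_addr` helper) for checking before adding an address by hand&lt;br /&gt;
```shell
#!/bin/sh
# Return success if ADDRESS/PREFIX (e.g. 144.76.164.195/27) appears in
# `ip`-style address output read on stdin, so a manual `ip addr add` can be
# skipped when the address is already present (avoiding "File exists").
has_addr() {
  awk -v want="$1" '
    { for (i = NF; i > 0; i--) if ($i == want) found = 1 }
    END { exit !found }'
}

# Example (as root):
#   ip -4 -o addr show dev enp0s31f6 | has_addr 144.76.164.195/27 ||
#     ip addr add 144.76.164.195/27 dev enp0s31f6
```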
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came up, it had lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
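# for reference: netmask 255.255.255.224 is a /27, so 144.76.164.201, the new 144.76.164.195, and the gateway 144.76.164.193 all sit in the same 144.76.164.192/27 block, whose broadcast address is 144.76.164.223 (the &amp;quot;brd&amp;quot; in the ip a output). Here&#039;s a minimal bash sketch of that arithmetic; ip_to_int, int_to_ip, and subnet_info are hypothetical helper names, not tools installed on the server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env bash&lt;br /&gt;
# sketch: IPv4 address/prefix arithmetic in pure bash (hypothetical helpers)&lt;br /&gt;
&lt;br /&gt;
# ip_to_int A.B.C.D: print the address as a 32-bit integer&lt;br /&gt;
ip_to_int() {&lt;br /&gt;
  local IFS=.&lt;br /&gt;
  local -a o=($1)  # split the dotted quad on &amp;quot;.&amp;quot; into four octets&lt;br /&gt;
  echo $(( (o[0] &amp;lt;&amp;lt; 24) | (o[1] &amp;lt;&amp;lt; 16) | (o[2] &amp;lt;&amp;lt; 8) | o[3] ))&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# int_to_ip N: print a 32-bit integer as a dotted quad&lt;br /&gt;
int_to_ip() {&lt;br /&gt;
  local n=$1&lt;br /&gt;
  echo &amp;quot;$(( (n &amp;gt;&amp;gt; 24) &amp;amp; 255 )).$(( (n &amp;gt;&amp;gt; 16) &amp;amp; 255 )).$(( (n &amp;gt;&amp;gt; 8) &amp;amp; 255 )).$(( n &amp;amp; 255 ))&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# subnet_info ADDR PREFIX: print the network and broadcast address of ADDR/PREFIX&lt;br /&gt;
subnet_info() {&lt;br /&gt;
  local addr mask net bcast&lt;br /&gt;
  addr=$(ip_to_int &amp;quot;$1&amp;quot;)&lt;br /&gt;
  mask=$(( (0xFFFFFFFF &amp;lt;&amp;lt; (32 - $2)) &amp;amp; 0xFFFFFFFF ))  # /27 gives 255.255.255.224&lt;br /&gt;
  net=$(( addr &amp;amp; mask ))&lt;br /&gt;
  bcast=$(( net | (~mask &amp;amp; 0xFFFFFFFF) ))&lt;br /&gt;
  echo &amp;quot;network=$(int_to_ip &amp;quot;$net&amp;quot;)/$2 broadcast=$(int_to_ip &amp;quot;$bcast&amp;quot;)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## e.g. subnet_info 144.76.164.195 27 prints network=144.76.164.192/27 broadcast=144.76.164.223&lt;br /&gt;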
# I gave it a restart, but I got errors again&lt;br /&gt;
# curiously, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet isn&#039;t very helpful, because the damn /etc/network/interfaces format seems to have changed so many times over the years; there&#039;s lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find these hetzner-specific docs on adding a second IP; they&#039;re totally different from what I&#039;ve read elsewhere https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the system came up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
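# to repeat this check after future reboots or restarts without eyeballing the whole ip a output, a small filter could help. A minimal bash sketch; has_ipv4 is a hypothetical helper, and it assumes the one-line-per-address output of ip -o -4 addr show, where the 4th field is ADDRESS/PREFIX&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# has_ipv4 ADDR: read &amp;quot;ip -o -4 addr show&amp;quot; output on stdin and succeed&lt;br /&gt;
# only if ADDR (without its /prefix) is one of the listed addresses&lt;br /&gt;
has_ipv4() {&lt;br /&gt;
  awk -v want=&amp;quot;$1&amp;quot; &#039;&lt;br /&gt;
    { split($4, a, &amp;quot;/&amp;quot;); if (a[1] == want) found = 1 }&lt;br /&gt;
    END { exit !found }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage on hetzner3:&lt;br /&gt;
# ip -o -4 addr show dev enp0s31f6 | has_ipv4 144.76.164.195 &amp;amp;&amp;amp; echo &amp;quot;144.76.164.195 is configured&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;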
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I used netcat to test it. Most ports are closed, and nginx is listening on all IPs on most of the remaining ones, but not on 4443, so I used that port&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the defined date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess if it eventually failed badly enough to break RAID replication, we would have gotten alerts. But it would be good if we could get alerts *before* that happened.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a SMART plugin for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# crap, I just checked hetzner3&#039;s munin, and I realized that the varnish graphs are missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the plugin script and its symlinks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I diffed the varnish5_ script on my server against the one on ose&#039;s server, and found 2 extra lines at the top of the copy on hetzner3&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;Ansible managed&amp;quot; comment comes before the shebang, so the plugin gets interpreted as a shell script instead of perl&lt;br /&gt;
# we can see the result of all these syntax errors with a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
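# the kernel only honors a #! interpreter line when it is the very first line of the file; otherwise the script apparently falls back to being run by /bin/sh, which matches the &amp;quot;not found&amp;quot; errors above. A minimal bash sketch to catch this in other templated scripts; check_shebang is a hypothetical helper, not part of munin or ansible&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# check_shebang FILE...: report every FILE whose first line is not a &amp;quot;#!&amp;quot; line&lt;br /&gt;
check_shebang() {&lt;br /&gt;
  local f first rc=0&lt;br /&gt;
  for f in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    first=$(head -n 1 -- &amp;quot;$f&amp;quot;)&lt;br /&gt;
    case $first in&lt;br /&gt;
      &#039;#!&#039;*) ;;  # interpreter line is where the kernel looks for it&lt;br /&gt;
      *) echo &amp;quot;no shebang on line 1: $f&amp;quot;; rc=1 ;;&lt;br /&gt;
    esac&lt;br /&gt;
  done&lt;br /&gt;
  return $rc&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage on hetzner3:&lt;br /&gt;
# check_shebang /usr/share/munin/plugins/varnish5_&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;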
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that we need at least this section in the munin plugin config, so I added it to hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
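# the two ln -s commands above generalize to a loop over the disk names, which could also be used for hetzner3&#039;s nvme0n1 and nvme1n1. A minimal bash sketch; link_smart_plugins is a hypothetical helper, and the link target is the same /usr/share/munin/plugins/smart_ used above. munin-node should be restarted afterwards so it notices the new plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# link_smart_plugins DIR DISK...: create DIR/smart_DISK -&amp;gt; smart_ for each DISK&lt;br /&gt;
link_smart_plugins() {&lt;br /&gt;
  local dir=$1 disk&lt;br /&gt;
  local src=/usr/share/munin/plugins/smart_&lt;br /&gt;
  shift&lt;br /&gt;
  for disk in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    ln -sfn &amp;quot;$src&amp;quot; &amp;quot;$dir/smart_$disk&amp;quot;  # -f replaces an existing link&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage on hetzner2, then restart munin-node so it picks up the new plugins:&lt;br /&gt;
# link_smart_plugins /etc/munin/plugins sda sdb&lt;br /&gt;
# systemctl restart munin-node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;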
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
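# note the smartctl_exit_status.value 8 in that output. smartctl&#039;s exit status is a bit mask (see the EXIT STATUS section of smartctl(8)), and 8 is bit 3 alone: &amp;quot;SMART status check returned DISK FAILING&amp;quot;, which fits this sdb disk being scheduled for replacement. A minimal bash sketch to decode such values, assuming the plugin reports smartctl&#039;s raw exit code; decode_smartctl_status is a hypothetical helper&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# decode_smartctl_status N: print one line per bit set in smartctl exit status N&lt;br /&gt;
# (bit meanings paraphrased from the EXIT STATUS section of smartctl(8))&lt;br /&gt;
decode_smartctl_status() {&lt;br /&gt;
  local status=$1 bit&lt;br /&gt;
  local -a meaning=(&lt;br /&gt;
    &amp;quot;command line did not parse&amp;quot;&lt;br /&gt;
    &amp;quot;device open failed, or device is in a low-power mode&amp;quot;&lt;br /&gt;
    &amp;quot;a SMART or other ATA command failed, or a checksum error&amp;quot;&lt;br /&gt;
    &amp;quot;SMART status check returned DISK FAILING&amp;quot;&lt;br /&gt;
    &amp;quot;prefail attributes &amp;lt;= threshold&amp;quot;&lt;br /&gt;
    &amp;quot;DISK OK, but some attributes were &amp;lt;= threshold in the past&amp;quot;&lt;br /&gt;
    &amp;quot;device error log contains errors&amp;quot;&lt;br /&gt;
    &amp;quot;self-test log contains errors&amp;quot;&lt;br /&gt;
  )&lt;br /&gt;
  for bit in 0 1 2 3 4 5 6 7; do&lt;br /&gt;
    if (( status &amp;amp; (1 &amp;lt;&amp;lt; bit) )); then&lt;br /&gt;
      echo &amp;quot;bit $bit: ${meaning[bit]}&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (prints the line below):&lt;br /&gt;
# decode_smartctl_status 8&lt;br /&gt;
# bit 3: SMART status check returned DISK FAILING&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;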
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
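# the touch / chown / chmod / cp sequence above makes sure the backup copy is never readable by other users, because cp into an existing file keeps that file&#039;s 0600 mode. Creating the copy under a restrictive umask gets the same result in one step. A minimal bash sketch; backup_conf, its munin- prefix, and its /var/tmp default are illustrative, mirroring the names used above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# backup_conf FILE [DIR]: copy FILE to DIR/munin-NAME.YYYYMMDD (DIR defaults to&lt;br /&gt;
# /var/tmp); a new copy is created readable only by its owner (umask 077), but&lt;br /&gt;
# an already-existing destination keeps whatever mode it had&lt;br /&gt;
backup_conf() {&lt;br /&gt;
  local src=$1 dir=${2:-/var/tmp} dest&lt;br /&gt;
  dest=&amp;quot;$dir/munin-$(basename -- &amp;quot;$src&amp;quot;).$(date +%Y%m%d)&amp;quot;&lt;br /&gt;
  ( umask 077 &amp;amp;&amp;amp; cp -- &amp;quot;$src&amp;quot; &amp;quot;$dest&amp;quot; )&lt;br /&gt;
  echo &amp;quot;$dest&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage on hetzner3 (prints the path of the new backup):&lt;br /&gt;
# backup_conf /etc/munin/plugin-conf.d/zzz-myconf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;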
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, if I restart the service first and then run it, it – uhh – kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it exits without an error, but it only reports U (munin&#039;s &amp;quot;unknown&amp;quot; value) for smartctl_exit_status; no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical SMART charts would be great, what I really want to achieve is email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I backed up the original smartd.conf and replaced it with a single DEVICESCAN line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
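# for reference, this is what each part of that one-line smartd.conf does, per smartd.conf(5); the sketch below is the same line with -M test appended (which makes smartd send one test warning per device at startup). The note about smartd-runner handing off to /etc/smartmontools/run.d/ describes the Debian packaging and is worth double-checking on the server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# /etc/smartd.conf (annotated sketch)&lt;br /&gt;
# DEVICESCAN         monitor every device smartd can find (here: nvme0 and nvme1)&lt;br /&gt;
# -d removable       don&#039;t exit if a scanned device is missing when smartd starts&lt;br /&gt;
# -n standby         skip a check, rather than wake the disk, if it is in standby&lt;br /&gt;
# -m ADDRESS         send warning emails to ADDRESS&lt;br /&gt;
# -M exec PATH       deliver warnings by running PATH instead of mail; on Debian,&lt;br /&gt;
#                    smartd-runner passes them to the scripts in /etc/smartmontools/run.d/&lt;br /&gt;
# -M test            send one test warning per device when smartd starts&lt;br /&gt;
DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner -M test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## once the test mails arrive, drop -M test again so that every restart doesn&#039;t send new ones&lt;br /&gt;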
# but that didn&#039;t work; no email came when I restarted the service (even when I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I checked the postfix logs: google accepted one of the test mails (status=sent), but the other one (sent over IPv6) was deferred with bounce/defer service errors?!?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the Google Groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see whether SMART info was now being charted&lt;br /&gt;
## on hetzner2, there are no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty: each has only one value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; across all time ranges. This is expected, since the plugin can&#039;t parse smartctl&#039;s NVMe output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess no munin logs on SMART for this dying server&lt;br /&gt;
# I also confirmed that varnish logs are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty in my screen session. It really is angry...&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
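# a minimal sketch (I haven&#039;t run it here) for measuring how often Gmail deferred us: count the postfix log lines that carry the UnsolicitedRateLimitError reply. The function name is made up, and the log path /var/log/maillog is an assumption (the usual CentOS 7 location, as on hetzner2). It counts log lines, not unique messages, since one message can be logged more than once&lt;br /&gt;
```shell
# Hypothetical helper: count postfix log lines where Gmail answered with
# its "unusual rate of unsolicited mail" deferral (421 4.7.28).
# ASSUMPTION: the postfix log is /var/log/maillog unless a path is given.
# Note: grep -c prints 0 and exits with status 1 when nothing matches.
count_gmail_ratelimits() {
    local log="${1:-/var/log/maillog}"
    grep -c 'UnsolicitedRateLimitError' "$log"
}

# usage (as root):
#   count_gmail_ratelimits
# the last line of the queue listing summarises how much mail is still waiting:
#   postqueue -p | tail -n 1
```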
# Marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of delivery attempts. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and gets added to the queue, the re-send happens without any batching.&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
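# to sanity-check the arithmetic above, here&#039;s a hypothetical helper (not part of phpList). It treats MAILQUEUE_THROTTLE as a fixed pause after every message, so the result is a lower bound that ignores the time spent actually handing each message to postfix&lt;br /&gt;
```shell
# Hypothetical helper: estimate how long a phpList campaign takes when each
# message is followed by a fixed pause of MAILQUEUE_THROTTLE seconds.
# Integer arithmetic only; the result is a lower bound (send time ignored).
campaign_duration() {
    local recipients="$1" throttle_seconds="$2"
    local total=$(( recipients * throttle_seconds ))
    local hours=$(( total / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    echo "${hours}h${minutes}m"
}

campaign_duration 14111 1   # today's campaign -> 3h55m
campaign_duration 15000 1   # a 15,000-person list -> 4h10m
```
# so Marcin&#039;s 14,111-recipient campaign from today would take at least about 3h55m at this rate, and a 15,000-person list at least about 4h10m&lt;br /&gt;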
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
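# for reference, a sketch of what the 2s delay works out to, plus the non-interactive equivalent of the main.cf edit above. As I read the postfix docs, the delay is enforced between deliveries to the same destination over the same transport, and with the default smtp recipient limit a destination is the recipient domain. So each domain (e.g. gmail.com) is capped at 1,800 messages per hour, half of phpList&#039;s new 3,600 per hour, and the rest just waits in the queue. The helper name is hypothetical&lt;br /&gt;
```shell
# Non-interactive equivalent of the manual main.cf edit above
# (postconf -e rewrites main.cf in place):
#   postconf -e 'default_destination_rate_delay = 2s'
#   systemctl restart postfix
#   postconf default_destination_rate_delay    # show the effective value

# Hypothetical helper: the most messages postfix will deliver per hour to a
# single destination (recipient domain) for a given per-destination delay.
max_per_destination_per_hour() {
    local delay_seconds="$1"
    if [ "$delay_seconds" -le 0 ]; then
        echo "unlimited"   # 0s means no rate delay is applied
        return
    fi
    echo $(( 3600 / delay_seconds ))
}

max_per_destination_per_hour 2   # -> 1800
```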
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of ssh security (to force keys), but that breaks sudo, which requires the user&#039;s password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to set a user password (while keeping ssh password logins disabled) than to set NOPASSWD in sudoers&lt;br /&gt;
# disabled passwords are indicated by the &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
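# worth noting: an empty second field (as tgriffing has now) means the account has no password at all until one is set, which is different from a locked account. A minimal sketch for auditing this; the helper name is hypothetical, while `passwd -S USER` is the stock way to check a single account (L = locked, NP = no password, P = usable password)&lt;br /&gt;
```shell
# Hypothetical helper: list accounts in a shadow(5)-format file whose
# password field is empty (no password at all) or locked ("!" prefix).
# Reads /etc/shadow by default, so it must run as root.
audit_shadow() {
    awk -F: '
        $2 == ""  { print $1, "EMPTY"; next }
        $2 ~ /^!/ { print $1, "LOCKED" }
    ' "${1:-/etc/shadow}"
}

# usage (as root):
#   audit_shadow
#   passwd -S tgriffing
```
# against the ten lines shown above, this would report tgriffing as EMPTY and the other nine accounts as LOCKED&lt;br /&gt;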
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) use the MBR partition scheme (not GPT), as indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
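# a sketch of the same check in a form that should work on both servers, plus a partition-table backup that seems worth taking before the swap (my suggestion, not yet a step in the change ticket). As far as I know, CentOS 7&#039;s fdisk on hetzner2 prints &amp;quot;Disk label type&amp;quot; while newer util-linux prints &amp;quot;Disklabel type&amp;quot;; the helper name and backup file names are hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: read "fdisk -l" output on stdin and print the
# partition table type ("dos" = MBR, "gpt" = GPT). Matches both
# "Disk label type:" (older util-linux) and "Disklabel type:" (newer).
label_type() {
    awk -F': *' '/^Disk ?label type:/ { print $2 }'
}

# usage (as root):
#   fdisk -l /dev/sda | label_type      # hetzner2 prints: dos
#
# before replacing a disk, save each partition table to a file;
# "sfdisk -d" prints a dump that sfdisk can read back in later
#   sfdisk -d /dev/sda > sda.ptable.bak
#   sfdisk -d /dev/sdb > sdb.ptable.bak
```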
# A quick spot-check shows that our backups usually finish around 09:55 UTC (once as late as 10:07 UTC).&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God, that&#039;s early, but it&#039;s better to do this early in Germany time.&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so there&#039;s one more recovery mode we can try before the modes become destructive: 3 (per the MySQL docs, values of 4 or greater can permanently corrupt data files)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests this infinite loop is caused by the default of innodb_purge_threads=1, and it says we should set this to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery back down to 2 and added the purge threads to 0 line; I&#039;ll try that when it&#039;s not blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
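# since both recovery settings now live in /etc/my.cnf, a small guard seems worth having before any further restarts. Per the MySQL docs, innodb_force_recovery values of 4 or greater can permanently corrupt data files, and (as mysqld&#039;s own log message says) the setting has to be removed again once recovery is done. A minimal sketch; the helper name is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: print the InnoDB recovery settings found in a
# my.cnf-style file, and fail (status 1) if innodb_force_recovery is 4 or
# higher, the range the MySQL docs warn can permanently corrupt data files.
check_recovery_cnf() {
    local cnf="${1:-/etc/my.cnf}" level
    # show the active (uncommented) recovery-related lines
    grep -E '^[[:space:]]*innodb_(force_recovery|purge_threads)[[:space:]]*=' "$cnf"
    # take the last numeric innodb_force_recovery value, if any
    level="$(sed -nE 's/^[[:space:]]*innodb_force_recovery[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p' "$cnf" | tail -n 1)"
    if [ "${level:-0}" -ge 4 ]; then
        echo "WARNING: innodb_force_recovery=$level may permanently corrupt data"
        return 1
    fi
    return 0
}

# the running server can also be asked directly:
#   mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_%'" | grep -E 'force_recovery|purge_threads'
```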
# the logs are being spammed with these last 5 lines a bunch; I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log&lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
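# A caveat on the grep above: `grep -vi OK` drops every line that contains ok in any case, including any `status : OK` line that mysqlcheck may print after a warning, so it can hide context. The cacti_db entries are also only notes saying the storage engine can&#039;t be checked (probably MEMORY tables), not corruption reports. Here is a rough bash/awk sketch, with made-up function names, that keeps every message attached to its table and lists only the databases with warnings or errors:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a log written by something like:
#   mysqlcheck --all-databases -u root -p"$mysqlPass" > mysqlcheck.log

# summarize_mysqlcheck LOGFILE
# Prints one tab-separated line (table, level, message) for every
# note/warning/error/status line, attributed to the table name that
# mysqlcheck printed on its own line just before it.
summarize_mysqlcheck() {
  awk '
    /^([Nn]ote|[Ww]arning|[Ee]rror|[Ss]tatus) +:/ {
      level = tolower($1)
      msg = $0
      sub(/^[A-Za-z]+ +: /, "", msg)
      print table "\t" level "\t" msg
      next
    }
    NF == 1 { table = $1 }   # a bare "db.table" line starts a new entry
  ' "$1"
}

# suspect_databases LOGFILE
# Lists each database with at least one warning or error; notes such as
# "storage engine doesn't support check" are ignored.
suspect_databases() {
  summarize_mysqlcheck "$1" |
    awk -F '\t' '$2 ~ /^(warning|error)$/ { split($1, p, "."); print p[1] }' |
    sort -u
}
```

# For example, `suspect_databases mysqlcheck.20250418.log` should list osemain_db, osemain_s_db and oswh_db for the output above, assuming the hidden lines contain nothing else.&lt;br /&gt;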
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list. And that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
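# One thing to keep in mind about this dump: the exit status of the pipeline comes from gzip, not mysqldump, so a failed dump can still leave a plausible-looking .sql.gz file. A minimal check, sketched here with a hypothetical helper, tests the gzip stream and looks for the `-- Dump completed` comment that mysqldump appends at the very end when comments are enabled:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# verify_dump FILE.sql.gz  (hypothetical helper)
# Succeeds only if the gzip stream is intact and the last SQL line is the
# "-- Dump completed" trailer that mysqldump writes when comments are enabled
# (the default). A dump that was cut short would normally lack it.
verify_dump() {
  gzip -t "$1" || return 1
  gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed'
}

# Example:
#   set -o pipefail   # make the pipeline fail if mysqldump fails, not just gzip
#   mysqldump -uroot -p"$mysqlPass" --all-databases | gzip -c > dump.sql.gz
#   verify_dump dump.sql.gz || echo "dump looks incomplete"
```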
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
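# The recovery level was switched by editing my.cnf in vim. Below is a sketch of a hypothetical helper that does the same thing non-interactively and keeps a .bak copy. It assumes an innodb_force_recovery line (commented out or not) already exists in the file, and it only accepts levels 0 to 6. The MySQL manual warns that levels of 4 or higher can permanently corrupt data files, so it&#039;s worth raising the level one step at a time.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# set_force_recovery CNF LEVEL  (hypothetical helper)
# Rewrites an existing "innodb_force_recovery = N" line (commented out or not)
# in CNF, keeping a backup at CNF.bak. Level 0 comments the line out so the
# server starts normally. Restart mariadb afterwards for it to take effect.
set_force_recovery() {
  local cnf=$1 level=$2
  case $level in
    [0-6]) ;;
    *) return 2 ;;   # out of range: leave the file untouched
  esac
  # refuse to guess where to add the option if it isn't there yet
  grep -Eq '^[#[:space:]]*innodb_force_recovery[[:space:]]*=' "$cnf" || return 1
  cp -p "$cnf" "$cnf.bak" || return 1
  if [ "$level" -eq 0 ]; then
    sed -i 's/^[#[:space:]]*innodb_force_recovery[[:space:]]*=.*/#innodb_force_recovery = 0/' "$cnf"
  else
    sed -i "s/^[#[:space:]]*innodb_force_recovery[[:space:]]*=.*/innodb_force_recovery = $level/" "$cnf"
  fi
}
```

# For example, run `set_force_recovery /etc/my.cnf 2` and then `systemctl restart mariadb`.&lt;br /&gt;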
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
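# The assertion fires in trx0purge.c, i.e. InnoDB&#039;s purge code. The MySQL manual documents innodb_force_recovery = 2 as preventing the master thread and any purge threads from running. That would be consistent with the server staying up at level 2 but crashing without it. Here is a small hypothetical helper that pulls the location of the latest assertion out of the error log, to see whether each new attempt dies in the same place:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# last_innodb_assertion LOGFILE  (hypothetical helper)
# Prints "FILE line N" for the most recent InnoDB assertion failure in LOGFILE;
# prints nothing if there is none.
last_innodb_assertion() {
  grep -o 'Assertion failure in thread [0-9]* in file [^ ]* line [0-9]*' "$1" |
    tail -n 1 |
    sed 's/.* in file //'
}
```

# For example, `last_innodb_assertion /var/log/mariadb/mariadb.log` would print trx0purge.c line 822 for the excerpt above.&lt;br /&gt;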
# I re-enabled recovery mode, but this time at level 1 (innodb_force_recovery = 1). It did start, but this crash loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, systemd *says* it&#039;s started, even though mysqld keeps crashing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
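# systemd reports active (running) because the service&#039;s main PID is mysqld_safe, which restarts mysqld every time it crashes. One rough way to spot the loop is to count mysqld_safe&#039;s restart messages in the error log. The helper below is hypothetical and takes an optional timestamp prefix in the log&#039;s YYMMDD HH:MM:SS format:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# count_mysqld_restarts LOGFILE [PREFIX]  (hypothetical helper)
# Counts "mysqld_safe mysqld restarted" lines in LOGFILE, optionally only those
# starting with PREFIX (e.g. "250418 20:1" for 20:10-20:19 on 2025-04-18).
count_mysqld_restarts() {
  grep -c "^${2:-}.*mysqld_safe mysqld restarted" "$1"
}
```

# If the count keeps growing between runs, mysqld is crash-looping regardless of what systemctl status says.&lt;br /&gt;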
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
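# The command above replays the whole --all-databases dump. When only a few databases need restoring, one option is to cut a single database out of a dump using the `-- Current Database:` comments that mysqldump writes before each database. Below is a sketch with a made-up function name. It skips the SET statements in the dump&#039;s header (you may want to prepend them), and it won&#039;t get around InnoDB refusing writes while innodb_force_recovery is set:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# extract_database DUMP.sql.gz DBNAME  (hypothetical helper)
# Prints the part of an --all-databases dump that belongs to DBNAME: from its
# "-- Current Database: `DBNAME`" comment up to the next such comment.
extract_database() {
  gzip -dc "$1" | awk -v db="$2" '
    /^-- Current Database: / { keep = ($0 == "-- Current Database: `" db "`") }
    keep
  '
}
```

# For example, `extract_database BACKUP.sql.gz oswh_db | mysql -uroot -p$mysqlPass`, where BACKUP.sql.gz stands in for a pre-corruption backup.&lt;br /&gt;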
# well, fuck, now I don&#039;t know why it won&#039;t start. And it doesn&#039;t tell me why. The good news is that I was able to get a db dump. maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
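# Typing each DROP DATABASE by hand produced two typos above (d3ddb and obi_stabing_db). A less error-prone, hypothetical helper generates the statements from the server&#039;s own list of databases and skips the system schemas. That way the output can be reviewed before it&#039;s piped into mysql:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# drop_statements  (hypothetical helper)
# Reads database names on stdin, one per line, and prints a DROP DATABASE
# statement for each one that is not a system schema.
drop_statements() {
  grep -vxE 'information_schema|performance_schema|mysql|sys' |
    awk '{ printf "DROP DATABASE `%s`;\n", $0 }'
}

# Example (review the output of the first command before running the second):
#   mysql -uroot -p"$mysqlPass" -N -e 'SHOW DATABASES' | drop_statements
#   mysql -uroot -p"$mysqlPass" -N -e 'SHOW DATABASES' | drop_statements | mysql -uroot -p"$mysqlPass"
```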
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, two of them (information_schema and performance_schema) are missing; only mysql is in the dump&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
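# The trailing `grep -E &#039;^.{,100}$&#039;` appears to be there to drop long INSERT lines that merely contain the text CREATE DATABASE (and `{,100}` is a GNU grep extension). Anchoring the match at the start of the line avoids that. Here is a hypothetical helper that lists the databases a dump creates, plus a comm(1) comparison against the server:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# dumped_databases DUMP.sql.gz  (hypothetical helper)
# Lists, sorted, the databases that a mysqldump file creates. Only lines that
# start with CREATE DATABASE count, so data rows that merely contain the text
# are ignored.
dumped_databases() {
  gzip -dc "$1" |
    sed -n 's/^CREATE DATABASE [^`]*`\([^`]*\)`.*/\1/p' |
    sort -u
}

# Example: databases on the server that are missing from the dump
#   mysql -uroot -p"$mysqlPass" -N -e 'SHOW DATABASES' | sort > server.txt
#   dumped_databases DUMP.sql.gz > dumped.txt
#   comm -23 server.txt dumped.txt
```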
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m going to manually dump these three anyway, or at least try to&lt;br /&gt;
# well, I was able to back up only one of the three&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the mysql dump looks good (716K); the other two failed, so their files are effectively empty&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
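# Rather than dumping the system schemas by hand, a per-database dump loop that simply skips the two virtual schemas would avoid these LOCK TABLES errors. This is a minimal sketch, not our actual backup script: it assumes `$mysqlPass` has been set by sourcing /root/backups/backup.settings, and `filter_real_dbs`, `dump_each_db`, and the output file names are hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: dump each real database to its own gzip file, skipping the
# virtual schemas that mysqldump cannot LOCK (information_schema and
# performance_schema are rebuilt by the server, so there is nothing to save).
# filter_real_dbs and dump_each_db are hypothetical helper names.

# Read database names (one per line) on stdin; print only the ones worth dumping.
filter_real_dbs() {
    grep -v -x -e 'information_schema' -e 'performance_schema'
}

# Assumes $mysqlPass is already set (e.g. by sourcing backup.settings).
dump_each_db() {
    local stamp db
    stamp="$(date '+%Y%m%d_%H%M%S')"
    mysql -uroot -p"$mysqlPass" -N -B -e 'SHOW DATABASES;' |
        filter_real_dbs |
        while read -r db; do
            mysqldump -uroot -p"$mysqlPass" "$db" | gzip -c > "mysqldump_${db}.${stamp}.sql.gz"
        done
}
```
# Usage would be `source /root/backups/backup.settings; dump_each_db`, run from the directory the dumps should land in. Writing one file per database also means a single site could be restored without replaying the whole 1.3G dump.&lt;br /&gt;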
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this also wiped the mysql root password (it was stored in the old &#039;mysql&#039; database), so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got in by starting mariadb with --skip-grant-tables and reset the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
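# stop the normal service, then start a temporary instance with authentication disabled and no TCP listener&lt;br /&gt;
systemctl stop mariadb&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
mysql -u root&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# flush privileges re-enabled the grant tables, so shutting down the temporary instance needs the new password&lt;br /&gt;
jobs -l&lt;br /&gt;
mysqladmin -u root -p shutdown&lt;br /&gt;
systemctl start mariadb&lt;br /&gt;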
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in and see the three default system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script (mysql_secure_installation), but let&#039;s just drop test manually and restore from the dumps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore everything else (even the databases that were reported as corrupt) and see whether it works or breaks&lt;br /&gt;
# the import took about 11.5 minutes, and the resulting datadir is ~6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs, including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# woah, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`; that would be better for him than logging into the hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s Debian, but sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead (redo) logs. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the abrupt shutdown caused corruption, and mariadb wasn&#039;t able to recover from it due to a bug. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for Hetzner&#039;s remote hands to shut down the machine. Then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is rebuilding. But, of course, there&#039;s risk in not doing it.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses and replace that one second.&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
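# Since the CHG has to tell hetzner which physical drive to pull, it may help to print the device-to-serial mapping in one go right before the swap. This is a minimal sketch that relies only on the `ID_SERIAL_SHORT` property shown above; `serial_from_udev` and `show_disk_serials` are hypothetical helper names.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: print "disk serial" for each given disk, so the CHG can name the
# exact physical drive to pull. Helper names are hypothetical.

# Read `udevadm info --query=property` output on stdin; print ID_SERIAL_SHORT.
serial_from_udev() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Usage: show_disk_serials sda sdb
show_disk_serials() {
    local disk
    for disk in "$@"; do
        printf '%s %s\n' "$disk" "$(udevadm info --query=property --name "$disk" | serial_from_udev)"
    done
}
```
# Running `show_disk_serials sda sdb` again after the swap should show 154410FA336C still present and a new serial in place of 154410FA4520, which would be a quick check that the right drive was pulled.&lt;br /&gt;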
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX` (on CentOS 7 the command is named `grub2-install`)&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir has no files in it (just empty directories), so I&#039;m thinking we&#039;re booting with grub2 via BIOS, not EFI&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# right, so if I understand this correctly: we&#039;re not updating grub&#039;s config. We&#039;re using &#039;grub-install&#039; to write grub&#039;s boot code *to* the new drive; the config itself lives in /boot, which is on the RAID. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first. Just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but, anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
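# Per the hetzner doc linked above, the post-swap work boils down to three steps: copy the partition table from the surviving disk, add each new partition back into its md array, then install grub on the new disk. (The doc also covers failing and removing the old partitions before the swap, which this sketch skips.) Below is a dry-run sketch that only *prints* those commands so they can be reviewed before the CHG. It assumes an MBR (msdos) partition table, as `part_msdos` in grub.cfg suggests, that sda survives and sdb is new, and the CentOS 7 command name `grub2-install`. `plan_disk_swap` is a hypothetical helper that reads /proc/mdstat-style text on stdin.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Dry-run sketch: print (never run) the commands to rebuild RAID1 mirrors
# onto a replacement disk. plan_disk_swap is a hypothetical helper.
# Usage: cat /proc/mdstat | plan_disk_swap sda sdb
# Assumes MBR partitions and the CentOS 7 name grub2-install.
plan_disk_swap() (
    # Subshell body: set -f disables globbing only inside this function,
    # so tokens like "sda3[0]" are never treated as glob patterns.
    set -f
    good="$1" new="$2"
    # Clone the MBR partition table from the surviving disk to the new one.
    echo "sfdisk -d /dev/$good | sfdisk /dev/$new"
    while read -r md sep rest; do
        # Only array lines look like "md2 : active raid1 sda3[0] sdb3[1]".
        [ "$sep" = ":" ] || continue
        case "$md" in md[0-9]*) ;; *) continue ;; esac
        for member in $rest; do
            case "$member" in
                "$good"[0-9]*'['*)
                    # "sda3[0]" -> partition number "3"
                    part=${member#"$good"}
                    part=${part%%\[*}
                    echo "mdadm /dev/$md --add /dev/$new$part"
                    ;;
            esac
        done
    done
    # Write grub's boot code to the new disk (the config stays on /boot).
    echo "grub2-install /dev/$new"
)
```
# On hetzner2 this would be run as `cat /proc/mdstat | plan_disk_swap sda sdb`, with each printed command checked by hand before running it. Printing rather than executing is deliberate: swapping the two disk names in the sfdisk line would overwrite the partition table of the only good disk.&lt;br /&gt;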
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
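# for the drive swap planned above, here&#039;s a minimal dry-run sketch of what the software-RAID side roughly involves (compare against the hetzner doc https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/ before trusting it). The helper name and its mdX:N arguments are hypothetical, it only *prints* commands, and the real partition-to-array mapping has to come from `cat /proc/mdstat` first.&lt;br /&gt;

```shell
# Dry-run sketch (hypothetical helper): print, but do not run, the commands to
# rebuild a RAID1 mirror after one disk was swapped. ASSUMPTIONS: MBR
# partitioning (grub.cfg loads part_msdos) and caller-supplied "mdX:N" pairs
# meaning "partition N of each disk belongs to /dev/mdX". Confirm the real
# mapping with `cat /proc/mdstat` before running anything.
plan_disk_replacement() {
  good="$1"   # surviving disk, e.g. /dev/sda
  new="$2"    # freshly installed disk, e.g. /dev/sdb
  shift 2
  # 1. copy the MBR partition table from the surviving disk to the new one
  echo "sfdisk -d $good | sfdisk $new"
  # 2. add each new partition back into its md array (md starts the resync here)
  for pair in "$@"; do
    md="${pair%%:*}"
    part="${pair##*:}"
    echo "mdadm /dev/$md --add ${new}${part}"
  done
  # 3. write the boot loader to the new disk (CentOS 7 calls it grub2-install)
  echo "grub2-install $new"
}

# example with a MADE-UP mapping; these partition numbers are not ours
plan_disk_replacement /dev/sda /dev/sdb md1:2 md2:3
```

## printing instead of executing is deliberate: swapping the two disk arguments to sfdisk would overwrite the partition table of the surviving disk.&lt;br /&gt;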
&lt;br /&gt;
=Thr Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched ose infra since 6 days ago&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL Cent, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
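# with `curl -iL`, only the last status line matters: the 301 is just the redirect to https, and the 500 is the real problem. A minimal sketch, with a hypothetical helper name, that pulls out that final status code:&lt;br /&gt;

```shell
# Hypothetical helper: read `curl -iL` output on stdin and print the status
# code of the last HTTP response (i.e. after all redirects were followed).
final_http_status() {
  awk '/^HTTP\// { code = $2 } END { print code }'
}

# e.g.: curl -siL wiki.opensourceecology.org | final_http_status
```

## if the full output isn&#039;t needed, curl can report the same thing directly: `curl -s -o /dev/null -L -w &#039;%{http_code}\n&#039; https://wiki.opensourceecology.org/`; the awk version is handy when the `-iL` output was already captured.&lt;br /&gt;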
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
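# rather than eyeballing the Use% column, a minimal sketch with a hypothetical helper name that flags mounts above a chosen usage threshold (the threshold in the example is arbitrary). It would flag /boot, for instance, which sits at 84% in every df output here.&lt;br /&gt;

```shell
# Hypothetical helper: read `df -P`-style output on stdin and print every
# mount point whose Use% is strictly above the threshold given as $1.
# (-P keeps each filesystem on one line, so the columns stay put.)
mounts_over() {
  awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 > limit) print $6 }'
}

# e.g.: df -P | mounts_over 80
```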
# there are no new entries in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new gets put in the journal, but a bunch gets written to the actual log file that the journal mentions, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
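# since mysqld_safe only announces in the journal where it actually logs, a minimal sketch with a hypothetical helper name that pulls that path out of the journal so it can be tailed directly:&lt;br /&gt;

```shell
# Hypothetical helper: read journal lines on stdin and print the most recent
# log path announced by mysqld_safe ("mysqld_safe Logging to '/path'.").
mariadb_log_path() {
  sed -n "s/.*mysqld_safe Logging to '\([^']*\)'.*/\1/p" | tail -n 1
}

# e.g.: tail -n 60 "$(journalctl -u mariadb --no-pager | mariadb_log_path)"
```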
# here&#039;s what gets written to /var/log/mariadb/mariadb.log when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
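# the most searchable part of that crash is the assertion&#039;s source location (trx0purge.c line 822). When the log holds several restart attempts, a minimal sketch with a hypothetical helper name can list each distinct assertion site:&lt;br /&gt;

```shell
# Hypothetical helper: read a MariaDB/MySQL error log (file args or stdin) and
# print each distinct InnoDB assertion site as "file:line".
innodb_assert_sites() {
  sed -n 's/.*InnoDB: Assertion failure in thread [0-9]* in file \([^ ]*\) line \([0-9]*\).*/\1:\2/p' "$@" | sort -u
}

# e.g.: innodb_assert_sites /var/log/mariadb/mariadb.log
```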
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
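# the backup ages can be double-checked from the `date` and `ls -lah` timestamps above. A minimal sketch with a hypothetical helper name (it assumes GNU date and uses the file mtimes, not the start times in the filenames):&lt;br /&gt;

```shell
# Sketch (hypothetical helper; needs GNU date): hours between two UTC
# timestamps, rounded to the nearest hour.
hours_between() {
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( (end - start + 1800) / 3600 ))
}

hours_between "2025-04-17 07:48" "2025-04-17 20:36"   # prints 13 (12h48m)
hours_between "2025-04-16 07:52" "2025-04-17 20:36"   # prints 37 (36h44m)
```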
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the non-corrupt tables and only recover the fucked table&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about once a week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
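# rsync already checksums each file as it transfers it, but since these copies may end up being the only good backups, an independent end-to-end check is cheap insurance. A minimal sketch with a hypothetical helper name (hashing ~44 GB will take a while):&lt;br /&gt;

```shell
# Hypothetical helper: succeed only if file $2 has the same SHA-256 as file $1.
verify_copy() {
  src_sum=$(sha256sum "$1" | cut -d' ' -f1)
  dst_sum=$(sha256sum "$2" | cut -d' ' -f1)
  if [ "$src_sum" = "$dst_sum" ]; then
    echo "OK $2"
  else
    echo "MISMATCH $2"
    return 1
  fi
}

# e.g.:
# verify_copy /home/b2user/sync/daily_hetzner2_20250417_072001.tar.gpg \
#             /var/tmp/daily_hetzner2_20250417_072001.tar.gpg
```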
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data worse https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
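# to make the &#039;never above 3 on this box&#039; rule hard to forget, a minimal sketch with a hypothetical helper name that emits the config fragment and refuses higher levels unless explicitly overridden. The ALLOW_UNSAFE variable is made up for this sketch, and the drop-in path in the example is an assumption (check which dirs /etc/my.cnf includes).&lt;br /&gt;

```shell
# Hypothetical helper: print a [mysqld] fragment that sets
# innodb_force_recovery to $1. Levels above 3 are refused unless
# ALLOW_UNSAFE=1, because 4+ can permanently corrupt the data files and
# should only be tried on a throwaway copy on another machine.
recovery_cnf() {
  level="$1"
  if [ "$level" -gt 3 ]; then
    if [ "${ALLOW_UNSAFE:-0}" != 1 ]; then
      echo "refusing innodb_force_recovery=$level on this host" >/dev/stderr
      return 1
    fi
  fi
  printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level"
}

# e.g. (path is an assumption):
# recovery_cnf 1 > /etc/my.cnf.d/zz-force-recovery.cnf
```

## once everything is dumped, the fragment has to be removed again so the server restarts with recovery back at 0 before restoring.&lt;br /&gt;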
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1; apparently anything it doesn&#039;t recognize just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we only do it on another machine. So basically: take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps of 4 or greater could still render corrupt data, so they shouldn&#039;t be trusted, anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. so the idea is to dump everything in recovery mode and drop what is corrupt. then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: only some data on corrupt individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess that just means data that hasn&#039;t been written to disk yet. So I *think* it should be OK to trust data that only has corrupt pages?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
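# to compare disks at a glance, a minimal sketch with a hypothetical helper name that reduces `smartctl -H` output to just its verdict:&lt;br /&gt;

```shell
# Hypothetical helper: read `smartctl -H` output on stdin and print only the
# overall verdict (e.g. PASSED or FAILED!).
smart_verdict() {
  sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# e.g.:
# for d in /dev/sda /dev/sdb; do
#   printf '%s: %s\n' "$d" "$(smartctl -H "$d" | smart_verdict)"
# done
```

## for the actual drive test mentioned below, smartctl can start a long self-test with `smartctl -t long /dev/sda` and show its results later with `smartctl -l selftest /dev/sda`.&lt;br /&gt;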
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive nearly new, or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot-swap while the system is running, or to shut it down first. I&#039;m going to choose shutdown; that&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a software raid (Linux md: /dev/md1 and /dev/md2). My understanding is that once the new disk is partitioned and added back into the arrays, md will begin copying the data from the surviving disk onto it. But, christ, if both disks are fucked, then which disk should I have them replace first? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm... it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs table, nor in the wikipedia table it&#039;s sourced from https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
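A minimal sketch for spotting this without eyeballing the table. It assumes the smartctl 7.x ATA attribute layout shown above, where WHEN_FAILED is the 9th column and is &amp;quot;-&amp;quot; unless the normalized VALUE (FAILING_NOW) or WORST (In_the_past) has reached THRESH. `smart_failing_attrs` is a hypothetical helper that filters `smartctl -A` output read on stdin.&lt;br /&gt;
```shell
# Print "ID NAME WHEN_FAILED" for every SMART attribute whose WHEN_FAILED
# column is not "-". Hypothetical helper; assumes the smartctl 7.x ATA
# attribute table layout (WHEN_FAILED is the 9th whitespace-separated field).
smart_failing_attrs() {
  awk '
    # attribute rows start with a numeric ID and have at least 10 fields;
    # header and banner lines are skipped because $1 is not a number
    $1 ~ /^[0-9]+$/ {
      if (NF >= 10) if ($9 != "-") print $1, $2, $9
    }
  ' "$@"
}

# usage on a live system (requires root and smartmontools):
#   smartctl -A /dev/sda | smart_failing_attrs
```
Against the two tables above, this would print only `202 Percent_Lifetime_Remain FAILING_NOW` for each disk.&lt;br /&gt;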
# the second hetzner doc refers to nvme. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 (Debian) looks very different from the output for the SATA SSDs on hetzner2 (CentOS)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows &amp;quot;Percentage Used&amp;quot; (wear) at 6% and 7% on hetzner3, whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
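For the NVMe drives, the comparable figure is &amp;quot;Percentage Used&amp;quot;. This is the drive&#039;s own estimate of how much of its rated endurance has been consumed, and the NVMe spec allows it to exceed 100. A minimal sketch that extracts it from `smartctl -A` output on stdin, assuming the &amp;quot;Name: value&amp;quot; layout shown above. `nvme_percentage_used` is a hypothetical helper name.&lt;br /&gt;
```shell
# Print the "Percentage Used" value (digits only) from NVMe `smartctl -A`
# output read on stdin. Hypothetical helper; assumes "Name:   value" lines.
nvme_percentage_used() {
  awk -F: '
    $1 == "Percentage Used" {
      gsub(/[ %]/, "", $2)   # drop the padding spaces and the percent sign
      print $2
    }
  ' "$@"
}

# usage (requires root and smartmontools):
#   for d in /dev/nvme0n1 /dev/nvme1n1; do
#     echo "$d $(smartctl -A "$d" | nvme_percentage_used)%"
#   done
```
On the two drives above, this prints `6` and `7`.&lt;br /&gt;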
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
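# this shows it&#039;s actually a Linux software raid (mdadm), since /proc/mdstat only lists md arrays, and that our raid is healthy. two UUs (eg `[UU]`) means both members are up. Bad would be a U plus an underscore for the missing member (eg `[U_]`)&lt;br /&gt;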
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
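The same check can be scripted. A minimal sketch, assuming the /proc/mdstat layout above: an &amp;quot;mdN : active raid1 ...&amp;quot; line is followed by a status line ending in a field like `[UU]`. The helper prints the name of any array whose status contains an underscore. `md_degraded_arrays` is a hypothetical name.&lt;br /&gt;
```shell
# Print the name of every md array whose member-status field contains an
# underscore (a missing or failed member, e.g. [U_]). Hypothetical helper;
# reads /proc/mdstat-formatted text from the given file(s) or stdin.
md_degraded_arrays() {
  awk '
    # array header lines look like: "md2 : active raid1 sda3[0] sdb3[1]"
    $1 ~ /^md[0-9]+$/ { name = $1; next }
    # the next status line ends with a field such as [UU] or [U_]
    name != "" {
      if ($NF ~ /^\[[U_]+\]$/) {
        if ($NF ~ /_/) print name
        name = ""
      }
    }
  ' "$@"
}

# usage: md_degraded_arrays /proc/mdstat
```
On the output above, it would print nothing, since all three arrays show `[UU]`.&lt;br /&gt;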
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into the RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
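To make these steps less error-prone, here is a dry-run sketch that only prints the commands for them and does not run anything. Check each of its assumptions against the Hetzner guide and the live system first:&lt;br /&gt;
* sda is the surviving disk and sdb is the blank replacement.&lt;br /&gt;
* md0/md1/md2 are built from partitions 1/2/3 of each disk, as shown in /proc/mdstat above.&lt;br /&gt;
* The partition table type is passed in as `mbr` or `gpt`. This is not yet confirmed on hetzner2.&lt;br /&gt;
* The bootloader installer is CentOS 7&#039;s `grub2-install`.&lt;br /&gt;
`raid_readd_plan` is a hypothetical name.&lt;br /&gt;
```shell
# DRY RUN ONLY: prints the commands, never executes them.
# Hypothetical helper; assumptions (verify before running anything):
#   - GOOD is the surviving disk, NEW is the blank replacement
#   - md0/md1/md2 use partitions 1/2/3 of each disk (see /proc/mdstat)
#   - TABLE is GOOD's partition table type: "mbr" or "gpt"
#   - CentOS 7, so the bootloader installer is grub2-install
raid_readd_plan() {
  local good="$1" new="$2" table="$3" n
  case "$table" in
    mbr)
      # copy GOOD's MBR partition table onto NEW
      echo "sfdisk -d $good | sfdisk $new" ;;
    gpt)
      # CentOS 7's sfdisk (util-linux 2.23) predates its GPT support, so use
      # sgdisk: replicate GOOD's table onto NEW, then give NEW fresh GUIDs
      echo "sgdisk -R $new $good"
      echo "sgdisk -G $new" ;;
    *)
      echo "unknown partition table type: $table" >/dev/stderr
      return 1 ;;
  esac
  for n in 0 1 2; do
    # add partition n+1 of NEW back into mdN; the resync then starts on its own
    echo "mdadm /dev/md$n --add $new$((n + 1))"
  done
  # make NEW bootable too, in case GOOD is the next disk to fail
  echo "grub2-install $new"
}
```
For example, run `raid_readd_plan /dev/sda /dev/sdb mbr`, then review the printed commands against the Hetzner guide before running any of them. Rebuild progress shows up in /proc/mdstat.&lt;br /&gt;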
# the last hetzner doc shows us how to get the serial numbers of our disks (which hetzner will ask for when we tell them which one to swap)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
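A small sketch for pulling just the short serial out of that output, e.g. to paste into the hetzner support request. `disk_serial_short` is a hypothetical helper that reads `udevadm info --query=property` output on stdin.&lt;br /&gt;
```shell
# Print the ID_SERIAL_SHORT value from `udevadm info --query=property`
# output read on stdin. Hypothetical helper.
disk_serial_short() {
  sed -n 's/^ID_SERIAL_SHORT=//p' "$@"
}

# usage:
#   for d in sda sdb; do
#     echo "$d $(udevadm info --query=property --name "$d" | disk_serial_short)"
#   done
```
For the two disks above, this would print `154410FA336C` and `154410FA4520`.&lt;br /&gt;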
# I went ahead and started a short SMART self-test on both disks; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
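The results of both tests end up in each drive&#039;s self-test log, which `smartctl -l selftest /dev/sda` prints. The newest test is the row numbered `# 1`. A minimal sketch that summarizes that row, assuming the smartctl 7.x ATA log layout. `smart_last_selftest_status` is a hypothetical helper. It only recognizes the &amp;quot;Completed without error&amp;quot; and &amp;quot;in progress&amp;quot; status strings and reports anything else as `check`.&lt;br /&gt;
```shell
# Summarize the newest entry ("# 1") of `smartctl -l selftest` output on
# stdin as "ok", "running" or "check". Hypothetical helper; assumes the
# smartctl 7.x ATA self-test log layout.
smart_last_selftest_status() {
  awk '
    $1 == "#" {
      if ($2 == "1") {
        if (index($0, "Completed without error")) print "ok"
        else if (index($0, "in progress")) print "running"
        else print "check"
        found = 1
        exit
      }
    }
    END { if (!found) print "check" }   # no entry at all is also suspicious
  ' "$@"
}

# usage (requires root):
#   for d in /dev/sda /dev/sdb; do
#     echo "$d $(smartctl -l selftest "$d" | smart_last_selftest_status)"
#   done
```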
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039; which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# an fsck probably runs on boot, but I couldn&#039;t find any sign of it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
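The root filesystem reports `Maximum mount count: -1` and `Check interval: 0`. Together these disable e2fsck&#039;s periodic count-based and time-based full checks at boot. /boot also has count-based checks disabled (-1), which fits it not having been checked since 2016. A minimal sketch that summarizes this from `tune2fs -l` output on stdin. `fs_report` is a hypothetical helper, and it treats only -1 (not 0) as a disabled mount count.&lt;br /&gt;
```shell
# Summarize `tune2fs -l` output read on stdin as
# "state=<Filesystem state> periodic_checks=enabled|disabled".
# Hypothetical helper; "disabled" means max mount count -1 AND interval 0.
fs_report() {
  awk -F': *' '
    $1 == "Filesystem state"    { state = $2 }
    $1 == "Maximum mount count" { maxmount = $2 }
    $1 == "Check interval"      { split($2, a, " "); interval = a[1] }
    END {
      periodic = "enabled"
      if (maxmount == "-1") if (interval == "0") periodic = "disabled"
      print "state=" state, "periodic_checks=" periodic
    }
  ' "$@"
}

# usage (requires root): tune2fs -l /dev/md2 | fs_report
```
For /dev/md2, the full output above would give `state=clean periodic_checks=disabled`.&lt;br /&gt;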
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
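Stepping up through recovery levels means editing the same line repeatedly, so a small guard helps. This sketch assumes the setting already exists uncommented in the file (as in the diff above) and that GNU sed is available. `set_force_recovery` is a hypothetical helper. It rewrites the `innodb_force_recovery` line and accepts only 1-6. It refuses 4-6 unless `FORCE=1` is set, because the linked MySQL page warns that values of 4 or greater can permanently corrupt data files.&lt;br /&gt;
```shell
# Set innodb_force_recovery to LEVEL in a my.cnf-style FILE by rewriting an
# existing uncommented setting. Hypothetical helper; requires GNU sed -i.
# Back up FILE first (as done above with my.cnf.20250417).
set_force_recovery() {
  local level="$1" file="$2"
  case "$level" in
    [1-6]) ;;
    *) echo "level must be 1-6" >/dev/stderr; return 1 ;;
  esac
  # levels 4+ can permanently corrupt data files, so require an explicit opt-in
  if [ "$level" -gt 3 ]; then
    if [ "${FORCE:-0}" != 1 ]; then
      echo "refusing level $level without FORCE=1" >/dev/stderr
      return 1
    fi
  fi
  if grep -q '^innodb_force_recovery' "$file"; then
    sed -i "s/^innodb_force_recovery.*/innodb_force_recovery = $level/" "$file"
  else
    echo "no innodb_force_recovery line; add it under [mysqld] by hand" >/dev/stderr
    return 1
  fi
}
```
For example, run `set_force_recovery 2 /etc/my.cnf` followed by `systemctl restart mariadb`.&lt;br /&gt;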
# it didn&#039;t come up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc across a ton of different articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked the request size limits in php and nginx; they&#039;re 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
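The byte-to-MiB arithmetic can be double-checked with a tiny shell helper. The name to_mib is hypothetical and is not an existing tool. One caveat is worth flagging: the 413 log line above reports a limit of 1000000 bytes, not 13107200. In ModSecurity 2.x, the &amp;quot;Request body no files data length&amp;quot; message appears to be tied to the separate SecRequestBodyNoFilesLimit directive, and a grep for BodyLimit would not match that directive. So the limit actually being hit may be closer to 1 MB than 12.5 MB.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Convert a byte count to MiB (1 MiB = 1024 * 1024 = 1048576 bytes) with one
# decimal place; bash only does integer math, so work in tenths and truncate.
to_mib() {
  local bytes=$1
  local tenths=$(( bytes * 10 / 1048576 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 )) MiB"
}

to_mib 13107200   # SecRequestBodyLimit found in conf.d/mod_security.conf
to_mib 1000000    # limit quoted in the 413 error (assumed: SecRequestBodyNoFilesLimit)
```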
# ...&lt;br /&gt;
# so Marcin is stressing the urgency of getting Catarina a sandbox so she can rebuild osemain using a new theme that isn&#039;t broken on the latest versions of WordPress, PHP, etc. on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other, lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306178</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306178"/>
		<updated>2025-04-30T14:29:40Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:30 UTC==&lt;br /&gt;
&lt;br /&gt;
This change was completed successfully&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:18 UTC==&lt;br /&gt;
&lt;br /&gt;
# I&#039;m going to double-tap the grub install before giving it a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I rebooted it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Wed Apr 30 11:28:26 2025 from REDACTED&lt;br /&gt;
[maltfield@opensourceecology ~]$ uptime&lt;br /&gt;
 14:17:14 up 1 min,  1 user,  load average: 0.85, 0.24, 0.08&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, it came back.&lt;br /&gt;
# cool, raid looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[3]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[3]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and SMART isn&#039;t yelling about failed disks anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:13 UTC==&lt;br /&gt;
&lt;br /&gt;
The RAID sync is finished; I guess these Micron 500G disks have better I/O throughput than our old 250G Crucial disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Wed Apr 30 14:07:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 21.2% (7124992/33521664) finish=2.2min speed=191533K/sec&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wed Apr 30 14:12:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
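The finish= estimates in these snapshots follow from simple arithmetic: divide the remaining KiB by the reported speed in K/sec. The following is a minimal sketch using the md0 line from 14:07:12 UTC. The helper name eta_seconds is hypothetical. The sketch assumes the speed stays constant, and its result lands close to the finish=2.2min that the kernel printed.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Rough md resync ETA from the numbers on one /proc/mdstat progress line.
# /proc/mdstat reports (done/total) in 1 KiB units and speed in K/sec.
# Assumes the speed stays constant; bash integer division truncates.
eta_seconds() {
  local done_kib=$1 total_kib=$2 speed_kib_per_s=$3
  echo $(( (total_kib - done_kib) / speed_kib_per_s ))
}

# md0 at 14:07:12 UTC: recovery = 21.2% (7124992/33521664) finish=2.2min speed=191533K/sec
secs=$(eta_seconds 7124992 33521664 191533)
echo "md0: ~${secs}s (~$(( secs / 60 )) min $(( secs % 60 ))s)"
```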
&lt;br /&gt;
==2025-04-30 13:48 UTC==&lt;br /&gt;
&lt;br /&gt;
Since Hetzner can&#039;t give us a new drive, I went ahead and added the drive they gave us to the RAID&lt;br /&gt;
&lt;br /&gt;
# looks like they gave us another 500G disk; I bet they just don&#039;t stock the 250G anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_18301DC6A088&lt;br /&gt;
ID_SERIAL_SHORT=18301DC6A088&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
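In /proc/mdstat, [2/1] means that only one of the array&#039;s two configured members is working. An underscore in the status field (here [_U]) marks the missing slot. The following awk sketch lists degraded arrays from mdstat text. The function name degraded_arrays is hypothetical, and the sample input is trimmed from output in this CHG.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print the name of every md array whose member-status field (e.g. [_U])
# contains an underscore, i.e. an array running with a missing member.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }                   # remember the current array
    / blocks .*\[[U_]*_[U_]*]/ { print name }     # status line with a "_" slot
  '
}

sample='Personalities : [raid1]
md0 : active raid1 sdb1[2]
      33521664 blocks super 1.2 [2/1] [_U]
md1 : active raid1 sdb2[2] sda2[3]
      523712 blocks super 1.2 [2/2] [UU]
'
printf '%s' "$sample" | degraded_arrays
# On the server itself: cat /proc/mdstat | degraded_arrays
```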
# I made a backup of the partition tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250430_134343 ~&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# du -sh ${chg_dir}/*&lt;br /&gt;
0       /var/tmp/chg.20250430_134343/sda_parttable_mbr.bak&lt;br /&gt;
4.0K    /var/tmp/chg.20250430_134343/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the sda partition table dump is empty, which makes sense for a blank disk&lt;br /&gt;
# I copied the sdb partition table to sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sda: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sda1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and had the kernel re-read the partition table&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# blockdev --rereadpt /dev/sda&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I added the three partitions of the new disk to the RAID; note that this time I added /boot first, then /, then swap. I think it&#039;ll sync in that order (of priority)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm: added /dev/sda2&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
mdadm: added /dev/sda3&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm: added /dev/sda1&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, that worked. /boot is already done, and it&#039;s syncing root (/) now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# date -u&lt;br /&gt;
Wed Apr 30 13:48:43 UTC 2025&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.1% (19231872/209984640) finish=16.5min speed=192161K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and installed grub. I guess I&#039;ll do this again after all the partitions sync, but I think it should actually work this time because the /boot partition was done first and is already done syncing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as noted in the docs, those warnings can be safely ignored&lt;br /&gt;
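For a future disk swap on this server, the re-add steps above can be condensed into a dry-run helper. This is a hypothetical sketch (plan_readd is not an existing tool), and it only echoes the commands so they can be reviewed before pasting. The partition-to-array mapping (2 to md1 /boot, 3 to md2 /, 1 to md0 swap) is specific to this server&#039;s layout.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Dry run: print (never execute) the re-add sequence for a two-disk RAID1,
# where $1 is the surviving disk and $2 is the blank replacement.
# The partition-to-array mapping below is this server's layout (an assumption elsewhere).
plan_readd() {
  local src=$1 dst=$2 pair part md
  echo "sfdisk -d /dev/${src} | sfdisk /dev/${dst}"
  echo "blockdev --rereadpt /dev/${dst}"
  # Add /boot (md1) first, then / (md2), then swap (md0), the order used in
  # this CHG, so /boot finishes syncing before grub2-install.
  for pair in 2:md1 3:md2 1:md0; do
    part=${pair%%:*}
    md=${pair##*:}
    echo "mdadm /dev/${md} -a /dev/${dst}${part}"
  done
  echo "grub2-install /dev/${dst}"
}

plan_readd sdb sda
```
Printing rather than executing keeps the destructive sfdisk step under manual review. If the other disk fails next, swap the two arguments.&lt;br /&gt;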
&lt;br /&gt;
==2025-04-30 13:26 UTC==&lt;br /&gt;
&lt;br /&gt;
# I got a response back from hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we got the drive for free, but that&#039;s still nearly a waste of my time. I replied and asked them how long it would take to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
We&#039;re still waiting on hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been used for 18,623 hours and, according to SMART, has only 32% of its life left.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use). We&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I asked them to insert an actually new drive with &amp;lt;1,000 hours of use. &lt;br /&gt;
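The two numbers that matter here are Power_On_Hours and the normalized VALUE of Percent_Lifetime_Remain, and both can be pulled out of smartctl -A output with awk. The following is a minimal sketch with hypothetical helper names. On these Micron drives, the RAW column of attribute 202 appears to be 100 minus VALUE (68 raw vs 032 VALUE above), which means it counts life used, so the sketch reads VALUE instead. The 1,000-hour cutoff mirrors the remote-hands request and is not a SMART standard.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Summarize wear from smartctl -A text read on stdin.
# Attribute 9: the RAW column (last field) is power-on hours.
# Attribute 202: the normalized VALUE (4th column) is read as percent life remaining.
smart_summary() {
  awk '
    $1 == 9   { hours = $NF }
    $1 == 202 { life = $4 + 0 }
    END { printf "hours=%s life_remaining=%s%%\n", hours, life }
  '
}

# Hypothetical acceptance rule from the support request: "new" means under 1,000 hours.
is_new_drive() {
  local hours=$1
  if [ "$hours" -ge 1000 ]; then
    echo "used (${hours}h)"
  else
    echo "new (${hours}h)"
  fi
}

# Two lines copied from the 13:23 UTC smartctl -A output of the replacement disk.
sample='  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68'
printf '%s\n' "$sample" | smart_summary
is_new_drive 18623
# On the server: smartctl -A /dev/sda | smart_summary
```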
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that /dev/sdb is PASSED and /dev/sda is FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I confirmed again that the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I&#039;m removing sda from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because its data was corrupt, and a bug in mariadb prevented it from recovering from the corruption. Because hetzner2 runs EOL CentOS, we can&#039;t update mariadb. While I don&#039;t think the corruption was caused by a disk failure, the SMART output said that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
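The FAILED verdict above comes from attribute 202 (Percent_Lifetime_Remain), whose normalized VALUE of 000 is at or below its THRESH of 001. A minimal sketch of automating that comparison, assuming the smartctl 7.x column order shown above and treating a THRESH of 000 as "no failure threshold". The function name smart_failing_attrs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# attribute whose normalized VALUE is at or below its THRESH.
# Assumed columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_failing_attrs() {
  awk '
    # attribute rows start with a numeric ID; headers and banners do not
    $1 ~ /^[0-9]+$/ {
      value = $4 + 0
      thresh = $6 + 0
      # assumption: THRESH 000 means the vendor set no failure threshold
      if (thresh == 0) next
      # healthy while VALUE stays strictly above THRESH
      if (value > thresh) next
      print $1, $2, "value=" $4, "thresh=" $6
    }'
}
```

Usage: pipe smartctl -A /dev/sda into smart_failing_attrs. Against the sda output above it should flag only attribute 202.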
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll only have one disk (no redundancy). The risk is compounded because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
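In /proc/mdstat a degraded mirror shows up as an underscore in the member-status field (for example [_U] instead of [UU]). A minimal sketch that lists such arrays; the function name is hypothetical, and the optional file argument is an assumption added so it can be tried against a saved copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every md array whose status field
# (e.g. [UU]) contains "_", i.e. a missing or failed mirror member.
degraded_md_arrays() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    # remember the array name from lines like "md0 : active raid1 ..."
    /^md[0-9]+ :/ { name = $1; next }
    # the following status line ends with a field such as [_U] or [UU]
    /\[[U_]+\]/ {
      if ($NF ~ /_/) print name
      name = ""
    }' "$mdstat"
}
```

An empty result means every array reports all members up. After removing sda, md0, md1 and md2 should all be listed.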
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
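The grep above only shows that an object with today's date exists. A slightly stricter sketch, assuming rclone ls prints a byte size followed by the object path, also rejects a suspiciously small file. The function name and the 1 MiB default floor are illustrative assumptions, not values from our backup scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rclone ls` output (size, then path) on stdin and
# succeed only if some path containing STAMP is at least MIN_BYTES in size.
# STAMP defaults to today in %Y%m%d; MIN_BYTES (1 MiB) is an assumed floor.
todays_backup_ok() {
  local stamp="${1:-$(date "+%Y%m%d")}"
  local min_bytes="${2:-1048576}"
  awk -v stamp="$stamp" -v min="$min_bytes" '
    # $1 is the size in bytes; match the date stamp anywhere in the line
    index($0, stamp) { if ($1 + 0 >= min + 0) found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

Usage: pipe ${RCLONE} ls "b2:${B2_BUCKET_NAME}" into todays_backup_ok after sourcing backup.settings; a non-zero exit means no sane backup for today was found.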
&lt;br /&gt;
In Germany&#039;s morning (and shortly after our daily backups complete), execute these commands to remove the failing drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
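The six commands above follow one pattern: partition N+1 of sda is a member of /dev/mdN. A sketch of the same sequence as a loop, using mdadm's long option --remove in place of -r. The function name and the DRY_RUN switch (on by default here, so it only prints) are assumptions added for safety. Unlike the block above, it fails and removes one array at a time; either way sda ends up out of every array.

```shell
#!/usr/bin/env bash
# Sketch: fail and remove partitions 1..3 of a disk from md0..md2.
# Hypothetical DRY_RUN=1 (default) only prints the mdadm commands.
remove_disk_from_raid() {
  local disk="${1:?usage: remove_disk_from_raid sdX}"
  local run=""
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; fi
  local i
  for i in 0 1 2; do
    # partition i+1 of the disk is the mirror member of /dev/md$i
    $run mdadm --manage "/dev/md${i}" --fail "/dev/${disk}$((i + 1))" || return 1
    $run mdadm --manage "/dev/md${i}" --remove "/dev/${disk}$((i + 1))" || return 1
  done
}
```

Run it once with the default to review the commands, then with DRY_RUN=0 to execute them.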
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
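Before touching any partition table, it is worth failing loudly if sda still reports the serial of the disk we asked Hetzner to pull (154410FA336C, from this ticket). A minimal guard, assuming udevadm's ID_SERIAL_SHORT property as shown earlier in this ticket; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: read `udevadm info --query=property` output on stdin and
# fail if ID_SERIAL_SHORT still matches the disk that should have been pulled.
OLD_SERIAL="154410FA336C"
assert_disk_replaced() {
  local serial
  # keep only the value of the first ID_SERIAL_SHORT= line
  serial=$(sed -n 's/^ID_SERIAL_SHORT=//p' | head -n 1)
  if [ -z "$serial" ]; then
    echo "no ID_SERIAL_SHORT found" >/dev/stderr
    return 2
  fi
  if [ "$serial" = "$OLD_SERIAL" ]; then
    echo "disk still has the old serial $serial" >/dev/stderr
    return 1
  fi
  echo "new serial: $serial"
}
```

Usage: pipe udevadm info --query=property --name sda into assert_disk_replaced, and stop the change if it returns non-zero.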
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the healthy disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
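To confirm the copy worked, the two dumps can be compared once the device names are normalized. A sketch, assuming sfdisk -d prints one line per partition that starts with the device path (as in the dumps saved above); header lines such as unit or label-id are ignored, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two saved `sfdisk -d` dumps, keeping only the
# per-partition lines and replacing "/dev/sdX1 :" with "part1 :".
same_partition_layout() {
  local a="$1" b="$2"
  local na nb
  na=$(sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) *:|part\1 :|p' "$a")
  nb=$(sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) *:|part\1 :|p' "$b")
  # refuse to call two empty layouts "the same"
  [ -n "$na" ] || return 2
  [ "$na" = "$nb" ]
}
```

Usage: dump sdb and sda again into files under ${chg_dir} with sfdisk -d, then call same_partition_layout on the two files; exit status 0 means the partition entries match.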
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
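Install the GRUB boot loader onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (on CentOS 7 the GRUB 2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;, not &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;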
&amp;lt;/pre&amp;gt;&lt;br /&gt;
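A quick heuristic check that boot code landed on a disk: GRUB 2's i386-pc boot sector contains the literal text "GRUB", so its absence from the first 512 bytes suggests the install did not take. This is an assumption-laden sketch; it does not validate core.img or the GRUB configuration, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic sketch: succeed if the first 512 bytes (the MBR) of a device or
# image file contain the string "GRUB", as GRUB 2's boot.img does.
mbr_has_grub() {
  local dev="${1:?usage: mbr_has_grub /dev/sdX}"
  # -a makes grep treat the binary boot sector as text
  head -c 512 "$dev" | grep -aq GRUB
}
```

Usage (as root): for d in /dev/sda /dev/sdb; do mbr_has_grub "$d" || echo "$d: no GRUB string in MBR"; done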
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
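Instead of watching the loop above by hand, a sketch that returns once no array is resyncing or degraded. It keeps the 300-second interval and adds an optional poll limit (an assumption; 0 means unlimited). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll an mdstat file until no array shows recovery/resync and no
# status field contains "_"; return 1 if MAX_POLLS is reached first.
wait_for_raid_sync() {
  local mdstat="${1:-/proc/mdstat}" interval="${2:-300}" max_polls="${3:-0}"
  local polls=0
  while grep -Eq 'recovery|resync|\[U*_[U_]*\]' "$mdstat"; do
    polls=$((polls + 1))
    if [ "$max_polls" -gt 0 ]; then
      if [ "$polls" -ge "$max_polls" ]; then return 1; fi
    fi
    date -u
    sleep "$interval"
  done
  echo "all arrays in sync"
}
```

Usage: wait_for_raid_sync, then proceed with the reboot test once it prints "all arrays in sync".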
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but reverting would require contacting Hetzner and asking them to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306177</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306177"/>
		<updated>2025-04-30T14:29:17Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:18 UTC==&lt;br /&gt;
&lt;br /&gt;
# I&#039;m going to double-tap the grub install before giving it a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I rebooted it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Wed Apr 30 11:28:26 2025 from REDACTED&lt;br /&gt;
[maltfield@opensourceecology ~]$ uptime&lt;br /&gt;
 14:17:14 up 1 min,  1 user,  load average: 0.85, 0.24, 0.08&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, it came back.&lt;br /&gt;
# cool, raid looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[3]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[3]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and SMART isn&#039;t yelling about failed disks anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:13 UTC==&lt;br /&gt;
&lt;br /&gt;
The RAID sync is finished; I guess these Micron 500G disks have better I/O throughput than our old 250G Crucial disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Wed Apr 30 14:07:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 21.2% (7124992/33521664) finish=2.2min speed=191533K/sec&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wed Apr 30 14:12:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
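As a rough sanity check on the note above: /proc/mdstat sizes are in 1 KiB blocks and the speed is in K/sec, so a full resync of md2 at the observed 191533K/sec would take about 209984640 / 191533, or roughly 1096 seconds (about 18 minutes). That is at least consistent with the timestamps in this log. It assumes a constant speed, which real resyncs rarely sustain, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate a full resync from the array size in 1 KiB blocks and the speed in
# K/sec, both as printed in /proc/mdstat; assumes the speed stays constant.
resync_eta_seconds() {
  local blocks="$1" speed_kps="$2"
  echo $(( blocks / speed_kps ))
}

# md2: 209984640 blocks at 191533K/sec, as seen in the log above
secs=$(resync_eta_seconds 209984640 191533)
echo "md2 full resync: ~${secs}s (~$(( secs / 60 )) min)"
# prints: md2 full resync: ~1096s (~18 min)
```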
&lt;br /&gt;
==2025-04-30 13:48 UTC==&lt;br /&gt;
&lt;br /&gt;
Since we can&#039;t add a new drive, I went ahead and added the drive they gave us to the RAID&lt;br /&gt;
&lt;br /&gt;
# looks like they gave us another 500G disk; I bet they just don&#039;t stock the 250G anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_18301DC6A088&lt;br /&gt;
ID_SERIAL_SHORT=18301DC6A088&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partition tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250430_134343 ~&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# du -sh ${chg_dir}/*&lt;br /&gt;
0       /var/tmp/chg.20250430_134343/sda_parttable_mbr.bak&lt;br /&gt;
4.0K    /var/tmp/chg.20250430_134343/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the sda partition table backup is empty (0 bytes), which makes sense because the new disk has no partition table yet&lt;br /&gt;
# I copied the partition table from sdb to sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sda: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sda1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and told the kernel to re-read the partition table on sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# blockdev --rereadpt /dev/sda&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
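Before adding the new partitions to the arrays, it may be worth confirming that the copied table really matches the source. Below is a minimal sketch, assuming both disks were dumped with sfdisk --dump into files and that the disks are named /dev/sdX. The helper name same_layout is hypothetical:&lt;br /&gt;
```shell
# same_layout (hypothetical helper): succeed if two `sfdisk --dump` files
# describe the same partition layout once the disk names are normalized,
# e.g. /dev/sda1 and /dev/sdb1 both become DISK1.
same_layout() {
    _layout_a=$(sed 's#/dev/sd[a-z]*#DISK#g' "$1")
    _layout_b=$(sed 's#/dev/sd[a-z]*#DISK#g' "$2")
    [ "$_layout_a" = "$_layout_b" ]
}
```
For example, dump /dev/sda again after the copy and run same_layout on that file and ${chg_dir}/sdb_parttable_mbr.bak. A nonzero exit status means the layouts differ.&lt;br /&gt;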
# and I added the new disk&#039;s three partitions to the RAID. Note that this time I added /boot (md1) first, then / (md2), then swap (md0). I think it&#039;ll sync them in that order of priority&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm: added /dev/sda2&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
mdadm: added /dev/sda3&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm: added /dev/sda1&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, that worked. /boot is already done, and it&#039;s syncing root (/) now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# date -u&lt;br /&gt;
Wed Apr 30 13:48:43 UTC 2025&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.1% (19231872/209984640) finish=16.5min speed=192161K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and installed grub on /dev/sda. I&#039;ll probably do this again after all the arrays finish syncing, but I think it should work this time because /boot was added first and has already finished syncing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as noted in the docs, those warnings can be safely ignored&lt;br /&gt;
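The finish=16.5min estimate in the md2 recovery line above is the remaining 1K blocks divided by the current speed: (209984640 - 19231872) / 192161 is about 993 seconds, or roughly 16.5 minutes. Below is a minimal sketch that recomputes it, assuming the recovery-line format shown in this log. The helper name recovery_eta is hypothetical:&lt;br /&gt;
```shell
# recovery_eta (hypothetical helper): for each md array with a progress line,
# print its name and the estimated minutes left, computed as
# (total - done) / speed, where done/total are 1K blocks and speed is K/sec.
recovery_eta() {
    awk '
        $1 ~ /^md[0-9]+$/ { dev = $1 }
        /finish=/ {
            done_kb = 0; total_kb = 0; speed = 0
            for (i = 1; NF >= i; i++) {
                if ($i ~ /^[(][0-9]+\/[0-9]+[)]$/) {    # e.g. (19231872/209984640)
                    f = $i
                    gsub(/[()]/, "", f)
                    split(f, p, "/")
                    done_kb = p[1]; total_kb = p[2]
                } else if ($i ~ /^speed=/) {            # e.g. speed=192161K/sec
                    f = $i
                    sub(/^speed=/, "", f)
                    sub(/K\/sec$/, "", f)
                    speed = f + 0
                }
            }
            if (speed > 0) printf "%s %.1f\n", dev, (total_kb - done_kb) / speed / 60
        }
    ' "$@"
}
```
Run against the mdstat output above (for example, recovery_eta /proc/mdstat during the sync), this should print md2 16.5. Arrays marked resync=DELAYED have no progress line and are skipped.&lt;br /&gt;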
&lt;br /&gt;
==2025-04-30 13:26 UTC==&lt;br /&gt;
&lt;br /&gt;
# I got a response from Hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we weren&#039;t charged for the drive, but this was still nearly a waste of my time. I replied and asked how long it would take them to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
We&#039;re still waiting on Hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been powered on for 18,623 hours. SMART reports that it has only 32% of its life left (attribute 202 Percent_Lifetime_Remain has a normalized value of 032).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
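Power-on hours alone don&#039;t determine remaining life. The 32% figure comes from attribute 202. In every smartctl -A output in this log, the normalized VALUE and the RAW_VALUE of Percent_Lifetime_Remain add up to 100 (032/68, 004/96, 000/100). This suggests the raw value is the percentage used, but that reading is inferred from these outputs rather than from vendor documentation. Below is a minimal sketch that pulls both numbers out of smartctl -A output. The helper name wear_summary is hypothetical:&lt;br /&gt;
```shell
# wear_summary (hypothetical helper): summarize wear from `smartctl -A` text.
# Columns: $1=ID $2=name $4=normalized VALUE ... $10=RAW_VALUE.
# Assumption (from the outputs in this log): for ID 202 the normalized VALUE
# is percent remaining and the raw value is percent used.
wear_summary() {
    awk '
        $1 == 9   { hours = $10 }                 # Power_On_Hours
        $1 == 202 { remain = $4 + 0; used = $10 } # Percent_Lifetime_Remain
        END { printf "hours=%s remaining=%d%% used=%s%%\n", hours, remain, used }
    ' "$@"
}
```
For the drive inserted on 2025-04-30, smartctl -A /dev/sda | wear_summary should print hours=18623 remaining=32% used=68%.&lt;br /&gt;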
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use). We&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I asked them to insert a genuinely new drive with &amp;lt;1,000 hours of use.&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
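The grep above only shows whether a file with today&#039;s date exists. Below is a slightly stricter sketch, assuming rclone ls output in the form of a size followed by a path, as shown above. The helper name backup_present is hypothetical, and a nonzero size is only a minimal sanity check, not proof that the archive decrypts or extracts:&lt;br /&gt;
```shell
# backup_present (hypothetical helper): succeed if the `rclone ls` listing
# on stdin contains a non-empty file whose name includes the given date stamp.
# $1: date stamp, e.g. 20250430
backup_present() {
    awk -v day="$1" '
        index($2, day) { if ($1 + 0 > 0) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```
For example: ${RCLONE} ls b2:${B2_BUCKET_NAME} | backup_present $(date +%Y%m%d)&lt;br /&gt;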
# I confirmed again that /dev/sdb PASSED and /dev/sda FAILED the SMART health check&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
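When scripting this check, smartctl&#039;s exit status can be used instead of reading the text. Per the smartctl(8) man page, the status is a bit mask, and bit 3 (value 8) is set when the SMART health check returns DISK FAILING. Below is a minimal sketch with hypothetical helper names that tests that bit and, alternatively, matches the ATA-style PASSED line shown above:&lt;br /&gt;
```shell
# smart_failing (hypothetical helper): given a smartctl exit status, succeed
# if bit 3 (value 8, DISK FAILING) is set; the bit is isolated with
# integer division and modulo.
smart_failing() {
    [ $(( $1 / 8 % 2 )) -eq 1 ]
}

# smart_passed (hypothetical helper): succeed if `smartctl -H` text on stdin
# reports an overall PASSED result (ATA output format, as in this log).
smart_passed() {
    grep -q 'self-assessment test result: PASSED'
}
```
For example: smartctl -H /dev/sda; smart_failing $?&lt;br /&gt;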
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
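The FAILING_NOW flag on attribute 202 of /dev/sda comes from comparing the normalized VALUE with THRESH. As we understand smartmontools, an attribute counts as failing when its VALUE is at or below a nonzero THRESH; a THRESH of 000, as on attribute 180, means there is no threshold. That would explain why /dev/sda (000 vs 001) is failing while /dev/sdb (004 vs 001) still passes, even though both are nearly worn out. Below is a minimal sketch of that comparison. The helper name failing_attributes is hypothetical:&lt;br /&gt;
```shell
# failing_attributes (hypothetical helper): print ID, name, VALUE and THRESH
# for every attribute row in `smartctl -A` text whose normalized VALUE is at
# or below a nonzero THRESH. Columns: $1=ID $2=name $4=VALUE $6=THRESH.
failing_attributes() {
    awk '
        $1 ~ /^[0-9]+$/ {
            value = $4 + 0; thresh = $6 + 0
            if (thresh > 0) {
                if (thresh >= value) print $1, $2, value, thresh
            }
        }
    ' "$@"
}
```
For the /dev/sda output above, this should print 202 Percent_Lifetime_Remain 0 1. For /dev/sdb, it should print nothing.&lt;br /&gt;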
# and I confirmed again the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
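Removing the wrong disk from the mirror would leave the server running only on the worn-out one, so the serial check above is worth making mechanical. Below is a minimal sketch, assuming udevadm property output in KEY=value form as shown above. The helper name serial_matches is hypothetical:&lt;br /&gt;
```shell
# serial_matches (hypothetical helper): succeed only if the
# `udevadm info --query=property` text on stdin has ID_SERIAL_SHORT equal to $1.
serial_matches() {
    awk -F= -v want="$1" '
        $1 == "ID_SERIAL_SHORT" { if ($2 == want) ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}
```
For example, run udevadm info --query=property --name sda | serial_matches 154410FA336C, and only run the mdadm --fail commands if it succeeds.&lt;br /&gt;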
# OK, I&#039;m removing sda from the RAID&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
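The same fail-then-remove pattern repeats for each array/partition pair. Below is a minimal sketch that only prints the commands, so they can be reviewed before being piped to sh. The helper name plan_remove is hypothetical. Unlike the session above, which failed all three partitions before removing any, it interleaves fail and remove per array; this should have the same end result.&lt;br /&gt;
```shell
# plan_remove (hypothetical helper): print, but do not run, the mdadm
# commands that mark each partition faulty and then remove it from its array.
# Each argument is ARRAY:PARTITION, e.g. md0:sda1.
plan_remove() {
    for pair in "$@"; do
        array=${pair%%:*}
        part=${pair#*:}
        echo "mdadm --manage /dev/$array --fail /dev/$part"
        echo "mdadm --manage /dev/$array --remove /dev/$part"
    done
}
```
For this change, review the output of plan_remove md0:sda1 md1:sda2 md2:sda3, then pipe the same command to sh to execute it.&lt;br /&gt;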
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs CentOS, which is EOL, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output said both of our redundant disks were expected to fail within 24 hours and should be replaced immediately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, because only one disk will hold a complete copy of our data (no redundancy). The risk is higher than usual because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
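To sanity-check the rebuild estimate: /proc/mdstat reports each array&#039;s size in 1K blocks and the current sync speed in K/sec, so size divided by speed gives a rough sync time. The sketch below (est_rebuild_seconds is a hypothetical helper name) uses md2&#039;s 209984640 blocks and the ~192,000K/sec speed that /proc/mdstat showed during the 2025-04-30 sda rebuild logged on this page. The real duration depends on disk throughput, server load and the kernel&#039;s md sync speed limits.&lt;br /&gt;
&lt;br /&gt;
```shell
# Rough resync-time estimate from two numbers /proc/mdstat prints:
# the array size in 1K blocks and the sync speed in K/sec.
# Integer arithmetic, so the result is whole seconds (rounded down).
est_rebuild_seconds() {
    local blocks="$1"   # e.g. 209984640 for md2 (the 200G root array)
    local speed_k="$2"  # e.g. 192161 from "speed=192161K/sec"
    echo $(( blocks / speed_k ))
}

secs=$(est_rebuild_seconds 209984640 192161)
echo "about $(( secs / 60 )) minutes ($secs seconds)"
```
Run as-is, it prints about 18 minutes (1092 seconds) for the root array.&lt;br /&gt;
&lt;br /&gt;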
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
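&lt;br /&gt;
The same check can be scripted. This is a minimal sketch that assumes the /proc/mdstat layout shown in the logs on this page: a member-status field such as [UU] on the line after each mdX header, with _ marking a missing member. md_degraded_arrays is a hypothetical helper name, not an existing tool.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the name of every md array whose member-status field
# (e.g. [UU] or [_U]) contains "_", i.e. a missing or failed member.
# Reads /proc/mdstat unless another file is given.
md_degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { dev = $1; next }   # remember the current array
        dev != "" {
            # the status field follows on the next line, e.g. "[2/1] [_U]"
            if (match($0, /\[(U|_)+\]/)) {
                if (index(substr($0, RSTART, RLENGTH), "_")) print dev
                dev = ""
            }
        }
    ' "${1:-/proc/mdstat}"
}
```
Before sda is removed, this should print nothing. After the mdadm fail/remove commands, it should list all three arrays.&lt;br /&gt;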
&lt;br /&gt;
Before removing the failing disk from the RAID (which leaves us with no redundancy), confirm that today&#039;s backup was successfully uploaded to Backblaze:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
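&lt;br /&gt;
The same check can be written as a pass/fail function. This is a minimal sketch. It assumes rclone ls prints one "size name" pair per line, as in the log on this page. It also uses a hypothetical 1 GB floor (min_bytes) as the definition of a "sane size"; choose a floor that matches our real backups (recent dailies have been around 20 GB). backup_is_sane is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
# Succeed if rclone "ls" output on stdin lists a file whose name contains
# the date stamp (YYYYMMDD) and whose size is at least min_bytes.
# The 1 GB default floor is a hypothetical sanity threshold.
backup_is_sane() {
    local stamp="$1" min_bytes="${2:-1000000000}"
    awk -v stamp="$stamp" -v min="$min_bytes" '
        index($2, stamp) { if ($1 + 0 >= min + 0) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage, with the variables from the step above:
#   source /root/backups/backup.settings
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_is_sane "$(date "+%Y%m%d")" || echo "backup missing or too small"
```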
&lt;br /&gt;
During the morning in Germany (shortly after our daily backups complete), execute these commands to mark the failing drive&#039;s partitions as faulty and remove them from our RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner Robot web UI: https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page: https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
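&lt;br /&gt;
Before filling in the form, you can extract the result line of smartctl -H for each disk to see which one SMART is flagging. This is a minimal sketch. smart_health_result is a hypothetical helper name, and it only parses the ATA-style result line shown above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the verdict (e.g. PASSED or FAILED!) from "smartctl -H" output on stdin.
smart_health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Usage (as root):
#   for d in /dev/sda /dev/sdb; do echo "$d: $(smartctl -H "$d" | smart_health_result)"; done
```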
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. Meanwhile, you can watch the kernel log for the drive swap:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, gather some information about it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sdb is unchanged and sda is new&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
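&lt;br /&gt;
The serial-number comparison can be made explicit. This is a minimal sketch that assumes udevadm prints ID_SERIAL_SHORT as shown in the logs on this page. serial_short and disk_was_replaced are hypothetical helper names. The default old serial is that of the failing Crucial disk named in the Hetzner request above (154410FA336C).&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the ID_SERIAL_SHORT value from "udevadm info --query=property" output on stdin.
serial_short() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Succeed only if the given disk (e.g. sda) reports a serial number that is
# present and differs from the old one (default: the failing Crucial disk).
disk_was_replaced() {
    local dev="$1" old="${2:-154410FA336C}" now
    now=$(udevadm info --query=property --name "$dev" | serial_short)
    [ -n "$now" ] || return 1
    [ "$now" != "$old" ]
}

# Usage (as root):
#   disk_was_replaced sda || echo "sda still has the old serial (or no disk yet)"
```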
&lt;br /&gt;
Before we modify the partition table of either drive, let&#039;s back up both:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our surviving disk (sdb) to the new disk (sda):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of sdb (the healthy disk) and write it to sda (the new disk)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive&#039;s partitions to the RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
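Install the GRUB boot loader onto the new disk with &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (on CentOS 7 the command is grub2-install, not grub-install; /boot itself is already mirrored by the RAID):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;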
&amp;lt;/pre&amp;gt;&lt;br /&gt;
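&lt;br /&gt;
Here is a rough way to confirm that the boot loader landed on the new disk. GRUB 2&#039;s MBR boot image contains the ASCII string GRUB, so finding that string in the first 512 bytes is a reasonable, but not conclusive, sign that the GRUB install step above wrote to the disk. has_grub_mbr is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
# Heuristic: GRUB 2's MBR boot image contains the ASCII string "GRUB",
# so look for it in the first 512 bytes (sector 0) of the disk or image file.
has_grub_mbr() {
    head -c 512 "$1" | grep -aq GRUB
}

# Usage (as root):
#   has_grub_mbr /dev/sda || echo "no GRUB boot code found in the /dev/sda MBR"
```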
&lt;br /&gt;
Run this loop to monitor the progress of the RAID resync (it prints /proc/mdstat every 5 minutes; press Ctrl-C to stop):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
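&lt;br /&gt;
Instead of watching the loop by eye, a script can block until md stops reporting work. This is a minimal sketch. It assumes that any array still syncing shows a recovery or resync line in /proc/mdstat, including resync=DELAYED as seen in the logs on this page. md_sync_running and wait_for_md_sync are hypothetical helper names. A degraded array with no member being added also shows no recovery line, so afterwards confirm that every array shows [UU].&lt;br /&gt;
&lt;br /&gt;
```shell
# Succeed while the given mdstat file (default /proc/mdstat) shows any
# resync or recovery, including resync=DELAYED for queued arrays.
md_sync_running() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Poll every N seconds (default 300, like the loop above) until md reports
# no more resync/recovery work, then print the final state.
wait_for_md_sync() {
    local interval="${1:-300}"
    [ -r /proc/mdstat ] || return 1
    while md_sync_running /proc/mdstat; do
        date
        grep -E 'recovery|resync' /proc/mdstat
        sleep "$interval"
    done
    date
    cat /proc/mdstat
}
```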
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still boots the system as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306176</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306176"/>
		<updated>2025-04-30T14:27:33Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 14:13 UTC==&lt;br /&gt;
&lt;br /&gt;
The RAID sync is finished; I guess these Micron 500G disks have better I/O throughput than our old 250G Crucial disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Wed Apr 30 14:07:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 21.2% (7124992/33521664) finish=2.2min speed=191533K/sec&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wed Apr 30 14:12:12 UTC 2025&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:48 UTC==&lt;br /&gt;
&lt;br /&gt;
Since Hetzner can&#039;t provide a new drive, I went ahead and added the (used) drive they gave us to the RAID&lt;br /&gt;
&lt;br /&gt;
# looks like they gave us another 500G disk; I bet they just don&#039;t stock the 250G anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_18301DC6A088&lt;br /&gt;
ID_SERIAL_SHORT=18301DC6A088&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made backups of the partition tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250430_134343 ~&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# du -sh ${chg_dir}/*&lt;br /&gt;
0       /var/tmp/chg.20250430_134343/sda_parttable_mbr.bak&lt;br /&gt;
4.0K    /var/tmp/chg.20250430_134343/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the sda backup is empty, which makes sense (the new disk has no partition table yet)&lt;br /&gt;
# I copied the sdb partition table to sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sda: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sda1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and told the kernel to re-read the partition table&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# blockdev --rereadpt /dev/sda&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I added the three partitions of the new disk to the RAID; note that this time I added /boot first, then /, then swap. I think it&#039;ll sync in that order (of priority)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm: added /dev/sda2&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
mdadm: added /dev/sda3&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm: added /dev/sda1&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, that worked. /boot is already done, and it&#039;s syncing root (/) now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# date -u&lt;br /&gt;
Wed Apr 30 13:48:43 UTC 2025&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.1% (19231872/209984640) finish=16.5min speed=192161K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and installed GRUB. I&#039;ll probably do this again after all the partitions sync, but I think it should work this time because the /boot partition was added first and has already finished syncing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as noted in the docs, those warnings can be safely ignored&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:26 UTC==&lt;br /&gt;
&lt;br /&gt;
# I got a response back from Hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we got the drive for free, but this was still mostly a waste of my time. I replied and asked them how long it would take to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
We&#039;re still waiting on Hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been powered on for 18,623 hours, and SMART reports that it has only 32% of its life left (attribute 202, Percent_Lifetime_Remain, has a normalized value of 032).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
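&lt;br /&gt;
The remaining-life figure can be read from smartctl -A directly. On the Micron and Crucial disks in these logs, attribute 202 (Percent_Lifetime_Remain) appears to hold the percent of life remaining in its normalized VALUE column and the percent used in RAW_VALUE. The values are 032 and 68 for the replacement drive above; the 11:44 UTC log shows 004 and 96 for sdb and 000 and 100 for the failing sda. That reading is inferred from these outputs, not from vendor documentation. lifetime_remaining_pct is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the normalized VALUE column of SMART attribute 202
# (Percent_Lifetime_Remain) from "smartctl -A" output on stdin, as an integer.
# Fails if the attribute is not present (other disk models may not report it).
lifetime_remaining_pct() {
    awk '$1 == 202 { print $4 + 0; found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage (as root):
#   smartctl -A /dev/sda | lifetime_remaining_pct
```
For the replacement drive above, this prints 32.&lt;br /&gt;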
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use). We&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I asked them to insert a genuinely new drive with &amp;lt;1,000 hours of use.&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that /dev/sdb reports PASSED and /dev/sda reports FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I confirmed again that the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I&#039;m removing sda from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I submitted the support request to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database would not start because it was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an EOL release of CentOS, we cannot update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output below says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
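&lt;br /&gt;
In the attribute tables above, the attribute flagged by smartctl is 202 Percent_Lifetime_Remain, whose WHEN_FAILED column reads FAILING_NOW on both disks. As a sketch (the helper name list_failing_attrs is hypothetical, and it assumes the column layout printed by smartctl 7.0 above, where WHEN_FAILED is the 9th column), this shell function prints only the attributes whose WHEN_FAILED column is set:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print
# ID, name and WHEN_FAILED for every attribute row where WHEN_FAILED is not "-".
# Attribute rows are recognised by a purely numeric first column.
list_failing_attrs() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $1, $2, $9 }'
}

# usage on hetzner2 (as root):
#   smartctl -A /dev/sda | list_failing_attrs
```
For the sda table above, this should print only the 202 Percent_Lifetime_Remain line.&lt;br /&gt;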
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll have only one disk (no redundancy). The risk is compounded because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
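The rebuild time depends mostly on the sync speed md achieves. As a rough worked estimate (the helper name rebuild_minutes is hypothetical), divide the array size in 1 KiB blocks, as printed by /proc/mdstat (209984640 for md2), by an assumed speed in KiB/s. The 192161 KiB/s figure is the speed later observed in this change log; 1000 KiB/s is the kernel default for /proc/sys/dev/raid/speed_limit_min:&lt;br /&gt;
&lt;br /&gt;
```shell
# Rough RAID1 rebuild estimate in whole minutes (integer arithmetic).
#   $1 = array size in 1 KiB blocks, as printed in /proc/mdstat
#   $2 = assumed sync speed in KiB/s
rebuild_minutes() {
  local blocks=$1 speed=$2
  echo $(( blocks / speed / 60 ))
}

rebuild_minutes 209984640 192161   # about 192 MB/s: prints 18
rebuild_minutes 209984640 1000     # md default speed_limit_min: prints 3499
```
So the same array could take anywhere from under 20 minutes to more than two days, depending on how hard md throttles the rebuild under load.&lt;br /&gt;
&lt;br /&gt;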
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
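&lt;br /&gt;
The grep above shows whether a file with today&#039;s date exists, but judging whether it is a sane size is left to the reader. A minimal sketch that makes the size check explicit (the helper name check_backup_listing and the 1 GB threshold are assumptions, not values from our backup scripts) reads the rclone ls listing on stdin and fails unless at least one of today&#039;s files meets the threshold:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read "rclone ls" output (size, then path) on stdin,
# count files whose path contains the date stamp, and fail unless at least
# one of them is min_bytes or larger. Assumes paths contain no spaces.
check_backup_listing() {
  local stamp=$1 min_bytes=$2
  awk -v s="$stamp" -v min="$min_bytes" '
    index($2, s) { n++; if ($1 + 0 >= min + 0) ok++ }
    END {
      printf "%d file(s) for %s, %d at least %d bytes\n", n, s, ok, min
      exit (ok ? 0 : 1)
    }'
}

# usage on hetzner2 (1000000000 bytes is an assumed threshold):
#   source /root/backups/backup.settings
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | check_backup_listing "$(date +%Y%m%d)" 1000000000
```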
&lt;br /&gt;
During Germany&#039;s morning (and shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
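&lt;br /&gt;
The six commands above follow a fixed pattern: partition N+1 of the disk is a member of /dev/mdN. A hypothetical loop version with a dry-run switch, useful for reviewing the exact commands before running them, might look like this (remove_disk_from_raid, maybe_run and DRY_RUN are illustrative names, and the partition-to-array mapping is assumed to match hetzner2):&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical loop form of the fail/remove commands above.
# Assumes partition N+1 of the disk is a member of /dev/mdN (true on hetzner2).
# Set DRY_RUN=1 to print the commands instead of executing them.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

remove_disk_from_raid() {
  local disk=$1 i
  for i in 0 1 2; do
    maybe_run mdadm --manage "/dev/md$i" --fail "/dev/${disk}$((i + 1))"
    maybe_run mdadm --manage "/dev/md$i" --remove "/dev/${disk}$((i + 1))"
  done
}

# review first, then run for real:
#   DRY_RUN=1 remove_disk_from_raid sda
#   remove_disk_from_raid sda
```
Unlike the commands above, which fail all three partitions before removing any, this loop fails and removes one array at a time; the end state is the same.&lt;br /&gt;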
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
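&lt;br /&gt;
The serial comparison can be made explicit. In this sketch, serial_of and check_swap are hypothetical helpers; the old sda serial (154410FA336C) comes from this ticket, while the expected sdb serial must be recorded before the swap, since it is not listed here:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helpers for the post-swap serial check.
# serial_of prints the short serial udev reports for a disk name such as sda.
serial_of() {
  udevadm info --query=property --name "$1" | sed -n 's/^ID_SERIAL_SHORT=//p'
}

# check_swap OLD_SDA NEW_SDA EXPECTED_SDB ACTUAL_SDB
# Fails if sda still has the old serial or if sdb's serial has changed.
check_swap() {
  local old_sda=$1 new_sda=$2 expected_sdb=$3 actual_sdb=$4
  if [ "$new_sda" = "$old_sda" ]; then
    echo "FAIL: sda still reports the old serial $old_sda"
    return 1
  fi
  if [ "$actual_sdb" != "$expected_sdb" ]; then
    echo "FAIL: sdb serial changed ($expected_sdb -> $actual_sdb); wrong disk pulled?"
    return 1
  fi
  echo "OK: sda is new ($new_sda), sdb unchanged ($actual_sdb)"
}

# usage (SDB_SERIAL_BEFORE is a hypothetical variable holding sdb's serial,
# recorded before the swap):
#   check_swap 154410FA336C "$(serial_of sda)" "$SDB_SERIAL_BEFORE" "$(serial_of sdb)"
```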
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our surviving disk (sdb) to our new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
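&lt;br /&gt;
To confirm the copy worked, one option (a sketch, not part of the Hetzner procedure) is to compare the sfdisk dumps of both disks after replacing each disk&#039;s device name with a placeholder. The helper name same_layout is hypothetical, and dropping the label-id line assumes a newer sfdisk that prints one; the CentOS 7 version may not:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed if two disks have the same partition layout.
# Each dump has its own device name replaced by DISK so only the layout is
# compared; label-id lines (printed by newer sfdisk) are ignored.
same_layout() {
  local a b
  a=$(sfdisk -d "/dev/$1" | sed "s#/dev/$1#DISK#g" | grep -v '^label-id')
  b=$(sfdisk -d "/dev/$2" | sed "s#/dev/$2#DISK#g" | grep -v '^label-id')
  [ "$a" = "$b" ]
}

# usage on hetzner2:
#   same_layout sdb sda || echo "partition tables differ"
```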
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# make the kernel re-read the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
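&lt;br /&gt;
Before adding the partitions to the RAID, it may be worth confirming that the kernel now sees all three partitions on the new disk. A minimal sketch (count_partitions is a hypothetical helper) counts the sdaN entries in /proc/partitions; on hetzner2 we would expect 3:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: count partitions of a disk (e.g. sda1, sda2, ...) listed
# in /proc/partitions, or in another file with the same column layout
# (major, minor, #blocks, name).
count_partitions() {
  local disk=$1 file=${2:-/proc/partitions}
  awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$")' "$file" | wc -l
}

# usage on hetzner2; after copying sdb's table we expect 3:
#   count_partitions sda
```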
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
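Install the GRUB boot loader onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (the CentOS 7 name for &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;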
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
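&lt;br /&gt;
The loop above runs until interrupted. A variant (wait_for_raid_sync is a hypothetical name) stops on its own once no array in /proc/mdstat shows a recovery or resync line, including arrays still queued as resync=DELAYED:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: poll an mdstat file until no array is syncing.
# Arguments (both optional): mdstat path (default /proc/mdstat) and
# poll interval in seconds (default 300, same as the loop above).
wait_for_raid_sync() {
  local mdstat=${1:-/proc/mdstat} interval=${2:-300}
  [ -r "$mdstat" ] || return 2
  # "resync=DELAYED" also matches, so queued arrays keep the loop going
  while grep -Eq 'recovery|resync' "$mdstat"; do
    date
    grep -E 'recovery|resync' "$mdstat"
    sleep "$interval"
  done
  echo "no recovery/resync in progress"
}
```
Note that an idle array is not necessarily a healthy one; a degraded array with no rebuild running also ends the loop.&lt;br /&gt;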
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
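&lt;br /&gt;
Before rebooting, it is worth confirming that every array is fully mirrored, not merely idle. This sketch (all_arrays_in_sync is a hypothetical helper, and it assumes RAID1 arrays whose blocks line ends in a member-status bracket such as [UU]) succeeds only if no array status contains an underscore, i.e. no missing member:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-reboot check (assumes RAID1 arrays, whose "blocks" line
# ends in a member-status bracket such as [UU]; "_" marks a missing member).
# Returns 0 only if at least one array was found and none is degraded.
all_arrays_in_sync() {
  local mdstat=${1:-/proc/mdstat}
  awk '/ blocks / { total++; if ($NF ~ /_/) bad++ }
       END { exit (total ? (bad ? 1 : 0) : 1) }' "$mdstat"
}

# usage on hetzner2:
#   all_arrays_in_sync || echo "RAID is still degraded; do not reboot yet"
```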
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not be possible, but we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306175</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306175"/>
		<updated>2025-04-30T14:24:22Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:48 UTC==&lt;br /&gt;
&lt;br /&gt;
Since we can&#039;t get a new drive, I went ahead and added the drive they gave us to the RAID&lt;br /&gt;
&lt;br /&gt;
# looks like they gave us another 500G disk; I bet they just don&#039;t stock the 250G anymore&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_18301DC6A088&lt;br /&gt;
ID_SERIAL_SHORT=18301DC6A088&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partitions&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250430_134343 ~&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# du -sh ${chg_dir}/*&lt;br /&gt;
0       /var/tmp/chg.20250430_134343/sda_parttable_mbr.bak&lt;br /&gt;
4.0K    /var/tmp/chg.20250430_134343/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the sda partition table backup is empty, which makes sense for a blank disk&lt;br /&gt;
# I copied the sdb partition table to sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sda: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sda1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and told the kernel to re-read the partition table&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# blockdev --rereadpt /dev/sda&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I added the three partitions of the new disk to the RAID; note that this time I added /boot first, then /, then swap. I think it&#039;ll sync in that order (of priority)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0   477G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm: added /dev/sda2&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
mdadm: added /dev/sda3&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm: added /dev/sda1&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, that worked. /boot is already done, and it&#039;s syncing root (/) now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# date -u&lt;br /&gt;
Wed Apr 30 13:48:43 UTC 2025&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[3] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[3] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.1% (19231872/209984640) finish=16.5min speed=192161K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[3] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250430_134343]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and installed GRUB. I&#039;ll probably do this again after all the partitions finish syncing, but I think it should work this time because the /boot partition was added first and has already finished syncing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as noted in the docs, those warnings can be safely ignored&lt;br /&gt;
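The plan above is to re-run grub2-install once every array has finished syncing. Below is a minimal sketch of how that wait could be scripted. It assumes only the /proc/mdstat format shown in this log. raid_is_syncing, raid_all_in_sync and wait_for_raid_sync are hypothetical helper names, not part of mdadm or any existing tool:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers (not an existing tool) for reading /proc/mdstat.
# Each takes an optional path so they can be tried against a saved copy.

# Succeeds while any array is rebuilding or has a resync queued
# ("recovery =", "resync =" or "resync=DELAYED" in mdstat).
raid_is_syncing() {
  local mdstat="${1:-/proc/mdstat}"
  [ -r "$mdstat" ] || return 2
  grep -Eq 'recovery|resync' "$mdstat"
}

# Succeeds only if no array status field (e.g. "[_U]") shows a missing mirror.
raid_all_in_sync() {
  local mdstat="${1:-/proc/mdstat}"
  [ -r "$mdstat" ] || return 2
  ! grep -Eq '\[[U_]*_[U_]*\]' "$mdstat"
}

# Polls every 60 seconds until no rebuild is in progress, then reports
# (via its exit status) whether every array has both mirrors up.
wait_for_raid_sync() {
  local mdstat="${1:-/proc/mdstat}"
  while raid_is_syncing "$mdstat"; do
    sleep 60
  done
  raid_all_in_sync "$mdstat"
}
```
&lt;br /&gt;
Once wait_for_raid_sync exits successfully (no rebuild in progress and every array shows [UU]), grub2-install /dev/sda can be repeated. The 60-second polling interval is an arbitrary choice.&lt;br /&gt;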
&lt;br /&gt;
==2025-04-30 13:26 UTC==&lt;br /&gt;
&lt;br /&gt;
# I got a response back from Hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we weren&#039;t charged for the drive, but this was still nearly a waste of my time. I replied and asked them how long it would take to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
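We&#039;re still waiting on Hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been powered on for 18,623 hours. SMART reports that it has only 32% of its life left. That figure is the normalized VALUE of attribute 202 (Percent_Lifetime_Remain); the raw value of 68 appears to be the percentage already used.&lt;br /&gt;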
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use). We&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I asked them again to insert a genuinely new drive with &amp;lt;1,000 hours of use.&lt;br /&gt;
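The 32% figure comes from the normalized VALUE column of attribute 202, not from the power-on hours. Below is a minimal sketch for pulling both numbers out of smartctl -A output. It assumes the column layout shown above: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE. Other drive vendors may number or name these attributes differently. The function names are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers that read "smartctl -A" output on stdin.
# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

# Normalized VALUE of attribute 202 (Percent_Lifetime_Remain), e.g. 032 -> 32
lifetime_remaining_pct() {
  awk '$1 == "202" { print $4 + 0 }'
}

# RAW_VALUE of attribute 9 (Power_On_Hours)
power_on_hours() {
  awk '$1 == "9" { print $10 + 0 }'
}
```
&lt;br /&gt;
Applied to the 13:23 UTC output above, piping smartctl -A /dev/sda into lifetime_remaining_pct gives 32, and piping it into power_on_hours gives 18623. On the original failing disk (FAILING_NOW), lifetime_remaining_pct gives 0.&lt;br /&gt;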
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that /dev/sdb is PASSED and /dev/sda is FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I confirmed again that the serial number of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I&#039;m removing sda from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
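The pre-removal checks in this entry (the SMART health verdict and the disk serial) can also be scripted as guards that must pass before any mdadm --fail command runs. Below is a minimal sketch. It assumes the exact PASSED wording printed by smartctl 7.0 above; other versions or drive types may word it differently. smart_health_passed and serial_matches are hypothetical names:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical guards for the pre-removal checks; both read command output on stdin.

# Succeeds only if "smartctl -H" output reports the overall-health
# self-assessment as PASSED (wording as printed by smartctl 7.0).
smart_health_passed() {
  grep -q 'self-assessment test result: PASSED'
}

# Succeeds only if "udevadm info --query=property" output contains an
# ID_SERIAL_SHORT line exactly equal to the expected serial given as $1.
serial_matches() {
  local expected="$1"
  [ -n "$expected" ] || return 2
  grep -qxF "ID_SERIAL_SHORT=${expected}"
}
```
&lt;br /&gt;
For example, piping udevadm info --query=property --name sda into serial_matches 154410FA336C should succeed before sda is failed out of the arrays. Piping smartctl -H /dev/sdb into smart_health_passed should also succeed, confirming that the surviving mirror member is healthy.&lt;br /&gt;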
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs CentOS, which is EOL, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, SMART reported that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating grub should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable because we&#039;ll only have one disk (no redundancy). The risk is greater because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
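As a rough cross-check of the rebuild estimate, the remaining time can be computed from the counters that /proc/mdstat prints during a rebuild. Those counters are 1K blocks and K/sec. Below is a minimal sketch that assumes the sync speed stays constant, which in practice varies with load. resync_eta_seconds is a hypothetical helper name:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Rough rebuild-time estimate from the counters /proc/mdstat prints, e.g.
#   recovery =  9.1% (19231872/209984640) ... speed=192161K/sec
# Arguments: total 1K blocks, blocks already synced, speed in K/sec.
# Prints whole seconds (integer division) assuming the speed stays constant.
resync_eta_seconds() {
  local total="$1" done_blocks="$2" speed="$3"
  [ "$speed" -gt 0 ] || return 2
  echo $(( (total - done_blocks) / speed ))
}
```
&lt;br /&gt;
For example, the 2025-04-30 13:48 UTC output in the log above reports 19231872 of 209984640 blocks synced on md2 at 192161K/sec. That gives (209984640 - 19231872) / 192161 ≈ 992 seconds, or about 16.5 minutes, which matches the finish=16.5min printed by the kernel. On these SSDs, the rebuild was therefore far faster than the worst case planned for here.&lt;br /&gt;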
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
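The &amp;quot;sane size&amp;quot; part of this check can be made explicit rather than judged by eye. Below is a minimal sketch that reads the rclone ls output (one SIZE PATH line per file). It succeeds only if a file for the given date stamp is at least a minimum size. The function name backup_ok and the minimum size are assumptions, not values taken from our backup scripts:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical check on "rclone ls" output (lines of "SIZE PATH") read on stdin.
# Succeeds if some file whose name contains STAMP is at least MIN_BYTES large.
# The minimum size is an assumed threshold, not a value from this ticket.
backup_ok() {
  local min_bytes="$1" stamp="$2"
  awk -v min="$min_bytes" -v stamp="$stamp" '
    index($2, stamp) > 0 { if ($1 + 0 >= min + 0) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```
&lt;br /&gt;
For example, piping the rclone ls command above into backup_ok 10000000000 followed by that day&#039;s date stamp would have succeeded on 2025-04-30, when daily_hetzner3_20250430_080904.tar.gpg was 20133744108 bytes. The 10 GB threshold is only an illustrative guess.&lt;br /&gt;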
&lt;br /&gt;
During Germany&#039;s morning (and shortly after our daily backups complete), run these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
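The six mdadm commands above follow a fixed pattern: partition N of sda belongs to /dev/md(N-1). That means they can also be generated. Below is a minimal sketch that assumes this partition-to-array mapping holds. remove_disk_from_raid, run_or_echo and the DRY_RUN variable are hypothetical names. Running it with DRY_RUN=1 first prints the commands for review without touching the arrays:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the mdadm commands above. It assumes the layout
# used on hetzner2: partition N of the disk is a member of /dev/md(N-1).
# With DRY_RUN=1 it only prints the commands it would run.

run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

remove_disk_from_raid() {
  local disk="$1" n
  [ -n "$disk" ] || return 2
  # first mark every member on this disk as faulty ...
  for n in 1 2 3; do
    run_or_echo mdadm --manage "/dev/md$((n - 1))" --fail "/dev/${disk}${n}" || return 1
  done
  # ... then hot-remove each one from its array
  for n in 1 2 3; do
    run_or_echo mdadm --manage "/dev/md$((n - 1))" --remove "/dev/${disk}${n}" || return 1
  done
}
```
&lt;br /&gt;
For example, DRY_RUN=1 remove_disk_from_raid sda prints the same six commands listed above (using the long form --remove instead of -r). It stops at the first mdadm command that fails.&lt;br /&gt;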
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is unchanged and sda is the new disk&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
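&lt;br /&gt;
The serial check above is done by eye. As a hedged sketch, assuming a POSIX &amp;lt;code&amp;gt;sh&amp;lt;/code&amp;gt;: the hypothetical helper &amp;lt;code&amp;gt;serial_matches&amp;lt;/code&amp;gt; reads &amp;lt;code&amp;gt;udevadm info&amp;lt;/code&amp;gt; output on stdin and succeeds only when &amp;lt;code&amp;gt;ID_SERIAL&amp;lt;/code&amp;gt; equals the expected value. For this change, sda should no longer match the failed disk&#039;s serial.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper (not part of the original CHG): read the output of
# "udevadm info --query=property --name DISK" on stdin and succeed only if
# its ID_SERIAL value equals the serial passed as $1.
serial_matches() {
	expected="$1"
	# "^ID_SERIAL=" does not match the separate ID_SERIAL_SHORT line
	actual="$(grep '^ID_SERIAL=' | cut -d= -f2-)"
	[ "$actual" = "$expected" ]
}

# Hypothetical usage on the server (sda must NOT still be the old, failed disk):
#   if udevadm info --query=property --name sda | serial_matches "Crucial_CT250MX200SSD1_154410FA336C"; then
#       echo "sda is still the old disk; stop here"
#   fi
```
&lt;br /&gt;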
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
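&lt;br /&gt;
To confirm the copy, the two dumps can be compared with the device names masked. The sketch below assumes a POSIX &amp;lt;code&amp;gt;sh&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;same_layout&amp;lt;/code&amp;gt; is a hypothetical helper, and it assumes that the &amp;lt;code&amp;gt;sfdisk -d&amp;lt;/code&amp;gt; output of the two disks differs only in their device names.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper (not part of the original CHG): compare two
# "sfdisk -d" dumps after replacing each disk's device name with a
# placeholder, so that only the partition layouts are compared.
# Arguments: DUMP_A DISK_A DUMP_B DISK_B
same_layout() {
	a="$(printf '%s\n' "$1" | sed "s#/dev/$2#/dev/DISK#g")"
	b="$(printf '%s\n' "$3" | sed "s#/dev/$4#/dev/DISK#g")"
	[ "$a" = "$b" ]
}

# Hypothetical usage on the server, after the copy above:
#   same_layout "$(sfdisk -d /dev/sdb)" sdb "$(sfdisk -d /dev/sda)" sda || echo "layouts differ"
```
&lt;br /&gt;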
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
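Install the GRUB boot loader onto the new disk with &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (CentOS 7&#039;s name for &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;), so the server can still boot from sda if sdb fails&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;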
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
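&lt;br /&gt;
Rather than reading all of &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt; on every iteration, its per-array state can be summarized. The sketch below assumes a POSIX &amp;lt;code&amp;gt;sh&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;awk&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;md_summary&amp;lt;/code&amp;gt; is a hypothetical helper with two rules. An underscore in the member map (e.g. &amp;lt;code&amp;gt;[_U]&amp;lt;/code&amp;gt;) counts as degraded. A &amp;lt;code&amp;gt;recovery&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;resync&amp;lt;/code&amp;gt; progress line counts as syncing.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/sh
# Hypothetical helper (not part of the original CHG): read /proc/mdstat-style
# text on stdin and print one status line per md array:
#   "mdN syncing"   if a recovery/resync progress line follows its header
#   "mdN degraded"  else if its member map (e.g. [_U]) contains an underscore
#   "mdN ok"        otherwise
md_summary() {
	awk '
		/^md[0-9]+ :/     { if (name != "") print name, state; name = $1; state = "ok"; next }
		/\[[U_]+\]/       { if ($NF ~ /_/) state = "degraded" }
		/recovery|resync/ { state = "syncing" }
		END               { if (name != "") print name, state }
	'
}

# Hypothetical usage inside the monitoring loop above:
#   cat /proc/mdstat | md_summary
```
&lt;br /&gt;
The mirror has been rebuilt once every array reports &amp;lt;code&amp;gt;ok&amp;lt;/code&amp;gt;.&lt;br /&gt;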
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not be possible. It would require contacting Hetzner and asking them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306174</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306174"/>
		<updated>2025-04-30T14:22:41Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:26 UTC==&lt;br /&gt;
&lt;br /&gt;
# I got a response back from hetzner 4 minutes later&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client.&lt;br /&gt;
&lt;br /&gt;
We do not have these drives &amp;quot;new&amp;quot; anymore. Therefore, this is not possible. We already selected a drive with less than 20.000h. We also did not charge the fee for a new drive.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we got the drive free, but that&#039;s still nearly a waste of my time. I replied and asked them how long it would take for them to order a new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I emailed last week about this to make sure you had time to order a new drive (check my support tickets).&lt;br /&gt;
&lt;br /&gt;
This drive you inserted has only 32% of its life left, according to SMART. It&#039;s closer to dead than new.&lt;br /&gt;
&lt;br /&gt;
How long would it take you to order a new drive?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
We&#039;re still waiting on Hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been powered on for 18,623 hours and, according to SMART attribute 202 (Percent_Lifetime_Remain), has only 32% of its life left.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use), and we&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I asked them to insert a genuinely new drive with &amp;lt;1,000 hours of use.&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that SMART reports /dev/sdb as PASSED and /dev/sda as FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I confirmed again the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I&#039;m removing sda from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs EOL CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output said that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror across the two disks might take a day or more. During this time we&#039;ll be vulnerable because we&#039;ll have only one working disk (no redundancy). The risk is compounded because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before doing anything else, check the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
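&lt;br /&gt;
Rather than eyeballing the output, this check can also be scripted. The following is a minimal sketch, not an existing tool. The function name raid_all_synced is hypothetical. It assumes that each array&#039;s member status appears in mdstat as a bracketed run of U and _ characters, for example [UU] when healthy and [_U] when one member is missing.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every md array in the given mdstat
# file (default /proc/mdstat) has all of its members up.
raid_all_synced() {
  local mdstat="${1:-/proc/mdstat}"
  local status
  # extract the member-status fields, e.g. "[UU]" or "[_U]"
  status=$(grep -oE '\[[U_]+\]' "$mdstat")
  # fail if no arrays were found at all
  [ -n "$status" ] || return 1
  # fail if any member is missing (shown as "_")
  ! printf '%s\n' "$status" | grep -q '_'
}
```
&lt;br /&gt;
Run raid_all_synced with no argument to check /proc/mdstat. A non-zero exit status means that at least one array is degraded, or that no arrays were found.&lt;br /&gt;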
&lt;br /&gt;
Before removing the failing disk from the RAID (which leaves us with no redundancy), confirm that today&#039;s backup was successfully uploaded to Backblaze:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
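&lt;br /&gt;
The check that today&#039;s backup is a sane size can also be made explicit. This is a minimal sketch with a hypothetical function name, backup_is_sane. It reads rclone ls output (size in bytes, then file name) on stdin. Its 10 GB default minimum is an assumed threshold rather than a documented one, so adjust it to match our usual backup size.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read "SIZE NAME" lines (as printed by rclone ls) on
# stdin and succeed if some file whose name contains the date stamp
# (YYYYMMDD) is at least min_bytes large. The 10 GB default is an assumption.
backup_is_sane() {
  local stamp="$1"
  local min_bytes="${2:-10000000000}"
  awk -v stamp="$stamp" -v min="$min_bytes" '
    index($2, stamp) { if ($1 + 0 >= min + 0) found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```
&lt;br /&gt;
For example: ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_is_sane $(date +%Y%m%d)&lt;br /&gt;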
&lt;br /&gt;
Sometime in the German morning (ideally shortly after our daily backups complete), run these commands to remove the failing drive from our RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
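&lt;br /&gt;
Because these commands all follow the same pattern, they can be generated instead of typed. This sketch uses a hypothetical function, print_raid_removal. It only prints the commands, so they can be reviewed before anything runs. It assumes the layout used on hetzner2, where partition N of each disk belongs to /dev/md(N-1).&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print (do not run) the mdadm commands that first mark
# each partition of a disk as failed and then hot-remove it from its array.
# Assumes partition N belongs to /dev/md(N-1), as on hetzner2.
print_raid_removal() {
  local disk="$1"   # e.g. sda
  local n
  for n in 1 2 3; do
    echo "mdadm --manage /dev/md$((n - 1)) --fail /dev/${disk}${n}"
  done
  for n in 1 2 3; do
    echo "mdadm /dev/md$((n - 1)) -r /dev/${disk}${n}"
  done
}
```
&lt;br /&gt;
After reviewing the output of print_raid_removal sda, you can execute the commands by piping them to bash.&lt;br /&gt;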
&lt;br /&gt;
Log into the Hetzner Robot web UI at https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
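&lt;br /&gt;
Before sending the request, it can help to print just the SMART verdict for each disk, to be sure we are asking Hetzner to swap the right one. This is a minimal sketch with a hypothetical function name, smart_verdict. It extracts the verdict from smartctl -H output read on stdin.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print only the verdict (e.g. "PASSED" or "FAILED!")
# from "smartctl -H" output read on stdin.
smart_verdict() {
  sed -n 's/^SMART overall-health self-assessment test result: //p'
}
```
&lt;br /&gt;
For example: for d in sda sdb; do printf '%s: ' $d; smartctl -H /dev/$d | smart_verdict; done&lt;br /&gt;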
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. In the meantime, you can watch the kernel log for the drive being detached and attached:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
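&lt;br /&gt;
The kernel log can be noisy. This sketch uses a hypothetical function, disk_events, that keeps only lines mentioning a SATA link (ataN) or an sdX disk. The patterns are an assumption based on typical libata and SCSI disk messages, and they may need adjusting for what our kernel actually prints.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical filter: pass through only kernel-log lines that mention a
# SATA link (e.g. "ata1") or an sdX disk (e.g. "[sda]"). The patterns are
# assumptions; --line-buffered keeps output flowing when following dmesg -w.
disk_events() {
  grep -E --line-buffered 'ata[0-9]+|sd[a-z]'
}
```
&lt;br /&gt;
Usage: dmesg -w | disk_events&lt;br /&gt;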
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sdb is unchanged and that sda is the new disk&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
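&lt;br /&gt;
The serial number comparison can also be made explicit. This sketch uses a hypothetical function, serial_changed. It reads udevadm info --query=property output on stdin and succeeds only if ID_SERIAL_SHORT differs from the serial of the disk we asked Hetzner to remove (154410FA336C, taken from this ticket).&lt;br /&gt;
&lt;br /&gt;
```shell
# Serial of the disk being replaced, taken from this CHG ticket.
OLD_SERIAL="154410FA336C"

# Hypothetical helper: read udevadm property output on stdin and succeed
# only if ID_SERIAL_SHORT is present and differs from OLD_SERIAL.
serial_changed() {
  local serial
  serial=$(sed -n 's/^ID_SERIAL_SHORT=//p')
  [ -n "$serial" ] || return 1
  [ "$serial" != "$OLD_SERIAL" ]
}
```
&lt;br /&gt;
For example: udevadm info --query=property --name sda | serial_changed || echo 'sda still has the old disk'&lt;br /&gt;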
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
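&lt;br /&gt;
du -sh only shows that the files exist. This sketch uses a hypothetical function, verify_parttable_backups, for a slightly stricter check. Every *_parttable_mbr.bak file in the directory must be non-empty and contain at least one partition line, which sfdisk dumps write as /dev/sdXN : start=...&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every *_parttable_mbr.bak file in
# the given directory is non-empty and lists at least one partition
# (sfdisk dump lines start with "/dev/").
verify_parttable_backups() {
  local dir="$1" f found=0
  for f in "$dir"/*_parttable_mbr.bak; do
    # an unmatched glob stays literal and fails the -s test
    [ -s "$f" ] || return 1
    grep -q '^/dev/' "$f" || return 1
    found=1
  done
  [ "$found" -eq 1 ]
}
```
&lt;br /&gt;
Usage: verify_parttable_backups $chg_dir&lt;br /&gt;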
&lt;br /&gt;
Copy the partition table from the surviving disk (/dev/sdb) to the new disk (/dev/sda):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
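&lt;br /&gt;
To confirm that the copy worked, you can compare the partition entries of both disks. This sketch uses a hypothetical function, same_layout, which compares two sfdisk -d dumps after stripping the device name from each partition line. Header lines such as label-id or device are deliberately ignored, so it only checks each partition&#039;s start, size and type.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: compare the partition lines of two "sfdisk -d" dumps,
# ignoring the device name, so "/dev/sdb1 : start=..." and
# "/dev/sda1 : start=..." match when start, size and type are equal.
same_layout() {
  local a b
  a=$(printf '%s\n' "$1" | sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) *:|\1 :|p')
  b=$(printf '%s\n' "$2" | sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) *:|\1 :|p')
  # refuse to call two empty layouts "the same"
  [ -n "$a" ] || return 1
  [ "$a" = "$b" ]
}
```
&lt;br /&gt;
For example: same_layout "$(sfdisk -d /dev/sdb)" "$(sfdisk -d /dev/sda)" || echo 'partition tables differ'&lt;br /&gt;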
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
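&lt;br /&gt;
To confirm that the kernel picked up the new table, count the disk&#039;s partitions in /proc/partitions. On hetzner2 we would expect 3 for sda. This is a sketch with a hypothetical function name, count_partitions. The optional second argument exists only so that it can be tested against a saved copy of the file.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print how many partitions of the given disk (e.g.
# sda1, sda2, ... for sda) the kernel lists in /proc/partitions.
count_partitions() {
  local disk="$1" file="${2:-/proc/partitions}"
  # column 4 of /proc/partitions is the device name
  awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { n++ } END { print n + 0 }' "$file"
}
```
&lt;br /&gt;
Usage: count_partitions sda&lt;br /&gt;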
&lt;br /&gt;
Now add the new drive&#039;s partitions to the RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
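&lt;br /&gt;
After adding the partitions, each array should list an sda partition as a member again. This sketch uses a hypothetical function, disk_in_all_arrays, to check mdstat for that. The array names md0, md1 and md2 are the ones used on hetzner2.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if md0, md1 and md2 each list a
# partition of the given disk as a member, e.g. "md0 : active raid1 sda1[3] sdb1[2]".
disk_in_all_arrays() {
  local disk="$1" mdstat="${2:-/proc/mdstat}" md
  for md in md0 md1 md2; do
    grep -E "^$md : " "$mdstat" | grep -qE "(^| )${disk}[0-9]+\[" || return 1
  done
}
```
&lt;br /&gt;
For example: disk_in_all_arrays sda || echo 'an sda partition is missing from an array'&lt;br /&gt;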
&lt;br /&gt;
Install the GRUB bootloader onto the new disk using &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;, so that the server can still boot from it if /dev/sdb fails:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# on CentOS 7 (GRUB 2), this command may be named grub2-install instead&lt;br /&gt;
grub-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
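&lt;br /&gt;
As a quick sanity check after installing GRUB, you can search the first 512-byte sector (the MBR) of each disk for the GRUB string that GRUB&#039;s boot sector normally embeds. This sketch uses a hypothetical function, mbr_has_grub, and is only a heuristic. It does not prove that the disk is bootable; that is what the test reboot is for.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical heuristic: succeed if the first 512-byte sector of the given
# disk (or image file) contains the string "GRUB".
mbr_has_grub() {
  dd if="$1" bs=512 count=1 2>/dev/null | grep -aq 'GRUB'
}
```
&lt;br /&gt;
For example: for d in sda sdb; do mbr_has_grub /dev/$d || echo "no GRUB string on $d"; done&lt;br /&gt;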
&lt;br /&gt;
Run this loop to monitor the progress of the RAID rebuild; it prints /proc/mdstat every 5 minutes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
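&lt;br /&gt;
Instead of watching the loop indefinitely, you can use a variant that stops on its own once no array reports recovery or resync any more. This sketch uses hypothetical function names, raid_syncing and wait_for_raid_sync, and keeps the same 300-second interval. The mdstat path argument exists only for testing.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed while any array in mdstat is still
# recovering or resyncing (including queued "resync=DELAYED" arrays).
raid_syncing() {
  grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Hypothetical helper: print progress every 300 seconds until the sync ends.
wait_for_raid_sync() {
  while raid_syncing "$1"; do
    date
    grep -E 'recovery|resync' "${1:-/proc/mdstat}"
    sleep 300
  done
  echo "RAID sync complete"
}
```
&lt;br /&gt;
Usage: wait_for_raid_sync&lt;br /&gt;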
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It&#039;s unclear whether this is even possible, but we would have to contact Hetzner and ask them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306173</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306173"/>
		<updated>2025-04-30T14:21:15Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: hetzner gave us a very used drive :(&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 13:20 UTC==&lt;br /&gt;
&lt;br /&gt;
We&#039;re still waiting on Hetzner.&lt;br /&gt;
&lt;br /&gt;
Hetzner replaced the drive with one that has already been powered on for 18,623 hours, and SMART reports that it has only 32% of its life left.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18623&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       9&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   032   032   000    Old_age   Always       -       1030&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   068   047   000    Old_age   Always       -       32 (Min/Max 23/53)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   032   032   001    Old_age   Offline      -       68&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       96994281182&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       3059820027&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       31429771271&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2467&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 13:23:39 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
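&lt;br /&gt;
The 32% figure comes from SMART attribute 202 (Percent_Lifetime_Remain). In the outputs in this ticket, its normalized VALUE is the percent of life remaining and its RAW_VALUE is the percent already used (032 and 68 here). This sketch uses a hypothetical function, lifetime_remaining, to print VALUE from smartctl -A output read on stdin.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the normalized VALUE (column 4) of attribute
# 202, Percent_Lifetime_Remain, from "smartctl -A" output on stdin.
lifetime_remaining() {
  awk '$1 == 202 { print $4 + 0 }'
}
```
&lt;br /&gt;
Usage: smartctl -A /dev/sda | lifetime_remaining&lt;br /&gt;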
&lt;br /&gt;
In the remote-hands support request, I was very clear that they should replace it with a new drive (with &amp;lt;1,000 hours of use). We&#039;re paying for that.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve now asked them to swap it for an actually new drive with &amp;lt;1,000 hours of use.&lt;br /&gt;
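&lt;br /&gt;
Our criterion for a new drive (&amp;lt;1,000 hours of use) can be checked directly against SMART attribute 9 (Power_On_Hours). This sketch uses a hypothetical function, is_nearly_new. It reads smartctl -A output on stdin and succeeds only if the RAW_VALUE is below the limit, which defaults to 1000.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if Power_On_Hours (attribute 9,
# RAW_VALUE in column 10) from "smartctl -A" output on stdin is below
# the limit (default 1000 hours).
is_nearly_new() {
  local limit="${1:-1000}" hours
  hours=$(awk '$1 == 9 { print $10 }')
  [ -n "$hours" ] || return 1
  [ "$hours" -lt "$limit" ]
}
```
&lt;br /&gt;
For example: smartctl -A /dev/sda | is_nearly_new || echo 'sda is not a new drive'&lt;br /&gt;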
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# I confirmed that today&#039;s backup (from a few hours ago) is a sane size and was uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that /dev/sdb PASSED and /dev/sda FAILED the SMART health check&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that the serial number of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I removed sda from the RAID&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I submitted the support request to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, a database corruption event took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, SMART reported that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the raid, and updating grub should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable because we&#039;ll only have one disk (no redundancy). The risk is higher than usual because SMART currently reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
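&lt;br /&gt;
The check above relies on reading the output by eye. A minimal pass/fail sketch follows. It assumes every array is a two-member RAID1 (like md0, md1 and md2 on hetzner2), so a fully mirrored array shows [UU] and a degraded one shows an underscore, such as [_U]. The function name is hypothetical and does not refer to an existing tool.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every md array in an mdstat-format
# file is fully mirrored (no "_" in its [..] member map).
mdstat_all_healthy() {
	local mdstat_file="${1:-/proc/mdstat}"
	# each array has one status line, e.g. "33521664 blocks super 1.2 [2/2] [UU]"
	if ! grep -q 'blocks' "$mdstat_file"; then
		echo "NO ARRAYS"
		return 1
	fi
	# an underscore in the member map marks a missing half of the mirror
	if grep 'blocks' "$mdstat_file" | grep -qE '\[[U_]*_[U_]*]'; then
		echo "DEGRADED"
		return 1
	fi
	echo "HEALTHY"
}

# usage: mdstat_all_healthy /proc/mdstat || echo "RAID is not healthy; do not proceed"
```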
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
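&lt;br /&gt;
The grep above only shows whether a file with today&#039;s date exists. The sketch below also checks the file against a minimum size. The helper name is hypothetical. The 10000000000-byte (10 GB) threshold in the usage line is a placeholder; base the real value on the sizes of recent backups.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read "rclone ls" output (lines of "SIZE NAME") on stdin
# and succeed only if a file whose name contains STAMP is at least MIN_BYTES.
backup_is_sane() {
	local stamp="$1" min_bytes="$2"
	awk -v stamp="$stamp" -v min="$min_bytes" '
		index($2, stamp) {
			if ($1 + 0 >= min + 0) { found = 1; print "OK " $2 } else { print "TOO SMALL " $2 " (" $1 " bytes)" }
		}
		END { exit (found ? 0 : 1) }'
}

# usage (after sourcing /root/backups/backup.settings):
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_is_sane "$(date '+%Y%m%d')" 10000000000
```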
&lt;br /&gt;
Shortly after our daily backups complete (around the German morning), run these commands to remove sda from our RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
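&lt;br /&gt;
After these commands, no sda partition should be listed as a member of any array. The sketch below checks this with a hypothetical helper. It prints any remaining members of a given disk, such as sda1[0], or sda1[0](F) for a member marked faulty but not yet removed. It returns non-zero when it finds none.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print members of DISK (e.g. "sda") that are still
# listed on the "mdN : active ..." lines of an mdstat-format file.
# Returns non-zero (grep found nothing) when the disk is in no array.
md_members_of_disk() {
	local disk="$1" mdstat_file="${2:-/proc/mdstat}"
	grep -E '^md[0-9]+ :' "$mdstat_file" | grep -oE "${disk}[0-9]+\[[0-9]+](\([A-Z]\))?"
}

# usage: if md_members_of_disk sda /proc/mdstat; then echo "sda is still in a RAID array"; fi
```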
&lt;br /&gt;
Log into the Hetzner Robot web interface: https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. Meanwhile, watch the kernel log for hot-swap events:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sdb is unchanged and sda is new&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
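&lt;br /&gt;
Before touching any partition table, stop early in either of two cases. The first is that sda is still the old disk (serial 154410FA336C, as given in the Hetzner request above). The second is that the new disk is smaller than sdb, because sdb&#039;s partitions would not fit on it. Below is a sketch with a hypothetical helper. The serial and sizes are passed in, so the check itself does not read the disks. The usage lines assume that udevadm reports an ID_SERIAL_SHORT property, as it does in the output recorded in this ticket.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: refuse to continue unless the disk now at /dev/sda is
# not the one we asked Hetzner to pull, and is at least as large as sdb
# (sdb's partition table cannot be copied onto a smaller disk).
OLD_SDA_SERIAL="154410FA336C"   # ID_SERIAL_SHORT of the failed disk, from this CHG

disk_replaced_ok() {
	local new_serial="$1" new_bytes="$2" survivor_bytes="$3"
	if [ -z "$new_serial" ] || [ "$new_serial" = "$OLD_SDA_SERIAL" ]; then
		echo "sda is missing or still the old disk; do not continue"
		return 1
	fi
	if [ "$new_bytes" -lt "$survivor_bytes" ]; then
		echo "new disk ($new_bytes bytes) is smaller than sdb ($survivor_bytes bytes)"
		return 1
	fi
	echo "OK"
}

# usage:
#   serial=$(udevadm info --query=property --name sda | sed -n 's/^ID_SERIAL_SHORT=//p')
#   disk_replaced_ok "$serial" "$(blockdev --getsize64 /dev/sda)" "$(blockdev --getsize64 /dev/sdb)"
```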
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
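&lt;br /&gt;
A file size from du does not show whether a dump actually contains partitions. sfdisk dumps have one line per partition starting with the device name. Older sfdisk versions, such as the one on CentOS 7, may also list unused slots with a size of 0. The sketch below uses a hypothetical helper to count the non-empty entries, so that an empty or truncated backup of sdb is noticed before any table is overwritten. The backup of the brand-new sda may legitimately list none.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the number of non-empty partition entries in an
# sfdisk dump and succeed only if there is at least one.
# Entry lines look like "/dev/sdb1 : start=     2048, size= 67110912, Id=fd";
# unused slots in the older format look like "... size=        0, Id= 0".
count_dumped_partitions() {
	awk '
		/^\/dev\/[a-z]+[0-9]+ *:/ { if ($0 !~ /size= *0,/) n++ }
		END { print n + 0; exit (n > 0 ? 0 : 1) }' "$1"
}

# usage: count_dumped_partitions "${chg_dir}/sdb_parttable_mbr.bak"
```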
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
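&lt;br /&gt;
Once the table has been copied, the two disks should have the same layout apart from their device names. The sketch below uses a hypothetical helper to compare two fresh dumps. It replaces /dev/sdX names with a common placeholder and ignores comment lines. If a dump contains other disk-specific fields, they will appear in the diff output for manual review.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: compare two sfdisk dumps after normalising device names,
# so "/dev/sda1 : start=2048, ..." and "/dev/sdb1 : start=2048, ..." compare equal.
# Comment lines (e.g. "# partition table of /dev/sdb") are ignored.
same_partition_layout() {
	local a_norm b_norm rc
	a_norm=$(mktemp)
	b_norm=$(mktemp)
	sed -e '/^#/d' -e 's#/dev/sd[a-z]#/dev/sdX#g' "$1" > "$a_norm"
	sed -e '/^#/d' -e 's#/dev/sd[a-z]#/dev/sdX#g' "$2" > "$b_norm"
	diff -u "$a_norm" "$b_norm"
	rc=$?
	rm -f "$a_norm" "$b_norm"
	return "$rc"
}

# usage:
#   sfdisk -d /dev/sdb > "${chg_dir}/sdb_after.dump"
#   sfdisk -d /dev/sda > "${chg_dir}/sda_after.dump"
#   same_partition_layout "${chg_dir}/sdb_after.dump" "${chg_dir}/sda_after.dump"
```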
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
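&lt;br /&gt;
The re-read can fail, for example with a busy error if a partition on that disk is still in use. The sketch below confirms the result with a hypothetical helper. It looks up the expected partitions in /proc/partitions, whose columns are major, minor, #blocks and name.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every named partition appears in the
# "name" column (4th field) of a /proc/partitions-format file.
kernel_sees_partitions() {
	local partitions_file="$1"
	shift
	local part missing=0
	for part in "$@"; do
		if ! awk -v p="$part" '$4 == p { found = 1 } END { exit !found }' "$partitions_file"; then
			echo "missing: $part"
			missing=1
		fi
	done
	return "$missing"
}

# usage: kernel_sees_partitions /proc/partitions sda1 sda2 sda3
```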
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
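&lt;br /&gt;
The three commands above hard-code the pairings md0/sda1, md1/sda2 and md2/sda3. As a cross-check, a hypothetical helper can derive the pairing from the surviving disk&#039;s entries in /proc/mdstat. For example, md2 : active raid1 sdb3[2] implies that md2 gets sda3. The helper only prints the matching mdadm commands for review and does not run them.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: for each "mdN : active ..." line that lists a partition
# of SURVIVOR (e.g. sdb3[2]), print the mdadm command that adds the partition
# with the same number on NEW_DISK (e.g. sda3). Nothing is executed.
plan_md_add() {
	local survivor="$1" new_disk="$2" mdstat_file="${3:-/proc/mdstat}"
	awk -v s="$survivor" -v n="$new_disk" '
		/^md[0-9]+ :/ {
			for (i = NF; i >= 1; i--) {
				if (index($i, s) == 1) {
					part = substr($i, length(s) + 1)
					sub(/\[.*/, "", part)
					print "mdadm /dev/" $1 " -a /dev/" n part
				}
			}
		}' "$mdstat_file"
}

# usage: plan_md_add sdb sda /proc/mdstat   (review the output, then run it)
```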
&lt;br /&gt;
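Copy our GRUB boot code onto the new disk so that the server can still boot from it if sdb fails. On CentOS 7, the GRUB2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (there is no &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;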
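&lt;br /&gt;
After the GRUB install step above, a quick heuristic check is possible. GRUB&#039;s MBR boot image contains the ASCII string GRUB as part of its on-screen messages. If that string is missing from the first 512 bytes of sda, the boot code was probably not written. The sketch below uses a hypothetical helper. It is a heuristic, not proof of a bootable disk, and reading /dev/sda requires root.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (heuristic): succeed if the first 512 bytes (the MBR)
# of a disk or image file contain the string "GRUB".
mbr_has_grub() {
	head -c 512 "$1" | grep -aq 'GRUB'
}

# usage: if mbr_has_grub /dev/sda; then echo "GRUB boot code found on sda"; fi
```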
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
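&lt;br /&gt;
The loop above never ends by itself. The sketch below uses hypothetical helper names and stops polling once /proc/mdstat no longer shows a running recovery (a line containing recovery = and a percentage) or a pending resync=DELAYED. It assumes the progress lines keep that format. Start it only after the recovery has begun, or it will exit immediately.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helpers: same polling as above, but the loop ends on its own
# once no array reports a running or pending "recovery" / "resync".
md_sync_in_progress() {
	grep -qE '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

wait_for_md_sync() {
	local interval="${1:-300}"
	while md_sync_in_progress /proc/mdstat; do
		date
		grep -E 'recovery|resync' /proc/mdstat
		echo
		sleep "$interval"
	done
	date
	cat /proc/mdstat
}

# usage (a minute or so after the "mdadm ... -a" commands): wait_for_md_sync 300
```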
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
We are not sure this is even possible, but we would have to contact Hetzner and ask them to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306172</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306172"/>
		<updated>2025-04-30T11:45:02Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: status update: removed sda from RAID and submitted remote hands request to swap disk&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:44 UTC==&lt;br /&gt;
&lt;br /&gt;
# I confirmed that the RAID is currently healthy&lt;br /&gt;
# and today&#039;s backup (from a few hours ago) is sane and uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20133744108 daily_hetzner3_20250430_080904.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed again that /dev/sdb is PASSED and /dev/sda is FAILED&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that our &amp;quot;new&amp;quot; (used) /dev/sdb (replaced last week) still has 4% of its life left (no change from last week)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52223&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1452&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       29&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   049   000    Old_age   Always       -       36 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       601634812550&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18904241237&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11849811867&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78658&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3454&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       56&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   062   046   000    Old_age   Always       -       38 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       408221767008&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12873452848&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26389101858&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I confirmed again the serial of the disk we want to replace matches the one listed in this CHG ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# OK, I&#039;m removing sda from the RAID&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:06 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm: set /dev/sda1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm: set /dev/sda2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm: set /dev/sda3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm: hot removed /dev/sda1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm: hot removed /dev/sda2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
mdadm: hot removed /dev/sda3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [_U]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Wed Apr 30 11:38:58 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I submitted the request for support to swap the disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SMART says disk is FAILED and needs to be replaced asap.&lt;br /&gt;
&lt;br /&gt;
I&#039;ve removed /dev/sda (Crucial_CT250MX200SSD1_154410FA336C) from the RAID, and it is now ready to be replaced with a new disk (with &amp;lt;1,000 hours of operation). Please replace the disk asap.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
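&lt;br /&gt;
The life-left percentages quoted in this log come from attribute 202 (Percent_Lifetime_Remain) in the smartctl -A output. In the outputs above, the normalised VALUE column holds the life left (004 on the replacement sdb, 000 on sda). RAW_VALUE holds the percentage used (96 and 100). Because sda&#039;s VALUE of 000 is at or below its THRESH of 001, smartctl marks that attribute FAILING_NOW. Below is a sketch of a hypothetical helper that extracts the VALUE column.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# normalised VALUE (4th column) of attribute 202, Percent_Lifetime_Remain.
# Fails if the attribute is not present (it is vendor specific).
lifetime_remaining() {
	awk '$1 == 202 { print $4 + 0; found = 1 } END { exit !found }'
}

# usage: smartctl -A /dev/sdb | lifetime_remaining
```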
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
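&lt;br /&gt;
The local times above can be reproduced offline with GNU date and the system&#039;s time zone database. The sketch below uses a hypothetical helper. It assumes the IANA zone names America/Chicago (Kansas City), America/Guayaquil, and Europe/Berlin (for the server in Germany). The tzdata files must be installed.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print a UTC time ("YYYY-MM-DD HH:MM") in each zone
# relevant to this change, using GNU date's -d option.
show_change_time() {
	local utc="$1" zone
	for zone in America/Chicago America/Guayaquil Europe/Berlin; do
		printf '%s %s\n' "$zone" "$(TZ="$zone" date -d "$utc UTC" '+%Y-%m-%d %H:%M')"
	done
}

# usage: show_change_time '2025-04-30 11:00'
```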
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) in [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, a database corruption event took down all of the websites on hetzner2. The database would not start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an end-of-life (EOL) version of CentOS, we cannot update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
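&lt;br /&gt;
The failure verdict above comes from attribute 202 (Percent_Lifetime_Remain). On both disks its normalized VALUE (000) has dropped below its THRESH (001), which is why the WHEN_FAILED column reads FAILING_NOW, even though the error counters (Raw_Read_Error_Rate, Reported_Uncorrect, Write_Error_Rate, etc.) are all still 0. This is a minimal sketch for pulling just that value out of &amp;lt;code&amp;gt;smartctl -A&amp;lt;/code&amp;gt; output. The helper name &amp;lt;code&amp;gt;lifetime_remaining&amp;lt;/code&amp;gt; is hypothetical, and the sketch assumes the column layout shown above. SMART attribute IDs are vendor-specific, so 202 may mean something else on a non-Crucial disk.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# lifetime_remaining (hypothetical helper): read smartctl -A output on stdin&lt;br /&gt;
# and print the normalized VALUE (4th column) of SMART attribute 202, which&lt;br /&gt;
# these Crucial SSDs label Percent_Lifetime_Remain; exit 1 if it is absent&lt;br /&gt;
lifetime_remaining() {&lt;br /&gt;
  awk &#039;$1 == 202 { print $4 + 0; found = 1 } END { exit !found }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# example: smartctl -A /dev/sda | lifetime_remaining&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given the output above, this would print 0 for both /dev/sda and /dev/sdb.&lt;br /&gt;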
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll have no redundancy, because only one disk will hold a complete copy of the data. The risk is compounded because both disks currently report that they are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
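&lt;br /&gt;
The same check can be scripted so that a degraded array stops the change before any disk is failed. This is a minimal sketch. It assumes, as on hetzner2, that every array is a two-member RAID1 mirror, so a healthy array shows [UU] in /proc/mdstat and a degraded one shows [U_] or [_U]. The helper name &amp;lt;code&amp;gt;mdstat_all_healthy&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# mdstat_all_healthy [FILE] (hypothetical helper): succeed only if FILE&lt;br /&gt;
# (default /proc/mdstat) is readable and no two-member mirror is degraded&lt;br /&gt;
mdstat_all_healthy() {&lt;br /&gt;
  local f=&amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
  [ -r &amp;quot;$f&amp;quot; ] || return 2&lt;br /&gt;
  ! grep -Eq &#039;\[(U_|_U|__)\]&#039; &amp;quot;$f&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
if mdstat_all_healthy; then&lt;br /&gt;
  echo &amp;quot;all arrays report [UU]; OK to proceed&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;WARNING: an array is degraded (or mdstat is unreadable); stop here&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;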
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
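&lt;br /&gt;
The grep above only shows that an object with today&#039;s date exists. This minimal sketch also rejects a suspiciously small upload. It assumes that &amp;lt;code&amp;gt;rclone ls&amp;lt;/code&amp;gt; prints one size-in-bytes and object-name pair per line and that object names contain no spaces. The helper name &amp;lt;code&amp;gt;backup_is_sane&amp;lt;/code&amp;gt; and the 1 GB floor in the example are hypothetical placeholders; pick a floor based on the size of recent good backups.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# backup_is_sane MIN_BYTES DATESTAMP (hypothetical helper): read rclone ls&lt;br /&gt;
# output on stdin and succeed if an object whose name contains DATESTAMP&lt;br /&gt;
# is at least MIN_BYTES in size&lt;br /&gt;
backup_is_sane() {&lt;br /&gt;
  awk -v min=&amp;quot;$1&amp;quot; -v d=&amp;quot;$2&amp;quot; &#039;index($2, d) &amp;gt; 0 &amp;amp;&amp;amp; $1 + 0 &amp;gt;= min + 0 { found = 1 } END { exit !found }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# example, reusing the variables from backup.settings sourced above:&lt;br /&gt;
# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | backup_is_sane 1000000000 &amp;quot;$(date &amp;quot;+%Y%m%d&amp;quot;)&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;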
&lt;br /&gt;
Some time in the German morning (and very shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
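&lt;br /&gt;
The serial comparison can also be made explicit, so that repartitioning never starts on the wrong disk. This is a minimal sketch. It assumes that the identifier given to Hetzner above (Crucial_CT250MX200SSD1_154410FA336C) is the old disk&#039;s ID_SERIAL value as reported by udevadm. The helper name &amp;lt;code&amp;gt;disk_was_swapped&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# disk_was_swapped OLD_SERIAL (hypothetical helper): read udevadm property&lt;br /&gt;
# output on stdin; succeed only if an ID_SERIAL line is present and its value&lt;br /&gt;
# differs from OLD_SERIAL (ID_SERIAL_SHORT lines do not match the pattern)&lt;br /&gt;
disk_was_swapped() {&lt;br /&gt;
  local serial&lt;br /&gt;
  serial=$(sed -n &#039;s/^ID_SERIAL=//p&#039;)&lt;br /&gt;
  [ -n &amp;quot;$serial&amp;quot; ] &amp;amp;&amp;amp; [ &amp;quot;$serial&amp;quot; != &amp;quot;$1&amp;quot; ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
old_sda=&amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
if udevadm info --query=property --name sda | disk_was_swapped &amp;quot;$old_sda&amp;quot;; then&lt;br /&gt;
  echo &amp;quot;sda reports a new serial&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;WARNING: sda still reports the old serial (or none); do not repartition&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;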
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the old disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
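&lt;br /&gt;
To confirm the copy, the partition table read back from the new disk can be compared with the backup of the old disk taken in the previous step. This is a minimal bash sketch (it relies on process substitution) that assumes the backup file names used above. The helper name &amp;lt;code&amp;gt;same_layout&amp;lt;/code&amp;gt; and the file name sda_parttable_new.txt in the example are hypothetical. The helper masks the device names (/dev/sda vs /dev/sdb) before diffing, so only real differences in partition start, size, or type are printed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# same_layout OLD_DUMP NEW_DUMP (hypothetical helper): succeed if two&lt;br /&gt;
# sfdisk --dump files are identical once /dev/sdX device names are masked&lt;br /&gt;
same_layout() {&lt;br /&gt;
  diff &amp;lt;(sed &#039;s#/dev/sd[a-z]#/dev/sdX#g&#039; &amp;quot;$1&amp;quot;) &amp;lt;(sed &#039;s#/dev/sd[a-z]#/dev/sdX#g&#039; &amp;quot;$2&amp;quot;)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# example, after the sfdisk copy above:&lt;br /&gt;
# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_new.txt&lt;br /&gt;
# same_layout ${chg_dir}/sdb_parttable_mbr.bak ${chg_dir}/sda_parttable_new.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;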
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
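Copy our GRUB configuration and files onto the new disk using &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt; (on CentOS 7, this command is usually named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;)&lt;br /&gt;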
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
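&lt;br /&gt;
The monitoring loop above runs until it is interrupted with Ctrl-C. This minimal sketch shows a variant that returns on its own once the rebuild is done. It assumes that /proc/mdstat mentions &amp;quot;recovery&amp;quot; or &amp;quot;resync&amp;quot; only while a rebuild is running or queued. The helper name &amp;lt;code&amp;gt;wait_for_raid_sync&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# wait_for_raid_sync [FILE] [SECONDS] (hypothetical helper): print FILE&lt;br /&gt;
# (default /proc/mdstat) every SECONDS (default 300) until no array reports&lt;br /&gt;
# a recovery or resync, then return&lt;br /&gt;
wait_for_raid_sync() {&lt;br /&gt;
  local f=&amp;quot;${1:-/proc/mdstat}&amp;quot; interval=&amp;quot;${2:-300}&amp;quot;&lt;br /&gt;
  [ -r &amp;quot;$f&amp;quot; ] || { echo &amp;quot;cannot read $f&amp;quot; &amp;gt;&amp;amp;2; return 2; }&lt;br /&gt;
  while grep -Eq &#039;recovery|resync&#039; &amp;quot;$f&amp;quot;; do&lt;br /&gt;
    date&lt;br /&gt;
    cat &amp;quot;$f&amp;quot;&lt;br /&gt;
    echo&lt;br /&gt;
    sleep &amp;quot;$interval&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;no recovery or resync in progress as of $(date)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running &amp;lt;code&amp;gt;wait_for_raid_sync&amp;lt;/code&amp;gt; with no arguments polls /proc/mdstat every 5 minutes, like the loop above.&lt;br /&gt;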
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that GRUB is still functioning as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306171</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=306171"/>
		<updated>2025-04-30T11:30:36Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: change begun&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-30 11:29 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting Change&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two disks (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C, an SSD) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs CentOS 7, which is EOL, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll have no redundancy, because only one disk will hold a complete copy of the data. The risk is compounded because both disks currently report that they are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some time in the German morning (and very shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the old disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
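Copy our GRUB configuration and files onto the new disk using &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt; (on CentOS 7, this command is usually named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;)&lt;br /&gt;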
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that GRUB is still functioning as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306067</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306067"/>
		<updated>2025-04-27T22:04:56Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admins&#039; work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 26, 2025=&lt;br /&gt;
# Marcin authorized me to add Tom to our ops Google Groups mailing list and to give him access to our shared OSE KeePass file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Fri, Apr 25, 2025, 12:43 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; (re-sending without encryption)&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On 4/25/25 12:41, Michael Altfield wrote:&lt;br /&gt;
&amp;gt;&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Do you authorize:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 1. Giving Tom access to the shared OSE keepass file&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 2. Adding Tom to the ops mailing list (this would allow him to password&lt;br /&gt;
&amp;gt;&amp;gt; reset many of our important accounts)&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Please let me know if you authorize the above.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom sent me his GPG public key, which I need in order to add him as a recipient of the encrypted Wazuh alert emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ gpg&lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
gpg: Go ahead and type your message ...&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
pub   rsa4096 2025-04-26 [SC]&lt;br /&gt;
	  13300901348A985115679165FB137A633FD1EB4C&lt;br /&gt;
uid           Tom Griffing (OSE PGP Key 4-25-2025) &amp;lt;REDACTED@tutanota.com&amp;gt;&lt;br /&gt;
sub   rsa4096 2025-04-26 [E]&lt;br /&gt;
user@ose:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added Tom to the wazuh recipients, per https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /var/tmp/gpg&lt;br /&gt;
pushd /var/tmp/gpg&lt;br /&gt;
# write multi-line to file for documentation copy &amp;amp; paste&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
EOF&lt;br /&gt;
gpg --homedir /var/ossec/.gnupg --import /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
popd&lt;br /&gt;
&lt;br /&gt;
# add Tom&#039;s email (which must match an email on a UID of his key above) to the space-delimited &amp;quot;recipients&amp;quot; variable&lt;br /&gt;
vim /var/ossec/sent_encrypted_alarm.settings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
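The vim step above is manual. Below is a minimal sketch of making the same change non-interactively. It assumes, hypothetically, that /var/ossec/sent_encrypted_alarm.settings holds a single shell-style line of the form recipients=&amp;quot;a@example.org b@example.org&amp;quot;, so check the real file first. The function name add_recipient and the example addresses are made up.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: append an email address to a space-delimited
# recipients="..." line, skipping it if it is already listed.
# ASSUMPTION: the settings file has exactly one line recipients="...".
# NOTE: the address must not contain "/", "\" or an ampersand (special to sed).
add_recipient() {
  local file="$1" email="$2" current new
  # current value between the double quotes (empty if the list is empty)
  current="$(sed -n 's/^recipients="\(.*\)"$/\1/p' "$file")"
  # skip addresses that are already present as a whole word
  case " $current " in
    *" $email "*) return 0 ;;
  esac
  # prepend the old list (plus a space) only when it is non-empty
  new="${current:+$current }$email"
  sed -i "s/^recipients=\".*\"\$/recipients=\"$new\"/" "$file"
}
```
For example, run add_recipient /var/ossec/sent_encrypted_alarm.settings tom@example.org with Tom&#039;s real address, which must match a UID on his imported key. Running it twice is harmless because an address that is already listed is skipped.&lt;br /&gt;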
# and I sent him an email asking him to confirm that it&#039;s working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
Can you please confirm that you&#039;re now receiving alerts from wazuh?&lt;br /&gt;
&lt;br /&gt;
Wazuh is our HIDS (Host-Based Intrusion Detection System). It&#039;s a fork of the HIDS and FIM (File Integrity Monitor) OSSEC. Because it sometimes sends sensitive information (eg diffs of config files with passwords), it&#039;s important that we encrypt its email notifications end-to-end with PGP.&lt;br /&gt;
&lt;br /&gt;
And because someone who compromises the server could &amp;quot;clean up&amp;quot; after themselves, these (off-server) alerts are critical to post-compromise investigations.&lt;br /&gt;
&lt;br /&gt;
For more info, see:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
 * https://en.wikipedia.org/wiki/OSSEC&lt;br /&gt;
 * https://documentation.wazuh.com/current/getting-started/index.html&lt;br /&gt;
&lt;br /&gt;
Out-of-the-box, Wazuh has a ton of features, but probably where we use it the most is its ingestion of apache&#039;s mod_security WAF and its tie-in to Wazuh&#039;s Active Response. If an IP is found doing something bad (eg multiple consecutive 403 responses, such as a brute-force attack on wordpress [or ssh]), then the IP will get temp blocked by the firewall for 10 minutes. If it does it again shortly after the ban is lifted, it&#039;ll be banned for 12 hours. If again, 1 day. Then 2 days. Then 4 days. And the max ban for 5x repeat offenses is 8 days&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible/blob/master/hetzner3/roles/maltfield.wazuh/templates/ossec.conf.j2#L256-L271&lt;br /&gt;
&lt;br /&gt;
It also has rootkit detection, and lots of other useful alerts that &amp;quot;just work&amp;quot; out of the box.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you&#039;re now receiving encrypted wazuh alerts.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried to add Tom to our ops Google Groups email list, but it said I wasn&#039;t allowed to add members from outside our Google Workspace&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error occurred&lt;br /&gt;
1 user is outside of your organization. Based on your group or organization settings, you can only add organization users to this group. Contact your group owner or domain administrator for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I checked our users list. It appears that Tom doesn&#039;t have an @opensourceecology.org account in our Google Workspace (gsuite)&lt;br /&gt;
# I found the setting to change that here: https://admin.google.com/ac/managedsettings/864450622151/GROUPS_SHARING_SETTINGS_TAB&lt;br /&gt;
## https://support.google.com/a/thread/63692725/&lt;br /&gt;
## https://support.google.com/a/answer/167097&lt;br /&gt;
# I checked the box that said &amp;quot;Group owners can allow external members&amp;quot;&lt;br /&gt;
## curiously, the subline said &amp;quot;Organization admins can always add external members&amp;quot;, but I&#039;m a damn org admin, and I couldn&#039;t add him :/&lt;br /&gt;
# I tried to add him again, but I got the same error&lt;br /&gt;
# this time I went to the group settings https://groups.google.com/a/opensourceecology.org/g/REDACTED/settings&lt;br /&gt;
# I found the &amp;quot;allow external members&amp;quot; and changed it from &amp;quot;off&amp;quot; to &amp;quot;on&amp;quot; and clicked &amp;quot;save changes&amp;quot;&lt;br /&gt;
## this wasn&#039;t possible before: first I had to change the Workspace-wide setting to allow me to change the group-specific setting. Now it&#039;s changed.&lt;br /&gt;
# this time it worked.&lt;br /&gt;
# I sent an email to our ops google group, asking Tom to reply if he saw it&lt;br /&gt;
# ...&lt;br /&gt;
# I checked in on hetzner2 to make sure it rebooted this morning&lt;br /&gt;
# it looks like the cron job is set to reboot at 10:40 UTC every day, and, indeed, uptime says it&#039;s been up for 12:49 (a bit less than 13 hours); the journal shows that its last boot was today at 10:41:25 UTC&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# uptime&lt;br /&gt;
 23:30:25 up 12:49,  7 users,  load average: 1.02, 0.98, 0.74&lt;br /&gt;
[root@opensourceecology ~]# journalctl | head&lt;br /&gt;
-- Logs begin at Sat 2025-04-26 10:41:25 UTC, end at Sat 2025-04-26 23:30:26 UTC. --&lt;br /&gt;
Apr 26 10:41:25 localhost systemd-journal[129]: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.2G available → current limit 3.1G).&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuset&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpu&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuacct&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Linux version 3.10.0-1160.119.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Jun 4 14:43:51 UTC 2024&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: e820: BIOS-provided physical RAM map:&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009c7ff] usable&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x000000000009c800-0x000000000009ffff] reserved&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Sat Apr 26 23:31:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
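The numbers above are consistent with each other. Cron fires at 10:40:00 UTC, and the journal starts at 10:41:25 UTC, 85 seconds later, which is roughly when the new kernel came up. The span from 10:41:25 to 23:30:25 is exactly the 12:49 that uptime reports. Below is a minimal bash sketch of that arithmetic. The helper names are hypothetical, and the times are same-day HH:MM:SS in UTC.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert HH:MM:SS to seconds since midnight, and
# compute the gap between two same-day times.
to_seconds() {
  local t="$1"
  local h="${t%%:*}" rest="${t#*:}"
  local m="${rest%%:*}" s="${rest#*:}"
  # 10# forces base 10, so "08" or "09" are not misread as octal
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

seconds_between() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

seconds_between 10:40:00 10:41:25    # 85: cron firing to first journal line
up="$(seconds_between 10:41:25 23:30:25)"
printf '%dh%02dm\n' $(( up / 3600 )) $(( up % 3600 / 60 ))   # 12h49m, as uptime says
```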
# so it looks like we&#039;ll have ~2 minutes of downtime every day in the very early morning in the US. I can live with that.&lt;br /&gt;
# and grub clearly is fixed&lt;br /&gt;
# oh, also the RAID looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Tom for his GitHub account profile username, so I can grant him write access to our OSE ansible repo&lt;br /&gt;
# I added Tom&#039;s new ssh key to his authorized_keys file on hetzner2&lt;br /&gt;
# I sent Tom an email asking to confirm his access to hetzner2&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 25, 2025=&lt;br /&gt;
# I woke up this morning and discovered the wiki was offline&lt;br /&gt;
# I tried to ssh into the server; it&#039;s not responding&lt;br /&gt;
# I figured I&#039;d log into the Hetzner WUI, but, uhh, the credentials are in our KeePass file, which lives on the server&lt;br /&gt;
# I had mitigated this by giving Marcin a copy of the KeePass file on his VeraCrypt drive, but he changed the password a month or two ago, so that copy is stale, and we don&#039;t have an updated local copy&lt;br /&gt;
# I sent an email to Marcin asking him to log in to the Hetzner WUI and boot hetzner2. If it doesn&#039;t come up, then I&#039;ll have to get the password from him so that I can boot it into a rescue system from the WUI myself&lt;br /&gt;
# oh, I did find the new hetzner password in my personal keepass&lt;br /&gt;
# I logged-in, and I found the server was listed as being on. But I can&#039;t ping it. I gave it an &amp;quot;automatic hardware reset&amp;quot; from the wui&lt;br /&gt;
# I&#039;ll give it a few minutes before trying the rescue system&lt;br /&gt;
# their rescue systems are much nicer for their cloud product than their dedicated server product&lt;br /&gt;
# it looks like I have two options&lt;br /&gt;
## rescue boot mode: where I&#039;m given ssh access&lt;br /&gt;
## vnc&lt;br /&gt;
# the problem with the rescue boot is that, if this is a grub issue, I wouldn&#039;t be able to &amp;quot;see&amp;quot; the error&lt;br /&gt;
# I enabled VNC and gave the server a reboot&lt;br /&gt;
# I was able to connect via vnc, but it was the damn installation wizard for almalinux. I quit the installation, and the vnc session died.&lt;br /&gt;
# damn, I guess vnc won&#039;t let me see the boot process, after all&lt;br /&gt;
# instead I tried the &amp;quot;rescue system&amp;quot;&lt;br /&gt;
# that didn&#039;t work; I can&#039;t access ssh on either of the IP addresses&lt;br /&gt;
# the docs say to activate the rescue system and then reboot it; that&#039;s what I did https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system/&lt;br /&gt;
# this time I fully shut down the server, and then I enabled the rescue system (while it&#039;s off)&lt;br /&gt;
# I went back to the Reset tab, and it&#039;s still off. So I booted it&lt;br /&gt;
# somehow I was able to log in from my ose vm using my personal ssh key, but as user root&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ ssh -v root@138.201.84.223&lt;br /&gt;
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024&lt;br /&gt;
debug1: Reading configuration data /home/user/.ssh/config&lt;br /&gt;
debug1: Reading configuration data /etc/ssh/ssh_config&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 21: Applying options for *&lt;br /&gt;
debug1: Connecting to 138.201.84.223 [138.201.84.223] port 22.&lt;br /&gt;
debug1: Connection established.&lt;br /&gt;
...&lt;br /&gt;
Linux rescue 6.12.19 #1 SMP Fri Mar 14 05:34:52 UTC 2025 x86_64&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
  Welcome to the Hetzner Rescue System.&lt;br /&gt;
&lt;br /&gt;
  This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel.&lt;br /&gt;
  You can install software like you would in a normal system.&lt;br /&gt;
&lt;br /&gt;
  To install a new operating system from one of our prebuilt images, run &#039;installimage&#039; and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
  Important note: Any data that was not written to the disks will be lost during a reboot.&lt;br /&gt;
&lt;br /&gt;
  For additional information, check the following resources:&lt;br /&gt;
	Rescue System:           https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system&lt;br /&gt;
	Installimage:            https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage&lt;br /&gt;
	Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images&lt;br /&gt;
	other articles:          https://docs.hetzner.com/robot&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Rescue System (via Legacy/CSM) up since 2025-04-25 17:24 +02:00&lt;br /&gt;
&lt;br /&gt;
Hardware data:&lt;br /&gt;
&lt;br /&gt;
   CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8)&lt;br /&gt;
   Memory:  64153 MB (Non-ECC)&lt;br /&gt;
   Disk /dev/sda: 250 GB (=&amp;gt; 232 GiB) &lt;br /&gt;
   Disk /dev/sdb: 512 GB (=&amp;gt; 476 GiB) &lt;br /&gt;
   Total capacity 709 GiB with 2 Disks&lt;br /&gt;
&lt;br /&gt;
Network data:&lt;br /&gt;
   eth0  LINK: yes&lt;br /&gt;
		 MAC:  90:1b:0e:94:07:c4&lt;br /&gt;
		 IP:   138.201.84.223&lt;br /&gt;
		 IPv6: 2a01:4f8:172:209e::2/64&lt;br /&gt;
		 Intel(R) PRO/1000 Network Driver&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to mount the root drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 0/2 pages [0KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt&lt;br /&gt;
root@rescue ~ # ls /mnt&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # ls /mnt/home&lt;br /&gt;
b2user  crupp  hart     lberezhny  marcin      stagingsync  wp&lt;br /&gt;
cmota   Flipo  jthomas  maltfield  not-apache  tgriffing&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what the point of this is; I can&#039;t fix it if I can&#039;t watch it boot and see what&#039;s breaking&lt;br /&gt;
# ok, at the bottom of the docs, Hetzner lists another option: the vKVM Rescue System https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/&lt;br /&gt;
# it specifically says that&#039;s for debugging boot issues&lt;br /&gt;
# last thing before I try that: I downloaded a local copy of the keepass files from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner2$ rsync -av --progress root@138.201.84.223:/mnt/etc/keepass ./etc-keepass-20250525&lt;br /&gt;
receiving incremental file list&lt;br /&gt;
created directory ./etc-keepass-20250525&lt;br /&gt;
keepass/&lt;br /&gt;
keepass/passwords.kdbx&lt;br /&gt;
		 46,142 100%   44.00MB/s    0:00:00 (xfr#1, to-chk=6/8)&lt;br /&gt;
keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#2, to-chk=5/8)&lt;br /&gt;
keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#3, to-chk=4/8)&lt;br /&gt;
keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
		 33,726 100%  143.20kB/s    0:00:00 (xfr#4, to-chk=3/8)&lt;br /&gt;
keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
		 34,238 100%   71.75kB/s    0:00:00 (xfr#5, to-chk=2/8)&lt;br /&gt;
keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
		 45,406 100%   94.55kB/s    0:00:00 (xfr#6, to-chk=1/8)&lt;br /&gt;
keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
		 27,102 100%   56.31kB/s    0:00:00 (xfr#7, to-chk=0/8)&lt;br /&gt;
&lt;br /&gt;
sent 161 bytes  received 196,407 bytes  35,739.64 bytes/sec&lt;br /&gt;
total size is 195,794  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner2$ du -sh etc-keepass-20250525/keepass/*&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
28K	etc-keepass-20250525/keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so this time was the same as the rescue system, except I chose &amp;quot;vKVM&amp;quot; instead of &amp;quot;Linux&amp;quot; in the &amp;quot;Operating System&amp;quot; dropdown&lt;br /&gt;
# strange, it gave me an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Public key authentication is not available for the selected operating system.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I unselected my ssh key, and chose &amp;quot;no key&amp;quot; instead&lt;br /&gt;
# it gave me a URL and a password. I booted the server, but the URL didn&#039;t load (&amp;quot;Unable to connect&amp;quot; error)&lt;br /&gt;
# ok, it took a few minutes and had a self-signed cert&lt;br /&gt;
# I bypassed the cert error, and entered the username and password into the basic auth popup. It failed! Could I really have been MITM&#039;d?&lt;br /&gt;
# I immediately shut down the server from the wui, and I tried again.&lt;br /&gt;
# this time I was able to login – both from ssh and in the wui.&lt;br /&gt;
# as soon as it opened, I saw the error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
No more network devices&lt;br /&gt;
&lt;br /&gt;
Booting from Hard Disk...&lt;br /&gt;
.&lt;br /&gt;
error: symbol &#039;grub_calloc&#039; not found.&lt;br /&gt;
Entering rescue mode...&lt;br /&gt;
grub rescue&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I wonder if this is grub or grub2. Earlier, on the running system, there was no &amp;quot;grub-install&amp;quot; binary, so I assumed the Hetzner docs were wrong and ran &amp;quot;grub2-install&amp;quot; instead. That reported success (with a warning that the docs said was safe to ignore)&lt;br /&gt;
# curiously, the opposite is true in the vKVM ssh session: I have grub-install but not grub2-install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@vKVM-rescue ~ # which grub-install&lt;br /&gt;
/usr/sbin/grub-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
root@vKVM-rescue ~ # which grub2-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
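The grub_calloc error above is typically reported when the GRUB core image in a disk&#039;s boot area and the modules under /boot come from different GRUB builds. A commonly suggested remedy is to reinstall GRUB from a chroot with the installed system&#039;s own grub2-install, rather than with the rescue system&#039;s Debian grub-install. Below is a minimal dry-run sketch that only prints the commands for review. It assumes the layout seen on this page: root on /dev/md2, /boot on /dev/md1, and disks sda and sdb. The function name is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: PRINTS (does not execute) the commands to
# reinstall GRUB from inside the installed system via chroot.
# ASSUMPTIONS: root fs on $1, /boot on $2, mount point $3, target disks after that.
print_grub_reinstall_plan() {
  local root_dev="$1" boot_dev="$2" target="$3" fs disk
  shift 3
  echo "mount $root_dev $target"
  echo "mount $boot_dev $target/boot"
  # grub2-install inside the chroot needs the running kernel's /dev, /proc and /sys
  for fs in dev proc sys; do
    echo "mount --bind /$fs $target/$fs"
  done
  # write a matching core image to every RAID member disk
  for disk in "$@"; do
    echo "chroot $target grub2-install $disk"
  done
}

print_grub_reinstall_plan /dev/md2 /dev/md1 /mnt/md2 /dev/sda /dev/sdb
```
The printed plan installs to both disks so that the BIOS finds a matching core image whichever disk it boots from.&lt;br /&gt;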
# here&#039;s the docs in question https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# I don&#039;t want to fuck with the grub without first taking a backup of these disks. But, uh, it looks like I can&#039;t access the RAID from inside this vkvm setup&lt;br /&gt;
# yeah, that&#039;s one of the limitations listed for VKVM https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/#raid-controllers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Configured units are passed through as SCSI devices to the VM. However it is not possible to access the controller. Please use the regular Hetzner Rescue System for this purpose.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I shut down vKVM and booted the server into the regular rescue mode&lt;br /&gt;
# it took a few minutes to get back into the old rescue system, but here I can use the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS&lt;br /&gt;
loop0     7:0    0   3.4G  1 loop  &lt;br /&gt;
sda       8:0    0 476.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
sdb       8:16   0 232.9G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
root@rescue ~ # mkdir /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt/md2&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created a dir for these backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chown root:root /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chmod 0700 /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first I made a file-level backup of /boot (md1) from the RAID&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # rsync -av --progress /mnt/md1 /mnt/md2/var/tmp/20250425-grub-fail/md1.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
...&lt;br /&gt;
md1/grub2/locale/zh_TW.mo&lt;br /&gt;
		 30,882 100%   31.38kB/s    0:00:00 (xfr#345, to-chk=0/355)&lt;br /&gt;
md1/lost+found/&lt;br /&gt;
&lt;br /&gt;
sent 399,450,301 bytes  received 6,709 bytes  159,782,804.00 bytes/sec&lt;br /&gt;
total size is 399,330,989  speedup is 1.00&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I figured I&#039;d make a backup of the two disk partitions directly, but I couldn&#039;t even mount them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sda2&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sdb2&lt;br /&gt;
root@rescue ~ # mount /dev/sda2 /mnt/sda2&lt;br /&gt;
mount: /mnt/sda2: unknown filesystem type &#039;linux_raid_member&#039;.&lt;br /&gt;
	   dmesg(1) may have more information after failed mount system call.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this command from the Hetzner docs, which I had skipped before because the docs said the next command (grub-install) was enough; it errored out https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-mkdevicemap -n&lt;br /&gt;
grub-mkdevicemap: error: cannot open /boot/grub/device.map.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I had investigated this before, and I thought I had concluded that we&#039;re using grub2, not grub1 (/boot has both grub and grub2 directories)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # ls /mnt/md1/&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, shit, even the grub-install command is v2 https://askubuntu.com/questions/107486/how-to-know-the-version-of-grub&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install --version&lt;br /&gt;
grub-install (GRUB) 2.06-13+deb12u1&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, this indicates we&#039;re not using lilo https://askubuntu.com/questions/24459/how-do-i-find-out-which-boot-loader-i-have&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2/etc/ | grep lilo&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can dd the first 512 bytes straight from each disk to read the MBR. And, yeah, it appears we are booting GRUB from the MBR, and that boot code is stored on the raw disks, not on the RAID devices&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # dd if=/dev/md1 bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sda bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
214fb5736d1e5ad63e515dc2fffe44bd928cd8dab2c019dc11fb9fcaef5ea90dbf51f1ac507ab1cfbbe74ff&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
DA/jjF&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sdb bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
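# the dd | strings check above can be turned into a small test. This is a sketch assuming bash and GNU grep, and mbr_has_grub is a hypothetical name. It only looks for the GRUB string in the 512-byte MBR. That shows GRUB boot code is present, not that the later boot stages on disk are intact&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: report whether the first 512-byte sector (the MBR) of each given disk
# contains the GRUB boot-code string, like the dd | strings check above.

mbr_has_grub() {
  local dev=$1
  # -a makes grep treat the binary sector as text; -q only sets the exit status
  dd if="$dev" bs=512 count=1 2>/dev/null | grep -qa 'GRUB'
}

# pass the raw member disks (e.g. /dev/sda /dev/sdb), not the md devices
for disk in "$@"; do
  if mbr_has_grub "$disk"; then
    echo "$disk: GRUB string found in MBR"
  else
    echo "$disk: no GRUB string in MBR"
  fi
done
```

# as seen above, /dev/md1 itself has no MBR boot code, so only the raw disks are worth checking&lt;br /&gt;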
# idk what to do; I tried the grub-install again, but it gives me this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install /dev/sda&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # grub-install /dev/sdb&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I set up a chroot into our real system on the RAID first (root on md2, /boot on md1)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # chroot-prepare /mnt/md2&lt;br /&gt;
root@rescue ~ # chroot /mnt/md2&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
root@rescue / # mount /dev/md1 /boot&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then tried the grub install again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue / # grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / #&lt;br /&gt;
&lt;br /&gt;
root@rescue / # grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
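# the sequence that finally worked can be written down as a script for next time. This is an untested sketch. It assumes this server&#039;s layout (root on /dev/md2, /boot on /dev/md1). It also assumes chroot-prepare is the Hetzner rescue-system helper used above and that it bind-mounts /dev, /proc and /sys into the target. The names run, reinstall_grub and DRY_RUN are hypothetical; with DRY_RUN=1 the script only prints the commands&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: reinstall GRUB 2 from inside a chroot of the real system, onto every
# disk in the mirror. Assumed layout: / on /dev/md2, /boot on /dev/md1.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "$DRY_RUN" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reinstall_grub() {
  local root_md=$1 boot_md=$2 mnt=$3
  shift 3
  run mkdir -p "$mnt" || return 1
  run mount "$root_md" "$mnt" || return 1
  # assumed to bind-mount /dev, /proc and /sys (Hetzner rescue helper)
  run chroot-prepare "$mnt" || return 1
  run mount "$boot_md" "${mnt}/boot" || return 1
  local disk
  for disk in "$@"; do
    # use the target system's grub2-install, not the rescue system's grub-install
    run chroot "$mnt" grub2-install "$disk" || return 1
  done
}
```

# e.g. DRY_RUN=1 reinstall_grub /dev/md2 /dev/md1 /mnt/md2 /dev/sda /dev/sdb prints the plan. The key choice is to run the target system&#039;s own grub2-install inside the chroot, since the rescue system&#039;s grub-install failed above asking for --target or --directory&lt;br /&gt;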
# I exited the chroot and shut down the rescue system&lt;br /&gt;
# I activated the VKVM rescue system and booted it again&lt;br /&gt;
# when I connected to the KVM wui, I was shown a password prompt, so I think booting works!&lt;br /&gt;
# I rebooted it over ssh&lt;br /&gt;
# and now I can ssh into the real system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Thu Apr 24 23:12:44 2025 from 146.70.199.15&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and now the wiki loads too&lt;br /&gt;
# I did another reboot test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ sudo su -&lt;br /&gt;
[sudo] password for maltfield: &lt;br /&gt;
Last login: Thu Apr 24 16:25:15 UTC 2025 on pts/0&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
Last login: Fri Apr 25 16:29:21 2025 from 185.204.1.184&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m not sure which, but my takeaway is that at least one of these assumptions is correct&lt;br /&gt;
## grub-install needs to be run *after* the RAID sync is finished&lt;br /&gt;
## grub-install needs to be run on *both* the new *and* the old disk&lt;br /&gt;
## grub-install needs to be run inside a chroot on the rescue system&lt;br /&gt;
# anyway, we&#039;re stable again&lt;br /&gt;
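# if the first assumption is the right one, the grub step should wait until the mirror has finished syncing. This is a sketch assuming bash and the usual /proc/mdstat format, and the function names are hypothetical. It treats any line containing resync, recovery or reshape followed by = as unfinished work, which also covers queued entries such as resync=DELAYED&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: block until no md array is resyncing or recovering, so that
# grub2-install (and the reboot test) only run on a complete mirror.
# Reads /proc/mdstat by default; a file can be passed for testing.

md_sync_in_progress() {
  local mdstat=${1:-/proc/mdstat}
  # active work looks like "recovery = 12.6% (...)"; queued like "resync=DELAYED"
  grep -Eq '(resync|recovery|reshape) *=' "$mdstat"
}

wait_for_md_sync() {
  local mdstat=${1:-/proc/mdstat} interval=${2:-60}
  while md_sync_in_progress "$mdstat"; do
    sleep "$interval"
  done
}
```

# mdadm also has a built-in --wait option (e.g. mdadm --wait /dev/md0 /dev/md1 /dev/md2) that blocks until resync or recovery on the given arrays has finished, which may be simpler than polling&lt;br /&gt;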
# I got an email from Marcin saying Tom could help with the migrations. I sent him some wiki articles to get caught up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ll try to get you ssh access on hetzner2 soon. In the meantime, please read the following articles:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner2&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started preparing draft &amp;quot;change tickets&amp;quot; for migrating each of the websites from hetzner2 to hetzner3. Note that some of these are not fully tested, so you&#039;ll want to execute them manually and make corrections as-needed.&lt;br /&gt;
&lt;br /&gt;
Please also read-through these:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_microfactory_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_oswh&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_phplist_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
(There&#039;s also one CHG for the forum that I think needs to be made)&lt;br /&gt;
&lt;br /&gt;
The next item TODO is to finish the migration plan for these websites:&lt;br /&gt;
&lt;br /&gt;
 1. www.opensourceecology.org (osemain)&lt;br /&gt;
 2. www.openbuildinginstitute.org (obi)&lt;br /&gt;
&lt;br /&gt;
We decided that there would be 2 simultaneous versions of obi:&lt;br /&gt;
&lt;br /&gt;
1. A static site scraped with curl on hetzner3&lt;br /&gt;
2. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
And we decided that there would be 3 simultaneous versions of osemain:&lt;br /&gt;
&lt;br /&gt;
1. The live/current site on hetzner2&lt;br /&gt;
2. A static site scraped with curl on hetzner3&lt;br /&gt;
3. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
To have multiple sites with the same domain on the same server, we bought a second IPv4 address (FeF isn&#039;t set up with IPv6). This week I just finished updating the hetzner3 server to persist this new IPv4 address.&lt;br /&gt;
&lt;br /&gt;
The next item for you would be to update our ansible to push out new vhosts (in nginx, varnish, and apache) for the static sites that are bound to the second IPv4 address using the same hostname.&lt;br /&gt;
&lt;br /&gt;
Please read-through the ansible playbook and roles (most importantly for nginx, varnish, and apache) to understand how they&#039;re provisioned&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
Since you have access to hetzner3, you can also poke around (read-only please) the configs for these three web services to understand how ansible provisions them.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve updated and pushed-out the new vhosts with ansible, you&#039;ll need to update the migration plan&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_obi_to_hetzner3&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
And then you&#039;ll want to go-through each migration plan to create a temp &amp;quot;snapshot&amp;quot; of all the sites on hetzner3, where Marcin &amp;amp; Catarina can do a thorough verification of each site (by updating /etc/hosts) before we do the *real* migration -- which is nearly the same as the &amp;quot;snapshot&amp;quot; except we actually migrate DNS.&lt;br /&gt;
&lt;br /&gt;
Please let me know when you&#039;ve finished reading the above articles.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/24/25 22:16, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I need to reset my ssh key on hetzner2. Can you use the same as on 3 or best to generate a new one?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I spoke with Marcin and I think I can help with the admin, as I have time available.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you give a run-down of its status and what needs to be done for completing the migration to hetzner3?&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to login to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
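# the grep on today&#039;s date above could in principle also match a file size or an unrelated name. A slightly stricter check is sketched below. It assumes the daily_HOST_YYYYMMDD_HHMMSS.tar.gpg naming seen in this one listing, and has_backup_for_day is a hypothetical name&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: succeed only if an rclone-style listing ("SIZE NAME" lines on stdin)
# contains a daily backup for the given YYYYMMDD day.
# The file naming pattern is inferred from a single listing (assumption).
has_backup_for_day() {
  local day=$1
  grep -Eq "daily_[^ ]+_${day}_[0-9]{6}"'\.tar\.gpg$'
}
```

# e.g. ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | has_backup_for_day $(date &amp;quot;+%Y%m%d&amp;quot;) exits 0 only if such a file is listed&lt;br /&gt;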
# I tried to remove the first sdb partition from the RAID, but mdadm refused (Device or resource busy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that if the RAID is healthy, you first have to mark the partition as failed with &#039;--fail&#039; https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized there&#039;s a mistake in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I had listed this, which adds the partitions instead of removing them (and puts sdb3 on md1)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to run this to mark each sdb partition as failed in its array&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
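# the corrected fail-then-remove step could be scripted as below. This is a sketch that assumes this server&#039;s layout, where /dev/mdN is built from partition N+1 of each disk (md0 from sdX1, md1 from sdX2, md2 from sdX3). The names run and detach_disk are hypothetical, and DRY_RUN=1 only prints the commands. Unlike the session below, it fails and removes one array at a time, which should have the same end result&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: mark each partition of the outgoing disk as failed, then hot-remove it.
# Assumed layout: /dev/mdN is built from partition N+1 of each disk.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "$DRY_RUN" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

detach_disk() {
  local disk=$1 md part
  for md in 0 1 2; do
    part="${disk}$((md + 1))"
    # a healthy member cannot be removed directly ("Device or resource busy")
    run mdadm --manage "/dev/md${md}" --fail "$part" || return 1
    run mdadm --manage "/dev/md${md}" --remove "$part" || return 1
  done
}
```

# e.g. DRY_RUN=1 detach_disk /dev/sdb prints the six mdadm commands for review before running them for real&lt;br /&gt;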
# ok, after marking all three partitions as failed, I was able to remove them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
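# instead of eyeballing the [U_] markers, the degraded arrays can be listed with a short awk filter. This is a sketch assuming the /proc/mdstat layout shown above, where the last field of the line after each mdN header is the member status. The function name degraded_arrays is hypothetical&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: print the name of every md array whose member-status field
# (e.g. [U_]) shows a missing or failed member.
# Reads /proc/mdstat by default; a file can be passed for testing.
degraded_arrays() {
  local mdstat=${1:-/proc/mdstat}
  awk '
    /^md[0-9]+ :/ { name = $1 }
    # the status field is the last word of the line following each mdN header
    $NF ~ /^\[[U_]+]$/ { if (index($NF, "_") > 0) print name }
  ' "$mdstat"
}
```

# with the mdstat output above, it would print md1, md0 and md2, one per line, in that order&lt;br /&gt;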
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had received an email from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
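# since kernel names like sda and sdc can change after a hotswap, it is safer to identify disks by serial. This is a sketch assuming udev&#039;s /dev/disk/by-id symlinks, whose whole-disk names end in the serial while partition links end in -partN. The function name dev_for_serial is hypothetical&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: resolve the current kernel device (sda, sdb, sdc, ...) for a serial
# number via udev's by-id symlinks; the directory can be overridden for testing.
dev_for_serial() {
  local serial=$1 byid=${2:-/dev/disk/by-id}
  local link
  # whole-disk links end with the serial; partition links end with -partN
  for link in "$byid"/*"$serial"; do
    # an unmatched glob stays literal, so skip anything that does not exist
    [ -e "$link" ] || continue
    readlink -f "$link"
  done | sort -u | grep .
}
```

# e.g. dev_for_serial 154410FA336C would print whichever /dev/sdX currently holds the old Crucial disk, and fail if no such disk is attached. lsblk -o NAME,SERIAL gives a similar overview for all disks at once&lt;br /&gt;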
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back-in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; I/O errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and tried to make partition backups anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but the brief disconnection and reconnection of the other drive (per Hetzner&#039;s email) made our old /dev/sda re-appear as /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at least partition this new drive&lt;br /&gt;
# but this new drive isn&#039;t the right size: it&#039;s 512G, while our old disk was 250G. I guess a disk that&#039;s too big is better than one that&#039;s too small, but we won&#039;t be able to use the extra space. I&#039;m going to assume that they just didn&#039;t have 250G disks in stock anymore.&lt;br /&gt;
# anyway, I tried to back up the partition tables, but that didn&#039;t work because the filesystem had gone read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot. and pray that grub is intact to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it can&#039;t even reboot it&#039;s so broken.&lt;br /&gt;
# I triggered a reset on the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shutdown all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. we have sda and sdb again, and sda is our original sda – as we wanted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partition tables; unsurprisingly, the sdb file is empty, since the new disk has no partition table yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
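# A sketch that makes the empty-dump check explicit: it labels each partition-table backup OK or EMPTY using the shell&#039;s &#039;-s&#039; test (file exists and is non-empty). The helper name is hypothetical; an EMPTY result is expected for a blank replacement disk, as it was for sdb here&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: label each backup file OK (non-empty) or EMPTY.
# [ -s file ] is true only when the file exists and its size is above zero.
check_backups() {
  local f
  for f in "$@"; do
    if [ -s "$f" ]; then
      printf 'OK    %s\n' "$f"
    else
      printf 'EMPTY %s\n' "$f"
    fi
  done
}
```
# usage: &#039;check_backups ${chg_dir}/*_parttable_mbr.bak&#039;&lt;br /&gt;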
# I copied the partition table from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
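# Because &#039;sfdisk -d&#039; names every partition after its disk, the two layouts can be compared by masking the disk names before diffing. A sketch, assuming the dump refers to partitions as /dev/sdXN as in the output above; the helper names are hypothetical, and an empty diff means the partition entries match&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: replace disk names (sda, sdb, ...) with sdX so that
# two `sfdisk -d` dumps can be compared line by line.
normalize_dump() {
  sed -E 's#/dev/sd[a-z]+#/dev/sdX#g'
}

# Hypothetical helper: diff the normalized partition dumps of two disks.
# Prints nothing (and returns 0) when the layouts match. Run as root.
compare_layouts() {
  local a b rc
  a=$(mktemp)
  b=$(mktemp)
  sfdisk -d "$1" | normalize_dump > "$a"
  sfdisk -d "$2" | normalize_dump > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```
# usage, as root: &#039;compare_layouts /dev/sda /dev/sdb&#039;&lt;br /&gt;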
# that looked good, other than the warning that the DOS MBR will not boot this disk; I&#039;ll check later what LILO is and whether this matters for booting GRUB from the RAID&lt;br /&gt;
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk&#039;s partitions to their RAID arrays, and it shows that they&#039;re starting to sync now. Excellent!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
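# The three &#039;mdadm -a&#039; commands follow one pattern (mdN gets partition N+1 of the new disk), so they could be generated in a loop. A dry-run sketch, assuming the md0..md2 to partition 1..3 mapping shown in /proc/mdstat above; it only prints the commands so they can be reviewed first, and the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the mdadm commands that add each
# partition of a new disk to its RAID1 array (md0 gets partition 1, etc.).
# Assumes the md-to-partition mapping shown in /proc/mdstat above.
print_add_cmds() {
  local disk=$1 n
  for n in 0 1 2; do
    printf 'mdadm /dev/md%d -a %s%d\n' "$n" "$disk" "$((n + 1))"
  done
}
```
# usage: &#039;print_add_cmds /dev/sdb&#039; prints the three commands used above; once reviewed, they could be piped to &#039;sh&#039; as root&lt;br /&gt;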
# well, it looks like it&#039;s not syncing all of the RAID arrays at the same time. It&#039;s doing md0 now, and the others show resync=DELAYED, so I guess it&#039;ll do them after&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d synced /boot (md1) first. I don&#039;t think I can install grub until that&#039;s synced. Maybe?&lt;br /&gt;
# it says it&#039;s moving about 1159K/s. That&#039;s roughly 1 MB per sec. 32 GB * 1024 = 32,768 MB. At 1 MB/s, that&#039;s 32,768 seconds / 60 = 546 minutes / 60 = 9.1 hours. Just for swap!&lt;br /&gt;
# assuming we have the same speed for the rest of the disk, that&#039;s 250 GB * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds. 256,000 seconds / 60 = 4,266.7 minutes / 60 = 71.1 hours. I guess we just have to accept the risk and hope that the old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
# I tried to go ahead and install grub on the new disk, but I got a &#039;command not found&#039; error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# tab-completion shows that on this server it&#039;s &#039;grub2-install&#039;, so I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
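# On this server only the grub2-* tools are installed, which is why &#039;grub-install&#039; was not found. A sketch that picks whichever installer name exists on the PATH; the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the GRUB installer available here.
# EL7 ships it as grub2-install; Debian-family systems call it grub-install.
# Returns 1 if neither is on the PATH.
pick_grub_install() {
  local cmd
  for cmd in grub2-install grub-install; do
    if command -v "$cmd" >/dev/null; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}
```
# usage, as root: run the printed command against the new disk, i.e. grub2-install /dev/sdb on this server&lt;br /&gt;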
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
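# The back-of-the-envelope estimates in this log are just remaining size ÷ sync speed. A sketch that does the same arithmetic with the numbers /proc/mdstat prints (1 KiB blocks and K/sec); the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the minutes left in an md resync/recovery.
# Args: done_kib total_kib speed_kib_per_sec, i.e. the numbers in the
# "recovery = N% (done/total) finish=... speed=...K/sec" line of /proc/mdstat.
eta_minutes() {
  awk -v d="$1" -v t="$2" -v s="$3" \
    'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}
```
# for the md0 line above, &#039;eta_minutes 4168832 33521664 59264&#039; prints 8.3, in line with the kernel&#039;s own finish=8.2min; the earlier whole-disk guess (250 GB at ~1 MB/s) is &#039;eta_minutes 0 262144000 1024&#039;, which prints 4266.7 minutes, i.e. about 71 hours&lt;br /&gt;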
# at that speed (~58 MB/s), the calculation would be 250 * 1024 / 58 = 4,413.8 seconds / 60 = 73.6 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups are still running; I&#039;ll let them finish before starting-up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the raid replication, and it shows md0 (swap) and md1 (boot) are both done. Hooray! Now we just need to finish root (/), which is 9.8% done and going at 60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
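# Instead of eyeballing each snapshot, the [2/1] [U_] markers can be checked mechanically: an underscore in the member-status brackets means a mirror half is missing or still rebuilding. A sketch that lists such arrays from /proc/mdstat-style text; the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every md array whose member-status
# field (the last field of its "blocks" line, e.g. [U_]) contains an
# underscore, i.e. a mirror half that is missing or still rebuilding.
# Reads /proc/mdstat-formatted text from the files given, or from stdin.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    /blocks/ { if ($NF ~ /_/) print name }
  ' "$@"
}
```
# usage: &#039;degraded_arrays /proc/mdstat&#039; would print md2 for the snapshot above; empty output means every array shows [UU]&lt;br /&gt;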
# I ran grub2-install on /dev/sdb again now that /boot (md1) is synced with the first disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups say they&#039;re about 10% uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to just kill the backup script and manually upload the archive with rclone without the bwlimit, so it&#039;ll upload faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
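# Killing the scripted upload means typing the archive name by hand. Since the names embed a YYYYmmdd_HHMMSS timestamp and the shell sorts glob matches, the last match is the newest archive. A sketch, assuming the /home/b2user/sync/daily_hetzner2_*.tar.gpg naming seen above; the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest daily_hetzner2_*.tar.gpg in a directory.
# Glob results are sorted, and the embedded YYYYmmdd_HHMMSS timestamp sorts
# chronologically, so the last match is the newest one.
newest_backup() {
  local f newest=
  for f in "$1"/daily_hetzner2_*.tar.gpg; do
    newest=$f
  done
  # With no matches the glob stays literal, so check that the file exists.
  if [ -e "$newest" ]; then
    printf '%s\n' "$newest"
  else
    return 1
  fi
}
```
# usage: pass its output to the same command as above, e.g. /bin/sudo -u b2user /bin/rclone -v copy &amp;quot;$(newest_backup /home/b2user/sync)&amp;quot; b2:ose-server-backups&lt;br /&gt;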
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
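# The overall verdict is the text after the colon on smartctl&#039;s &#039;self-assessment test result&#039; line, so a per-disk summary can be pulled out with awk. A sketch for ATA drives like these; the helper names are hypothetical, and running it needs root and smartmontools&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the overall SMART verdict (e.g. PASSED or FAILED!)
# from `smartctl -H` output on stdin, or in the files given as arguments.
smart_verdict() {
  awk -F': ' '/overall-health self-assessment test result/ { print $2 }' "$@"
}

# Hypothetical helper: print "device verdict" for each disk given.
smart_summary() {
  local dev
  for dev in "$@"; do
    printf '%s %s\n' "$dev" "$(smartctl -H "$dev" | smart_verdict)"
  done
}
```
# usage, as root: &#039;smart_summary /dev/sda /dev/sdb&#039; would print FAILED! for sda and PASSED for sdb, given the output above&lt;br /&gt;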
# while the old disk says it has used 100% of its rated life, the new disk says it has used, uhh, 96% of its life (4% remaining)? That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
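# Attribute 202 is easy to misread on these Crucial/Micron drives: judging by the two tables above, its RAW_VALUE is the percentage of rated life already used (100 on sda, 96 on sdb), while the normalized VALUE column is what remains (000 and 004). A sketch that prints both, plus Power_On_Hours converted to years (hours / 24 / 365); the helper name is hypothetical, and it assumes the column layout shown above (VALUE in field 4, RAW_VALUE in field 10)&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize wear and age from `smartctl -A` output.
# Assumes the attribute table layout above: $1=ID, $4=VALUE, $10=RAW_VALUE.
# Reads from the files given, or from stdin.
smart_wear() {
  awk '
    $1 == 9   { printf "power_on_years=%.2f\n", $10 / 24 / 365 }
    $1 == 202 { printf "life_used_pct=%d life_left_normalized=%d\n", $10, $4 }
  ' "$@"
}
```
# for sdb above, &#039;smartctl -A /dev/sdb | smart_wear&#039; gives power_on_years=5.95 and life_used_pct=96 life_left_normalized=4&lt;br /&gt;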
# Shame. I was hoping for at least something &amp;lt;50%. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive said it had &amp;quot;Power_On_Hours&amp;quot; of 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive says Power_On_Hours = 52083/24/365 = 5.95 years. Well that&#039;s better, I guess.&lt;br /&gt;
# oh wow, the power cycle counts are crazy low: our old disk was only power-cycled 50 times, and the new one only 33 times.&lt;br /&gt;
# also the SMART data for both of these drives has different keys (not just values). apparently it&#039;s very vendor-specific, so some of these comparisons are apples-to-oranges&lt;br /&gt;
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check-in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked off a marketing campaign (for the apprenticeship), which now links to information on a broken website, where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged-into the wiki, the cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the frontpage of all the websites and the apprenticeship info page that he&#039;s recently published &amp;amp; promoted are still online when he sees that error, for users who are *not* logged-into the site&lt;br /&gt;
# but if the backend site is broken for &amp;gt;24 hours, then the cached pages expire and varnish will cache the errors instead of the content&lt;br /&gt;
# as a short-term hack, I recommended that we set up a daily reboot of hetzner2 at 10:40 UTC (a good buffer after the backups finish uploading)&lt;br /&gt;
# I asked Marcin if he&#039;d like me to set up a daily reboot at 10:40 UTC&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
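# For reference, the proposed daily reboot would be one line in the same /etc/crontab format as the backup entries at the top of this section. This is a hypothetical sketch of the proposal (still pending Marcin&#039;s answer), assuming the server clock is on UTC, as the timestamps above suggest, and that &#039;/sbin/shutdown -r now&#039; is an acceptable way to reboot unattended&lt;br /&gt;
```
# /etc/crontab (hypothetical): reboot hetzner2 daily at 10:40 UTC,
# about 30 minutes after backups usually finish (before 10:10 UTC)
40 10 * * * root /sbin/shutdown -r now
```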
# ...&lt;br /&gt;
# I checked-in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
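# The sync duration can be double-checked with plain shell arithmetic on HH:MM times. A sketch, assuming both times fall on the same day; the helper name is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the elapsed time between two same-day HH:MM
# times as "Hh MMm". The 10# prefix stops bash from reading "08" as octal.
elapsed() {
  local start=$1 end=$2
  local s=$((10#${start%:*} * 60 + 10#${start#*:}))
  local e=$((10#${end%:*} * 60 + 10#${end#*:}))
  local d=$((e - s))
  printf '%dh %02dm\n' "$((d / 60))" "$((d % 60))"
}
```
# usage: &#039;elapsed 13:32 15:20&#039; prints 1h 48m&lt;br /&gt;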
# so it looks like I started it just after 13:32 and it finished just before 15:20, so it took about 1 hour and 48 minutes: just under 2 hours. Great!&lt;br /&gt;
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the price is listed as &amp;quot;at cost&amp;quot; and it says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine. That&#039;s compared to the 78,516 hours of /dev/sda and 52,083 hours of our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
# but it&#039;s a bit concerning that it says it might not be in stock. I&#039;m going to message them and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
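# to double-check the proposed slot across time zones, here&#039;s a small Python sketch. It hardcodes the UTC offsets assumed to be in effect in late April 2025 (Germany on summer time, UTC+2; Ecuador at UTC-5 with no DST; FeF assumed to be on US Central Daylight Time, UTC-5). It does not look up real tz rules, so it only holds for this date range&lt;br /&gt;

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets for late April 2025 (not looked up from tz rules):
# Germany is on summer time, Ecuador has no DST, and FeF is assumed to be on
# US Central Daylight Time.
OFFSETS = {
    "Germany": timedelta(hours=2),
    "Ecuador": timedelta(hours=-5),
    "FeF": timedelta(hours=-5),
}


def local_times(utc_dt):
    """Return {location: "HH:MM"} for one timezone-aware UTC datetime."""
    return {
        name: utc_dt.astimezone(timezone(offset)).strftime("%H:%M")
        for name, offset in OFFSETS.items()
    }


# The proposed maintenance window and the daily reboot time from the emails
window = datetime(2025, 4, 30, 11, 0, tzinfo=timezone.utc)
reboot = datetime(2025, 4, 24, 10, 40, tzinfo=timezone.utc)
print(local_times(window))
print(local_times(reboot))
```

# the same offsets also reproduce the 05:40 FeF time quoted for the 10:40 UTC reboot&lt;br /&gt;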
# Marcin replied, confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to set up the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we don&#039;t have Ansible for hetzner2, so I set this up manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# tomorrow morning, I should check the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
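# to know exactly when that entry should next fire, the sketch below parses a daily /etc/cron.d line (five time fields, then the user, then the command) and computes the next run strictly after a given time. It is a hypothetical helper that only handles a numeric minute and hour with * in the other three fields, and it assumes hetzner2&#039;s system clock is on UTC (cron uses the system time zone)&lt;br /&gt;

```python
from datetime import datetime, timedelta, timezone
from operator import le


def next_daily_run(cron_d_line, now):
    """Next time a daily 'MIN HOUR * * * USER COMMAND' cron.d entry fires after now.

    Hypothetical helper: only numeric minute/hour fields with '*' for
    day-of-month, month and day-of-week are supported.
    """
    fields = cron_d_line.split(None, 6)
    if len(fields) != 7:
        raise ValueError("expected 5 time fields, a user and a command")
    minute, hour, dom, month, dow, _user, _command = fields
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("only daily schedules are handled")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if le(candidate, now):  # today's slot already passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


entry = "40 10 * * * root /sbin/reboot"
# mtime of the reboot file from the ls output above (assumed to be UTC)
created = datetime(2025, 4, 24, 17, 56, tzinfo=timezone.utc)
print(next_daily_run(entry, created))
```

# to confirm the reboot actually happened at that time, `uptime -s` prints the boot time and `journalctl --list-boots` lists recent boots&lt;br /&gt;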
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status output aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add the --verbose flag, but that&#039;s not especially helpful either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
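# for reference, the ip addr add line in that verbose output shows ifupdown expanding the configured netmask 255.255.255.224, which is a /27. The sketch below uses Python&#039;s standard ipaddress module to show that the 144.76.164.192/27 network contains the primary .201, the new .195 and the .193 gateway, so both IPv4 addresses sit on the same subnet. Separately, RTNETLINK answers: File exists generally means the kernel already has that address (or route), which fits .201 already being on enp0s31f6 at that point&lt;br /&gt;

```python
import ipaddress

# Values taken from the interfaces file and the verbose ifup output
primary = ipaddress.IPv4Interface("144.76.164.201/255.255.255.224")
net = primary.network

print("prefix length:", net.prefixlen)
print("network:", net.network_address)
print("broadcast:", net.broadcast_address)

for label, ip in [("new IP", "144.76.164.195"), ("gateway", "144.76.164.193")]:
    inside = ipaddress.IPv4Address(ip) in net
    print(f"{label} {ip} in {net}: {inside}")
```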
# curiously, though, the new IPv4 address is listed in `ip a`&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came up, it had lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, but I have errors again&lt;br /&gt;
# curiously, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet isn&#039;t very helpful because it seems the damn format has changed so many times over the years; lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find Hetzner-specific docs on adding a second IP; they&#039;re totally different from what I&#039;ve read elsewhere: https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
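# the Hetzner pattern adds the extra address as a /32 through paired up/down hooks on the existing stanza, instead of a second iface stanza, so ifdown removes exactly what ifup added. If more addresses are ever added, a hypothetical helper like this could render those lines (the interface name and IP are from this server; the function itself is illustrative and not part of any tool)&lt;br /&gt;

```python
import ipaddress


def extra_ip_lines(iface, extra_ips):
    """Render ifupdown up/down hook lines that add each extra IPv4 address as a /32."""
    lines = []
    for ip in extra_ips:
        addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
        lines.append(f"  up ip addr add {addr}/32 dev {iface}")
        lines.append(f"  down ip addr del {addr}/32 dev {iface}")
    return "\n".join(lines)


print(extra_ip_lines("enp0s31f6", ["144.76.164.195"]))
```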
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the system came-up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
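# to make this after-reboot check repeatable, a small Python sketch can parse the plain ip a output and report any expected addresses missing from an interface. It only looks at the N: name: header lines and the indented inet/inet6 lines, so treat it as a quick check rather than a full ip a parser&lt;br /&gt;

```python
import re

IFACE_HEADER = re.compile(r"^\d+: ([^:@\s]+)")  # "2: enp0s31f6: ..." gives "enp0s31f6"
ADDR_LINE = re.compile(r"^\s+(inet6?) (\S+)")   # "    inet 144.76.164.195/32 ..."


def addresses_by_iface(ip_a_output):
    """Map each interface name to the list of CIDR addresses shown by ip a."""
    result = {}
    current = None
    for line in ip_a_output.splitlines():
        header = IFACE_HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        addr = ADDR_LINE.match(line)
        if addr and current:
            result[current].append(addr.group(2))
    return result


def missing_addresses(ip_a_output, ifname, expected):
    """Return the expected bare IPs (no /prefix) that are not on ifname, sorted."""
    present = {cidr.split("/")[0] for cidr in addresses_by_iface(ip_a_output).get(ifname, [])}
    return sorted(set(expected) - present)
```

# on the server itself, the input would be the captured stdout of ip a (for example via subprocess.run with capture_output=True and text=True), checked against 144.76.164.201 and 144.76.164.195&lt;br /&gt;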
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I used netcat to test it. Most ports are closed, and nginx is listening on most of the other ports on all IPs, but not on 4443, so I used that one&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
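# the same two-ended check can be scripted. This is a minimal Python sketch, not a netcat replacement: the listener binds to one specific local address and port (like nc -s ADDR -l -p PORT), accepts a single connection, and records what the client sent. On hetzner3 it would be bound to 144.76.164.195 and port 4443; the demo here uses 127.0.0.1 with an OS-chosen port so it can run anywhere&lt;br /&gt;

```python
import socket
import threading


def start_listener(bind_addr, port=0):
    """Accept one TCP connection on bind_addr:port in a background thread.

    Returns (actual_port, thread, result); after thread.join(), result["data"]
    holds every byte the client sent before closing. port=0 lets the OS pick.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(1)
    actual_port = srv.getsockname()[1]
    result = {}

    def serve():
        with srv:
            conn, _peer = srv.accept()
            with conn:
                chunks = []
                while True:
                    data = conn.recv(4096)
                    if not data:  # empty read: the client closed the connection
                        break
                    chunks.append(data)
        result["data"] = b"".join(chunks)

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return actual_port, thread, result


def send_line(host, port, text):
    """Connect to host:port, send one newline-terminated line, then close."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(text.encode() + b"\n")


port, thread, result = start_listener("127.0.0.1")
send_line("127.0.0.1", port, "hello from the laptop")
thread.join(timeout=5)
print(result.get("data"))
```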
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the defined date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. Presumably, if the disk had degraded enough to break RAID replication, we would have gotten alerts. But it would be good to get alerts *before* that happens.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a smart module for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
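# for a sense of what such a SMART check would look at, here&#039;s a rough Python sketch. It parses the classic ATA attribute table printed by smartctl -A (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and flags any attribute that smartctl marks in WHEN_FAILED, or whose normalized VALUE has dropped to or below a non-zero THRESH. The sample rows are hypothetical, NVMe drives print a different table, and the right alert thresholds are a policy decision, so treat this as illustrative only&lt;br /&gt;

```python
from operator import le


def parse_smart_attributes(smartctl_a_output):
    """Parse the ATA attribute table from smartctl -A into a list of dicts."""
    attrs = []
    in_table = False
    for line in smartctl_a_output.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row; data rows follow
            continue
        if not in_table:
            continue
        # RAW_VALUE may contain spaces, e.g. "36 (Min/Max 20/45)", so split at most 9 times
        fields = line.split(None, 9)
        if len(fields) != 10 or not fields[0].isdigit():
            in_table = False  # a blank line or other text ends the table
            continue
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return attrs


def failing_attributes(attrs):
    """Names of attributes marked in WHEN_FAILED, or whose VALUE is at or below a non-zero THRESH."""
    return [
        a["name"]
        for a in attrs
        if a["when_failed"] != "-" or (a["thresh"] and le(a["value"], a["thresh"]))
    ]


# Hypothetical sample rows, not real output from hetzner2 or hetzner3
SAMPLE = "\n".join([
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
    "  5 Reallocated_Sector_Ct   0x0033   005   005   010    Pre-fail  Always   FAILING_NOW 3920",
    "  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       48210",
    "194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)",
    "",
])
print(failing_attributes(parse_smart_attributes(SAMPLE)))
```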
# crap, I just checked hetzner3&#039;s munin, and I realized that the varnish graphs are missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the plugin script and its symlinks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I diffed the varnish5_ script on my server against the one on OSE&#039;s hetzner3, and found 2 extra lines at the top of hetzner3&#039;s copy&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;Ansible managed&amp;quot; comment comes before the shebang, so the shebang is ignored and the plugin gets executed by /bin/sh as a shell script instead of by perl&lt;br /&gt;
# we can see the resulting syntax errors in a test run, too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
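# since the root cause was a header comment sitting above the shebang, a quick check could catch this next time. Here&#039;s a minimal POSIX sh sketch (check_shebang is a hypothetical helper, not part of munin or ansible) that prints any file whose first line doesn&#039;t start with #!, since the kernel only recognizes an interpreter line at the very beginning of a file&lt;br /&gt;
```shell
#!/bin/sh
# Sketch: print every regular file whose first line does not begin with "#!".
# The kernel only honours a shebang in the first two bytes of a file, so a
# template header such as "# Ansible managed" placed above it makes the plugin
# run under /bin/sh instead (consistent with the "=head1: not found" errors).
check_shebang() {
    for f in "$@"; do
        # skip anything that isn't a regular file (e.g. an unmatched glob)
        [ -f "$f" ] || continue
        first=$(head -n 1 "$f" | cut -c1-2)
        if [ "$first" != "#!" ]; then
            echo "MISSING SHEBANG: $f"
        fi
    done
}
```
## usage would be something like check_shebang /usr/share/munin/plugins/*, which should print nothing on a healthy install&lt;br /&gt;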
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that, at minimum, we need this section in the munin plugin config, so I added it on hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, if I restart the service first and then run it, it – uhh – kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it exits without an error, but it only returns a U (munin&#039;s &amp;quot;unknown&amp;quot; value) and no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical SMART charts would be great, what I really want to achieve is email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I replaced the config file (after backing up the original)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but that didn&#039;t work; no email came when I restarted the service (even when I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I checked the postfix logs, and it looks like google is rejecting our mail?!?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the Google Groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see if SMART info was being charted yet&lt;br /&gt;
## on hetzner2, there are no changes; I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both effectively empty; each has only 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; on all the time-range charts. This is expected, since the plugin can&#039;t parse the NVMe smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess there will be no munin charts for SMART on this dying server&lt;br /&gt;
# I also confirmed that varnish graphs are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty on my screen session. It really is angry..&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and just saw what&#039;s probably hundreds or thousands of delivery attempts. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and gets added to the queue, it&#039;ll be re-sent without batching..&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
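# to sanity-check the throughput numbers, here&#039;s a rough sketch of the arithmetic (campaign_hours is a hypothetical helper; it assumes the per-message throttle is the only limit and ignores batch settings, SMTP latency, and retries). At 1 second per message, Marcin&#039;s 14,111-recipient campaign would take roughly 3.9 hours, and a 15,000-person list roughly 4.2 hours&lt;br /&gt;
```shell
#!/bin/sh
# Estimate campaign duration (in hours, to one decimal place) for a fixed
# per-message delay such as phpList's MAILQUEUE_THROTTLE.
# Assumption: the delay is the only bottleneck.
campaign_hours() {
    recipients=$1
    seconds_per_msg=$2
    total=$((recipients * seconds_per_msg))
    # integer math in tenths of an hour, rounded to the nearest tenth
    tenths=$(( (total * 10 + 1800) / 3600 ))
    echo "$((tenths / 10)).$((tenths % 10))"
}

campaign_hours 14111 1
campaign_hours 15000 1
```
## this prints 3.9 and 4.2, consistent with the 4-5 hour estimate&lt;br /&gt;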
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
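# for reference: with a 2s rate delay, postfix will deliver at most 3600 / 2 = 1,800 messages per hour to any single destination, which is tighter than the 3,600/hour phpList throttle for recipients that share a domain (e.g. gmail.com). My understanding from the postfix docs is that the delay applies per destination (normally the recipient domain), not per MX host, so separate domains hosted by Google are still paced independently. A rough sketch of the calculation (to_seconds and per_destination_hourly_cap are hypothetical helpers; only the s/m/h suffixes are handled)&lt;br /&gt;
```shell
#!/bin/sh
# Convert a postfix time value such as "2s", "1m" or "1h" to seconds.
# Assumption: only s/m/h suffixes are handled; a bare number is treated as
# seconds (the default unit for *_destination_rate_delay).
to_seconds() {
    v=$1
    case "$v" in
        *s) echo "${v%s}" ;;
        *m) echo $(( ${v%m} * 60 )) ;;
        *h) echo $(( ${v%h} * 3600 )) ;;
        *)  echo "$v" ;;
    esac
}

# Upper bound on deliveries per hour to ONE destination for a given delay.
per_destination_hourly_cap() {
    secs=$(to_seconds "$1")
    if [ "$secs" -eq 0 ]; then
        echo "unlimited"
    else
        echo $((3600 / secs))
    fi
}

per_destination_hourly_cap 2s
```
## on hetzner2 this could be fed the live value with per_destination_hourly_cap &amp;quot;$(postconf -h default_destination_rate_delay)&amp;quot;, which should print 1800 after the change&lt;br /&gt;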
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as some layered security for ssh (to force keys), but that kinda breaks sudo, which requires the password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to set a user password (while keeping password logins disabled in ssh) than to set sudoers to NOPASSWD&lt;br /&gt;
# disabled (locked) passwords are indicated by a &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
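# one caveat: per shadow(5), an empty password field can mean that no password is needed to authenticate as that user (some programs deny access instead), so it&#039;s worth confirming that the user sets a password with passwd right away. A minimal read-only sketch for auditing that field (shadow_status is a hypothetical helper, not a standard tool; reading the real /etc/shadow requires root)&lt;br /&gt;
```shell
#!/bin/sh
# Classify the password field (field 2) of each /etc/shadow-format line:
#   empty      -> EMPTY: may allow authentication with no password
#   starts "!" -> locked (like the accounts listed above)
#   starts "*" -> no usable password (typical for system accounts)
#   otherwise  -> a password hash is set
shadow_status() {
    awk -F: '
        $2 == ""   { print $1 ": EMPTY"; next }
        $2 ~ /^!/  { print $1 ": locked"; next }
        $2 ~ /^\*/ { print $1 ": no password"; next }
                   { print $1 ": password set" }
    ' "${1:-/etc/shadow}"
}
```
## e.g. running shadow_status | grep EMPTY as root on hetzner3 would list any account left in that state&lt;br /&gt;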
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) use the MBR partition scheme (not GPT), as indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
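# The same check can be scripted for both disks. This is a minimal sketch that assumes bash and util-linux fdisk, run as root. The helper name label_type is hypothetical. It matches both the older &amp;quot;Disk label type:&amp;quot; and the newer &amp;quot;Disklabel type:&amp;quot; wording, and prints &amp;quot;dos&amp;quot; for MBR or &amp;quot;gpt&amp;quot; for GPT.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Print the partition-table type found in fdisk output on stdin (hypothetical helper).&lt;br /&gt;
# Older fdisk prints &amp;quot;Disk label type:&amp;quot;, newer versions print &amp;quot;Disklabel type:&amp;quot;.&lt;br /&gt;
label_type() {&lt;br /&gt;
    grep -i &#039;label type:&#039; | awk -F&#039;: *&#039; &#039;{ print $2 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Check both disks; &amp;quot;dos&amp;quot; means MBR, &amp;quot;gpt&amp;quot; means GPT.&lt;br /&gt;
for disk in /dev/sda /dev/sdb; do&lt;br /&gt;
    printf &#039;%s: %s\n&#039; &amp;quot;$disk&amp;quot; &amp;quot;$(fdisk -l &amp;quot;$disk&amp;quot; | label_type)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;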
# A quick spot-check shows that our backups usually finish at 09:55 UTC (once as late as 10:07 UTC).&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God that&#039;s early, but it&#039;s better to do this early in the German day.&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 at FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
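# The UTC conversions in the email above can be double-checked with GNU date. This is a sketch that assumes GNU coreutils date and installed tzdata. It also assumes FeF is on US Central time (America/Chicago), which this log does not state.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Convert the proposed change window (10:00 UTC) into each relevant local time.&lt;br /&gt;
when=&#039;2025-04-24 10:00 UTC&#039;&lt;br /&gt;
for tz in Europe/Berlin America/Guayaquil America/Chicago; do&lt;br /&gt;
    printf &#039;%-18s %s\n&#039; &amp;quot;$tz&amp;quot; &amp;quot;$(TZ=$tz date -d &amp;quot;$when&amp;quot; &#039;+%Y-%m-%d %H:%M %Z&#039;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# With the April 2025 DST rules this should print 12:00 for Berlin (CEST) and 05:00 for both Guayaquil and Chicago, matching the times in the email.&lt;br /&gt;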
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is now down too, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb database&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so we have one more recovery mode we can try before the modes become destructive: 3 (per the MySQL docs, values of 4 or greater can permanently corrupt data files)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests that this infinite loop is caused by the default innodb_purge_threads=1, and that we should set it to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery mode back down to 2 and added a line setting innodb_purge_threads to 0; I&#039;ll try that once the restart is no longer blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
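# For reference, the combination described a few steps above (recovery mode back down to 2, plus purge threads set to 0) would look roughly like this in my.cnf. This is a sketch that assumes the settings live in the [mysqld] section. innodb_force_recovery has to be removed again before InnoDB will accept data modifications, as the log below also warns.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# TEMPORARY: start InnoDB despite corruption. Data modifications are refused while this is set.&lt;br /&gt;
# Values of 4 or greater can permanently corrupt data files.&lt;br /&gt;
innodb_force_recovery = 2&lt;br /&gt;
# TEMPORARY: work around the &amp;quot;Waiting for the background threads to start&amp;quot; loop.&lt;br /&gt;
innodb_purge_threads = 0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;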
# the logs are being spammed with the last 5 lines below, over and over; I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
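# Note that `grep -vi OK` also hides any line that merely contains the letters &amp;quot;ok&amp;quot; (for example inside a table name), and it keeps the informational &amp;quot;note&amp;quot; lines. A stricter filter might look like the sketch below. It assumes the layout seen above (a table name on its own line, followed by &amp;quot;msg_type : message&amp;quot; lines), and the helper name check_problems is hypothetical.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Print &amp;quot;table msg_type&amp;quot; for every table whose mysqlcheck result was not OK,&lt;br /&gt;
# skipping &amp;quot;note&amp;quot; lines (e.g. engines that don&#039;t support CHECK) and &amp;quot;status : OK&amp;quot;.&lt;br /&gt;
check_problems() {&lt;br /&gt;
    awk &#039;&lt;br /&gt;
        # Message lines look like &amp;quot;warning  : text&amp;quot; and follow their table name.&lt;br /&gt;
        tolower($1) ~ /^(note|info|warning|error|status)$/ &amp;amp;&amp;amp; $2 == &amp;quot;:&amp;quot; {&lt;br /&gt;
            t = tolower($1)&lt;br /&gt;
            if (t != &amp;quot;note&amp;quot; &amp;amp;&amp;amp; !(t == &amp;quot;status&amp;quot; &amp;amp;&amp;amp; $3 == &amp;quot;OK&amp;quot;)) print table, t&lt;br /&gt;
            next&lt;br /&gt;
        }&lt;br /&gt;
        # Any other line starts with a table name (&amp;quot;db.table  OK&amp;quot; or just &amp;quot;db.table&amp;quot;).&lt;br /&gt;
        { table = $1 }&lt;br /&gt;
    &#039; &amp;quot;$@&amp;quot; | sort -u&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage: check_problems mysqlcheck.20250418.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;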
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list. And that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
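# One caveat with the `mysqldump ... | gzip` pipeline above: by default bash returns the exit status of the last command in a pipeline (gzip), so if mysqldump fails part-way through, the command can still report success and leave a truncated dump. Below is a sketch of a safer variant, assuming bash. The helper name dump_ok and the output file name are hypothetical, and the check relies on mysqldump ending a complete dump with a &amp;quot;-- Dump completed&amp;quot; comment (the default, unless comments are turned off).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Make a pipeline fail if any command in it fails, not just the last one.&lt;br /&gt;
set -o pipefail&lt;br /&gt;
&lt;br /&gt;
# Succeed only if the gzip file is intact and the dump ran to completion (hypothetical helper).&lt;br /&gt;
dump_ok() {&lt;br /&gt;
    gzip -t &amp;quot;$1&amp;quot; || return 1&lt;br /&gt;
    gzip -dc &amp;quot;$1&amp;quot; | tail -n 1 | grep -q &#039;^-- Dump completed&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Usage, with the same $mysqlPass variable as in the log above:&lt;br /&gt;
#   out=mysqldump.$(date &#039;+%Y%m%d_%H%M%S&#039;).sql.gz&lt;br /&gt;
#   mysqldump -uroot -p&amp;quot;$mysqlPass&amp;quot; --all-databases | gzip -c &amp;gt; &amp;quot;$out&amp;quot; &amp;amp;&amp;amp; dump_ok &amp;quot;$out&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;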
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time set to 1. It did start, but this crash-and-restart loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
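# well, even though systemd *says* it&#039;s started, the log above shows mysqld hitting that InnoDB assertion and mysqld_safe restarting it in a loop&lt;br /&gt;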
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
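# side note: systemctl calls the unit active because its main PID is mysqld_safe (24458 above), which stays up while it keeps restarting the crashing mysqld. A better check is to retry a real client query for a while, sketched below; `retry_until` is a hypothetical helper, not an existing tool, and a single successful query can still mislead during a tight crash loop, so keep watching /var/log/mariadb/mariadb.log as well&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# retry_until TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
retry_until() {
  local tries="$1" delay="$2" i
  shift 2
  for i in $(seq 1 "$tries"); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example ($mysqlPass comes from /root/backups/backup.settings):
#   retry_until 30 1 mysql -uroot -p"$mysqlPass" -e 'SELECT 1'
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;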
# so I set it back to recovery mode 2 (innodb_force_recovery = 2), restarted it, and ran mysqlcheck again&lt;br /&gt;
# huh, every line says OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
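# a caveat about `grep -vi OK`: it hides every line that contains ok anywhere, in any case, so if a table with ok in its name (say, a hypothetical wiki.hooks) failed, its header line would be hidden too. A slightly stricter sketch, assuming mysqlcheck&#039;s usual `db.table   OK` layout, treats only lines ending in whitespace plus OK as clean; `check_mysqlcheck_log` is a hypothetical helper&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# check_mysqlcheck_log [FILE]
# Print every line of mysqlcheck output that is not a plain "db.table   OK"
# result. Reads FILE, or stdin if no file is given.
# Returns 0 if all lines were OK, 1 if some were not, 2 if FILE is unreadable.
check_mysqlcheck_log() {
  local bad rc
  rc=0
  bad="$(grep -Ev '[[:space:]]OK$' "${1:--}")" || rc=$?
  if [ "$rc" -gt 1 ]; then
    printf 'could not read %s\n' "${1:-stdin}"
    return 2
  fi
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Example:
#   check_mysqlcheck_log mysqlcheck.20250418_201559.log
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;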
# well, now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROPping them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
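# for the record: as far as I can tell from the MySQL 5.5 docs on forcing InnoDB recovery, InnoDB refuses INSERT, UPDATE and DELETE whenever innodb_force_recovery is greater than 0, hence error 1030 above. The live value is in `SHOW VARIABLES LIKE &#039;innodb_force_recovery&#039;`; as a sketch, a hypothetical `force_recovery_level` helper can pull the configured value out of a my.cnf-style file. It assumes a plain `innodb_force_recovery = N` line (dashes also accepted) and does not track which [section] the line is in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# force_recovery_level [FILE]
# Print the last innodb_force_recovery value set in a my.cnf-style FILE
# (or stdin), or 0 if none is set. Commented-out lines never match because
# the option name must be the first non-blank text on the line.
force_recovery_level() {
  local level
  level="$(sed -n 's/^[[:space:]]*innodb[-_]force[-_]recovery[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "${1:--}" | tail -n 1)"
  printf '%s\n' "${level:-0}"
}

# Example:
#   force_recovery_level /etc/my.cnf
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;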
# well, fuck. Now I don&#039;t know why it won&#039;t start, and it doesn&#039;t tell me why. The good news is that I was able to get a db dump. Maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m just going to purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
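# in hindsight, typing each name by hand is how the two typos above crept in (`3dp_db d3ddb` and `obi_stabing_db`). A sketch that builds the statements from the server&#039;s own list instead; `drop_non_system_sql` is a hypothetical helper that only prints SQL, so the output can be read before anything is piped into mysql&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# drop_non_system_sql
# Read database names on stdin (one per line) and print a DROP DATABASE
# statement for each one, skipping blank lines and the system schemas.
drop_non_system_sql() {
  local db
  while IFS= read -r db; do
    case "$db" in
      ''|information_schema|mysql|performance_schema) continue ;;
    esac
    # backtick-quote the name so odd identifiers are handled safely
    printf 'DROP DATABASE `%s`;\n' "$db"
  done
}

# Example: print the statements, read them, and only then run them:
#   mysql -N -uroot -p"$mysqlPass" -e 'SHOW DATABASES' | drop_non_system_sql
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;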
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, two of them are missing: only mysql is in the dump, not information_schema or performance_schema&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
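# two of those are expected to be missing: as far as I know, mysqldump leaves out information_schema and performance_schema unless they&#039;re named explicitly. As a sketch, a dump can be checked for specific databases by reading its CREATE DATABASE lines (with backtick-quoted names, as in the grep output above); `dump_db_names` and `missing_from_dump` are hypothetical helpers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# dump_db_names
# Read an uncompressed mysqldump on stdin and print the name of every
# database it creates.
dump_db_names() {
  sed -n 's/^CREATE DATABASE [^`]*`\([^`]*\)`.*$/\1/p'
}

# missing_from_dump NAME...
# Read a dump on stdin and print each NAME that the dump does not create.
missing_from_dump() {
  local names name nl
  nl='
'
  names="$(dump_db_names)"
  for name in "$@"; do
    # wrap the list in newlines so only whole names match
    case "$nl$names$nl" in
      *"$nl$name$nl"*) ;;
      *) printf '%s\n' "$name" ;;
    esac
  done
}

# Example:
#   zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz \
#     | missing_from_dump information_schema mysql performance_schema
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;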
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m going to manually dump all three anyway. Or try to.&lt;br /&gt;
# well, I was only able to back up one of the three&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
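# the mysql dump looks good (716K); the 4.0K information_schema and performance_schema files are just the output of the two failed dumps above&lt;br /&gt;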
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
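# those two 4.0K files come from the failed dumps: in `mysqldump ... | gzip -c &amp;gt; file` the shell reports gzip&#039;s exit status rather than mysqldump&#039;s, so the error only shows on the terminal and gzip still writes a small, valid archive (`set -o pipefail` in bash makes such a pipeline fail instead). As a sketch, assuming the dumps were made without --skip-comments or --compact, a finished mysqldump ends with a `-- Dump completed` comment, which a hypothetical `dump_completed` helper can check for&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# dump_completed FILE.sql.gz
# Succeed only if the gzipped dump's last line is the "-- Dump completed"
# trailer that mysqldump writes when it finishes normally.
dump_completed() {
  gzip -dc -- "$1" | tail -n 1 | grep -q '^-- Dump completed'
}

# Example:
#   for f in mysqldump-after-corruption-while-in-recovery-mode*.sql.gz; do
#     dump_completed "$f" || echo "INCOMPLETE: $f"
#   done
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;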
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
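# the same move-aside steps as a sketch; `move_datadir_aside` is a hypothetical helper that copies the owner, group and mode from the old directory with GNU `chown --reference` and `chmod --reference` instead of hardcoding mysql:mysql and 0755. As above, mariadb has to be stopped first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
# move_datadir_aside DIR
# Rename DIR to DIR.YYYYMMDD and recreate DIR as an empty directory with the
# same owner, group and mode as the original. Refuses to run if the dated
# name is already taken. Stop the database server before running this.
move_datadir_aside() {
  local dir old
  dir="${1%/}"
  old="$dir.$(date +%Y%m%d)"
  if [ -e "$old" ]; then
    printf '%s already exists, leaving everything as it is\n' "$old"
    return 1
  fi
  mv -- "$dir" "$old"
  mkdir -- "$dir"
  chown --reference="$old" -- "$dir"
  chmod --reference="$old" -- "$dir"
}

# Example (mirrors the manual steps above):
#   systemctl stop mariadb
#   move_datadir_aside /var/lib/mysql
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;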
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that also reset the mysql root password (the new datadir comes with a fresh mysql.user table), so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I started it with --skip-grant-tables and reset the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
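systemctl stop mariadb&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
mysql -u root&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# shut the temporary server down cleanly with the new password instead of killing mysqld_safe (which doesn&#039;t necessarily stop mysqld), then start the normal service&lt;br /&gt;
mysqladmin -u root -p shutdown&lt;br /&gt;
systemctl start mariadb&lt;br /&gt;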
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in, and I see the three system databases plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script (mysql_secure_installation), but let&#039;s just drop test manually and restore from the dump&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart and confirmed it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest – including even our corrupt databases – and see if it works or breaks&lt;br /&gt;
# the import took about 11.5 minutes, and /var/lib/mysql is now ~6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# woah, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`; that would be better for him than logging into the hetzner wui and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
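# a hedged alternative to editing /etc/sudoers in place is to put the rule in its own drop-in under /etc/sudoers.d. The `#includedir` line above already pulls that directory in. The drop-in can be syntax-checked with `visudo -c -f` before it is installed. The helper and the file name `marcin-reboot` are hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: write a sudoers rule that lets USER run only
# /sbin/reboot as root without a password, into FILE, with mode 0440.
write_reboot_rule() {
  user="$1"
  file="$2"
  printf '%s ALL = NOPASSWD: /sbin/reboot\n' "$user" > "$file"
  chmod 0440 "$file"
}

# Usage (as root). sudo skips drop-ins whose names contain a "." or end
# in "~", so keep the name plain:
#   write_reboot_rule marcin /root/marcin-reboot
#   visudo -c -f /root/marcin-reboot
#   install -m 0440 -o root -g root /root/marcin-reboot /etc/sudoers.d/marcin-reboot
#   sudo -l -U marcin    # shows what marcin may run via sudo
```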
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s debian, but sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead logs. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the abrupt shutdown caused corruption, and a bug in mariadb kept it from recovering. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end-up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is rebuilding. But, of course, there&#039;s risk in not doing it.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses, so I can replace that one second&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
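# when filing the swap ticket, it helps to quote the serial of the exact drive to pull. Below is a small sketch that reuses the `udevadm` query above and keeps only the short serial. The function names are hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: extract the short serial from
# `udevadm info --query=property` output read on stdin.
serial_from_props() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Print "device serial" for each device name given (e.g. sda sdb).
print_serials() {
  for dev in "$@"; do
    printf '%s %s\n' "$dev" "$(udevadm info --query=property --name "$dev" | serial_from_props)"
  done
}

# Usage (as root): print_serials sda sdb
```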
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to setup grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX`&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir is empty, so I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
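# grub boots from the md arrays rather than from one specific drive, so /proc/mdstat is the quickest way to see whether each array still has both members. Below is a minimal sketch. It assumes the usual raid1 status field, where `[UU]` is healthy and `[U_]` or `[_U]` means a member is missing. The helper name is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every array whose member-status field contains a "_" (a missing
# or failed member), e.g. "[U_]".
degraded_arrays() {
  awk '/^md/ { name = $1 } /\[[U_]*_[U_]*\]/ { print name }'
}

# Usage: cat /proc/mdstat | degraded_arrays    # no output means no degraded arrays
```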
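# right, so if I understand this correctly: we&#039;re not updating grub&#039;s config. grub.cfg lives on /boot (/dev/md1), which the RAID already mirrors. We just run &#039;grub-install&#039; (on CentOS 7 the binary is `grub2-install`) against the new drive so that the bootloader gets written onto it too. that&#039;s easier and less concerning than I thought.&lt;br /&gt;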
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first. Just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but, anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
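# for the CHG, here&#039;s a dry-run sketch that only *prints* the commands following the hetzner doc&#039;s outline. The steps are: mark the old member failed and remove it, copy the partition table from the healthy disk, re-add the partitions, and reinstall the bootloader. These assumptions have NOT been checked against hetzner2. First, MBR partition tables (grub.cfg above loads `part_msdos`, which suggests this). Second, md0/md1/md2 are built from partitions 1/2/3 on each disk. Third, the new disk comes back under the same name. On CentOS 7 the bootloader command is `grub2-install`. The function name is hypothetical&lt;br /&gt;
```shell
# Hypothetical dry-run planner: prints (never runs) the commands to swap
# out one RAID1 member. ASSUMPTION: md0, md1 and md2 are built from
# partitions 1, 2 and 3 of each disk; check /proc/mdstat first.
plan_disk_swap() {
  good="$1"   # healthy disk that stays, e.g. sda
  old="$2"    # disk being replaced, e.g. sdb
  echo "# before hetzner pulls /dev/$old:"
  for n in 1 2 3; do
    echo "mdadm /dev/md$((n - 1)) --fail /dev/$old$n --remove /dev/$old$n"
  done
  echo "# after the new disk is in (confirm its name and serial first):"
  echo "sfdisk -d /dev/$good | sfdisk /dev/$old"
  for n in 1 2 3; do
    echo "mdadm /dev/md$((n - 1)) --add /dev/$old$n"
  done
  echo "grub2-install /dev/$old"
  echo "cat /proc/mdstat    # watch the rebuild"
}

# Usage: plan_disk_swap sda sdb
```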
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thr Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL CentOS, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new entries in the apache error log when I hit the site in real-time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real-time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new shows up in the journal, but a lot gets written to /var/log/mariadb/mariadb.log, the file that mysqld_safe says it&#039;s logging to (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
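# side note on the backup ages: here's a rough sketch of computing a backup&#039;s age from the timestamp in its filename, instead of eyeballing ls. It assumes GNU date (as on CentOS 7) and that the filename timestamp is UTC. backup_ts and hours_between are hypothetical helper names, not existing tools&lt;br /&gt;

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes GNU date and UTC.

# backup_ts NAME -> "YYYY-MM-DD HH:MM:SS" taken from a filename like
# daily_hetzner2_20250416_072001.tar.gpg
backup_ts() {
  basename "$1" | sed -E 's/^daily_hetzner2_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})\..*$/\1-\2-\3 \4:\5:\6/'
}

# hours_between START END -> whole hours elapsed from START to END (UTC)
hours_between() {
  hb_start=$(date -u -d "$1" +%s) || return 1
  hb_end=$(date -u -d "$2" +%s) || return 1
  echo $(( (hb_end - hb_start) / 3600 ))
}
```

# usage: hours_between &amp;quot;$(backup_ts daily_hetzner2_20250416_072001.tar.gpg)&amp;quot; &amp;quot;$(date -u &#039;+%F %T&#039;)&amp;quot;; run at 20:36:56 UTC on Apr 17, this gives 37 (and 13 for the 20250417 backup)&lt;br /&gt;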
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the tables that aren&#039;t corrupt and restore only the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about once a week, because it has kept breaking for the past several months (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
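# a sketch for double-checking that cold copy. It assumes mysqld stays stopped (as it is now), so the files can&#039;t change underneath us. verify_copy is a hypothetical helper. Note that rsync without a trailing slash on the source put the files under var-lib-mysql.20250417/mysql/, so that&#039;s the directory to compare against&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: compare a datadir with its copy, byte for byte.
# Only meaningful while mysqld is stopped; a live datadir keeps changing.

# verify_copy SRC DST -> exit 0 if the two trees are identical
verify_copy() {
  if diff -rq "$1" "$2"; then
    echo "OK: $2 matches $1"
  else
    echo "MISMATCH: $2 differs from $1 (see diff output above)"
    return 1
  fi
}
```

# usage: verify_copy /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.20250417/mysql&lt;br /&gt;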
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data worse https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
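# to make that rule hard to get wrong, here&#039;s a sketch of a helper that prints the my.cnf stanza for a given innodb_force_recovery level (valid values are 0 to 6). It refuses anything above 3 unless you explicitly say you&#039;re working on a throwaway copy. recovery_stanza and ALLOW_UNSAFE are made-up names, not MariaDB features. Which config file the stanza belongs in (e.g. /etc/my.cnf) is an assumption to check on the box&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: print a [mysqld] stanza for innodb_force_recovery=LEVEL.
# Levels above 3 may corrupt the data further, so they are refused unless
# ALLOW_UNSAFE=yes (a hypothetical guard for use on a copy of the datadir).

recovery_stanza() {
  rs_level=$1
  case $rs_level in
    [0-3]) ;;
    [4-6])
      if [ "${ALLOW_UNSAFE:-no}" != "yes" ]; then
        echo "refusing innodb_force_recovery=$rs_level outside a throwaway copy"
        return 1
      fi
      ;;
    *)
      echo "innodb_force_recovery must be between 0 and 6"
      return 1
      ;;
  esac
  printf '[mysqld]\ninnodb_force_recovery = %s\n' "$rs_level"
}
```

# usage: try recovery_stanza 1 (then 2, then 3), restarting mariadb after each change; set it back to 0 once the dump is done&lt;br /&gt;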
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1; apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically: take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps of 4 or greater could still render corrupt data, so they shouldn&#039;t be trusted, anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. so the idea is to dump everything in recovery mode and drop what is corrupt. then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: at worst, we lose the data on the individual corrupt pages&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
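# so a page is InnoDB&#039;s unit of storage and I/O (a block holding one or more rows), not just data that hasn&#039;t been written to disk yet. So I *think* a dump taken in recovery mode 1, 2, or 3 can be trusted, apart from whatever rows sat on the corrupt pages?&lt;br /&gt;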
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
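# the overall verdict is the one line that matters in the -H output. Here&#039;s a sketch that pulls it out for both the ATA disks on hetzner2 and the NVMe disks on hetzner3. smart_health is a hypothetical helper. smartctl also reports this in its exit status (bit 3 set means &amp;quot;DISK FAILING&amp;quot;, per the smartctl man page), which may be easier for scripts&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: read "smartctl -H DEV" output on stdin and print PASSED, FAILED,
# or UNKNOWN; the exit status is 0, 1, or 2 respectively.
smart_health() {
  sh_result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p')
  case $sh_result in
    PASSED*) echo PASSED ;;
    FAILED*) echo FAILED; return 1 ;;
    *) echo UNKNOWN; return 2 ;;
  esac
}
```

# usage: for d in /dev/sda /dev/sdb; do printf &#039;%s: &#039; &amp;quot;$d&amp;quot;; smartctl -H &amp;quot;$d&amp;quot; | smart_health; done&lt;br /&gt;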
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free Replacement drive nearly new or used and tested; depends on what is in stock. &lt;br /&gt;
### At cost Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot swap while the system is on, or to shut it down first. I&#039;m going to choose shutdown. That&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but I think it&#039;s a hardware raid. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other. But, christ, if both disks are fucked, then which disk should I have them replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm..it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
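# rather than reading both tables by eye, here&#039;s a sketch that prints only the attributes whose WHEN_FAILED column isn&#039;t &amp;quot;-&amp;quot;. failing_attrs is a hypothetical helper, and it assumes the standard 10-column layout shown above. On both disks here, it should print just attribute 202, Percent_Lifetime_Remain&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: read "smartctl -A DEV" (ATA) on stdin and print "ID NAME WHEN_FAILED"
# for every attribute whose WHEN_FAILED column (field 9) is not "-",
# e.g. FAILING_NOW or In_the_past.
failing_attrs() {
  awk '
    $1 ~ /^[0-9]+$/ {
      if (NF >= 10) {
        if ($9 != "-") print $1, $2, $9
      }
    }
  '
}
```

# usage: smartctl -A /dev/sda | failing_attrs&lt;br /&gt;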
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs table. nor in the source wikipedia table https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to nvme. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the output for smartctl looks very different for NVMEs on Debian than it does on CentOS&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
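# for NVMe, the closest thing to Percent_Lifetime_Remain is &amp;quot;Percentage Used&amp;quot;. Here&#039;s a sketch that extracts it (nvme_pct_used is a hypothetical helper). Per the NVMe spec, this is an estimate of how much of the rated endurance has been consumed. A value of 100 does not by itself mean the drive has failed, and the value is allowed to go above 100&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: read "smartctl -A DEV" for an NVMe drive on stdin and print the
# "Percentage Used" value as a bare number (e.g. "6" for "6%").
nvme_pct_used() {
  awk -F: '/^Percentage Used:/ { gsub(/[ %]/, "", $2); print $2 }'
}
```

# usage: smartctl -A /dev/nvme0n1 | nvme_pct_used (this would print 6 for the hetzner3 output above)&lt;br /&gt;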
# that shows we&#039;re at 6% and 7% usage on hetzner3, whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# this indicates that our raid is fine. two UUs (eg `[UU]`) is fine. Bad would be a U and a missing U (eg `[U_]`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
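# here&#039;s the same [UU] check as a sketch. raid_degraded is a hypothetical helper: it prints any md array whose status field contains an underscore, and exits non-zero if it finds one, so it could be dropped into a cron job or monitoring check&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: read /proc/mdstat (file argument or stdin); print "mdN [U_]" for
# each array with a missing member ("_" in the status field) and exit 1
# if any were found.
raid_degraded() {
  awk '
    BEGIN { bad = 0 }
    /^md[0-9]+ :/ { dev = $1 }
    $NF ~ /^\[[U_]+]$/ {
      if ($NF ~ /_/) { print dev, $NF; bad = 1 }
    }
    END { exit bad }
  ' "$@"
}
```

# usage: raid_degraded /proc/mdstat&lt;br /&gt;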
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old one, then add each partition into the RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
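# here&#039;s a sketch that collects the serials for both disks in one go, for pasting into the hetzner support request. disk_serial and serial_report are hypothetical helpers; they just filter the same udevadm output shown above&lt;br /&gt;

```shell
#!/bin/sh
# Sketch: print the short serial from "udevadm info --query=property" output.
disk_serial() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# serial_report DEV... -> one "DEV SERIAL" line per device (needs udevadm)
serial_report() {
  for sr_dev in "$@"; do
    printf '%s %s\n' "$sr_dev" \
      "$(udevadm info --query=property --name "$sr_dev" | disk_serial)"
  done
}
```

# usage: serial_report sda sdb&lt;br /&gt;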
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked-off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
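# once they finish, each drive&#039;s results can be read back from its self-test log with smartctl -l selftest. below is a minimal sketch of a hypothetical helper, smart_selftests_ok (not part of smartmontools), that reads that log on stdin and succeeds only if every listed test shows the status Completed without error. it assumes smartctl&#039;s usual log layout, where each entry line starts with # and a number, and it is deliberately conservative: an old failure or a test still in progress also counts as not ok&lt;br /&gt;
```shell
# Hypothetical helper (not part of smartmontools): read the output of
# "smartctl -l selftest DEVICE" on stdin and exit 0 only if at least one
# self-test is listed and every listed entry says "Completed without error".
# Entry lines are assumed to start with "# N" (e.g. "# 1  Short offline ...").
smart_selftests_ok() {
  awk '
    /^# *[0-9]+ / { total++; if ($0 !~ /Completed without error/) bad++ }
    END { exit (total == 0 || bad > 0) }
  '
}

# Usage on the server (as root), once the tests have finished:
#   for dev in /dev/sda /dev/sdb; do
#     smartctl -l selftest "$dev" | smart_selftests_ok || echo "check $dev"
#   done
```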
# ok, next is the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2 (a RAID1 array)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
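# the three tools describe the same 215024271360-byte array in different units: fdisk prints decimal GB (215.0 GB), lsblk prints binary GiB (200.3G), and df shows the ext4 filesystem&#039;s usable size (197G), which is presumably smaller because part of the device holds filesystem metadata such as inode tables. a quick sanity check of the conversion, using a hypothetical helper bytes_to_units; the second call multiplies the Block count (52496160) by the Block size (4096) from the tune2fs output below, which gives the same byte count&lt;br /&gt;
```shell
# Hypothetical helper: print a byte count in decimal GB (as fdisk does)
# and binary GiB (as lsblk does), rounded to one decimal place.
bytes_to_units() {
  awk -v b="$1" 'BEGIN { printf "%.1f GB / %.1f GiB\n", b / 1000^3, b / 1024^3 }'
}

bytes_to_units 215024271360             # fdisk byte count for /dev/md2
bytes_to_units "$((52496160 * 4096))"   # ext4 block count x block size
# both print: 215.0 GB / 200.3 GiB
```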
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the check probably runs at boot, but I couldn&#039;t find any record of it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
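# both filesystems (/ and /boot) report a clean state. note that /boot hasn&#039;t been checked since May 2016; its Maximum mount count is -1, which disables mount-count-based checks&lt;br /&gt;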
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
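# the same grep can be turned into a pass/fail check for scripting. a minimal sketch with a hypothetical helper, fs_state_clean, that reads tune2fs -l output on stdin and succeeds only when the Filesystem state field is exactly clean, so states such as not clean or clean with errors fail&lt;br /&gt;
```shell
# Hypothetical helper: read "tune2fs -l DEVICE" output on stdin and exit 0
# only if the "Filesystem state:" field is exactly "clean".
fs_state_clean() {
  awk -F: '
    /^Filesystem state:/ { state = $2; gsub(/^[ \t]+|[ \t]+$/, "", state) }
    END { exit (state != "clean") }
  '
}

# Usage on the server (as root):
#   for dev in /dev/md1 /dev/md2; do
#     tune2fs -l "$dev" | fs_state_clean || echo "$dev is not clean"
#   done
```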
# well, so far I couldn&#039;t find any signs of corruption at the disk or filesystem level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
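# the MySQL page linked above warns that innodb_force_recovery values of 4 or greater can permanently corrupt data files, so it&#039;s worth copying /var/lib/mysql somewhere with enough free space (with mariadb stopped) before going past 3. stepping the level up between attempts can be scripted; a minimal sketch with a hypothetical helper, set_force_recovery, assuming the option line already exists under [mysqld] as in the diff above&lt;br /&gt;
```shell
# Hypothetical helper: set "innodb_force_recovery = LEVEL" in a my.cnf-style
# file by rewriting the existing (uncommented) option line in place.
# Returns 1 for a level outside 0-6, and 2 if the option line is missing.
set_force_recovery() {
  local cnf="$1" level="$2"
  case "$level" in
    [0-6]) ;;
    *) return 1 ;;
  esac
  grep -q '^innodb_force_recovery' "$cnf" || return 2
  sed -i "s/^innodb_force_recovery *= *[0-9]*/innodb_force_recovery = $level/" "$cnf"
}

# Usage on the server (as root), e.g. to go from level 1 to level 2:
#   systemctl stop mariadb
#   set_force_recovery /etc/my.cnf 2
#   systemctl start mariadb
```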
# it didn&#039;t come up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2 (innodb_force_recovery = 2); this time it got stuck &amp;quot;Waiting for the background threads to start&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems to loop forever. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged-into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following up on the bus factor, I added Catarina&#039;s and Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc. across a ton of different, scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what his future self (5 years from now) will need in a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; the limit in both is 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server, so I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
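# side note: the 413 message above actually names the no files limit, 1000000 bytes (about 0.95 MiB), not the 12.5 MiB SecRequestBodyLimit. in ModSecurity 2.x that&#039;s a separate directive, SecRequestBodyNoFilesLimit, and a grep for &#039;BodyLimit&#039; can&#039;t match it (its name contains BodyNoFilesLimit). a sketch with a hypothetical helper, list_body_limits, that lists both directives under a config tree; the default of /etc/httpd is an assumption about where the httpd directory above lives&lt;br /&gt;
```shell
# Hypothetical helper: list ModSecurity request-body size limits under an
# Apache config tree. The pattern matches both SecRequestBodyLimit and
# SecRequestBodyNoFilesLimit (but not SecRequestBodyLimitAction);
# -I skips binary files such as modules/mod_security2.so.
list_body_limits() {
  grep -rnIE 'SecRequestBody(NoFiles)?Limit[[:space:]]' "${1:-/etc/httpd}"
}

# Usage on the server:
#   list_body_limits /etc/httpd
```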
# ...&lt;br /&gt;
# so Marcin is stressing the urgency of getting Catarina a sandbox on hetzner3, so she can rebuild osemain using a new theme that&#039;s not broken on the latest versions of wordpress, php, etc.&lt;br /&gt;
# I didn&#039;t want to do this site before the other, lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, in that entry I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to find out which site&#039;s CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306066</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306066"/>
		<updated>2025-04-27T22:02:03Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: apr 26&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 26, 2025=&lt;br /&gt;
# Marcin authorized me to add Tom to our ops google groups mailing list and to give him access to our shared ose keepass&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Fri, Apr 25, 2025, 12:43 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; (re-sending without encryption)&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On 4/25/25 12:41, Michael Altfield wrote:&lt;br /&gt;
&amp;gt;&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Do you authorize:&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 1. Giving Tom access to the shared OSE keepass file&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; 2. Adding Tom to the ops mailing list (this would allow him to password&lt;br /&gt;
&amp;gt;&amp;gt; reset many of our important accounts)&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Please let me know if you authorize the above.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom sent me his gpg public key, which I can use to add him to the wazuh emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ gpg&lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
gpg: Go ahead and type your message ...&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
pub   rsa4096 2025-04-26 [SC]&lt;br /&gt;
	  13300901348A985115679165FB137A633FD1EB4C&lt;br /&gt;
uid           Tom Griffing (OSE PGP Key 4-25-2025) &amp;lt;REDACTED@tutanota.com&amp;gt;&lt;br /&gt;
sub   rsa4096 2025-04-26 [E]&lt;br /&gt;
user@ose:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added Tom to the wazuh recipients, per https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /var/tmp/gpg&lt;br /&gt;
pushd /var/tmp/gpg&lt;br /&gt;
# write multi-line to file for documentation copy &amp;amp; paste&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
-----BEGIN PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
&lt;br /&gt;
mQINBGgMJ7ABEACwllLJu87blFKJ8aZMR7pCjRzhhp266Rjxz7071iow43a7FkvN&lt;br /&gt;
pcXmYsuwW4dLhqA+Sose7Fjo9o9+7bOLcBAso9x9hk55+pDQm67wyXmxp+7pWVhj&lt;br /&gt;
hdLBsdB4faLQDHkHymKUs/UKRViN0an/6nARxVyah58Dh/OcnSIv0bnozze8YRJX&lt;br /&gt;
aklCs+OF2Jv+gBH5VWNMLloX+l+MsBYj9N14MsMeWJ8lSNFWBl/SOBGuOftZbljp&lt;br /&gt;
qb8dBZRo/4OR/Dr5zCUQ1KuPu2wFKfMRwi3NEdmUKpFf/U7Ydn7ZK2T+ZKl+x1eb&lt;br /&gt;
+0I0ZM0DgaTYTqd82wlag1hfrYM7SONYb0C03x5T4y+CsG9IchgQ2yihYIKgHOIW&lt;br /&gt;
Wiz6vC4N4EKmuKAqCOGS/gzp7xDqzXl2R2sWHyRuOn3yUr2z9HdDk2sjnobtaVli&lt;br /&gt;
wYaIoes9zrBgunLoK9S0FaHzSPX0FGwygV50E73BFxJBmL6eHeRVuYOi0FkAQmsN&lt;br /&gt;
dJeOvpCwKgBModyPbxin78KKbgF/0OnxWL+Zde6+J5l+aW81xbwNZYuyxWHSb7m3&lt;br /&gt;
2RM4dXhxAWM2cBQ5+b5yKopO8T4OzKl5C/rYzhuEYqpSEQJccFNHmQexkwqACVNl&lt;br /&gt;
h/D97jm0580ctnGCZuNzmLlsXX2mzqOj6UU2LlUFy0HT5tr93KBA+HkGhwARAQAB&lt;br /&gt;
tEBUb20gR3JpZmZpbmcgKE9TRSBQR1AgS2V5IDQtMjUtMjAyNSkgPHRvbS5ncmlm&lt;br /&gt;
ZmluZ0B0dXRhbm90YS5jb20+iQJRBBMBCgA7FiEEEzAJATSKmFEVZ5Fl+xN6Yz/R&lt;br /&gt;
60wFAmgMJ7ACGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQ+xN6Yz/R&lt;br /&gt;
60xHURAAqIUawudDI3dmIVPa/RHTOusoJA4KIXLNCMiILWd3iwZQFQNrt6YHpwJU&lt;br /&gt;
pyvsXAM4QWd/qt0D9IF6K9waOIA5ipX0yXFVxZ0V1BQ6aq3cK1r+NvQUcLJzS02W&lt;br /&gt;
T9UIJtHOs+8EbIIS6ybcnxS6RARinrJpTkoCWspWXMDnXcX3n4pbbhHQLViswf1C&lt;br /&gt;
tOE7uSfNPcxGLK4cYLxLL1VHC45eB2CTEAxfXSavCPI62IcYkZBdwWz7E8q1QpsP&lt;br /&gt;
vxgxe31b+v9NcaxW5tc2/4NwaObqKSZYlhK/pce3X18+uWzpmE3ubhPb7Ptb5GLo&lt;br /&gt;
42U9ymRFg7a14VFfq+wcwSlZR01o7Q2FofAOFpX+EoDBkughAX6hWyYxErJ4vD7k&lt;br /&gt;
ogYX25J5suxrixkTzDMJ0cCsZyt/Bu0liVnojaETUhrNUwBp7Rz7xx5x6Go/sZHK&lt;br /&gt;
mzhCe1q4xwSHeTZTjyG3oby4KDPgb0WEKCdUpa5BobgT9goGGXjCxe9dS8ZVUu4I&lt;br /&gt;
bso+h/SK95nmgsl/EDrmDXvWOh/Zy76GixCq48ydEkGbVz/6ri1+pD0NXYN/ijAu&lt;br /&gt;
h6EsLnoBLQCLlYYsBTfg31X2Sbzigeloy6iRWoHtCOAfI2Azdhby+BCGuSIvUOXa&lt;br /&gt;
Q4CQjmjYpsx7nwtjWOgCZ4rObTekj4O9ZnI8Gtxfpzy1gFdyfw65Ag0EaAwnsAEQ&lt;br /&gt;
ANnD6PMPT0CU1RqbAQtVw7eJksV96+tl/xG8mtje631n2uBe9WzyLch0fgC99eID&lt;br /&gt;
ZDGXfJUEdODuI9/H8037PnJmmMtP2eP1c/ztrql6pxPj9c0jIRWjtwmNhyYNaaEn&lt;br /&gt;
i0JyLz5SiTbuftlHXaKhVTuLc/Qp44FH5XK6LVHphDR8Ck43Mhj7enfvGvmAUgLW&lt;br /&gt;
OLQMst84oOCywYX+nUmov2rCIhuc6RhX4OcOBZcEA2W/CSsoNXR4To9mn8Gg3/dH&lt;br /&gt;
ZKS/3sDwJQxjFvkqc89+aTPY85TBoUGBUzbQG+KFQgDyVt4kABK1iyUA1PKZOb4Q&lt;br /&gt;
MZJnR9g0UI/ctfrOpz4hhEFaQ+rEYwdm5MSXOQGfjrnGu3t85IQzmxUXovqmfsjn&lt;br /&gt;
oFPSPd/91/rJJKxci+rCX7CpQSObPrwHNgPNQ5zleDV7d9/u9UaGRFeOaaM+abd0&lt;br /&gt;
RhPh4nJWbDdNOWpj3pxJkG3tzmbazBogxTq0SDRP8wvBAD0JYESoPVGWQ6czlTnu&lt;br /&gt;
T0ov9QKMb21mfUQ6DmfxTFQbkr1g1r2uYfJ1TbP0AcAK+Q/IMtt8F7chulfAe7/0&lt;br /&gt;
9nk7HwqWHTkj8+YB9+Ro2hkUTpL57uEYdG/ukGODfTNhu02wxG02zlYFsTyd/H62&lt;br /&gt;
VIgT1Cpf5HBb73lzdiSVtl45C34Fwu8ZO6dBdmk2c1nFABEBAAGJAjYEGAEKACAW&lt;br /&gt;
IQQTMAkBNIqYURVnkWX7E3pjP9HrTAUCaAwnsAIbDAAKCRD7E3pjP9HrTNxGD/wN&lt;br /&gt;
syvVZxm4hyw4l8U6J3B/3rKAup+l7GQCXthNK+f3YPwWdWc8DOo3kBrP4ppR5Ry9&lt;br /&gt;
YKb700wBDAYwWfy+ZJPHMi0vVUf8kX2QQEj4sFZHj9suTFvfLdsLTAhNtRXVtZiu&lt;br /&gt;
xfr1T3R3T0XSSFFdhiBO+BYRnlgFRiiR9FCTDaxrLRfhAhOwC6LHOarHnRi5nQS8&lt;br /&gt;
2PaHIYbWN7c5CdpH9dsPUt3xi1sEf8E87HTZo30Of/FYtB4eTOdx2DMqKscbJvZS&lt;br /&gt;
1ugK+2v7DMaiBMZCfbZSVNjn8+VcTOPW5KzJFsVR7UmfvTZu6c3jrshHuPOSguT7&lt;br /&gt;
l63AcfrJZOJe+djndWws2u0FpyMu0AHoS2r3EtBd/OydjEKG2P7qFb3KX9I9Tv35&lt;br /&gt;
zQmpHc4e2TJTYKpXyfarzgKFuUfOmZpm8maUTqFdEBL6pgwi1zcQ704g7Kzo/YUr&lt;br /&gt;
dHTA5yQ2WBBsrVKAZIt6Llkt0jIkpSyjjs5CAPJ2jsg61nq4uYw7w3jpwe80nbyc&lt;br /&gt;
7GgvdkJlTS7TfcYk3vlDQOQBpXqDZagQVUT8jc6mGiY/jbSzjGNt/8qObKSywFLY&lt;br /&gt;
XnxLVnGhKyzsWhR5fEbUCqywwc/c14gbjNguNZbU7e0Krf9ggYoglfPIOOp8XDX1&lt;br /&gt;
XwH+EXkSGW96dHXIYidONcMxClnA04zZY52Sr/r6Lw==&lt;br /&gt;
=UsaD&lt;br /&gt;
-----END PGP PUBLIC KEY BLOCK-----&lt;br /&gt;
EOF&lt;br /&gt;
gpg --homedir /var/ossec/.gnupg --import /var/tmp/gpg/tom.pubkey.asc&lt;br /&gt;
popd&lt;br /&gt;
&lt;br /&gt;
# add Tom&#039;s email (matching an email on a UID of his key above) to the space-delimited &amp;quot;recipients&amp;quot; variable&lt;br /&gt;
vim /var/ossec/sent_encrypted_alarm.settings&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I sent him an email asking him to confirm that it&#039;s working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
Can you please confirm that you&#039;re now receiving alerts from wazuh?&lt;br /&gt;
&lt;br /&gt;
Wazuh is our HIDS (Host-Based Intrusion Detection System). It&#039;s a fork of OSSEC, a HIDS and FIM (File Integrity Monitor). Because it sometimes sends sensitive information (e.g., diffs of config files that contain passwords), it&#039;s important that we encrypt its email notifications end-to-end with PGP.&lt;br /&gt;
&lt;br /&gt;
And because someone who compromises the server could &amp;quot;clean up&amp;quot; after themselves, these (off-server) alerts are critical to post-compromise investigations.&lt;br /&gt;
&lt;br /&gt;
For more info, see:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Wazuh&lt;br /&gt;
 * https://en.wikipedia.org/wiki/OSSEC&lt;br /&gt;
 * https://documentation.wazuh.com/current/getting-started/index.html&lt;br /&gt;
&lt;br /&gt;
Out of the box, Wazuh has a ton of features, but where we probably use it most is its ingestion of Apache&#039;s mod_security WAF alerts and their tie-in to Wazuh&#039;s Active Response. If an IP is caught doing something bad (e.g., multiple consecutive 403 responses, such as a brute-force attack on WordPress [or ssh]), then the IP gets temporarily blocked by the firewall for 10 minutes. If it does it again shortly after the ban is lifted, it&#039;ll be banned for 12 hours. If again, 1 day. Then 2 days. Then 4 days. The max ban, for 5x repeat offenses, is 8 days.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible/blob/master/hetzner3/roles/maltfield.wazuh/templates/ossec.conf.j2#L256-L271&lt;br /&gt;
&lt;br /&gt;
It also has rootkit detection, and lots of other useful alerts that &amp;quot;just work&amp;quot; out of the box.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you&#039;re now receiving encrypted wazuh alerts.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried to add Tom to our ops google groups email list, but it said I wasn&#039;t allowed to add members outside of our google workspace&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error occurred&lt;br /&gt;
1 user is outside of your organization. Based on your group or organization settings, you can only add organization users to this group. Contact your group owner or domain administrator for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I checked our users list; it appears that Tom doesn&#039;t have an @opensourceecology.org account in gsuite&lt;br /&gt;
# I found the setting to change that here https://admin.google.com/ac/managedsettings/864450622151/GROUPS_SHARING_SETTINGS_TAB&lt;br /&gt;
## https://support.google.com/a/thread/63692725/&lt;br /&gt;
## https://support.google.com/a/answer/167097&lt;br /&gt;
# I checked the box that said &amp;quot;Group owners can allow external members&amp;quot;&lt;br /&gt;
## curiously the subline said &amp;quot;Organization admins can always add external members&amp;quot; – but I&#039;m a damn org admin, and I couldn&#039;t add him :/&lt;br /&gt;
# I tried to add him again, but I got the same error&lt;br /&gt;
# this time I went to the group settings https://groups.google.com/a/opensourceecology.org/g/REDACTED/settings&lt;br /&gt;
# I found the &amp;quot;allow external members&amp;quot; and changed it from &amp;quot;off&amp;quot; to &amp;quot;on&amp;quot; and clicked &amp;quot;save changes&amp;quot;&lt;br /&gt;
## this wasn&#039;t possible before; I first had to change the workspace-wide setting to allow changing the group-specific setting. Now it&#039;s changed.&lt;br /&gt;
# this time it worked.&lt;br /&gt;
# I sent an email to our ops google group, asking Tom to reply if he saw it&lt;br /&gt;
# ...&lt;br /&gt;
# I checked in on hetzner2 to make sure it rebooted this morning&lt;br /&gt;
# looks like the cron is set to reboot at 10:40 UTC every day, and – indeed – uptime says it&#039;s been online for a bit less than 13 hours. And its last boot time was today at 10:41:25&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# uptime&lt;br /&gt;
 23:30:25 up 12:49,  7 users,  load average: 1.02, 0.98, 0.74&lt;br /&gt;
[root@opensourceecology ~]# journalctl | head&lt;br /&gt;
-- Logs begin at Sat 2025-04-26 10:41:25 UTC, end at Sat 2025-04-26 23:30:26 UTC. --&lt;br /&gt;
Apr 26 10:41:25 localhost systemd-journal[129]: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.2G available → current limit 3.1G).&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuset&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpu&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Initializing cgroup subsys cpuacct&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Linux version 3.10.0-1160.119.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Jun 4 14:43:51 UTC 2024&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: e820: BIOS-provided physical RAM map:&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009c7ff] usable&lt;br /&gt;
Apr 26 10:41:25 localhost kernel: BIOS-e820: [mem 0x000000000009c800-0x000000000009ffff] reserved&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Sat Apr 26 23:31:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like we&#039;ll have ~2 minutes of downtime every day in the very early morning in the US. I can live with that.&lt;br /&gt;
# and grub clearly is fixed&lt;br /&gt;
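# here&#039;s a minimal sketch of the reboot check above as a reusable shell function (hypothetical helper, not something installed on hetzner2). It assumes GNU date, the procps-ng uptime that supports -s, and a UTC system clock, as on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Did the daily 10:40 UTC reboot from /etc/cron.d/reboot happen today?&lt;br /&gt;
# Only meaningful after 10:40 UTC; earlier in the day, today&#039;s reboot hasn&#039;t happened yet.&lt;br /&gt;
rebooted_on_schedule() {&lt;br /&gt;
  local boot=&amp;quot;${1:-$(uptime -s)}&amp;quot;            # last boot, e.g. 2025-04-26 10:41:25&lt;br /&gt;
  local sched=&amp;quot;${2:-$(date -u +%F) 10:40}&amp;quot;   # today&#039;s scheduled reboot time&lt;br /&gt;
  local boot_s sched_s&lt;br /&gt;
  boot_s=$(TZ=UTC date -d &amp;quot;$boot&amp;quot; +%s) || return 2&lt;br /&gt;
  sched_s=$(TZ=UTC date -d &amp;quot;$sched&amp;quot; +%s) || return 2&lt;br /&gt;
  # succeed only if the box booted at or after the scheduled time&lt;br /&gt;
  [ &amp;quot;$boot_s&amp;quot; -ge &amp;quot;$sched_s&amp;quot; ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: rebooted_on_schedule &amp;amp;&amp;amp; echo rebooted today&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;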
# oh, also the RAID looks healthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
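# a minimal sketch for automating that RAID check (hypothetical helper, not installed anywhere): in a status like [2/2] [UU], each U is an active member and an underscore marks a missing one, so any underscore inside that bracket means a degraded array&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Exit 0 if every md array has all members active, 1 if any is degraded, 2 if mdstat is unreadable.&lt;br /&gt;
mdstat_healthy() {&lt;br /&gt;
  local file=&amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
  [ -r &amp;quot;$file&amp;quot; ] || { echo &amp;quot;cannot read $file&amp;quot;; return 2; }&lt;br /&gt;
  # a status bracket containing an underscore, e.g. [U_] or [_U]&lt;br /&gt;
  if grep -Eq &#039;\[U*_[U_]*\]&#039; &amp;quot;$file&amp;quot;; then&lt;br /&gt;
    grep -B1 -E &#039;\[U*_[U_]*\]&#039; &amp;quot;$file&amp;quot;   # print the mdN line and its status line&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
  echo &amp;quot;all md arrays have every member active&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: mdstat_healthy || echo RAID needs attention&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;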
# I asked Tom for his GitHub account profile username, so I can grant him write access to our OSE ansible repo&lt;br /&gt;
# I added Tom&#039;s new ssh key to his authorized_keys file on hetzner2&lt;br /&gt;
# I sent Tom an email asking to confirm his access to hetzner2&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 25, 2025=&lt;br /&gt;
# I woke up this morning and discovered the wiki was offline&lt;br /&gt;
# I tried to ssh into the server; it&#039;s not responding&lt;br /&gt;
# I figured I&#039;d log into the hetzner wui, but – uhh – the credentials are in keepass and live on the server&lt;br /&gt;
# I had mitigated this by giving Marcin a copy of the keepass file on his veracrypt drive, but he changed the password a month or two ago, and we don&#039;t have a new local copy&lt;br /&gt;
# I sent an email to Marcin asking him to log in to the hetzner wui and boot hetzner2. If it doesn&#039;t come up, then I&#039;ll have to get the password from him so I can load it in the wui from a rescue disk&lt;br /&gt;
# oh, I did find the new hetzner password in my personal keepass&lt;br /&gt;
# I logged in, and I found the server was listed as being on. But I can&#039;t ping it. I gave it an &amp;quot;automatic hardware reset&amp;quot; from the wui&lt;br /&gt;
# I&#039;ll give it a few minutes before trying the rescue system&lt;br /&gt;
# their rescue systems are much nicer for their cloud product than their dedicated server product&lt;br /&gt;
# it looks like I have two options&lt;br /&gt;
## rescue boot mode: where I&#039;m given ssh access&lt;br /&gt;
## vnc&lt;br /&gt;
# the problem with the rescue boot is that – if this is a grub issue – I wouldn&#039;t be able to &amp;quot;see&amp;quot; the error&lt;br /&gt;
# I enabled VNC and gave the server a reboot&lt;br /&gt;
# I was able to connect via vnc, but it was the damn installation wizard for almalinux. I quit the installation, and the vnc session died.&lt;br /&gt;
# damn, I guess vnc won&#039;t let me see the boot process, after all&lt;br /&gt;
# instead I tried the &amp;quot;rescue system&amp;quot;&lt;br /&gt;
# that didn&#039;t work; I can&#039;t access ssh on either of the IP addresses&lt;br /&gt;
# the docs say to activate the rescue system and then reboot it; that&#039;s what I did https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system/&lt;br /&gt;
# this time I fully shut down the server, and then I enabled the rescue system (while it&#039;s off)&lt;br /&gt;
# I went back to the Reset tab, and it&#039;s still off. So I booted it&lt;br /&gt;
# somehow I was able to log in from my ose vm using my personal ssh key, but as user root&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ ssh -v root@138.201.84.223&lt;br /&gt;
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024&lt;br /&gt;
debug1: Reading configuration data /home/user/.ssh/config&lt;br /&gt;
debug1: Reading configuration data /etc/ssh/ssh_config&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 21: Applying options for *&lt;br /&gt;
debug1: Connecting to 138.201.84.223 [138.201.84.223] port 22.&lt;br /&gt;
debug1: Connection established.&lt;br /&gt;
...&lt;br /&gt;
Linux rescue 6.12.19 #1 SMP Fri Mar 14 05:34:52 UTC 2025 x86_64&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
  Welcome to the Hetzner Rescue System.&lt;br /&gt;
&lt;br /&gt;
  This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel.&lt;br /&gt;
  You can install software like you would in a normal system.&lt;br /&gt;
&lt;br /&gt;
  To install a new operating system from one of our prebuilt images, run &#039;installimage&#039; and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
  Important note: Any data that was not written to the disks will be lost during a reboot.&lt;br /&gt;
&lt;br /&gt;
  For additional information, check the following resources:&lt;br /&gt;
	Rescue System:           https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system&lt;br /&gt;
	Installimage:            https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage&lt;br /&gt;
	Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images&lt;br /&gt;
	other articles:          https://docs.hetzner.com/robot&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Rescue System (via Legacy/CSM) up since 2025-04-25 17:24 +02:00&lt;br /&gt;
&lt;br /&gt;
Hardware data:&lt;br /&gt;
&lt;br /&gt;
   CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8)&lt;br /&gt;
   Memory:  64153 MB (Non-ECC)&lt;br /&gt;
   Disk /dev/sda: 250 GB (=&amp;gt; 232 GiB) &lt;br /&gt;
   Disk /dev/sdb: 512 GB (=&amp;gt; 476 GiB) &lt;br /&gt;
   Total capacity 709 GiB with 2 Disks&lt;br /&gt;
&lt;br /&gt;
Network data:&lt;br /&gt;
   eth0  LINK: yes&lt;br /&gt;
		 MAC:  90:1b:0e:94:07:c4&lt;br /&gt;
		 IP:   138.201.84.223&lt;br /&gt;
		 IPv6: 2a01:4f8:172:209e::2/64&lt;br /&gt;
		 Intel(R) PRO/1000 Network Driver&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to mount the root drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 0/2 pages [0KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt&lt;br /&gt;
root@rescue ~ # ls /mnt&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # ls /mnt/home&lt;br /&gt;
b2user  crupp  hart     lberezhny  marcin      stagingsync  wp&lt;br /&gt;
cmota   Flipo  jthomas  maltfield  not-apache  tgriffing&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what the point of this is; I can&#039;t fix it if I can&#039;t watch it boot and see what&#039;s breaking&lt;br /&gt;
# ok, at the bottom of the docs, hetzner lists another option: the vKVM Rescue System https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/&lt;br /&gt;
# it specifically says that&#039;s for debugging boot issues&lt;br /&gt;
# last thing before I try that: I downloaded a local copy of the keepass files from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner2$ rsync -av --progress root@138.201.84.223:/mnt/etc/keepass ./etc-keepass-20250525&lt;br /&gt;
receiving incremental file list&lt;br /&gt;
created directory ./etc-keepass-20250525&lt;br /&gt;
keepass/&lt;br /&gt;
keepass/passwords.kdbx&lt;br /&gt;
		 46,142 100%   44.00MB/s    0:00:00 (xfr#1, to-chk=6/8)&lt;br /&gt;
keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#2, to-chk=5/8)&lt;br /&gt;
keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#3, to-chk=4/8)&lt;br /&gt;
keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
		 33,726 100%  143.20kB/s    0:00:00 (xfr#4, to-chk=3/8)&lt;br /&gt;
keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
		 34,238 100%   71.75kB/s    0:00:00 (xfr#5, to-chk=2/8)&lt;br /&gt;
keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
		 45,406 100%   94.55kB/s    0:00:00 (xfr#6, to-chk=1/8)&lt;br /&gt;
keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
		 27,102 100%   56.31kB/s    0:00:00 (xfr#7, to-chk=0/8)&lt;br /&gt;
&lt;br /&gt;
sent 161 bytes  received 196,407 bytes  35,739.64 bytes/sec&lt;br /&gt;
total size is 195,794  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner2$ du -sh etc-keepass-20250525/keepass/*&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
28K	etc-keepass-20250525/keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so this time was the same as the rescue system, except I chose &amp;quot;vKVM&amp;quot; instead of &amp;quot;Linux&amp;quot; in the &amp;quot;Operating System&amp;quot; dropdown&lt;br /&gt;
# strange, it gave me an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Public key authentication is not available for the selected operating system.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I unselected my ssh key, and chose &amp;quot;no key&amp;quot; instead&lt;br /&gt;
# it gave me a URL and a password. I booted the server, but the URL didn&#039;t load (&amp;quot;Unable to connect&amp;quot; error)&lt;br /&gt;
# ok, it just took a few minutes to come up, and it used a self-signed cert&lt;br /&gt;
# I bypassed the cert error, and entered the username and password into the basic auth popup. It failed! Could I really have been MITM&#039;d?&lt;br /&gt;
# I immediately shut down the server from the wui, and I tried again.&lt;br /&gt;
# this time I was able to login – both from ssh and in the wui.&lt;br /&gt;
# as soon as it opened, I saw the error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
No more network devices&lt;br /&gt;
&lt;br /&gt;
Booting from Hard Disk...&lt;br /&gt;
.&lt;br /&gt;
error: symbol &#039;grub_calloc&#039; not found.&lt;br /&gt;
Entering rescue mode...&lt;br /&gt;
grub rescue&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I wonder if this is grub or grub2. I didn&#039;t have a &amp;quot;grub-install&amp;quot; binary before, so I assumed the hetzner docs had an error and ran &amp;quot;grub2-install&amp;quot; instead, which reported success (with a warning that the docs said was safe to ignore)&lt;br /&gt;
# curiously, the opposite is true for the ssh session in vKVM: I have grub-install but not grub2-install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@vKVM-rescue ~ # which grub-install&lt;br /&gt;
/usr/sbin/grub-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
root@vKVM-rescue ~ # which grub2-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the docs in question https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# I don&#039;t want to fuck with the grub without first taking a backup of these disks. But, uh, it looks like I can&#039;t access the RAID from inside this vkvm setup&lt;br /&gt;
# yeah, that&#039;s one of the limitations listed for VKVM https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/#raid-controllers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Configured units are passed through as SCSI devices to the VM. However it is not possible to access the controller. Please use the regular Hetzner Rescue System for this purpose.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I shut down vKVM and booted the server into the regular rescue mode&lt;br /&gt;
# it took a few minutes to get back into the old rescue system, but here I can use the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS&lt;br /&gt;
loop0     7:0    0   3.4G  1 loop  &lt;br /&gt;
sda       8:0    0 476.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
sdb       8:16   0 232.9G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
root@rescue ~ # mkdir /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt/md2&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created a dir for these backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chown root:root /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chmod 0700 /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first I made a backup from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # rsync -av --progress /mnt/md1 /mnt/md2/var/tmp/20250425-grub-fail/md1.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
...&lt;br /&gt;
md1/grub2/locale/zh_TW.mo&lt;br /&gt;
		 30,882 100%   31.38kB/s    0:00:00 (xfr#345, to-chk=0/355)&lt;br /&gt;
md1/lost+found/&lt;br /&gt;
&lt;br /&gt;
sent 399,450,301 bytes  received 6,709 bytes  159,782,804.00 bytes/sec&lt;br /&gt;
total size is 399,330,989  speedup is 1.00&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I figured I&#039;d make a backup of the two disk partitions directly, but I couldn&#039;t even mount them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sda2&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sdb2&lt;br /&gt;
root@rescue ~ # mount /dev/sda2 /mnt/sda2&lt;br /&gt;
mount: /mnt/sda2: unknown filesystem type &#039;linux_raid_member&#039;.&lt;br /&gt;
	   dmesg(1) may have more information after failed mount system call.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
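# since a linux_raid_member can&#039;t be mounted directly, a block-level copy with dd is one way to back up the raw partitions. A minimal sketch (hypothetical helper), assuming the first partition starts at 1 MiB (check with fdisk -l), so that the first 1 MiB of each disk holds the MBR plus the gap where GRUB embeds its core image on an msdos-labelled BIOS disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Copy a block device (or its first N MiB) to an image file; no mount needed.&lt;br /&gt;
image_dev() {&lt;br /&gt;
  local src=&amp;quot;$1&amp;quot; dest=&amp;quot;$2&amp;quot; mib=&amp;quot;$3&amp;quot;   # mib empty = copy the whole device&lt;br /&gt;
  dd if=&amp;quot;$src&amp;quot; of=&amp;quot;$dest&amp;quot; bs=1M ${mib:+count=$mib} status=none&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (backup dir from above):&lt;br /&gt;
#   dir=/mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
#   for d in sda sdb; do&lt;br /&gt;
#     image_dev /dev/$d $dir/$d.first1MiB.img 1   # MBR + GRUB embedding area&lt;br /&gt;
#     image_dev /dev/${d}2 $dir/${d}2.img         # raw /boot RAID member&lt;br /&gt;
#   done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;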
# I tried this command from the docs, which I had skipped before because the docs said the next command (grub-install) was enough on its own; sure enough, it didn&#039;t work https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-mkdevicemap -n&lt;br /&gt;
grub-mkdevicemap: error: cannot open /boot/grub/device.map.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I had investigated this before, and I thought I had concluded that we&#039;re using grub2, not grub1&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # ls /mnt/md1/&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, shit, even the grub-install command is v2 https://askubuntu.com/questions/107486/how-to-know-the-version-of-grub&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install --version&lt;br /&gt;
grub-install (GRUB) 2.06-13+deb12u1&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, this indicates we&#039;re not using lilo https://askubuntu.com/questions/24459/how-do-i-find-out-which-boot-loader-i-have&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2/etc/ | grep lilo&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can dd straight from the disk to read the MBR. And, yeah, it appears we are booting grub from the MBR, and that boot code lives on the individual disks, not on the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # dd if=/dev/md1 bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sda bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
214fb5736d1e5ad63e515dc2fffe44bd928cd8dab2c019dc11fb9fcaef5ea90dbf51f1ac507ab1cfbbe74ff&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
DA/jjF&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sdb bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
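# a minimal sketch of that MBR check as a function (hypothetical helper); it uses grep -a instead of strings so it doesn&#039;t depend on binutils. Note that finding &amp;quot;GRUB&amp;quot; only proves boot.img is in the MBR; it doesn&#039;t prove that the core image embedded after the MBR matches the modules in /boot/grub2, and a mismatch like that is a commonly reported cause of the grub_calloc error above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Succeed if the first 512-byte sector of the given disk contains the string GRUB.&lt;br /&gt;
has_grub_mbr() {&lt;br /&gt;
  dd if=&amp;quot;$1&amp;quot; bs=512 count=1 2&amp;gt;/dev/null | grep -aq GRUB&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   for disk in /dev/sda /dev/sdb; do&lt;br /&gt;
#     has_grub_mbr $disk &amp;amp;&amp;amp; echo $disk: GRUB in MBR || echo $disk: no GRUB in MBR&lt;br /&gt;
#   done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;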
# idk what to do; I tried the grub-install again, but it gives me this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install /dev/sda&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # grub-install /dev/sdb&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried creating a chroot of our real raid disks first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # chroot-prepare /mnt/md2&lt;br /&gt;
root@rescue ~ # chroot /mnt/md2&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
root@rescue / # mount /dev/md1 /boot&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then tried the grub install again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue / # grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / #&lt;br /&gt;
&lt;br /&gt;
root@rescue / # grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
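# a minimal sketch of the sequence that finally worked, as a function (hypothetical helper, not the exact commands typed above). Running grub2-install inside a chroot of the real OS means the CentOS 7 grub2-install and its i386-pc modules are used, instead of the rescue system&#039;s Debian grub-install, which was missing /usr/lib/grub/i386-pc. chroot-prepare is the Hetzner rescue helper used above; set DRY_RUN=1 to print the commands instead of running them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Run a command, or just print it when DRY_RUN is set.&lt;br /&gt;
run() { if [ -n &amp;quot;${DRY_RUN:-}&amp;quot; ]; then echo &amp;quot;$*&amp;quot;; else &amp;quot;$@&amp;quot;; fi; }&lt;br /&gt;
&lt;br /&gt;
# reinstall_grub ROOT_DEV BOOT_DEV DISK...&lt;br /&gt;
reinstall_grub() {&lt;br /&gt;
  local root=&amp;quot;$1&amp;quot; boot=&amp;quot;$2&amp;quot; mnt=/mnt/md2&lt;br /&gt;
  shift 2                                      # the rest: every disk in the mirror&lt;br /&gt;
  run mkdir -p &amp;quot;$mnt&amp;quot; || return 1&lt;br /&gt;
  run mount &amp;quot;$root&amp;quot; &amp;quot;$mnt&amp;quot; || return 1&lt;br /&gt;
  run chroot-prepare &amp;quot;$mnt&amp;quot; || return 1&lt;br /&gt;
  run chroot &amp;quot;$mnt&amp;quot; mount &amp;quot;$boot&amp;quot; /boot || return 1&lt;br /&gt;
  for disk in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    # install to EVERY disk, so whichever one the BIOS boots has a matching core image&lt;br /&gt;
    run chroot &amp;quot;$mnt&amp;quot; grub2-install &amp;quot;$disk&amp;quot; || return 1&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: reinstall_grub /dev/md2 /dev/md1 /dev/sda /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;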
# I exited the chroot and shutdown the rescue system&lt;br /&gt;
# I activated the vKVM rescue system, and booted it again&lt;br /&gt;
# when I connected to the KVM wui, I was shown a password prompt. So I think booting works!&lt;br /&gt;
# I rebooted it from the ssh session&lt;br /&gt;
# and now I can ssh into the real system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Thu Apr 24 23:12:44 2025 from 146.70.199.15&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and now the wiki loads too&lt;br /&gt;
# I did another reboot test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ sudo su -&lt;br /&gt;
[sudo] password for maltfield: &lt;br /&gt;
Last login: Thu Apr 24 16:25:15 UTC 2025 on pts/0&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
Last login: Fri Apr 25 16:29:21 2025 from 185.204.1.184&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# idk, my takeaway is that one or more of these assumptions is correct&lt;br /&gt;
## grub-install needs to be run *after* the RAID sync is finished&lt;br /&gt;
## grub-install needs to be run on *both* the new *and* the old disk&lt;br /&gt;
## grub-install needs to be run inside a chroot on the rescue system&lt;br /&gt;
# anyway, we&#039;re stable again&lt;br /&gt;
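# if the first takeaway is right, the RAID should be idle before running grub-install. mdadm --wait (e.g. mdadm --wait /dev/md1) blocks until any resync or recovery finishes; here&#039;s a minimal polling alternative (hypothetical helpers) that reads /proc/mdstat directly&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Succeed while any md array is resyncing, recovering, or reshaping.&lt;br /&gt;
raid_busy() {&lt;br /&gt;
  grep -Eq &#039;resync|recovery|reshape&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Poll every N seconds (default 60) until the arrays are idle; return 2 if mdstat is unreadable.&lt;br /&gt;
wait_for_raid() {&lt;br /&gt;
  [ -r &amp;quot;${1:-/proc/mdstat}&amp;quot; ] || return 2&lt;br /&gt;
  while raid_busy &amp;quot;$1&amp;quot;; do sleep &amp;quot;${2:-60}&amp;quot;; done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: wait_for_raid &amp;amp;&amp;amp; echo raid idle, ok to run grub2-install&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;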
# I got an email from Marcin saying Tom could help with the migrations. I sent him some wiki articles to get caught up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ll try to get you ssh access on hetzner2 soon. In the meantime, please read the following articles:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner2&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started preparing draft &amp;quot;change tickets&amp;quot; for migrating each of the websites from hetzner2 to hetzner3. Note that some of these are not fully tested, so you&#039;ll want to execute them manually and make corrections as-needed.&lt;br /&gt;
&lt;br /&gt;
Please also read-through these:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_microfactory_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_oswh&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_phplist_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
(There&#039;s also one CHG for the forum that I think needs to be made)&lt;br /&gt;
&lt;br /&gt;
The next item TODO is to finish the migration plan for these websites:&lt;br /&gt;
&lt;br /&gt;
 1. www.opensourceecology.org (osemain)&lt;br /&gt;
 2. www.openbuildinginstitute.org (obi)&lt;br /&gt;
&lt;br /&gt;
We decided that there would be 2 simultaneous versions of obi:&lt;br /&gt;
&lt;br /&gt;
1. A static site scraped with curl on hetzner3&lt;br /&gt;
2. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
And we decided that there would be 3 simultaneous versions of osemain:&lt;br /&gt;
&lt;br /&gt;
1. The live/current site on hetzner2&lt;br /&gt;
2. A static site scraped with curl on hetzner3&lt;br /&gt;
3. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
To have multiple sites with the same domain on the same server, we bought a second IPv4 address (FeF isn&#039;t set up with IPv6). This week I just finished updating the hetzner3 server to persist this new IPv4 address.&lt;br /&gt;
&lt;br /&gt;
The next item for you would be to update our ansible to push out new vhosts (in nginx, varnish, and apache) for the static sites that are bound to the second IPv4 address using the same hostname.&lt;br /&gt;
&lt;br /&gt;
Please read through the ansible playbook and roles (most importantly for nginx, varnish, and apache) to understand how they&#039;re provisioned.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
Since you have access to hetzner3, you can also poke around (read-only please) the configs for these three web services to understand how ansible provisions them.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve updated and pushed-out the new vhosts with ansible, you&#039;ll need to update the migration plan&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_obi_to_hetzner3&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
And then you&#039;ll want to go-through each migration plan to create a temp &amp;quot;snapshot&amp;quot; of all the sites on hetzner3, where Marcin &amp;amp; Catarina can do a thorough verification of each site (by updating /etc/hosts) before we do the *real* migration -- which is nearly the same as the &amp;quot;snapshot&amp;quot; except we actually migrate DNS.&lt;br /&gt;
&lt;br /&gt;
Please let me know when you&#039;ve finished reading the above articles.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/24/25 22:16, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I need to reset my ssh key on hetzner2. Can you use the same as on 3 or best to generate a new one?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I spoke with Marcin and I think I can help with the admin, as I have time available.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you give a run-down of its status and what needs to be done for completing the migration to hetzner3?&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to log in to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged-in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
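The backup check above filters an rclone ls listing with grep for today&#039;s date. The following is a Python sketch of the same filter. It adds a minimum-size check so that an empty upload would not count as a backup. The function name and the min_bytes threshold are assumptions for illustration, and the sketch assumes that rclone ls prints the size in bytes followed by the object path.&lt;br /&gt;
```python
def find_backups_for_date(listing, datestamp, min_bytes=1):
    """Return (size_bytes, path) pairs from `rclone ls` output for one date.

    Mirrors `rclone ls ... | grep YYYYMMDD`, but also skips objects smaller
    than `min_bytes` (a hypothetical sanity threshold) so an empty upload
    does not count as a backup.
    """
    matches = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # rclone ls prints the size in bytes, whitespace, then the object path.
        size, path = line.split(None, 1)
        if datestamp in path and int(size) >= min_bytes:
            matches.append((int(size), path))
    return matches
```
To reproduce the grep, pass the captured rclone output and the YYYYMMDD date as the datestamp.&lt;br /&gt;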
# I tried to remove the first partition from the RAID, but it said I can&#039;t?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that if the RAID is healthy, you first have to mark the partition as failed with &#039;--fail&#039; before you can remove it https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized I have an issue in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I listed this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to execute this, to force the RAID into a failure state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
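The CHG listed the wrong mdadm flags (and md1 instead of md2 for sdb3). The following Python sketch generates the whole command sequence from one array-to-partition table, so the fail/remove steps and the later add steps cannot drift apart. The function name and structure are hypothetical, and the sketch only builds strings without running anything. Device names can shift after a hot-swap, as happened later in this log, so the disk serials should be re-checked before running the post-swap steps.&lt;br /&gt;
```python
def disk_replacement_plan(arrays, healthy_disk, new_disk):
    """Build the shell steps for swapping one member disk of several RAID1 arrays.

    `arrays` maps each md device to its partition on the disk being replaced,
    e.g. {"/dev/md0": "/dev/sdb1"}. Returns (before_swap, after_swap) lists of
    command strings; nothing is executed.
    """
    before_swap = []
    for md, part in arrays.items():
        # A healthy member is "busy": mark it faulty first, then remove it.
        before_swap.append(f"mdadm --manage {md} --fail {part}")
        before_swap.append(f"mdadm --manage {md} --remove {part}")

    after_swap = [
        # Clone the MBR partition table from the surviving disk.
        f"sfdisk -d {healthy_disk} | sfdisk {new_disk}",
        f"blockdev --rereadpt {new_disk}",
    ]
    for md, part in arrays.items():
        # Re-add each partition; md then resyncs it from the healthy member.
        after_swap.append(f"mdadm --manage {md} --add {part}")
    # Make the new disk bootable too (on this server the tool is grub2-install).
    after_swap.append(f"grub2-install {new_disk}")
    return before_swap, after_swap
```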
# ok, I was able to remove it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had an email come from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
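The mix-up above came down to mapping kernel names (sda, sdb, sdc) to drive serials. The following Python sketch parses the KEY=VALUE output of udevadm info --query=property and looks up which device currently carries a given serial. The helper names are hypothetical, and collecting the output (for example, by running udevadm once per disk) is left out.&lt;br /&gt;
```python
def parse_udev_properties(output):
    """Turn `udevadm info --query=property` output (KEY=VALUE lines) into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            props[key.strip()] = value.strip()
    return props


def device_for_serial(serial_by_device, wanted):
    """Return the kernel name (sda, sdb, ...) whose ID_SERIAL equals `wanted`, else None."""
    for device, serial in serial_by_device.items():
        if serial == wanted:
            return device
    return None
```
With the serials from the log, the old Crucial disk (154410FA336C) resolves to sdc. The serial of the disk we asked Hetzner to pull (154410FA4520) resolves to nothing.&lt;br /&gt;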
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back-in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; io errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to make partition backups anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but the re-insertion of the wrong drive pushed /dev/sda to /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at least partition this new drive&lt;br /&gt;
# but this new drive isn&#039;t the right size. it&#039;s 512G while our old disk was 250G. I guess it&#039;s better to have too-big of a disk than too-small of a disk, but we won&#039;t be able to use that extra disk space. I&#039;m going to assume that they just didn&#039;t have 250G disks in-stock anymore.&lt;br /&gt;
# anyway, I tried to back up the partitions, but that wouldn&#039;t work since the filesystem is read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot. and pray that grub is intact to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it&#039;s so broken it can&#039;t even reboot.&lt;br /&gt;
# I triggered a reset on the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shut down all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. We have sda and sdb again, and sda is our original sda, as we wanted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partitions; it&#039;s not surprising the sdb file is empty&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied the partition table from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
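This arithmetic quantifies the size mismatch. sfdisk reports the new disk as 62260 cylinders x 255 heads x 63 sectors/track, with 512-byte sectors, while the layout copied from sda ends at sector 488395120. The following small Python sketch works through that arithmetic. Note that the legacy CHS geometry only approximates the real sector count.&lt;br /&gt;
```python
SECTOR_BYTES = 512  # sfdisk: "Units: sectors of 512 bytes"
GIB = 1024 ** 3


def chs_disk_bytes(cylinders, heads, sectors_per_track, sector_bytes=SECTOR_BYTES):
    """Approximate disk size from the legacy CHS geometry sfdisk prints."""
    return cylinders * heads * sectors_per_track * sector_bytes


def last_sector_bytes(end_sector, sector_bytes=SECTOR_BYTES):
    """Bytes covered from sector 0 through `end_sector` (inclusive)."""
    return (end_sector + 1) * sector_bytes


disk = chs_disk_bytes(62260, 255, 63)
used = last_sector_bytes(488395120)
print(f"disk ~{disk / GIB:.1f} GiB, partitioned {used / GIB:.1f} GiB, unused ~{(disk - used) / GIB:.0f} GiB")
```
That works out to roughly 476.9 GiB of disk, of which about 232.9 GiB is partitioned, leaving roughly 244 GiB unused with the copied layout.&lt;br /&gt;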
# that looked good, other than the complaint about not being able to boot from this disk; I&#039;ll check later what LILO is and whether this matters for booting the RAID with grub&lt;br /&gt;
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk to the RAID, and it shows that it&#039;s starting to sync now. excellent&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, it looks like it&#039;s not syncing each partition of the RAID at the same time. it&#039;s doing md0 now and then it&#039;ll do the others after, I guess&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d sync&#039;d /boot first. I don&#039;t think I can install grub until that&#039;s sync&#039;d. maybe?&lt;br /&gt;
# it says it&#039;s moving about 1024K/s. That&#039;s 1 MB per sec. 32G*1024 = 32,768 MB. That&#039;s 32,768 seconds / 60 = 546 minutes / 60 = 9 hours. Just for swap!&lt;br /&gt;
# assuming we have the same speed for the rest of the disk, that&#039;s 250 G * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds. 256,000 seconds / 60 = 4,266.67 minutes / 60 = 71.11 hours. I guess we just have to accept the risk and hope that old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
# I tried to go ahead and install grub on the new disk, but I got a command not found error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it should be &#039;grub2-install&#039;, so I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# at that speed, the whole disk would take 250*1024/58 = about 4,414 seconds / 60 = about 74 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups are still running; I&#039;ll let them finish before starting-up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the RAID replication, and it shows md0 (swap) and md1 (boot) are both done. Hooray! Now we just need to finish root (/), which is 9.8% done and going at ~60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
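# to read snapshots like these at a glance, here is a hypothetical helper (md_progress is our own name, not an mdadm tool): an awk sketch that condenses /proc/mdstat into one line per array, showing its member state and any rebuild progress&lt;br /&gt;
```shell
#!/bin/sh
# Condense /proc/mdstat (read on stdin) into one line per array:
# name, member state ([UU] = both mirrors in sync, [U_] = one member missing
# or still rebuilding) and, if present, the recovery/resync percentage, or
# "delayed" for arrays queued behind another rebuild.
md_progress() {
    awk '
        function emit() {
            if (name != "") {
                out = name " " state
                if (prog != "") out = out " " prog
                print out
            }
        }
        /^md[0-9]+ :/         { emit(); name = $1; state = ""; prog = "" }
        / blocks /            { state = $NF }
        /recovery =|resync =/ { for (i = NF; i > 0; i--) if ($i ~ /%$/) prog = $i }
        /resync=DELAYED/      { prog = "delayed" }
        END                   { emit() }
    '
}

# usage on the server: cat /proc/mdstat | md_progress
```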
# I gave the grub install a double-tap now that /boot (md1) is synced with the first disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups say they&#039;re only about 11% uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to kill the backup script and upload the archive manually without the bwlimit, so it goes out faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
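# as a quick check of what dropping the bwlimit buys us, here is a hypothetical helper (transfer_eta is our own name) that recomputes the remaining time from the numbers rclone printed. It assumes the rclone GBytes/MBytes figures are 1024-based, which matches the ETAs rclone itself printed above. With those numbers it gives roughly 1h47m with the limit versus 1h08m without it&lt;br /&gt;
```shell
#!/bin/sh
# Rough time left for an upload, given GB already sent, GB total and the
# current MB/s, as rclone prints them (assumed to be 1024-based units).
# Prints hours and minutes, truncated to whole minutes.
transfer_eta() {
    awk -v sent="$1" -v total="$2" -v rate="$3" 'BEGIN {
        secs = int((total - sent) * 1024 / rate)
        printf "%dh%02dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}

transfer_eta 2.210 20.472 2.904   # backup script, with its bwlimit
transfer_eta 0 20.472 5.065       # manual rclone copy, without it
```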
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# while the old disk says it&#039;s used 100% of its lifecycle, the new disk says it has used, uhh, 96% of its life (only 4% remaining)? That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Shame. I was hoping for something under 50% used. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive said it had &amp;quot;Power_On_Hours&amp;quot; of 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive says Power_On_Hours = 52083/24/365 = 5.95 years. Well that&#039;s better, I guess.&lt;br /&gt;
# oh wow, the power cycle counts are crazy low: our old disk was only power-cycled 50 times, and the new one only 33 times.&lt;br /&gt;
# also, the SMART attribute tables for these two drives aren&#039;t listed in the same order (on sdb, attributes 180 and 210 come last), and apparently SMART data is very vendor-specific, so some of these comparisons may be apples-to-oranges&lt;br /&gt;
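# to pull these numbers out without eyeballing the whole table, here is a hypothetical helper (smart_summary is our own name): an awk sketch that matches attributes by name and converts Power_On_Hours to years. It assumes the reading used above for the tables in this log, where the normalized VALUE of Percent_Lifetime_Remain is the life remaining and its RAW_VALUE is the life used; other vendors may report this differently&lt;br /&gt;
```shell
#!/bin/sh
# Summarize drive wear from smartctl -A output read on stdin. Attributes are
# matched by name rather than by position, since the attribute table is
# vendor-specific and its layout can differ between drives.
# Assumption: as in the tables in this log, field 4 (VALUE) of
# Percent_Lifetime_Remain is life remaining and field 10 (RAW_VALUE) is life used.
smart_summary() {
    awk '
        $2 == "Power_On_Hours"          { printf "power-on: %.2f years\n", $10 / 24 / 365 }
        $2 == "Power_Cycle_Count"       { printf "power cycles: %d\n", $10 }
        $2 == "Percent_Lifetime_Remain" { printf "life left: %d%% (used: %d%%)\n", $4, $10 }
    '
}

# usage on the server: smartctl -A /dev/sdb | smart_summary
```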
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check-in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked-off a marketing campaign (for the apprenticeship), which now links to information on a broken website – where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged into the wiki, the cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the frontpage of all the websites and the recently-published apprenticeship info page that he&#039;s published &amp;amp; promoted are still online when he sees that error – for users who are *not* logged-into the site&lt;br /&gt;
# but if the backend site is broken for &amp;gt;24 hours, then the cache will cache the errors (not the content)&lt;br /&gt;
# as a short-term hack, I asked Marcin if he&#039;d like me to set up a daily reboot of hetzner2 at 10:40 UTC (a good buffer after the backups finish uploading)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# I checked-in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like I started it just after 13:32 and it finished just before 15:20, so it took about 1 hour and 48 minutes (just under 2 hours). Great!&lt;br /&gt;
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the option is listed as &amp;quot;at cost&amp;quot;, and it says:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine, compared to the 78,516 hours on /dev/sda and the 52,083 hours on our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
# but it&#039;s a bit concerning that it says it might not be in-stock. I&#039;m going to message them and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin replied, confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to set up the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we don&#039;t have ansible for hetzner2; I did this manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# tomorrow morning I should check the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
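# for that check, here is a small sketch (booted_in_window is a hypothetical helper, and the 10:40 to 10:59 window is an assumption that leaves room for a slow boot) that tests a boot time in the format printed by uptime -s. It assumes the server clock is UTC, as in the logs above, and that this uptime supports -s&lt;br /&gt;
```shell
#!/bin/sh
# Succeed if a boot time falls in the daily reboot window set by the
# /etc/cron.d/reboot entry above (40 10 * * *, server clock in UTC).
# Input: "YYYY-MM-DD HH:MM:SS", the format printed by uptime -s.
booted_in_window() {
    hhmm=$(echo "$1" | awk '{ print substr($2, 1, 5) }')
    case "$hhmm" in
        10:4[0-9] | 10:5[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# usage on hetzner2, the morning after:
#   uptime -s
#   journalctl --list-boots | tail -n 3
#   if booted_in_window "$(uptime -s)"; then echo "daily reboot ran"; fi
```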
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status output aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add the &amp;lt;code&amp;gt;--verbose&amp;lt;/code&amp;gt; flag, but that&#039;s not especially helpful either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, though, the new IPv4 address is listed in &amp;lt;code&amp;gt;ip a&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came up, it had lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, but I have errors again&lt;br /&gt;
# curiously, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet isn&#039;t very helpful because the damn &amp;lt;code&amp;gt;interfaces&amp;lt;/code&amp;gt; format seems to have changed so many times over the years; lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find Hetzner-specific docs on adding a second IP; they&#039;re totally different from what I&#039;ve read elsewhere https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the system came up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I used netcat to test it. Most ports are closed, and nginx is already listening on most of the remaining ones (on all IPs), except 4443, so I used that one&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the scheduled date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess if the disk had eventually failed badly enough to break RAID replication, we would have gotten alerts. But it would be good if we could get alerts *before* that happened.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a smart module for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# crap, I just checked hetzner3&#039;s munin, and I realized that the varnish graphs are missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the script and the plugin symlinks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I did a diff of the varnish5_ script on my server and on ose&#039;s server, and I found 2 new lines at the top of the file on hetzner3&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;ansible managed&amp;quot; comment comes before the shebang. A shebang is only honored when it&#039;s the very first line of the file, so the plugin is being interpreted by the shell (/bin/sh) instead of perl&lt;br /&gt;
# we can see the result of all these syntax errors with a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
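# for reference, a minimal sketch of what the corrected template header could look like (not necessarily the exact change in our ansible repo), assuming the template injects the comment via Ansible&#039;s built-in &amp;lt;code&amp;gt;ansible_managed&amp;lt;/code&amp;gt; variable, which renders as &amp;quot;Ansible managed&amp;quot; by default. The kernel only treats &amp;lt;code&amp;gt;#!&amp;lt;/code&amp;gt; as a shebang when those are the first two bytes of the file, so the comment must come after it; placing it after the emacs &amp;lt;code&amp;gt;-*- perl -*-&amp;lt;/code&amp;gt; mode line also keeps that line on line 2, where emacs looks for it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
# {{ ansible_managed }}&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# ... (rest of the upstream plugin unchanged)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;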
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that we need this section, at minimum, in the munin config file, so I added it to hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, if I restart the service first and then run it, it (uhh) kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it exits without an error, but it just returns U (munin&#039;s &amp;quot;unknown&amp;quot; value) and no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
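# For reference: munin-run prints one field.value line per data point, and U is munin&#039;s placeholder for an unknown value, which is why the graph shows nan. Below is a minimal Python sketch of reading that output. The function name parse_munin_values is made up for illustration, and the sketch only handles the simple field.value N form.&lt;br /&gt;

```python
def parse_munin_values(output):
    """Map munin plugin fetch output (lines of: field.value N) to a dict.

    Munin uses the literal U for an unknown value; that becomes None here.
    Blank lines, comments and anything that is not a field.value line are skipped.
    """
    values = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition(" ")
        if not key.endswith(".value"):
            continue
        field = key[: -len(".value")]
        raw = raw.strip()
        values[field] = None if raw == "U" else float(raw)
    return values


if __name__ == "__main__":
    # The output seen above after restarting munin-node.
    print(parse_munin_values("smartctl_exit_status.value U\n"))
```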
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical SMART charts would be great, what I really want is email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I backed up the original /etc/smartd.conf and replaced it with a single DEVICESCAN line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
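# For reference, here is what the directives in that DEVICESCAN line do, per smartd.conf(5):&lt;br /&gt;
## -d removable keeps smartd from exiting if a device is missing at startup&lt;br /&gt;
## -n standby skips the check on a disk that is spun down&lt;br /&gt;
## -m sets the address to notify&lt;br /&gt;
## -M exec hands the message to the given program instead of mailing directly (on Debian, smartd-runner runs the scripts in /etc/smartmontools/run.d)&lt;br /&gt;
# Below is a minimal Python sketch that splits such a line into its directives. The function name is hypothetical, and it only understands this simple one-line form.&lt;br /&gt;

```python
import shlex


def parse_smartd_line(line):
    """Split one smartd.conf line into (device, [(directive, [args...]), ...]).

    Every token after a directive that does not start with "-" is treated as
    one of its arguments, so "-M exec /path" keeps both "exec" and "/path".
    Returns None for blank or comment-only lines.
    """
    tokens = shlex.split(line, comments=True)
    if not tokens:
        return None
    device, directives = tokens[0], []
    for token in tokens[1:]:
        if token.startswith("-"):
            directives.append((token, []))
        elif directives:
            directives[-1][1].append(token)
        else:
            raise ValueError("argument without a directive: " + token)
    return device, directives


if __name__ == "__main__":
    conf = ("DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org "
            "-M exec /usr/share/smartmontools/smartd-runner")
    device, directives = parse_smartd_line(conf)
    for flag, args in directives:
        print(device, flag, args)
```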
# but that didn&#039;t work; no email came when I restarted the service (even if I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I checked the postfix logs, and it looks like google is rejecting our mail?!?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
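# To tell sent from deferred at a glance, below is a minimal Python sketch that pulls the queue ID, recipient, DSN and status out of postfix/smtp delivery lines like the ones above. The regex is written against this log format only, and the names are hypothetical.&lt;br /&gt;
# In this excerpt:&lt;br /&gt;
## CB6E5B94BB2 has status=sent with a 250 reply from Google&lt;br /&gt;
## D13CAB94BB3 has status=deferred, dsn=4.3.0, with postfix&#039;s reason bounce or trace service failure&lt;br /&gt;

```python
import re

# Matches the per-recipient delivery summary that postfix/smtp logs.
# \x3c and \x3e are the angle brackets around the recipient address.
# Queue IDs are assumed to be the short hex form seen in this log.
DELIVERY_RE = re.compile(
    r"postfix/smtp\[\d+\]: ([0-9A-F]+): to=\x3c([^\x3e]+)\x3e, "
    r".*?dsn=(\d\.\d{1,3}\.\d{1,3}), status=(\w+) \((.*)\)\s*$"
)


def parse_delivery(line):
    """Return queue_id, rcpt, dsn, status and reason, or None for other lines."""
    match = DELIVERY_RE.search(line)
    if match is None:
        return None
    keys = ("queue_id", "rcpt", "dsn", "status", "reason")
    return dict(zip(keys, match.groups()))


if __name__ == "__main__":
    deferred = ("Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: "
                "to=\x3cREDACTED@opensourceecology.org\x3e, "
                "relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, "
                "delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred "
                "(bounce or trace service failure)")
    print(parse_delivery(deferred))
```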
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the Google Groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3, to see if SMART info was now being charted&lt;br /&gt;
## on hetzner2, there are no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty; each has only 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; across every time range. This is expected, since the plugin can&#039;t parse the NVMe smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess no munin logs on SMART for this dying server&lt;br /&gt;
# I also confirmed that varnish logs are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty on my screen session. It really is angry..&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
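# The 421 / 4.7.28 pair is worth decoding:&lt;br /&gt;
## per RFC 5321, a 4xx reply is a temporary failure, so postfix keeps the message queued and retries&lt;br /&gt;
## in the RFC 3463 enhanced code, subject 7 means security or policy&lt;br /&gt;
## the dsn=4.3.0 deferral on hetzner3 earlier has subject 3, mail system&lt;br /&gt;
# Below is a minimal Python sketch that decodes these codes from a log line. The names are hypothetical, and only the class and subject digits are interpreted, not Gmail&#039;s detail number.&lt;br /&gt;

```python
import re

# A basic SMTP reply code followed by an RFC 3463 enhanced status code,
# e.g. "421-4.7.28" or "250 2.0.0".
REPLY_RE = re.compile(r"\b([245]\d\d)[ -]([245])\.(\d{1,3})\.(\d{1,3})\b")

CLASSES = {
    "2": "success",
    "4": "persistent transient failure (queue and retry later)",
    "5": "permanent failure (bounce)",
}
SUBJECTS = {
    "0": "other or undefined",
    "1": "addressing",
    "2": "mailbox",
    "3": "mail system",
    "4": "network and routing",
    "5": "mail delivery protocol",
    "6": "message content or media",
    "7": "security or policy",
}


def describe_enhanced(code):
    """Describe an enhanced status code such as 4.7.28 or 4.3.0."""
    klass, subject, _detail = code.split(".")
    return CLASSES[klass], SUBJECTS.get(subject, "unknown subject")


def classify_reply(text):
    """Find the first reply/enhanced code pair in a log line and describe it."""
    match = REPLY_RE.search(text)
    if match is None:
        return None
    reply, klass, subject, detail = match.groups()
    enhanced = f"{klass}.{subject}.{detail}"
    return (reply, enhanced) + describe_enhanced(enhanced)
```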
# Marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of delivery attempts. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and gets added to the postfix queue, the retries happen without any batching.&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
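# Below is the arithmetic behind those estimates, as a minimal Python sketch. It assumes MAILQUEUE_THROTTLE is a pause in seconds after each message. It also ignores the time phpList spends handing each message to postfix, so the results are lower bounds. The 5,000/hour figure is the unconfirmed default mentioned above.&lt;br /&gt;

```python
SECONDS_PER_HOUR = 3600


def per_hour_from_throttle(seconds_between_messages):
    """Messages per hour if phpList pauses this long after every message."""
    return SECONDS_PER_HOUR / seconds_between_messages


def campaign_hours(recipients, messages_per_hour):
    """Hours needed to send one message to every recipient at a fixed rate."""
    return recipients / messages_per_hour


if __name__ == "__main__":
    rate = per_hour_from_throttle(1)  # MAILQUEUE_THROTTLE = 1
    for recipients in (14111, 15000):
        print(recipients, "recipients:",
              round(campaign_hours(recipients, rate), 2), "h at 1/s,",
              round(campaign_hours(recipients, 5000), 2), "h at 5,000/h")
```

# With the 14,111 recipients of today&#039;s campaign that comes to about 3.9 hours at one message per second, versus about 2.8 hours at 5,000/hour.&lt;br /&gt;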
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
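# Every per-transport *_destination_rate_delay above just points at $default_destination_rate_delay, so changing that one parameter changes them all. Below is a minimal Python sketch that follows those $references in postconf output. The names are hypothetical, and newer Postfix releases can expand references directly with postconf -x, so this is only illustrative.&lt;br /&gt;

```python
import re

# $name and ${name} references; ${name?value} and ${name:value} are not handled.
REF_RE = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def parse_postconf(text):
    """Parse postconf output (one "name = value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params


def resolve(params, name, depth=0):
    """Expand $references in a parameter value, recursively."""
    if depth > 20:
        raise ValueError("reference loop involving " + name)

    def expand(match):
        ref = match.group(1) or match.group(2)
        if ref not in params:
            return match.group(0)  # leave unknown references untouched
        return resolve(params, ref, depth + 1)

    return REF_RE.sub(expand, params[name])
```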
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
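# Note how the two throttles interact:&lt;br /&gt;
## phpList now injects at most one message per second overall&lt;br /&gt;
## postfix waits 2s between deliveries to the same destination; here a destination is a domain, because the smtp per-destination recipient limit defaults to more than 1&lt;br /&gt;
## so any domain holding more than half the list (gmail.com, for example) becomes the bottleneck at 1,800 messages/hour&lt;br /&gt;
# Below is a rough Python estimate under those assumptions. The per-domain split in the example is invented.&lt;br /&gt;

```python
SECONDS_PER_HOUR = 3600


def delivery_hours(per_domain_counts, inject_delay_s, per_domain_delay_s):
    """Rough lower bound on hours until the last message is delivered.

    phpList injects at most one message per inject_delay_s overall, and
    postfix delivers at most one message per per_domain_delay_s to any one
    destination domain, so the slower of the two limits sets the pace.
    """
    total = sum(per_domain_counts.values())
    busiest = max(per_domain_counts.values(), default=0)
    seconds = max(total * inject_delay_s, busiest * per_domain_delay_s)
    return seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical split of a 14,111-recipient list; not real phpList data.
    split = {"gmail.com": 9000, "yahoo.com": 3000, "example.org": 2111}
    print(delivery_hours(split, inject_delay_s=1, per_domain_delay_s=2))
```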
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of ssh security (to force keys), but that kinda breaks sudo, which requires the password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to set a user password (and keep ssh password logins disabled) than to set sudoers to NOPASSWD&lt;br /&gt;
# disabled passwords are marked with a &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just manually edited /etc/shadow with vim to remove the exclamation point from the tgriffing line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
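# To audit this across all accounts, below is a minimal Python sketch that classifies the second /etc/shadow field. The function name and labels are hypothetical.&lt;br /&gt;
## per shadow(5), a leading ! locks the password&lt;br /&gt;
## an empty field, as on the tgriffing line after the edit, means no password is required to authenticate, though some programs refuse access in that case&lt;br /&gt;

```python
def password_state(shadow_line):
    """Classify the password (2nd) field of one /etc/shadow entry."""
    fields = shadow_line.rstrip("\n").split(":")
    if len(fields) != 9:
        raise ValueError("expected 9 colon-separated fields")
    password = fields[1]
    if password == "":
        return "empty"        # shadow(5): no password needed to authenticate
    if password.startswith("!"):
        return "locked"       # password logins disabled; any hash after ! is kept
    if password.startswith("*"):
        return "no-password"  # not a valid hash, so no password can match
    return "hashed"


if __name__ == "__main__":
    for line in ("marcin:!:20133:0:99999:7:::",
                 "tgriffing::20133:0:99999:7:::"):
        print(line.split(":")[0], password_state(line))
```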
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) use the MBR partition scheme (not GPT), as indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
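# The numbers in that fdisk output can be checked by hand when planning the replacement:&lt;br /&gt;
## Blocks is the partition size in 1 KiB units, and a trailing + marks an odd number of 512-byte sectors&lt;br /&gt;
## MBR can only address 2^32 sectors (2 TiB at 512 bytes/sector), so a replacement disk larger than that could not be fully used without converting to GPT&lt;br /&gt;
# Below is a minimal Python sketch of both checks. The function names are hypothetical.&lt;br /&gt;

```python
SECTOR_BYTES = 512           # logical sector size reported by fdisk above
MBR_MAX_SECTORS = 2 ** 32    # MBR stores start/size as 32-bit sector counts


def fdisk_blocks(start, end):
    """Reproduce fdisk's 1 KiB Blocks column for an inclusive sector range.

    A trailing + means the partition has an odd number of 512-byte sectors,
    i.e. half a KiB left over.
    """
    sectors = end - start + 1
    kib, leftover = divmod(sectors * SECTOR_BYTES, 1024)
    return f"{kib}+" if leftover else str(kib)


def fits_mbr(total_sectors):
    """True if every sector of the disk is addressable with an MBR label."""
    return MBR_MAX_SECTORS >= total_sectors


if __name__ == "__main__":
    for name, start, end in (("sda1", 2048, 67110912),
                             ("sda2", 67112960, 68161536),
                             ("sda3", 68163584, 488395120)):
        print(name, fdisk_blocks(start, end))
    print("fits MBR:", fits_mbr(488397168))
```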
# A quick spot-check shows that our backups usually finish at 09:55, once as late as 10:07. That&#039;s UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God that&#039;s early, but better to do this early in Germany time..&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
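# Below is a minimal Python sketch of the time-zone math in that email, plus a check against the latest backup finish seen in the spot-check (10:07 UTC).&lt;br /&gt;
## it uses fixed offsets that only hold in late April: Berlin on CEST, Ecuador on UTC-5, FeF on CDT&lt;br /&gt;
## placing the 10:07 finish on the proposed day is purely an assumption&lt;br /&gt;
## a negative gap means the change would start before such a late backup has finished&lt;br /&gt;

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for late April 2025, matching the email above;
# zoneinfo (e.g. Europe/Berlin) would be more robust across DST changes.
OFFSETS_HOURS = {"Germany": 2, "Ecuador": -5, "FeF": -5}


def local_times(utc_start):
    """Render one UTC instant in each site's local time."""
    result = {}
    for site, hours in OFFSETS_HOURS.items():
        local = utc_start.astimezone(timezone(timedelta(hours=hours)))
        result[site] = local.strftime("%a %Y-%m-%d %H:%M")
    return result


def minutes_after(later, earlier):
    """Signed gap in minutes; negative if 'later' is actually earlier."""
    return (later - earlier).total_seconds() / 60


if __name__ == "__main__":
    start = datetime(2025, 4, 24, 10, 0, tzinfo=timezone.utc)
    print(local_times(start))
    # Assumed: a backup as late as the 10:07 one seen in the spot-check.
    latest_backup_end = datetime(2025, 4, 24, 10, 7, tzinfo=timezone.utc)
    print(minutes_after(start, latest_backup_end))  # -7.0
```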
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so we have one more recovery mode we can try (3) before the modes become destructive (4 and above)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
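# For reference, the MySQL manual names each innodb_force_recovery level and warns that values of 4 or greater can permanently corrupt data files. Below is a minimal Python sketch that builds the my.cnf override with a guard against those levels.&lt;br /&gt;
## it assumes the setting belongs under [mysqld], since the diff above does not show the section header&lt;br /&gt;
## it assumes MariaDB 5.5 follows the same level meanings&lt;br /&gt;
## the function name is hypothetical&lt;br /&gt;
## the optional innodb_purge_threads = 0 line is the workaround from the serverfault thread referenced in this log&lt;br /&gt;

```python
# innodb_force_recovery levels as named in the MySQL reference manual.
RECOVERY_LEVELS = {
    1: "SRV_FORCE_IGNORE_CORRUPT",
    2: "SRV_FORCE_NO_BACKGROUND",
    3: "SRV_FORCE_NO_TRX_UNDO",
    4: "SRV_FORCE_NO_IBUF_MERGE",
    5: "SRV_FORCE_NO_UNDO_LOG_SCAN",
    6: "SRV_FORCE_NO_LOG_REDO",
}
# The manual warns that 4 or greater can permanently corrupt data files.
LAST_SAFE_LEVEL = 3


def recovery_snippet(level, purge_threads_off=False, allow_destructive=False):
    """Build a [mysqld] my.cnf fragment for forced InnoDB recovery."""
    if level not in RECOVERY_LEVELS:
        raise ValueError("innodb_force_recovery must be between 1 and 6")
    if level > LAST_SAFE_LEVEL and not allow_destructive:
        raise ValueError(f"level {level} can permanently corrupt data files")
    lines = [
        "[mysqld]",
        f"# {RECOVERY_LEVELS[level]}",
        f"innodb_force_recovery = {level}",
    ]
    if purge_threads_off:
        # workaround for the "Waiting for the background threads" loop
        lines.append("innodb_purge_threads = 0")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(recovery_snippet(2, purge_threads_off=True), end="")
```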
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
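# while InnoDB sits in that loop, systemctl restart blocks until the unit finishes starting (or systemd gives up), and interrupting systemctl does not cancel the start job itself. A rough sketch, assuming the unit name and log path used on this box and an arbitrary 60 second window, that queues the restart with systemctl&#039;s --no-block flag and watches the log for a bounded time instead (the helper name is hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: restart without blocking, then watch the log for a while&lt;br /&gt;
restart_and_watch() {&lt;br /&gt;
  unit=${1:-mariadb}&lt;br /&gt;
  log=${2:-/var/log/mariadb/mariadb.log}&lt;br /&gt;
  secs=${3:-60}&lt;br /&gt;
  # --no-block queues the restart job and returns immediately&lt;br /&gt;
  systemctl restart --no-block &amp;quot;$unit&amp;quot; || return 1&lt;br /&gt;
  # show the last 20 lines, then follow new ones until timeout stops tail&lt;br /&gt;
  timeout &amp;quot;$secs&amp;quot; tail -n 20 -f &amp;quot;$log&amp;quot;&lt;br /&gt;
  # prints e.g. activating (still starting), active, or failed&lt;br /&gt;
  systemctl is-active &amp;quot;$unit&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;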
# a couple of posts online suggest this infinite loop is caused by the default of innodb_purge_threads=1, and recommend setting it to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery level back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that once the restart isn&#039;t blocked anymore&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
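# since the attempt that worked changed two things at once (recovery level 2 and innodb_purge_threads = 0), it may be worth confirming what the running server actually picked up. A small sketch with a made-up helper name; any mysql connection options (e.g. -u root -p$mysqlPass) are passed straight through:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print the live values of the two settings changed above&lt;br /&gt;
show_recovery_settings() {&lt;br /&gt;
  mysql &amp;quot;$@&amp;quot; -N -e &amp;quot;SHOW GLOBAL VARIABLES WHERE Variable_name IN (&#039;innodb_force_recovery&#039;, &#039;innodb_purge_threads&#039;)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# with the my.cnf from this attempt, it should report innodb_force_recovery as 2 and innodb_purge_threads as 0.&lt;br /&gt;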
# the logs are being spammed with these last 5 lines a bunch; I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
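# that five-line block looks like the message InnoDB prints when it refuses a write while innodb_force_recovery is on, which would mean some client (probably one of the sites) is still trying to modify data. A sketch for checking who is connected, grouping SHOW PROCESSLIST by user and database (helper name hypothetical; connection options are passed through; the query itself shows up as one of the rows):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: count current connections per user and database&lt;br /&gt;
who_is_connected() {&lt;br /&gt;
  # -N drops the header; mysql output sent to a pipe is tab-separated&lt;br /&gt;
  mysql &amp;quot;$@&amp;quot; -N -e &#039;SHOW PROCESSLIST&#039; |&lt;br /&gt;
    awk -F &#039;\t&#039; &#039;{ print $2, $4 }&#039; | LC_ALL=C sort | uniq -c&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;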
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
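# good news: looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti (osemain_s_db.wp_options has the same warning too, while the cacti_db lines are only notes that the storage engine doesn&#039;t support CHECK). restoring those from backups is probably not going to cause any data loss&lt;br /&gt;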
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
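# a sketch (hypothetical helper) for pulling the affected databases out of a mysqlcheck log without the grep -vi OK step: it skips note lines, which only say a table&#039;s storage engine doesn&#039;t support CHECK, and keeps tables that got a warning or error line:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: databases with a warning/error line in a mysqlcheck log&lt;br /&gt;
flagged_dbs() {&lt;br /&gt;
  # remember the last db.table name, print it when a warning/error follows&lt;br /&gt;
  awk &#039;$1 ~ /\./ { table = $1; next }&lt;br /&gt;
       { t = tolower($1) }&lt;br /&gt;
       t == &amp;quot;warning&amp;quot; || t == &amp;quot;error&amp;quot; { print table }&#039; &amp;quot;$1&amp;quot; |&lt;br /&gt;
    cut -d. -f1 | LC_ALL=C sort -u&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# run against mysqlcheck.20250418.log, and going by the filtered output above, this would be expected to list osemain_db, osemain_s_db and oswh_db, but not cacti_db.&lt;br /&gt;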
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list, and that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
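# one caveat with the command above: the exit status of a pipeline is that of its last command, so if mysqldump had died partway (not unlikely on a corrupt server) gzip would still have exited cleanly and left a valid-looking .sql.gz. A quick sanity check, sketched as a hypothetical helper: test the gzip stream, then look for the -- Dump completed comment that mysqldump writes as the last line of a dump by default (it is left out with --skip-comments or --compact):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: sanity-check a gzipped mysqldump&lt;br /&gt;
verify_dump() {&lt;br /&gt;
  dump=$1&lt;br /&gt;
  # 1. the gzip stream itself must decompress cleanly&lt;br /&gt;
  if ! gzip -t &amp;quot;$dump&amp;quot;; then&lt;br /&gt;
    echo &amp;quot;BAD (gzip stream): $dump&amp;quot;&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
  # 2. a complete dump ends with a &amp;quot;-- Dump completed&amp;quot; comment&lt;br /&gt;
  if gzip -dc &amp;quot;$dump&amp;quot; | tail -n 1 | grep -q &#039;^-- Dump completed&#039;; then&lt;br /&gt;
    echo &amp;quot;OK: $dump&amp;quot;&lt;br /&gt;
  else&lt;br /&gt;
    echo &amp;quot;BAD (no completion comment, possibly truncated): $dump&amp;quot;&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. verify_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz; it decompresses the whole file twice, so it won&#039;t be instant on a 1.3G dump.&lt;br /&gt;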
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but at level 1 this time. It did start, but this crash loop keeps getting spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
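# with mysqld_safe respawning mysqld after every crash, the log fills up fast and it&#039;s hard to tell whether it&#039;s the same crash each time. A rough sketch (hypothetical helper) that counts the assertion failures by source file and line, ignoring the thread id, and counts how often mysqld_safe restarted mysqld:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: summarize crashes recorded in the MariaDB error log&lt;br /&gt;
crash_summary() {&lt;br /&gt;
  log=${1:-/var/log/mariadb/mariadb.log}&lt;br /&gt;
  # group assertion failures by file and line; the thread id differs per crash&lt;br /&gt;
  grep -o &#039;Assertion failure in thread [0-9]* in file [^ ]* line [0-9]*&#039; &amp;quot;$log&amp;quot; |&lt;br /&gt;
    sed -E &#039;s/in thread [0-9]+ //&#039; | LC_ALL=C sort | uniq -c&lt;br /&gt;
  # mysqld_safe logs this line each time it respawns a crashed mysqld&lt;br /&gt;
  printf &#039;restarts by mysqld_safe: %s\n&#039; &amp;quot;$(grep -c &#039;mysqld_safe mysqld restarted&#039; &amp;quot;$log&amp;quot;)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if every crash is the same trx0purge.c line 822 assertion, that fits the log&#039;s own hint that repeated assertion failures right after startup may mean corruption in the InnoDB tablespace.&lt;br /&gt;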
# well, systemctl *says* it&#039;s started, even though mysqld keeps crashing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
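# this fits the status output above: systemd&#039;s main PID is mysqld_safe, which stays up and keeps respawning the crashing mysqld, so the unit reports active (running) while nothing is accepting connections on the socket (111 is connection refused). A sketch of a more honest readiness check that asks the server itself with mysqladmin ping, which exits 0 whenever the server answers (even with access denied) and non-zero otherwise; the helper name and one-second retry interval are assumptions:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: wait until mysqld actually answers, up to N tries&lt;br /&gt;
wait_for_mysqld() {&lt;br /&gt;
  tries=${1:-10}&lt;br /&gt;
  if [ $# -gt 0 ]; then shift; fi&lt;br /&gt;
  # any remaining args are passed to mysqladmin, e.g. -u root -p...&lt;br /&gt;
  i=0&lt;br /&gt;
  while [ &amp;quot;$i&amp;quot; -lt &amp;quot;$tries&amp;quot; ]; do&lt;br /&gt;
    if mysqladmin &amp;quot;$@&amp;quot; ping; then&lt;br /&gt;
      return 0&lt;br /&gt;
    fi&lt;br /&gt;
    i=$((i + 1))&lt;br /&gt;
    sleep 1&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;mysqld did not answer after $tries tries&amp;quot;&lt;br /&gt;
  return 1&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. run wait_for_mysqld 30 -u root -p$mysqlPass after systemctl restart mariadb and before mysqlcheck.&lt;br /&gt;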
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just dropping them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
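# on the CHECK/REPAIR question: the warning mysqlcheck gave (1 client is using or hasn&#039;t closed the table properly) is the MyISAM-style message that page describes, and innodb_force_recovery is an InnoDB setting, so a REPAIR TABLE on those wp_options tables may well have worked even in recovery mode (an untested assumption). InnoDB tables don&#039;t support REPAIR TABLE at all. A sketch using mysqlcheck&#039;s --check and --repair options on one table; the helper name is hypothetical and any extra args are passed to mysqlcheck:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: CHECK TABLE then REPAIR TABLE for one (MyISAM) table&lt;br /&gt;
check_and_repair() {&lt;br /&gt;
  if [ $# -lt 2 ]; then&lt;br /&gt;
    echo &amp;quot;usage: check_and_repair DB TABLE [mysqlcheck options]&amp;quot;&lt;br /&gt;
    return 2&lt;br /&gt;
  fi&lt;br /&gt;
  db=$1&lt;br /&gt;
  table=$2&lt;br /&gt;
  shift 2&lt;br /&gt;
  # stop only if mysqlcheck itself fails (e.g. it cannot connect)&lt;br /&gt;
  mysqlcheck &amp;quot;$@&amp;quot; --check &amp;quot;$db&amp;quot; &amp;quot;$table&amp;quot; || return 1&lt;br /&gt;
  mysqlcheck &amp;quot;$@&amp;quot; --repair &amp;quot;$db&amp;quot; &amp;quot;$table&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. check_and_repair osemain_db wp_options -u root -p$mysqlPass&lt;br /&gt;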
# well, fuck, now I don&#039;t know why it won&#039;t start outside of recovery mode, and it doesn&#039;t tell me why. The good news is that I was able to get a db dump. Maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, two of them are missing: the dump includes `mysql`, but not information_schema or performance_schema&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
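# This check can be made reusable. Below is a minimal sketch; `list_dumped_dbs` and `missing_from_dump` are hypothetical helper names, not tools already on the server. It pulls the database names out of a gzipped mysqldump and reports which of the given names the dump does not contain. It assumes one CREATE DATABASE line per database at the start of a line, as in the zgrep output above.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking which databases a mysqldump file covers.

# Print the (sorted, unique) database names a gzipped mysqldump will create.
# Assumes one "CREATE DATABASE ... `name` ..." statement per database,
# starting at the beginning of a line, like the zgrep output above.
list_dumped_dbs() {
  zcat "$1" \
    | grep -E '^CREATE DATABASE ' \
    | sed -E 's/^CREATE DATABASE [^`]*`([^`]+)`.*/\1/' \
    | sort -u
}

# Print every database name (given as arguments) that the dump does NOT create.
missing_from_dump() {
  local dump_file="$1"; shift
  local dumped db
  # decompress the (possibly multi-GB) dump only once
  dumped="$(list_dumped_dbs "$dump_file")"
  for db in "$@"; do
    if ! printf '%s\n' "$dumped" | grep -qxF -- "$db"; then
      echo "$db"
    fi
  done
}
```
# Usage would look like `missing_from_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz $(mysql -uroot -p$mysqlPass -N -e &#039;SHOW DATABASES&#039;)`. Based on the grep above, it should flag information_schema and performance_schema.&lt;br /&gt;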
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m just going to manually dump these three anyway. Or try to.&lt;br /&gt;
# well, I was able to back up only one of the three&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the mysql dump looks good (716K); the other two are essentially empty&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
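# `du` reports allocated disk space, so any tiny file shows up as 4.0K. A more direct sanity check is to count the CREATE TABLE statements inside each dump. This is a minimal sketch; `count_dumped_tables` and `report_dump_tables` are hypothetical helper names.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers: sanity-check gzip-compressed mysqldump files by
# counting the CREATE TABLE statements each one contains.

# Print the number of CREATE TABLE statements in one gzipped dump.
count_dumped_tables() {
  # grep -c still prints 0 when nothing matches, but exits non-zero;
  # "|| true" keeps the helper from failing in that case
  zcat "$1" | grep -c '^CREATE TABLE ' || true
}

# For every dump given, print its table count, a tab, then its file name.
report_dump_tables() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$(count_dumped_tables "$f")" "$f"
  done
}
```
# e.g. `report_dump_tables mysqldump-after-corruption-while-in-recovery-mode_*.sql.gz`. The two dumps that hit LOCK TABLES errors above should show a count of 0.&lt;br /&gt;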
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# starting with an empty datadir also reset the mysql root password (the old one was stored in the old datadir), so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I hacked my way in by temporarily starting mysqld with --skip-grant-tables, and set the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
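# start mysqld without the grant tables (no password checks) and without TCP networking&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
mysql -u root&lt;br /&gt;
# inside the mysql client: write a new root password into the grant table&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# back in the shell: find the backgrounded mysqld_safe job so it can be stopped&lt;br /&gt;
jobs -l&lt;br /&gt;
# kill mysqld_safe&lt;br /&gt;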
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in and see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script, but let&#039;s just drop test manually and then restore from the dumps I just took&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest – including even our corrupt databases – and see if it works or breaks&lt;br /&gt;
# that took about 11.5 minutes to import the 1.3G (gzipped) dump; afterwards /var/lib/mysql is 6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whoa, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`. That would be better for him than logging into the hetzner wui and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file (via visudo) to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s debian, but sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead (redo) logs precisely so that they *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the hard reset caused corruption, and mariadb wasn&#039;t able to recover from it due to a bug. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re setup for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is rebuilding. But, of course, there&#039;s risk in not doing it.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
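# The disk swap outlined in that email (partition the new drive, add it back into the RAID mirror, reinstall grub) roughly follows Hetzner&#039;s software-RAID disk exchange guide. Below is a dry-run sketch that only *prints* the post-swap commands. It assumes MBR partition tables (`sfdisk`) and the EL7 `grub2-install` binary. It also takes a hypothetical md-array-to-partition mapping, which must be checked against `cat /proc/mdstat` first. It does not cover failing/removing the old partitions before the swap.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Dry run: print (do NOT execute) the commands to re-add a replaced disk to
# an mdadm RAID1 mirror, following the general flow of Hetzner's
# "exchanging hard disks in a software RAID" guide.
# ASSUMPTIONS: MBR partition tables (sfdisk), an EL7 host (grub2-install),
# and a hypothetical md-to-partition mapping passed as "mdN:P" arguments.
plan_raid_disk_readd() {
  local new_disk="$1" good_disk="$2"; shift 2
  local pair md part
  # copy the partition layout from the surviving disk onto the new one
  echo "sfdisk -d /dev/$good_disk | sfdisk /dev/$new_disk"
  # add each new partition back into its mirror; mdadm resyncs it in the background
  for pair in "$@"; do
    md="${pair%%:*}"
    part="${pair##*:}"
    echo "mdadm /dev/$md -a /dev/${new_disk}${part}"
  done
  # watch the rebuild progress
  echo "cat /proc/mdstat"
  # install the bootloader on the new disk so either disk can boot
  echo "grub2-install /dev/$new_disk"
}
```
# e.g. `plan_raid_disk_readd sda sdb md0:1 md1:2 md2:3`. The md names and partition numbers are placeholders, not hetzner2&#039;s real layout.&lt;br /&gt;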
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which disk grub boots from, so I can replace that one second&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
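# to grab both serials in one go, here&#039;s a minimal sketch; it assumes the disks are still named sda and sdb, and `disk_serial` is just a hypothetical helper name. It reads the same udevadm properties as above, so it should print the same two short serials&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# disk_serial (hypothetical helper): print the ID_SERIAL_SHORT udev property of a block device&lt;br /&gt;
disk_serial() {&lt;br /&gt;
	udevadm info --query=property --name &amp;quot;$1&amp;quot; | grep &#039;^ID_SERIAL_SHORT=&#039; | cut -d= -f2&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# print one &amp;quot;device serial&amp;quot; line per RAID member (assumes the members are sda and sdb)&lt;br /&gt;
for dev in sda sdb; do&lt;br /&gt;
	echo &amp;quot;$dev $(disk_serial &amp;quot;$dev&amp;quot;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;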
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX`&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
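# the Hetzner doc&#039;s procedure boils down to: fail and remove the old disk&#039;s partitions, clone the partition table onto the new disk, re-add its partitions to the arrays, and write grub to it. Here&#039;s a minimal sketch of that as hypothetical helper functions (`part_for_md`, `before_swap`, `after_swap`) meant to be run at two different times. It assumes an MBR (msdos) partition table, which the `insmod part_msdos` in grub.cfg suggests, and it assumes md0, md1, and md2 are built from partitions 1, 2, and 3 of each disk; that mapping must be checked in /proc/mdstat before running any of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# HYPOTHETICAL mapping: mdN is assumed to be built from partition N+1 of each disk (verify in /proc/mdstat!)&lt;br /&gt;
part_for_md() {&lt;br /&gt;
	echo &amp;quot;$1$(( $2 + 1 ))&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# before_swap (hypothetical helper): mark the dying disk&#039;s partitions failed and pull them out of every array&lt;br /&gt;
before_swap() {&lt;br /&gt;
	old=$1                                   # e.g. /dev/sdb&lt;br /&gt;
	for n in 0 1 2; do&lt;br /&gt;
		mdadm /dev/md$n --fail &amp;quot;$(part_for_md &amp;quot;$old&amp;quot; $n)&amp;quot; --remove &amp;quot;$(part_for_md &amp;quot;$old&amp;quot; $n)&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# after_swap (hypothetical helper): clone the MBR partition table, re-add the partitions, reinstall grub&lt;br /&gt;
after_swap() {&lt;br /&gt;
	good=$1                                  # e.g. /dev/sda (the disk that stays)&lt;br /&gt;
	new=$2                                   # e.g. /dev/sdb (the blank replacement)&lt;br /&gt;
	sfdisk -d &amp;quot;$good&amp;quot; | sfdisk &amp;quot;$new&amp;quot;      # overwrites the partition table on $new&lt;br /&gt;
	for n in 0 1 2; do&lt;br /&gt;
		mdadm /dev/md$n --add &amp;quot;$(part_for_md &amp;quot;$new&amp;quot; $n)&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
	grub2-install &amp;quot;$new&amp;quot;                    # grub2-install is CentOS 7&#039;s name for grub-install&lt;br /&gt;
	cat /proc/mdstat                         # the arrays should now show a resync in progress&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# i.e. `before_swap /dev/sdb` before Hetzner pulls the old disk, then `after_swap /dev/sda /dev/sdb` once the new one is in. Careful: `sfdisk` overwrites the partition table of its second disk, so the argument order matters&lt;br /&gt;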
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir is empty (just an empty centos subdir), so I&#039;m thinking we&#039;re booting via legacy BIOS with grub2, not UEFI&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
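# to double-check that the mduuid above really is the /boot array, a minimal sketch: mdadm prints an array&#039;s UUID as four colon-separated groups, and as far as I can tell grub.cfg uses the same hex with the colons stripped. `grub_mduuid` is a hypothetical helper, and /dev/md1 being /boot is taken from the df output on this page. If my reading is right, `grub_mduuid /dev/md1` should print 7141f546f6e3f5962a80bdc64c4f6d4a and `blkid -s UUID -o value /dev/md1` should print the 9f6b5264-da8c-406d-a444-45e3fb3aeb26 UUID that grub searches for&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# grub_mduuid (hypothetical helper): print an md array&#039;s UUID the way grub.cfg writes it (no colons, lowercase)&lt;br /&gt;
grub_mduuid() {&lt;br /&gt;
	mdadm --detail &amp;quot;$1&amp;quot; | awk &#039;/^ *UUID :/ {print $3}&#039; | tr -d &#039;:&#039; | tr &#039;A-F&#039; &#039;a-f&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: compare against set root=&#039;mduuid/...&#039; and the search --fs-uuid value in grub.cfg&lt;br /&gt;
#   grub_mduuid /dev/md1&lt;br /&gt;
#   blkid -s UUID -o value /dev/md1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;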
# right, so if I understand this correctly: we&#039;re not updating the grub config. We&#039;re using `grub-install` to write grub&#039;s boot code *to* the new drive (its MBR); the config itself lives in /boot/grub2, which is already on the RAID. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to replace one drive before the other, I&#039;m going to have them replace /dev/sdb first, simply because &#039;sda&#039; seems like it would be the primary. I know it&#039;s probably not, but anyway.&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thr Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which runs EOL CentOS 7, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new logs in the apache error log when I hit the site in real-time (bypassing the cache)&lt;br /&gt;
# there are also no new logs in the mariadb error log when I hit the site in real-time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the journal logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new gets written to the journal, but a bunch gets written to the actual log file that the journal mentions, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
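# to get the backup ages without doing date math by hand, a minimal sketch; `backup_ages` is a hypothetical helper that prints each file&#039;s age in whole hours, based on its mtime (GNU `stat -c %Y`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# backup_ages (hypothetical helper): for each file, print its age in whole hours followed by its name&lt;br /&gt;
backup_ages() {&lt;br /&gt;
	now=$(date +%s)&lt;br /&gt;
	for f in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
		echo &amp;quot;$(( (now - $(stat -c %Y &amp;quot;$f&amp;quot;)) / 3600 ))h $f&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: backup_ages /home/b2user/sync/*.tar.gpg /home/b2user/sync.old/*.tar.gpg&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;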
# this ServerFault answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up the more-recent data from the non-corrupt tables and restore only the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about once a week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetzner3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
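# for the RAID part of the hardware check mentioned above, a minimal sketch: in /proc/mdstat each array lists its members&#039; state like [UU], and an underscore in there (e.g. [U_]) means a member is missing or failed. `raid_degraded` is a hypothetical helper that only greps /proc/mdstat; it&#039;s a quick first look, not a substitute for `mdadm --detail` or the SMART data&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# raid_degraded (hypothetical helper): succeed if any array in the mdstat file ($1, default /proc/mdstat) has a missing member&lt;br /&gt;
raid_degraded() {&lt;br /&gt;
	grep -q &#039;\[[U_]*_[U_]*\]&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   if raid_degraded; then echo &amp;quot;at least one md array is degraded&amp;quot;; fi&lt;br /&gt;
#   mdadm --detail /dev/md2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;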
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck up the database further if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refuses to come up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
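# before touching anything, it&#039;s worth confirming the copy really matches the original (mariadb is down, so nothing should be writing to it). A minimal sketch; `same_tree` is a hypothetical helper around `diff -rq`, which reads both trees in full (~7 GB here) and prints only &amp;quot;identical&amp;quot; when they match. Note the rsync above copied /var/lib/mysql without a trailing slash, so the copy lives in a mysql/ subdir&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# same_tree (hypothetical helper): recursively compare two directories; print &amp;quot;identical&amp;quot; if they match, else list differences and fail&lt;br /&gt;
same_tree() {&lt;br /&gt;
	if diff -rq &amp;quot;$1&amp;quot; &amp;quot;$2&amp;quot;; then&lt;br /&gt;
		echo &amp;quot;identical&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		return 1&lt;br /&gt;
	fi&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: same_tree /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.20250417/mysql&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;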
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
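# a minimal sketch of that escalation, capped at 3 per the warning above. It assumes the stock CentOS 7 /etc/my.cnf still has its `!includedir /etc/my.cnf.d` line, so that a drop-in file there gets read; the drop-in&#039;s name and the `start_with_recovery` helper are hypothetical. It stops at the first level where mariadb comes up; after dumping, the drop-in has to be removed again before normal operation&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# HYPOTHETICAL drop-in path; only works if /etc/my.cnf includes /etc/my.cnf.d&lt;br /&gt;
recovery_cnf=/etc/my.cnf.d/zz-force-recovery.cnf&lt;br /&gt;
&lt;br /&gt;
# start_with_recovery (hypothetical helper): try innodb_force_recovery=1, 2, 3 in order and stop at the&lt;br /&gt;
# first level where mariadb starts; never goes above 3 (4+ may corrupt the data further)&lt;br /&gt;
start_with_recovery() {&lt;br /&gt;
	for level in 1 2 3; do&lt;br /&gt;
		printf &#039;[mysqld]\ninnodb_force_recovery = %s\n&#039; &amp;quot;$level&amp;quot; | tee &amp;quot;$recovery_cnf&amp;quot;&lt;br /&gt;
		if systemctl start mariadb; then&lt;br /&gt;
			echo &amp;quot;mariadb started with innodb_force_recovery=$level&amp;quot;&lt;br /&gt;
			return 0&lt;br /&gt;
		fi&lt;br /&gt;
	done&lt;br /&gt;
	echo &amp;quot;mariadb did not start with innodb_force_recovery 1-3&amp;quot;&lt;br /&gt;
	return 1&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   start_with_recovery&lt;br /&gt;
#   mysqldump --all-databases --result-file=/var/tmp/dbFail.20250417/all-databases.sql&lt;br /&gt;
#   systemctl stop mariadb; rm &amp;quot;$recovery_cnf&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;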
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1; apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. So the idea is to dump everything in recovery mode and drop what is corrupt, then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: at worst, we lose some data on the individual corrupt pages&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
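# hmm, so a page is just InnoDB&#039;s unit of storage (16KB by default), both on disk and in the buffer pool, not data that hasn&#039;t been written to disk yet. So if only individual pages are corrupt, a dump at recovery level 1, 2, or 3 should only lose the rows on those pages, and I *think* it should be OK to trust the rest?&lt;br /&gt;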
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
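# A minimal sketch of the dump step, assuming mariadb is already up with innodb_force_recovery set to 1, 2, or 3 and that credentials come from ~/.my.cnf. dump_each_db and the MYSQLDUMP override are hypothetical names, not from the MySQL docs; the idea is to dump each database to its own file, so one corrupt table only fails its own database instead of aborting a single all-databases dump.&lt;br /&gt;

```shell
# Hypothetical helper (not from the MySQL docs): dump each database
# to its own file so that one corrupt table does not abort the whole
# dump. Assumes mariadb is running with innodb_force_recovery 1-3 and
# that credentials come from ~/.my.cnf. Set MYSQLDUMP to another
# command (eg true) for a dry run.
dump_each_db() {
  local outdir="$1"
  shift
  local dump="${MYSQLDUMP:-mysqldump}"
  local db
  local failed=0
  mkdir -p "$outdir" || return 1
  for db in "$@"; do
    # --result-file writes the dump directly, no shell redirect needed
    if "$dump" --routines --result-file="$outdir/$db.sql" "$db"; then
      echo "ok: $db"
    else
      echo "FAILED: $db"
      failed=1
    fi
  done
  return "$failed"
}
```

# usage idea (untested on hetzner2): dump_each_db /root/recovery-dumps $(mysql -N -e &#039;SHOW DATABASES&#039; | grep -vE &#039;^(information_schema|performance_schema)$&#039;)&lt;br /&gt;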
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail: smartctl isn&#039;t installed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
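# To compare all disks at once, here is a minimal sketch that pulls just the overall verdict out of smartctl -H output. smart_verdict and check_disks are hypothetical helper names; the only thing assumed about smartctl is the &amp;quot;overall-health self-assessment test result&amp;quot; line, which appears in both the CentOS (SATA) and Debian (NVMe) output above.&lt;br /&gt;

```shell
# Hypothetical helper: print the overall SMART verdict (PASSED/FAILED)
# from smartctl -H output read on stdin; prints UNKNOWN if the
# verdict line is missing.
smart_verdict() {
  awk -F': *' '
    /overall-health self-assessment test result/ {
      v = $2
      sub(/[! ]*$/, "", v)   # smartctl appends "!" to FAILED
      print v
      found = 1
    }
    END { if (!found) print "UNKNOWN" }'
}

# Hypothetical wrapper: one "disk: verdict" line per disk argument.
check_disks() {
  local disk
  for disk in "$@"; do
    printf '%s: %s\n' "$disk" "$(smartctl -H "$disk" | smart_verdict)"
  done
}
```

# eg check_disks /dev/sda /dev/sdb on hetzner2 should print FAILED for both disks, given the output above.&lt;br /&gt;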
# I&#039;m not 100% convinced that hetzner2&#039;s FAILED verdict is accurate. I still want to run a self-test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new, or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive is guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot-swap the drive while the system is running, or to shut it down first. I&#039;m going to choose shutdown; that&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a hardware raid. My understanding is that as soon as it comes-up, it&#039;ll begin copying the data from one disk to the other disk. But, christ, if both disks are fucked then which disk should I ask them to replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm... it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
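# Rather than eyeballing 23 rows per disk, a small awk filter can print only the attributes whose WHEN_FAILED column is set. failing_attrs is a hypothetical helper name, and it assumes the column layout shown above (WHEN_FAILED is the 9th field).&lt;br /&gt;

```shell
# Hypothetical helper: from smartctl -A output on stdin, print only
# attributes whose WHEN_FAILED column is set (eg FAILING_NOW or
# In_the_past). Assumes the column layout:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_attrs() {
  awk '$1 ~ /^[0-9]+$/ {
    if (NF >= 10) if ($9 != "-") print $1, $2, $9
  }'
}
```

# with the output above, smartctl -A /dev/sda | failing_attrs prints only: 202 Percent_Lifetime_Remain FAILING_NOW&lt;br /&gt;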
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. Does that mean it&#039;s not *actually* writing corrupt data, and it&#039;s literally just a timer that hit zero and said &amp;quot;yeah, you should probably replace the disk&amp;quot;?&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the Hetzner doc&#039;s table, nor in the Wikipedia table it&#039;s based on https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that it means the drive &amp;quot;should be replaced soon&amp;quot;, not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
## that fits the numbers above: the normalized VALUE of Ave_Block-Erase_Count is also 001 on both disks, which suggests the flash has worn through its rated erase cycles, while the read, program, erase, and uncorrectable error counts are all 0&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to nvme. That&#039;s relevant on hetzner3 but not hetzner2. Anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, smartctl&#039;s output for the NVMe drives on hetzner3 (Debian) looks very different from its output for the SATA SSDs on hetzner2 (CentOS); NVMe drives report the standard NVMe SMART/Health log instead of vendor-specific ATA attributes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows &amp;quot;Percentage Used&amp;quot; is at 6% and 7% on hetzner3, whereas I guess hetzner2&#039;s drives are at 100%&lt;br /&gt;
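# A minimal sketch for pulling that wear figure out of the NVMe SMART/Health log, so it can be compared across drives. nvme_percent_used is a hypothetical helper; it only applies to NVMe drives, since SATA drives like hetzner2&#039;s report ATA attributes instead.&lt;br /&gt;

```shell
# Hypothetical helper: print the NVMe "Percentage Used" wear estimate
# (number only, no % sign) from smartctl -A output on stdin.
# NVMe only: SATA drives report ATA attributes instead.
nvme_percent_used() {
  awk -F': *' '/^Percentage Used:/ { v = $2; sub(/%.*/, "", v); print v }'
}
```

# given the output above, smartctl -A /dev/nvme0n1 | nvme_percent_used prints 6 on hetzner3&lt;br /&gt;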
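# the third hetzner doc refers to a software raid. Actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# this indicates that our raid is fine (and that it *is* a linux software raid, since the arrays show up as md devices in /proc/mdstat). Two Us (eg `[UU]`) means both members are up. Bad would be a U and an underscore for the missing member (eg `[U_]`)&lt;br /&gt;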
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
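# The same [UU] vs [U_] check can be scripted, eg for a cron job. This is a minimal sketch: degraded_arrays is a hypothetical helper that reads /proc/mdstat-style text on stdin and prints the name of each array whose status brackets contain an underscore.&lt;br /&gt;

```shell
# Hypothetical helper: print the name of every md array whose status
# brackets show a missing member (an underscore, eg [U_]) in
# /proc/mdstat-style text on stdin. Prints nothing if all are healthy.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    match($0, /\[[U_]+\]/) {
      status = substr($0, RSTART, RLENGTH)
      if (status ~ /_/) print name
    }'
}
```

# usage: cat /proc/mdstat | degraded_arrays (on hetzner2 right now it should print nothing)&lt;br /&gt;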
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into its RAID array, then install grub on the new drive. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck-up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
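# For later, a dry-run sketch of the post-swap steps from the Hetzner software-raid guide, generated from our actual /proc/mdstat layout. readd_plan is a hypothetical helper that only prints commands; it assumes MBR partition tables (for GPT the guide uses sgdisk instead of sfdisk) and BIOS boot with grub2-install on CentOS 7, so both need to be verified before running anything it prints.&lt;br /&gt;

```shell
# DRY RUN, hypothetical helper: prints (does not run) the commands for
# re-adding a replaced disk after the swap. Assumes MBR partition
# tables (sfdisk) and BIOS boot via grub2-install on CentOS 7; verify
# both first. Reads /proc/mdstat text on stdin to map partitions to arrays.
readd_plan() {
  local good="$1" new="$2"
  # 1. copy the partition table from the surviving disk
  echo "sfdisk -d /dev/$good | sfdisk /dev/$new"
  # 2. add the matching partition of the new disk to each md array
  awk -v gooddisk="$good" -v newdisk="$new" '
    /^md[0-9]+ :/ {
      for (i = NF; i > 4; i--) {
        part = $i
        sub(/\[.*/, "", part)            # sda2[0] becomes sda2
        if (index(part, gooddisk) == 1) {
          sub("^" gooddisk, newdisk, part)
          print "mdadm /dev/" $1 " --add /dev/" part
        }
      }
    }'
  # 3. reinstall the bootloader on the new disk
  echo "grub2-install /dev/$new"
}
```

# eg if sdb gets replaced: cat /proc/mdstat | readd_plan sda sdb&lt;br /&gt;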
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
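# A minimal sketch for listing every disk&#039;s serial in one go (handy for the Hetzner ticket). disk_serial and list_serials are hypothetical helpers wrapping the same udevadm command as above.&lt;br /&gt;

```shell
# Hypothetical helper: print ID_SERIAL_SHORT from udevadm property
# output read on stdin.
disk_serial() {
  awk -F= '$1 == "ID_SERIAL_SHORT" { print $2 }'
}

# Hypothetical wrapper: one "disk serial" line per disk name argument.
list_serials() {
  local d
  for d in "$@"; do
    printf '%s %s\n' "$d" "$(udevadm info --query=property --name "$d" | disk_serial)"
  done
}
```

# eg list_serials sda sdb should print sda 154410FA336C and sdb 154410FA4520, one per line&lt;br /&gt;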
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked-off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
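# Once the tests finish, the results land in the drive&#039;s self-test log, which smartctl -l selftest prints. A minimal sketch, assuming the usual ATA log layout where the newest entry is numbered # 1; last_selftest_ok is a hypothetical helper.&lt;br /&gt;

```shell
# Hypothetical helper: read smartctl -l selftest output on stdin,
# print the most recent entry (numbered "# 1"), and return 0 only if
# it completed without error. Assumes the ATA self-test log layout.
last_selftest_ok() {
  local line
  line=$(grep -m1 '^# *1 ' || true)
  echo "${line:-no self-test entries found}"
  case "$line" in
    *"Completed without error"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

# eg smartctl -l selftest /dev/sda | last_selftest_ok&lt;br /&gt;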
# ok, then we have the filesystem. It looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the fsck probably should be happening on boot, but I couldn&#039;t find it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
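# both of the filesystems (/ and /boot) look fine. /boot was last checked in 2016, but its maximum mount count is -1, so mount-count-based checks are disabled; that isn&#039;t alarming by itself&lt;br /&gt;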
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
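# The state and last-check fields can be summarized per filesystem with a small awk filter. fs_summary is a hypothetical helper; it assumes the &amp;quot;Field name: value&amp;quot; layout of tune2fs -l shown above and exits non-zero unless the state is exactly clean.&lt;br /&gt;

```shell
# Hypothetical helper: summarize tune2fs -l output on stdin as
# "STATE (last checked DATE)". Exit status is non-zero unless the
# filesystem state is exactly "clean".
fs_summary() {
  awk -F': +' '
    $1 == "Filesystem state" { state = $2 }
    $1 == "Last checked"     { checked = $2 }
    END {
      print state " (last checked " checked ")"
      exit (state == "clean" ? 0 : 1)
    }'
}
```

# eg tune2fs -l /dev/md2 | fs_summary should print: clean (last checked Thu Apr 17 17:39:00 2025)&lt;br /&gt;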
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
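# Since I may need to step through levels 1, 2, and 3, here is a minimal sketch of a helper that sets the value instead of hand-editing with vim each time. set_force_recovery is a hypothetical name; it uses GNU sed -i, assumes a [mysqld] section exists, and refuses levels 4-6 because the 5.7 docs linked above suggest using those only on a copy of the data on another machine.&lt;br /&gt;

```shell
# Hypothetical helper: set innodb_force_recovery in a my.cnf-style file.
# Levels 1-3 are the ones the MySQL 5.7 docs call relatively safe;
# 4-6 are refused here (use a copy of the data on another machine).
# Level 0 restores normal operation. Requires GNU sed.
set_force_recovery() {
  local cnf="$1" level="$2"
  case "$level" in
    [0-3]) ;;
    [4-6]) echo "refusing level $level: use a copy of the data on another machine"; return 1 ;;
    *) echo "invalid level: $level"; return 2 ;;
  esac
  if grep -q '^innodb_force_recovery' "$cnf"; then
    # replace the existing setting in place
    sed -i "s/^innodb_force_recovery.*/innodb_force_recovery = $level/" "$cnf"
  else
    # append the setting right after the [mysqld] header
    sed -i "/^\[mysqld\]/a innodb_force_recovery = $level" "$cnf"
  fi
  if ! grep -qx "innodb_force_recovery = $level" "$cnf"; then
    echo "could not set innodb_force_recovery in $cnf (no [mysqld] section?)"
    return 1
  fi
}
```

# eg set_force_recovery /etc/my.cnf 2, then systemctl restart mariadb; set it back to 0 before returning to normal operation&lt;br /&gt;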
# mariadb didn&#039;t come-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged-into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, LUKS, VeraCrypt, PGP, general cybersecurity, etc. across a ton of scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. When I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
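# note: the 413 error above actually complains about &amp;quot;no files&amp;quot; data and a limit of 1000000 bytes, not the 13107200 found by grep. As far as I can tell, in ModSecurity v2 that message comes from the separate SecRequestBodyNoFilesLimit directive (which a grep for &#039;BodyLimit&#039; can&#039;t match), so the limit that was hit may actually be closer to 1 MB. Below is a rough sketch (hypothetical helper names; the config paths in the usage comment are assumptions about this server) that greps all four layers for their body-size directives and converts both byte values to MiB:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: find request-size limits between the browser and MediaWiki and
# convert byte values into MiB (1 MiB = 1048576 bytes).

# Convert a byte count into MiB with two decimals.
to_mib() {
	awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1048576 }'
}

# Grep the given config paths for the usual body-size directives of nginx,
# Apache, ModSecurity v2 and PHP. Missing paths are silently skipped.
find_body_limits() {
	grep -rniE 'client_max_body_size|LimitRequestBody|SecRequestBody[A-Za-z]*Limit|post_max_size' "$@" 2>/dev/null
	return 0
}

# SecRequestBodyLimit found in conf.d/mod_security.conf
to_mib 13107200    # prints 12.50
# limit quoted in the 413 "no files" error message
to_mib 1000000     # prints 0.95

# usage on hetzner2 (paths are assumptions):
#   find_body_limits /etc/nginx /etc/httpd /etc/php.ini
```

# the regex covers the standard directive names for nginx (client_max_body_size), Apache (LimitRequestBody), ModSecurity v2 (SecRequestBody*Limit) and PHP (post_max_size); a limit set any other way would be missed.&lt;br /&gt;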
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox on hetzner3 so she can rebuild osemain using some new theme that&#039;s not broken on the latest versions of wordpress, php, etc.&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which site&#039;s CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306065</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306065"/>
		<updated>2025-04-27T21:59:44Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Apr 25&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 25, 2025=&lt;br /&gt;
# I woke up this morning and discovered the wiki was offline&lt;br /&gt;
# I tried to ssh into the server; it&#039;s not responding&lt;br /&gt;
# I figured I&#039;d log into the hetzner wui, but – uhh – the credentials are in keepass and live on the server&lt;br /&gt;
# I mitigated this by giving Marcin a copy of the keepass file on his veracrypt drive, but he has since changed the password (a month or two ago), and we don&#039;t have a new local copy&lt;br /&gt;
# I sent an email to Marcin asking him to login to hetzner wui and boot hetzner2. if it doesn&#039;t come-up, then I&#039;ll have to get the password from him so I can load it in the wui from a rescue disk&lt;br /&gt;
# oh, I did find the new hetzner password in my personal keepass&lt;br /&gt;
# I logged-in, and I found the server was listed as being on. But I can&#039;t ping it. I gave it an &amp;quot;automatic hardware reset&amp;quot; from the wui&lt;br /&gt;
# I&#039;ll give it a few minutes before trying the rescue system&lt;br /&gt;
# their rescue systems are much nicer for their cloud product than their dedicated server product&lt;br /&gt;
# it looks like I have two options&lt;br /&gt;
## rescue boot mode: where I&#039;m given ssh access&lt;br /&gt;
## vnc&lt;br /&gt;
# the problem with the rescue boot is that – if this is a grub issue – I wouldn&#039;t be able to &amp;quot;see&amp;quot; the error&lt;br /&gt;
# I enabled VNC and gave the server a reboot&lt;br /&gt;
# I was able to connect via vnc, but it was the damn installation wizard for almalinux. I quit the installation, and the vnc session died.&lt;br /&gt;
# damn, I guess vnc won&#039;t let me see the boot process, after all&lt;br /&gt;
# instead I tried the &amp;quot;rescue system&amp;quot;&lt;br /&gt;
# that didn&#039;t work; I can&#039;t access ssh on either of the IP addresses&lt;br /&gt;
# the docs say to activate the rescue system and then reboot it; that&#039;s what I did https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system/&lt;br /&gt;
# this time I fully shut down the server, and then I enabled the rescue system (while it&#039;s off)&lt;br /&gt;
# I went back to the Reset tab, and it&#039;s still off. So I booted it&lt;br /&gt;
# somehow I was able to login from my ose vm using my personal ssh key, but with user root&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~$ ssh -v root@138.201.84.223&lt;br /&gt;
OpenSSH_9.2p1 Debian-2+deb12u5, OpenSSL 3.0.15 3 Sep 2024&lt;br /&gt;
debug1: Reading configuration data /home/user/.ssh/config&lt;br /&gt;
debug1: Reading configuration data /etc/ssh/ssh_config&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files&lt;br /&gt;
debug1: /etc/ssh/ssh_config line 21: Applying options for *&lt;br /&gt;
debug1: Connecting to 138.201.84.223 [138.201.84.223] port 22.&lt;br /&gt;
debug1: Connection established.&lt;br /&gt;
...&lt;br /&gt;
Linux rescue 6.12.19 #1 SMP Fri Mar 14 05:34:52 UTC 2025 x86_64&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
  Welcome to the Hetzner Rescue System.&lt;br /&gt;
&lt;br /&gt;
  This Rescue System is based on Debian GNU/Linux 12 (bookworm) with a custom kernel.&lt;br /&gt;
  You can install software like you would in a normal system.&lt;br /&gt;
&lt;br /&gt;
  To install a new operating system from one of our prebuilt images, run &#039;installimage&#039; and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
  Important note: Any data that was not written to the disks will be lost during a reboot.&lt;br /&gt;
&lt;br /&gt;
  For additional information, check the following resources:&lt;br /&gt;
	Rescue System:           https://docs.hetzner.com/robot/dedicated-server/troubleshooting/hetzner-rescue-system&lt;br /&gt;
	Installimage:            https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage&lt;br /&gt;
	Install custom software: https://docs.hetzner.com/robot/dedicated-server/operating-systems/installing-custom-images&lt;br /&gt;
	other articles:          https://docs.hetzner.com/robot&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Rescue System (via Legacy/CSM) up since 2025-04-25 17:24 +02:00&lt;br /&gt;
&lt;br /&gt;
Hardware data:&lt;br /&gt;
&lt;br /&gt;
   CPU1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Cores 8)&lt;br /&gt;
   Memory:  64153 MB (Non-ECC)&lt;br /&gt;
   Disk /dev/sda: 250 GB (=&amp;gt; 232 GiB) &lt;br /&gt;
   Disk /dev/sdb: 512 GB (=&amp;gt; 476 GiB) &lt;br /&gt;
   Total capacity 709 GiB with 2 Disks&lt;br /&gt;
&lt;br /&gt;
Network data:&lt;br /&gt;
   eth0  LINK: yes&lt;br /&gt;
		 MAC:  90:1b:0e:94:07:c4&lt;br /&gt;
		 IP:   138.201.84.223&lt;br /&gt;
		 IPv6: 2a01:4f8:172:209e::2/64&lt;br /&gt;
		 Intel(R) PRO/1000 Network Driver&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to mount the root drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[2]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 0/2 pages [0KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[2]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[2]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt&lt;br /&gt;
root@rescue ~ # ls /mnt&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # ls /mnt/home&lt;br /&gt;
b2user  crupp  hart     lberezhny  marcin      stagingsync  wp&lt;br /&gt;
cmota   Flipo  jthomas  maltfield  not-apache  tgriffing&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
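# in the mdstat output above, &amp;quot;[2/2] [UU]&amp;quot; means both members of each array are up, so the RAID itself looks healthy; a missing or failed member would show as an underscore (e.g. &amp;quot;[U_]&amp;quot;). A minimal sketch (hypothetical helper name) that flags degraded arrays, assuming the usual two-lines-per-array mdstat layout shown above:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: summarize /proc/mdstat, flagging arrays whose member-status field
# (e.g. [UU]) contains an underscore, which marks a missing/failed member.
check_mdstat() {
	mdstat_file="${1:-/proc/mdstat}"
	awk '
		# remember the array name from lines like "md2 : active raid1 ..."
		/^md[0-9]+ :/ { name = $1; next }
		# the following "... blocks ... [2/2] [UU]" line carries the status
		/ blocks / {
			if (name != "") {
				status = $NF
				if (status ~ /_/) { print name ": DEGRADED " status }
				else { print name ": ok " status }
				name = ""
			}
		}
	' "$mdstat_file"
}

# usage in the rescue system (a no-op where /proc/mdstat is absent)
if [ -r /proc/mdstat ]; then check_mdstat; fi
```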
# I don&#039;t know what the point of this is; I can&#039;t fix it if I can&#039;t watch it boot and see what&#039;s breaking&lt;br /&gt;
# ok, at the bottom of the docs, hetzner lists another option: the vKVM Rescue System https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/&lt;br /&gt;
# it specifically says that&#039;s for debugging boot issues&lt;br /&gt;
# last thing before I try that: I downloaded a local copy of the keepass files from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner2$ rsync -av --progress root@138.201.84.223:/mnt/etc/keepass ./etc-keepass-20250525&lt;br /&gt;
receiving incremental file list&lt;br /&gt;
created directory ./etc-keepass-20250525&lt;br /&gt;
keepass/&lt;br /&gt;
keepass/passwords.kdbx&lt;br /&gt;
		 46,142 100%   44.00MB/s    0:00:00 (xfr#1, to-chk=6/8)&lt;br /&gt;
keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#2, to-chk=5/8)&lt;br /&gt;
keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
		  4,590 100%    4.38MB/s    0:00:00 (xfr#3, to-chk=4/8)&lt;br /&gt;
keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
		 33,726 100%  143.20kB/s    0:00:00 (xfr#4, to-chk=3/8)&lt;br /&gt;
keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
		 34,238 100%   71.75kB/s    0:00:00 (xfr#5, to-chk=2/8)&lt;br /&gt;
keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
		 45,406 100%   94.55kB/s    0:00:00 (xfr#6, to-chk=1/8)&lt;br /&gt;
keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
		 27,102 100%   56.31kB/s    0:00:00 (xfr#7, to-chk=0/8)&lt;br /&gt;
&lt;br /&gt;
sent 161 bytes  received 196,407 bytes  35,739.64 bytes/sec&lt;br /&gt;
total size is 195,794  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner2$ du -sh etc-keepass-20250525/keepass/*&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170728.bak&lt;br /&gt;
8.0K	etc-keepass-20250525/keepass/passwords.kdbx.20170804.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190820.bak&lt;br /&gt;
36K	etc-keepass-20250525/keepass/passwords.kdbx.20190909.bak&lt;br /&gt;
48K	etc-keepass-20250525/keepass/passwords.kdbx.20250316.bak&lt;br /&gt;
28K	etc-keepass-20250525/keepass/passwords.kdbxs.20180525.bak&lt;br /&gt;
user@ose:~/tmp/hetzner2$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so this time I followed the same steps as for the rescue system, except I chose &amp;quot;vKVM&amp;quot; instead of &amp;quot;Linux&amp;quot; in the &amp;quot;Operating System&amp;quot; dropdown&lt;br /&gt;
# strange, it gave me an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Public key authentication is not available for the selected operating system.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I unselected my ssh key, and chose &amp;quot;no key&amp;quot; instead&lt;br /&gt;
# it gave me a URL and a password. I booted the server, but the URL didn&#039;t load (&amp;quot;Unable to connect&amp;quot; error)&lt;br /&gt;
# ok, it took a few minutes and had a self-signed cert&lt;br /&gt;
# I bypassed the cert error, and entered the username and password into the basic auth popup. It failed! Could I really have been MITM&#039;d?&lt;br /&gt;
# I immediately shut down the server from the wui, and I tried again.&lt;br /&gt;
# this time I was able to login – both from ssh and in the wui.&lt;br /&gt;
# as soon as it opened, I saw the error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
No more network devices&lt;br /&gt;
&lt;br /&gt;
Booting from Hard Disk...&lt;br /&gt;
.&lt;br /&gt;
error: symbol &#039;grub_calloc&#039; not found.&lt;br /&gt;
Entering rescue mode...&lt;br /&gt;
grub rescue&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I wonder if this is grub or grub2. I didn&#039;t have a &amp;quot;grub-install&amp;quot; binary before, so I assumed the hetzner docs were wrong and ran &amp;quot;grub2-install&amp;quot; instead, which said it worked (there was a warning that the docs said was safe to ignore)&lt;br /&gt;
# curiously, the opposite is true for the ssh session in vKVM: I have grub-install but not grub2-install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@vKVM-rescue ~ # which grub-install&lt;br /&gt;
/usr/sbin/grub-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
root@vKVM-rescue ~ # which grub2-install&lt;br /&gt;
root@vKVM-rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the docs in question https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# I don&#039;t want to fuck with the grub without first taking a backup of these disks. But, uh, it looks like I can&#039;t access the RAID from inside this vkvm setup&lt;br /&gt;
# yeah, that&#039;s one of the limitations listed for VKVM https://docs.hetzner.com/robot/dedicated-server/virtualization/vkvm/#raid-controllers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Configured units are passed through as SCSI devices to the VM. However it is not possible to access the controller. Please use the regular Hetzner Rescue System for this purpose.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I shutdown VKVM and booted it into the regular rescue mode&lt;br /&gt;
# it took a few minutes to get back into the old rescue system, but here I can use the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS&lt;br /&gt;
loop0     7:0    0   3.4G  1 loop  &lt;br /&gt;
sda       8:0    0 476.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
sdb       8:16   0 232.9G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 &lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 &lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 &lt;br /&gt;
root@rescue ~ # mkdir /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # mount /dev/md2 /mnt/md2&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created a dir for these backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # mkdir /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chown root:root /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # chmod 0700 /mnt/md2/var/tmp/20250425-grub-fail&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first I made a backup from the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # rsync -av --progress /mnt/md1 /mnt/md2/var/tmp/20250425-grub-fail/md1.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
...&lt;br /&gt;
md1/grub2/locale/zh_TW.mo&lt;br /&gt;
		 30,882 100%   31.38kB/s    0:00:00 (xfr#345, to-chk=0/355)&lt;br /&gt;
md1/lost+found/&lt;br /&gt;
&lt;br /&gt;
sent 399,450,301 bytes  received 6,709 bytes  159,782,804.00 bytes/sec&lt;br /&gt;
total size is 399,330,989  speedup is 1.00&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I figured I&#039;d make a backup of the two disk partitions directly, but I couldn&#039;t even mount them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sda2&lt;br /&gt;
root@rescue ~ # mkdir /mnt/sdb2&lt;br /&gt;
root@rescue ~ # mount /dev/sda2 /mnt/sda2&lt;br /&gt;
mount: /mnt/sda2: unknown filesystem type &#039;linux_raid_member&#039;.&lt;br /&gt;
	   dmesg(1) may have more information after failed mount system call.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
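# since the partitions are raid members (not mountable filesystems), an alternative is to image the start of each raw disk with dd. A rough sketch (hypothetical helper name), assuming the first partition starts at sector 2048 so that the first 1 MiB holds the MBR plus the gap where grub usually embeds its core.img (lsblk doesn&#039;t show start sectors, so that should be checked with fdisk -l first):&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: copy the first 1 MiB (2048 sectors of 512 bytes) of a raw disk to a
# timestamped image file and print the image path. Read-only on the disk.
# Assumption: the first partition starts at sector 2048, so this covers the
# MBR plus the post-MBR gap where GRUB usually embeds core.img.
backup_boot_area() {
	dev="$1"
	outdir="$2"
	out="$outdir/$(basename "$dev").first1MiB.$(date '+%Y%m%d_%H%M%S').img"
	dd if="$dev" of="$out" bs=512 count=2048 2>/dev/null || return 1
	echo "$out"
}

# usage in the rescue system (as root), into the backup dir created earlier:
#   backup_boot_area /dev/sda /mnt/md2/var/tmp/20250425-grub-fail
#   backup_boot_area /dev/sdb /mnt/md2/var/tmp/20250425-grub-fail
```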
# I tried this command (from the docs), which I skipped before because it said that the next command (grub-install) was enough; sure enough, it didn&#039;t work https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-mkdevicemap -n&lt;br /&gt;
grub-mkdevicemap: error: cannot open /boot/grub/device.map.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I investigated this before, and I thought I had concluded that we&#039;re using grub2, not grub1&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # mount /dev/md1 /mnt/md1&lt;br /&gt;
root@rescue ~ # ls /mnt/md1/&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, shit, even the grub-install command is v2 https://askubuntu.com/questions/107486/how-to-know-the-version-of-grub&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install --version&lt;br /&gt;
grub-install (GRUB) 2.06-13+deb12u1&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, this indicates we&#039;re not using lilo https://askubuntu.com/questions/24459/how-do-i-find-out-which-boot-loader-i-have&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2/etc/ | grep lilo&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can dd straight from the disk to read the MBR. And, yeah, it appears we&#039;re booting grub from the MBR... and this boot code is stored on the raw disks, not on the raid&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # dd if=/dev/md1 bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sda bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
214fb5736d1e5ad63e515dc2fffe44bd928cd8dab2c019dc11fb9fcaef5ea90dbf51f1ac507ab1cfbbe74ff&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
DA/jjF&lt;br /&gt;
root@rescue ~ #&lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # dd if=/dev/sdb bs=512 count=1 2&amp;gt;/dev/null | strings&lt;br /&gt;
ZRr=&lt;br /&gt;
`|f	&lt;br /&gt;
\|f1&lt;br /&gt;
GRUB &lt;br /&gt;
Geom&lt;br /&gt;
Hard Disk&lt;br /&gt;
Read&lt;br /&gt;
 Error&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
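# the same dd | strings check can be wrapped in a small read-only helper (hypothetical name) that also looks at the 0x55AA boot signature in bytes 510-511 of the sector. It uses tr instead of strings, on the assumption that binutils may not be installed everywhere:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: read-only inspection of the first 512-byte sector (the MBR) of a
# disk or disk image. Reports the 0x55AA boot signature (bytes 510-511) and
# whether GRUB's boot.img marker string "GRUB" is present.
inspect_mbr() {
	dev="$1"
	# two bytes at offset 510, printed as hex without spaces, e.g. "55aa"
	sig=$(od -An -tx1 -j510 -N2 "$dev" | tr -d ' \n')
	if [ "$sig" = "55aa" ]; then
		echo "boot signature: present"
	else
		echo "boot signature: missing"
	fi
	# turn non-printable bytes into newlines, then look for GRUB's marker
	if dd if="$dev" bs=512 count=1 2>/dev/null | tr -c '[:print:]' '\n' | grep -q 'GRUB'; then
		echo "GRUB boot code: present"
	else
		echo "GRUB boot code: absent"
	fi
}

# usage in the rescue system (as root):
#   inspect_mbr /dev/sda
#   inspect_mbr /dev/sdb
```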
# idk what to do; I tried the grub-install again, but it gives me this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # grub-install /dev/sda&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&lt;br /&gt;
root@rescue ~ # grub-install /dev/sdb&lt;br /&gt;
grub-install: error: /usr/lib/grub/i386-pc/modinfo.sh doesn&#039;t exist. Please specify --target or --directory.&lt;br /&gt;
root@rescue ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried creating a chroot into our real system on the RAID (/mnt/md2) first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue ~ # ls /mnt/md2&lt;br /&gt;
bin   etc                installimage.debug  lost+found  old   root  srv  usr&lt;br /&gt;
boot  home               lib                 media       opt   run   sys  var&lt;br /&gt;
dev   installimage.conf  lib64               mnt         proc  sbin  tmp&lt;br /&gt;
root@rescue ~ # umount /mnt/md1&lt;br /&gt;
root@rescue ~ # chroot-prepare /mnt/md2&lt;br /&gt;
root@rescue ~ # chroot /mnt/md2&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
root@rescue / # mount /dev/md1 /boot&lt;br /&gt;
root@rescue / # ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi&lt;br /&gt;
grub&lt;br /&gt;
grub2&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img&lt;br /&gt;
initramfs-3.10.0-327.18.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-514.26.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64.img&lt;br /&gt;
initramfs-3.10.0-693.2.2.el7.x86_64kdump.img&lt;br /&gt;
initrd-plymouth.img&lt;br /&gt;
lost+found&lt;br /&gt;
symvers-3.10.0-1127.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-1160.119.1.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-327.18.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-514.26.2.el7.x86_64.gz&lt;br /&gt;
symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then tried the grub install again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@rescue / # grub2-install /dev/sda&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / #&lt;br /&gt;
&lt;br /&gt;
root@rescue / # grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
root@rescue / # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
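# For reference, the chroot fix above can be sketched as a dry-run that only prints the commands. This is a minimal sketch, assuming root is on /dev/md2, /boot is on /dev/md1, and that Hetzner&#039;s chroot-prepare roughly amounts to bind-mounting /dev, /proc, /sys and /run (an assumption; I have not checked its source). print_grub_chroot_plan is a hypothetical name.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical dry-run: print (do not execute) the steps to reinstall GRUB
# from a rescue system by chrooting into the installed OS, so that the
# installed OS's own grub2-install and i386-pc modules are used.
print_grub_chroot_plan() {
  local root_dev="$1" boot_dev="$2" mnt="$3"
  shift 3   # remaining args: whole disks that get GRUB in their MBR
  echo "mount $root_dev $mnt"
  # Assumed equivalent of chroot-prepare: bind the pseudo-filesystems.
  local fs
  for fs in dev proc sys run; do
    echo "mount --bind /$fs $mnt/$fs"
  done
  echo "mount $boot_dev $mnt/boot"
  local disk
  for disk in "$@"; do
    # Install on every RAID member so either disk can boot alone.
    echo "chroot $mnt grub2-install $disk"
  done
}

print_grub_chroot_plan /dev/md2 /dev/md1 /mnt/md2 /dev/sda /dev/sdb
```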
# I exited the chroot and shut down the rescue system&lt;br /&gt;
# I activated the VKVM rescue system, and booted it again&lt;br /&gt;
# when I connected to the KVM wui, I was shown a password prompt. So I think booting works!&lt;br /&gt;
# I rebooted it from ssh&lt;br /&gt;
# and now I can ssh into the real system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~$ autossh opensourceecology.org&lt;br /&gt;
Last login: Thu Apr 24 23:12:44 2025 from 146.70.199.15&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and now the wiki loads too&lt;br /&gt;
# I did another reboot test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ sudo su -&lt;br /&gt;
[sudo] password for maltfield: &lt;br /&gt;
Last login: Thu Apr 24 16:25:15 UTC 2025 on pts/0&lt;br /&gt;
[root@opensourceecology ~]# reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
Last login: Fri Apr 25 16:29:21 2025 from 185.204.1.184&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# idk, my takeaway is that one or more of these assumptions is correct:&lt;br /&gt;
## grub-install needs to be run *after* the RAID sync is finished&lt;br /&gt;
## grub-install needs to be run on *both* the new *and* the old disk&lt;br /&gt;
## grub-install needs to be run inside a chroot on the rescue system&lt;br /&gt;
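# If the first assumption is right, grub-install should wait until /proc/mdstat shows no rebuild in progress. A minimal sketch of such a check, assuming the mdstat format shown in this log; raid_is_idle is a hypothetical helper that takes a file argument so it can also be tested against saved output.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every md array in an mdstat-style
# file is fully in sync: no recovery/resync/reshape lines (including
# resync=DELAYED) and no degraded member map like [U_] or [_U].
raid_is_idle() {
  local mdstat="${1:-/proc/mdstat}"
  if grep -Eq 'recovery|resync|reshape' "$mdstat"; then
    return 1   # a rebuild is running or queued
  fi
  if grep -Eq '\[[U_]*_[U_]*\]' "$mdstat"; then
    return 1   # at least one array is still missing a member
  fi
  return 0
}

# Usage (hypothetical polling loop before reinstalling GRUB):
#   until raid_is_idle; do sleep 300; done
#   grub2-install /dev/sda; grub2-install /dev/sdb
```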
# anyway, we&#039;re stable again&lt;br /&gt;
# I got an email from Marcin saying Tom could help with the migrations. I sent Tom some wiki articles to get him caught up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Tom,&lt;br /&gt;
&lt;br /&gt;
I&#039;ll try to get you ssh access on hetzner2 soon. In the meantime, please read the following articles:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner2&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started preparing draft &amp;quot;change tickets&amp;quot; for migrating each of the websites from hetzner2 to hetzner3. Note that some of these are not fully tested, so you&#039;ll want to execute them manually and make corrections as-needed.&lt;br /&gt;
&lt;br /&gt;
Please also read-through these:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_microfactory_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_oswh&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_phplist_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
(There&#039;s also one CHG for the forum that I think needs to be made)&lt;br /&gt;
&lt;br /&gt;
The next item TODO is to finish the migration plan for these websites:&lt;br /&gt;
&lt;br /&gt;
 1. www.opensourceecology.org (osemain)&lt;br /&gt;
 2. www.openbuildinginstitute.org (obi)&lt;br /&gt;
&lt;br /&gt;
We decided that there would be 2 simultaneous versions of obi:&lt;br /&gt;
&lt;br /&gt;
1. A static site scraped with curl on hetzner3&lt;br /&gt;
2. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
And we decided that there would be 3 simultaneous versions of osemain:&lt;br /&gt;
&lt;br /&gt;
1. The live/current site on hetzner2&lt;br /&gt;
2. A static site scraped with curl on hetzner3&lt;br /&gt;
3. The (broken) dynamic wordpress site on hetzner3&lt;br /&gt;
&lt;br /&gt;
To have multiple sites with the same domain on the same server, we bought a second IPv4 address (FeF isn&#039;t set up with IPv6). This week I just finished updating the hetzner3 server to persist this new IPv4 address.&lt;br /&gt;
&lt;br /&gt;
The next item for you would be to update our ansible to push out new vhosts (in nginx, varnish, and apache) for the static sites that are bound to the second IPv4 address using the same hostname.&lt;br /&gt;
&lt;br /&gt;
Please read-through the ansible playbook and roles (most importantly for nginx, varnish, and apache) to understand how they&#039;re provisioned&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
Since you have access to hetzner3, you can also poke around (read-only please) the configs for these three web services to understand how ansible provisions them.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve updated and pushed-out the new vhosts with ansible, you&#039;ll need to update the migration plan&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_obi_to_hetzner3&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
&lt;br /&gt;
And then you&#039;ll want to go-through each migration plan to create a temp &amp;quot;snapshot&amp;quot; of all the sites on hetzner3, where Marcin &amp;amp; Catarina can do a thorough verification of each site (by updating /etc/hosts) before we do the *real* migration -- which is nearly the same as the &amp;quot;snapshot&amp;quot; except we actually migrate DNS.&lt;br /&gt;
&lt;br /&gt;
Please let me know when you&#039;ve finished reading the above articles.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/24/25 22:16, REDACTED@tutanota.com wrote:&lt;br /&gt;
&amp;gt; Michael;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I need to reset my ssh key on hetzner2. Can you use the same as on 3 or best to generate a new one?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I spoke with Marcin and I think I can help with the admin, as I have time available.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you give a run-down of its status and what needs to be done for completing the migration to hetzner3?&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Tom Griffing&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to login to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged-in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
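# The backup check above greps the bucket listing for today&#039;s date stamp. A minimal sketch of the same test as a reusable function, assuming listing lines look like &amp;quot;SIZE daily_HOST_YYYYMMDD_HHMMSS.tar.gpg&amp;quot; as in the output above; has_backup_for_date is a hypothetical name, and in practice the listing would come from ${RCLONE} ls.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an rclone-style listing ("SIZE NAME" per line)
# on stdin and succeed if it contains a non-empty daily backup for DAY.
has_backup_for_date() {
  local day="$1"   # YYYYMMDD, e.g. $(date "+%Y%m%d")
  awk -v day="$day" '
    $1 > 0 { if ($2 ~ ("^daily_.*_" day "_[0-9]+\\.tar\\.gpg$")) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the server (as in the log above):
#   source /root/backups/backup.settings
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | has_backup_for_date "$(date "+%Y%m%d")"
```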
# I tried to remove the first partition from the RAID, but it said I can&#039;t?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that if the RAID is healthy, you first have to mark the partition as failed with &#039;--fail&#039; before you can remove it https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized I have an issue in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I listed this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to execute this, to force the RAID into a failure state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I was able to remove it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
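# Since the CHG mixed up -a/-r and paired sdb3 with md1, generating the commands from one md-to-partition table makes that mistake harder to repeat. A minimal dry-run sketch, assuming this server&#039;s md0/sdb1, md1/sdb2, md2/sdb3 layout; print_mdadm_remove_plan is a hypothetical name and it only prints commands.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical dry-run: print the mdadm commands to detach one disk from
# every RAID1 array. A healthy member must be marked failed (--fail)
# before it can be hot-removed (--remove); otherwise mdadm reports
# "Device or resource busy".
print_mdadm_remove_plan() {
  local disk="$1"   # e.g. /dev/sdb
  local pair md part
  # md array : partition number, matching this server's layout
  for pair in md0:1 md1:2 md2:3; do
    md="${pair%%:*}"
    part="${disk}${pair##*:}"
    echo "mdadm --manage /dev/$md --fail $part"
    echo "mdadm --manage /dev/$md --remove $part"
  done
}

print_mdadm_remove_plan /dev/sdb
```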
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had received an email from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
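# When dmesg is flooding like this, a quick count of error lines per device helps confirm which array is affected. A minimal sketch that reads kernel log text on stdin; count_md_errors is a hypothetical name, and it only matches the two message shapes seen above (&amp;quot;Buffer I/O error on dev mdN&amp;quot; and &amp;quot;EXT4-fs error (device mdN)&amp;quot;).&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: count I/O and ext4 error lines per md device in
# kernel log text read from stdin, e.g.  dmesg | count_md_errors
# Output: one "mdN COUNT" line per device, sorted by device name.
count_md_errors() {
  grep -oE 'Buffer I/O error on dev md[0-9]+|EXT4-fs error \(device md[0-9]+\)' |
    grep -oE 'md[0-9]+' |
    sort | uniq -c |
    awk '{ print $2, $1 }'
}
```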
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back-in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; io errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and tried to make partition backups, anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but the brief disconnect and re-insertion of the other (mislabeled) drive pushed /dev/sda to /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at least partition this new drive&lt;br /&gt;
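# Because kernel names like sda/sdc can move after a hotswap, it is safer to identify disks by serial number. A minimal sketch that extracts ID_SERIAL_SHORT from udevadm info --query=property output on stdin; serial_of is a hypothetical name. In this log, 154410FA336C is the original sda and 171416BD4379 is the replacement.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the short serial from
# "udevadm info --query=property --name DEV" output read on stdin.
serial_of() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Usage on a live system:
#   for d in sda sdb sdc; do
#     s=$(udevadm info --query=property --name "$d" 2>/dev/null | serial_of)
#     echo "$d ${s:-absent}"
#   done
# udev also typically creates stable /dev/disk/by-id/ata-* symlinks that
# embed the serial, which avoids relying on sdX names at all.
```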
# but this new drive isn&#039;t the right size. it&#039;s 512G while our old disk was 250G. I guess it&#039;s better to have too big a disk than too small a disk, but we won&#039;t be able to use the extra space. I&#039;m going to assume that they just didn&#039;t have 250G disks in stock anymore.&lt;br /&gt;
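# The lsblk sizes above (477G and 232.9G) are binary GiB, while drives are labeled in decimal GB, which is why they read lower than the 512G/250G labels (the real byte counts are also slightly above the nominal ones). A quick worked conversion in shell integer arithmetic; gb_to_gib is a hypothetical helper that truncates to whole GiB.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a nominal decimal capacity in GB
# (10^9 bytes) to whole binary GiB (2^30 bytes), truncating.
gb_to_gib() {
  local gb="$1"
  echo $(( gb * 1000000000 / 1073741824 ))
}

gb_to_gib 512   # nominal 512 GB Micron 1100; prints 476
gb_to_gib 250   # nominal 250 GB Crucial MX200; prints 232
```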
# anyway, I tried to back up the partitions, but that didn&#039;t work since the filesystem is read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
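# The mkdir failure shows the root filesystem had gone read-only. Checking that before starting a CHG step avoids half-finished writes. A minimal sketch that reads a mounts table in /proc/mounts format; is_mounted_ro is a hypothetical name, and it only checks the exact mount point given.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if MOUNTPOINT is mounted read-only
# according to a table in /proc/mounts format
# (device mountpoint fstype options dump pass).
is_mounted_ro() {
  local mnt="$1" table="${2:-/proc/mounts}"
  awk -v m="$mnt" '
    $2 == m { last = $4 }    # keep the last (topmost) entry for m
    END {
      split(last, opts, ",")
      for (i in opts) if (opts[i] == "ro") exit 0
      exit 1
    }
  ' "$table"
}

# Usage before writing CHG backups:
#   if is_mounted_ro /; then echo "/ is read-only"; fi
```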
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot. and pray that grub is intact to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it can&#039;t even reboot it&#039;s so broken.&lt;br /&gt;
# I triggered a reset on the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shutdown all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. we have sda and sdb again, and sda is our original sda – as we wanted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partitions; it&#039;s not surprising the sdb file is empty&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied the partition from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good, other than the warning about not being able to boot from this disk; I&#039;ll check later what LILO is and whether this matters for grub on the RAID&lt;br /&gt;
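# To confirm the copy, one could compare the two sfdisk dumps while ignoring device names and header lines. A minimal sketch, assuming dumps in the sfdisk -d format used here (one line per partition, prefixed with the device path); normalize_dump and same_layout are hypothetical names. A bigger replacement disk still gets the identical layout, so its extra space simply stays unallocated.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers: strip headers and device names from an
# "sfdisk -d" dump, then compare two dumps partition by partition.
normalize_dump() {
  # Keep only partition lines, and drop the "/dev/sdX" prefix so that
  # "/dev/sda1 : start=..." and "/dev/sdb1 : start=..." compare equal.
  grep '^/dev/' "$1" | sed 's#^/dev/[a-z]*##'
}

same_layout() {
  [ "$(normalize_dump "$1")" = "$(normalize_dump "$2")" ]
}

# Usage:
#   sfdisk -d /dev/sda > a.dump; sfdisk -d /dev/sdb > b.dump
#   if same_layout a.dump b.dump; then echo "layouts match"; fi
```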
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk to the RAID, and it shows that it&#039;s starting to sync now. excellent&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, it looks like it&#039;s not syncing all of the RAID arrays at the same time: it&#039;s doing md0 now, and md1 and md2 are marked resync=DELAYED until it finishes&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d sync&#039;d /boot first. I don&#039;t think I can install grub until that&#039;s sync&#039;d. maybe?&lt;br /&gt;
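# a sketch, not what was run here: since md resyncs arrays that share a disk one at a time (the others wait at resync=DELAYED), re-adding the /boot partition first should make md1 the first array to sync. The readd_cmds helper is hypothetical; it only prints the mdadm commands in that order, using this server&#039;s layout (md0 = swap on partition 1, md1 = /boot on partition 2, md2 = / on partition 3)&lt;br /&gt;
```shell
# Hypothetical helper: print (do not run) the mdadm commands that re-add a
# replacement disk's partitions, /boot first. The array:partition pairs
# (md1:2 = /boot, md0:1 = swap, md2:3 = /) match this server's layout.
readd_cmds() {
  local disk="$1" pair md part
  for pair in md1:2 md0:1 md2:3; do
    md="${pair%%:*}"    # text before the colon, e.g. md1
    part="${pair##*:}"  # text after the colon, e.g. 2
    echo "mdadm /dev/$md -a ${disk}${part}"
  done
}

readd_cmds /dev/sdb
```
# piping that output to sh (as root) would run the commands; that md always resyncs in the order the arrays were added is an assumption&lt;br /&gt;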
# it says it&#039;s moving about 1 MB per sec (1159K/sec). 32G*1024 = 32,768 MB, which at 1 MB/s is 32,768 seconds / 60 = 546 minutes / 60 = 9.1 hours. Just for swap!&lt;br /&gt;
# assuming we have the same speed for the rest of the disk, that&#039;s 250 G * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds. 256,000 seconds / 60 = 4,266.67 minutes / 60 = 71.11 hours. I guess we just have to accept the risk and hope that the old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
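# the estimates above are just size divided by throughput. A minimal sketch with a hypothetical eta_hours helper (GB in, MB/s in, hours out, using the same 1 GB = 1024 MB convention)&lt;br /&gt;
```shell
# Hypothetical helper: estimated resync time in hours.
# $1 = data to copy in GB, $2 = throughput in MB/s (1 GB = 1024 MB).
eta_hours() {
  awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.2f\n", gb * 1024 / mbps / 3600 }'
}

eta_hours 32 1    # swap at ~1 MB/s: prints 9.10
eta_hours 250 1   # rest of the disk at ~1 MB/s: prints 71.11
eta_hours 250 58  # rest of the disk at ~58 MB/s: prints 1.23
```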
# I tried to go ahead and install grub on the new disk, but I got a &#039;command not found&#039; error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it should be &#039;grub2-install&#039;, so I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# at that speed (~58 MB/s), the rest of the disk would take 250*1024/58 = 4,414 seconds / 60 = 73 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups are still running; I&#039;ll let them finish before starting-up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the raid replication, and it shows md0 (swap) and md1 (boot) are both done. Hooray! Now we just need to finish root (/), which is 9.8% done and going at 60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
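# to see just the progress numbers instead of the whole /proc/mdstat, a small awk filter works. A sketch, assuming the /proc/mdstat layout shown in this log (mdstat_progress is a hypothetical name)&lt;br /&gt;
```shell
# Hypothetical helper: reads /proc/mdstat-formatted text on stdin and prints
# "ARRAY PERCENT" for each array that has an active recovery or resync line.
mdstat_progress() {
  awk '
    /^md/ { dev = $1 }               # remember the current array name
    /(recovery|resync) = / {         # progress lines only, not resync=DELAYED
      if (match($0, /[0-9.]+%/))
        print dev, substr($0, RSTART, RLENGTH)
    }'
}

# usage on the server:
#   cat /proc/mdstat | mdstat_progress
```
# alternatively, mdadm --wait /dev/md2 blocks until that array&#039;s recovery has finished&lt;br /&gt;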
# I gave the grub install a double-tap now that it&#039;s synced with the first disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups say they&#039;re 9% uploaded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to just kill the backup script and manually upload the file without the bwlimit, so it&#039;ll go out faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# while the old disk says it&#039;s reached 100% of its lifecycle, the new disk says it&#039;s at – uhh – 96% of its life? That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Shame. I was hoping for at least something &amp;lt;50%. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
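# for reference: on these drives the normalized VALUE of Percent_Lifetime_Remain (000 on sda, 004 on sdb) appears to be life remaining, while the RAW_VALUE (100 and 96) appears to be life used. That reading is an assumption, since SMART attributes are vendor-specific. A sketch of a hypothetical life_remaining helper that prints the normalized value&lt;br /&gt;
```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# normalized VALUE (4th column) of Percent_Lifetime_Remain as a plain number.
# Assumption: on this vendor's drives that column is percent of life remaining.
life_remaining() {
  awk '$2 == "Percent_Lifetime_Remain" { print $4 + 0 }'
}

# usage on the server (as root):
#   smartctl -A /dev/sda | life_remaining
#   smartctl -A /dev/sdb | life_remaining
```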
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive said it had &amp;quot;Power_On_Hours&amp;quot; of 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive says Power_On_Hours = 52083/24/365 = 5.95 years. Well that&#039;s better, I guess.&lt;br /&gt;
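# the same conversion can be done straight from the smartctl output. A sketch with a hypothetical power_on_years helper, assuming (as in the output above) that the raw Power_On_Hours value is the last column&lt;br /&gt;
```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and converts the
# raw Power_On_Hours value (last column) into years of operation.
power_on_years() {
  awk '$2 == "Power_On_Hours" { printf "%.2f\n", $NF / 24 / 365 }'
}

# usage on the server (as root):
#   smartctl -A /dev/sda | power_on_years   # 78516 hours
#   smartctl -A /dev/sdb | power_on_years   # 52083 hours
```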
# oh wow, the power cycle count is crazy low; our old disk was only power-cycled 50 times, and the new one only 33 times.&lt;br /&gt;
# also the SMART data for both of these drives has different keys (not just values). apparently it&#039;s very vendor-specific, so some of these comparisons are apples-to-oranges&lt;br /&gt;
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check-in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked-off a marketing campaign (for the apprenticeship), which now links to information on a broken website – where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged into the wiki, the cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the front pages of all the websites, and the apprenticeship info page that he recently published &amp;amp; promoted, are still online when he sees that error – for users who are *not* logged into the site&lt;br /&gt;
# but if the backend site is broken for &amp;gt;24 hours, then the cache will cache the errors (not the content)&lt;br /&gt;
# as a short-term hack, I recommended that we setup a daily reboot of hetzner2 at 10:40 (a good buffer after the backups finish uploading)&lt;br /&gt;
# I asked Marcin if he&#039;d like me to set up a daily reboot at 10:40 UTC&lt;br /&gt;
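# for reference, the proposed daily reboot could be a single system cron entry. A sketch only: the file name is hypothetical, and it assumes the server clock is UTC (the date output in this log prints UTC)&lt;br /&gt;
```crontab
# /etc/cron.d/daily_reboot  (hypothetical file name)
# minute hour day-of-month month day-of-week user command
40 10 * * * root /sbin/shutdown -r now
```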
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# I checked-in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it looks like I started it just after 13:32 and it finished just before 15:20. So it took just under 2 hours. Great!&lt;br /&gt;
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the price is listed as &amp;quot;at cost&amp;quot; and it says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine. That&#039;s compared to the 78,516 hours of /dev/sda and 52,083 hours of our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
# but it&#039;s a bit concerning that it says it might not be in-stock. I&#039;m going to message them and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin replied, confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to set up the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
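# the UTC conversions in the emails above can be sanity-checked with a few lines of Python. This is a minimal sketch that hard-codes the UTC offsets in effect in late April 2025 (CEST = UTC+2 in Germany, UTC-5 in Ecuador, and CDT = UTC-5 at FeF, assuming FeF is on US Central time) instead of looking them up in a tz database&lt;br /&gt;
```python
from datetime import datetime, timedelta, timezone

# Hard-coded UTC offsets for late April 2025 (an assumption of this sketch):
# Germany is on CEST, Ecuador has no DST, FeF is assumed to be on US CDT.
SITE_OFFSETS = {
    "Germany": timedelta(hours=2),
    "Ecuador": timedelta(hours=-5),
    "FeF": timedelta(hours=-5),
}


def local_times(utc_dt):
    """Convert an aware UTC datetime into HH:MM wall-clock time at each site."""
    return {
        site: utc_dt.astimezone(timezone(offset)).strftime("%H:%M")
        for site, offset in SITE_OFFSETS.items()
    }


# proposed disk-replacement window, then the proposed daily reboot time
print(local_times(datetime(2025, 4, 30, 11, 0, tzinfo=timezone.utc)))
print(local_times(datetime(2025, 4, 24, 10, 40, tzinfo=timezone.utc)))
```
## both results match the emails: 13:00 / 06:00 / 06:00 for the maintenance window, and 05:40 at FeF for the reboot. For anything recurring, looking the zones up with zoneinfo would be safer than hard-coded offsets, which shift with DST&lt;br /&gt;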
# we don&#039;t have ansible for hetzner2; I did this manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
# tomorrow morning I should check on the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
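# a minimal Python sketch of the &amp;quot;did it reboot around 10:40 UTC?&amp;quot; check: the boot time is just the current time minus the seconds-since-boot in /proc/uptime. It assumes hetzner2&#039;s system timezone is UTC, because cron interprets 40 10 * * * in the server&#039;s local time; the 10-minute tolerance is an arbitrary choice&lt;br /&gt;
```python
from datetime import datetime, timedelta, timezone


def read_uptime_seconds(path="/proc/uptime"):
    """The first field of /proc/uptime is seconds since boot (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])


def boot_time(now, uptime_seconds):
    """Boot time = now minus seconds since boot."""
    return now - timedelta(seconds=uptime_seconds)


def rebooted_as_scheduled(boot, hour=10, minute=40, tolerance=timedelta(minutes=10)):
    """True if `boot` is within `tolerance` after hour:minute (UTC) on the same day."""
    scheduled = boot.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return boot >= scheduled and scheduled + tolerance >= boot


# example with made-up numbers: checked at 11:00 UTC, up for 18 minutes
now = datetime(2025, 4, 25, 11, 0, tzinfo=timezone.utc)
boot = boot_time(now, 18 * 60)
print(boot.isoformat(), rebooted_as_scheduled(boot))
```
## on the server itself, now would be datetime.now(timezone.utc) and the uptime would come from read_uptime_seconds()&lt;br /&gt;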
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add the --verbose flag, but that&#039;s not especially helpful, either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, though, the new IPv4 address is listed in `ip a`&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
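# RTNETLINK answers: File exists is what ip addr add reports when the address is already assigned to the interface, and the ip a output above shows 144.76.164.201 was already there when ifup tried to add it again. A minimal Python sketch (hypothetical helpers, not part of ifupdown) that pulls the IPv4 addresses out of ip a output, so an add could be skipped when the address is already present&lt;br /&gt;
```python
def ipv4_addresses(ip_a_output):
    """Return the inet (IPv4) CIDR strings found in `ip a` output."""
    found = []
    for line in ip_a_output.splitlines():
        fields = line.split()
        # IPv4 lines look like: "inet 144.76.164.201/27 brd ... scope global enp0s31f6"
        if len(fields) >= 2 and fields[0] == "inet":
            found.append(fields[1])
    return found


def already_assigned(ip_a_output, address):
    """True if `address` (with or without a /prefix) is already on an interface."""
    wanted = address.split("/")[0]
    return any(cidr.split("/")[0] == wanted for cidr in ipv4_addresses(ip_a_output))


# abridged from the ip a output above
sample = (
    "    inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6\n"
    "    inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6\n"
    "    inet6 fe80::921b:eff:fec4:28b4/64 scope link\n"
)
print(ipv4_addresses(sample))
print(already_assigned(sample, "144.76.164.201/27"))
```
## already_assigned() returning True is exactly the case where a repeated ip addr add for that address fails with File exists&lt;br /&gt;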
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came-up, it lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, but I have errors again&lt;br /&gt;
# curiously, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet isn&#039;t very helpful because it seems the damn format has changed so many times over the years; lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find Hetzner-specific docs on adding a second IP; they&#039;re totally different from what I&#039;ve read elsewhere https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
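# for reference, the subnet arithmetic behind this config: netmask 255.255.255.224 is a /27, so the primary address 144.76.164.201 lives in 144.76.164.192/27 (broadcast .223, matching the brd shown by ip a), and both the gateway .193 and the new .195 fall inside that same block. The working config adds .195 as a /32 instead, which covers just that one address. A minimal check with Python&#039;s standard ipaddress module&lt;br /&gt;
```python
import ipaddress

# values copied from /etc/network/interfaces on hetzner3
primary = ipaddress.ip_interface("144.76.164.201/255.255.255.224")
gateway = ipaddress.ip_address("144.76.164.193")
extra = ipaddress.ip_interface("144.76.164.195/32")

subnet = primary.network
print(subnet, subnet.broadcast_address, subnet.num_addresses)
print(gateway in subnet, extra.ip in subnet)
print(extra.network, extra.network.num_addresses)
```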
# the system came-up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I used netcat to test it. Most ports are closed, and nginx is already listening on all IPs on most of the open ones, except 4443, so I tested on that port&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
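# the netcat test above binds the listener to one specific local address (nc -s 144.76.164.195 -l -p 4443), so a message arriving proves that traffic to that IP reaches this host. A minimal Python sketch of the same listen-on-one-IP idea using the standard socket module; loopback_check() is a hypothetical helper that exercises it on 127.0.0.1 with an OS-assigned port, since the sketch can&#039;t assume the server&#039;s IPs exist wherever it runs&lt;br /&gt;
```python
import socket


def make_listener(host, port):
    """Bind a TCP listener to one specific local address, like nc -s HOST -l -p PORT."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def accept_once(srv):
    """Accept a single client and return everything it sends until it disconnects."""
    conn, _peer = srv.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def loopback_check(message):
    """Send `message` to a listener on 127.0.0.1 and return what the listener received."""
    srv = make_listener("127.0.0.1", 0)  # port 0 lets the OS pick a free port
    port = srv.getsockname()[1]
    try:
        # the kernel completes the handshake via the listen backlog,
        # so the client can connect and send before accept() is called
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
        return accept_once(srv)
    finally:
        srv.close()


print(loopback_check(b"hello from the laptop\n"))
```
## on hetzner3, the equivalent of the nc command would be accept_once(make_listener(&amp;quot;144.76.164.195&amp;quot;, 4443))&lt;br /&gt;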
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the agreed date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess if the disk had eventually failed badly enough to break RAID replication, we would have gotten alerts. But it would be better to get alerts *before* that happens.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a SMART plugin for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# crap, I just checked hetzner3&#039;s munin, and I realized that varnish is missing :(&lt;br /&gt;
# it looks like ansible *has* pushed-out the script and plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I diffed the varnish5_ script on my server against the one on ose&#039;s hetzner3 server, and found 2 extra lines at the top of the hetzner3 copy&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;ansible managed&amp;quot; comment comes before the shebang; since the first line isn&#039;t a shebang, the plugin gets executed by /bin/sh instead of perl&lt;br /&gt;
# we can see the result of all these syntax errors with a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
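# the kernel only hands a script to its interpreter when the file starts with the two bytes #!, so a header that a template puts above the shebang makes the file fall back to being run by /bin/sh (which is where the =head1: not found errors above come from). A minimal Python sketch (hypothetical helpers, not part of munin or ansible) that flags plugins whose first bytes aren&#039;t #! and shows how moving the shebang back to the top fixes a file like this one&lt;br /&gt;
```python
from pathlib import Path


def has_shebang_first(data):
    """True if the file content (bytes) starts with the #! magic."""
    return data[:2] == b"#!"


def move_shebang_to_top(text):
    """Move the first #! line above any header lines that were placed before it."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("#!"):
            return line + "".join(lines[:i] + lines[i + 1:])
    return text  # no shebang anywhere; leave the file unchanged


def plugins_missing_shebang(plugin_dir="/usr/share/munin/plugins"):
    """List plugin files in plugin_dir whose first two bytes are not #!."""
    bad = []
    for path in sorted(Path(plugin_dir).iterdir()):
        if path.is_file():
            with open(path, "rb") as f:
                if not has_shebang_first(f.read(2)):
                    bad.append(path.name)
    return bad


broken = "# Ansible managed\n\n#!/usr/bin/perl\n# -*- perl -*-\n"
fixed = move_shebang_to_top(broken)
print(has_shebang_first(broken.encode()), has_shebang_first(fixed.encode()))
print(fixed.splitlines()[:2])
```
## pointing plugins_missing_shebang() at a server&#039;s plugin directory would list any other plugin files with the same problem&lt;br /&gt;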
# I also pushed-out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that we need at least this section in the munin plugin config, so I added it to hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, if I restart the service first and then run it, it – uhh – kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it exits without an error, but the only value is U (munin&#039;s &amp;quot;unknown&amp;quot;). No further stats. Huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical smart charts would be great, what I really want to achieve is some email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I replaced the smartd config file with a single DEVICESCAN line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but that didn&#039;t work; no email came when I restarted the service (even if I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I checked the postfix logs: one message was accepted by google (status=sent), but the other was deferred (dsn=4.3.0, &amp;quot;bounce or trace service failure&amp;quot;)?!?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
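# When tailing a busy mail log, it can help to tally delivery outcomes instead of eyeballing them. Below is a minimal sketch, assuming postfix lines in the format shown above. The summarize_postfix_status function name is hypothetical. It pulls the status= field from each line on stdin and counts each value.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: count postfix delivery outcomes (sent, deferred,
# bounced, ...) from log lines read on stdin.
# grep -o keeps only the matched "status=word" part of each line;
# uniq -c pads its counts, so awk reprints them as "status count".
summarize_postfix_status() {
    grep -o 'status=[a-z]*' | cut -d= -f2 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

# usage: journalctl -u postfix@- --since today | summarize_postfix_status&lt;br /&gt;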
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the google groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see if SMART info was now being charted&lt;br /&gt;
## on hetzner2, there are no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty; each has only 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; on all time ranges. This is expected, since the plugin can&#039;t parse the nvme smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess no munin logs on SMART for this dying server&lt;br /&gt;
# I also confirmed that varnish logs are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty on my screen session. It really is angry..&lt;br /&gt;
# but, alas, no email was sent – even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kinda a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of send attempts. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and gets added to the queue, the re-sends happen without any batching..&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
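# The arithmetic above generalizes: with a fixed pause of T seconds between messages, N recipients take N * T / 3600 hours. Below is a small sketch using a hypothetical send_hours function. It ignores the time each send itself takes, so real campaigns will run somewhat longer.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: estimated hours to send a campaign to $1 recipients
# with a pause of $2 seconds between messages (send time itself ignored).
send_hours() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n * t / 3600 }'
}

send_hours 14111 1   # today's campaign at 1 msg/sec -> 3.92
send_hours 15000 1   # a 15,000-person list -> 4.17
```

# One caveat, assuming postfix applies default_destination_rate_delay per recipient domain: once the 2s delay configured further down in this log is in place, a single large domain such as gmail.com is capped at one delivery every 2 seconds. So send_hours 14111 2 (about 7.84 hours) is a rough upper bound if nearly every subscriber were on one domain.&lt;br /&gt;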
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of security for ssh (to force key-based logins), but that kinda breaks sudo, which requires the user&#039;s password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to set a user password (while keeping password logins disabled in ssh) than to set sudoers to NOPASSWD&lt;br /&gt;
# disabled passwords are set with the &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
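# Instead of eyeballing tail output, it may be easier to list which accounts have a locked password field, meaning one that begins with &#039;!&#039;. Below is a minimal sketch with a hypothetical list_locked_accounts function. It reads shadow-format lines; on the live system you would run it as root against /etc/shadow.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: print login names whose password field (the 2nd
# colon-separated field, see shadow(5)) starts with '!', i.e. locked.
# Reads the file given as $1, defaulting to /etc/shadow.
list_locked_accounts() {
    awk -F: '$2 ~ /^!/ { print $1 }' "${1:-/etc/shadow}"
}
```

# note: deleting the &#039;!&#039; leaves the field empty, which per shadow(5) may mean no password is required for that login. The user should therefore set a password with passwd right away. vipw -s is the locking-aware way to hand-edit /etc/shadow.&lt;br /&gt;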
# I just manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) are MBR partition scheme (not GPT) – indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
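# To script this check on both servers, note that the fdisk on hetzner2 prints &amp;quot;Disk label type&amp;quot;, while newer util-linux releases print &amp;quot;Disklabel type&amp;quot;. Below is a minimal sketch of a hypothetical label_type helper that accepts either spelling from fdisk -l output on stdin.&lt;br /&gt;

```shell
#!/bin/sh
# Hypothetical helper: read fdisk -l output on stdin and print the
# partition table type ("dos" means MBR, "gpt" means GPT).
# The optional space matches both "Disk label type: dos" (older
# util-linux, as on hetzner2) and "Disklabel type: dos" (newer util-linux).
label_type() {
    sed -n 's/^Disk \{0,1\}label type: *//p'
}
```

# usage: fdisk -l /dev/sda | label_type&lt;br /&gt;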
# A quick spot-check shows that our backups usually finish at 09:55 – one time as late as 10:07. That&#039;s UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God that&#039;s early, but better to do this early in Germany time..&lt;br /&gt;
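# Rather than converting time zones by hand, GNU date can do it. Below is a minimal sketch, assuming GNU coreutils date and an installed tz database. The zone names are my assumptions about the relevant locations: Europe/Berlin for the server, America/Guayaquil for Ecuador, and America/Chicago for FeF in Missouri.&lt;br /&gt;

```shell
#!/bin/sh
# Print the proposed maintenance window (2025-04-24 10:00 UTC) in each
# assumed time zone. Requires GNU date (-d) and tzdata.
when='2025-04-24 10:00 UTC'
for zone in UTC Europe/Berlin America/Guayaquil America/Chicago; do
    printf '%-18s %s\n' "$zone" "$(TZ="$zone" date -d "$when" '+%a %H:%M %Z')"
done
```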
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the current up-to-date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and I am just putting out the announcement and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switch to more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the MariaDB corruption&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so we have one more recovery mode we can try (3) before the levels become potentially destructive (4 and up)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests this infinite loop is caused by the default innodb_purge_threads=1, and that we should set it to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery level back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that once the restart is no longer blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
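# (a sketch, not what was actually run) assuming line 1 of my.cnf is the [mysqld] header, as the earlier 1a2,5 diff suggests, the two changes above would leave a fragment like the one the hypothetical write_recovery_example helper writes below. The equally hypothetical show_recovery_settings just greps a my.cnf-style file so the active recovery level can be double-checked before each restart&lt;br /&gt;
```shell
# Sketch only: write the assumed [mysqld] lines to a scratch file ($1), not the live my.cnf
write_recovery_example() {
  printf '%s\n' '[mysqld]' \
    '# attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html' \
    'innodb_force_recovery = 2' \
    'innodb_purge_threads = 0' > "$1"
}

# Hypothetical helper: print the recovery-related InnoDB settings found in a my.cnf-style file ($1)
show_recovery_settings() {
  grep -E '^ *innodb_(force_recovery|purge_threads) *=' "$1"
}

# Usage: write the example to /tmp and list which recovery settings it contains
write_recovery_example /tmp/my.cnf.recovery-example
show_recovery_settings /tmp/my.cnf.recovery-example
```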
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the logs are being spammed with these last 5 lines; I guess something is still trying to write to the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
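# (a sketch, not what was actually run) grep -vi OK may also hide the status : OK line that mysqlcheck prints after a warning or note for the same table, so this output alone does not show whether those tables passed the check. Assuming mysqlcheck prints a clean table as a single db.table OK line, and any other table as its bare name followed by type : message lines, the hypothetical flagged_tables helper below keeps each flagged table together with all of its message lines&lt;br /&gt;
```shell
# Hypothetical helper: from a saved mysqlcheck log ($1), drop the one-line "db.table   OK"
# entries and print every other table followed by its indented message lines
# (note/warning/error/status), so a trailing "status : OK" is not lost.
flagged_tables() {
  awk '
    /^[^ ]+ +OK$/    { next }                    # clean table: name and OK on one line
    /^[A-Za-z]+ *: / { print "    " $0; next }   # message line for the current table
                     { print }                   # bare table name that starts a block
  ' "$1"
}
```
# e.g. flagged_tables mysqlcheck.20250418.log&lt;br /&gt;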
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
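# (a sketch, not what was actually run) restoring only these three databases means pulling their sections out of a full backup. Assuming the daily backup is a plain mysqldump --all-databases file, which opens each database with a -- Current Database: comment line, the hypothetical extract_db helper below prints the section for one database. It skips the global SET statements at the top of the dump, so those may need to be copied over by hand&lt;br /&gt;
```shell
# Hypothetical helper: print the section of an --all-databases dump that belongs to
# database $1. Reads file $2, or stdin when $2 is omitted (so it can follow zcat).
extract_db() {
  awk -v db="$1" '
    /^-- Current Database: / { keep = ($0 == "-- Current Database: `" db "`") }
    keep
  ' "${2:--}"
}

# e.g. (the backup path is a placeholder):
# zcat /path/to/daily-backup.sql.gz | extract_db osemain_db | gzip -c > osemain_db.only.sql.gz
```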
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
The good news is that the wiki isn&#039;t in that list, and that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable amount of data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
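# (a sketch, not what was actually run) before dropping anything, it may be worth confirming this dump is intact. Assuming mysqldump ran with its default comments enabled, the last line of a complete dump starts with -- Dump completed, so the hypothetical dump_looks_complete helper below tests the gzip stream and then checks for that trailer&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if the gzipped dump ($1) passes gzip -t and
# its last line starts with the mysqldump "-- Dump completed" trailer.
dump_looks_complete() {
  gzip -t "$1" || return 1
  gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed'
}

# e.g. dump_looks_complete mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz || echo "dump looks incomplete"
```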
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the logs look different this time: InnoDB hits an assertion failure in trx0purge.c and mysqld dies with signal 6&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time at level 1. It did start, but this crash-and-restart loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, even though it *says* it&#039;s started&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
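# a note on that filter: `grep -vi OK` hides every line that contains the letters &amp;quot;ok&amp;quot; anywhere, so an error message containing a word like &amp;quot;broken&amp;quot; would be silently dropped too. Below is a stricter sketch (the function name is my own). It assumes mysqlcheck&#039;s usual output of one &amp;quot;db.table ... OK&amp;quot; line per healthy table, and it drops only those lines:&lt;br /&gt;

```shell
# Print every line of a mysqlcheck log that is NOT a healthy "schema.table   OK" line.
# Assumption: a healthy table is reported on one line: its name, whitespace, then OK.
# Anything else (error/warning/note lines, or a table name on its own line) is printed.
mysqlcheck_problems() {
  grep -Ev '^[^[:space:]]+[[:space:]]+OK$' "$1"
}
```

# usage, e.g. `mysqlcheck_problems mysqlcheck.20250418_201559.log`; no output means every table reported OK&lt;br /&gt;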
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
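# for the record, here is a sketch of what the CHECK TABLE / REPAIR TABLE route could look like. As far as I know, CHECK TABLE works on InnoDB, MyISAM and Aria tables, but REPAIR TABLE is only supported by MyISAM, Aria, ARCHIVE and CSV. For corrupt InnoDB tables, a dump and reload is still the way out. The helper below is hypothetical. It only prints SQL and runs nothing:&lt;br /&gt;

```shell
# Hypothetical helper: print CHECK TABLE for one table, plus REPAIR TABLE when the
# table's engine is one that supports REPAIR TABLE (MyISAM, Aria, ARCHIVE, CSV).
# Args: database, table, engine (as reported in information_schema.TABLES.ENGINE).
table_check_sql() {
  db=$1
  table=$2
  engine=$3
  printf 'CHECK TABLE `%s`.`%s`;\n' "$db" "$table"
  case $engine in
    MyISAM|Aria|ARCHIVE|CSV)
      printf 'REPAIR TABLE `%s`.`%s`;\n' "$db" "$table"
      ;;
  esac
}
```

# the db/table/engine triples could come from `mysql -N -e &amp;quot;SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES&amp;quot;` piped into a `while read db table engine` loop&lt;br /&gt;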
# well, fuck, now I don&#039;t know why it won&#039;t start. And it doesn&#039;t tell me why. The good news is that I was able to get a db dump. Maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
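# DROP DATABASE takes only one name per statement, which caused the syntax error above. Typing each name by hand is also how `obi_stabing_db` slipped in. Below is a small sketch (the function name is mine) that prints one statement per name and skips the system schemas. It executes nothing, so you can review the output before piping it into mysql:&lt;br /&gt;

```shell
# Print one DROP DATABASE statement per database name given as an argument,
# skipping the system schemas. Nothing is executed; review the output first.
drop_statements() {
  for db in "$@"; do
    case $db in
      information_schema|mysql|performance_schema) continue ;;
    esac
    # backticks quote each identifier
    printf 'DROP DATABASE `%s`;\n' "$db"
  done
}
```

# e.g. `drop_statements $(mysql -uroot -p$mysqlPass -N -e &#039;SHOW DATABASES&#039;)`, then check the list before running it through mysql&lt;br /&gt;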
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our dump&lt;br /&gt;
# as I feared, information_schema and performance_schema are missing from it; only mysql is there&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
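# the same check as a reusable sketch (the function names are mine). It assumes a gzip&#039;d mysqldump whose CREATE DATABASE lines start at the beginning of a line, as above. If the dump was made with --all-databases, it is expected that information_schema and performance_schema are missing. As far as I know, mysqldump does not include those two schemas:&lt;br /&gt;

```shell
# Print the database names found in the CREATE DATABASE lines of a gzip'd mysqldump.
dump_databases() {
  gzip -dc "$1" | grep -E '^CREATE DATABASE ' | sed -E 's/^[^`]*`([^`]+)`.*/\1/'
}

# Print each database name given after the dump path that has no CREATE DATABASE line.
missing_from_dump() {
  dump=$1
  shift
  present=$(dump_databases "$dump")
  for db in "$@"; do
    printf '%s\n' "$present" | grep -qxF "$db" || echo "$db"
  done
}
```

# e.g. `missing_from_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz mysql information_schema performance_schema`&lt;br /&gt;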
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m just going to manually dump these three anyway. Or try to.&lt;br /&gt;
# well, I was able to back up one of the three (mysql)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# mysql looks good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
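# the two 4.0K files are the dumps that failed above. File size is only a rough signal. Below is a slightly stricter sketch (the function name is mine). It tests the gzip stream and checks for mysqldump&#039;s closing &amp;quot;-- Dump completed&amp;quot; line. It assumes the dumps were made with mysqldump&#039;s default comments, because --skip-comments would drop that trailer:&lt;br /&gt;

```shell
# Succeed only if a gzip'd mysqldump passes gzip's integrity test and its last
# line is the "-- Dump completed" trailer that mysqldump writes by default.
# Assumption: the dump was not made with --skip-comments.
dump_looks_complete() {
  gzip -t "$1" 2>/dev/null || return 1
  gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed'
}
```

# e.g. `dump_looks_complete mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz; echo $?`&lt;br /&gt;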
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that also wiped out our mysql root password, so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got in by starting mysqld with --skip-grant-tables, then reset the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
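# with mariadb.service stopped, start mysqld without privilege checks and without a TCP listener (local socket only)&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
# connect without a password, then run the SQL below at the MariaDB prompt&lt;br /&gt;
mysql -u root&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# back in the shell: find the backgrounded mysqld_safe job&lt;br /&gt;
jobs -l&lt;br /&gt;
# kill mysqld_safe&lt;br /&gt;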
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can see our three databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script, but let&#039;s just drop test manually and restore from backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest – including even our corrupt databases – and see if it works or breaks&lt;br /&gt;
# the import took about 11.5 minutes; afterwards /var/lib/mysql is ~6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# woah, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
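# one small safeguard worth considering before starting the web servers is to wait until MariaDB actually answers. Below is a sketch (the function name and defaults are mine) built on `mysqladmin ping`. According to the mysqladmin docs, `ping` exits 0 whenever the server is up, even if the connection is refused with access denied:&lt;br /&gt;

```shell
# Poll the server with "mysqladmin ping" until it answers or we give up.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 1).
wait_for_mariadb() {
  attempts=${1:-30}
  delay=${2:-1}
  n=0
  while [ "$n" -lt "$attempts" ]; do
    if mysqladmin ping >/dev/null 2>/dev/null; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}
```

# e.g. run `wait_for_mariadb 30 1` and start httpd, varnish and nginx only if it returns 0&lt;br /&gt;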
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`. That would be better than logging into the hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
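# two sudoers details seem worth noting here, as I understand the sudoers man page. First, a command listed without arguments may be run with *any* arguments. So this rule also allows `sudo reboot --force`, which skips the orderly stop of services like mariadb. Appending &amp;quot;&amp;quot; to the command allows it only with no arguments. Second, rules can live in their own file under /etc/sudoers.d, typically mode 0440. Check the file with `visudo -cf FILE`. Note that sudo skips files there whose names contain a dot or end in ~. Also, running `sudo -l -U marcin` as root lists what a user may run. Below is a hypothetical helper that prints such a no-arguments rule:&lt;br /&gt;

```shell
# Hypothetical helper: print a sudoers rule that lets USER run exactly one command
# as root without a password. The trailing "" restricts it to no arguments at all.
sudoers_single_command_rule() {
  user=$1
  cmd=$2
  # sudoers expects a fully-qualified path for the command
  case $cmd in
    /*) ;;
    *) echo "command must be an absolute path: $cmd" >/dev/stderr; return 1 ;;
  esac
  printf '%s ALL = NOPASSWD: %s ""\n' "$user" "$cmd"
}
```

# e.g. `sudoers_single_command_rule marcin /sbin/reboot` prints `marcin ALL = NOPASSWD: /sbin/reboot &amp;quot;&amp;quot;`, which could go into a file such as /etc/sudoers.d/marcin-reboot (a hypothetical name)&lt;br /&gt;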
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s Debian, but the sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
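# The exact rule added via visudo isn&#039;t captured in the transcript above. As a hedged sketch only, here is an equivalent rule built as a drop-in file, assuming reboot lives at /usr/sbin/reboot on both boxes (check with `command -v reboot` as root) and that /etc/sudoers has the stock includedir line for /etc/sudoers.d; the helper make_reboot_rule and the file name marcin-reboot are hypothetical:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sudoers rule that lets one user run one command as root.
# Assumption: reboot is /usr/sbin/reboot on the target host; pass another path if not.
make_reboot_rule() {
  local user="$1"
  local cmd="${2:-/usr/sbin/reboot}"
  # No arguments are listed after the command, so sudo allows it with any arguments;
  # appending "" to the rule would restrict it to no arguments at all.
  printf '%s ALL=(root) %s\n' "$user" "$cmd"
}

# Example install steps (run as root; left as comments so nothing changes by accident):
#   make_reboot_rule marcin > /tmp/marcin-reboot
#   visudo -c -f /tmp/marcin-reboot
#   install -m 0440 -o root -g root /tmp/marcin-reboot /etc/sudoers.d/marcin-reboot
```
## Design note: staging the rule in a temp file lets visudo -c catch syntax errors before sudo ever reads it, and the drop-in name has no dot in it because sudo skips sudoers.d files whose names contain one. The reboot: command not found line in the DispVM test is expected: /usr/sbin isn&#039;t in a regular user&#039;s PATH on Debian, while sudo resolves commands through the secure_path set in the stock sudoers.&lt;br /&gt;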
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead (redo) logs. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the abrupt shutdown caused corruption, and mariadb wasn&#039;t able to recover from it due to a bug. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end-up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is re-building. But, of course, there&#039;s risk in not doing it..&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub boots from, so I can replace that one second.&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
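# To pull just the short serials in one go (handy when telling hetzner which physical disk to pull), a small sketch that parses the same udevadm property output; the function names serial_from_props and list_serials are hypothetical:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read `udevadm info --query=property` output on stdin
# and print only the value of ID_SERIAL_SHORT.
serial_from_props() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Hypothetical wrapper: print "disk serial" for each named disk (sda and sdb by default).
# Needs udevadm, so run it on the server itself.
list_serials() {
  local disk
  if [ "$#" -eq 0 ]; then
    set -- sda sdb
  fi
  for disk in "$@"; do
    printf '%s %s\n' "$disk" "$(udevadm info --query=property --name "$disk" | serial_from_props)"
  done
}
```
## Usage (as root): `list_serials sda sdb` should print one disk/serial pair per line, matching the ID_SERIAL_SHORT values above.&lt;br /&gt;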
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to setup grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says that if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX` (on CentOS 7 the same tool is named `grub2-install`)&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir is empty apart from the directory skeleton, so we&#039;re not booting via UEFI; I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
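# One way to double-check that: GRUB refers to an md array by its UUID written as 32 bare hex digits, while mdadm --detail prints the same UUID as four colon-separated groups of eight. A sketch that lists the mduuids in a grub.cfg in mdadm&#039;s format so the two can be compared (the helper name grub_mduuids is hypothetical):&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: list every distinct mduuid referenced in a grub.cfg,
# reformatted the way mdadm prints array UUIDs (four groups of 8 hex digits).
grub_mduuids() {
  local cfg="$1" ref hex
  grep -o "mduuid/[0-9a-f]*" "$cfg" | sort -u | while read -r ref; do
    hex="${ref#mduuid/}"
    echo "${hex:0:8}:${hex:8:8}:${hex:16:8}:${hex:24:8}"
  done
}
```
## Usage: compare `grub_mduuids /boot/grub2/grub.cfg` with `mdadm --detail /dev/md1 | grep UUID` (as root; md1 is /boot per the df output on this page). If they match, grub finds /boot by array rather than by physical disk.&lt;br /&gt;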
# right, so if I understand this correctly: we&#039;re not changing the grub config at all. grub.cfg lives on /boot (md1), which the RAID rebuild will mirror onto the new drive; &#039;grub-install&#039; just writes grub&#039;s boot code onto the new drive itself so that it can boot on its own. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to replace one drive before the other, I&#039;m going to have them replace /dev/sdb first, just because &#039;sda&#039; seems like it would be the primary. I know it&#039;s probably not, but anyway.&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
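# For the sdb swap itself, a hedged dry-run sketch of the usual post-swap steps, assuming MBR partitioning (the grub.cfg above loads part_msdos), that the old sdb members have already dropped out of the arrays, and that the new disk gets the same partition numbers as sda once its table is cloned. It only prints commands, mapping each partition to its array from /proc/mdstat instead of hard-coding numbers, and uses grub2-install because that is the CentOS 7 name for grub-install; the function name plan_rebuild is hypothetical:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical dry-run planner: prints (does not run) the commands to rebuild a
# RAID1 mirror onto a freshly swapped disk.
# Usage: plan_rebuild SURVIVOR NEW [MDSTAT_FILE]   e.g. plan_rebuild sda sdb
plan_rebuild() {
  local survivor="$1" new="$2" mdstat="${3:-/proc/mdstat}"
  # Clone the MBR partition table from the surviving disk.
  echo "sfdisk -d /dev/$survivor | sfdisk /dev/$new"
  # Array lines look like: md2 : active raid1 sda3[0] sdb3[1]
  awk -v s="$survivor" -v n="$new" '
    /^md[0-9]+ : / {
      for (i = NF; i >= 3; i--) {
        part = $i
        sub(/\[.*/, "", part)   # drop "[0]" and any "(F)" suffix
        if (index(part, s) == 1) {
          printf "mdadm --manage /dev/%s --add /dev/%s%s\n", $1, n, substr(part, length(s) + 1)
        }
      }
    }' "$mdstat"
  # Write GRUB boot code onto the new disk so it can boot without the survivor.
  echo "grub2-install /dev/$new"
  echo "watch cat /proc/mdstat"
}
```
## Usage: `plan_rebuild sda sdb` (as root, after hetzner has swapped the disk) prints the commands for review; the rebuild is done when every array in /proc/mdstat shows [UU] again.&lt;br /&gt;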
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL CentOS 7, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
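# When re-checking whether the wiki is back, only the status line of the last response matters. A small sketch that reduces curl&#039;s header output to that final code (the helper name final_status is hypothetical):&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read curl -sIL (or -iL) output on stdin and print the
# status code of the last response seen, i.e. the one after any redirects.
final_status() {
  awk '/^HTTP\// { code = $2 } END { print code }'
}
```
## Usage: `curl -sIL https://wiki.opensourceecology.org/ | final_status` prints the last status code in the redirect chain. curl can also report it on its own with `curl -s -o /dev/null -L -w &#039;%{http_code}&#039; https://wiki.opensourceecology.org/`; note that -I sends HEAD requests, which may not exercise exactly the same code path as a normal GET.&lt;br /&gt;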
# disk is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new entries in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing useful gets put in the journal, but a lot gets written to the actual log file that the journal mentions, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s what gets logged there when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
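# The most searchable part of that crash is the source file and line of the failed assertion. A sketch that pulls those out of the MariaDB log named by mysqld_safe above, deduplicated, so they can be pasted into a bug-tracker search (the helper name assertion_sites is hypothetical):&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file:line" for each distinct InnoDB assertion
# failure found in a MariaDB/MySQL error log.
assertion_sites() {
  local log="${1:-/var/log/mariadb/mariadb.log}"
  sed -n 's/.*Assertion failure in thread [0-9]* in file \([^ ]*\) line \([0-9]*\).*/\1:\2/p' "$log" | sort -u
}
```
## Usage: `assertion_sites` (as root) reads /var/log/mariadb/mariadb.log by default; for the crash above its output should include trx0purge.c:822.&lt;br /&gt;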
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7 (MySQL&#039;s numbering; MariaDB went from 5.5 straight to 10.0). We&#039;re using 5.5.68-MariaDB. hetzner3 runs a much newer version.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
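# Working out the backup ages from the timestamps in the file names, as a sketch assuming GNU date and that the file-name timestamps are UTC like the server clock in the date output above (the helper name backup_age_hours is hypothetical):&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: whole hours between a backup's embedded timestamp and a reference time.
# $1 = file name (or path) like daily_hetzner2_20250417_072001.tar.gpg
# $2 = reference time in any format GNU date -d understands (default: now)
backup_age_hours() {
  local stamp d t then_s now_s
  stamp="$(basename "$1")"
  stamp="${stamp#daily_hetzner2_}"   # 20250417_072001.tar.gpg
  stamp="${stamp%%.*}"               # 20250417_072001
  d="${stamp%_*}"                    # 20250417
  t="${stamp#*_}"                    # 072001
  then_s="$(date -u -d "${d:0:4}-${d:4:2}-${d:6:2} ${t:0:2}:${t:2:2}:${t:4:2}" +%s)"
  now_s="$(date -u -d "${2:-now}" +%s)"
  echo $(( (now_s - then_s) / 3600 ))
}
```
## With the reference time from the date output above, `backup_age_hours daily_hetzner2_20250417_072001.tar.gpg &#039;2025-04-17 20:36:56&#039;` gives 13, and the same call on the 20250416 file gives 37.&lt;br /&gt;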
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the not-corrupt tables and only restore the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner WUI) about once a week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
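# If we do go the recovery-mode route, a hedged sketch of a drop-in that sets innodb_force_recovery, assuming the CentOS 7 mariadb-server package reads /etc/my.cnf.d/*.cnf through the !includedir line in /etc/my.cnf (verify before relying on it). It refuses levels above 3 unless explicitly overridden, since those can corrupt data further; the helper write_force_recovery, the ALLOW_DANGEROUS variable and the zz-force-recovery.cnf name are all hypothetical:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: write a [mysqld] drop-in enabling innodb_force_recovery.
# $1 = level (1-6), $2 = destination file (the default path is an assumption).
# Levels above 3 are refused unless ALLOW_DANGEROUS=yes is set in the environment.
write_force_recovery() {
  local level="$1" dest="${2:-/etc/my.cnf.d/zz-force-recovery.cnf}"
  case "$level" in
    1|2|3) ;;
    4|5|6)
      if [ "${ALLOW_DANGEROUS:-no}" != yes ]; then
        echo "refusing innodb_force_recovery=$level without ALLOW_DANGEROUS=yes" >/dev/stderr
        return 1
      fi
      ;;
    *)
      echo "innodb_force_recovery must be 1-6, got: $level" >/dev/stderr
      return 2
      ;;
  esac
  printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level" > "$dest"
}
```
## Usage sketch: `write_force_recovery 3`, then `systemctl start mariadb`, then mysqldump --all-databases into /var/tmp/dbFail.20250417/, and delete the drop-in before any normal restart, since the server shouldn&#039;t be used for anything but dumping while it&#039;s set.&lt;br /&gt;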
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
A quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in MySQL 5.7. You&#039;re using MariaDB 5.5 and can&#039;t upgrade because your OS is EOL. hetzner3 is running a much newer version.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
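# Following that advice, here&#039;s a minimal sketch of the temporary setting. It assumes the stock CentOS 7 MariaDB layout, where /etc/my.cnf pulls in everything under /etc/my.cnf.d/. The drop-in file name is hypothetical. Start at 1, raise it one step at a time only if the database still won&#039;t start, and never go above 3 on this box:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# /etc/my.cnf.d/force-recovery.cnf (hypothetical name; delete it once the dump is done)&lt;br /&gt;
[mysqld]&lt;br /&gt;
# each level includes the behaviour of the levels below it:&lt;br /&gt;
#   1 = keep running even if corrupt pages are detected&lt;br /&gt;
#   2 = also don&#039;t run the master and purge threads&lt;br /&gt;
#   3 = also don&#039;t roll back transactions after crash recovery&lt;br /&gt;
# anything above 3 can corrupt the data further; only try that on a copy on another machine&lt;br /&gt;
innodb_force_recovery = 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Once it starts, the idea would be to dump everything with mysqldump --all-databases, stop the database, and delete the file so the next start is back at the default value of 0.&lt;br /&gt;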
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1 – apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP (at values of 3 or less). So the idea is to dump everything in recovery mode and drop what is corrupt, then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: only some data on the corrupt individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess that just means data that hasn&#039;t been written to disk yet. So I *think* it should be OK to trust data that only has corrupt pages?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot-swap while the system is on, or to shut it down first. I&#039;m going to choose shutdown. That&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a hardware raid. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other disk. But, christ, if both disks are fucked, then which disk should I ask them to replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm..it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
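# Rather than eyeballing the whole table, a small helper can pull out just the attributes that smartctl has flagged in the WHEN_FAILED column. This is a sketch with a hypothetical function name. It assumes the 10-column ATA attribute layout shown above, so it won&#039;t make sense for NVMe output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# failing_attrs (hypothetical helper): reads `smartctl -A` output on stdin and prints&lt;br /&gt;
# &amp;quot;ID NAME RAW_VALUE&amp;quot; for every attribute row whose WHEN_FAILED column isn&#039;t &amp;quot;-&amp;quot;&lt;br /&gt;
failing_attrs() {&lt;br /&gt;
	awk &#039;$1 ~ /^[0-9]+$/ { if ($9 != &amp;quot;-&amp;quot;) print $1, $2, $10 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: smartctl -A /dev/sda | failing_attrs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# On the output above, both disks would only print attribute 202 (Percent_Lifetime_Remain).&lt;br /&gt;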
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs table, nor in the source Wikipedia table https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to NVMe. That&#039;s relevant on hetzner3 but not hetzner2. Anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki right now with these docs)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 (Debian) looks very different from the output for the SATA SSDs on hetzner2 (CentOS)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows we&#039;re at 6% and 7% usage on hetzner3, whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
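# To compare both machines at a glance, here&#039;s a hypothetical helper that prints one wear number from either output format. It assumes that the raw value of attribute 202 on these Crucial SSDs is the percentage of rated life already used, as the numbers above suggest (normalized value 000, raw value 100). That is the same thing NVMe reports as &amp;quot;Percentage Used&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# wear_percent (hypothetical helper): reads `smartctl -A` output on stdin and prints&lt;br /&gt;
# the percentage of rated life used, from NVMe output or from ATA attribute 202&lt;br /&gt;
wear_percent() {&lt;br /&gt;
	awk &#039;&lt;br /&gt;
		/^Percentage Used:/ { gsub(/[^0-9]/, &amp;quot;&amp;quot;, $3); print $3; exit }&lt;br /&gt;
		$2 == &amp;quot;Percent_Lifetime_Remain&amp;quot; { print $10; exit }&lt;br /&gt;
	&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: smartctl -A /dev/nvme0n1 | wear_percent    # 6 on hetzner3, per the output above&lt;br /&gt;
# usage: smartctl -A /dev/sda | wear_percent        # 100 on hetzner2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;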
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# this indicates that our raid is fine. Two Us (eg `[UU]`) means both mirror members are up. Bad would be a U plus an underscore for the missing member (eg `[U_]`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
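# The [UU] check is easy to script. Here&#039;s a minimal sketch (the function name is hypothetical) that prints any md array whose status brackets contain an underscore, i.e. a missing mirror member:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# degraded_md (hypothetical helper): prints each md array with a missing member,&lt;br /&gt;
# e.g. md2 if its status line ends in [U_] instead of [UU]&lt;br /&gt;
degraded_md() {&lt;br /&gt;
	awk &#039;/^md/ { dev = $1 } $NF ~ /^\[[U_]+\]$/ { if ($NF ~ /_/) print dev }&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: degraded_md    (reads /proc/mdstat; no output means every array is [UU])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;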
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into the RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and ran a short SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, then we have the filesystem. It looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it probably should be happening on boot, but I couldn&#039;t find it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both filesystems (/ and /boot) report a clean state, though /boot hasn&#039;t been checked since it was created in 2016&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
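# for reference, here is a minimal sketch of the force-recovery workflow described in the MySQL docs linked above. It assumes the option lives in the [mysqld] section of my.cnf; the diff only shows that it was added after line 1. Start at 1 (SRV_FORCE_IGNORE_CORRUPT) and raise the value one step at a time. Level 2 is SRV_FORCE_NO_BACKGROUND, and the docs warn that values of 4 or greater can permanently corrupt data files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# only raise this if mariadb still won&#039;t start at the current level&lt;br /&gt;
innodb_force_recovery = 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# once mariadb is up, the idea is to dump everything first, then remove the option and restart normally. The dump path below is hypothetical&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump all databases while the server is running in recovery mode&lt;br /&gt;
mysqldump --all-databases &amp;gt; /var/tmp/all_databases.sql&lt;br /&gt;
&lt;br /&gt;
# delete the innodb_force_recovery line, then restart without it&lt;br /&gt;
vim /etc/my.cnf&lt;br /&gt;
systemctl restart mariadb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;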
# it didn&#039;t come-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2 (innodb_force_recovery = 2); this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time-out, but I&#039;m just going to leave it and come-back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged-into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc. across a ton of scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. When I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
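# note on the numbers: the 413 error above quotes a limit of 1000000 bytes, not 13107200. Per the ModSecurity v2 reference manual, &amp;quot;Request body no files data length is larger than the configured limit&amp;quot; is enforced by SecRequestBodyNoFilesLimit, which caps the request body excluding file uploads. A grep for &#039;BodyLimit&#039; won&#039;t match that directive. Here is a quick sketch that lists both directives and converts the two values to KiB&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# list every request-body limit directive in the httpd config&lt;br /&gt;
grep -riE &#039;SecRequestBody(NoFiles)?Limit&#039; /etc/httpd/conf.d/ /etc/httpd/modsecurity.d/&lt;br /&gt;
&lt;br /&gt;
# the limit from the 413 error, in KiB (integer math: 976 KiB, just under 1 MiB)&lt;br /&gt;
echo $(( 1000000 / 1024 ))&lt;br /&gt;
&lt;br /&gt;
# SecRequestBodyLimit, in KiB (12800 KiB = 12.5 MiB)&lt;br /&gt;
echo $(( 13107200 / 1024 ))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;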
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox so she can rebuild osemain using some new theme that&#039;s not broken on the latest version of wordpress, php, etc on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other less-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which one to CHG to copy-from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306064</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306064"/>
		<updated>2025-04-27T21:57:09Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Apr 24&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 24, 2025=&lt;br /&gt;
# it&#039;s 05:00; I tried to login to the wiki, but I got an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, under that it says I&#039;m already logged-in?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
You are already logged in as Maltfield. Use the form below to log in as another user. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, let&#039;s start the CHG to replace the failing disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
# I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
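# the pre-change RAID check can also be scripted. Here is a minimal sketch that relies on &#039;mdadm --detail --test&#039;, whose exit status is 0 only when the array is functioning normally. The array names are the three on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ok=1&lt;br /&gt;
for md in /dev/md0 /dev/md1 /dev/md2; do&lt;br /&gt;
  # a non-zero exit status means a failed member, an unusable array, or an error reading it&lt;br /&gt;
  if ! mdadm --detail --test &amp;quot;$md&amp;quot; &amp;gt;/dev/null; then&lt;br /&gt;
    echo &amp;quot;$md is degraded or failed&amp;quot;&lt;br /&gt;
    ok=0&lt;br /&gt;
  fi&lt;br /&gt;
done&lt;br /&gt;
[ &amp;quot;$ok&amp;quot; = 1 ] &amp;amp;&amp;amp; echo &amp;quot;all arrays healthy&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;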
# I tried to remove the first partition from the RAID, but it said I can&#039;t?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently the docs say that if the RAID is healthy, you have to force it with &#039;--fail&#039; https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
# crap, I realized I have an issue in my CHG (we need two sysadmins for peer review *sigh*)&lt;br /&gt;
## I listed this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## but it should be this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# anyway, it looks like I first need to execute this, to force the RAID into a failure state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I was able to remove it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
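# in hindsight, the fail-then-remove sequence above could be done in one pass. This minimal sketch assumes the same array-to-partition mapping as on hetzner2 (md0=sdb1, md1=sdb2, md2=sdb3). Marking each member faulty first is what lets the hot remove succeed on a healthy array&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for pair in md0:sdb1 md1:sdb2 md2:sdb3; do&lt;br /&gt;
  md=&amp;quot;${pair%%:*}&amp;quot;    # text before the colon, e.g. md0&lt;br /&gt;
  part=&amp;quot;${pair##*:}&amp;quot;  # text after the colon, e.g. sdb1&lt;br /&gt;
  mdadm --manage &amp;quot;/dev/$md&amp;quot; --fail &amp;quot;/dev/$part&amp;quot;&lt;br /&gt;
  mdadm --manage &amp;quot;/dev/$md&amp;quot; --remove &amp;quot;/dev/$part&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;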
# by 10:32 UTC, I submitted the request to hetzner to replace /dev/sdb = &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# it says they should do it within 2-4 hours&lt;br /&gt;
# meanwhile, I updated https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# at 08:00 my time, I checked and saw that we had received an email from hetzner at 06:36 (my time)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a=&lt;br /&gt;
 wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us agai=&lt;br /&gt;
n.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, crap. I tried to load the wiki CHG article, but there&#039;s an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry! This site is experiencing technical difficulties.&lt;br /&gt;
&lt;br /&gt;
Try waiting a few minutes and reloading.&lt;br /&gt;
&lt;br /&gt;
(Cannot access the database)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the server wasn&#039;t shut down, and my screen session is still intact, but dmesg is being flooded with RAID and I/O errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
[11136.011313] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.011372] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11136.319267] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11136.319322] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.827642] EXT4-fs error: 5 callbacks suppressed&lt;br /&gt;
[11138.827693] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
[11138.827793] EXT4-fs: 5 callbacks suppressed&lt;br /&gt;
[11138.827841] EXT4-fs (md2): previous I/O error to superblock detected&lt;br /&gt;
[11138.835255] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835311] md: super_written gets error=-5, uptodate=0&lt;br /&gt;
[11138.835367] Buffer I/O error on dev md2, logical block 0, lost sync page write&lt;br /&gt;
[11138.835472] EXT4-fs error (device md2): ext4_find_entry:1318: inode #6819864: comm postdrop: reading directory lblock 0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well anyway, I&#039;ll see if I can at least restart the RAID sync and install grub on the new disk&lt;br /&gt;
# son of a bitch, they removed the wrong drive!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 13:05:32 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT&lt;br /&gt;
sdb      8:16   0   477G  0 disk &lt;br /&gt;
sdc      8:32   0 232.9G  0 disk &lt;br /&gt;
├─sdc1   8:33   0    32G  0 part &lt;br /&gt;
├─sdc2   8:34   0   512M  0 part &lt;br /&gt;
└─sdc3   8:35   0 200.4G  0 part &lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
device node not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it shows a new drive (sdc) and an old drive (sdb)&lt;br /&gt;
# ugh, so now we have nothing in the raid?&lt;br /&gt;
# here&#039;s the new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdc | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
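# a hotswap can shift the kernel names (sda, sdb, sdc) around, so it&#039;s safer to map disks by serial before drawing conclusions. Here is a minimal sketch using the persistent /dev/disk/by-id/ symlinks (for SATA disks, the ata-* names embed the model and serial) and udevadm, as used above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# whole-disk links only; partition links end in -partN&lt;br /&gt;
ls -l /dev/disk/by-id/ | grep -v -- -part&lt;br /&gt;
&lt;br /&gt;
# or print the serial of every sd* disk currently present&lt;br /&gt;
for dev in /dev/sd?; do&lt;br /&gt;
  echo &amp;quot;$dev $(udevadm info --query=property --name &amp;quot;$dev&amp;quot; | grep ID_SERIAL=)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;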
# christ, so this new disk is half the size of our actual disk? what did they do?!?&lt;br /&gt;
# and now we have a prod server online with no redundancy. I can&#039;t tell them to put back-in the *correct* disk, or we&#039;ll have data loss&lt;br /&gt;
# I&#039;m going to stop all the web services before this disaster gets any worse&lt;br /&gt;
# great; io errors. this is a damn disaster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
Failed to stop apache2.service: Unit apache2.service not loaded.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to make partition backups anyway&lt;br /&gt;
# wait, actually, it said that /dev/sdc = Crucial_CT250MX200SSD1_154410FA336C. That&#039;s our old /dev/sda&lt;br /&gt;
# so they *did* remove the right drive, but the re-insertion of the wrong drive pushed /dev/sda to /dev/sdc. That kinda breaks our ability to map the RAID, but let&#039;s at-least partition this new drive&lt;br /&gt;
# but this new drive isn&#039;t the right size. it&#039;s 512G while our old disk was 250G. I guess it&#039;s better to have too-big of a disk than too-small of a disk, but we won&#039;t be able to use that extra disk space. I&#039;m going to assume that they just didn&#039;t have 250G disks in-stock anymore.&lt;br /&gt;
# anyway, I tried to back up the partitions, but that didn&#039;t work because the filesystem had gone read-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
mkdir: cannot create directory ‘/var/tmp/chg.20250424_132010’: Read-only file system&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
chown: cannot access ‘/var/tmp/chg.20250424_132010’: No such file or directory&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I don&#039;t know what to do besides giving it a reboot, but that scares me&lt;br /&gt;
# I&#039;d like to take a backup, but I can&#039;t if I get read-only errors :(&lt;br /&gt;
# well, I guess that&#039;s why we made a backup before this. I don&#039;t think I have any option other than to reboot. and pray that grub is intact to bring it back.&lt;br /&gt;
# I gave it a reboot. If it doesn&#039;t come back, I&#039;ll try to boot to the rescue CD from within the hetzner wui&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# date &amp;amp;&amp;amp; reboot&lt;br /&gt;
Thu Apr 24 13:24:18 UTC 2025&lt;br /&gt;
/usr/bin/pkttyagent: error while loading shared libraries: /lib64/libpolkit-agent-1.so.0: cannot read file data: Input/output error&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&lt;br /&gt;
Failed to start reboot.target: Unit is not loaded properly: Input/output error.&lt;br /&gt;
See system logs and &#039;systemctl status reboot.target&#039; for details.&lt;br /&gt;
&lt;br /&gt;
Broadcast message from maltfield@opensourceecology.org on pts/4 (Thu 2025-04-24 13:24:18 UTC):&lt;br /&gt;
&lt;br /&gt;
The system is going down for reboot NOW!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wtf, it&#039;s so broken that it can&#039;t even reboot.&lt;br /&gt;
# I triggered a reset from the hetzner wui&lt;br /&gt;
# the server came back, and I immediately shut down all services again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop apache2&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and triggered backups&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze &lt;br /&gt;
20 07 * * * root time /bin/nice /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# time /root/backups/backup.sh &amp;amp;&amp;gt;&amp;gt; /var/log/backups/backup.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, sdc is gone. We have sda and sdb again, and sda is our original sda, as we wanted (confirmed by the serial numbers below)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Micron_1100_MTFDDAK512TBN_171416BD4379&lt;br /&gt;
ID_SERIAL_SHORT=171416BD4379&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I made a backup of the partition tables; unsurprisingly, the sdb dump is empty, since the new disk has no partition table yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
[root@opensourceecology ~]# chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
[root@opensourceecology ~]# mkdir $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chown root:root $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 $chg_dir&lt;br /&gt;
[root@opensourceecology ~]# pushd $chg_dir&lt;br /&gt;
/var/tmp/chg.20250424_133230 ~&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# du -sh ${chg_dir}/*&lt;br /&gt;
4.0K    /var/tmp/chg.20250424_133230/sda_parttable_mbr.bak&lt;br /&gt;
0       /var/tmp/chg.20250424_133230/sdb_parttable_mbr.bak&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied the partition table from sda to sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
Checking that no-one is using this disk right now ...&lt;br /&gt;
OK&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 62260 cylinders, 255 heads, 63 sectors/track&lt;br /&gt;
sfdisk:  /dev/sdb: unrecognized partition table type&lt;br /&gt;
&lt;br /&gt;
Old situation:&lt;br /&gt;
sfdisk: No partitions found&lt;br /&gt;
&lt;br /&gt;
New situation:&lt;br /&gt;
Units: sectors of 512 bytes, counting from 0&lt;br /&gt;
&lt;br /&gt;
   Device Boot    Start       End   #sectors  Id  System&lt;br /&gt;
/dev/sdb1          2048  67110912   67108865  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2      67112960  68161536    1048577  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3      68163584 488395120  420231537  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb4             0         -          0   0  Empty&lt;br /&gt;
Warning: partition 1 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 2 does not end at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not start at a cylinder boundary&lt;br /&gt;
Warning: partition 3 does not end at a cylinder boundary&lt;br /&gt;
Warning: no primary partition is marked bootable (active)&lt;br /&gt;
This does not matter for LILO, but the DOS MBR will not boot this disk.&lt;br /&gt;
Successfully wrote the new partition table&lt;br /&gt;
&lt;br /&gt;
Re-reading the partition table ...&lt;br /&gt;
&lt;br /&gt;
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)&lt;br /&gt;
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1&lt;br /&gt;
(See fdisk(8).)&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good, other than the warning that no partition is marked bootable, so the DOS MBR won&#039;t boot this disk; I&#039;ll check later what LILO is and whether this matters for grub on the RAID&lt;br /&gt;
# I reloaded the partition table for this disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# blockdev --rereadpt /dev/sdb&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added the new disk&#039;s partitions to the RAID arrays, and mdstat shows that it&#039;s starting to sync now. Excellent&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm: added /dev/sdb1&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm: added /dev/sdb2&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
mdadm: added /dev/sdb3&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [&amp;gt;....................]  recovery =  0.0% (19712/33521664) finish=481.1min speed=1159K/sec&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
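# a sketch for spotting degraded arrays without reading all of /proc/mdstat: the last field of each array&#039;s blocks line is a member-status string like [UU] or [U_], where _ marks a missing or not-yet-synced member. The function name degraded_arrays is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# degraded_arrays [MDSTAT_FILE]
# Prints the name of every md array whose member-status field ([UU], [U_],
# ...) contains "_", i.e. at least one member is missing or still syncing.
# Prints nothing once every array is fully in sync.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    $NF ~ /^\[[U_]+\]$/ { if ($NF ~ /_/) print name }
  ' "${1:-/proc/mdstat}"
}
```
# usage: degraded_arrays (reads /proc/mdstat by default). Against the mdstat output above it would print md2, md0 and md1; once the sync finishes, it prints nothing&lt;br /&gt;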
# well, it looks like it&#039;s not syncing every array of the RAID at the same time: it&#039;s doing md0 now, and the others are marked resync=DELAYED until it&#039;s done&lt;br /&gt;
# md0 is partition 1 (sda1/sdb1). That&#039;s *sigh* swap. It&#039;s 32GB.&lt;br /&gt;
# I kinda wish we&#039;d synced /boot (md1) first. I don&#039;t think I can install grub until that&#039;s synced. Maybe?&lt;br /&gt;
# it says it&#039;s moving about 1 MB/s (1159K/sec). At that rate, 32G * 1024 = 32,768 MB takes 32,768 seconds / 60 = 546 minutes / 60 = ~9 hours. Just for swap!&lt;br /&gt;
# assuming we get the same speed for the rest of the disk, that&#039;s 250G * 1024 = 256,000 MB / 1 MB/s = 256,000 seconds / 60 = 4,266.67 minutes / 60 = 71.11 hours. I guess we just have to accept the risk and hope that the old /dev/sda with all our data doesn&#039;t fail within the next 3 days.&lt;br /&gt;
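# the estimate above as a tiny helper, assuming a constant sync speed just like the hand calculation: hours = size in G * 1024 / speed in MB/s / 3600. The function name sync_hours is hypothetical&lt;br /&gt;
```shell
#!/usr/bin/env bash
# sync_hours SIZE_GIB SPEED_MIB_PER_SEC
# Prints the estimated hours needed to copy SIZE_GIB at a constant
# SPEED_MIB_PER_SEC, rounded to two decimals.
sync_hours() {
  awk -v size="$1" -v speed="$2" 'BEGIN { printf "%.2f\n", size * 1024 / speed / 3600 }'
}
```
# usage: sync_hours 250 1 prints 71.11, and sync_hours 32 1 prints 9.10&lt;br /&gt;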
# I tried to go ahead and install grub on the new disk, but I got a &amp;quot;command not found&amp;quot; error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub-install /dev/sdb&lt;br /&gt;
-bash: grub-install: command not found&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub&lt;br /&gt;
grub2-bios-setup           grub2-glue-efi             grub2-mkconfig             grub2-mkpasswd-pbkdf2      grub2-probe                grub2-set-default&lt;br /&gt;
grub2-editenv              grub2-install              grub2-mkfont               grub2-mkrelpath            grub2-reboot               grub2-setpassword&lt;br /&gt;
grub2-file                 grub2-kbdcomp              grub2-mkimage              grub2-mkrescue             grub2-render-label         grub2-sparc64-setup&lt;br /&gt;
grub2-fstest               grub2-macbless             grub2-mklayout             grub2-mkstandalone         grub2-rpm-sort             grub2-syslinux2cfg&lt;br /&gt;
grub2-get-kernel-settings  grub2-menulst2cfg          grub2-mknetdir             grub2-ofpathname           grub2-script-check         grubby&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it should be &#039;grub2-install&#039;; I tried that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s two warnings but no errors; I&#039;ll take it.&lt;br /&gt;
# we&#039;re up to 12.4% on the RAID sync of swap. It&#039;s now going &amp;gt;50x faster than it was before; good news&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [==&amp;gt;..................]  recovery = 12.4% (4168832/33521664) finish=8.2min speed=59264K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# at that speed (59264K/sec, or ~58 MB/s), the whole 250G disk would take 250*1024/58 = 4,413.79 seconds / 60 = ~73.6 minutes. Oh, that&#039;s just over an hour.&lt;br /&gt;
# and now we&#039;re at 42.7%&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [========&amp;gt;............]  recovery = 42.7% (14334208/33521664) finish=6.6min speed=47845K/sec&lt;br /&gt;
      &lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
		resync=DELAYED&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology chg.20250424_133230]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# backups are still running; I&#039;ll let them finish before starting up the webservers again&lt;br /&gt;
# I wrote a status email to Marcin&lt;br /&gt;
# the backups still aren&#039;t finished&lt;br /&gt;
# I checked on the raid replication, and it shows md0 (swap) and md1 (boot) are both done. Horray! Now we just need to finish root (/), which is 9.8% done and going at 60 MB/s. Great!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:05:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [=&amp;gt;...................]  recovery =  9.8% (20767872/209984640) finish=50.5min speed=62429K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-ran grub2-install now that /boot (md1) is synced with the first disk; the output was the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# grub2-install /dev/sdb&lt;br /&gt;
Installing for i386-pc platform.&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
grub2-install: warning: Couldn&#039;t find physical volume `(null)&#039;. Some modules may be missing from core image..&lt;br /&gt;
Installation finished. No error reported.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
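# since grub2-install only reported warnings, one heuristic check that boot code actually landed on the new disk is to look for the string GRUB in its first sector (GRUB 2&#039;s MBR boot code embeds it). The function name has_grub_mbr is hypothetical, and a match only shows that a GRUB boot sector was written, not that the disk will boot on its own&lt;br /&gt;
```shell
#!/usr/bin/env bash
# has_grub_mbr DEVICE_OR_IMAGE
# Heuristic: GRUB 2's MBR boot code contains the string "GRUB", so search
# the first 512 bytes (the MBR) for it. Returns 0 if found, 1 otherwise.
# Reading a raw disk device requires root.
has_grub_mbr() {
  head -c 512 "$1" | grep -aq 'GRUB'
}
```
# usage (as root): has_grub_mbr /dev/sdb; echo $?&lt;br /&gt;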
# the output of lsblk looks much nicer now, too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# lsblk&lt;br /&gt;
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
sda       8:0    0 232.9G  0 disk  &lt;br /&gt;
├─sda1    8:1    0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sda2    8:2    0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sda3    8:3    0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
sdb       8:16   0   477G  0 disk  &lt;br /&gt;
├─sdb1    8:17   0    32G  0 part  &lt;br /&gt;
│ └─md0   9:0    0    32G  0 raid1 [SWAP]&lt;br /&gt;
├─sdb2    8:18   0   512M  0 part  &lt;br /&gt;
│ └─md1   9:1    0 511.4M  0 raid1 /boot&lt;br /&gt;
└─sdb3    8:19   0 200.4G  0 part  &lt;br /&gt;
  └─md2   9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backup log says the upload is only about 10% done&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/backups/backup.log&lt;br /&gt;
...&lt;br /&gt;
2025/04/24 14:13:48 INFO  :&lt;br /&gt;
Transferred:        2.210G / 20.472 GBytes, 11%, 2.904 MBytes/s, ETA 1h47m20s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:      13m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg: 10% /20.472G, 2.997M/s, 1h43m59s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I decided to kill the backup script and upload the file manually without the bwlimit, so it&#039;ll go out faster&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# /bin/sudo -u b2user /bin/rclone -v copy /home/b2user/sync/daily_hetzner2_20250424_133017.tar.gpg b2:ose-server-backups&lt;br /&gt;
2025/04/24 14:15:20 INFO  :&lt;br /&gt;
Transferred:      116.500M / 20.472 GBytes, 1%, 1.958 MBytes/s, ETA 2h57m25s&lt;br /&gt;
Transferred:            0 / 1, 0%&lt;br /&gt;
Elapsed time:       1m0.5s&lt;br /&gt;
Transferring:&lt;br /&gt;
 *        daily_hetzner2_20250424_133017.tar.gpg:  0% /20.472G, 5.065M/s, 1h8m35s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# meanwhile we&#039;re at 24% on the RAID sync&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 14:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [====&amp;gt;................]  recovery = 23.9% (50200448/209984640) finish=101.1min speed=26325K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, important to note: our new disk doesn&#039;t say that it&#039;s failing :D&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
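# to check both disks in one go, a sketch that reduces smartctl -H output to just PASSED or FAILED. The function name smart_health is hypothetical, and it only parses the overall-health line shown above&lt;br /&gt;
```shell
#!/usr/bin/env bash
# smart_health
# Reads `smartctl -H` output on stdin and prints PASSED or FAILED,
# or UNKNOWN if the overall-health line is missing.
smart_health() {
  awk -F': ' '
    /overall-health self-assessment test result/ { r = $2; gsub(/[^A-Z]/, "", r) }
    END { print (r == "" ? "UNKNOWN" : r) }
  '
}
```
# usage: for d in /dev/sda /dev/sdb; do echo &amp;quot;$d $(smartctl -H $d | smart_health)&amp;quot;; done (for the output above: /dev/sda FAILED, /dev/sdb PASSED)&lt;br /&gt;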
# while the old disk says it&#039;s reached 100% of its lifecycle, the new disk says it&#039;s at, uhh, 96% of its life? (attribute 202 Percent_Lifetime_Remain has a normalized VALUE of 004 and a RAW_VALUE of 96.) That doesn&#039;t sound very good :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
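# for these Micron/Crucial drives, the useful lifetime number is the normalized VALUE of attribute 202 Percent_Lifetime_Remain (000 on sda, 004 on sdb), while its RAW_VALUE (100 and 96) appears to be the percentage used. A sketch that pulls the name, VALUE and RAW_VALUE of one attribute out of smartctl -A output; the function name smart_attr is hypothetical, and it assumes the column layout shown above&lt;br /&gt;
```shell
#!/usr/bin/env bash
# smart_attr ID
# Reads `smartctl -A` output on stdin and prints "NAME VALUE RAW_VALUE"
# for the attribute whose ID# column equals ID. Assumes the table layout:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_attr() {
  awk -v id="$1" '$1 == id { print $2, $4, $10 }'
}
```
# usage: smartctl -A /dev/sdb | smart_attr 202 (for the output above, this prints Percent_Lifetime_Remain 004 96)&lt;br /&gt;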
# Shame. I was hoping for at least something &amp;lt;50%. Well, I wonder how long that remaining 4% will last us :/&lt;br /&gt;
# ok, backups just finished; let&#039;s start the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl start mariadb&lt;br /&gt;
[root@opensourceecology ~]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the wiki CHG with a status https://wiki.opensourceecology.org/wiki/Category:CHGs&lt;br /&gt;
# And I sent an email to Marcin recommending that he replace /dev/sda with an actual new drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&lt;br /&gt;
Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&lt;br /&gt;
I was a bit disappointed to learn that hetzner replaced a disk with 0% &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for choosing the free disk replacement..&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&lt;br /&gt;
Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on replacing that one next week too, but I would recommend that you pay for a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&lt;br /&gt;
Do you authorize me selecting €41.18 for the replacement of /dev/sda on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# from the output above, our old drive said it had &amp;quot;Power_On_Hours&amp;quot; of 78516/24/365 = 8.96 years&lt;br /&gt;
# and our new drive says Power_On_Hours = 52083/24/365 = 5.95 years. Well that&#039;s better, I guess.&lt;br /&gt;
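# the same hours-to-years conversion (raw value / 24 / 365), read straight from smartctl -A output. The function name power_on_years is hypothetical, and it assumes RAW_VALUE is the 10th column, as in the tables above&lt;br /&gt;
```shell
#!/usr/bin/env bash
# power_on_years
# Reads `smartctl -A` output on stdin and prints the Power_On_Hours raw
# value (10th column) converted to years, rounded to two decimals.
power_on_years() {
  awk '$2 == "Power_On_Hours" { printf "%.2f\n", $10 / 24 / 365 }'
}
```
# usage: smartctl -A /dev/sda | power_on_years (prints 8.96 for the old disk&#039;s 78516 hours)&lt;br /&gt;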
# oh wow, the power cycle counts are surprisingly low: our old disk was only power-cycled 50 times, and the new one only 33 times.&lt;br /&gt;
# also, SMART data is very vendor-specific (these two drives even list their attributes in a different order), so some of these comparisons may be apples-to-oranges&lt;br /&gt;
# right, we&#039;re at 69.7% replication on root. I&#039;m going to go make breakfast and check in again after&lt;br /&gt;
# ...&lt;br /&gt;
# over lunch, I realized that Marcin&#039;s last email was possibly hyperbolic panic&lt;br /&gt;
# he&#039;s worried that he just kicked off a marketing campaign (for the apprenticeship) that links to information on a broken website, where potential applicants can&#039;t read the info&lt;br /&gt;
# but I think the content actually *is* accessible, just not to Marcin&lt;br /&gt;
# when you&#039;re logged into the wiki, the cookies bypass the cache. So, regrettably, when hetzner2&#039;s backend is offline, Marcin sees an error&lt;br /&gt;
# but I&#039;d bet that the frontpage of all the websites and the apprenticeship info page that he&#039;s recently published &amp;amp; promoted are still online for users who are *not* logged into the site, even when he sees that error&lt;br /&gt;
# but if the backend is broken for &amp;gt;24 hours, then the cached pages expire, and the cache ends up caching the errors (not the content)&lt;br /&gt;
# as a short-term hack, I asked Marcin if he&#039;d like me to set up a daily reboot of hetzner2 at 10:40 UTC (a good buffer after the backups finish uploading)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&lt;br /&gt;
Of course I agree it&#039;s not good, and we should migrate away from hetzner2 asap. And I do wish I had more bandwidth to finish the migration faster for you.&lt;br /&gt;
&lt;br /&gt;
But you have a varnish cache that caches pages for 24 hours. Even if your backend webserver and database are down, popular pages (like the frontpage of your wiki or a recent article that you&#039;ve recently promoted) should still load for users.&lt;br /&gt;
&lt;br /&gt;
The big issue isn&#039;t marketing and read-only content. The big issue is editing. That&#039;s what is breaking.&lt;br /&gt;
&lt;br /&gt;
When you&#039;re logged into the wiki, it bypasses the varnish cache. So, even if the wiki appears down to you, the contents of (most) articles viewed in the past 24 hours will be still visible to potential apprenticeship applicants.&lt;br /&gt;
&lt;br /&gt;
The next time you see the websites are down, try loading it from another device where you&#039;re not logged-in. You&#039;ll probably see that the apprenticeship info is still accessible, even though the backend for the site is down.&lt;br /&gt;
&lt;br /&gt;
As a short-term hack, I recommend setting-up a daily reboot of the server. Backups typically finish before 10:10 UTC. I recommend we add a cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&lt;br /&gt;
The server seems to function for some time after a fresh reboot, and it caches pages for 24 hours. So the first time someone loads a page in the wiki after that reboot, it&#039;ll be cached for the entire time that the server is online until its next reboot. I think this will ensure higher availability of your read-only content (eg information about the apprenticeship).&lt;br /&gt;
&lt;br /&gt;
Would you like me to setup a daily reboot at 10:40 UTC on hetzner2? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
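# if Marcin approves, the daily reboot could be a single /etc/cron.d entry in the same format as backup_to_backblaze above. This is only a sketch: the file name daily_reboot is hypothetical, and it assumes the server clock is in UTC (as the timestamps in this log suggest)&lt;br /&gt;
```
# /etc/cron.d/daily_reboot (hypothetical file name)
# minute hour day-of-month month day-of-week user command
40 10 * * * root /sbin/shutdown -r now
```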
# ...&lt;br /&gt;
# I checked in on the RAID replication status; it&#039;s finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
	  [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
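# The mdstat snapshots above are 5 minutes apart, so they were probably collected with a simple polling loop. Below is a minimal sketch of such a loop. It uses a hypothetical `raid_degraded_count` helper that counts arrays whose status field (e.g. `[U_]`) still contains an underscore, meaning a missing or rebuilding member. Both function names and the 300-second interval are my assumptions, not what was actually run.&lt;br /&gt;
```shell
# Hypothetical helper: read /proc/mdstat-formatted text (file $1, or stdin)
# and print how many md arrays are degraded. A healthy 2-disk mirror shows
# [UU]; an underscore (e.g. [U_]) marks a member that is missing or rebuilding.
raid_degraded_count() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps status 0
  grep -cE '\[U*_[U_]*\]' "${1:--}" || true
}

# Hypothetical polling loop: print a timestamped snapshot every 5 minutes
# until no array is degraded any more (stop it early with Ctrl-C).
watch_raid_rebuild() {
  while true; do
    date
    cat /proc/mdstat
    if [ "$(raid_degraded_count /proc/mdstat)" = "0" ]; then
      break
    fi
    sleep 300
  done
}
```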
# so it looks like I started it just after 13:32 and it finished just before 15:20. So it took just under 2 hours. Great!&lt;br /&gt;
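# For the record, here is the arithmetic behind &amp;quot;just under 2 hours&amp;quot; as a small sketch using GNU `date`. The start and end times are the approximate ones noted above (13:32 and 15:20 UTC), so the average rate is only a rough estimate. The 209984640 figure is the size of md2 in 1 KiB blocks, from /proc/mdstat. `elapsed_minutes` is a hypothetical helper.&lt;br /&gt;
```shell
# Hypothetical helper: whole minutes between two timestamps, both parsed as
# UTC. Requires GNU date for the -d option.
elapsed_minutes() {
  local start end
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( (end - start) / 60 ))
}

minutes=$(elapsed_minutes "2025-04-24 13:32" "2025-04-24 15:20")
echo "rebuild took ~${minutes} minutes"                   # 108 minutes = 1h48m
# md2 is 209984640 1-KiB blocks; integer average rebuild rate in KiB/s
echo "average ~$(( 209984640 / (minutes * 60) )) KiB/s"   # ~32405 KiB/s
```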
# I updated the article with status updates, marking the CHG as completed successfully https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb#2025-04-24_16:18_UTC&lt;br /&gt;
# And I sent an email to Marcin &amp;amp; Catarana to let them know it was successful, and asked again about buying a new drive for replacing /dev/sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: your new (used) disk is now fully synced with the old (failing) disk.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-04-24_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
According to SMART data, you now have one failing disk and one not-failing disk.&lt;br /&gt;
&lt;br /&gt;
Your hetzner2 RAID is now healthy, and you have redundancy spread across two mirrored disks again.&lt;br /&gt;
&lt;br /&gt;
Next week I&#039;d like to replace the other failing disk. Please let me know if you approve the purchase of a new disk for its replacement. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Marcin got back to me, approving the purchase of the new disk; I updated the ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
# Note that the price is listed as &amp;quot;at cost&amp;quot; and it says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 1,000 hours is fine. That&#039;s compared to the 78,516 hours of /dev/sda and 52,083 hours of our &amp;quot;new&amp;quot; /dev/sdb&lt;br /&gt;
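# To put those numbers in perspective, the raw Power_On_Hours values can be pulled from `smartctl -A` and converted to years. This is a minimal sketch. It assumes ATA-style attribute output where RAW_VALUE is the 10th column; some drives put extra text in that column, so treat the parse as best-effort. The helper names are hypothetical.&lt;br /&gt;
```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# Power_On_Hours value (10th column, RAW_VALUE, of the ATA attribute table).
power_on_hours() {
  awk '$2 == "Power_On_Hours" { print $10 }'
}

# Hypothetical helper: convert hours of operation to years (8760 h per year).
hours_to_years() {
  awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 8760 }'
}

for h in 78516 52083 1000; do
  echo "$h hours = $(hours_to_years "$h") years"
done
# Prints (one per line): 9.0 years, 5.9 years, 0.1 years

# On the server (as root), e.g.:
#   smartctl -A /dev/sda | power_on_hours
```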
# but it&#039;s a bit concerning that it says it might not be in stock. I&#039;m going to message Hetzner support and ask if they can set one aside for us for next week&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Support,&lt;br /&gt;
&lt;br /&gt;
Can you set-aside a replacement disk for this server?&lt;br /&gt;
&lt;br /&gt;
Our disks&#039; SMART logs indicated that both disks should be replaced. Today we replaced one of the two disks, but the disk that you replaced it with has 4% of its life left, according to SMART data (it has 52,083 hours of operation).&lt;br /&gt;
&lt;br /&gt;
Next week we would like to replace the other disk, and this time we&#039;d like your &amp;quot;at cost&amp;quot; option, to get a disk with &amp;lt;1,000 hours of operation.&lt;br /&gt;
&lt;br /&gt;
But I was a bit concerned when I read this next to the WUI option for &amp;quot;at cost&amp;quot; on your website&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
&lt;br /&gt;
Specifically what worries me is the &amp;quot;may not be in stock&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Can you please tell us if you have stock now? And if you do, can you please reserve one disk for us for next week?&lt;br /&gt;
&lt;br /&gt;
We don&#039;t want to remove a disk from our RAID and plan for downtime, only to discover that you don&#039;t have a disk available for us..&lt;br /&gt;
&lt;br /&gt;
Please let us know if you can reserve 1 disk for us for next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I asked Marcin if Wed next week at 11:00 UTC is ok for replacing hetzner2&#039;s sda&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
   * 13:00 in Germany (where the server lives)&lt;br /&gt;
   * 06:00 here in Ecuador, and&lt;br /&gt;
   * 06:00 at FeF&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime,&lt;br /&gt;
please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
agreeable to you, and if you have any questions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
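# A quick way to double-check the proposed time in each location is GNU `date` with the TZ variable. This sketch uses POSIX TZ strings so it works even without tzdata installed. The strings assume FeF follows US Central time and that Ecuador is on UTC-5 with no DST. With tzdata, the names America/Chicago, America/Guayaquil and Europe/Berlin should give the same results. `local_time` is a hypothetical helper.&lt;br /&gt;
```shell
# Hypothetical helper: print the UTC timestamp $2 as HH:MM in time zone $1.
# Requires GNU date (-d).
local_time() {
  TZ="$1" date -d "$2 UTC" +%H:%M
}

when="2025-04-30 11:00"
echo "Germany: $(local_time 'CET-1CEST,M3.5.0,M10.5.0/3' "$when")"   # 13:00
echo "Ecuador: $(local_time 'ECT5' "$when")"                          # 06:00
echo "FeF:     $(local_time 'CST6CDT,M3.2.0,M11.1.0' "$when")"        # 06:00
# The proposed daily reboot time (10:40 UTC), in FeF local time
echo "Reboot:  $(local_time 'CST6CDT,M3.2.0,M11.1.0' '2025-04-30 10:40')"  # 05:40
```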
# Marcin replied, confirming the time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin got back to me and told me to setup the daily reboot cron on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, please set up reboot. That is decent for now&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 11:08 AM Michael Altfield &amp;lt;REDACTED@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I don&#039;t think the situation is as bad as you think.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;  &amp;gt; We are missing opportunity,&lt;br /&gt;
&amp;gt;  &amp;gt; the announcement is posted, and our servers are down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Of course I agree it&#039;s not good, and we should migrate away from&lt;br /&gt;
&amp;gt; hetzner2 asap. And I do wish I had more bandwidth to finish the&lt;br /&gt;
&amp;gt; migration faster for you.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; But you have a varnish cache that caches pages for 24 hours. Even if&lt;br /&gt;
&amp;gt; your backend webserver and database are down, popular pages (like the&lt;br /&gt;
&amp;gt; frontpage of your wiki or a recent article that you&#039;ve recently&lt;br /&gt;
&amp;gt; promoted) should still load for users.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The big issue isn&#039;t marketing and read-only content. The big issue is&lt;br /&gt;
&amp;gt; editing. That&#039;s what is breaking.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When you&#039;re logged into the wiki, it bypasses the varnish cache. So,&lt;br /&gt;
&amp;gt; even if the wiki appears down to you, the contents of (most) articles&lt;br /&gt;
&amp;gt; viewed in the past 24 hours will be still visible to potential&lt;br /&gt;
&amp;gt; apprenticeship applicants.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The next time you see the websites are down, try loading it from another&lt;br /&gt;
&amp;gt; device where you&#039;re not logged-in. You&#039;ll probably see that the&lt;br /&gt;
&amp;gt; apprenticeship info is still accessible, even though the backend for the&lt;br /&gt;
&amp;gt; site is down.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; As a short-term hack, I recommend setting-up a daily reboot of the&lt;br /&gt;
&amp;gt; server. Backups typically finish before 10:10 UTC. I recommend we add a&lt;br /&gt;
&amp;gt; cron to hetzner2 to reboot itself every day at 10:40 UTC = 05:40 FeF time.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The server seems to function for some time after a fresh reboot, and it&lt;br /&gt;
&amp;gt; caches pages for 24 hours. So the first time someone loads a page in the&lt;br /&gt;
&amp;gt; wiki after that reboot, it&#039;ll be cached for the entire time that the&lt;br /&gt;
&amp;gt; server is online until its next reboot. I think this will ensure higher&lt;br /&gt;
&amp;gt; availability of your read-only content (eg information about the&lt;br /&gt;
&amp;gt; apprenticeship).&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you like me to setup a daily reboot at 10:40 UTC on hetzner2?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we don&#039;t have ansible for hetzner2; I did this manually&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# pwd&lt;br /&gt;
/etc/cron.d&lt;br /&gt;
[root@opensourceecology cron.d]# ls -lah&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x.   2 root root 4.0K Apr 24 17:56 .&lt;br /&gt;
drwxr-xr-x. 105 root root  12K Apr 18 21:52 ..&lt;br /&gt;
-rw-r--r--    1 root root  128 May 16  2023 0hourly&lt;br /&gt;
-rw-r--r--    1 root root 1.3K Apr  9  2019 awstats_generate_static_files&lt;br /&gt;
-rw-r--r--    1 root root  151 Apr 24 17:52 backup_to_backblaze&lt;br /&gt;
-rw-r--r--    1 root root   78 May 31  2024 cacti&lt;br /&gt;
-rw-r--r--    1 root root  125 Dec 11 00:16 letsencrypt&lt;br /&gt;
-rw-r--r--    1 root root  506 Mar 18  2019 phplist&lt;br /&gt;
-rw-r--r--    1 root root  108 Jan  7  2022 raid-check&lt;br /&gt;
-rw-r--r--    1 root root  118 Apr 24 17:56 reboot&lt;br /&gt;
-rw-------    1 root root  235 Dec 15  2022 sysstat&lt;br /&gt;
[root@opensourceecology cron.d]# cat reboot &lt;br /&gt;
# 2025-04-24: temp hack for unstable hetzner2 while we build-out hetzner3 to replace it&lt;br /&gt;
40 10 * * * root /sbin/reboot&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
# tomorrow morning I should check on the uptime and journalctl to make sure it rebooted sometime around 10:40 UTC&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
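# Note that times in /etc/cron.d are interpreted in the server&#039;s local time zone. &amp;quot;40 10&amp;quot; only means 10:40 UTC if hetzner2&#039;s clock is set to UTC (`date +%Z` should print UTC). For tomorrow&#039;s check, here is a minimal sketch with a hypothetical `booted_near` helper. It tests whether the last boot time (e.g. from `uptime -s`) falls within 10 minutes of the scheduled reboot, assuming both timestamps are in UTC and GNU date is available. `journalctl --list-boots` is another way to see boot times.&lt;br /&gt;
```shell
# Hypothetical helper: succeed if timestamp $1 is within $3 seconds (default
# 600) of timestamp $2. Both are parsed as UTC by GNU date.
booted_near() {
  local boot target diff
  boot=$(date -u -d "$1" +%s)
  target=$(date -u -d "$2" +%s)
  diff=$(( boot - target ))
  # ${diff#-} drops a leading minus sign, giving the absolute difference
  [ "${diff#-}" -le "${3:-600}" ]
}

# Example on hetzner2 (assumes the system clock is UTC):
#   if booted_near "$(uptime -s)" "$(date -u +%F) 10:40"; then
#     echo "rebooted on schedule"
#   else
#     echo "no reboot near 10:40 UTC today"
#   fi
```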
# ...&lt;br /&gt;
# ok, back to hetzner3: we bought a second IPv4 address for the static sites, but the server&#039;s networking was never set up for it; let&#039;s add that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # cp interfaces interfaces.20250424&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that failed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I restored the backup file, and it still failed. The journal and status aren&#039;t helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl status networking&lt;br /&gt;
× networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2025-04-24 17:18:55 UTC; 52s ago&lt;br /&gt;
   Duration: 2month 1w 20h 39min 50.765s&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 3259336 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)&lt;br /&gt;
	Process: 3259371 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 3259336 (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 29ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
root@hetzner3 ~ # journalctl -u networking | tail&lt;br /&gt;
Apr 24 17:16:36 hetzner3 ifup[3258504]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:16:36 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259347]: RTNETLINK answers: File exists&lt;br /&gt;
Apr 24 17:18:55 hetzner3 ifup[3259336]: ifup: failed to bring up enp0s31f6&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: networking.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Apr 24 17:18:55 hetzner3 systemd[1]: Failed to start networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I run the ExecStart command manually, I can add a verbose flag. But that&#039;s not especially helpful, either&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ifup --verbose -a --read-environment&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
&lt;br /&gt;
ifup: configuring interface enp0s31f6=enp0s31f6 (inet)&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-pre-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-pre-up.d/ethtool&lt;br /&gt;
ip addr add 144.76.164.201/255.255.255.224 broadcast 144.76.164.223       dev enp0s31f6 label enp0s31f6&lt;br /&gt;
RTNETLINK answers: File exists&lt;br /&gt;
ifup: failed to bring up enp0s31f6&lt;br /&gt;
run-parts --exit-on-error --verbose /etc/network/if-up.d&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/000resolvconf&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/ethtool&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/postfix&lt;br /&gt;
run-parts: executing /etc/network/if-up.d/resolved&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, though, the new IPv4 address is listed in `ip a`&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
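# My reading of the verbose output above: ifup runs `ip addr add 144.76.164.201/255.255.255.224 ...`, and &amp;quot;RTNETLINK answers: File exists&amp;quot; is the kernel refusing to add something (an address or a route) that is already present. That matches the `ip a` output, which already shows the addresses. To check for this before restarting networking, here is a minimal sketch with a hypothetical `has_ipv4` helper. It parses `ip -o -4 addr show` output, which has one line per address with ADDRESS/PREFIX in the 4th field.&lt;br /&gt;
```shell
# Hypothetical helper: succeed if the IPv4 address $1 appears in
# `ip -o -4 addr show` output read from stdin (4th field is ADDRESS/PREFIX).
has_ipv4() {
  awk '{ split($4, a, "/"); print a[1] }' | grep -qxF "$1"
}

# Example on hetzner3:
#   for ip in 144.76.164.201 144.76.164.195; do
#     if ip -o -4 addr show dev enp0s31f6 | has_ipv4 "$ip"; then
#       echo "$ip is already assigned"
#     fi
#   done
```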
# I&#039;m just going to give this server a reboot before proceeding, to make sure the IP config is sticky&lt;br /&gt;
# when it came-up, it lost the new IP :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at least it&#039;s restarting now without errors; I can work with that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # systemctl restart networking&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/network # systemctlstatus networking&lt;br /&gt;
-bash: systemctlstatus: command not found&lt;br /&gt;
root@hetzner3 /etc/network # systemctl status networking&lt;br /&gt;
● networking.service - Raise network interfaces&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (exited) since Thu 2025-04-24 17:33:40 UTC; 15s ago&lt;br /&gt;
	   Docs: man:interfaces(5)&lt;br /&gt;
	Process: 8598 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)&lt;br /&gt;
	Process: 9022 ExecStart=/bin/sh -c if [ -f /run/network/restart-hotplug ]; then /sbin/ifup -a --read-environment --allow=hotplug; fi (code=exited, status=0/SUCCESS)&lt;br /&gt;
   Main PID: 9022 (code=exited, status=0/SUCCESS)&lt;br /&gt;
		CPU: 357ms&lt;br /&gt;
&lt;br /&gt;
Apr 24 17:33:34 hetzner3 systemd[1]: Starting networking.service - Raise network interfaces...&lt;br /&gt;
Apr 24 17:33:39 hetzner3 ifup[8663]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 ifup[8907]: Waiting for DAD... Done&lt;br /&gt;
Apr 24 17:33:40 hetzner3 systemd[1]: Finished networking.service - Raise network interfaces.&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# let&#039;s try to add it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces interfaces.20250424 &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # vim interfaces&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,23&lt;br /&gt;
&amp;gt; iface enp0s31f6 inet static&lt;br /&gt;
&amp;gt;   address 144.76.164.195&lt;br /&gt;
&amp;gt;   netmask 255.255.255.224&lt;br /&gt;
&amp;gt;   gateway 144.76.164.193&lt;br /&gt;
&amp;gt;   # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
&amp;gt;   #up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, but I have errors again&lt;br /&gt;
# curiously, it *did* add the new IP address; wtf&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
Job for networking.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status networking.service&amp;quot; and &amp;quot;journalctl -xeu networking.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/27 brd 144.76.164.223 scope global secondary enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
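# One plausible explanation, not confirmed here: the new stanza repeats the `gateway` line, so ifup tries to add a second IPv4 default route via 144.76.164.193. The kernel rejects that with &amp;quot;File exists&amp;quot;, and the service fails even though the address itself was added first. That would explain why the new IP showed up despite the error. Below is a minimal sketch of a hypothetical `dup_gateways` check that flags interfaces whose IPv4 (`inet`) stanzas declare more than one gateway. It only parses the given file and does not follow `source` lines.&lt;br /&gt;
```shell
# Hypothetical helper: print "IFACE inet" for every interface that has more
# than one IPv4 gateway line across its iface stanzas in an interfaces(5) file.
dup_gateways() {
  awk '
    $1 == "iface" { key = $2 " " $3; fam = $3 }   # e.g. "enp0s31f6 inet"
    $1 == "gateway" {
      if (fam == "inet") { n[key]++; if (n[key] == 2) print key }
    }
  ' "${1:-/etc/network/interfaces}"
}

# Example on hetzner3:
#   dup_gateways /etc/network/interfaces
```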
# the internet isn&#039;t very helpful because it seems the damn format has changed so many times over the years; lots of outdated info&lt;br /&gt;
# lots of people say they fixed this by deleting everything in interfaces.d/, but we don&#039;t have anything in that folder&lt;br /&gt;
# I did find Hetzner-specific docs on adding a second IP; their approach is totally different from what I&#039;ve read elsewhere https://docs.hetzner.com/robot/dedicated-server/network/net-config-debian-ubuntu&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
up ip addr add 10.4.2.1/32 dev eth0&lt;br /&gt;
down ip addr del 10.4.2.1/32 dev eth0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried this, and gave the server a reboot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # diff interfaces.20250424 interfaces&lt;br /&gt;
16a17,20&lt;br /&gt;
&amp;gt;   # 2025-04-24: add second IPv4 address&lt;br /&gt;
&amp;gt;   up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;   down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/network # cat interfaces&lt;br /&gt;
### Hetzner Online GmbH installimage&lt;br /&gt;
&lt;br /&gt;
source /etc/network/interfaces.d/*&lt;br /&gt;
&lt;br /&gt;
auto lo&lt;br /&gt;
iface lo inet loopback&lt;br /&gt;
iface lo inet6 loopback&lt;br /&gt;
&lt;br /&gt;
auto enp0s31f6&lt;br /&gt;
iface enp0s31f6 inet static&lt;br /&gt;
  address 144.76.164.201&lt;br /&gt;
  netmask 255.255.255.224&lt;br /&gt;
  gateway 144.76.164.193&lt;br /&gt;
  # route 144.76.164.192/27 via 144.76.164.193&lt;br /&gt;
  up route add -net 144.76.164.192 netmask 255.255.255.224 gw 144.76.164.193 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
  # 2025-04-24: add second IPv4 address&lt;br /&gt;
  up ip addr add 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
  down ip addr del 144.76.164.195/32 dev enp0s31f6&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::2&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
&lt;br /&gt;
iface enp0s31f6 inet6 static&lt;br /&gt;
  address 2a01:4f8:200:40d7::3&lt;br /&gt;
  netmask 64&lt;br /&gt;
  gateway fe80::1&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the system came-up with the IP I want. Cool!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/network # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 /etc/network # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I&#039;m able to restart the service without it yelling at me (or breaking the IP config)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # systemctl restart networking&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 ~ # ip a&lt;br /&gt;
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000&lt;br /&gt;
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00&lt;br /&gt;
	inet 127.0.0.1/8 scope host lo&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 ::1/128 scope host noprefixroute &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
2: enp0s31f6: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000&lt;br /&gt;
	link/ether 90:1b:0e:c4:28:b4 brd ff:ff:ff:ff:ff:ff&lt;br /&gt;
	inet 144.76.164.201/27 brd 144.76.164.223 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet 144.76.164.195/32 scope global enp0s31f6&lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::3/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 2a01:4f8:200:40d7::2/64 scope global &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
	inet6 fe80::921b:eff:fec4:28b4/64 scope link &lt;br /&gt;
	   valid_lft forever preferred_lft forever&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also able to ping the server on both IPs, which is a good sign&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ ping 144.76.164.201&lt;br /&gt;
PING 144.76.164.201 (144.76.164.201) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=1 ttl=50 time=490 ms&lt;br /&gt;
64 bytes from 144.76.164.201: icmp_seq=2 ttl=50 time=490 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.201 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1000ms&lt;br /&gt;
rtt min/avg/max/mdev = 489.558/489.676/489.795/0.118 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
user@disp9871:~$ ping 144.76.164.195&lt;br /&gt;
PING 144.76.164.195 (144.76.164.195) 56(84) bytes of data.&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=1 ttl=50 time=493 ms&lt;br /&gt;
64 bytes from 144.76.164.195: icmp_seq=2 ttl=50 time=512 ms&lt;br /&gt;
^C&lt;br /&gt;
--- 144.76.164.195 ping statistics ---&lt;br /&gt;
2 packets transmitted, 2 received, 0% packet loss, time 1001ms&lt;br /&gt;
rtt min/avg/max/mdev = 492.853/502.518/512.184/9.665 ms&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
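# To repeat this check for both addresses in one go, here is a minimal sketch. A hypothetical `ping_loss` helper extracts the packet-loss percentage from ping&#039;s summary line, in the &amp;quot;N packets transmitted, N received, X% packet loss&amp;quot; format shown above.&lt;br /&gt;
```shell
# Hypothetical helper: read ping output on stdin and print the packet loss
# percentage as a bare number (e.g. "0" for "0% packet loss").
ping_loss() {
  grep -o '[0-9.]*% packet loss' | cut -d% -f1
}

# Example from a client machine:
#   for ip in 144.76.164.201 144.76.164.195; do
#     echo "$ip: $(ping -c 2 "$ip" | ping_loss)% loss"
#   done
```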
# I used netcat to test it. Most ports are closed, and nginx is already listening on all IPs on most of the open ones, except 4443, so I used that port for the test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # nc -s 144.76.164.195 -l -p 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and this was how it looked on my laptop&#039;s side&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ nc 144.76.164.195 4443&lt;br /&gt;
I am typing this on my laptop computer&#039;s local terminal; it should show-up on the server&#039;s terminal&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the server&#039;s new IPv4 address is configured (and persistent between reboots)&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the defined date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I setup email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess if the disk eventually failed badly enough that RAID replication broke, we would have gotten alerts. But it would be good to get alerts *before* that happens&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a smart module for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# crap, I just checked hetzner3&#039;s munin, and I realized that the varnish graphs are missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the plugin script and the plugin symlinks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I diffed the varnish5_ script on my server against the one on ose&#039;s server, and I found 2 new lines at the top of the file on hetzner3&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;ansible managed&amp;quot; comment comes before the shebang, so the shebang is no longer on the first line and the plugin gets run by /bin/sh instead of perl&lt;br /&gt;
# we can see the resulting shell syntax errors in a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
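# a quick way to catch this class of mistake is to check that each managed plugin still starts with the two bytes &amp;quot;#!&amp;quot;. A minimal POSIX sh sketch; the function name has_shebang is hypothetical and not part of munin or ansible&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if the file's first two bytes are "#!".
# When line 1 is not a shebang, the file ends up being run by /bin/sh,
# which is what produced the "=head1: not found" errors from munin-run.
has_shebang() {
  [ "$(head -c 2 "$1")" = '#!' ]
}
```
# e.g. has_shebang /usr/share/munin/plugins/varnish5_ would have failed before the fix, because the file began with &amp;quot;# Ansible managed&amp;quot;&lt;br /&gt;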
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs page for the smart_ munin plugin says that we need at least this section in the munin-node plugin config, so I added it to hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
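# with more disks, the same symlinks could be created in a loop. A minimal sketch, assuming the plugin source and plugin directory paths shown above; the function name link_smart_plugins is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: create one smart_DISK symlink per disk name.
# Usage: link_smart_plugins PLUGIN_SRC PLUGIN_DIR DISK...
link_smart_plugins() {
  plugin_src=$1
  plugin_dir=$2
  shift 2
  for disk in "$@"; do
    # -f replaces an existing link, so re-running is harmless
    ln -sf "$plugin_src" "$plugin_dir/smart_$disk"
  done
}
```
# e.g. link_smart_plugins /usr/share/munin/plugins/smart_ /etc/munin/plugins sda sdb, followed by a munin-node restart so the new plugins get picked up&lt;br /&gt;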
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
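# note the smartctl_exit_status.value 8 above. Per smartctl(8), the exit status is a bitmask, and 8 is bit 3 (&amp;quot;SMART status check returned DISK FAILING&amp;quot;). That is consistent with the planned sdb replacement. A minimal sketch to decode it, assuming the plugin reports smartctl&#039;s raw exit code; the function name is hypothetical and the messages are paraphrased from the man page&lt;br /&gt;
```shell
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Assumes STATUS is smartctl's raw exit code; bit meanings paraphrased from smartctl(8).
decode_smartctl_status() {
  status=$1
  bit=0
  while [ "$bit" -le 7 ]; do
    if [ $(( (status >> bit) % 2 )) -eq 1 ]; then
      case $bit in
        0) echo "bit 0: command line did not parse" ;;
        1) echo "bit 1: device open failed" ;;
        2) echo "bit 2: a SMART or other command failed, or a checksum error" ;;
        3) echo "bit 3: SMART status check returned DISK FAILING" ;;
        4) echo "bit 4: prefail attributes at or below threshold" ;;
        5) echo "bit 5: some attributes were at or below threshold in the past" ;;
        6) echo "bit 6: device error log contains errors" ;;
        7) echo "bit 7: self-test log contains errors" ;;
      esac
    fi
    bit=$((bit + 1))
  done
}
```
# e.g. decode_smartctl_status 8 prints only the &amp;quot;bit 3&amp;quot; line&lt;br /&gt;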
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
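# the touch/chown/chmod/cp sequence above ensures the backup copy is readable only by root. Here is a shorter sketch of the same idea, assuming GNU coreutils. When cp creates a new file, it uses the source file&#039;s mode filtered by the umask, so a umask of 077 inside a subshell gives mode 0600 without affecting the rest of the session. The function name backup_conf is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: back up a config file so the copy is mode 0600.
# umask 077 only applies inside the ( ) subshell. cp uses it when it creates
# a new destination file; an already-existing destination keeps its mode.
backup_conf() {
  ( umask 077; cp "$1" "$2" )
}
```
# e.g. backup_conf zzz-myconf /var/tmp/munin-zzz-myconf.$(date +%Y%m%d)&lt;br /&gt;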
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but if I restart the service first and then run it, it (uhh) kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so now it exits without an error, but it only reports U (unknown) for smartctl_exit_status; no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical smart charts would be great, what I really want to achieve is some email alerts from SMART, like we setup for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I backed up and replaced the smartd config file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but that didn&#039;t work; no email came when I restarted the service (even when I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
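# so I checked the postfix logs, and it looks like our mail to google isn&#039;t going through?!? one message was accepted (status=sent), but another was deferred with dsn=4.3.0 (&amp;quot;bounce or trace service failure&amp;quot;)&lt;br /&gt;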
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed it to my personal email, restarted, and I got two emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the google groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see if SMART info was now being charted&lt;br /&gt;
## on hetzner2, there&#039;s no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there&#039;s two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty; it only has 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; for all time charts. This is expected, since it can&#039;t read the nvme smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess we get no munin SMART graphs for this dying server&lt;br /&gt;
# I also confirmed that the varnish graphs are now visible in munin on hetzner3&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
				   Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
																					 SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty; it got printed to every tty in my screen session. It really is angry.&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
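# to gauge how widespread the throttling is, one could count the log lines carrying the 4.7.28 rate-limit reply alongside the normal status= outcomes. A minimal sketch that reads postfix log lines on stdin; the function name is hypothetical, and the log source (e.g. /var/log/maillog on hetzner2) is an assumption&lt;br /&gt;
```shell
# Hypothetical helper: summarize postfix log lines read on stdin.
# "ratelimited N" counts lines carrying Gmail's 421 4.7.28 reply; every
# other output line is "STATUS N" for each status= value seen.
postfix_summary() {
  awk '
    /421[- ]4\.7\.28/ { rl++ }
    match($0, /status=[a-z]+/) { st[substr($0, RSTART + 7, RLENGTH - 7)]++ }
    END {
      print "ratelimited", rl + 0
      for (s in st) print s, st[s]
    }
  ' | sort
}
```
# e.g. postfix_summary /var/log/maillog&lt;br /&gt;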
# marcin sent an email campaign today with phpList. If that didn&#039;t make it out due to this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of send attempts. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and gets added to the queue, it&#039;ll be re-sent without batching.&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; entries, which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need just over 4 hours (15,000 / 3,600 ≈ 4.2) to send a campaign. That sounds reasonable.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
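# the arithmetic behind the throttle choice can be sketched as follows. With MAILQUEUE_THROTTLE set to N seconds, a campaign to R recipients takes roughly R*N seconds. This assumes the throttle is the only limit (it ignores batch settings and postfix delays), and the function name is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: rough campaign duration, assuming phpList pauses
# THROTTLE seconds between messages and nothing else slows it down.
# Prints whole minutes, rounded up.
campaign_minutes() {
  recipients=$1
  throttle_s=$2
  echo $(( (recipients * throttle_s + 59) / 60 ))
}
```
# e.g. campaign_minutes 14111 1 prints 236 (about 3 hours 56 minutes) for the 14,111-recipient campaign sent today&lt;br /&gt;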
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
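# as I understand the postfix docs, default_destination_rate_delay applies per destination (normally the recipient domain). So 2s caps delivery to any single domain at 3600 / 2 = 1,800 messages per hour. That is slower than the 3,600 per hour phpList will now hand to postfix if every recipient were at one domain; the excess would simply wait in the postfix queue. A rough sketch of that arithmetic; the function name is hypothetical&lt;br /&gt;
```shell
# Hypothetical helper: hours needed to deliver MESSAGES to one destination
# when postfix waits DELAY_S seconds between deliveries to that destination.
# Prints whole hours, rounded up.
hours_for_one_destination() {
  messages=$1
  delay_s=$2
  echo $(( (messages * delay_s + 3599) / 3600 ))
}
```
# e.g. hours_for_one_destination 1800 2 prints 1, and hours_for_one_destination 14111 2 prints 8 (the worst case, where every recipient of today&#039;s campaign shares one domain)&lt;br /&gt;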
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of ssh security (to force key-based logins), but that breaks sudo, which requires the user&#039;s password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to have a user password set (while still keeping password logins disabled in ssh) than to set NOPASSWD in sudoers&lt;br /&gt;
# disabled passwords are marked with a &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
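# To audit this without eyeballing `tail`: the second colon-separated field of /etc/shadow holds the password hash. A leading &#039;!&#039; means the password is locked, &#039;*&#039; means there is no usable password, and an empty field means the account has no password at all. That empty state is where tgriffing stands after the edit above, until Tom sets a password with `passwd`. The function below is a sketch that labels each account from shadow-formatted input. Run it as root, e.g. `shadow_status /etc/shadow`.&lt;br /&gt;

```shell
# Sketch: classify accounts from /etc/shadow-formatted lines (files or stdin).
# Field 2 is the password hash: empty = no password at all, leading "!" or "*"
# = locked / no usable password, anything else = a password hash is set.
shadow_status() {
  awk -F: '
    $2 == ""     { print $1, "empty"; next }
    $2 ~ /^[!*]/ { print $1, "locked"; next }
                 { print $1, "set" }
  ' "$@"
}
```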
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) use the MBR partition scheme (not GPT), as indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
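# As a sanity check on the fdisk numbers, 488397168 sectors × 512 bytes = 250059350016 bytes. The Blocks column is in 1 KiB units, which for 512-byte sectors works out to (End - Start + 1) / 2, with a trailing + when the sector count is odd. A sketch of that arithmetic, assuming 512-byte logical sectors as reported above:&lt;br /&gt;

```shell
# Sketch: reproduce fdisk's "Blocks" column (1 KiB units) from Start/End
# sectors, assuming 512-byte logical sectors as fdisk reported above.
# fdisk appends "+" when the sector count is odd (half a block left over).
fdisk_blocks() {
  local start="$1" end="$2" sectors
  sectors=$(( end - start + 1 ))
  if [ $(( sectors % 2 )) -eq 1 ]; then
    echo "$(( sectors / 2 ))+"
  else
    echo "$(( sectors / 2 ))"
  fi
}

echo $(( 488397168 * 512 ))      # prints 250059350016 (disk size in bytes)
fdisk_blocks 2048 67110912       # prints 33554432+  (sda1 / sdb1)
fdisk_blocks 68163584 488395120  # prints 210115768+ (sda3 / sdb3)
```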
# A quick spot-check shows that our backups usually finish around 09:55 UTC, and once as late as 10:07 UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God, that&#039;s early, but it&#039;s better to do this early in the day in Germany.&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 at FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
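# The conversions in the email can be double-checked with GNU date. This sketch uses fixed POSIX TZ offsets rather than zoneinfo names. It assumes that on 2025-04-24 Germany is on CEST (UTC+2), while Ecuador and FeF (assumed to be on US Central daylight time) are both at UTC-5. The zone abbreviations are just labels, and POSIX TZ strings invert the sign, so CEST-2 means UTC+2.&lt;br /&gt;

```shell
# Sketch: print a UTC instant as wall-clock time under a given TZ value.
# Requires GNU date (-d). POSIX TZ strings invert the sign: "CEST-2" = UTC+2.
utc_to_local() {
  TZ="$2" date -d "$1" '+%Y-%m-%d %H:%M %Z'
}

for zone in CEST-2 ECT5 CDT5; do
  utc_to_local '2025-04-24 10:00 UTC' "$zone"
done
```

# With tzdata installed, zone names such as TZ=Europe/Berlin or TZ=America/Chicago also work and handle DST automatically.&lt;br /&gt;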
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is now down too, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the mariadb corruption&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so there&#039;s one more recovery mode we can try before it becomes destructive: 3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
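# For reference, the MySQL manual linked above describes six innodb_force_recovery levels. Each level also includes everything the lower levels do, and values of 4 or greater can permanently corrupt data files. The lookup function below just summarizes that page (the names are InnoDB&#039;s internal constants). It is not a MariaDB tool.&lt;br /&gt;

```shell
# Lookup sketch summarizing innodb_force_recovery levels as described in the
# MySQL "Forcing InnoDB Recovery" page. Each level implies all lower ones.
force_recovery_name() {
  case "$1" in
    0) echo "normal operation" ;;
    1) echo "SRV_FORCE_IGNORE_CORRUPT: run even if a corrupt page is detected" ;;
    2) echo "SRV_FORCE_NO_BACKGROUND: do not run the master thread or purge threads" ;;
    3) echo "SRV_FORCE_NO_TRX_UNDO: do not roll back transactions after crash recovery" ;;
    4) echo "SRV_FORCE_NO_IBUF_MERGE: skip insert buffer merges (can corrupt data)" ;;
    5) echo "SRV_FORCE_NO_UNDO_LOG_SCAN: ignore undo logs at startup (can corrupt data)" ;;
    6) echo "SRV_FORCE_NO_LOG_REDO: skip redo log roll-forward (can corrupt data)" ;;
    *) return 1 ;;   # not a valid level
  esac
}

force_recovery_name 3
```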
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
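# Since `systemctl restart` blocks until the start job finishes or times out (about 10 minutes here), one alternative is to run `systemctl --no-block restart mariadb` and then watch the log. The helper below is a hypothetical polling sketch. It only looks at lines added after a given line number, and it returns 0 once a pattern appears or 1 after the given number of seconds. The patterns in the usage comment are taken from the log output above.&lt;br /&gt;

```shell
# Hypothetical helper: poll FILE once per second, looking only at lines after
# line number SKIP, until PATTERN (a grep regex) appears or SECS seconds pass.
# Returns 0 if the pattern was seen, 1 on timeout.
wait_for_log() {
  local file="$1" pattern="$2" secs="$3" skip="${4:-0}" waited=0
  while [ "$waited" -lt "$secs" ]; do
    if tail -n "+$(( skip + 1 ))" "$file" | grep -q -- "$pattern"; then
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 1
}

# Usage sketch (as root):
#   log=/var/log/mariadb/mariadb.log
#   before=$(wc -l /var/log/mariadb/mariadb.log | cut -d' ' -f1)
#   systemctl --no-block restart mariadb
#   wait_for_log "$log" 'ready for connections' 60 "$before" || echo "not ready after 60s"
```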
# the internet suggests this infinite loop is caused by the default of innodb_purge_threads=1, and it says we should set this to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery level back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that once the restart is no longer blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
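# The combination that worked was innodb_force_recovery = 2 plus innodb_purge_threads = 0. These settings get changed several times during the recovery, so a small helper could replace the vim edits. It rewrites an existing uncommented line for the option, or adds one after the [mysqld] header. This is a sketch assuming GNU sed and a my.cnf with a [mysqld] section. It is not a MariaDB tool.&lt;br /&gt;

```shell
# Sketch: set KEY = VALUE in the [mysqld] section of a my.cnf-style FILE.
# Assumes GNU sed (-i and the one-line "a" command) and a "[mysqld]" header.
# If an uncommented KEY line exists anywhere in FILE it is rewritten in place;
# otherwise a new line is added right after the [mysqld] header.
set_mysqld_option() {
  local file="$1" key="$2" value="$3"
  if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file"; then
    sed -i "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file"
  else
    sed -i "/^\[mysqld\]/a $key = $value" "$file"
  fi
}

# Usage sketch (back up my.cnf first, as done above with my.cnf.20250417):
#   set_mysqld_option /etc/my.cnf innodb_force_recovery 2
#   set_mysqld_option /etc/my.cnf innodb_purge_threads 0
```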
# the log is being spammed with the last 5 lines below, repeated many times; I guess something is still trying to modify the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
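# How to read this output: a table name on a line by itself is followed by one or more level : message lines for that table. The note lines for cacti_db say the storage engine doesn&#039;t support CHECK. That means no check was performed, not that corruption was found. The warning lines are the ones mysqlcheck actually flagged. The function below is a sketch (with the log format inferred from the output above) that lists databases with warning or error lines.&lt;br /&gt;

```shell
# Sketch: list databases whose tables got a "warning" or "error" line in a
# saved mysqlcheck log (files or stdin). Format inferred from the output above:
# a table name alone on a line, then "level : message" lines for that table.
mysqlcheck_problem_dbs() {
  awk '
    /^(note|warning|error|status) *:/ {
      level = $1
      sub(/:$/, "", level)
      if (level == "warning" || level == "error") {
        split(table, parts, ".")
        print parts[1]          # the database part of db.table
      }
      next
    }
    NF == 1 { table = $1 }      # a bare table name starts a new block
  ' "$@" | LC_ALL=C sort -u
}
```

# Run against mysqlcheck.20250418.log, it would print osemain_db, osemain_s_db and oswh_db.&lt;br /&gt;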
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list. And that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
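# Because this dump was taken with --all-databases, restoring only some databases from a dump like it means slicing them out of one big file. mysqldump precedes each database in such a dump with a comment of the form -- Current Database: `name`. The function below is a sketch that relies on that marker and keeps the dump&#039;s header plus one database&#039;s section. Check the output before piping it into mysql.&lt;br /&gt;

```shell
# Sketch: from an --all-databases mysqldump (files or stdin), print the dump
# header plus the section for database DB only. Relies on the
# "-- Current Database: `DB`" comment mysqldump writes before each database.
extract_db() {
  local db="$1"
  shift
  awk -v marker="-- Current Database: \`$db\`" '
    BEGIN { keep = 1 }                                  # keep the header lines
    /^-- Current Database: / { keep = ($0 == marker) }  # toggle per database
    keep
  ' "$@"
}

# Usage sketch (BACKUP.sql.gz is a hypothetical backup file name):
#   zcat BACKUP.sql.gz | extract_db osemain_db > osemain_db.sql
```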
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time at level 1 (innodb_force_recovery = 1). It did start, but this crash loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
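# well, even though systemd *says* it&#039;s started, that only covers mysqld_safe; mysqld itself keeps crashing and getting restarted in the loop above&lt;br /&gt;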
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
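# grep -vi OK works here, but it also hides any line that merely contains the letters ok (for example, a table whose name contains them). Below is a slightly stricter sketch. The helper name is hypothetical. It drops only the lines whose status column is exactly OK, so error, warning, and note lines stay visible, along with the table name that mysqlcheck prints above them&lt;br /&gt;

```shell
# Hypothetical helper: show every mysqlcheck output line that is not a plain OK.
# Healthy tables print as the table name, some padding, then OK; anything else is kept.
# Reads the log file(s) given as arguments, or stdin if none are given.
mysqlcheck_problems() {
  grep -Ev '[[:space:]]OK$' "$@"
}

# usage: mysqlcheck_problems mysqlcheck.20250418_201559.log
# (no output means every table reported OK)
```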
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode (InnoDB blocks INSERT, UPDATE, and DELETE while innodb_force_recovery is greater than 0)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
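# InnoDB only accepts writes again once innodb_force_recovery is removed, and this debugging flips the level back and forth several times. Below is a minimal sketch of a helper for changing it. The function name set_force_recovery is hypothetical. The sketch assumes GNU sed, a [mysqld] section in the file, and /etc/my.cnf as the default path (the real setting may live in a file under /etc/my.cnf.d/ instead)&lt;br /&gt;

```shell
# Hypothetical helper: set innodb_force_recovery to LEVEL in a my.cnf-style file,
# or delete the setting when LEVEL is 0 (normal, writable operation).
# Assumptions: GNU sed (for -i and the one-line "a" command), a [mysqld] section
# exists, and the file defaults to /etc/my.cnf unless a second argument is given.
set_force_recovery() {
  local level="$1" cnf="${2:-/etc/my.cnf}"
  local pattern='^[[:space:]]*innodb_force_recovery[[:space:]]*='
  if [ "$level" = 0 ]; then
    # remove the option entirely so InnoDB starts normally
    sed -i "/$pattern/d" "$cnf"
  elif grep -q "$pattern" "$cnf"; then
    # option already present: rewrite its value in place
    sed -i "s/$pattern.*/innodb_force_recovery = $level/" "$cnf"
  else
    # option absent: add it right after the [mysqld] header
    sed -i "/^\[mysqld\]/a innodb_force_recovery = $level" "$cnf"
  fi
}

# usage: set_force_recovery 2; systemctl restart mariadb
#        set_force_recovery 0; systemctl restart mariadb   # back to normal
```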
# well, fuck, now I don&#039;t know why it won&#039;t start, and it doesn&#039;t tell me why. The good news is that I was able to get a db dump. Maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, two of them are missing: the dump includes mysql, but not information_schema or performance_schema&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
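# the same check can be reduced to just the database names, which makes it easier to compare against SHOW DATABASES. Below is a minimal sketch with a hypothetical helper name. It assumes the dump writes its CREATE DATABASE statements in the backtick-quoted format shown above&lt;br /&gt;

```shell
# Hypothetical helper: print the name of every database a mysqldump file creates.
# Reads an uncompressed dump on stdin and extracts the backtick-quoted name from
# each line that starts with "CREATE DATABASE".
dump_db_names() {
  sed -n 's/^CREATE DATABASE .*`\([^`]*\)` .*/\1/p'
}

# usage: zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | dump_db_names
```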
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m just going to manually dump these three anyway. Or try to&lt;br /&gt;
# well, I was able to get one of the three to back up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the mysql dump looks good (716K, while the other two are essentially empty)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
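# mysqldump can&#039;t lock information_schema or performance_schema, and both are generated by the server rather than holding our data, so only mysql really needed saving. Below is a minimal sketch that dumps every other database into its own file, which also makes it easier to restore a single database later. The function names are hypothetical. It assumes $mysqlPass is already set, for example by sourcing /root/backups/backup.settings&lt;br /&gt;

```shell
# Hypothetical helper: filter out the schemas that mysqldump can't (and needn't) dump.
dumpable_dbs() {
  grep -Ev '^(information_schema|performance_schema)$'
}

# Sketch: dump each remaining database to its own timestamped file, then gzip it.
# Assumes $mysqlPass holds the MariaDB root password.
dump_each_db() {
  local stamp db out
  stamp="$(date '+%Y%m%d_%H%M%S')"
  mysql -uroot -p"$mysqlPass" -N -e 'SHOW DATABASES' | dumpable_dbs |
    while read -r db; do
      out="mysqldump_${db}.${stamp}.sql"
      if mysqldump -uroot -p"$mysqlPass" --result-file="$out" "$db"; then
        gzip "$out"
      else
        echo "dump of $db failed"
      fi
    done
}

# usage: cd into the backup directory, then run dump_each_db
```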
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that also reset the mysql root password (the new datadir got fresh grant tables), so I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got in by temporarily starting mysqld with --skip-grant-tables, then set the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
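systemctl stop mariadb&lt;br /&gt;
# start a temporary server that skips authentication and only listens on the local socket&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
sleep 5&lt;br /&gt;
# set the root password (MariaDB 5.5 syntax) and reload the grant tables&lt;br /&gt;
mysql -u root mysql -e &amp;quot;UPDATE user SET password=PASSWORD(&#039;$mysqlPass&#039;) WHERE User=&#039;root&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
# shut down the temporary server and start the normal service again&lt;br /&gt;
mysqladmin -u root -p$mysqlPass shutdown&lt;br /&gt;
systemctl start mariadb&lt;br /&gt;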
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in and see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script (mysql_secure_installation), but let&#039;s just drop test manually and restore from backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
+====================+&lt;br /&gt;
| mysql              |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest (including even our corrupt databases) and see if it works or breaks&lt;br /&gt;
# that took about 11.5 minutes, and afterwards /var/lib/mysql is back up to ~6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs, including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# woah, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
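# a minimal sketch (not something I ran at the time) of how the restored tables could be checked before concluding there was no data loss; it reuses the $mysqlPass variable from above. `mysqlcheck --all-databases --check` runs CHECK TABLE on every table and prints a status for each one, so filtering out the OK lines leaves only the ones worth a closer look. On ~6.8G of data it may take a while.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# run CHECK TABLE on every table in every database, and print only the lines that do not end in OK&lt;br /&gt;
mysqlcheck -uroot -p$mysqlPass --all-databases --check | grep -v &#039;OK$&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;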
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
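# a quick sketch of checking the same thing from the command line instead of a browser; it assumes osemain is served at www.opensourceecology.org. A status in the 5xx range means the backend is still broken.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print the HTTP status code and URL for each site, discarding the body&lt;br /&gt;
for site in wiki.opensourceecology.org www.opensourceecology.org; do&lt;br /&gt;
  curl -s -o /dev/null -w &#039;%{http_code} %{url_effective}\n&#039; https://$site/&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;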
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access to the server again, I wonder if he has permission to execute `reboot`. That would be better for him than logging into the hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot while other users are logged in, and `systemctl reboot -i` just prompts for another user&#039;s credentials&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
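# two checks that could also be run as root on hetzner2 itself, without knowing marcin&#039;s password (a sketch; I haven&#039;t run them here): `visudo -c` re-parses /etc/sudoers and the files it includes and reports syntax errors, and `sudo -l -U marcin` lists what sudo allows the marcin user to run. I&#039;d expect the second to show /sbin/reboot as his only command, unless some group rule also applies to him.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# syntax-check /etc/sudoers and everything it includes (e.g. /etc/sudoers.d)&lt;br /&gt;
visudo -c&lt;br /&gt;
&lt;br /&gt;
# list marcin&#039;s sudo privileges; root can query this for any user&lt;br /&gt;
sudo -l -U marcin&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;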
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s Debian, but the sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use pages. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot; that you caused corruption and it wasn&#039;t able to recover from it due to a bug in mariadb. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re setup for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end-up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shutdown the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is re-building. But, of course, there&#039;s risk in not doing it..&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses and replace that one second..&lt;br /&gt;
# from my log yesterday, here are the serial numbers of our two drives&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
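# another way to see which serial belongs to which device (a sketch): the udev symlinks in /dev/disk/by-id have the model and serial in their names and point at the kernel device. The grep pattern assumes both drives are the Crucial model shown above.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# each link name contains the serial; its target (../../sda, ../../sdb) is the device; -partN links are partitions&lt;br /&gt;
ls -l /dev/disk/by-id/ | grep Crucial&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;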
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX`&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir holds only an empty EFI/centos directory, so I&#039;m thinking we&#039;re booting via legacy BIOS with grub2, not EFI&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like grub.cfg references the RAID array (by its mduuid), not an individual drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# right, so if I understand this correctly: we&#039;re not updating grub&#039;s config. We&#039;re using `grub-install` (named `grub2-install` on CentOS 7) to write grub&#039;s boot code *to* the new drive; the config itself stays on /boot, which is on the RAID. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first. Just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but, anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
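# a rough sketch of the sdb swap on the running system (the hot-swap option above), adapted from the hetzner doc linked above and not yet run on hetzner2. Assumptions: the partition tables are MBR (grub.cfg loads part_msdos), and sdb3 is sdb&#039;s member of md2; the real partition-to-array mapping for md1, md2 (and any other arrays) must be read from /proc/mdstat first. On CentOS 7 the grub installer is `grub2-install`.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# before the swap: see which sdb partitions belong to which md array&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# mark sdb&#039;s member as failed and remove it; repeat for each array&lt;br /&gt;
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# after the new disk is in: confirm it really came back as sdb before writing to it&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# copy sda&#039;s MBR partition table onto the new disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&lt;br /&gt;
# add the new partition back into its array (repeat for each array); the rebuild starts on its own&lt;br /&gt;
mdadm /dev/md2 --add /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# write grub&#039;s boot code to the new disk so the machine can still boot from it&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;
&lt;br /&gt;
# watch the rebuild progress&lt;br /&gt;
watch cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;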
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched ose infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which runs EOL CentOS 7, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new entries in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the journal logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new gets written to the journal, but plenty gets written to the actual log file that the journal mentions, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
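# a sketch of watching that log file live while retrying the start; the path comes from the mysqld_safe &amp;quot;Logging to&amp;quot; line in the journal above.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# shell 1: follow the mariadb log, skipping the lines already in it&lt;br /&gt;
tail -n 0 -f /var/log/mariadb/mariadb.log&lt;br /&gt;
&lt;br /&gt;
# shell 2: retry the start&lt;br /&gt;
systemctl restart mariadb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;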
# here&#039;s what gets appended to /var/log/mariadb/mariadb.log when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the not-corrupt tables and only restore the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about every week because it keeps breaking since some months ago (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in v5.7. You&#039;re using 5.5 and can&#039;t upgrade because your OS is EOL. hetzner3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data worse https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re actually running 5.5, but I tried the 5.6 version of the doc https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1 – apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we only do it on another machine. So basically take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. So the idea is to dump everything in recovery mode and drop what is corrupt, then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that dumps taken in recovery mode 1, 2, or 3 are relatively safe: at worst, we lose some data on the corrupt individual pages&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
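# so a page is just InnoDB&#039;s unit of disk I/O (one or more rows); it doesn&#039;t specifically mean data that hasn&#039;t been written to disk yet. So I *think* it should be OK to trust a dump from modes 1-3, minus whatever rows lived on the corrupt pages?&lt;br /&gt;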
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
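# before trying it, a minimal sketch of the config change for the lowest recovery mode. This assumes the stock CentOS 7 MariaDB layout (a [mysqld] section in /etc/my.cnf), which I haven&#039;t double-checked on this box; the comments are mine, not from the docs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# /etc/my.cnf (assumed path)&lt;br /&gt;
[mysqld]&lt;br /&gt;
# start at 1 and only step up to 2, then 3, if mysqld still crashes on startup;&lt;br /&gt;
# per the docs above, anything &amp;gt;3 should only be used on a separate copy of the data&lt;br /&gt;
innodb_force_recovery = 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## once it starts, the plan above is to dump everything (e.g. mysqldump --all-databases), then remove this line again before restarting normally&lt;br /&gt;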
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
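# a minimal sketch for checking both disks in one go. &amp;quot;smart_health_ok&amp;quot; is a hypothetical helper name I made up; it only looks at the overall-health line, so it can&#039;t tell you *why* a disk failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: reads `smartctl -H` output on stdin and succeeds (exit 0)&lt;br /&gt;
# only if the overall-health self-assessment line says PASSED&lt;br /&gt;
smart_health_ok() {&lt;br /&gt;
  grep -q &#039;self-assessment test result: PASSED&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (as root):&lt;br /&gt;
#   for d in /dev/sda /dev/sdb; do smartctl -H &amp;quot;$d&amp;quot; | smart_health_ok || echo &amp;quot;$d: FAILED&amp;quot;; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;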
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new, or used and tested, depending on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the choice of hot-swapping while the system is on, or shutting it down first. I&#039;m going to say shutdown. That&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a hardware raid. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other. But, christ, if both disks are fucked, then which disk should I have them replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm, it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot;, but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
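# since the &amp;quot;Failed Attributes&amp;quot; summary didn&#039;t help, a minimal sketch of filtering the -A table down to the rows that actually have something in the WHEN_FAILED column. &amp;quot;failed_smart_attrs&amp;quot; is a hypothetical helper, and it assumes the 10-column ATA layout shown above (it won&#039;t work on NVMe output)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: reads `smartctl -A` output on stdin and prints the name and&lt;br /&gt;
# WHEN_FAILED value of every numbered attribute row whose 9th column isn&#039;t &amp;quot;-&amp;quot;&lt;br /&gt;
failed_smart_attrs() {&lt;br /&gt;
  awk &#039;$1 ~ /^[0-9]+$/ &amp;amp;&amp;amp; $9 != &amp;quot;-&amp;quot; { print $2, $9 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (as root): smartctl -A /dev/sda | failed_smart_attrs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## on the two tables above, this should print just &amp;quot;Percent_Lifetime_Remain FAILING_NOW&amp;quot; for each disk&lt;br /&gt;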
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs&#039; table, nor in Wikipedia&#039;s table of known attributes https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to NVMe. That&#039;s relevant on hetzner3 but not hetzner2. Anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 looks very different from the SATA SSDs on hetzner2 (NVMe drives report a separate health log instead of the ATA attribute table)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows we&#039;ve used an estimated 6% and 7% of the drives&#039; lifetime on hetzner3 (this is wear, not disk space), whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
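# a minimal sketch for pulling just that number out of the NVMe output, e.g. for a future monitoring check. &amp;quot;nvme_percent_used&amp;quot; is a hypothetical helper, and it assumes the &amp;quot;Percentage Used:&amp;quot; label exactly as smartctl 7.3 prints it above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: reads `smartctl -A` output for an NVMe drive on stdin and&lt;br /&gt;
# prints the &amp;quot;Percentage Used&amp;quot; value (estimated lifetime used) as a bare number&lt;br /&gt;
nvme_percent_used() {&lt;br /&gt;
  awk -F: &#039;/^Percentage Used/ { gsub(/[ %]/, &amp;quot;&amp;quot;, $2); print $2 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (as root, on hetzner3): smartctl -A /dev/nvme0n1 | nvme_percent_used&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## on the output above, this should print 6 for nvme0n1 and 7 for nvme1n1&lt;br /&gt;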
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# this indicates that our raid is fine. two UUs (eg `[UU]`) is fine. Bad would be a U and a missing U (eg `[U_]`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
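# a minimal sketch of turning that eyeball check into something scriptable. &amp;quot;mdstat_degraded&amp;quot; is a hypothetical helper; it just greps for a status field with an underscore in it, like [U_] or [_U]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: reads /proc/mdstat-style text on stdin and prints any line whose&lt;br /&gt;
# member-status field contains an underscore (a missing disk, e.g. [U_] or [_U]);&lt;br /&gt;
# exits non-zero (grep&#039;s &amp;quot;no match&amp;quot;) when every array looks like [UU]&lt;br /&gt;
mdstat_degraded() {&lt;br /&gt;
  grep -E &#039;\[U*_[U_]*\]&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: if mdstat_degraded &amp;lt; /proc/mdstat; then echo &amp;quot;RAID is degraded&amp;quot;; fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## on the output above, it prints nothing (and exits non-zero), since all three arrays show [UU]&lt;br /&gt;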
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into its RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial numbers of our disks (which hetzner will ask for when we tell them to swap one)&lt;br /&gt;
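# for future reference, a rough sketch of those three steps as a hypothetical helper that only *prints* the commands for review instead of running them. It assumes an MBR partition table (a GPT disk would need sgdisk instead of sfdisk), a BIOS-booted CentOS 7 host (hence grub2-install), and the md-to-partition mapping shown in /proc/mdstat above (md0=*1, md1=*2, md2=*3). Double-check everything against the hetzner doc before running any of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print (do not run) the steps to re-add a swapped-in disk to our three RAID1 arrays&lt;br /&gt;
print_raid_readd_plan() {&lt;br /&gt;
	local good=&amp;quot;$1&amp;quot;  # surviving disk, eg sda&lt;br /&gt;
	local new=&amp;quot;$2&amp;quot;   # replacement disk, eg sdb&lt;br /&gt;
	# 1. copy the partition table from the surviving disk to the new one&lt;br /&gt;
	echo &amp;quot;sfdisk -d /dev/${good} | sfdisk /dev/${new}&amp;quot;&lt;br /&gt;
	# 2. add each new partition back into its array (mdN uses partition N+1)&lt;br /&gt;
	local md&lt;br /&gt;
	for md in 0 1 2; do&lt;br /&gt;
		echo &amp;quot;mdadm /dev/md${md} -a /dev/${new}$((md + 1))&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
	# 3. install the bootloader on the new disk too, so either disk can boot&lt;br /&gt;
	echo &amp;quot;grub2-install /dev/${new}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
print_raid_readd_plan sda sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# printing instead of executing is deliberate: while we&#039;re down to one disk, every command should be read by a human before it runs&lt;br /&gt;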
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
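# once the tests finish, the results can be read back from each drive&#039;s self-test log (smartctl -l selftest), where the newest entry is numbered &amp;quot;# 1&amp;quot;. A minimal sketch with hypothetical helper names that prints just that newest entry per disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print the newest entry of a smartctl self-test log read on stdin&lt;br /&gt;
latest_selftest() {&lt;br /&gt;
	grep -m1 &#039;^# 1 &#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# show the newest self-test result for each disk given as an argument&lt;br /&gt;
show_latest_selftests() {&lt;br /&gt;
	local disk&lt;br /&gt;
	for disk in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
		echo &amp;quot;== ${disk} ==&amp;quot;&lt;br /&gt;
		smartctl -l selftest &amp;quot;$disk&amp;quot; | latest_selftest&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# eg show_latest_selftests /dev/sda /dev/sdb; on a passing drive the newest entry should say &amp;quot;Completed without error&amp;quot;&lt;br /&gt;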
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039; which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the check should probably happen at boot, but I couldn&#039;t find any sign of it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
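# the same check could be scripted, eg for a periodic cron job. A minimal sketch (the helper name is hypothetical) that reads the output of tune2fs -l on stdin and fails unless the &amp;quot;Filesystem state&amp;quot; field is exactly &amp;quot;clean&amp;quot; (e2fsprogs can also report eg &amp;quot;not clean&amp;quot;, or append &amp;quot;with errors&amp;quot;)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# succeed only if tune2fs -l output (on stdin) reports a clean filesystem&lt;br /&gt;
fs_state_is_clean() {&lt;br /&gt;
	local state&lt;br /&gt;
	state=$(awk -F&#039;: +&#039; &#039;/^Filesystem state:/ { print $2 }&#039;)&lt;br /&gt;
	echo &amp;quot;Filesystem state: ${state:-unknown}&amp;quot;&lt;br /&gt;
	[ &amp;quot;$state&amp;quot; = &amp;quot;clean&amp;quot; ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
for dev in /dev/md1 /dev/md2; do&lt;br /&gt;
	tune2fs -l &amp;quot;$dev&amp;quot; | fs_state_is_clean || echo &amp;quot;WARNING: check ${dev}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;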
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it didn&#039;t come up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
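# note for tomorrow: per the MySQL doc linked above (written for MySQL 5.7, not MariaDB 5.5, so treat it as approximate), the point of innodb_force_recovery is to get the server up just long enough to dump the data. It says to start at 1 and only raise the value as needed, because values of 4 or greater can permanently corrupt data files. If it does come up, a minimal sketch of dumping everything before touching anything else (the helper name and default output path are hypothetical)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump every database to a single file while mariadb is running in forced-recovery mode&lt;br /&gt;
dump_all_dbs() {&lt;br /&gt;
	local outfile=&amp;quot;${1:-/root/all-databases.$(date +%Y%m%d).sql}&amp;quot;&lt;br /&gt;
	mysqldump --all-databases &amp;gt; &amp;quot;$outfile&amp;quot; || return 1&lt;br /&gt;
	echo &amp;quot;dumped to ${outfile}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;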
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc across a ton of different, scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what his future self (5 years from now) will need in a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
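# side note: the 413 error above reports a limit of 1000000 bytes for the request body with &amp;quot;no files&amp;quot;, which per the same ModSecurity reference manual looks like a different directive, SecRequestBodyNoFilesLimit, rather than SecRequestBodyLimit. The grep for &#039;BodyLimit&#039; above wouldn&#039;t match that name, so it may be set somewhere else. A quick sketch (hypothetical helper names) to list every request-body directive and convert the byte counts&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# list every request-body directive (SecRequestBodyLimit, SecRequestBodyNoFilesLimit, ...)&lt;br /&gt;
show_body_limits() {&lt;br /&gt;
	grep -ir &#039;SecRequestBody&#039; /etc/httpd/conf.d/ /etc/httpd/modsecurity.d/&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# convert a byte count to MiB&lt;br /&gt;
to_mib() {&lt;br /&gt;
	awk -v bytes=&amp;quot;$1&amp;quot; &#039;BEGIN { printf &amp;quot;%.2f MiB\n&amp;quot;, bytes / 1024 / 1024 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
to_mib 13107200  # SecRequestBodyLimit: prints 12.50 MiB&lt;br /&gt;
to_mib 1000000   # the limit in the 413 error: prints 0.95 MiB&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;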
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox so she can rebuild osemain using some new theme that&#039;s not broken on the latest version of wordpress, php, etc on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306063</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306063"/>
		<updated>2025-04-27T21:48:36Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Apr 20&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admins&#039; work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 20, 2025=&lt;br /&gt;
# Marcin replied to my email authorizing the replacement of the /dev/sdb disk on hetzner2 at 2025-04-24 10:00 UTC https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
## I updated the article with the defined date &amp;amp; time&lt;br /&gt;
# ...&lt;br /&gt;
# I also checked hetzner3. I see that I set up email alerts for the RAID, but not for SMART.&lt;br /&gt;
## on hetzner2, we had no RAID errors, but we did have SMART errors. I guess eventually, if it failed badly enough that RAID replication broke, we would have gotten alerts. But it would be good if we could get alerts *before* that happened.&lt;br /&gt;
# I checked munin on hetzner2 to see what data it collects for monitoring disks @ /disk-day.html&lt;br /&gt;
## looks like we have latency, throughput, usage, utilization, i/o, and inode usage. There&#039;s nothing about &amp;quot;SMART errors&amp;quot;&lt;br /&gt;
# looks like there *is* a smart module for munin https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
# it&#039;s already there on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Mar 21  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 smart_&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hetzner2 has it too &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology munin]# ls -lah /usr/share/munin/plugins | grep -i smart&lt;br /&gt;
-rwxr-xr-x 1 root root  11K Nov  6  2023 hddtemp_smartctl&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Nov  6  2023 smart_&lt;br /&gt;
[root@opensourceecology munin]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
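# smart_ is a munin &amp;quot;wildcard&amp;quot; plugin, so it should be enabled by symlinking it into /etc/munin/plugins/ with the device name as the suffix (eg smart_sda). A minimal sketch, assuming device names like sda/sdb; the helper name and its MUNIN_PLUGIN_DIR override are hypothetical, and I haven&#039;t verified whether the plugin understands hetzner3&#039;s NVMe drives&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# enable the munin smart_ wildcard plugin for each disk given (eg sda sdb)&lt;br /&gt;
enable_munin_smart() {&lt;br /&gt;
	local plugin_dir=&amp;quot;${MUNIN_PLUGIN_DIR:-/etc/munin/plugins}&amp;quot;&lt;br /&gt;
	local disk&lt;br /&gt;
	for disk in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
		# the part of the symlink name after &amp;quot;smart_&amp;quot; tells the plugin which device to query&lt;br /&gt;
		ln -sf /usr/share/munin/plugins/smart_ &amp;quot;${plugin_dir}/smart_${disk}&amp;quot;&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then, since smartctl needs root, add a [smart_*] section containing &amp;quot;user root&amp;quot; to a file in /etc/munin/plugin-conf.d/, restart munin-node, and sanity-check with munin-run smart_sda&lt;br /&gt;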
# crap, I just checked hetzner3&#039;s munin, and I realized that varnish is missing :(&lt;br /&gt;
# it looks like ansible *has* pushed out the script and plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /usr/share/munin/plugins/ | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar 21  2023 varnish_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Feb 12 00:14 varnish5_&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 28  2024 varnish5_.175431.2025-02-12@00:16:02~&lt;br /&gt;
-rwxr-xr-x 1 root root  28K Sep 25  2024 varnish5_.20240928.orig&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah /etc/munin/plugins/ | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25  2024 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Feb 12 00:16 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I did a diff of the varnish5_ script from my server and ose&#039;s server, and I found 2 new lines at the top of the file on hetzner3&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@mail:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
#&lt;br /&gt;
# This program is free software; you can redistribute it and/or modify&lt;br /&gt;
maltfield@mail:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ head /usr/share/munin/plugins/varnish5_&lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
#&lt;br /&gt;
# varnish5_ - Munin plugin to for Varnish 5.x and 6.x&lt;br /&gt;
# Copyright (C) 2009,2018  Redpill Linpro AS&lt;br /&gt;
#&lt;br /&gt;
# Author: Kristian Lyngstøl &amp;lt;kristian@bohemians.org&amp;gt;&lt;br /&gt;
#         Pål-Eivind Johnsen &amp;lt;pej@redpill-linpro.com&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so basically the issue appears to be that my &amp;quot;ansible managed&amp;quot; comment comes before the shebang, so the plugin gets executed by the shell (/bin/sh) instead of perl&lt;br /&gt;
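# the fix is presumably to move the &amp;quot;Ansible managed&amp;quot; comment below the shebang in the ansible template. Meanwhile, a minimal sketch (hypothetical helper) to list files in a directory whose first two bytes aren&#039;t &amp;quot;#!&amp;quot;, which should catch any other scripts broken the same way (expect a few false positives for non-script files, like the .orig and ~ backups)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# list regular files in a directory whose first two bytes are not &amp;quot;#!&amp;quot;&lt;br /&gt;
find_missing_shebang() {&lt;br /&gt;
	local dir=&amp;quot;${1:-/usr/share/munin/plugins}&amp;quot;&lt;br /&gt;
	local f&lt;br /&gt;
	for f in &amp;quot;$dir&amp;quot;/*; do&lt;br /&gt;
		[ -f &amp;quot;$f&amp;quot; ] || continue&lt;br /&gt;
		if [ &amp;quot;$(head -c 2 &amp;quot;$f&amp;quot;)&amp;quot; != &#039;#!&#039; ]; then&lt;br /&gt;
			echo &amp;quot;$f&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
	done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;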
# we can see the result of all these syntax errors with a test run too&lt;br /&gt;
## my server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail:/etc/munin# munin-run varnish_hit_rate&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
client_req.value 704255&lt;br /&gt;
cache_miss.value 202581&lt;br /&gt;
cache_hitmiss.value 2181&lt;br /&gt;
cache_hit.value 499493&lt;br /&gt;
root@mail:/etc/munin#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## ose&#039;s hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 26: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 28: varnish5_: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 30: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 32: Varnish: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 34: =head1: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 36: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 38: The: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 39: [varnish5_*]: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 40: group: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 41: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 42: env.name: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 44: env.varnishstat: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 108: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 111: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 114: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 117: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 119: my: not found&lt;br /&gt;
/etc/munin/plugins/varnish_hit_rate: 123: Syntax error: &amp;quot;(&amp;quot; unexpected&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I moved the &amp;quot;ansible managed&amp;quot; comment below the shebang in ansible, and pushed it out; now it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run varnish_hit_rate&lt;br /&gt;
client_req.value 10714&lt;br /&gt;
cache_hitmiss.value 9&lt;br /&gt;
cache_hit.value 6478&lt;br /&gt;
cache_hitpass.value 0&lt;br /&gt;
cache_miss.value 4227&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
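# for reference, here is a minimal sketch of the corrected header. It assumes the plugin is deployed with ansible&#039;s template module, with the comment coming from ansible&#039;s built-in ansible_managed variable; our actual ansible task may use a literal &amp;quot;Ansible managed&amp;quot; string instead. The shebang has to stay on line 1, and any comment we add must go after it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/perl&lt;br /&gt;
# {{ ansible_managed }}&lt;br /&gt;
# -*- perl -*-&lt;br /&gt;
# (the rest of the upstream varnish5_ plugin follows, unchanged)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;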
# I also pushed out the smart_ plugin at the same time, but it&#039;s not working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_ suggest&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# according to the docs page for the smart_ munin plugin, the munin plugin config needs at least this section, so I added it on hetzner2 https://gallery.munin-monitoring.org/plugins/munin/smart_/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# tail -n4 zzz-ose &lt;br /&gt;
&lt;br /&gt;
[smart_*]&lt;br /&gt;
user root&lt;br /&gt;
group disk&lt;br /&gt;
[root@opensourceecology plugin-conf.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I manually created the symlinks for sda &amp;amp; sdb&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /etc/munin/plugins&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sda&lt;br /&gt;
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_sdb&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# sweet, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run smart_sdb&lt;br /&gt;
Program_Fail_Count.value 100&lt;br /&gt;
Reallocated_Event_Count.value 100&lt;br /&gt;
Ave_Block_Erase_Count.value 001&lt;br /&gt;
Reallocate_NAND_Blk_Cnt.value 100&lt;br /&gt;
Erase_Fail_Count.value 100&lt;br /&gt;
Reported_Uncorrect.value 100&lt;br /&gt;
SATA_Interfac_Downshift.value 100&lt;br /&gt;
Offline_Uncorrectable.value 100&lt;br /&gt;
smartctl_exit_status.value 8&lt;br /&gt;
Write_Error_Rate.value 100&lt;br /&gt;
FTL_Program_Page_Count.value 100&lt;br /&gt;
Current_Pending_Sector.value 100&lt;br /&gt;
Success_RAIN_Recov_Cnt.value 100&lt;br /&gt;
UDMA_CRC_Error_Count.value 100&lt;br /&gt;
Error_Correction_Count.value 100&lt;br /&gt;
Temperature_Celsius.value 064&lt;br /&gt;
Raw_Read_Error_Rate.value 100&lt;br /&gt;
Total_Host_Sector_Write.value 100&lt;br /&gt;
Power_Cycle_Count.value 100&lt;br /&gt;
Power_On_Hours.value 100&lt;br /&gt;
Host_Program_Page_Count.value 100&lt;br /&gt;
Unused_Reserve_NAND_Blk.value 000&lt;br /&gt;
Percent_Lifetime_Remain.value 000&lt;br /&gt;
Unexpect_Power_Loss_Ct.value 100&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, I&#039;m not getting the same results on hetzner3. I wonder if this munin plugin doesn&#039;t support nvme drives?&lt;br /&gt;
# oh, it looks like I&#039;m actually not updating that file anymore in ansible, because it has a backup. I&#039;m going to make a note in ansible so I don&#039;t make that mistake again.&lt;br /&gt;
# meanwhile, I manually updated the config file on hetzner3 too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin # cd plugin-conf.d/&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls&lt;br /&gt;
dhcpd3  munin-node  README  spamstats  zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # touch /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chown root:root /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # chmod 0600 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # cp zzz-myconf /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # ls -lah /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
-rw------- 1 root root 1,7K Apr 20 17:29 /var/tmp/munin-zzz-myconf.20250420&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # vim zzz-myconf&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # diff /var/tmp/munin-zzz-myconf.20250420 /etc/munin/plugin-conf.d/zzz-myconf &lt;br /&gt;
3c3&lt;br /&gt;
&amp;lt; # Version: 0.2&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Version: 0.3&lt;br /&gt;
9c9&lt;br /&gt;
&amp;lt; # Updated: 2024-12-12&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; # Updated: 2025-04-20&lt;br /&gt;
31a32,35&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; [smart_*]&lt;br /&gt;
&amp;gt; user root&lt;br /&gt;
&amp;gt; group disk&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that still fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
Warning: the execution of &#039;munin-run&#039; via &#039;systemd-run&#039; returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the &#039;systemd-run&#039; wrapper. Details of the latter can be found via &#039;journalctl&#039;.&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, if I restart the service first and then run it, it – uhh – kinda works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugin-conf.d # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so it exits without an error, but it only returns &amp;quot;U&amp;quot; (munin&#039;s &amp;quot;unknown&amp;quot; value) and no further stats. huh.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # munin-run smart_nvme0n1&lt;br /&gt;
smartctl_exit_status.value U&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it looks like the smart_ plugin doesn&#039;t work for nvme drives :(&lt;br /&gt;
## https://github.com/munin-monitoring/munin/issues/790&lt;br /&gt;
## https://github.com/aranemac/munin-smart-nvme&lt;br /&gt;
# I&#039;m not looking to compile some binary. I think we&#039;ve reached the point of diminishing returns here&lt;br /&gt;
# while historical SMART charts would be great, what I really want is email alerts from SMART, like we set up for the RAID&lt;br /&gt;
# I found a few guides about this&lt;br /&gt;
## https://linuxconfig.org/how-to-configure-smartd-and-be-notified-of-hard-disk-problems-via-email&lt;br /&gt;
## https://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email&lt;br /&gt;
## https://unix.stackexchange.com/questions/662633/best-practices-to-enable-smart-disk-notifications-on-a-linux-workstation&lt;br /&gt;
# I backed up /etc/smartd.conf and replaced it with a single DEVICESCAN line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
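# for reference, here&#039;s that one-liner with each directive commented, based on my reading of smartd.conf(5). The trailing -M test is only there to verify delivery (it sends one test warning per device when smartd starts), and it can be dropped once mail is confirmed working&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# DEVICESCAN       monitor every device smartd can find&lt;br /&gt;
# -d removable     keep going (instead of exiting) if a device is missing at startup&lt;br /&gt;
# -n standby       skip a check if the disk is in standby, instead of spinning it up&lt;br /&gt;
# -m ADDRESS       who gets the warning mail&lt;br /&gt;
# -M exec PATH     deliver the warning via this script (Debian&#039;s smartd-runner) instead of mailing directly&lt;br /&gt;
# -M test          send one test warning per device at startup&lt;br /&gt;
DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner -M test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;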
# but that didn&#039;t work; no email came when I restarted the service (even if I added -M test)&lt;br /&gt;
# I checked the status in systemd, and it says that it did try to send the mail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # systemctl status smartd&lt;br /&gt;
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: active (running) since Sun 2025-04-20 20:58:57 UTC; 3min 22s ago&lt;br /&gt;
	   Docs: man:smartd(8)&lt;br /&gt;
			 man:smartd.conf(5)&lt;br /&gt;
   Main PID: 1466569 (smartd)&lt;br /&gt;
	 Status: &amp;quot;Next check of 2 devices will start at 21:28:57&amp;quot;&lt;br /&gt;
	  Tasks: 1 (limit: 76834)&lt;br /&gt;
	 Memory: 1.2M&lt;br /&gt;
		CPU: 66ms&lt;br /&gt;
	 CGroup: /system.slice/smartmontools.service&lt;br /&gt;
			 └─1466569 /usr/sbin/smartd -n&lt;br /&gt;
&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, is SMART capable. Adding to &amp;quot;monitor&amp;quot; list.&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Executing test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org ...&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Test of &amp;lt;mail&amp;gt; to REDACTED@opensourceecology.org: successful&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme0n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NX0M104566-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 smartd[1466569]: Device: /dev/nvme1n1, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVLB512HAJQ_00000-S3W8NA0M345614-n1.nvme.state&lt;br /&gt;
Apr 20 20:58:57 hetzner3 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.&lt;br /&gt;
root@hetzner3 /etc #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
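# so I checked the postfix logs. google accepted one message (status=sent), but the other was deferred with &amp;quot;bounce or trace service failure&amp;quot; after some postfix bounce/defer socket warnings?!?&lt;br /&gt;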
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # journalctl -fu postfix@-&lt;br /&gt;
...&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: Untrusted TLS connection established to aspmx.l.google.com[108.177.15.27]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/smtp[1468111]: CB6E5B94BB2: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[108.177.15.27]:25, delay=1.2, delays=0.01/0.01/0.86/0.27, dsn=2.0.0, status=sent (250 2.0.0 OK  1745183017 ffacd0b85a97d-39efa5a45b6si4251829f8f.798 - gsmtp)&lt;br /&gt;
Apr 20 21:04:34 hetzner3 postfix/qmgr[4510]: CB6E5B94BB2: removed&lt;br /&gt;
Apr 20 21:04:36 hetzner3 postfix/smtp[1468114]: Untrusted TLS connection established to aspmx.l.google.com[2404:6800:4003:c02::1b]:25: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/bounce socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/bounce socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: unexpected protocol delivery_request_protocol from private/defer socket (expected: delivery_status_protocol)&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: read private/defer socket: Application error&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: warning: D13CAB94BB3: defer service failure&lt;br /&gt;
Apr 20 21:04:38 hetzner3 postfix/smtp[1468114]: D13CAB94BB3: to=&amp;lt;REDACTED@opensourceecology.org&amp;gt;, relay=aspmx.l.google.com[2404:6800:4003:c02::1b]:25, delay=4.5, delays=0.01/0.01/3.5/1, dsn=4.3.0, status=deferred (bounce or trace service failure)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed the -m address to my personal email, restarted smartd, and got two test emails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme1&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NA0M345614, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This message was generated by the smartd daemon running on:&lt;br /&gt;
&lt;br /&gt;
   host name:  hetzner3&lt;br /&gt;
   DNS domain: opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
The following warning/error was logged by the smartd daemon:&lt;br /&gt;
&lt;br /&gt;
TEST EMAIL from smartd for device: /dev/nvme0&lt;br /&gt;
&lt;br /&gt;
Device info:&lt;br /&gt;
SAMSUNG MZVLB512HAJQ-00000, S/N:S3W8NX0M104566, FW:EXA7301Q, 512 GB&lt;br /&gt;
&lt;br /&gt;
For details see host&#039;s SYSLOG.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed it back to the Google Groups mailing list address, and I updated the wiki https://wiki.opensourceecology.org/wiki/Hetzner3&lt;br /&gt;
# after lunch, I refreshed munin on hetzner2 and hetzner3 to see whether SMART info was being charted now&lt;br /&gt;
## on hetzner2, there are no changes. I don&#039;t see any charts related to SMART&lt;br /&gt;
## on hetzner3, there are two new charts (S.M.A.R.T values for drive nvme0n1 &amp;amp; S.M.A.R.T values for drive nvme1n1), but they&#039;re both empty. Each has only 1 value (smartctl_exit_status), and it&#039;s &amp;quot;nan&amp;quot; across all time ranges. This is expected, since the plugin can&#039;t parse the nvme smartctl output format.&lt;br /&gt;
# I think maybe I forgot to restart munin on hetzner2, so I gave that a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# service munin-node restart&lt;br /&gt;
Redirecting to /bin/systemctl restart munin-node.service&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
2025/04/20 21:29:38 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 55.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/munin/munin-update line 56.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:51 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
2025/04/20 21:29:52 [Warning] Could not open includedir directory /etc/munin/conf.d: No such file or directory&lt;br /&gt;
readdir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 983.&lt;br /&gt;
closedir() attempted on invalid dirhandle $DIR at /usr/share/perl5/vendor_perl/Munin/Master/Utils.pm line 984.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whatever; I guess no munin logs on SMART for this dying server&lt;br /&gt;
# I also confirmed that the varnish charts are now visible in munin&lt;br /&gt;
# I committed my ansible changes https://github.com/OpenSourceEcology/ansible/commit/2fb906fd62cf0773d84f50f1cf113ddfe66910ec&lt;br /&gt;
# anyway, I also updated smartd.conf on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology smartmontools]# cp smartd.conf smartd.conf.20250420.bak&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# vim smartd.conf&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology smartmontools]# diff smartd.conf.20250420.bak smartd.conf&lt;br /&gt;
23c23,24&lt;br /&gt;
&amp;lt; DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; #DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
&amp;gt; DEVICESCAN -H -m REDACTED@opensourceecology.org -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
[root@opensourceecology smartmontools]# systemctl restart smartd&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
SMART Disk monitor:&lt;br /&gt;
Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!&lt;br /&gt;
[root@opensourceecology smartmontools]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wow, that screaming about the disks failing wasn&#039;t just printed to my tty. It got printed to every tty in my screen session. It really is angry.&lt;br /&gt;
# but, alas, no email was sent, even from hetzner2, where email should *definitely* be working&lt;br /&gt;
# this time the postfix logs on hetzner2 gave us an error from gmail saying why they&#039;re blocking us&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21221]: 297716847E6: host aspmx.l.google.com[64.233.167.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42a931si4417083f8f.167 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
Apr 20 21:40:27 opensourceecology postfix/smtp[21094]: 3CBF7684804: host aspmx.l.google.com[142.251.168.27] said: 421-4.7.28 Gmail has detected an unusual rate of unsolicited mail. To protect 421-4.7.28 our users from spam, mail has been temporarily rate limited. For 421-4.7.28 more information, go to 421-4.7.28  https://support.google.com/mail/?p=UnsolicitedRateLimitError to 421 4.7.28 review our Bulk Email Senders Guidelines. ffacd0b85a97d-39efa42967csi4306047f8f.165 - gsmtp (in reply to end of DATA command)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# marcin sent an email campaign today with phpList. If that didn&#039;t make it out because of this, that&#039;s kind of a problem.&lt;br /&gt;
# I see in the log that we&#039;re kinda spamming phplist_bounces@opensourceecology.org&lt;br /&gt;
# that&#039;s basically where phplist is supposed to let our admins know that it failed to deliver to some people on the mailing list&lt;br /&gt;
## I confirmed that this account *does* exist in the gsuite admin wui user list&lt;br /&gt;
# yeah, crap, it&#039;s blocking other mail sent to my personal account from apache.&lt;br /&gt;
# woah, I&#039;m tailing the mail log and I just saw probably hundreds or thousands of emails being sent. phpList is *supposed* to send in small batches, but I wonder if, once a message fails and lands in the mail queue, the retries happen without any batching.&lt;br /&gt;
# I checked phpList wui settings and config.php, and I don&#039;t see anything about rate-limiting&lt;br /&gt;
# here&#039;s the docs on it https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
# it says it should be set in config.php. By default, I think it&#039;s 5,000 emails per hour&lt;br /&gt;
# Marcin&#039;s campaign today was sent to 14,111 people&lt;br /&gt;
# I checked the event log page, and I see a lot of these &amp;quot;Maximum time for queue processing: 99999&amp;quot; – which I guess means we need to break these up into batches https://phplist.opensourceecology.org/lists/admin/?page=eventlog&lt;br /&gt;
# looks like the easiest thing to do is to add a pause with MAILQUEUE_THROTTLE https://discuss.phplist.org/t/some-advice-for-correct-configuration-of-sending-rate/429&lt;br /&gt;
# if we send one per second, then we&#039;ll send 3,600 per hour.&lt;br /&gt;
## If we have 15,000 people on our list, then at that rate we&#039;d need 4-5 hours to send a campaign. That sounds like a good idea.&lt;br /&gt;
# I updated the phpList config file to send only one email per second&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# diff config.20250420.php config.php &lt;br /&gt;
83a84,87&lt;br /&gt;
&amp;gt; // only send 1 email per second&lt;br /&gt;
&amp;gt; //  * https://www.phplist.org/manual/books/phplist-manual/page/setting-the-send-speed-%28rate%29&lt;br /&gt;
&amp;gt; define(&#039;MAILQUEUE_THROTTLE&#039;,1);&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
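# here is a back-of-the-envelope send time with MAILQUEUE_THROTTLE set to 1 second. It assumes the throttle is a pause between messages, and it ignores the time phpList spends building and handing off each message, so a real campaign will take somewhat longer&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{14{,}111\ \text{emails}}{3{,}600\ \text{emails/h}} \approx 3.9\ \text{h}, \qquad \frac{15{,}000\ \text{emails}}{3{,}600\ \text{emails/h}} \approx 4.2\ \text{h}&amp;lt;/math&amp;gt;&lt;br /&gt;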
# we should also probably throttle postfix https://serverfault.com/questions/110919/postfix-throttling-for-outgoing-messages&lt;br /&gt;
# looks like for both hetzner2 and hetzner3, this is set to no delay&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 0s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology phplist.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I set this on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology postfix]# diff main.cf.20250420 main.cf&lt;br /&gt;
683a684,686&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # limit emails to the same-destination-domain to one-email-per-2-seconds&lt;br /&gt;
&amp;gt; default_destination_rate_delay = 2s&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# systemctl restart postfix&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
[root@opensourceecology postfix]# postconf | grep -i _rate_&lt;br /&gt;
anvil_rate_time_unit = 60s&lt;br /&gt;
default_destination_rate_delay = 2s&lt;br /&gt;
error_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
lmtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
local_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
relay_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
retry_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtp_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
smtpd_client_connection_rate_limit = 0&lt;br /&gt;
smtpd_client_message_rate_limit = 0&lt;br /&gt;
smtpd_client_new_tls_session_rate_limit = 0&lt;br /&gt;
smtpd_client_recipient_rate_limit = 0&lt;br /&gt;
virtual_destination_rate_delay = $default_destination_rate_delay&lt;br /&gt;
[root@opensourceecology postfix]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I also added this to ansible and pushed it out to hetzner3 https://github.com/OpenSourceEcology/ansible/commit/7ed339cad055a9a0c5b04f26d32c9416daf3a2c7&lt;br /&gt;
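# as I understand the postfix docs, the 2s delay is enforced between deliveries to the same destination (e.g. gmail.com). Assuming phpList hands postfix one message per subscriber, the ceiling for any single destination domain is&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{3{,}600\ \text{s/h}}{2\ \text{s/message}} = 1{,}800\ \text{messages/h}&amp;lt;/math&amp;gt;&lt;br /&gt;
# in the (hypothetical) worst case where all 14,111 subscribers were at gmail.com, postfix, not phpList&#039;s 1-per-second throttle, would be the bottleneck&lt;br /&gt;
&amp;lt;math&amp;gt;14{,}111 \times 2\ \text{s} = 28{,}222\ \text{s} \approx 7.8\ \text{h}&amp;lt;/math&amp;gt;&lt;br /&gt;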
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as an extra layer of ssh security (forcing key auth), but that breaks sudo, which requires the user&#039;s password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to set a user password (while still disabling password logins in ssh) than to set NOPASSWD in sudoers&lt;br /&gt;
# disabled passwords are marked with a &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
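# for reference, here are the /etc/shadow fields per shadow(5). The second field holds the password hash: &#039;!&#039; marks the password as locked, and an empty field means no password is set. The third field counts days since 1970-01-01, so 20133 is 2025-02-14. (The # annotations below are mine; they&#039;re not valid shadow syntax.)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# login:password:lastchg:min:max:warn:inactive:expire:reserved&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::    # before: &#039;!&#039; = locked, no usable password&lt;br /&gt;
tgriffing::20133:0:99999:7:::     # after:  empty = no password set yet&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;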
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) are MBR partition scheme (not GPT) – indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# A quick spot-check shows that our backups usually finish at 09:55 – one time as late as 10:07. That&#039;s UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God that&#039;s early, but better to do this early in Germany time..&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 at FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next Thursday, 2025-04-24, at 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
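# To double-check the time-zone arithmetic above, here is a minimal sketch using GNU `date`. It assumes GNU coreutils and an installed tz database. The zone names are my assumptions: America/Guayaquil for mainland Ecuador, and America/Chicago for FeF in Missouri, which is consistent with Marcin&#039;s use of CST later in this log.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# utc_to_local ZONE &amp;quot;YYYY-MM-DD HH:MM&amp;quot; prints the local wall-clock HH:MM for a UTC time&lt;br /&gt;
utc_to_local() {&lt;br /&gt;
  TZ=&amp;quot;$1&amp;quot; date -d &amp;quot;$2 UTC&amp;quot; +%H:%M&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
for tz in Europe/Berlin America/Guayaquil America/Chicago; do&lt;br /&gt;
  echo &amp;quot;$tz $(utc_to_local &amp;quot;$tz&amp;quot; &#039;2025-04-24 10:00&#039;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# On the proposed date this should print 12:00 for Europe/Berlin (CEST) and 05:00 for both America/Guayaquil and America/Chicago (CDT), which matches the email.&lt;br /&gt;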
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is now down too, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service went down, and it won&#039;t start. You have a Varnish cache that stores a subset of pages in memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took the webservers down yesterday so they can&#039;t corrupt the database further if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server, which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated whether your underlying storage (disk, filesystem, or RAID) is failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disks said both of your mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption. I think it&#039;s just a timer alerting us that the disks may fail soon. AFAICT, disk replacement is free (from Hetzner) but non-trivial and high-risk. I&#039;ll postpone it until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that: you only have daily backups, so you would likely lose data.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue following this guide to find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss.&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and I am just putting out the announcement and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switch to more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so there&#039;s one more recovery mode we can try before they become potentially destructive: 3 (per the MySQL docs, values of 4 or higher can permanently corrupt data files)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests this infinite loop is caused by the default innodb_purge_threads=1, and that we should set it to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set innodb_force_recovery back down to 2 and added an innodb_purge_threads = 0 line. I&#039;ll try that once the current restart is no longer blocking&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
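# For reference, the settings described above (innodb_force_recovery back down to 2, plus innodb_purge_threads = 0) would look roughly like the sketch below, with the file&#039;s other existing settings omitted. It assumes they live in the `[mysqld]` section at the top of `/etc/my.cnf`, which is presumably line 1 given the `1a2,5` diff above. `innodb_force_recovery` has to come out again before MariaDB will accept writes, as the log below also warns.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# 2 = SRV_FORCE_NO_BACKGROUND: don&#039;t start the master thread or any purge threads&lt;br /&gt;
innodb_force_recovery = 2&lt;br /&gt;
# avoid hanging forever on &amp;quot;Waiting for the background threads to start&amp;quot;&lt;br /&gt;
innodb_purge_threads = 0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;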
# the logs are being spammed with these last 5 lines repeatedly. I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
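# `grep -vi OK` works here, but it has two drawbacks. First, it also hides any line that merely contains &amp;quot;ok&amp;quot;, such as a hypothetical damaged table named `wp_bookings`. Second, the table names and their messages still have to be paired up by eye. The sketch below is a slightly more robust alternative. It assumes mysqlcheck prints either `db.table   OK` on one line, or a bare `db.table` followed by note/warning/error lines, as in the output above. It prints one line per non-OK table together with the first keyword of its message. That makes it easier to tell a `note` (e.g. an engine that doesn&#039;t support CHECK) from a `warning` or `error`.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# summarise_mysqlcheck LOGFILE: one &amp;quot;db.table status&amp;quot; line per table not reported OK&lt;br /&gt;
summarise_mysqlcheck() {&lt;br /&gt;
  awk &#039;&lt;br /&gt;
    # a table line: the first field looks like db.table&lt;br /&gt;
    $1 ~ /^[^.]+\.[^.]+$/ {&lt;br /&gt;
      if (tbl != &amp;quot;&amp;quot;) print tbl, status&lt;br /&gt;
      tbl = ($NF == &amp;quot;OK&amp;quot;) ? &amp;quot;&amp;quot; : $1&lt;br /&gt;
      status = &amp;quot;&amp;quot;&lt;br /&gt;
      next&lt;br /&gt;
    }&lt;br /&gt;
    # remember the first note/warning/error keyword after a non-OK table&lt;br /&gt;
    tbl != &amp;quot;&amp;quot; &amp;amp;&amp;amp; status == &amp;quot;&amp;quot; { status = $1 }&lt;br /&gt;
    END { if (tbl != &amp;quot;&amp;quot;) print tbl, status }&lt;br /&gt;
  &#039; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Usage: `summarise_mysqlcheck mysqlcheck.20250418.log`&lt;br /&gt;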
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
The good news is that the wiki isn&#039;t in that list, and that those particular DBs don&#039;t change much, so restoring just those databases from backups should result in acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
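# `du -sh` shows the dump isn&#039;t empty, but not that it is complete. The sketch below is a minimal sanity check. It assumes mysqldump&#039;s default comments are on, so the last line of a finished dump is `-- Dump completed on ...`.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# check_dump FILE.sql.gz: succeed only if the gzip stream is intact and&lt;br /&gt;
# mysqldump got as far as writing its final &amp;quot;-- Dump completed&amp;quot; line&lt;br /&gt;
check_dump() {&lt;br /&gt;
  gzip -t &amp;quot;$1&amp;quot; || return 1&lt;br /&gt;
  gzip -dc &amp;quot;$1&amp;quot; | tail -n 1 | grep -q &#039;^-- Dump completed&#039;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Usage: `check_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &amp;amp;&amp;amp; echo complete`&lt;br /&gt;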
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
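# Once the database accepts writes again, the three dropped databases could be restored from a full backup as sketched below. This assumes the backup is a plain-text `mysqldump --all-databases` dump, which marks each database with a comment line of the form -- Current Database: `name`. `extract_db` is a hypothetical helper, not part of the OSE backup scripts. It skips the dump&#039;s global header (character-set `SET` statements and the like), so prepend that if the restore needs it.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# extract_db DB DUMPFILE: print only DB&#039;s section of an --all-databases dump&lt;br /&gt;
# (pass - as DUMPFILE to read the dump from stdin)&lt;br /&gt;
extract_db() {&lt;br /&gt;
  awk -v db=&amp;quot;$1&amp;quot; &#039;&lt;br /&gt;
    /^-- Current Database: / { keep = ($0 == &amp;quot;-- Current Database: `&amp;quot; db &amp;quot;`&amp;quot;) }&lt;br /&gt;
    keep&lt;br /&gt;
  &#039; &amp;quot;$2&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Usage, with BACKUP.sql.gz as a placeholder for the chosen backup file: `gzip -dc BACKUP.sql.gz | extract_db osemain_db - | mysql -uroot -p$mysqlPass`, then the same for cacti_db and oswh_db.&lt;br /&gt;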
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time at level 1. It did start, but this crash-and-restart loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
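# systemd *says* it&#039;s started, but that only means the mysqld_safe wrapper (the main PID) is alive; the log above shows mysqld itself crashing and being restarted&lt;br /&gt;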
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
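# Because systemd reports the unit as active while mysqld is still crash-looping, a readiness check before running mysqlcheck would avoid the socket error above. A minimal bash sketch, assuming the standard mysqladmin client is installed; wait_until_ready and WAIT_INTERVAL are hypothetical names made up for this example:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# wait_until_ready TRIES CMD [ARGS...]
# Run CMD until it succeeds, at most TRIES times, sleeping WAIT_INTERVAL
# seconds (hypothetical knob, default 1) after each failed attempt.
wait_until_ready() {
  local tries="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    sleep "${WAIT_INTERVAL:-1}"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example: mysqladmin ping exits 0 once the server answers (even if it
# answers with "Access denied"); --silent hides connection-error output.
# if wait_until_ready 30 mysqladmin --silent ping; then
#   mysqlcheck --all-databases -u root -p"$mysqlPass"
# else
#   echo "mysqld never became ready; check /var/log/mariadb/mariadb.log"
# fi
```

# A single successful ping only shows the server answered once; a crash-looping mysqld can still die moments later, so the error log is still worth checking.&lt;br /&gt;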
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
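# The grep -vi OK check above also hides any line that merely contains the letters ok, such as the name line of a hypothetical table called bookings. A slightly stricter bash/awk sketch, assuming mysqlcheck&#039;s usual output where a healthy table is one line ending in OK; non_ok_lines is a hypothetical helper name:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# non_ok_lines [FILE...]: print each non-blank line of mysqlcheck output whose
# last whitespace-separated field is not exactly OK (reads stdin if no FILE).
non_ok_lines() {
  awk 'NF { if ($NF != "OK") print }' "$@"
}

# Example:
# non_ok_lines mysqlcheck.20250418_201559.log
```

# Empty output still means every table reported OK; any table name or error/note line that does show up is worth a closer look.&lt;br /&gt;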
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
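# The 5.5-era InnoDB documentation says any innodb_force_recovery value above 0 makes InnoDB refuse INSERT, UPDATE and DELETE, which would explain the error -1 above. A bash sketch of a pre-import guard, assuming the mysql client; force_recovery_level and writes_allowed are hypothetical helper names:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# force_recovery_level [CLIENT_ARGS...]: print the running server's
# innodb_force_recovery setting (CLIENT_ARGS are passed to the mysql client).
force_recovery_level() {
  mysql "$@" -N -B -e 'SELECT @@GLOBAL.innodb_force_recovery'
}

# writes_allowed LEVEL: succeed only when LEVEL is 0, i.e. InnoDB is not in
# forced-recovery mode and should accept writes again.
writes_allowed() {
  if [ "$1" != "0" ]; then
    echo "innodb_force_recovery=$1: restart without it before importing" >/dev/stderr
    return 1
  fi
}

# Example (assumes the $mysqlPass variable used elsewhere on this page):
# level=$(force_recovery_level -uroot -p"$mysqlPass") || exit 1
# writes_allowed "$level" || exit 1
# zcat dump.sql.gz | mysql -uroot -p"$mysqlPass"
```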
# well, fuck, now I don&#039;t know why it won&#039;t start, and it doesn&#039;t tell me why. The good news is that I was able to get a db dump. Maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, information_schema and performance_schema are missing from the dump (the mysql database is there)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
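# The zgrep above lists the dumped databases for reading by eye. A small bash sketch that extracts just the names from a gzipped or plain mysqldump file, so they can be diffed against SHOW DATABASES; dumped_dbs is a hypothetical helper name, and it assumes mysqldump&#039;s usual one-statement-per-line output:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# dumped_dbs FILE: print the database name from every line that starts with
# CREATE DATABASE in a mysqldump file. zcat -f also passes plain (uncompressed)
# files through unchanged; long INSERT lines never match the ^CREATE anchor.
dumped_dbs() {
  zcat -f -- "$1" | sed -n 's/^CREATE DATABASE .*`\([^`]*\)`.*$/\1/p'
}

# Example:
# dumped_dbs mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz
# mysql -uroot -p"$mysqlPass" -N -B -e 'SHOW DATABASES'
```

# mysqldump --all-databases normally skips information_schema and performance_schema, which matches the list above, so their absence from a dump is expected rather than a sign of a broken backup.&lt;br /&gt;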
# according to this, information_schema isn&#039;t stored on disk; the server rebuilds it at runtime, so we should be OK to lose it https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m just going to manually dump these three anyway. Or try to.&lt;br /&gt;
# well, only one of the three (mysql) dumped successfully; the other two failed with LOCK TABLES permission errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the mysql dump looks good (716K); the other two are just 4.0K stubs from the failed dumps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
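# Dumping each database into its own file would make partial restores easier and would skip the two virtual schemas that mysqldump can&#039;t lock. A bash sketch, assuming the mysql and mysqldump clients; dump_each_db is a hypothetical helper name, and any client arguments are passed straight through:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# dump_each_db OUTDIR [CLIENT_ARGS...]
# Dump every database into its own gzipped file in OUTDIR, skipping the
# information_schema and performance_schema virtual schemas. CLIENT_ARGS
# (e.g. credentials) are passed to both mysql and mysqldump.
dump_each_db() {
  local outdir="$1"
  shift
  local stamp dbs db status=0
  stamp=$(date '+%Y%m%d_%H%M%S')
  dbs=$(mysql "$@" -N -B -e 'SHOW DATABASES') || return 1
  for db in $dbs; do
    case "$db" in
      information_schema|performance_schema) continue ;;
    esac
    mysqldump "$@" "$db" | gzip -c > "$outdir/$db.$stamp.sql.gz"
    # PIPESTATUS[0] is mysqldump's status; the pipeline's own status is gzip's
    if [ "${PIPESTATUS[0]}" -ne 0 ]; then
      echo "dump of $db failed" >/dev/stderr
      status=1
    fi
  done
  return "$status"
}

# Example (assumes the $mysqlPass variable used elsewhere on this page):
# dump_each_db /root/dbFail.20250417 -uroot -p"$mysqlPass"
```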
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
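# The steps above (stop the service, move the datadir aside, recreate an empty one owned by mysql) can be sketched as a bash function that refuses to clobber an existing set-aside directory. set_aside_datadir is a hypothetical name, and the server must already be stopped:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# set_aside_datadir DATADIR SUFFIX [OWNER]
# Rename DATADIR to DATADIR.SUFFIX and recreate an empty DATADIR with mode
# 0755, optionally chowning it to OWNER (e.g. mysql:mysql, needs root).
set_aside_datadir() {
  local datadir="$1" suffix="$2" owner="${3:-}"
  if [ -e "$datadir.$suffix" ]; then
    echo "refusing to overwrite $datadir.$suffix" >/dev/stderr
    return 1
  fi
  mv -- "$datadir" "$datadir.$suffix" || return 1
  mkdir -m 0755 -- "$datadir" || return 1
  if [ -n "$owner" ]; then
    chown -- "$owner" "$datadir" || return 1
  fi
}

# Example, mirroring the commands above:
# systemctl stop mariadb
# set_aside_datadir /var/lib/mysql 20250418 mysql:mysql
```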
# ok, with the empty datadir it starts fine outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that also wiped our mysql root password (the fresh datadir has fresh grant tables); I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got back in by bypassing the grant tables and reset the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
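# stop the service first; mysqld_safe won&#039;t start a second server while the service&#039;s mysqld is running&lt;br /&gt;
systemctl stop mariadb&lt;br /&gt;
# start without privilege checks; --skip-networking limits access to the local socket&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
mysql -u root&lt;br /&gt;
-- inside the mysql client:&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
# back in the shell: shut the temporary server down (mysqld_safe exits after it), confirm the job is gone, and start the service normally&lt;br /&gt;
mysqladmin -u root -p shutdown&lt;br /&gt;
jobs -l&lt;br /&gt;
systemctl start mariadb&lt;br /&gt;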
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script (mysql_secure_installation), but let&#039;s just drop test manually and restore from the dump&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest (even the databases that were reported as corrupt) and see whether it works or breaks&lt;br /&gt;
# the import took about 11.5 minutes; the 1.3G gzipped dump grew to ~6.8G on disk&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs, including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whoa, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access to the server again, I wonder if he has permission to run `reboot`. That would be better for him than logging into the Hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
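# an alternative sketch (not what was done on hetzner2): put the rule in its own drop-in under /etc/sudoers.d, which the `#includedir` line above already reads. Have `visudo -c -f` syntax-check the file before it is installed, so a typo cannot break sudo for everyone. The file name `marcin-reboot` is an assumption; sudo skips drop-in names that contain a `.` or end in `~`. `VISUDO` is a hypothetical override hook for dry runs&lt;br /&gt;
```shell
#!/bin/sh
# Write "USER ALL = NOPASSWD: /sbin/reboot" to DEST, but only after visudo
# has parsed it successfully. VISUDO is a hypothetical override for dry runs.
write_reboot_rule() {
    user="$1"
    dest="$2"
    tmp=$(mktemp) || return 1
    printf '%s ALL = NOPASSWD: /sbin/reboot\n' "$user" > "$tmp"
    if ! "${VISUDO:-visudo}" -c -f "$tmp" > /dev/null; then
        echo "sudoers syntax check failed; nothing installed"
        rm -f "$tmp"
        return 1
    fi
    # sudoers files are expected to be mode 0440
    chmod 0440 "$tmp"
    mv "$tmp" "$dest"
}

# usage on hetzner2, as root:
#   write_reboot_rule marcin /etc/sudoers.d/marcin-reboot
```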
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it gives him access to *only* reboot&lt;br /&gt;
# it&#039;s Debian, but the sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
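# this test did not strictly require a throwaway VM or a password change. Run as root, `sudo -l -U marcin` lists the commands that user may run. Below is a minimal sketch that extracts that list. It assumes sudo&#039;s usual `User ... may run the following commands on ...:` layout, with one command per line&lt;br /&gt;
```shell
#!/bin/sh
# Read `sudo -l -U USER` output on stdin; print one allowed command per line,
# with the "(runas)" prefix and any NOPASSWD: tag removed.
allowed_cmds() {
    awk '/may run the following commands/ { found = 1; next }
         found {
             if (NF == 0) next
             sub(/^[ \t]*\([^)]*\)[ \t]*/, "")
             sub(/^NOPASSWD:[ \t]*/, "")
             print
         }'
}

# Succeed only if the user may run exactly /sbin/reboot and nothing else.
only_reboot() {
    [ "$(allowed_cmds)" = "/sbin/reboot" ]
}

# usage on hetzner2, as root:
#   sudo -l -U marcin | only_reboot; echo $?
```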
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea, as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use pages. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot; that you caused corruption and it wasn&#039;t able to recover from it due to a bug in mariadb. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine. Then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is re-building. But, of course, there&#039;s risk in not doing it..&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses and replace that one second..&lt;br /&gt;
# from my log yesterday, here are the serial numbers of our two drives&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
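# a small helper for the Hetzner ticket, which should identify the disk to pull by its serial number. It prints only the `ID_SERIAL_SHORT` value from the `udevadm` output above, so both disks can be listed in one loop&lt;br /&gt;
```shell
#!/bin/sh
# Print the ID_SERIAL_SHORT value from `udevadm info --query=property` output.
serial_from_props() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# usage on hetzner2:
#   for d in sda sdb; do
#       printf '%s %s\n' "$d" "$(udevadm info --query=property --name "$d" | serial_from_props)"
#   done
```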
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to setup grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says that if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX` (on CentOS 7 the command is named `grub2-install`)&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir is empty, so I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
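# a more direct check than reading /boot: the kernel creates /sys/firmware/efi only on a UEFI boot. If that directory is absent, the machine did a legacy BIOS boot with grub2 in the disk&#039;s MBR, which matches the `i386-pc` modules directory under /boot/grub2. `SYSFS_ROOT` is a hypothetical override so the check can be exercised off the server&lt;br /&gt;
```shell
#!/bin/sh
# Print "uefi" or "bios" depending on how the running kernel was booted.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
boot_mode() {
    if [ -d "${SYSFS_ROOT:-/sys}/firmware/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# usage on hetzner2:
#   boot_mode
```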
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
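# since grub.cfg finds /boot through the md array UUID rather than through a disk, the per-disk work after a swap is:&lt;br /&gt;
## clone the partition table&lt;br /&gt;
## re-add each partition to its mirror&lt;br /&gt;
## reinstall the boot loader on the new disk&lt;br /&gt;
# below is a dry-run sketch that only prints those commands, loosely following the Hetzner software-RAID doc linked above. The `mdN:partition` pairs in the usage line are hypothetical placeholders; read the real mapping from /proc/mdstat first. The sketch assumes an MBR table (grub.cfg loads `part_msdos`) and uses `grub2-install`, the CentOS 7 name for `grub-install`&lt;br /&gt;
```shell
#!/bin/sh
# Print (never run) the commands to move a RAID1 mirror onto a replacement disk.
# Args: surviving disk, disk being replaced, then one "mdN:P" pair per array.
raid_replace_plan() {
    good="$1"
    target="$2"
    shift 2
    echo "# before the swap: fail and remove the old disk's members"
    for pair in "$@"; do
        md="${pair%%:*}"
        part="${pair#*:}"
        echo "mdadm /dev/$md --fail /dev/$target$part --remove /dev/$target$part"
    done
    echo "# after the swap: clone the MBR partition table from the good disk"
    echo "sfdisk -d /dev/$good | sfdisk /dev/$target"
    for pair in "$@"; do
        md="${pair%%:*}"
        part="${pair#*:}"
        echo "mdadm /dev/$md --add /dev/$target$part"
    done
    # CentOS 7 names the BIOS boot-loader installer grub2-install
    echo "grub2-install /dev/$target"
}

# usage (hypothetical mapping; check /proc/mdstat first):
#   raid_replace_plan sda sdb md1:2 md2:3
```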
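# right, so if I understand this correctly: we&#039;re not updating the grub config. We&#039;re using &#039;grub-install&#039; (&#039;grub2-install&#039; on CentOS 7) to write the boot loader onto the new drive, meaning the boot code in its MBR plus grub&#039;s core image. The config itself stays on the RAID, which grub finds by its md UUID. That&#039;s easier and less concerning than I thought.&lt;br /&gt;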
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first. Just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but, anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thr Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL CentOS, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
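# the same check reduced to one status code, which is handy for confirming the fix later. With `-L`, curl follows the 301, and `-w &#039;%{http_code}&#039;` prints the status of the final response. `CURL` is a hypothetical override hook for testing&lt;br /&gt;
```shell
#!/bin/sh
# Print the final HTTP status for URL (following redirects); fail on any 5xx.
# CURL is a hypothetical override for testing; it defaults to curl.
site_status() {
    code=$("${CURL:-curl}" -s -o /dev/null -L -w '%{http_code}' "$1") || return 2
    echo "$code"
    case "$code" in
        5??) return 1 ;;
    esac
    return 0
}

# usage:
#   site_status https://wiki.opensourceecology.org/
```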
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new logs in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new logs in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new shows up in the journal, but a bunch gets written to the actual log file that the journal mentions (/var/log/mariadb/mariadb.log; damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
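# systemd only records that `mysqld_safe` exited; the real cause is in the file it names (/var/log/mariadb/mariadb.log). Below is a minimal sketch that greps InnoDB&#039;s assertion lines out of that file. It assumes the most recent crash is the last assertion pair in the file&lt;br /&gt;
```shell
#!/bin/sh
# Print the last "Assertion failure" / "Failing assertion" pair from a MariaDB log.
last_innodb_assertion() {
    grep -E 'InnoDB: (Assertion failure|Failing assertion)' \
        "${1:-/var/log/mariadb/mariadb.log}" | tail -n 2
}

# usage on hetzner2, as root:
#   last_innodb_assertion
```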
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
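# the log&#039;s own advice (the forcing-innodb-recovery link above) is to start mysqld with `innodb_force_recovery` set, dump the data, and then remove the setting. Below is a minimal sketch that writes the setting as a drop-in. The path /etc/my.cnf.d/force-recovery.cnf is an assumption; it only works if /etc/my.cnf still has its stock `!includedir /etc/my.cnf.d` line. Start at level 1 and raise it only if mysqld still will not start. The MySQL manual warns that values of 4 or greater can permanently corrupt data files&lt;br /&gt;
```shell
#!/bin/sh
# Write a [mysqld] drop-in that sets innodb_force_recovery to LEVEL (1-6).
# The default DEST path is an assumption; pass a path explicitly if unsure.
write_force_recovery() {
    level="$1"
    dest="${2:-/etc/my.cnf.d/force-recovery.cnf}"
    case "$level" in
        [1-6]) ;;
        *) echo "innodb_force_recovery level must be 1-6"; return 1 ;;
    esac
    printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level" > "$dest"
}

# usage on hetzner2, as root:
#   write_force_recovery 1
#   systemctl start mariadb
#   mysqldump --all-databases > /var/tmp/all.sql
#   rm /etc/my.cnf.d/force-recovery.cnf
```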
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
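# a sketch of that copy with a verification pass: copy each backup file into a destination directory, then `cmp` it against its source before trusting it. rsync already checksums what it transfers, so the `cmp` is a belt-and-braces re-read of both files. The directories are parameters; on hetzner2 the sources would be /home/b2user/sync and /home/b2user/sync.old, and the destination /var/tmp&lt;br /&gt;
```shell
#!/bin/sh
# Copy every regular file in each SRC dir into DEST, then confirm the copy
# is byte-identical with cmp. Stops at the first failure.
copy_and_verify() {
    dest="$1"
    shift
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            name=$(basename "$f")
            cp -p "$f" "$dest/$name" || return 1
            if ! cmp -s "$f" "$dest/$name"; then
                echo "MISMATCH: $f"
                return 1
            fi
            echo "ok: $name"
        done
    done
}

# usage on hetzner2, as root:
#   copy_and_verify /var/tmp /home/b2user/sync /home/b2user/sync.old
```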
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to backup more-recent data from the not-corrupt tables and only recover the fucked table&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) roughly every week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
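# a minimal sketch of the free-space check behind that df: the two backups are about 44.5 GB, so it is worth confirming they fit on / (with some headroom) before copying. It assumes GNU coreutils 8.21 or newer (for df --output); the helper name fits_on_fs and the 10% margin are hypothetical choices, not something run during this incident&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the files listed after DEST fit on
# the filesystem of DEST with a 10% safety margin (margin is an assumption).
# Needs GNU stat (-c %s) and GNU df (--output, coreutils 8.21 or newer).
fits_on_fs() {
  local dest="$1"
  shift
  local needed=0 size avail f
  for f in "$@"; do
    size=$(stat -c %s "$f") || return 2   # missing or unreadable file
    needed=$((needed + size))
  done
  needed=$((needed + needed / 10))        # headroom so / does not fill up
  # df prints a header line, then the free bytes on the filesystem of DEST
  avail=$(df --output=avail -B1 "$dest" | tail -n 1 | tr -d ' ')
  [ "$avail" -gt "$needed" ]
}

# Usage (on hetzner2):
#   if fits_on_fs /var/tmp /home/b2user/sync*/*; then
#     rsync -av --progress /home/b2user/sync*/* /var/tmp/
#   fi
```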
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck up the database further if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
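# a sketch (hypothetical helper stop_frontends, not what was run above) that stops the same three units and then confirms with systemctl is-active that none of them is still running, so nothing can write to the database&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: stop the web stack, then verify each unit is really inactive.
# Unit names are the ones stopped by hand above.
stop_frontends() {
  local svc rc=0
  for svc in httpd varnish nginx; do
    systemctl stop "$svc"
    # "is-active --quiet" exits 0 only while the unit is still running
    if systemctl is-active --quiet "$svc"; then
      echo "still active: $svc"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:  stop_frontends || echo "do not touch mysql yet"
```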
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
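# the mkdir/chown/chmod steps can be collapsed into one mkdir -m 0700, which creates the directory with its final permissions from the start; this sketch also appends dated notes to info.txt instead of editing it in vim. make_incident_dir is a hypothetical helper name, and the owner is whoever runs it (root here)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: create a private incident directory and append a dated note.
make_incident_dir() {
  local dir="$1" note="$2"
  # -m gives the new directory mode 0700 as it is created, so it is never
  # readable by other users (an existing directory keeps its old mode)
  mkdir -p -m 0700 "$dir" || return 1
  printf '%s: %s\n' "$(date -u +%Y-%m-%d)" "$note" >> "$dir/info.txt"
}

# Usage:
#   make_incident_dir /var/tmp/dbFail.20250417 "wiki db refuses to start"
#   mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/
```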
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
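# copying /var/lib/mysql only gives a consistent snapshot because mysqld is not running (it refuses to start). A sketch of a guard for that, assuming the server process is named mysqld (or mariadbd on newer MariaDB); snapshot_datadir is a hypothetical name, and it uses cp -a instead of the rsync -av above only so the sketch has no extra dependencies&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: copy a MySQL/MariaDB datadir only while the server is stopped;
# copying the files of a running server can give an inconsistent snapshot.
snapshot_datadir() {
  local src="$1" destroot="$2"
  if pgrep -x mysqld > /dev/null; then
    echo "refusing: mysqld is running"
    return 1
  fi
  if pgrep -x mariadbd > /dev/null; then
    echo "refusing: mariadbd is running"
    return 1
  fi
  # cp -a preserves owners, modes and timestamps (like rsync -a)
  cp -a "$src" "$destroot/var-lib-mysql.$(date +%Y%m%d)"
}

# Usage:  snapshot_datadir /var/lib/mysql /var/tmp/dbFail.20250417
```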
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
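# a sketch of the &amp;quot;keep increasing innodb_force_recovery until it starts&amp;quot; idea, deliberately capped at 3 because of the warning above. It assumes the server reads drop-in files from /etc/my.cnf.d/ and that the systemd unit is called mariadb (as on stock CentOS 7); the file name, the helper name try_recovery_levels, and the RECOVERY_CNF / DB_UNIT overrides are hypothetical&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: try innodb_force_recovery = 1, then 2, then 3, and stop at the
# first level where the server starts. Never goes above 3, since values
# of 4 or more can corrupt the data further.
try_recovery_levels() {
  local conf="${RECOVERY_CNF:-/etc/my.cnf.d/zz-force-recovery.cnf}"
  local unit="${DB_UNIT:-mariadb}" level
  for level in 1 2 3; do
    printf '[mysqld]\ninnodb_force_recovery = %s\n' "$level" > "$conf"
    if systemctl start "$unit"; then
      echo "$level"    # report the lowest level that worked
      return 0
    fi
  done
  rm -f "$conf"        # nothing worked; do not leave recovery mode set
  return 1
}

# Usage:  level=$(try_recovery_levels) || echo "still down at level 3; stop here"
# After dumping, delete the drop-in and restart so the value is back at 0.
```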
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1 – apparently any version it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we only do it on another machine. So basically: take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. so the idea is to dump everything in recovery mode and drop what is corrupt. then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: at worst, we lose the data on the individual pages that are corrupt&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
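# a sketch of the &amp;quot;dump everything in recovery mode&amp;quot; step from the plan above, one database per file, so that a database whose dump fails points at where the corrupt table lives. It assumes mysql and mysqldump can log in without a password prompt (e.g. via ~/.my.cnf); dump_each_db is a hypothetical name&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: dump each database to its own file; a failed dump narrows down
# which database holds the corrupt table.
dump_each_db() {
  local outdir="$1" db failed=0
  for db in $(mysql -N -B -e 'SHOW DATABASES'); do
    case "$db" in
      information_schema|performance_schema) continue ;;  # virtual schemas
    esac
    if mysqldump --databases "$db" > "$outdir/$db.sql"; then
      echo "ok $db"
    else
      echo "FAILED $db"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:  dump_each_db /var/tmp/dbFail.20250417
```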
# so a page is just the unit (one or more rows) that InnoDB moves between the data files on disk and the buffer pool in memory. So I *think* a dump from modes 1-3 should be trustworthy, except for whatever rows lived on the corrupt pages?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki, which is down. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
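# a small helper (hypothetical, for illustration) that pulls just the verdict line out of smartctl -H output, so all disks on a server can be checked in one loop&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: print only the overall-health verdict from "smartctl -H" output.
smart_verdict() {
  sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Usage (as root):
#   for dev in /dev/sda /dev/sdb; do
#     echo "$dev: $(smartctl -H "$dev" | smart_verdict)"
#   done
```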
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new, or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive is guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot swap while the system is running, or to shut it down first. I&#039;m going to choose shutdown. That&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but I think it&#039;s a hardware raid. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other disk. But, christ, if both disks are fucked, then which disk should I ask them to replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm.. it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
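# a sketch for reading these attribute tables: in smartctl -A output, column 9 is WHEN_FAILED and column 10 is RAW_VALUE. smart_failing and smart_raw are hypothetical helper names. Comparing Ave_Block-Erase_Count from the output above (sda 3433, sdb 3742) suggests, but does not prove, that sdb is the more worn of the two&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch helpers for "smartctl -A" tables on ATA drives.

# Print every attribute whose WHEN_FAILED column (9) is not "-".
smart_failing() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $2, $9 }'
}

# Print the RAW_VALUE (column 10) of one named attribute.
smart_raw() {
  awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_failing
#   for d in sda sdb; do
#     echo "$d $(smartctl -A /dev/$d | smart_raw Ave_Block-Erase_Count)"
#   done
```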
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the table in the hetzner docs, nor in the Wikipedia table it&#039;s sourced from https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to nvme. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 (Debian) looks very different from the output for the SATA SSDs on hetzner2 (CentOS)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
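# NVMe drives report wear as a single &amp;quot;Percentage Used&amp;quot; value in their SMART/Health log rather than in an ATA attribute table. A sketch that extracts it; nvme_pct_used, nvme_worn, and the default 80% limit are hypothetical choices&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: read wear from NVMe "smartctl -A" output on stdin.

# Print the number from the "Percentage Used:" line (e.g. 6 for "6%").
nvme_pct_used() {
  sed -n 's/^Percentage Used: *\([0-9]*\)%.*/\1/p'
}

# Succeed if wear is at or above a limit (default 80, an arbitrary choice).
nvme_worn() {
  local pct
  pct=$(nvme_pct_used)
  [ "${pct:-0}" -ge "${1:-80}" ]
}

# Usage (as root on hetzner3):
#   smartctl -A /dev/nvme0n1 | nvme_pct_used
#   smartctl -A /dev/nvme1n1 | nvme_worn 90 || echo "nvme1n1 fine for now"
```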
# that shows hetzner3&#039;s drives have used 6% and 7% of their rated lifetime, whereas I guess hetzner2&#039;s are at 100%&lt;br /&gt;
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
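# /proc/mdstat lists Linux software RAID (md) arrays, and it indicates that our raid is fine. Two Us (eg `[UU]`) means both members are up. Bad would be a U with the missing member shown as an underscore (eg `[U_]`)&lt;br /&gt;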
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
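# a sketch (hypothetical helper md_degraded) that does the [UU] vs [U_] check automatically: it prints the name of any md array whose member map contains an underscore&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: list md arrays whose member map (e.g. [UU]) shows a missing
# member as "_" (e.g. [U_]). Reads /proc/mdstat unless given a file.
md_degraded() {
  awk '
    /^md[0-9]+ :/ { name = $1 }                  # remember the array name
    $NF ~ /^\[[U_]+\]$/ { if (index($NF, "_")) print name }
  ' "${1:-/proc/mdstat}"
}

# Usage:  md_degraded    # no output means every array is healthy
```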
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into its RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
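# a tiny sketch (hypothetical helper serial_short) that extracts just the short serial from the udevadm output, handy when listing both disks for the support ticket&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: print the ID_SERIAL_SHORT value from "udevadm info" output.
serial_short() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Usage (as root):
#   for d in sda sdb; do
#     echo "$d $(udevadm info --query=property --name "$d" | serial_short)"
#   done
```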
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it probably should be happening on boot, but I couldn&#039;t find it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
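# for reference, per the tune2fs(8) man page, these two fields mean that periodic checking is disabled on this filesystem, so e2fsck will not force a full check at boot based on mount count or elapsed time:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Maximum mount count:      -1             # -1 (or 0): mount count is disregarded by e2fsck and the kernel&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)     # 0: time-dependent checking is disabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;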
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it didn&#039;t come up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
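# for reference, here is a sketch of the my.cnf change for the level-2 attempt. It assumes that the level-1 line added above was simply edited in place under the [mysqld] group. The level names in the comment come from the MySQL doc linked there. Per that doc, values of 4 or greater can permanently corrupt data files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# 1 = SRV_FORCE_IGNORE_CORRUPT, 2 = SRV_FORCE_NO_BACKGROUND, 3 = SRV_FORCE_NO_TRX_UNDO&lt;br /&gt;
innodb_force_recovery = 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;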
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc across a ton of scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in the hope that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 error lgo&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
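# note: the 413 log lines above report a limit of 1000000 bytes, not 13107200. The wording &amp;quot;Request body no files data length&amp;quot; appears to correspond to ModSecurity&#039;s separate SecRequestBodyNoFilesLimit directive. The grep for &#039;BodyLimit&#039; would not match that directive, because the string &#039;BodyLimit&#039; does not occur in its name. The location of that 1000000 value was not confirmed here. Comparing the two limits:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 1000000 bytes / 1024 / 1024 ≈  0.95 MB  (limit reported in the 413 log message)&lt;br /&gt;
13107200 bytes / 1024 / 1024 = 12.5  MB  (SecRequestBodyLimit in conf.d/mod_security.conf)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;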
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox so she can rebuild osemain using some new theme that&#039;s not broken on the latest versions of wordpress, php, etc on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306062</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306062"/>
		<updated>2025-04-27T21:44:09Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sat Apr 19, 2025=&lt;br /&gt;
&lt;br /&gt;
# I responded to Tom&#039;s email about ssh&lt;br /&gt;
# Tom wasn&#039;t able to reset their account&#039;s password&lt;br /&gt;
# I think I created these accounts with `--disabled-password`, probably as some layered security for ssh (to force keys). But that kinda breaks sudo, which requires the password. I could make sudo NOPASSWD, but in general I think it&#039;s safer to have a user password set (while still keeping password logins disabled in ssh) rather than set sudoers to NOPASSWD&lt;br /&gt;
# disabled passwords are set with the &#039;!&#039; in the second field of /etc/shadow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
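# for reference, per the shadow(5) man page, each /etc/shadow line has nine colon-separated fields. A password field like &#039;!&#039; is not a valid crypt(3) result, so the user cannot log in with a unix password, though other means such as ssh keys still work. An empty password field means no password will be required to authenticate as that user, though some applications may refuse access entirely. Annotating tgriffing&#039;s entry above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tgriffing:!:20133:0:99999:7:::&lt;br /&gt;
&lt;br /&gt;
1. login name                    tgriffing&lt;br /&gt;
2. encrypted password            !      (not a valid hash, so unix password login is disabled)&lt;br /&gt;
3. date of last password change  20133  (days since 1970-01-01, i.e. 2025-02-14)&lt;br /&gt;
4. minimum password age          0&lt;br /&gt;
5. maximum password age          99999&lt;br /&gt;
6. password warning period       7&lt;br /&gt;
7. password inactivity period    (empty)&lt;br /&gt;
8. account expiration date       (empty)&lt;br /&gt;
9. reserved field                (empty)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;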
# I just manually edited /etc/shadow with vim to remove the exclamation point from tgriffing&#039;s entry&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # vim /etc/shadow&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # tail /etc/shadow&lt;br /&gt;
varnish:!:19990::::::&lt;br /&gt;
vcache:!:19990::::::&lt;br /&gt;
varnishlog:!:19990::::::&lt;br /&gt;
mysql:!:19991::::::&lt;br /&gt;
munin:!:19991::::::&lt;br /&gt;
wp:!:19994:0:99999:7:::&lt;br /&gt;
not-apache:!:19995:0:99999:7:::&lt;br /&gt;
marcin:!:20133:0:99999:7:::&lt;br /&gt;
cmota:!:20133:0:99999:7:::&lt;br /&gt;
tgriffing::20133:0:99999:7:::&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Tom replied, saying he can become root on hetzner3 now.&lt;br /&gt;
# ...&lt;br /&gt;
# I returned to work on the plan for replacing the disks on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb#Change_Steps&lt;br /&gt;
# I confirmed that the disks (on both hetzner2 and hetzner3) use the MBR partition scheme (not GPT), as indicated by &amp;quot;Disk label type: dos&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sda&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sda: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0x9b8e1266&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sda1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sda3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/sdb&lt;br /&gt;
&lt;br /&gt;
Disk /dev/sdb: 250.1 GB, 250059350016 bytes, 488397168 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
Disk label type: dos&lt;br /&gt;
Disk identifier: 0xd904fc05&lt;br /&gt;
&lt;br /&gt;
   Device Boot      Start         End      Blocks   Id  System&lt;br /&gt;
/dev/sdb1            2048    67110912    33554432+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb2        67112960    68161536      524288+  fd  Linux raid autodetect&lt;br /&gt;
/dev/sdb3        68163584   488395120   210115768+  fd  Linux raid autodetect&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
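# for reference, fdisk&#039;s &amp;quot;Blocks&amp;quot; column here is in 1 KiB units. A trailing &amp;quot;+&amp;quot; means the partition has an odd number of 512-byte sectors. These values can be checked against the Start/End sectors. For example, for the third partition on each disk:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sectors = End - Start + 1 = 488395120 - 68163584 + 1 = 420231537&lt;br /&gt;
KiB     = 420231537 / 2                              = 210115768.5  (shown as 210115768+)&lt;br /&gt;
GiB     = 210115768 / 1024 / 1024                    ≈ 200.4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that roughly lines up with the 200.3G size of /dev/md2 reported by lsblk on Apr 17.&lt;br /&gt;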
# A quick spot-check shows that our backups usually finish at 09:55 – one time as late as 10:07. That&#039;s UTC.&lt;br /&gt;
# 10:00 UTC is 05:00 my time and 12:00 in Berlin. God, that&#039;s early, but it&#039;s better to do this early in Germany time.&lt;br /&gt;
# I sent an email to Marcin asking if Thu 2025-04-24 @ 10:00 UTC (~05:00 FeF) would be a good time to do this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
When would be a good time to replace the first disk on hetzner2?&lt;br /&gt;
&lt;br /&gt;
Our backups finish daily at 10:00 UTC, which is:&lt;br /&gt;
&lt;br /&gt;
 * 12:00 in Germany (where the server lives)&lt;br /&gt;
 * 05:00 here in Ecuador, and&lt;br /&gt;
 * 05:00 at FeF&lt;br /&gt;
&lt;br /&gt;
I propose next week on Thursday 2025-04-24 10:00 UTC.&lt;br /&gt;
&lt;br /&gt;
For details about what this change entails, and expected downtime, please see the change ticket:&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve this change, if the suggested time is agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the MariaDB corruption&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so we have one more recovery mode we can try before it becomes destructive: innodb_force_recovery = 3 (per the docs, values of 4 or higher can permanently corrupt data files)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the internet suggests this infinite loop is caused by the default of innodb_purge_threads=1, and it says we should set this to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set innodb_force_recovery back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that once the stuck restart finishes&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
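# for reference, here&#039;s a sketch of what the resulting my.cnf fragment would look like, assuming both lines went right under the [mysqld] header at the top of the file, as in the earlier diff (the exact edit isn&#039;t captured in this log)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
&lt;br /&gt;
# attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
innodb_force_recovery = 2&lt;br /&gt;
&lt;br /&gt;
# work around the endless &amp;quot;Waiting for the background threads to start&amp;quot; loop&lt;br /&gt;
innodb_purge_threads = 0&lt;br /&gt;
&lt;br /&gt;
# (rest of my.cnf unchanged)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;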
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the log is being spammed with these last 5 lines over and over; I guess something is still trying to write to the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
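# side note: `grep -vi OK` also hides any line that merely contains the letters &amp;quot;ok&amp;quot; anywhere. So if a flagged table had &amp;quot;ok&amp;quot; in its name (a hypothetical wp_bookmarks, say), its name line would vanish and leave its warning lines orphaned. Here&#039;s a stricter sketch (untested on this box) that drops only the lines ending in an OK status and keeps the table names plus their note/warning/error lines:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print only the mysqlcheck lines that do NOT end in &amp;quot; OK&amp;quot;&lt;br /&gt;
# (table names without a status, and their note/warning/error lines, are kept)&lt;br /&gt;
mysqlcheck_problems() {&lt;br /&gt;
  grep -v &#039; OK$&#039; &amp;quot;$@&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: mysqlcheck_problems mysqlcheck.20250418.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;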
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
The good news is that the wiki isn&#039;t in that list, and those particular DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable amount of data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but at level 1 this time (innodb_force_recovery = 1). It did start, but this crash-and-restart loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, systemd *says* it&#039;s started (mysqld_safe is still up, even though it keeps restarting the crashing mysqld)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
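# side note: systemd reporting &amp;quot;active (running)&amp;quot; only means mysqld_safe is up. `mysqladmin ping` may be a quicker way to see whether mysqld itself is answering. Its exit status is 0 whenever the server is running, even if it refuses the login. Here&#039;s a hypothetical polling helper, untested on this box. With mysqld_safe restarting a crashing mysqld, a successful ping only means it was up at that moment:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: poll &amp;quot;mysqladmin ping&amp;quot; until mysqld answers, or give up&lt;br /&gt;
# after N tries (default 10), sleeping 1 second between tries&lt;br /&gt;
wait_for_mysqld() {&lt;br /&gt;
  local tries=${1:-10}&lt;br /&gt;
  local i&lt;br /&gt;
  for i in $(seq 1 &amp;quot;$tries&amp;quot;); do&lt;br /&gt;
    # exit status 0 means the server is running, even if it refused our login&lt;br /&gt;
    if mysqladmin ping --silent; then&lt;br /&gt;
      return 0&lt;br /&gt;
    fi&lt;br /&gt;
    sleep 1&lt;br /&gt;
  done&lt;br /&gt;
  return 1&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: wait_for_mysqld 30 &amp;amp;&amp;amp; mysqlcheck --all-databases -u root -p$mysqlPass&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;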
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just DROP them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, fuck, now I don&#039;t know why it won&#039;t start. And it doesn&#039;t tell me why. The good news is that I was able to get a db dump. maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
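# the reason mysqld died usually isn&#039;t in `systemctl status` at all. Per the output above, mysqld_safe is logging to /var/log/mariadb/mariadb.log. Below is a hypothetical bash helper (the name last_start_log is mine) that prints only that log&#039;s lines from the most recent &amp;quot;Starting mysqld daemon&amp;quot; onwards. It assumes the mysqld_safe line format shown above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print the mariadb log from the last &amp;quot;Starting mysqld daemon&amp;quot; line to the end&lt;br /&gt;
last_start_log() {&lt;br /&gt;
  local log=&amp;quot;${1:-/var/log/mariadb/mariadb.log}&amp;quot;&lt;br /&gt;
  awk &#039;&lt;br /&gt;
    /mysqld_safe Starting mysqld daemon/ { buf = &amp;quot;&amp;quot; }  # a new start: forget older lines&lt;br /&gt;
    { buf = buf $0 &amp;quot;\n&amp;quot; }                             # remember every line&lt;br /&gt;
    END { printf &amp;quot;%s&amp;quot;, buf }&lt;br /&gt;
  &#039; &amp;quot;$log&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
# usage: last_start_log | grep -iE &#039;error|innodb&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;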
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
# as I feared, information_schema and performance_schema are missing from the dump (mysql is there, though)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
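# the same check as a pair of hypothetical bash helpers (the names are mine). The first lists every database a gzipped mysqldump would create; the second reports which expected ones are absent. This assumes the dump was made with --all-databases or --databases, so that it has the &amp;quot;CREATE DATABASE ... `name`&amp;quot; lines shown above. It decompresses the whole dump, so expect it to take a while on 1.3G:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# list the databases created by a gzipped mysqldump, one per line&lt;br /&gt;
dump_databases() {&lt;br /&gt;
  gzip -dc &amp;quot;$1&amp;quot; | sed -n &#039;s/^CREATE DATABASE .*`\([^`]*\)`.*/\1/p&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# print each expected database that the dump does not create&lt;br /&gt;
missing_from_dump() {&lt;br /&gt;
  local dump=&amp;quot;$1&amp;quot; have db&lt;br /&gt;
  shift&lt;br /&gt;
  have=$(dump_databases &amp;quot;$dump&amp;quot;)&lt;br /&gt;
  for db in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    printf &#039;%s\n&#039; &amp;quot;$have&amp;quot; | grep -qxF &amp;quot;$db&amp;quot; || printf &#039;missing: %s\n&#039; &amp;quot;$db&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
# usage: missing_from_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz mysql information_schema performance_schema&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;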
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be OK to lose it https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m going to manually dump these three anyway, or at least try to&lt;br /&gt;
# well, I was only able to back up one of the three&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the mysql dump looks good (716K); the other two are only 4.0K, i.e. effectively empty&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
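# the two mysqldump errors above are expected. information_schema and performance_schema hold server-generated data (metadata views and in-memory instrumentation) rather than our data, and the --all-databases dump above left them out too. Below is a sketch (bash; the helper names are hypothetical) of dumping every other database to its own file. It assumes $mysqlPass is loaded from /root/backups/backup.settings as elsewhere in this log:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# read database names on stdin; drop the server-generated schemas&lt;br /&gt;
dumpable_dbs() {&lt;br /&gt;
  grep -Evx &#039;information_schema|performance_schema&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# dump each remaining database to its own timestamped, gzipped file&lt;br /&gt;
# (restore one later with: zcat FILE | mysql -uroot -p&amp;quot;$mysqlPass&amp;quot; DBNAME)&lt;br /&gt;
dump_each_db() {&lt;br /&gt;
  local db&lt;br /&gt;
  mysql -uroot -p&amp;quot;$mysqlPass&amp;quot; -N -e &#039;SHOW DATABASES&#039; | dumpable_dbs |&lt;br /&gt;
  while read -r db; do&lt;br /&gt;
    mysqldump -uroot -p&amp;quot;$mysqlPass&amp;quot; &amp;quot;$db&amp;quot; | gzip -c &amp;gt; &amp;quot;$db.$(date &#039;+%Y%m%d_%H%M%S&#039;).sql.gz&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;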
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
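# the same steps as a hypothetical bash helper (the name and arguments are mine), to be run only while mariadb is stopped. It refuses to overwrite an existing backup directory and prints where the old datadir went. The owner defaults to mysql:mysql, as above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# move a (stopped!) datadir aside and recreate it empty, owner/mode as above&lt;br /&gt;
fresh_datadir() {&lt;br /&gt;
  local dir=&amp;quot;${1:-/var/lib/mysql}&amp;quot;&lt;br /&gt;
  local owner=&amp;quot;${2:-mysql:mysql}&amp;quot;&lt;br /&gt;
  local backup&lt;br /&gt;
  dir=&amp;quot;${dir%/}&amp;quot;   # so the backup never lands inside the old datadir&lt;br /&gt;
  backup=&amp;quot;$dir.$(date &#039;+%Y%m%d_%H%M%S&#039;)&amp;quot;&lt;br /&gt;
  if [ -e &amp;quot;$backup&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;refusing: $backup already exists&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
    return 1&lt;br /&gt;
  fi&lt;br /&gt;
  mv &amp;quot;$dir&amp;quot; &amp;quot;$backup&amp;quot; || return 1&lt;br /&gt;
  mkdir &amp;quot;$dir&amp;quot; &amp;amp;&amp;amp; chown &amp;quot;$owner&amp;quot; &amp;quot;$dir&amp;quot; &amp;amp;&amp;amp; chmod 0755 &amp;quot;$dir&amp;quot; || return 1&lt;br /&gt;
  echo &amp;quot;$backup&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
# usage: systemctl stop mariadb &amp;amp;&amp;amp; fresh_datadir /var/lib/mysql&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;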
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the fresh datadir also came with fresh grant tables, so our root password is gone; I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got in by starting mysqld with --skip-grant-tables and reset the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
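# start a temporary server with authentication and networking disabled&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
sleep 5   # give mysqld a moment to create its socket&lt;br /&gt;
mysql -u root &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
USE mysql;&lt;br /&gt;
-- MariaDB 5.5 still keeps the hash in user.password, settable via PASSWORD()&lt;br /&gt;
UPDATE user SET password=PASSWORD(&#039;new-password&#039;) WHERE User=&#039;root&#039;;&lt;br /&gt;
FLUSH PRIVILEGES;&lt;br /&gt;
EOF&lt;br /&gt;
jobs -l&lt;br /&gt;
# FLUSH PRIVILEGES turned authentication back on, so stop the temporary&lt;br /&gt;
# server with the new password, then start mariadb normally with systemctl&lt;br /&gt;
mysqladmin -u root -p&#039;new-password&#039; shutdown&lt;br /&gt;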
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in and see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script, but let&#039;s just drop test manually and restore from backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest (including even our corrupt databases) and see if it works or breaks&lt;br /&gt;
# the import took about 11.5 minutes; afterwards /var/lib/mysql is ~6.8G (from a 1.3G gzipped dump)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
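# one caveat with `zcat ... | mysql`: without pipefail, the pipeline&#039;s exit status is only mysql&#039;s, so a problem on the zcat side could go unnoticed. Below is a hypothetical bash wrapper (the name restore_dump is mine). It tests the whole archive first and fails if either side of the pipe fails:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# restore a gzipped dump through the given client command, e.g.&lt;br /&gt;
#   restore_dump dump.sql.gz mysql -uroot -p&amp;quot;$mysqlPass&amp;quot;&lt;br /&gt;
restore_dump() {&lt;br /&gt;
  local dump=&amp;quot;$1&amp;quot;&lt;br /&gt;
  shift&lt;br /&gt;
  # verify the gzip stream (CRC and length) before touching the database&lt;br /&gt;
  gzip -t &amp;quot;$dump&amp;quot; || return 1&lt;br /&gt;
  # pipefail: the subshell fails if either gzip or the client fails&lt;br /&gt;
  ( set -o pipefail; gzip -dc &amp;quot;$dump&amp;quot; | &amp;quot;$@&amp;quot; )&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;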
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whoa, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access to the server again, I wonder if he has permission to execute `reboot`. That would be better for him than logging into the Hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down right after telling Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot while other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
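# an alternative to editing /etc/sudoers directly, sketched in bash (the helper name and the file name marcin-reboot are hypothetical). Since /etc/sudoers already has `#includedir /etc/sudoers.d`, the same rule could live in its own drop-in file, syntax-checked with `visudo -c -f` before it is installed. Note that sudo skips drop-in files whose names contain a &amp;quot;.&amp;quot; or end in &amp;quot;~&amp;quot;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# write the reboot-only rule for marcin to a sudoers drop-in (mode 0440)&lt;br /&gt;
install_reboot_rule() {&lt;br /&gt;
  local dir=&amp;quot;${1:-/etc/sudoers.d}&amp;quot;&lt;br /&gt;
  local tmp&lt;br /&gt;
  tmp=$(mktemp) || return 1&lt;br /&gt;
  printf &#039;%s\n&#039; &#039;marcin ALL = NOPASSWD: /sbin/reboot&#039; &amp;gt; &amp;quot;$tmp&amp;quot;&lt;br /&gt;
  chmod 0440 &amp;quot;$tmp&amp;quot;&lt;br /&gt;
  # refuse to install a file with bad sudoers syntax (skipped if visudo is absent)&lt;br /&gt;
  if command -v visudo &amp;gt;/dev/null 2&amp;gt;&amp;amp;1; then&lt;br /&gt;
    visudo -c -f &amp;quot;$tmp&amp;quot; &amp;gt;/dev/null || { rm -f &amp;quot;$tmp&amp;quot;; return 1; }&lt;br /&gt;
  fi&lt;br /&gt;
  mv &amp;quot;$tmp&amp;quot; &amp;quot;$dir/marcin-reboot&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;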
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s debian, but sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
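# for the record, there is a way to at least inspect the rules without marcin&#039;s password: as root, `sudo -l -U marcin` lists the rules sudo would apply to that user, without running anything. Below is a tiny bash wrapper (the name sudo_rules_for is hypothetical). On hetzner2, the only command it lists for marcin should be /sbin/reboot, with NOPASSWD:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# list the sudo rules that apply to a user (run as root); executes nothing&lt;br /&gt;
sudo_rules_for() {&lt;br /&gt;
  if [ -z &amp;quot;$1&amp;quot; ]; then&lt;br /&gt;
    echo &#039;usage: sudo_rules_for USER&#039; &amp;gt;&amp;amp;2&lt;br /&gt;
    return 2&lt;br /&gt;
  fi&lt;br /&gt;
  sudo -l -U &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
# usage: sudo_rules_for marcin&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;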
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use write-ahead logs. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: you caused corruption and it wasn&#039;t able to recover from it due to a bug in mariadb. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner2, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is rebuilding. But, of course, there&#039;s risk in not doing it...&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
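# a minimal sketch, not something I actually sent or ran: a hypothetical remote_reboot helper that runs sudo reboot on hetzner2 in one ssh command, so it can never accidentally reboot the local laptop. The port and user come from the email above; ssh -t allocates a terminal so sudo can prompt for the password.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: reboot hetzner2 over ssh in a single command, so
# "sudo reboot" is never typed into a local shell by mistake.
# -t forces a TTY so sudo can ask for the password. DRY_RUN=1 only prints.
remote_reboot() {
    local cmd=(ssh -t -p 32415 marcin@opensourceecology.org sudo reboot)
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```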
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which disk grub boots from, so we can replace that one second..&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
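# a small sketch (hypothetical helper names) for pulling just the short serial out of the udevadm output, so the serials can be compared against what hetzner reports when they swap a drive&lt;br /&gt;
```shell
#!/usr/bin/env bash
# disk_serial: read "udevadm info --query=property" output on stdin and
# print only the value of ID_SERIAL_SHORT.
disk_serial() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# print_serials DEV...: print "device serial" for each disk, e.g. print_serials sda sdb
print_serials() {
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(udevadm info --query=property --name "$dev" | disk_serial)"
    done
}
```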
# fuck; looks like neither serial is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
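# one hedged way to answer which disk has grub on it: assuming a BIOS/MBR install (the i386-pc dir in /boot/grub2 suggests so), grub&#039;s boot code in the first 512-byte sector embeds the string GRUB, so checking that sector on each disk is a quick heuristic, not proof. Helper names are hypothetical; reading the raw disks needs root.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# has_grub_mbr DEVICE: succeed if the first 512-byte sector of DEVICE
# contains the "GRUB" string embedded in GRUB's BIOS boot code.
has_grub_mbr() {
    head -c 512 "$1" | grep -aq GRUB
}

# report_grub DEV...: e.g. report_grub /dev/sda /dev/sdb
report_grub() {
    local dev
    for dev in "$@"; do
        if has_grub_mbr "$dev"; then
            echo "$dev: GRUB boot code found"
        else
            echo "$dev: no GRUB boot code"
        fi
    done
}
```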
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX`&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming up after they remove the drive&lt;br /&gt;
# oh, the efi dir contains only empty directories, so I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir has just one file in it (a splash image)?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like grub.cfg references the raid array (mduuid), not an individual drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
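# a sketch, assuming BIOS boot: since grub.cfg points at the array rather than a drive, the new disk should only need grub&#039;s boot code installed. Note that on CentOS 7 the command is grub2-install (grub-install is the Debian name used in the hetzner docs). The helper name and DRY_RUN switch are hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# install_grub_on DEV...: write GRUB boot code to every RAID1 member so
# either disk can boot on its own. DRY_RUN=1 prints the commands instead.
install_grub_on() {
    local dev
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "grub2-install $dev"
        else
            grub2-install "$dev" || return 1
        fi
    done
}
```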
# right, so if I understand this correctly: we&#039;re not updating the grub config. We&#039;re using &#039;grub-install&#039; to write grub&#039;s boot code *to* the new drive; the config itself lives on /boot (md1), which the RAID already mirrors. That&#039;s easier and less concerning than I thought.&lt;br /&gt;
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first, just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full, but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
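# a sketch of the RAID side of the sdb swap, loosely following the hetzner guide linked above: copy the partition table from the surviving disk, then add each new partition back to its array. The md-to-partition mapping (e.g. md1:2) is an assumption and must be checked against /proc/mdstat first; the helper names are hypothetical and it defaults to DRY_RUN=1.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Default to printing commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# copy_partition_table SRC DST: clone the MBR partition table of the
# healthy disk onto the blank replacement via an sfdisk dump.
copy_partition_table() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "sfdisk -d $1 | sfdisk $2"
    else
        sfdisk -d "$1" | sfdisk "$2"
    fi
}

# add_members DISK MD:PART...: add partition PART of DISK to array /dev/MD.
add_members() {
    local disk="$1" pair
    shift
    for pair in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "mdadm /dev/${pair%%:*} --add ${disk}${pair##*:}"
        else
            mdadm "/dev/${pair%%:*}" --add "${disk}${pair##*:}"
        fi
    done
}
```
## usage sketch (mapping hypothetical): copy_partition_table /dev/sda /dev/sdb; add_members /dev/sdb md1:2 md2:3; then watch the rebuild in /proc/mdstat before installing grub on sdb&lt;br /&gt;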
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL CentOS, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new logs in the apache error log when I hit the site in real-time (bypassing the cache)&lt;br /&gt;
# there are also no new logs in the mariadb error log when I hit the site in real-time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new gets put in the journal, but a lot gets written to the actual log file that the journal mentions (/var/log/mariadb/mariadb.log) (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
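# a small sketch (hypothetical helper) to pull just the InnoDB assertion and fatal-signal lines, with line numbers, out of the log file that mysqld_safe reported (/var/log/mariadb/mariadb.log), instead of paging through every crash dump&lt;br /&gt;
```shell
#!/usr/bin/env bash
# innodb_errors [LOGFILE]: print assertion and fatal-signal lines with
# their line numbers; defaults to the path mysqld_safe logged above.
innodb_errors() {
    grep -n -e 'Assertion failure' -e 'Failing assertion' -e 'got signal' \
        "${1:-/var/log/mariadb/mariadb.log}"
}
```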
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in v5.7. We&#039;re using 5.5.68. hetzner3 uses 5.8.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from 13 hours ago and one from 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
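# quick arithmetic check on the backup ages, as a sketch using GNU date (helper name hypothetical): the listing shows the newest backup at Apr 17 07:48 and the older one at Apr 16 07:52, and date says it is now Apr 17 20:36:56 UTC&lt;br /&gt;
```shell
#!/usr/bin/env bash
# age_hours NOW THEN: hours between two timestamps, rounded to the nearest
# hour. Both timestamps are parsed as UTC (-u), matching the server clock.
age_hours() {
    local now past
    now=$(date -u -d "$1" +%s) || return 1
    past=$(date -u -d "$2" +%s) || return 1
    echo $(( (now - past + 1800) / 3600 ))
}
```
## e.g. age_hours &#039;2025-04-17 20:36:56&#039; &#039;2025-04-16 07:52&#039; prints 37, and the same with &#039;2025-04-17 07:48&#039; prints 13&lt;br /&gt;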
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the not-corrupt tables and only recover the fucked table&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about every week because it keeps breaking since some months ago (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
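# a sketch of the first two checks (hypothetical helper names): flag any md array whose member status shows a missing half (an underscore, as in [U_]), and print each disk&#039;s overall SMART verdict with smartctl -H from smartmontools&lt;br /&gt;
```shell
#!/usr/bin/env bash
# degraded_arrays [MDSTAT]: print each md array whose status field on its
# "blocks" line (e.g. [UU]) contains "_", i.e. a missing mirror member.
degraded_arrays() {
    awk '/^md/ { name = $1 }
         /blocks/ { if ($NF ~ /_/) print name }' "${1:-/proc/mdstat}"
}

# smart_summary DEV...: overall SMART health of each disk (needs root).
smart_summary() {
    local dev
    for dev in "$@"; do
        echo "== $dev"
        smartctl -H "$dev"
    done
}
```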
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
A quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in 5.7. You&#039;re using 5.5 and can&#039;t upgrade because your OS is EOL. hetzner3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
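# before trusting these copies (the originals in /home/b2user/sync* presumably get rotated by the next daily backup), a byte-for-byte comparison is cheap insurance. A sketch, with a hypothetical helper name:&lt;br /&gt;
```shell
#!/usr/bin/env bash
# verify_copies SRC_DIR DST_DIR: byte-compare each regular file in SRC_DIR
# with the same-named file in DST_DIR; print MISMATCH lines, fail if any.
verify_copies() {
    local src rc=0
    for src in "$1"/*; do
        [ -f "$src" ] || continue
        if ! cmp -s "$src" "$2/$(basename "$src")"; then
            echo "MISMATCH: $src"
            rc=1
        fi
    done
    return "$rc"
}
```
## e.g. verify_copies /home/b2user/sync /var/tmp and verify_copies /home/b2user/sync.old /var/tmp (this re-reads all ~44 GB)&lt;br /&gt;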
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refuses to come up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.6, so it should be this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1 – apparently anything it doesn&#039;t like just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically: take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater may still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. So the idea is to dump everything in recovery mode and drop what is corrupt, then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that recovery modes 1, 2, and 3 are relatively safe: only some data on the corrupt individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so a page is just the unit InnoDB moves between disk and memory, and it holds one or more rows; a corrupt page means the rows stored on it may be lost. So I *think* it should be OK to trust a dump where only individual pages are corrupt?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
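# to keep the rules straight, here is a minimal sketch that encodes the guidance above as a lookup. force_recovery_policy is a hypothetical helper, and the verdict wording is mine, not MySQL&#039;s&lt;br /&gt;
```shell
#!/usr/bin/env bash
# force_recovery_policy LEVEL: summarize what the MySQL "Forcing InnoDB
# Recovery" docs say about a given innodb_force_recovery value (0-6).
# Hypothetical helper; returns 1 for values outside the valid range.
force_recovery_policy() {
  local level="$1"
  case "$level" in
    0)     echo "normal" ;;
    [1-3]) echo "relatively safe: dump, then drop corrupt tables" ;;
    [4-6]) echo "dangerous: only on a copy; do not trust the dump" ;;
    *)     echo "invalid"; return 1 ;;
  esac
}
```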
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
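# a minimal sketch for pulling just the verdict out of smartctl -H, so the disks on both servers can be compared at a glance. smart_verdict is a hypothetical helper, and the matched phrase is copied from the outputs above (other smartctl versions may word it differently)&lt;br /&gt;
```shell
#!/usr/bin/env bash
# smart_verdict [FILE]: print only the overall-health verdict from
# `smartctl -H` output (read from FILE, or stdin), e.g. PASSED or FAILED!
smart_verdict() {
  # split on ": " and print whatever follows the fixed phrase
  awk -F': ' '/overall-health self-assessment test result/ { print $2 }' "$@"
}
```
# usage would be something like smartctl -H /dev/sda | smart_verdict, which for our two disks on hetzner2 should print FAILED!&lt;br /&gt;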
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new or used and tested, depending on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option to hot-swap while the system is running or to shut it down first. I&#039;m going to choose shutdown; that&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a hardware RAID. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other. But, christ, if both disks are fucked, then which disk should I ask them to replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm... it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
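# a minimal sketch for filtering the attribute table down to the rows that actually tripped. smart_failed_attrs is a hypothetical helper that assumes the ATA table layout above, where WHEN_FAILED is the 9th whitespace-separated field&lt;br /&gt;
```shell
#!/usr/bin/env bash
# smart_failed_attrs [FILE]: read `smartctl -A` output for an ATA/SATA drive
# and print ID, name, and WHEN_FAILED for every attribute whose WHEN_FAILED
# column is not "-" (e.g. FAILING_NOW or In_the_past).
smart_failed_attrs() {
  awk '
    # only data rows start with a numeric attribute ID
    $1 ~ /^[0-9]+$/ { if ($9 != "-") print $1, $2, $9 }
  ' "$@"
}
```
# for both of our disks, smartctl -A /dev/sda | smart_failed_attrs should print only the 202 Percent_Lifetime_Remain FAILING_NOW row&lt;br /&gt;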
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. Does that mean it&#039;s not *actually* writing corrupt data, and it&#039;s literally just a timer that hit and said &amp;quot;yeah, you should probably replace the disk&amp;quot;??&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs table, nor in the source Wikipedia table https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
# the second hetzner doc refers to NVMe. That&#039;s relevant on hetzner3 but not hetzner2. Anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 (Debian) looks very different from the output for the SATA SSDs on hetzner2 (CentOS)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows the drives on hetzner3 have used 6% and 7% of their rated endurance, whereas on hetzner2 I guess the raw value of 100 for Percent_Lifetime_Remain means we&#039;re at 100%&lt;br /&gt;
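# a minimal sketch for pulling just that wear figure out of the NVMe output. nvme_percent_used is a hypothetical helper that assumes the &amp;quot;Percentage Used:&amp;quot; line format shown above&lt;br /&gt;
```shell
#!/usr/bin/env bash
# nvme_percent_used [FILE]: read `smartctl -A` output for an NVMe drive and
# print the "Percentage Used" value as a bare number (e.g. 6).
nvme_percent_used() {
  # split on ":", then strip spaces and the percent sign from the value
  awk -F: '/^Percentage Used/ { gsub(/[ %]/, "", $2); print $2 }' "$@"
}
```
# usage would be something like smartctl -A /dev/nvme0n1 | nvme_percent_used, which per the output above should print 6&lt;br /&gt;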
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
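# this indicates that our RAID is fine (and the fact that the arrays show up in /proc/mdstat means this is Linux software RAID, not hardware RAID). Two Us (eg `[UU]`) means both mirrors are up; a degraded array would show an underscore for the missing member (eg `[U_]`)&lt;br /&gt;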
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
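# a minimal sketch of the same check done mechanically, useful after the drive swap. md_degraded is a hypothetical helper that prints the name of every array whose status field contains an underscore, and nothing if all arrays are complete&lt;br /&gt;
```shell
#!/usr/bin/env bash
# md_degraded [FILE]: read /proc/mdstat-style text and print each md array
# whose member status (e.g. [UU]) has a missing member (e.g. [U_] or [_U]).
md_degraded() {
  awk '
    /^md[0-9]+ :/     { dev = $1 }   # remember which array we are in
    /\[[U_]*_[U_]*\]/ { print dev }  # status containing at least one "_"
  ' "$@"
}
```
# usage: md_degraded /proc/mdstat (on hetzner2 right now this should print nothing)&lt;br /&gt;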
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into the RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk, so if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
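# a minimal sketch for grabbing just the short serial number (to paste into the hetzner ticket). short_serial is a hypothetical helper that assumes the KEY=VALUE property format shown above&lt;br /&gt;
```shell
#!/usr/bin/env bash
# short_serial [FILE]: read `udevadm info --query=property` output and print
# the value of ID_SERIAL_SHORT.
short_serial() {
  awk -F= '$1 == "ID_SERIAL_SHORT" { print $2 }' "$@"
}
```
# usage would be something like udevadm info --query=property --name sda | short_serial, which per the output above should print 154410FA336C&lt;br /&gt;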
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, then we have the filesystem. It looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
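# a minimal sketch for getting just the backing device for a path, without reading the df table by eye. fs_source_of is a hypothetical helper that assumes GNU coreutils df, which supports --output=source&lt;br /&gt;
```shell
#!/usr/bin/env bash
# fs_source_of PATH: print the device backing the filesystem containing PATH.
fs_source_of() {
  # --output=source prints a "Filesystem" header line, then the device
  df --output=source "$1" | tail -n 1
}
```
# on hetzner2, fs_source_of /var/lib/mysql should print /dev/md2, matching the df output above&lt;br /&gt;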
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it should probably happen on boot, but I couldn&#039;t find it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
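# the same my.cnf edit could also be scripted instead of done by hand in vim. This is a minimal Python 3 sketch with some assumptions. set_mysqld_option is a hypothetical helper, not part of MariaDB. It only handles a plain [mysqld] section. It only matches the option name spelled exactly as given; MariaDB also accepts dashes in place of underscores, and this sketch does not.&lt;br /&gt;

```python
def set_mysqld_option(cnf_text, key, value):
    """Return cnf_text with "key = value" set inside the [mysqld] section.

    An existing uncommented line for the same key is replaced; otherwise the
    line is appended at the end of [mysqld]. If there is no [mysqld] section,
    nothing is added.
    """
    out = []
    in_mysqld = False
    done = False
    for line in cnf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [mysqld] without having seen the key: add it before the next section.
            if in_mysqld and not done:
                out.append(f"{key} = {value}")
                done = True
            in_mysqld = stripped == "[mysqld]"
            out.append(line)
            continue
        # Commented-out lines like "#innodb_force_recovery = 1" never match the key.
        if in_mysqld and not done and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key} = {value}")
            done = True
            continue
        out.append(line)
    if in_mysqld and not done:
        # [mysqld] was the last section in the file.
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"
```

# usage idea: write its output to a scratch file and diff it against the my.cnf.20250417 backup before swapping it in and restarting mariadb. That mirrors the manual workflow above.&lt;br /&gt;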
# it didn&#039;t come up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2; this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time out, but I&#039;m just going to leave it and come back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
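# a quick sanity check of what the extra IP costs. This is a minimal Python sketch that assumes hourly charges accrue until they reach the monthly maximum. That is my reading of the order form text, not something Hetzner spells out here. The constants are the three figures quoted in the order option above.&lt;br /&gt;

```python
HOURLY_EUR = 0.0027      # "€ 0.0027 per hour" from the order form
MONTHLY_CAP_EUR = 1.70   # "€ 1.70 max. per month"
SETUP_EUR = 4.90         # "€ 4.90 once-off setup"


def ip_cost(hours, include_setup=False):
    """Estimated charge in EUR for one billing month (assumes hourly billing capped monthly)."""
    cost = min(hours * HOURLY_EUR, MONTHLY_CAP_EUR)
    if include_setup:
        # The one-off setup fee only applies to the first month.
        cost += SETUP_EUR
    return round(cost, 2)


print(ip_cost(30 * 24))                      # 1.7
print(ip_cost(30 * 24, include_setup=True))  # 6.6
```

# under that assumption the cap is reached after roughly 630 hours (1.70 / 0.0027), so a full month costs €1.70. The first month comes to about €6.60 including setup.&lt;br /&gt;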
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc. across a ton of different articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he can find them more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
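# to double-check the arithmetic, here is a small Python sketch. It reads numeric SecRequestBody* lines like the ones grep found in conf.d/mod_security.conf and converts bytes to MiB. The helper names are hypothetical. One caveat: the 413 line in the ModSecurity log reports a limit of 1000000 bytes (about 0.95 MiB) for the &amp;quot;no files&amp;quot; request body. Going by the ModSecurity v2 reference manual, that wording most likely belongs to the separate SecRequestBodyNoFilesLimit directive, which a grep for &#039;BodyLimit&#039; would not match. So that value may be the one actually being hit.&lt;br /&gt;

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / 1024 / 1024


def request_body_limits_mib(conf_text):
    """Map numeric SecRequestBody* directives found in conf_text to their size in MiB."""
    limits = {}
    for line in conf_text.splitlines():
        parts = line.split()
        # Keep "SecRequestBodyLimit 13107200"-style lines; skip comments and
        # non-numeric settings such as "SecRequestBodyLimitAction Reject".
        if len(parts) == 2 and parts[0].startswith("SecRequestBody") and parts[1].isdigit():
            limits[parts[0]] = bytes_to_mib(int(parts[1]))
    return limits


conf_lines = "    SecRequestBodyLimit 13107200\n    SecRequestBodyLimitAction Reject\n"
print(request_body_limits_mib(conf_lines))  # {'SecRequestBodyLimit': 12.5}
print(round(bytes_to_mib(1000000), 2))      # 0.95, the limit quoted in the 413 log line
```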
# ...&lt;br /&gt;
# so Marcin is stressing urgency to get Catarina a sandbox so she can rebuild osemain using some new theme that&#039;s not broken on the latest version of wordpress, php, etc on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306061</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306061"/>
		<updated>2025-04-27T21:41:18Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Apr 18&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admin&#039;s work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 18, 2025=&lt;br /&gt;
# Marcin sent another email this morning asking why osemain is down too now, and I responded&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the&lt;br /&gt;
&amp;gt; last message&lt;br /&gt;
&lt;br /&gt;
Your whole database service was down, and it won&#039;t start. You have a varnish cache that stores a subset of pages in-memory for 24 hours. That&#039;s probably what you saw.&lt;br /&gt;
&lt;br /&gt;
I took webservers down yesterday to prevent the possibility of them corrupting the database worse, if it manages to start in recovery mode.&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt; go straight to migration to Hetzner 3.&lt;br /&gt;
&lt;br /&gt;
If you want high uptime, I don&#039;t recommend migrating to hetzner3 at this time. It&#039;s still not fully provisioned, and I actively work on it like a dev server. Which means I&#039;ll be restarting it and its services. It&#039;s not a safe place for production. That&#039;s why the wiki is the *last* service to migrate.&lt;br /&gt;
&lt;br /&gt;
Status update: yesterday I investigated to see if your underlying storage (disk, filesystem, or RAID) are failing, which might cause corruption. The filesystems were fine. RAID didn&#039;t have errors. The SMART logs on the disk said both of your two mirrored drives are failing and should be replaced within 24 hours. But I don&#039;t think that&#039;s evidence of corruption; I think it&#039;s just a timer that&#039;s alerting us to the possibility that the disks will fail soon. afaict, disk replacement is free (from Hetzner) but not trivial and high-risk. I&#039;ll postpone until after restoring the database.&lt;br /&gt;
&lt;br /&gt;
Likely not all of your database is corrupt. We *could* restore from backup, but I don&#039;t recommend that -- as you only have daily backups, and likely you&#039;ll have data loss.&lt;br /&gt;
&lt;br /&gt;
Yesterday I put the database in two recovery modes and was unable to get it to start. My plan is to continue to follow this guide, to see if I can find out which databases/tables/pages are corrupt and which are not. That way we can restore only the data we need from backups and minimize data loss&lt;br /&gt;
&lt;br /&gt;
 * https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&lt;br /&gt;
I have to go to the hospital today. If I have time, I will try to continue later tonight. And I plan to work on this over the weekend. I hope to have your sites back online early next week.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cheers,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 02:58, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; It seems that the ose main website was up when I wrote the last message -&lt;br /&gt;
&amp;gt; but now I&#039;m trying to post the blog posts and the main site appears to be&lt;br /&gt;
&amp;gt; down. Is our whole backend crashing?  Or is that something you are doing on&lt;br /&gt;
&amp;gt; your end?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; On Thu, Apr 17, 2025 at 6:41 PM Marcin Jakubowski &amp;lt;&lt;br /&gt;
&amp;gt; REDACTED@opensourceecology.org&amp;gt; wrote:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Can we prioritize the wiki at this point to migrate the wiki right over to&lt;br /&gt;
&amp;gt;&amp;gt; Hetzner 3 with the  current up to date software, using the wiki backup from&lt;br /&gt;
&amp;gt;&amp;gt; 2 days ago, which is before the crash?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; The wiki was working at least the first part of yesterday, and I noticed&lt;br /&gt;
&amp;gt;&amp;gt; the crash at about 11 PM CST yesterday. Thus taking the backup from 4/15/25&lt;br /&gt;
&amp;gt;&amp;gt; should solve this? Ie, forget about trying to fix on Hetzner 2, go straight&lt;br /&gt;
&amp;gt;&amp;gt; to migration to Hetzner 3. Is that consistent with a possible shift in your&lt;br /&gt;
&amp;gt;&amp;gt; plans, or does that throw off the entire process of migration? OSE stands&lt;br /&gt;
&amp;gt;&amp;gt; stuck without it, I will have to do everything in Google docs if I don&#039;t&lt;br /&gt;
&amp;gt;&amp;gt; have wiki access, and i am justvputtingvout the announcent and recruiting.&lt;br /&gt;
&amp;gt;&amp;gt; I can switcj ro more publishing on the website, assuming that all works.&lt;br /&gt;
&amp;gt;&amp;gt; Please tell me what would be your proposed solution and how quickly you&lt;br /&gt;
&amp;gt;&amp;gt; think we can get back up to a functioning wiki, based on your schedule of&lt;br /&gt;
&amp;gt;&amp;gt; availability to work on this, so I can plan accordingly.  This is a much&lt;br /&gt;
&amp;gt;&amp;gt; higher priority than doing any of the main website migration.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Thanks,&lt;br /&gt;
&amp;gt;&amp;gt; Marcin &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so back to trying to figure out the corruption of the mariadb&lt;br /&gt;
# looks like the attempt to start it in recovery mode 2 fails after 10 minutes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because a fatal signal was delivered to the control process. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    10m0.435s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the tail of the db log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tail -f /var/log/mariadb/mariadb.log&lt;br /&gt;
250417 23:06:00  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:01  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:02  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:03  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:05  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:06  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:07  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:08  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 23:06:09  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so there&#039;s one more recovery level we can try before it becomes destructive: 3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 3&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
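# for reference, here is a small Python lookup of the innodb_force_recovery levels as named in the MySQL 5.7 page linked in the diff above. I am assuming MariaDB 5.5 / XtraDB uses the same numbering. Per that manual, a larger value includes the behavior of the smaller ones. The manual also warns that values of 4 or greater can permanently corrupt data files, which is why 3 is the last level to try before things get destructive. Level 2 (SRV_FORCE_NO_BACKGROUND) stops the master and purge threads from running, which may be related to the background-thread wait seen here.&lt;br /&gt;

```python
# innodb_force_recovery levels and their internal names, per the MySQL 5.7 manual
FORCE_RECOVERY_LEVELS = {
    1: "SRV_FORCE_IGNORE_CORRUPT",
    2: "SRV_FORCE_NO_BACKGROUND",
    3: "SRV_FORCE_NO_TRX_UNDO",
    4: "SRV_FORCE_NO_IBUF_MERGE",
    5: "SRV_FORCE_NO_UNDO_LOG_SCAN",
    6: "SRV_FORCE_NO_LOG_REDO",
}


def describe_level(level):
    """Return (name, may_corrupt) for an innodb_force_recovery level from 1 to 6."""
    if level not in FORCE_RECOVERY_LEVELS:
        raise ValueError(f"innodb_force_recovery level must be 1-6, got {level}")
    # The manual warns that values of 4 or greater can permanently corrupt data files.
    return FORCE_RECOVERY_LEVELS[level], level >= 4


for lvl in sorted(FORCE_RECOVERY_LEVELS):
    name, risky = describe_level(lvl)
    print(lvl, name, "(can permanently corrupt data)" if risky else "")
```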
# and gave it a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# damn, looks like it&#039;s stuck on the same thing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:33:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:33:17 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 20076 ...&lt;br /&gt;
250418 19:33:17 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:33:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:33:17 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:33:17 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:33:17 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:33:17 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:33:17 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:33:17  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:33:17  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:33:18  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:19  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:33:20  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# a couple of posts suggest this infinite loop is caused by the default innodb_purge_threads=1, and recommend setting it to 0&lt;br /&gt;
## https://serverfault.com/questions/851342/mysql-crashed-and-not-starting-even-after-adding-innodb-force-recovery&lt;br /&gt;
## https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# I tried to cut off the systemctl restart early, but it&#039;s just stuck. I guess I just have to wait 10 minutes.&lt;br /&gt;
# anyway, I set the recovery back down to 2 and added an innodb_purge_threads = 0 line; I&#039;ll try that when it&#039;s not blocked&lt;br /&gt;
# meanwhile, I read up on innodb_purge_threads, which is documented here https://dev.mysql.com/doc/refman/8.4/en/innodb-purge-configuration.html&lt;br /&gt;
# oh shit, that worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m2.102s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
[root@opensourceecology etc]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 19:44:30 UTC; 19s ago&lt;br /&gt;
  Process: 22469 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 22433 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 22468 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─22468 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─22693 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-...&lt;br /&gt;
&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mariadb-prepare-db-dir[22433]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 19:44:28 opensourceecology.org mysqld_safe[22468]: 250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 19:44:30 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the logs are being spammed with these last 5 lines a bunch; I guess something is still trying to access the db?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 19:44:28 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 19:44:28 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 22693 ...&lt;br /&gt;
250418 19:44:28 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 19:44:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 19:44:28 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 19:44:28 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 19:44:28 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 19:44:28 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 19:44:28 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 19:44:28  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250418 19:44:28  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250418 19:44:28  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 19:44:29 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883505166&lt;br /&gt;
250418 19:44:29 InnoDB: !!! innodb_force_recovery is set to 2 !!!&lt;br /&gt;
250418 19:44:29 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 19:44:29 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 19:44:29 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
InnoDB: A new raw disk partition was initialized or&lt;br /&gt;
InnoDB: innodb_force_recovery is on: we do not allow&lt;br /&gt;
InnoDB: database modifications by the user. Shut down&lt;br /&gt;
InnoDB: mysqld and edit my.cnf so that newraw is replaced&lt;br /&gt;
InnoDB: with raw, and innodb_force_... is removed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, the spam stopped. maybe just some startup thing.&lt;br /&gt;
# I was hoping at startup it would tell us which DBs/tables/pages were corrupt; I guess we have to initiate a scan or something.&lt;br /&gt;
# this guide doesn&#039;t say anything about that https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
# but this one recommends running `mysqlcheck` https://community.spiceworks.com/t/how-to-recover-crashed-innodb-tables-on-mysql-database-server/1013051&lt;br /&gt;
# this took about a minute to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# good news; looks like the wiki isn&#039;t fucked. it&#039;s just osemain, oswh, and cacti. restoring those from backups is probably not going to cause any data loss&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# head mysqlcheck.20250418.log &lt;br /&gt;
3dp_db.wp_commentmeta                              OK&lt;br /&gt;
3dp_db.wp_comments                                 OK&lt;br /&gt;
3dp_db.wp_links                                    OK&lt;br /&gt;
3dp_db.wp_masterslider_options                     OK&lt;br /&gt;
3dp_db.wp_masterslider_sliders                     OK&lt;br /&gt;
3dp_db.wp_options                                  OK&lt;br /&gt;
3dp_db.wp_postmeta                                 OK&lt;br /&gt;
3dp_db.wp_posts                                    OK&lt;br /&gt;
3dp_db.wp_revslider_css                            OK&lt;br /&gt;
3dp_db.wp_revslider_layer_animations               OK&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418.log &lt;br /&gt;
cacti_db.automation_ips&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.automation_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_cache&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.data_source_stats_hourly_last&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
cacti_db.poller_output_boost_processes&lt;br /&gt;
note     : The storage engine for the table doesn&#039;t support check&lt;br /&gt;
osemain_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
osemain_s_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
oswh_db.wp_options&lt;br /&gt;
warning  : 1 client is using or hasn&#039;t closed the table properly&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
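# A minimal sketch for pulling only the problem tables out of a mysqlcheck log. It assumes the two-part format seen above: a bare &amp;lt;code&amp;gt;db.table&amp;lt;/code&amp;gt; line followed by &amp;lt;code&amp;gt;note&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;warning&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;error&amp;lt;/code&amp;gt; lines. Unlike &amp;lt;code&amp;gt;grep -vi OK&amp;lt;/code&amp;gt;, it keeps each table and its message on one line, and it won&#039;t hide a problem table whose name happens to contain ok. The function name &amp;lt;code&amp;gt;mysqlcheck_problems&amp;lt;/code&amp;gt; is hypothetical. Note that the storage-engine-doesn&#039;t-support-check notes are likely just informational (that note typically appears for engines such as MEMORY that don&#039;t implement CHECK TABLE) rather than proof of corruption:&lt;br /&gt;
```shell
# Hypothetical helper: print "table<TAB>level<TAB>message" for every table in a
# mysqlcheck log (file argument, or stdin) that has a non-OK message.
mysqlcheck_problems() {
  awk '
    # "db.table      OK" on a single line: nothing to report
    NF == 2 { if ($2 == "OK") next }
    # a bare "db.table" line introduces one or more message lines
    NF == 1 { if (index($1, ".")) { table = $1; next } }
    # message lines look like "warning  : text", "note     : text", "error    : text"
    /^[a-z]+ +: / {
      level = $1
      msg = $0
      sub(/^[a-z]+ +: /, "", msg)
      # a trailing "status : OK" after a warning is not itself a problem
      if (level == "status" ) { if (msg == "OK") next }
      printf "%s\t%s\t%s\n", table, level, msg
    }
  ' "$@"
}
```
# hypothetical usage: &amp;lt;code&amp;gt;mysqlcheck_problems mysqlcheck.20250418.log&amp;lt;/code&amp;gt;; on the log above it should list the same cacti_db notes and wp_options warnings as the grep did, one table per line&lt;br /&gt;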
# let&#039;s go ahead and take a mysqldump now, including the corrupt data. then I&#039;ll drop these three databases and restore from backups&lt;br /&gt;
## cacti_db&lt;br /&gt;
## osemain_db&lt;br /&gt;
## oswh_db&lt;br /&gt;
# I sent Marcin a status update email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I was able to start your database in recovery mode, and I see the following databases have corrupt tables:&lt;br /&gt;
&lt;br /&gt;
1. osemain&lt;br /&gt;
2. cacti&lt;br /&gt;
3. oswh&lt;br /&gt;
&lt;br /&gt;
Good news that the wiki isn&#039;t in that list. And that those particular corrupt DBs don&#039;t change much, so recovering just those databases from backups should result in an acceptable data loss, if any.&lt;br /&gt;
&lt;br /&gt;
I&#039;ll keep you updated.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I made the post-corruption mysqldump backup&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass --all-databases | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    2m48.845s&lt;br /&gt;
user    3m19.170s&lt;br /&gt;
sys     0m2.023s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# ls mysqldump*&lt;br /&gt;
mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz &lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
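# A minimal sketch (not what was run here) for sanity-checking a compressed dump like this before relying on it. It assumes GNU gzip and that mysqldump ran with its default comments enabled, in which case a complete dump ends with a &amp;lt;code&amp;gt;-- Dump completed&amp;lt;/code&amp;gt; line. A dump that aborted part-way, for example on a corrupt table, would be missing that trailer. The function name &amp;lt;code&amp;gt;verify_dump&amp;lt;/code&amp;gt; is hypothetical:&lt;br /&gt;
```shell
# Hypothetical helper: returns 0 only if the gzip stream is intact AND the
# decompressed dump ends with mysqldump's "-- Dump completed" trailer.
verify_dump() {
  local dump=$1
  # gzip -t decompresses in memory and checks the CRC without writing output.
  gzip -t "$dump" || return 1
  # A dump that stopped early (e.g. on an error) lacks the final trailer line.
  gzip -dc "$dump" | tail -n 1 | grep -q '^-- Dump completed'
}
```
# this reads the whole file twice, so on the 1.3G dump above expect it to take a while&lt;br /&gt;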
# now let&#039;s drop those three databases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 14&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE cacti_db;&lt;br /&gt;
Query OK, 108 rows affected (0.38 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_db;&lt;br /&gt;
Query OK, 22 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oswh_db;&lt;br /&gt;
Query OK, 12 rows affected (0.03 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
15 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that looked good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# recovery mode isn&#039;t going to let us INSERT to recover data from backups, so let&#039;s take it out of recovery mode and see if the db will start&lt;br /&gt;
# nah, it failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m2.805s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# logs are the same, I think?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:10:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:10:04 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 24305 ...&lt;br /&gt;
250418 20:10:04 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:10:04 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:10:04 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:10:04 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:10:04 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:10:04 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:10:04 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:10:04  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:10:04  InnoDB: Assertion failure in thread 140076605044480 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:10:04 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x560180c61cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x560180875975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f664031f630]&lt;br /&gt;
:0(__GI_raise)[0x7f663ea46387]&lt;br /&gt;
:0(__GI_abort)[0x7f663ea47a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x560180a0a45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x560180a0afa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x560180b0d504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x560180b02487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x560180a0d17d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x560180a010f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6640317ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f663eb0eb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:10:04 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I re-enabled recovery mode, but this time at level 1. systemd reported it as started, but mysqld keeps hitting the same assertion failure and getting restarted by mysqld_safe; this loop gets spammed to the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250418 20:11:42 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 625883708456&lt;br /&gt;
250418 20:11:42 InnoDB: !!! innodb_force_recovery is set to 1 !!!&lt;br /&gt;
250418 20:11:42 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:11:42 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
250418 20:11:42  InnoDB: Assertion failure in thread 140282494781184 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250418 20:11:42 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed, &lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to &lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x55e2d6dbbcad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x55e2d69cf975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f962fbdc630]&lt;br /&gt;
:0(__GI_raise)[0x7f962e303387]&lt;br /&gt;
:0(__GI_abort)[0x7f962e304a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x55e2d6b6445f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638fa4)[0x55e2d6b64fa4]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x55e2d6c67504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x55e2d6c5c487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x55e2d6b6717d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62e83c)[0x55e2d6b5a83c]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f962fbd4ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f962e3cbb0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250418 20:11:42 mysqld_safe Number of processes running now: 0&lt;br /&gt;
250418 20:11:42 mysqld_safe mysqld restarted&lt;br /&gt;
250418 20:11:42 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 27371 ...&lt;br /&gt;
250418 20:11:42 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:11:42 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:11:42 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:11:42 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:11:42 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:11:42 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250418 20:11:42 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250418 20:11:42  InnoDB: Waiting for the background threads to start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
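# For reference, a minimal sketch of the &amp;lt;code&amp;gt;my.cnf&amp;lt;/code&amp;gt; stanza being toggled in these steps. It assumes the setting sits in the &amp;lt;code&amp;gt;[mysqld]&amp;lt;/code&amp;gt; section of &amp;lt;code&amp;gt;/etc/my.cnf&amp;lt;/code&amp;gt;. Higher values skip more of InnoDB&#039;s startup work. The MySQL docs linked above warn that values of 4 or greater can permanently corrupt data files. On this MariaDB 5.5 server, recovery mode also refused writes (see the failed restore below):&lt;br /&gt;
```ini
[mysqld]
# 0 (the default) means normal operation; 1 through 6 are progressively more
# aggressive forced-recovery levels. This log tried levels 1 and 2. Delete the
# line (or set it to 0) and restart mariadb to return to normal read/write mode.
innodb_force_recovery = 1
```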
# well, systemd &#039;&#039;says&#039;&#039; it&#039;s running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m5.156s&lt;br /&gt;
user    0m0.008s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: active (running) since Fri 2025-04-18 20:11:07 UTC; 13s ago&lt;br /&gt;
  Process: 24459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 24423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 24458 (mysqld_safe)&lt;br /&gt;
   CGroup: /system.slice/mariadb.service&lt;br /&gt;
		   ├─24458 /bin/sh /usr/bin/mysqld_safe --basedir=/usr&lt;br /&gt;
		   └─25620 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/v...&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mariadb-prepare-db-dir[24423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:11:02 opensourceecology.org mysqld_safe[24458]: 250418 20:11:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:11:07 opensourceecology.org systemd[1]: Started MariaDB database server.&lt;br /&gt;
&lt;br /&gt;
real    0m0.012s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can&#039;t connect to it with mysqlcheck&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log                              &lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I set it back to recovery mode 2, restarted, and tried the mysqlcheck again&lt;br /&gt;
# huh, all lines say OK&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418&lt;br /&gt;
mysqlcheck.20250418_201348.log  mysqlcheck.20250418.log&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# less mysqlcheck.20250418_201348.log&lt;br /&gt;
mysqlcheck: Got error: 2002: Can&#039;t connect to local MySQL server through socket &#039;/var/lib/mysql/mysql.sock&#039; (111) when trying to connect&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqlcheck --all-databases -u root -p$mysqlPass &amp;amp;&amp;gt; mysqlcheck.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).log&lt;br /&gt;
&lt;br /&gt;
real    0m11.597s&lt;br /&gt;
user    0m0.010s&lt;br /&gt;
sys     0m0.009s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# grep -vi OK mysqlcheck.20250418_201559.log &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, now I&#039;m wondering if I should have run CHECK TABLE and REPAIR TABLE rather than just dropping them https://dev.mysql.com/doc/refman/8.4/en/myisam-table-close.html&lt;br /&gt;
# I&#039;m going to restore from the backup and then see if I can do that&lt;br /&gt;
# oh, right, we can&#039;t INSERT in recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1030 (HY000) at line 91: Got error -1 from storage engine&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, fuck. it still won&#039;t start outside recovery mode, and the logs don&#039;t say why beyond that assertion failure. The good news is that I was able to get a db dump. maybe I can copy this huge dump over to some other server for repair and then copy it back?&lt;br /&gt;
# we should have backups. I&#039;m going to just purge all the non-system databases and see if we can get this thing started at all&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db d3ddb;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;d3ddb&#039; at line 1&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE 3dp_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE d3d_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE fef_db;&lt;br /&gt;
Query OK, 12 rows affected (0.06 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE microfactory_db;&lt;br /&gt;
Query OK, 20 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_db;&lt;br /&gt;
Query OK, 21 rows affected (0.09 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_stabing_db;&lt;br /&gt;
ERROR 1008 (HY000): Can&#039;t drop database &#039;obi_stabing_db&#039;; database doesn&#039;t exist&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE oseforum_db;&lt;br /&gt;
Query OK, 35 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osemain_s_db;&lt;br /&gt;
Query OK, 20 rows affected (0.04 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE osewiki_db;&lt;br /&gt;
Query OK, 59 rows affected (0.31 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE phplist_db;&lt;br /&gt;
Query OK, 42 rows affected (0.16 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE seedhome_db;&lt;br /&gt;
Query OK, 12 rows affected (0.05 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE store_db;&lt;br /&gt;
Query OK, 36 rows affected (0.11 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE obi_staging_db;&lt;br /&gt;
Query OK, 21 rows affected (0.08 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
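# A less typo-prone way to do this is to generate the statements from the server&#039;s own list and review them before running anything (note the obi_stabing_db and &amp;lt;code&amp;gt;3dp_db d3ddb&amp;lt;/code&amp;gt; slips above). Here is a minimal sketch. The function name &amp;lt;code&amp;gt;drop_statements&amp;lt;/code&amp;gt; is hypothetical, and it only prints SQL; it does not execute anything:&lt;br /&gt;
```shell
# Hypothetical helper: read database names on stdin (one per line) and print a
# DROP DATABASE statement for each non-system one. Nothing is executed.
drop_statements() {
  local db
  while IFS= read -r db; do
    # skip blank lines and the system schemas this log kept
    case "$db" in
      ''|information_schema|mysql|performance_schema) continue ;;
    esac
    # backtick-quote each name defensively
    printf 'DROP DATABASE `%s`;\n' "$db"
  done
}
```
# hypothetical usage: &amp;lt;code&amp;gt;mysql -uroot -p$mysqlPass -N -B -e &amp;quot;SHOW DATABASES&amp;quot; | drop_statements&amp;lt;/code&amp;gt;. Read the output, and only pipe it into &amp;lt;code&amp;gt;mysql&amp;lt;/code&amp;gt; once it looks right. (&amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; skips the column header; &amp;lt;code&amp;gt;-B&amp;lt;/code&amp;gt; gives plain output with one name per line.)&lt;br /&gt;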
# even after that, it still won&#039;t start :&#039;(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
&lt;br /&gt;
real    0m4.863s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology etc]# time systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Fri 2025-04-18 20:34:47 UTC; 14s ago&lt;br /&gt;
  Process: 18459 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 18458 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 18423 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 18458 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mariadb-prepare-db-dir[18423]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 18 20:34:46 opensourceecology.org mysqld_safe[18458]: 250418 20:34:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 18 20:34:47 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I purge those three system-level DBs, I want to confirm they&#039;re in our backups&lt;br /&gt;
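# as I feared, two of them are missing: there&#039;s no &amp;lt;code&amp;gt;CREATE DATABASE&amp;lt;/code&amp;gt; for information_schema or performance_schema (mysql is there)&lt;br /&gt;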
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zgrep -E &#039;CREATE DATABASE&#039; mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | grep &#039;IF NOT EXISTS&#039; | grep -E &#039;^.{,100}$&#039;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `3dp_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `cacti_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `d3d_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `fef_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `microfactory_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `obi_staging_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oseforum_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osemain_s_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `osewiki_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `phplist_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `seedhome_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `store_db` /*!40100 DEFAULT CHARACTER SET latin1 */;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
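# The check above can be scripted. Here is a minimal sketch that reads database names on stdin and prints the ones with no &amp;lt;code&amp;gt;CREATE DATABASE&amp;lt;/code&amp;gt; statement in a gzipped dump. The function name &amp;lt;code&amp;gt;missing_from_dump&amp;lt;/code&amp;gt; is hypothetical. As far as I know, mysqldump skips information_schema and performance_schema by default, so their absence is expected rather than a sign of a bad dump:&lt;br /&gt;
```shell
# Hypothetical helper: read database names on stdin (one per line) and print
# each one that has no CREATE DATABASE statement in the gzipped dump "$1".
missing_from_dump() {
  local dump=$1 dumped db
  # mysqldump writes e.g.: CREATE DATABASE /*!32312 IF NOT EXISTS*/ `oswh_db` ...
  # so capture the first backtick-quoted name on each CREATE DATABASE line.
  dumped=$(gzip -dc "$dump" | sed -n 's/^CREATE DATABASE [^`]*`\([^`]*\)`.*$/\1/p')
  while IFS= read -r db; do
    [ -n "$db" ] || continue
    # -x: match the whole line; -F: treat the name as a literal string
    if ! printf '%s\n' "$dumped" | grep -qxF -- "$db"; then
      printf '%s\n' "$db"
    fi
  done
}
```
# hypothetical usage: &amp;lt;code&amp;gt;mysql -uroot -p$mysqlPass -N -B -e &amp;quot;SHOW DATABASES&amp;quot; | missing_from_dump mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&amp;lt;/code&amp;gt;&lt;br /&gt;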
# according to this, information_schema is essentially a cache that gets created &amp;amp; destroyed every time mysql is restarted, so we should be ok to lose that https://stackoverflow.com/questions/15306132/information-schema-error-when-restoring-database-dump&lt;br /&gt;
# I&#039;m going to manually dump these three anyway, or at least try to&lt;br /&gt;
# well, I was able to back up one of the three (mysql)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass information_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_information_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz &lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;root&#039;@&#039;localhost&#039; to database &#039;information_schema&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.010s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.008s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass mysql | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_mysql.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
&lt;br /&gt;
real    0m0.142s&lt;br /&gt;
user    0m0.155s&lt;br /&gt;
sys     0m0.010s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time mysqldump -uroot -p$mysqlPass performance_schema | gzip -c &amp;gt; mysqldump-after-corruption-while-in-recovery-mode_performance_schema.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).sql.gz&lt;br /&gt;
mysqldump: Got error: 1142: &amp;quot;SELECT,LOCK TABL command denied to user &#039;root&#039;@&#039;localhost&#039; for table &#039;cond_instances&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.009s&lt;br /&gt;
user    0m0.009s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# mysql looks good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh mysqldump-after-corruption-while-in-recovery-mode*&lt;br /&gt;
1.3G    mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_information_schema.20250418_205054.sql.gz&lt;br /&gt;
716K    mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz&lt;br /&gt;
4.0K    mysqldump-after-corruption-while-in-recovery-mode_performance_schema.20250418_205157.sql.gz&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m just going to move this whole db dir out of the way and see if we can start it fresh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cd /var/lib&lt;br /&gt;
[root@opensourceecology lib]# du -sh mysql/&lt;br /&gt;
6.5G    mysql/&lt;br /&gt;
[root@opensourceecology lib]# ls -lah | grep -i mysql&lt;br /&gt;
drwxr-xr-x   4 mysql   mysql   4.0K Apr 18 20:50 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# systemctl stop mariadb&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mv mysql mysql.20250418&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# mkdir mysql&lt;br /&gt;
[root@opensourceecology lib]# chown mysql:mysql mysql&lt;br /&gt;
[root@opensourceecology lib]# chmod 0755 mysql&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 8.0K&lt;br /&gt;
drwxr-xr-x   2 mysql mysql 4.0K Apr 18 20:55 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s started outside recovery mode now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# time systemctl restart mariadb&lt;br /&gt;
&lt;br /&gt;
real    0m3.550s&lt;br /&gt;
user    0m0.007s&lt;br /&gt;
sys     0m0.012s&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
250418 20:55:06 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
250418 20:56:23 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250418 20:56:23 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 21252 ...&lt;br /&gt;
250418 20:56:23 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250418 20:56:23 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250418 20:56:23 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250418 20:56:23 InnoDB: Using Linux native AIO&lt;br /&gt;
250418 20:56:23 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250418 20:56:23 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
InnoDB: The first specified data file ./ibdata1 did not exist:&lt;br /&gt;
InnoDB: a new database to be created!&lt;br /&gt;
250418 20:56:23  InnoDB: Setting file ./ibdata1 size to 10 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile0 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile0 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
250418 20:56:23  InnoDB: Log file ./ib_logfile1 did not exist: new to be created&lt;br /&gt;
InnoDB: Setting log file ./ib_logfile1 size to 5 MB&lt;br /&gt;
InnoDB: Database physically writes the file full: wait...&lt;br /&gt;
InnoDB: Doublewrite buffer not found: creating new&lt;br /&gt;
InnoDB: Doublewrite buffer created&lt;br /&gt;
InnoDB: 127 rollback segment(s) active.&lt;br /&gt;
InnoDB: Creating foreign key constraint system tables&lt;br /&gt;
InnoDB: Foreign key constraint system tables created&lt;br /&gt;
250418 20:56:23  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250418 20:56:24 Percona XtraDB (http://www.percona.com) 5.5.61-MariaDB-38.13 started; log sequence number 0&lt;br /&gt;
250418 20:56:24 [Note] Plugin &#039;FEEDBACK&#039; is disabled.&lt;br /&gt;
250418 20:56:24 [Note] Event Scheduler: Loaded 0 events&lt;br /&gt;
250418 20:56:24 [Note] /usr/libexec/mysqld: ready for connections.&lt;br /&gt;
Version: &#039;5.5.68-MariaDB&#039;  socket: &#039;/var/lib/mysql/mysql.sock&#039;  port: 0  MariaDB Server&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it created all these files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# ls -lah mysql&lt;br /&gt;
total 29M&lt;br /&gt;
drwxr-xr-x   5 mysql mysql 4.0K Apr 18 20:56 .&lt;br /&gt;
drwxr-xr-x. 42 root  root  4.0K Apr 18 20:55 ..&lt;br /&gt;
-rw-rw----   1 mysql mysql  16K Apr 18 20:56 aria_log.00000001&lt;br /&gt;
-rw-rw----   1 mysql mysql   52 Apr 18 20:56 aria_log_control&lt;br /&gt;
-rw-rw----   1 mysql mysql  18M Apr 18 20:56 ibdata1&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile0&lt;br /&gt;
-rw-rw----   1 mysql mysql 5.0M Apr 18 20:56 ib_logfile1&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 mysql&lt;br /&gt;
srwxrwxrwx   1 mysql mysql    0 Apr 18 20:56 mysql.sock&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 performance_schema&lt;br /&gt;
drwx------   2 mysql mysql 4.0K Apr 18 20:56 test&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# starting with an empty datadir also created fresh grant tables, so our old mysql root password is gone; I can&#039;t log in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
ERROR 1045 (28000): Access denied for user &#039;root&#039;@&#039;localhost&#039; (using password: YES)&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I got in by starting mysqld with --skip-grant-tables, then set the root password&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mysqld_safe --skip-grant-tables --skip-networking &amp;amp;&lt;br /&gt;
mysql -u root&lt;br /&gt;
use mysql;&lt;br /&gt;
update user set password=PASSWORD(&amp;quot;new-password&amp;quot;) where User=&#039;root&#039;;&lt;br /&gt;
flush privileges;&lt;br /&gt;
exit&lt;br /&gt;
jobs -l&lt;br /&gt;
# kill mysqld_safe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I can log in and see the three system databases, plus one named test&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# usually this is where I&#039;d run the mysql hardening script (mysql_secure_installation), but let&#039;s just drop the test database manually and restore from our dumps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 4&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| test               |&lt;br /&gt;
+--------------------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; DROP DATABASE test;&lt;br /&gt;
Query OK, 0 rows affected (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; exit&lt;br /&gt;
Bye&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first let&#039;s just restore the &#039;mysql&#039; database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# zcat mysqldump-after-corruption-while-in-recovery-mode_mysql.20250418_205149.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that appears to have worked; our users are present now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [mysql]&amp;gt; select User from user limit 10;&lt;br /&gt;
+------------------+&lt;br /&gt;
| User             |&lt;br /&gt;
+------------------+&lt;br /&gt;
| oseforum_user    |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| 3dp_user         |&lt;br /&gt;
| cacti_user       |&lt;br /&gt;
| d3d_user         |&lt;br /&gt;
| fef_user         |&lt;br /&gt;
| microfactory_usr |&lt;br /&gt;
| munin_user       |&lt;br /&gt;
| obi2_user        |&lt;br /&gt;
| obi3_user        |&lt;br /&gt;
+------------------+&lt;br /&gt;
10 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [mysql]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I gave it a restart, and ensured it&#039;s still working. Great.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 2&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
+--------------------+&lt;br /&gt;
3 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s restore the rest – including even our corrupt databases – and see if it works or breaks&lt;br /&gt;
# the import took just over 11.5 minutes, and afterwards /var/lib/mysql was ~6.8G&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# time zcat mysqldump-after-corruption-while-in-recovery-mode.20250418_200122.sql.gz | mysql -uroot -p$mysqlPass mysql&lt;br /&gt;
&lt;br /&gt;
real    11m36.530s&lt;br /&gt;
user    1m52.944s&lt;br /&gt;
sys     0m3.593s&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# du -sh /var/lib/mysql&lt;br /&gt;
6.8G    /var/lib/mysql&lt;br /&gt;
[root@opensourceecology dbFail.20250417]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m still able to connect, and now I see all our DBs – including the ones it said were corrupt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 6&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# woah, I gave it a restart, and it came back fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl restart mariadb&lt;br /&gt;
[root@opensourceecology lib]# mysql -uroot -p$mysqlPass&lt;br /&gt;
Welcome to the MariaDB monitor.  Commands end with ; or \g.&lt;br /&gt;
Your MariaDB connection id is 3&lt;br /&gt;
Server version: 5.5.68-MariaDB MariaDB Server&lt;br /&gt;
&lt;br /&gt;
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.&lt;br /&gt;
&lt;br /&gt;
Type &#039;help;&#039; or &#039;\h&#039; for help. Type &#039;\c&#039; to clear the current input statement.&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; show databases;&lt;br /&gt;
+--------------------+&lt;br /&gt;
| Database           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
| information_schema |&lt;br /&gt;
| 3dp_db             |&lt;br /&gt;
| cacti_db           |&lt;br /&gt;
| d3d_db             |&lt;br /&gt;
| fef_db             |&lt;br /&gt;
| microfactory_db    |&lt;br /&gt;
| mysql              |&lt;br /&gt;
| obi_db             |&lt;br /&gt;
| obi_staging_db     |&lt;br /&gt;
| oseforum_db        |&lt;br /&gt;
| osemain_db         |&lt;br /&gt;
| osemain_s_db       |&lt;br /&gt;
| osewiki_db         |&lt;br /&gt;
| oswh_db            |&lt;br /&gt;
| performance_schema |&lt;br /&gt;
| phplist_db         |&lt;br /&gt;
| seedhome_db        |&lt;br /&gt;
| store_db           |&lt;br /&gt;
+--------------------+&lt;br /&gt;
18 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [(none)]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess we fixed it with no data loss?&lt;br /&gt;
# let&#039;s bring up the web servers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# systemctl start httpd&lt;br /&gt;
[root@opensourceecology lib]# systemctl start varnish&lt;br /&gt;
[root@opensourceecology lib]# systemctl start nginx&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the wiki loads now&lt;br /&gt;
# so does osemain&lt;br /&gt;
# I&#039;d say we&#039;re back in business&lt;br /&gt;
# I sent an email to Marcin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I think all your sites are back now.&lt;br /&gt;
&lt;br /&gt;
I was able to restore all of your databases from a dump of the database in recovery mode. So nothing needed to be restored from backups.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you see any issues. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now that Marcin has ssh access on the server again, I wonder if he has permission to execute `reboot`; that would be better for him than logging into the Hetzner WUI and doing hard resets, which likely caused this corruption&lt;br /&gt;
# at the risk of taking everything down after I just told Marcin that everything is up, I&#039;m going to try it&lt;br /&gt;
# looks like it won&#039;t let him reboot if other users are logged in&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ reboot&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
User maltfield is logged in on sshd.&lt;br /&gt;
Please retry operation after closing inhibitors and logging out other users.&lt;br /&gt;
Alternatively, ignore inhibitors and users with &#039;systemctl reboot -i&#039;.&lt;br /&gt;
[marcin@opensourceecology ~]$ systemctl reboot -i&lt;br /&gt;
==== AUTHENTICATING FOR org.freedesktop.login1.reboot-multiple-sessions ===&lt;br /&gt;
Authentication is required for rebooting the system while other users are logged in.&lt;br /&gt;
Multiple identities can be used for authentication:&lt;br /&gt;
 1.  maltfield&lt;br /&gt;
 2.  crupp&lt;br /&gt;
 3.  Tom Griffing (tgriffing)&lt;br /&gt;
 4.  jthomas&lt;br /&gt;
Choose identity to authenticate as (1-4):&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the sudoers file to give marcin access to *just* the reboot command&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology lib]# visudo&lt;br /&gt;
[root@opensourceecology lib]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology lib]# tail /etc/sudoers&lt;br /&gt;
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom&lt;br /&gt;
&lt;br /&gt;
## Allows members of the users group to shutdown this system&lt;br /&gt;
# %users  localhost=/sbin/shutdown -h now&lt;br /&gt;
&lt;br /&gt;
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)&lt;br /&gt;
#includedir /etc/sudoers.d&lt;br /&gt;
&lt;br /&gt;
# let marcin reboot the machine gracefully&lt;br /&gt;
marcin ALL = NOPASSWD: /sbin/reboot&lt;br /&gt;
[root@opensourceecology lib]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I couldn&#039;t test this on the server without changing marcin&#039;s password, so I spun up a quick DispVM to ensure it *only* gives him access to reboot&lt;br /&gt;
# it&#039;s Debian rather than CentOS, but the sudoers syntax should (hopefully) be the same&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@debian-12-dvm:~$ sudo su -&lt;br /&gt;
root@debian-12-dvm:~# adduser marcin --disabled-password --gecos &#039;&#039;&lt;br /&gt;
Adding user `marcin&#039; ...&lt;br /&gt;
Adding new group `marcin&#039; (1001) ...&lt;br /&gt;
Adding new user `marcin&#039; (1001) with group `marcin (1001)&#039; ...&lt;br /&gt;
Creating home directory `/home/marcin&#039; ...&lt;br /&gt;
Copying files from `/etc/skel&#039; ...&lt;br /&gt;
Adding new user `marcin&#039; to supplemental / extra groups `users&#039; ...&lt;br /&gt;
Adding user `marcin&#039; to group `users&#039; ...&lt;br /&gt;
root@debian-12-dvm:~# &lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# visudo&lt;br /&gt;
root@debian-12-dvm:~#&lt;br /&gt;
&lt;br /&gt;
root@debian-12-dvm:~# passwd marcin&lt;br /&gt;
New password: &lt;br /&gt;
Retype new password: &lt;br /&gt;
passwd: password updated successfully&lt;br /&gt;
root@debian-12-dvm:~# sudo su - marcin&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo su -&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/su -&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$&lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo echo hi&lt;br /&gt;
[sudo] password for marcin: &lt;br /&gt;
Sorry, user marcin is not allowed to execute &#039;/usr/bin/echo hi&#039; as root on localhost.&lt;br /&gt;
marcin@debian-12-dvm:~$ &lt;br /&gt;
&lt;br /&gt;
marcin@debian-12-dvm:~$ reboot&lt;br /&gt;
-bash: reboot: command not found&lt;br /&gt;
marcin@debian-12-dvm:~$ sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, that worked. Perfect.&lt;br /&gt;
# I tested it on hetzner2; it worked too.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[marcin@opensourceecology ~]$ sudo reboot&lt;br /&gt;
Connection to opensourceecology.org closed by remote host.&lt;br /&gt;
Connection to opensourceecology.org closed.&lt;br /&gt;
ssh: connect to host opensourceecology.org port 32415: Connection refused&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
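# for next time: sudo can report a user&#039;s rules directly, which might make a test VM unnecessary. This is a sketch using standard sudo/visudo options, to be run as root on hetzner2; I haven&#039;t tried these there yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# check /etc/sudoers and the files it includes for syntax errors, without opening an editor&lt;br /&gt;
visudo -c&lt;br /&gt;
&lt;br /&gt;
# list the commands that the user marcin is allowed to run via sudo&lt;br /&gt;
sudo -l -U marcin&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;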
# I sent Marcin a reply asking him to test reboots via ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sorry the server just went down; that was me testing to make sure your &#039;marcin&#039; user now has permission to do a proper &amp;amp; safer `sudo reboot` of hetzner2. It does.&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that&lt;br /&gt;
&amp;gt; I should plan on potential breakage at any time?&lt;br /&gt;
&lt;br /&gt;
Great question. There are a couple of things I&#039;d like to implement to prevent this from happening again:&lt;br /&gt;
&lt;br /&gt;
1. Replace both of your disks on hetzner2&lt;br /&gt;
&lt;br /&gt;
2. Give you reboot permission on hetzner2&lt;br /&gt;
&lt;br /&gt;
My best guess is that the corruption happened because you abruptly shut down the server. As you know, that&#039;s generally not a good idea as it can cause data loss.&lt;br /&gt;
&lt;br /&gt;
But filesystems use journals and databases use pages. They *should* be able to recover from abrupt shutdowns. They wouldn&#039;t be very useful if they were so frail as to not be able to recover from something like that...&lt;br /&gt;
&lt;br /&gt;
But in this case, I think it was a &amp;quot;perfect storm&amp;quot;: the abrupt shutdown caused corruption, and mariadb wasn&#039;t able to recover from it due to a bug. And, because your OS is EOL, we can&#039;t update to a newer version of mariadb that *is* able to recover from such an unlucky combination of events.&lt;br /&gt;
&lt;br /&gt;
So, in the meantime, instead of you logging into hetzner&#039;s WUI to trigger reboots, I&#039;d prefer if you would ssh into the hetzner2 server and execute&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
Please test this on your computer now to make sure you&#039;re set up for it. To ssh into hetzner2, execute this command on your computer:&lt;br /&gt;
&lt;br /&gt;
  ssh -p 32415 marcin@opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
And then at the prompt, execute this command (make sure you type this *after* you&#039;ve logged into hetzner, or you&#039;ll end up rebooting your own laptop!)&lt;br /&gt;
&lt;br /&gt;
  sudo reboot&lt;br /&gt;
&lt;br /&gt;
The second thing I&#039;d like to do is replace both of your disks on hetzner2. I don&#039;t think they caused corruption in this case, but I did discover that they&#039;re both screaming that they&#039;re going to die soon and asking to be replaced, so I would be a fool not to heed that warning.&lt;br /&gt;
&lt;br /&gt;
Hetzner shouldn&#039;t charge us to replace a failing disk, but I&#039;ll schedule some downtime for remote hetzner hands to shut down the machine, then I&#039;ll need to format the new drive, add it to the RAID (the mirror of two redundant disks), and update your grub boot partition.&lt;br /&gt;
&lt;br /&gt;
There&#039;s some risk in doing this, because you&#039;ll be running on one non-redundant disk (a disk which is screaming at us saying it&#039;s going to die within 24 hours) while the RAID is re-building. But, of course, there&#039;s risk in not doing it.&lt;br /&gt;
&lt;br /&gt;
Please confirm that you can now reboot hetzner2 via ssh.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 4/18/25 16:39, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; Thats excellent, thabk you, looks good. Do things look stable or are the&lt;br /&gt;
&amp;gt; risks of recurrence in the near future significant, such that I should plan&lt;br /&gt;
&amp;gt; on potential breakage at any time? Regarding the full migration, how many&lt;br /&gt;
&amp;gt; more hours/days of provisioning do tou still expwct to need? &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created an article for the CHG to replace the first disk on hetzner2 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
## I wonder if I can figure out which one grub uses, so I can replace that one second.&lt;br /&gt;
# from my log yesterday, here are our two drives&#039; serial numbers&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# fuck; looks like neither is referenced in /boot/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA4520&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# grep -irl &#039;154410FA336C&#039; /boot&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the steps to set up grub are actually quite simple, according to the hetzner docs https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## it says if we&#039;re doing it on the booted system, then we just need to run `grub-install /dev/sdX` (on CentOS 7, the same tool is named `grub2-install`)&lt;br /&gt;
# it has additional instructions for grub1. And, uh, looks like we have grub1, grub2, *and* an efi dir in /boot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology grub2]# ls /boot&lt;br /&gt;
config-3.10.0-1127.el7.x86_64                            initramfs-3.10.0-1160.119.1.el7.x86_64kdump.img  System.map-3.10.0-1127.el7.x86_64&lt;br /&gt;
config-3.10.0-1160.119.1.el7.x86_64                      initramfs-3.10.0-327.18.2.el7.x86_64.img         System.map-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
config-3.10.0-327.18.2.el7.x86_64                        initramfs-3.10.0-514.26.2.el7.x86_64.img         System.map-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
config-3.10.0-514.26.2.el7.x86_64                        initramfs-3.10.0-693.2.2.el7.x86_64.img          System.map-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
config-3.10.0-693.2.2.el7.x86_64                         initramfs-3.10.0-693.2.2.el7.x86_64kdump.img     System.map-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
efi                                                      initrd-plymouth.img                              vmlinuz-0-rescue-34946d7b5edb0946bfb52c0f6cae67af&lt;br /&gt;
grub                                                     lost+found                                       vmlinuz-3.10.0-1127.el7.x86_64&lt;br /&gt;
grub2                                                    symvers-3.10.0-1127.el7.x86_64.gz                vmlinuz-3.10.0-1160.119.1.el7.x86_64&lt;br /&gt;
initramfs-0-rescue-34946d7b5edb0946bfb52c0f6cae67af.img  symvers-3.10.0-1160.119.1.el7.x86_64.gz          vmlinuz-3.10.0-327.18.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64.img                     symvers-3.10.0-327.18.2.el7.x86_64.gz            vmlinuz-3.10.0-514.26.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1127.el7.x86_64kdump.img                symvers-3.10.0-514.26.2.el7.x86_64.gz            vmlinuz-3.10.0-693.2.2.el7.x86_64&lt;br /&gt;
initramfs-3.10.0-1160.119.1.el7.x86_64.img               symvers-3.10.0-693.2.2.el7.x86_64.gz&lt;br /&gt;
[root@opensourceecology grub2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m thinking we should actually just tell hetzner to do a hot swap while the system is on, so we can do this &amp;quot;easy install&amp;quot; of grub without risking the system not coming-up after they removed the drive&lt;br /&gt;
# oh, the efi dir contains only empty directories, so I&#039;m thinking we&#039;re using grub2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# find efi&lt;br /&gt;
efi&lt;br /&gt;
efi/EFI&lt;br /&gt;
efi/EFI/centos&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, the grub dir just has one file in it?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub&lt;br /&gt;
total 10K&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Apr 11  2016 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
-rw-r--r--  1 root root 1.4K Nov 15  2011 splash.xpm.gz&lt;br /&gt;
[root@opensourceecology boot]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# grub2 looks most sane&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology boot]# ls -lah grub2&lt;br /&gt;
total 52K&lt;br /&gt;
drwx------. 5 root root 1.0K Jul 26  2024 .&lt;br /&gt;
dr-xr-xr-x. 6 root root 5.0K Jul 26  2024 ..&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K Dec 15  2015 fonts&lt;br /&gt;
-rw-r--r--  1 root root 7.8K Jul 26  2024 grub.cfg&lt;br /&gt;
-rw-r--r--  1 root root 5.3K Jun  1  2016 grub.cfg.1499616907.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 6.1K Jul  9  2017 grub.cfg.1506097734.rpmsave&lt;br /&gt;
-rw-r--r--  1 root root 7.0K Sep 22  2017 grub.cfg.1588589453.rpmsave&lt;br /&gt;
-rw-r--r--. 1 root root 1.0K Jul 26  2024 grubenv&lt;br /&gt;
drwxr-xr-x. 2 root root 9.0K May 31  2016 i386-pc&lt;br /&gt;
drwxr-xr-x. 2 root root 1.0K May 31  2016 locale&lt;br /&gt;
[root@opensourceecology boot]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like it&#039;s referencing the raid, not the drive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### BEGIN /etc/grub.d/10_linux ###&lt;br /&gt;
menuentry &#039;CentOS Linux (3.10.0-1160.119.1.el7.x86_64) 7 (Core)&#039; --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option &#039;gnulinux-3.10.0-327.13.1.el7.x86_64-advanced-af18bd25-f715-4003-b055-170a07591c60&#039; {&lt;br /&gt;
		load_video&lt;br /&gt;
		set gfxpayload=keep&lt;br /&gt;
		insmod gzio&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod part_msdos&lt;br /&gt;
		insmod diskfilter&lt;br /&gt;
		insmod mdraid1x&lt;br /&gt;
		insmod ext2&lt;br /&gt;
		set root=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;&lt;br /&gt;
		if [ x$feature_platform_search_hint = xy ]; then&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root --hint=&#039;mduuid/7141f546f6e3f5962a80bdc64c4f6d4a&#039;  9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		else&lt;br /&gt;
		  search --no-floppy --fs-uuid --set=root 9f6b5264-da8c-406d-a444-45e3fb3aeb26&lt;br /&gt;
		fi&lt;br /&gt;
		linux16 /vmlinuz-3.10.0-1160.119.1.el7.x86_64 root=/dev/md/2 ro nomodeset rd.auto=1 crashkernel=auto LANG=en_US.UTF-8&lt;br /&gt;
		initrd16 /initramfs-3.10.0-1160.119.1.el7.x86_64.img&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
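# right, so if I understand this correctly: we&#039;re not changing grub&#039;s config at all. /boot (including grub.cfg) lives on the md1 RAID array, so the rebuild mirrors it onto the new drive; &#039;grub-install&#039; just writes grub&#039;s boot code to the start of the new drive. That&#039;s easier and less concerning than I thought.&lt;br /&gt;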
# well, since I can&#039;t see any good reason to pick one drive or the other to replace first, I&#039;m going to have them replace /dev/sdb first. Just because &#039;sda&#039; seems like it would be primary. I know it&#039;s probably not, but, anyway..&lt;br /&gt;
# that means we&#039;ll replace Crucial_CT250MX200SSD1_154410FA4520 first; I created another wiki entry for that https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sdb&lt;br /&gt;
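# here&#039;s a rough sketch of what the post-swap commands might look like, adapted from the hetzner doc above. Assumptions (not verified yet!): the new disk shows up as /dev/sdb, and the disks use an MBR (msdos) partition table, since grub.cfg loads part_msdos. The partition-to-array mapping below is hypothetical and must be checked against /proc/mdstat first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# 0. record which partitions belong to which md array, and which sdX has which serial&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
ls -l /dev/disk/by-id/&lt;br /&gt;
&lt;br /&gt;
# 1. copy the partition table from the surviving disk (sda) to the new disk (sdb)&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&lt;br /&gt;
# 2. add each new partition back into its array (HYPOTHETICAL mapping; check /proc/mdstat)&lt;br /&gt;
mdadm /dev/md0 --add /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 --add /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 --add /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# 3. watch the rebuild progress&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# 4. write grub&#039;s boot code to the new disk so the machine can boot from either disk (CentOS 7 ships this tool as grub2-install)&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;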
# Marcin sent me an email confirming that he&#039;s able to restart hetzner2 with `sudo reboot`. I asked him to use this in the future if he needs to reboot it again.&lt;br /&gt;
# the disk is getting pretty full (80% used on /), but I&#039;m going to leave these files in /var/tmp/ for at least a few days, to make sure we don&#039;t actually need to restore from a backup again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  150G   38G  80% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/lib/mysql.20250418 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which runs EOL CentOS 7, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk space is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new entries in the apache error log when I hit the site in real time (bypassing the cache)&lt;br /&gt;
# there are also no new entries in the mariadb error log when I hit the site in real time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the journal logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing new gets put in the journal, but a bunch gets written to the actual log file that the journal mentions (/var/log/mariadb/mariadb.log; damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the log that pops up when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
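# since the journal only says the control process exited, here is a minimal sketch of a filter that keeps just the lines explaining the InnoDB crash (the helper name is hypothetical; the patterns are simply the phrases in the log above, so other failure modes would need other patterns)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# keep only the crash-recovery start and the assertion lines from mysqld error-log text on stdin&lt;br /&gt;
innodb_crash_lines() {&lt;br /&gt;
  grep -E &#039;Starting crash recovery|Assertion failure|Failing assertion&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   tail -n 200 /var/log/mariadb/mariadb.log | innodb_crash_lines&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;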
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in MySQL 5.7. We&#039;re using MariaDB 5.5.68, and hetzner3 runs a much newer version.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from about 13 hours ago and one from about 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
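# rather than subtracting the ls timestamps from &#039;date&#039; by hand, here is a minimal sketch of a helper that does the arithmetic (hypothetical name; assumes GNU stat/date as on this CentOS 7 box). Note it measures from the file&#039;s mtime (when the backup finished writing), not from the timestamp in the filename (when the backup started)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print whole hours since $1 was last modified; optional $2 overrides &amp;quot;now&amp;quot; (epoch seconds)&lt;br /&gt;
file_age_hours() {&lt;br /&gt;
  local now mtime&lt;br /&gt;
  now=${2:-$(date +%s)}&lt;br /&gt;
  mtime=$(stat -c %Y -- &amp;quot;$1&amp;quot;)&lt;br /&gt;
  echo $(( (now - mtime) / 3600 ))&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   for f in /home/b2user/sync/*.gpg /home/b2user/sync.old/*.gpg; do echo $(file_age_hours $f)h $f; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;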
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up more-recent data from the non-corrupt tables and only restore the fucked table from backup&lt;br /&gt;
## others warn that we should also solve the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about once a week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
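# for checking the RAID (one of the things to look at besides SMART and the filesystem), here is a minimal sketch (hypothetical helper name; assumes the standard /proc/mdstat layout, where each array&#039;s status line ends in a member map like [UU]). Keep in mind md raid1 doesn&#039;t checksum data, so a clean [UU] only means both members are still in the array, not that the data on them is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# count md arrays whose member map shows a missing disk, e.g. [U_] or [_U]; reads /proc/mdstat-format text on stdin&lt;br /&gt;
md_degraded() {&lt;br /&gt;
  grep -c -E &#039;\[[U_]*_[U_]*\]&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (prints 0 when every array has all its members):&lt;br /&gt;
#   md_degraded &amp;lt; /proc/mdstat&lt;br /&gt;
# per-array detail, including each member&#039;s state:&lt;br /&gt;
#   mdadm --detail /dev/md2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;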
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
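# since mariadb is stopped, the copy should be byte-for-byte identical to the source. Here is a minimal sketch to confirm that before touching anything (hypothetical helper name). Because the rsync source above had no trailing slash, the files landed under /var/tmp/dbFail.20250417/var-lib-mysql.20250417/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# exit 0 and print nothing if the two trees have the same files with the same contents&lt;br /&gt;
same_tree() {&lt;br /&gt;
  diff -r -q -- &amp;quot;$1&amp;quot; &amp;quot;$2&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (0 means identical):&lt;br /&gt;
#   same_tree /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.20250417/mysql; echo $?&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;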
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data further https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
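# here is a minimal sketch of what starting at the lowest level looks like, assuming the server config is /etc/my.cnf (the usual place for the CentOS 7 mariadb package; check where the existing [mysqld] section actually lives before editing)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[mysqld]&lt;br /&gt;
# start at 1; only if mysqld still crashes on start, try 2, then 3&lt;br /&gt;
# anything above 3 may corrupt things further, so only do that on a copy on another machine&lt;br /&gt;
innodb_force_recovery = 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after each change, run &#039;systemctl start mariadb&#039;; once the dumps are done, remove the line (back to the default of 0) and restart&lt;br /&gt;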
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running MariaDB 5.5.68, so I tried the old 5.6 version of the doc https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but note that it redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1 – apparently anything it doesn&#039;t recognize just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we should only do it on another machine. So basically: take the data I just backed up, put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps taken at 4 or greater could still contain corrupt data, so they shouldn&#039;t be trusted anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode (&amp;gt;0) is enabled (that&#039;s the MySQL 5.7 doc, though; I&#039;m not sure our MariaDB 5.5 behaves the same)&lt;br /&gt;
### but we *can* still run DROP. So the idea is to dump everything in recovery mode, drop what is corrupt, then restart with the recovery value set back to 0 and restore.&lt;br /&gt;
## it says that if we can dump our tables with a recovery mode of 1, 2, or 3, we&#039;re relatively safe: only some data on the corrupt individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
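# the &amp;quot;dump everything in recovery mode and drop what is corrupt&amp;quot; plan needs to know which table is the corrupt one. Here is a minimal sketch that dumps each database to its own file and reports the ones whose dump failed (hypothetical helper; assumes mysqldump picks up root credentials from ~/.my.cnf, and that a corrupt table makes its dump error out, though it may instead crash mysqld again, in which case every later dump fails too). DUMP_CMD is only there so the loop can be tested without a server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump each database named on the command line into $1/DBNAME.sql; print the ones that failed&lt;br /&gt;
dump_each_db() {&lt;br /&gt;
  local outdir=$1 db failed=&lt;br /&gt;
  shift&lt;br /&gt;
  for db in &amp;quot;$@&amp;quot;; do&lt;br /&gt;
    if ! ${DUMP_CMD:-mysqldump} --result-file=&amp;quot;$outdir/$db.sql&amp;quot; &amp;quot;$db&amp;quot;; then&lt;br /&gt;
      failed=&amp;quot;$failed $db&amp;quot;&lt;br /&gt;
    fi&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;failed:$failed&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (store_db is one of ours; other_db is a placeholder for the rest of the names from SHOW DATABASES):&lt;br /&gt;
#   dump_each_db /var/tmp/dbFail.20250417 store_db other_db&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;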
# so a page is just the fixed-size chunk (16KB by default) that InnoDB moves between disk and memory. So I *think* a dump at recovery levels 1-3 should be trustworthy, except for whatever rows lived on the corrupt pages?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
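# since -H gives both of our disks the same FAILED verdict, one way to see which is worse is to list, for each disk, only the attributes that &#039;smartctl -A&#039; flags in its WHEN_FAILED column. A minimal sketch (hypothetical helper name; assumes the usual ATA attribute-table layout, where WHEN_FAILED is the 9th column; what each RAW_VALUE means is vendor-specific)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print ID, name, WHEN_FAILED and the first field of the raw value for every attribute row whose WHEN_FAILED isn&#039;t &amp;quot;-&amp;quot;&lt;br /&gt;
smart_failed_attrs() {&lt;br /&gt;
  awk &#039;$1 ~ /^[0-9]+$/ &amp;amp;&amp;amp; $9 != &amp;quot;-&amp;quot; { print $1, $2, $9, $10 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   for d in /dev/sda /dev/sdb; do echo $d; smartctl -A $d | smart_failed_attrs; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;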
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive nearly new or used and tested; depends on what is in stock.&lt;br /&gt;
### At cost: replacement drive guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock.&lt;br /&gt;
## we were given the option of a hot swap while the system is running, or a swap with the server shut down. I&#039;m going to say shutdown. That&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a software raid (mdadm), not a hardware raid. My understanding is that once the new disk is partitioned and added back into the arrays, md will copy the data from the surviving disk onto it. But, christ, if both disks are fucked, then which disk should I have them replace first? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm... it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
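# To compare the two disks without eyeballing every row, here&#039;s a minimal Python sketch that picks out the attributes smartctl is complaining about. It assumes the `smartctl -A` table layout shown above (10 whitespace-separated columns, with RAW_VALUE last and sometimes containing spaces). It flags a row if WHEN_FAILED isn&#039;t &amp;quot;-&amp;quot;, or if the normalized VALUE has dropped to a non-zero THRESH; that threshold rule is an assumption about how smartctl decides FAILING_NOW, and the helper name is hypothetical.&lt;br /&gt;

```python
def failing_attributes(smartctl_a_output):
    """Return (id, name, value, thresh, when_failed) for rows smartctl -A flags.

    Hypothetical helper; assumes the ATA attribute table printed by smartctl 7.x.
    """
    flagged = []
    for line in smartctl_a_output.splitlines():
        # RAW_VALUE may contain spaces, e.g. "36 (Min/Max 24/54)", so split at most 9 times
        fields = line.split(None, 9)
        # attribute rows have 10 columns and start with a numeric ID; skip headers/banners
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        attr_id, name, _flag, value, _worst, thresh, _type, _updated, when_failed, _raw = fields
        value, thresh = int(value), int(thresh)
        # assumption: a THRESH of 0 means "no threshold", so only non-zero ones count
        at_threshold = thresh > 0 and thresh >= value
        if when_failed != "-" or at_threshold:
            flagged.append((int(attr_id), name, value, thresh, when_failed))
    return flagged


# a few rows copied from the /dev/sda output above
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100
"""
print(failing_attributes(sample))  # [(202, 'Percent_Lifetime_Remain', 0, 1, 'FAILING_NOW')]
```

Fed the full sda or sdb table above, it should only flag attribute 202 (Percent_Lifetime_Remain): every other row has a THRESH of 000 except Reallocate_NAND_Blk_Cnt, whose VALUE of 100 is well above its THRESH of 010.&lt;br /&gt;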
# so both say &amp;quot;Percent_Lifetime_Remain&amp;quot; is an issue. does that mean it&#039;s not *actually* writing corrupt data, but it&#039;s literally just a timer that hit and said &amp;quot;yeah you should probably replace the disk??&amp;quot;&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the docs&#039; table, nor in the Wikipedia source table https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
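# the second hetzner doc refers to nvme drives. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki docs with it right now)&lt;br /&gt;
# wow, the smartctl output for the NVMe drives on hetzner3 (Debian) looks very different than for the SATA SSDs on hetzner2 (CentOS). That&#039;s down to the drive type rather than the OS: NVMe drives report a &amp;quot;SMART/Health Information&amp;quot; log instead of the ATA attribute table&lt;br /&gt;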
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
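# The &amp;quot;Data Units&amp;quot; lines above are easier to sanity-check with a bit of arithmetic. Per the NVMe spec, one data unit is 1,000 blocks of 512 bytes (512,000 bytes), and this smartctl prints with a German-style locale (&amp;quot;.&amp;quot; groups thousands, &amp;quot;,&amp;quot; is the decimal point). The bracketed TB figures look truncated rather than rounded (52.125.092 units is about 26.69 TB, but it&#039;s shown as 26,6 TB), so this hypothetical helper truncates too; it assumes the result falls in the TB range.&lt;br /&gt;

```python
NVME_DATA_UNIT_BYTES = 512 * 1000  # one NVMe data unit = 1000 blocks of 512 bytes


def nvme_units_to_tb(localized_count):
    """Convert a smartctl 'Data Units Read/Written' count to decimal TB, one decimal.

    Hypothetical helper: assumes '.' or ',' are only thousands separators in the
    count, and that the result is between 1 and 1000 TB.
    """
    units = int(localized_count.replace(".", "").replace(",", ""))
    total_bytes = units * NVME_DATA_UNIT_BYTES
    # work in tenths of a TB and truncate (integer division) instead of rounding
    tenths = total_bytes * 10 // 10**12
    return f"{tenths // 10}.{tenths % 10}"


print(nvme_units_to_tb("52.125.092"))  # 26.6  (nvme0n1 Data Units Written)
print(nvme_units_to_tb("56.604.901"))  # 28.9  (nvme1n1 Data Units Written)
```

All four bracketed values in the hetzner3 output (78,0 / 26,6 / 72,0 / 28,9 TB) come out the same way, which supports the truncation assumption.&lt;br /&gt;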
# that shows &amp;quot;Percentage Used&amp;quot; (the drive&#039;s estimate of how much of its rated life is used up) is at 6% and 7% on hetzner3, whereas I guess we&#039;re at 100% on hetzner2&lt;br /&gt;
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# this shows we&#039;re actually on a Linux software (md) raid, and that it&#039;s healthy: `[UU]` means both members are up. A degraded array would show an underscore for the missing member (eg `[U_]`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
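# The `[UU]` check is easy to script if we ever want a cron job or monitoring check for it. A minimal Python sketch, assuming the /proc/mdstat layout shown above: an `mdN : active ...` header line, followed by a line that ends in the member status (`[UU]`, `[U_]`, ...). The function name is hypothetical; in practice `mdadm --monitor` can already email us about degraded arrays, so this only illustrates what to look for.&lt;br /&gt;

```python
import re


def degraded_arrays(mdstat_text):
    """Return md devices whose member status (e.g. [U_]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+) :", line)
        if header:
            # a new array section starts, e.g. "md2 : active raid1 sda3[0] sdb3[1]"
            current = header.group(1)
            continue
        # the status is the bracket holding only U (up) and _ (missing), e.g. [UU] or [U_]
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


healthy_sample = """\
md2 : active raid1 sda3[0] sdb3[1]
      209984640 blocks super 1.2 [2/2] [UU]
"""
degraded_sample = """\
md2 : active raid1 sda3[0]
      209984640 blocks super 1.2 [2/1] [U_]
"""
print(degraded_arrays(healthy_sample), degraded_arrays(degraded_sample))  # [] ['md2']
```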
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into the RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck-up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back-up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
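# Since Hetzner will ask for the serial of the disk to swap, it helps to keep each device paired with its serial. A small Python sketch that parses the `KEY=VALUE` lines printed by `udevadm info --query=property` (as above); the helper name is hypothetical, and it parses the text rather than calling udevadm itself.&lt;br /&gt;

```python
def udev_properties(udevadm_output):
    """Parse `udevadm info --query=property` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in udevadm_output.splitlines():
        # split on the first "=" only; lines without one are ignored
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# the /dev/sdb lines from the output above
sdb = udev_properties("ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520\nID_SERIAL_SHORT=154410FA4520\n")
print(sdb["ID_SERIAL_SHORT"])  # 154410FA4520
```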
# I went ahead and started a short SMART self-test on both disks; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked-off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# a check probably should happen on boot, but I couldn&#039;t find any sign of it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
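# The sizes above look inconsistent at first (fdisk says 215.0 GB, lsblk 200.3G, df 197G), but they describe the same device. fdisk uses decimal GB, lsblk and `df -h` use powers of 1024, and df&#039;s &amp;quot;Size&amp;quot; also leaves out filesystem metadata. The tune2fs output also explains why df&#039;s Avail (43G) is smaller than Size minus Used (52G): roughly 10 GiB (5% of the blocks) is reserved for root. A quick Python check of that arithmetic, using the numbers from the fdisk and tune2fs output (the function is only for illustration):&lt;br /&gt;

```python
GIB = 1024**3


def size_report(sectors, sector_size, block_count, reserved_blocks, block_size):
    """Reconcile the fdisk/lsblk sizes and the ext4 reserved blocks for one device."""
    device_bytes = sectors * sector_size
    return {
        "device_GB": round(device_bytes / 1e9, 1),  # fdisk prints decimal GB
        "device_GiB": round(device_bytes / GIB, 1),  # lsblk's "G" is GiB
        "fs_GiB": round(block_count * block_size / GIB, 1),  # raw ext4 size, before metadata
        "reserved_GiB": round(reserved_blocks * block_size / GIB, 1),  # root-only space
        "reserved_pct": round(reserved_blocks * 100 / block_count, 1),
    }


# numbers copied from the fdisk and tune2fs output for /dev/md2 above
print(size_report(419969280, 512, 52496160, 2624808, 4096))
# {'device_GB': 215.0, 'device_GiB': 200.3, 'fs_GiB': 200.3, 'reserved_GiB': 10.0, 'reserved_pct': 5.0}
```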
# both of the filesystems (/ and /boot) look fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
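# Two of those fields are worth knowing about when reading the output above: per the tune2fs man page, a &amp;quot;Maximum mount count&amp;quot; of 0 or -1 and a &amp;quot;Check interval&amp;quot; of 0 disable the mount-count and time-based forced checks, so e2fsck at boot won&#039;t run a full check just because of age or mount count. A minimal Python sketch that pulls the relevant fields out of `tune2fs -l` (helper name hypothetical; the sample is trimmed from the /dev/md2 output above):&lt;br /&gt;

```python
def tune2fs_summary(tune2fs_output):
    """Pull the health-related fields out of `tune2fs -l` output (hypothetical helper)."""
    fields = {}
    for line in tune2fs_output.splitlines():
        # each line is "Key:   value"; split on the first colon only (times contain colons)
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    max_mounts = fields.get("Maximum mount count", "")
    # real output looks like "0 (<none>)", so keep only the leading number
    interval = fields.get("Check interval", "").partition(" ")[0]
    return {
        "state": fields.get("Filesystem state"),
        "last_checked": fields.get("Last checked"),
        # -1 or 0 disables mount-count checks; an interval of 0 disables time-based checks
        "periodic_fsck_disabled": max_mounts in ("-1", "0") and interval == "0",
    }


md2 = """\
Filesystem state:         clean
Maximum mount count:      -1
Last checked:             Thu Apr 17 17:39:00 2025
Check interval:           0
"""
print(tune2fs_summary(md2))
```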
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it didn&#039;t come-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2 (`innodb_force_recovery = 2`); this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time-out, but I&#039;m just going to leave it and come-back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Fri Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged-into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc scattered across a ton of different articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what he needed for himself 5 years from now into a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. when I tried to restore it, I got a 413 (Request Entity Too Large) error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
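# so 13107200 / 1024 / 1024 = 12.5 MiB.&lt;br /&gt;
# jesus, that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
## note: the error above complains about the &amp;quot;no files&amp;quot; data limit (1000000 bytes, about 0.95 MiB), which ModSecurity takes from SecRequestBodyNoFilesLimit rather than SecRequestBodyLimit. A grep for &#039;BodyLimit&#039; doesn&#039;t match that directive, so the 12.5 MiB limit probably isn&#039;t the one we&#039;re hitting&lt;br /&gt;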
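# to double-check which limit is actually being hit, it&#039;s probably worth grepping for every SecRequestBody* directive (which also catches SecRequestBodyNoFilesLimit) and converting the byte values to MiB. Rough sketch, assuming it&#039;s run from /etc/httpd like the commands above; bytes_to_mib is a hypothetical helper, not something that exists on the server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# run from /etc/httpd (dirs taken from the find output above); || true so a no-match isn&#039;t treated as a failure&lt;br /&gt;
grep -ir &#039;SecRequestBody&#039; conf.d/ modsecurity.d/ || true&lt;br /&gt;
&lt;br /&gt;
# hypothetical helper: print a byte count in MiB (1 MiB = 1048576 bytes)&lt;br /&gt;
bytes_to_mib() {&lt;br /&gt;
    awk -v b=&amp;quot;$1&amp;quot; &#039;BEGIN { printf &amp;quot;%.2f\n&amp;quot;, b / 1048576 }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
bytes_to_mib 13107200    # SecRequestBodyLimit from conf.d/mod_security.conf; prints 12.50&lt;br /&gt;
bytes_to_mib 1000000     # limit quoted in the ModSecurity error; prints 0.95&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;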
# ...&lt;br /&gt;
# so Marcin is stressing the urgency of getting Catarina a sandbox, so she can rebuild osemain using some new theme that&#039;s not broken on the latest versions of wordpress, php, etc on hetzner3&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot on Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306060</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306060"/>
		<updated>2025-04-27T21:33:02Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Apr 17&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admins&#039; work easier when troubleshooting. The more keywords, error messages, etc that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Thu Apr 17, 2025=&lt;br /&gt;
# Marcin sent me an email last night (and again this morning) asking why the wiki is down&lt;br /&gt;
# I hadn&#039;t touched OSE infra in 6 days&lt;br /&gt;
# the wiki is still on hetzner2, which is on EOL Cent, so I&#039;m not terribly surprised it&#039;s falling apart.&lt;br /&gt;
# I first warned Marcin about this many years ago, and hopefully the migration to hetzner3 will be finished before the end of this year&lt;br /&gt;
# anyway, let&#039;s check what happened to the wiki on hetzner2&lt;br /&gt;
# it&#039;s a 500 error complaining about the db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp9871:~$ curl -iL wiki.opensourceecology.org&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:52 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://wiki.opensourceecology.org/&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 500 Internal Server Error&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 17 Apr 2025 20:17:54 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 976&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Varnish: 434054&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish-v4&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Sorry! This site is experiencing technical difficulties.&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Try waiting a few minutes and reloading.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;small&amp;gt;(Cannot access the database)&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;hr /&amp;gt;&amp;lt;div style=&amp;quot;margin: 1.5em&amp;quot;&amp;gt;You can try searching via Google in the meantime.&amp;lt;br /&amp;gt;&lt;br /&gt;
&amp;lt;small&amp;gt;Note that their indexes of our content may be out of date.&amp;lt;/small&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;form method=&amp;quot;get&amp;quot; action=&amp;quot;//www.google.com/search&amp;quot; id=&amp;quot;googlesearch&amp;quot;&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;domains&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;num&amp;quot; value=&amp;quot;50&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;ie&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;oe&amp;quot; value=&amp;quot;UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;q&amp;quot; size=&amp;quot;31&amp;quot; maxlength=&amp;quot;255&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;btnG&amp;quot; value=&amp;quot;Search&amp;quot; /&amp;gt;&lt;br /&gt;
	&amp;lt;p&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;https://wiki.opensourceecology.org&amp;quot; checked=&amp;quot;checked&amp;quot; /&amp;gt;Open Source Ecology&amp;lt;/label&amp;gt;&lt;br /&gt;
		&amp;lt;label&amp;gt;&amp;lt;input type=&amp;quot;radio&amp;quot; name=&amp;quot;sitesearch&amp;quot; value=&amp;quot;&amp;quot; /&amp;gt;WWW&amp;lt;/label&amp;gt;&lt;br /&gt;
	&amp;lt;/p&amp;gt;&lt;br /&gt;
user@disp9871:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# disk is fine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G   96G   92G  52% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are no new logs in the apache error log when I hit the site in real-time (bypassing the cache)&lt;br /&gt;
# there are also no new logs in the mariadb error log when I hit the site in real-time&lt;br /&gt;
# well, the db isn&#039;t running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl status mariadb&lt;br /&gt;
● mariadb.service - MariaDB database server&lt;br /&gt;
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)&lt;br /&gt;
   Active: failed (Result: exit-code) since Thu 2025-04-17 17:39:24 UTC; 2h 42min ago&lt;br /&gt;
  Process: 1227 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)&lt;br /&gt;
  Process: 1226 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)&lt;br /&gt;
  Process: 1103 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)&lt;br /&gt;
 Main PID: 1226 (code=exited, status=0/SUCCESS)&lt;br /&gt;
&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-p...db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
Hint: Some lines were ellipsized, use -l to show in full.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# error logs aren&#039;t very helpful&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# journalctl -fu mariadb&lt;br /&gt;
-- Logs begin at Thu 2025-04-17 17:38:59 UTC. --&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org systemd[1]: Starting MariaDB database server...&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mariadb-prepare-db-dir[1103]: If this is not the case, make sure the /var/lib/mysql is empty before running mariadb-prepare-db-dir.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Logging to &#039;/var/log/mariadb/mariadb.log&#039;.&lt;br /&gt;
Apr 17 17:39:22 opensourceecology.org mysqld_safe[1226]: 250417 17:39:22 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service: control process exited, code=exited status=1&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Failed to start MariaDB database server.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: Unit mariadb.service entered failed state.&lt;br /&gt;
Apr 17 17:39:24 opensourceecology.org systemd[1]: mariadb.service failed.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I try to restart it manually, nothing gets put in the journal logs, but a bunch gets written to the actual log file that the journal mentions, /var/log/mariadb/mariadb.log (damn systemd)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
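# since mysqld_safe sends every startup attempt to the same /var/log/mariadb/mariadb.log, it can help to print only the most recent attempt. Rough sketch; last_start is a hypothetical helper that keys off the &amp;quot;mysqld_safe Starting mysqld daemon&amp;quot; line&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print the log from the last &amp;quot;Starting mysqld daemon&amp;quot; line to the end&lt;br /&gt;
last_start() {&lt;br /&gt;
    awk &#039;/mysqld_safe Starting mysqld daemon/ { buf = &amp;quot;&amp;quot; } { buf = buf $0 &amp;quot;\n&amp;quot; } END { printf &amp;quot;%s&amp;quot;, buf }&#039; &amp;quot;${1:-/var/log/mariadb/mariadb.log}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# on hetzner2, right after a failed restart:&lt;br /&gt;
#   systemctl restart mariadb&lt;br /&gt;
#   last_start | less&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;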
# here&#039;s the log that pops up in /var/log/mariadb/mariadb.log when we try a restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 20:24:31 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 20:24:31 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 10583 ...&lt;br /&gt;
250417 20:24:31 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 20:24:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 20:24:31 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 20:24:31 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 20:24:31 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 20:24:31 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 20:24:31 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 20:24:31  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 20:24:31  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 20:24:31  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 20:24:31  InnoDB: Assertion failure in thread 140093400303360 in file trx0purge.c line 822&lt;br /&gt;
InnoDB: Failing assertion: purge_sys-&amp;gt;purge_trx_no &amp;lt;= purge_sys-&amp;gt;rseg-&amp;gt;last_trx_no&lt;br /&gt;
InnoDB: We intentionally generate a memory trap.&lt;br /&gt;
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/&lt;br /&gt;
InnoDB: If you get repeated assertion failures or crashes, even&lt;br /&gt;
InnoDB: immediately after the mysqld startup, there may be&lt;br /&gt;
InnoDB: corruption in the InnoDB tablespace. Please refer to&lt;br /&gt;
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html&lt;br /&gt;
InnoDB: about forcing recovery.&lt;br /&gt;
250417 20:24:31 [ERROR] mysqld got signal 6 ;&lt;br /&gt;
This could be because you hit a bug. It is also possible that this binary&lt;br /&gt;
or one of the libraries it was linked against is corrupt, improperly built,&lt;br /&gt;
or misconfigured. This error can also be caused by malfunctioning hardware.&lt;br /&gt;
&lt;br /&gt;
To report this bug, see http://kb.askmonty.org/en/reporting-bugs&lt;br /&gt;
&lt;br /&gt;
We will try our best to scrape up some info that will hopefully help&lt;br /&gt;
diagnose the problem, but since we have already crashed,&lt;br /&gt;
something is definitely wrong and this may fail.&lt;br /&gt;
&lt;br /&gt;
Server version: 5.5.68-MariaDB&lt;br /&gt;
key_buffer_size=134217728&lt;br /&gt;
read_buffer_size=131072&lt;br /&gt;
max_used_connections=0&lt;br /&gt;
max_threads=153&lt;br /&gt;
thread_count=0&lt;br /&gt;
It is possible that mysqld could use up to&lt;br /&gt;
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466719 K  bytes of memory&lt;br /&gt;
Hope that&#039;s ok; if not, decrease some variables in the equation.&lt;br /&gt;
&lt;br /&gt;
Thread pointer: 0x0&lt;br /&gt;
Attempting backtrace. You can use the following information to find out&lt;br /&gt;
where mysqld died. If you see no messages after this, something went&lt;br /&gt;
terribly wrong...&lt;br /&gt;
stack_bottom = 0x0 thread_stack 0x48000&lt;br /&gt;
/usr/libexec/mysqld(my_print_stacktrace+0x3d)[0x563a1c105cad]&lt;br /&gt;
/usr/libexec/mysqld(handle_fatal_signal+0x515)[0x563a1bd19975]&lt;br /&gt;
sigaction.c:0(__restore_rt)[0x7f6a294c9630]&lt;br /&gt;
:0(__GI_raise)[0x7f6a27bf0387]&lt;br /&gt;
:0(__GI_abort)[0x7f6a27bf1a78]&lt;br /&gt;
/usr/libexec/mysqld(+0x63845f)[0x563a1beae45f]&lt;br /&gt;
/usr/libexec/mysqld(+0x638f69)[0x563a1beaef69]&lt;br /&gt;
/usr/libexec/mysqld(+0x73b504)[0x563a1bfb1504]&lt;br /&gt;
/usr/libexec/mysqld(+0x730487)[0x563a1bfa6487]&lt;br /&gt;
/usr/libexec/mysqld(+0x63b17d)[0x563a1beb117d]&lt;br /&gt;
/usr/libexec/mysqld(+0x62f0f6)[0x563a1bea50f6]&lt;br /&gt;
pthread_create.c:0(start_thread)[0x7f6a294c1ea5]&lt;br /&gt;
/lib64/libc.so.6(clone+0x6d)[0x7f6a27cb8b0d]&lt;br /&gt;
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains&lt;br /&gt;
information that should help you find out what is causing the crash.&lt;br /&gt;
250417 20:24:31 mysqld_safe mysqld from pid file /var/run/mariadb/mariadb.pid ended&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# google points to this https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
## they say it could be a bug that might be fixed in MySQL 5.7. We&#039;re using MariaDB 5.5.68; hetzner3 runs a much newer version.&lt;br /&gt;
# reddit says we&#039;re fucked and should restore from backup https://old.reddit.com/r/mysql/comments/d3nkc7/innodb_assertion_failure_in_thread_4560_in_file/&lt;br /&gt;
# before reading any more, I&#039;m going to immediately make a local copy of our most-recent backups&lt;br /&gt;
# looks like we have a backup from about 13 hours ago and one from about 37 hours ago&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ date&lt;br /&gt;
Thu Apr 17 20:36:56 UTC 2025&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync&lt;br /&gt;
total 21G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 17 07:49 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    21G Apr 17 07:48 daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# ls -lah /home/b2user/sync.old/&lt;br /&gt;
total 22G&lt;br /&gt;
drwxr-xr-x  2 root   root   4.0K Apr 16 07:52 .&lt;br /&gt;
drwx------ 10 b2user b2user 4.0K Apr 17 07:20 ..&lt;br /&gt;
-rw-r--r--  1 b2user root    22G Apr 16 07:52 daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
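# to double-check those ages, GNU date (as shipped on CentOS) can do the arithmetic from the file mtimes above and the output of date. Rough sketch; age_between is a hypothetical helper, and it assumes all the times are UTC like the date output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: elapsed time between two UTC timestamps, as XhYYm (needs GNU date -d)&lt;br /&gt;
age_between() {&lt;br /&gt;
    local start end secs&lt;br /&gt;
    start=$(date -u -d &amp;quot;$1&amp;quot; +%s)&lt;br /&gt;
    end=$(date -u -d &amp;quot;$2&amp;quot; +%s)&lt;br /&gt;
    secs=$(( end - start ))&lt;br /&gt;
    printf &#039;%dh%02dm\n&#039; $(( secs / 3600 )) $(( secs % 3600 / 60 ))&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
now=&#039;2025-04-17 20:36:56&#039;                  # from the date command above&lt;br /&gt;
age_between &#039;2025-04-17 07:48&#039; &amp;quot;$now&amp;quot;      # sync/daily_hetzner2_20250417_072001.tar.gpg; prints 12h48m&lt;br /&gt;
age_between &#039;2025-04-16 07:52&#039; &amp;quot;$now&amp;quot;      # sync.old/daily_hetzner2_20250416_072001.tar.gpg; prints 36h44m&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;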
# this SE answer is helpful https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up&lt;br /&gt;
## it says we can force the db to start (in &amp;quot;recovery mode&amp;quot;) and then try to figure out which table is corrupted. Then we might be able to back up the more-recent data from the non-corrupt tables and only restore the fucked table from backup&lt;br /&gt;
## other warnings suggest solving the underlying issue: why did the data become corrupt?&lt;br /&gt;
## well, we know Marcin has been hard-resetting the server (via the hetzner wui) about once a week, because it has kept breaking for some months now (it&#039;s EOL and not worth debugging)&lt;br /&gt;
## but it&#039;s also possible we have a worse issue, like a disk failing. We do have RAID1 tho, so idk. Still, it would be wise to check the SMART data and RAID logs and filesystem for corruption&lt;br /&gt;
# I sent a quick status update to Marcin so he knows the severity of the issue and that this isn&#039;t going to be fixed soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Your database is corrupt and won&#039;t start.&lt;br /&gt;
&lt;br /&gt;
Quick internet search for the error messages suggests this could be a bug that&#039;s been fixed in mariadb 5.7. You&#039;re using 5.6 and can&#039;t upgrade because your OS is EOL. hetnzer3 is running 5.8.&lt;br /&gt;
&lt;br /&gt;
 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
I&#039;m looking into seeing what is corrupt, what isn&#039;t corrupt, and if we can restore from backup.&lt;br /&gt;
&lt;br /&gt;
This is not going to be an easy or fast fix, sorry. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the backups of the backups finished&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /home/b2user/sync*/* /var/tmp/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
daily_hetzner2_20250416_072001.tar.gpg&lt;br /&gt;
 22,975,631,986 100%  139.63MB/s    0:02:36 (xfr#1, to-chk=1/2)&lt;br /&gt;
daily_hetzner2_20250417_072001.tar.gpg&lt;br /&gt;
 21,566,407,634 100%  103.43MB/s    0:03:18 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 44,552,914,338 bytes  received 54 bytes  125,324,653.70 bytes/sec&lt;br /&gt;
total size is 44,542,039,620  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# df -h&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
devtmpfs         32G     0   32G   0% /dev&lt;br /&gt;
tmpfs            32G     0   32G   0% /dev/shm&lt;br /&gt;
tmpfs            32G   17M   32G   1% /run&lt;br /&gt;
tmpfs            32G     0   32G   0% /sys/fs/cgroup&lt;br /&gt;
/dev/md2        197G  138G   50G  74% /&lt;br /&gt;
/dev/md1        488M  386M   77M  84% /boot&lt;br /&gt;
tmpfs           6.3G     0  6.3G   0% /run/user/1005&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m also going to take down the webservers, so that they can&#039;t fuck-up the database worse, if we do start it in some recovery mode&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop httpd&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop varnish&lt;br /&gt;
[root@opensourceecology ~]# systemctl stop nginx&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should also make a backup of /var/lib/mysql&lt;br /&gt;
# I&#039;m going to create a dir for all of this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mkdir /var/tmp/dbFail.20250417&lt;br /&gt;
[root@opensourceecology ~]# chown root:root /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# chmod 0700 /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mv /var/tmp/daily_hetzner2_2025041* /var/tmp/dbFail.20250417/&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# vim /var/tmp/dbFail.20250417/info.txt&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /var/tmp/dbFail.20250417/info.txt &lt;br /&gt;
2025-04-17: Marcin emailed me last night saying the wiki was down with a db error. Today I tried to start it, but it refues to come-up. Looks like it&#039;s preventing itself from starting because it realizes something is corrupt and starting it would make things worse. Internet says maybe this was fixed in a newer version; we can&#039;t upgrade because Cent is EOL. Hetzner3 has the newer version&lt;br /&gt;
&lt;br /&gt;
		 * https://bugs.mysql.com/bug.php?id=61516&lt;br /&gt;
&lt;br /&gt;
		Anyway, I&#039;m creating this folder to store some backups before we make things worse.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# aaaand I added a copy of /var/lib/mysql/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# rsync -av --progress /var/lib/mysql /var/tmp/dbFail.20250417/var-lib-mysql.$(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
sending incremental file list&lt;br /&gt;
created directory /var/tmp/dbFail.20250417/var-lib-mysql.20250417&lt;br /&gt;
mysql/&lt;br /&gt;
mysql/aria_log.00000001&lt;br /&gt;
		 16,384 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=707/709)&lt;br /&gt;
...&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rate_locations.frm&lt;br /&gt;
		  8,714 100%    9.26kB/s    0:00:00 (xfr#689, to-chk=1/709)&lt;br /&gt;
mysql/store_db/wp_woocommerce_tax_rates.frm&lt;br /&gt;
		 13,128 100%   13.95kB/s    0:00:00 (xfr#690, to-chk=0/709)&lt;br /&gt;
&lt;br /&gt;
sent 7,384,914,964 bytes  received 13,343 bytes  114,495,012.51 bytes/sec&lt;br /&gt;
total size is 7,383,062,830  speedup is 1.00&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# another important note: apparently we can keep increasing the value of innodb_force_recovery until it starts, but anything &amp;gt;3 could corrupt the data worse https://dba.stackexchange.com/q/241714&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Marko, MariaDB Innodb lead: MDEV-15370 was a bug when ugprading to 10.3, caused by MDEV-12288. Actually upgrades can still fail (MDEV-15912) if a slow shutdown of the old server was not made. Because the scenario does not involve upgrading to 10.3 or later, I am afraid that the user witnessed some kind of undo log corruption. Starting up with innodb_force_recovery=3 might allow dumping all data. If that crashes, then try innodb_force_recovery=5, but be aware that anything &amp;gt;3 may corrupt the database further, and therefore you should not use the database for anything else than mysqldump&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Unfortunately, a lot of the links for how to fix this are now dead&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html&lt;br /&gt;
## https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## https://forums.mysql.com/read.php?22,603093,604631#msg-604631&lt;br /&gt;
## https://support.plesk.com/hc/en-us/articles/12377798484375-Plesk-is-not-accessible-ERROR-Zend-Db-Adapter-Exception-SQLSTATE-HY000-2002-No-such-file-or-directory&lt;br /&gt;
# we&#039;re running 5.5.68-MariaDB, so the closest docs should be the old ones, like this https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html&lt;br /&gt;
## but that redirects to 8.4 for some reason? https://dev.mysql.com/doc/refman/8.4/en/forcing-innodb-recovery.html&lt;br /&gt;
## ah, so does 1.1; apparently anything it doesn&#039;t recognize just redirects to the latest version https://dev.mysql.com/doc/refman/1.1/en/forcing-innodb-recovery.html&lt;br /&gt;
# this suggests that, if we&#039;re going to use innodb_force_recovery 4 or greater, we only do it on another machine. So basically take the data I just backed-up put it on a separate machine, and do the fucker *there* instead https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
## it also says that dumps of 4 or greater could still render corrupt data, so they shouldn&#039;t be trusted, anyway&lt;br /&gt;
## good news: it says the db blocks all INSERT, UPDATE, and DELETE commands when any recovery mode is enabled&lt;br /&gt;
### but we *can* run DROP. so the idea is to dump everything in recovery mode and drop what is corrupt. then restart with the recovery value set to 0 and restore.&lt;br /&gt;
## it says that if we can dump with a recovery mode of 1, 2, or 3, we&#039;re relatively safe: at most, some data on the corrupt individual pages is lost&lt;br /&gt;
### here&#039;s the definition of a page https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A unit representing how much data InnoDB transfers at any one time between disk (the data files) and memory (the buffer pool). A page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.&lt;br /&gt;
&lt;br /&gt;
One way to fit more data in each page is to use compressed row format. For tables that use BLOBs or large text fields, compact row format allows those large columns to be stored separately from the rest of the row, reducing I/O overhead and memory usage for queries that do not reference those columns.&lt;br /&gt;
&lt;br /&gt;
When InnoDB reads or writes sets of pages as a batch to increase I/O throughput, it reads or writes an extent at a time.&lt;br /&gt;
&lt;br /&gt;
All the InnoDB disk data structures within a MySQL instance share the same page size.&lt;br /&gt;
&lt;br /&gt;
See Also buffer pool, compact row format, compressed row format, data files, extent, page size, row.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so a page is just a block of rows (the unit InnoDB moves between disk and memory), not unwritten data. So I *think* a dump made at level 3 or lower should be OK to trust, apart from whatever rows lived on the corrupt pages?&lt;br /&gt;
# ok, I think I have enough to proceed – at least for recovery modes 1, 2, and 3.&lt;br /&gt;
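# for reference, here&#039;s a rough sketch of how that level 1-3 dump could go, assuming /etc/my.cnf includes /etc/my.cnf.d (as I believe the stock CentOS 7 one does). recovery_cnf and the drop-in file name are hypothetical, and the helper refuses anything above 3, per the warnings above&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print a temporary my.cnf drop-in that enables forced recovery.&lt;br /&gt;
# Only levels 1-3 are allowed; 4+ can corrupt data further and belongs on a copy&lt;br /&gt;
# of /var/lib/mysql on another machine, not here.&lt;br /&gt;
recovery_cnf() {&lt;br /&gt;
    case &amp;quot;$1&amp;quot; in&lt;br /&gt;
        1|2|3) ;;&lt;br /&gt;
        *) echo &amp;quot;refusing innodb_force_recovery=$1&amp;quot; &amp;gt;&amp;amp;2; return 1 ;;&lt;br /&gt;
    esac&lt;br /&gt;
    printf &#039;[mysqld]\ninnodb_force_recovery = %s\n&#039; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
recovery_cnf 1    # prints the two-line drop-in for level 1&lt;br /&gt;
&lt;br /&gt;
# then, on hetzner2 as root, with httpd, varnish and nginx still stopped:&lt;br /&gt;
#   recovery_cnf 1 | tee /etc/my.cnf.d/zz-force-recovery.cnf&lt;br /&gt;
#   systemctl start mariadb     # if it still crashes, retry with 2, then 3&lt;br /&gt;
#   mysqldump -u root -p --all-databases --result-file=/var/tmp/dbFail.20250417/all-databases.sql&lt;br /&gt;
#   systemctl stop mariadb&lt;br /&gt;
#   rm /etc/my.cnf.d/zz-force-recovery.cnf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;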
# but first let&#039;s check SMART&lt;br /&gt;
# oh, fuck, my notes on this are on the wiki. Of course.&lt;br /&gt;
# arch wiki to the rescue https://wiki.archlinux.org/title/S.M.A.R.T.&lt;br /&gt;
# fail&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
-bash: smartctl: command not found&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# luckily the yum servers for this EOL OS are still online, and I could install it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# yum install smartmontools&lt;br /&gt;
...&lt;br /&gt;
Total download size: 546 k&lt;br /&gt;
Installed size: 2.0 M&lt;br /&gt;
Is this ok [y/d/N]: y&lt;br /&gt;
Downloading packages:&lt;br /&gt;
smartmontools-7.0-2.el7.x86_64.rpm                                                                                                              | 546 kB  00:00:00     &lt;br /&gt;
Running transaction check&lt;br /&gt;
Running transaction test&lt;br /&gt;
Transaction test succeeded&lt;br /&gt;
Running transaction&lt;br /&gt;
  Installing : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
  Verifying  : 1:smartmontools-7.0-2.el7.x86_64                                                                                                                    1/1 &lt;br /&gt;
&lt;br /&gt;
Installed:&lt;br /&gt;
  smartmontools.x86_64 1:7.0-2.el7                                                                                                                                     &lt;br /&gt;
&lt;br /&gt;
Complete!&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl --info /dev/sda | grep &#039;SMART support is:&#039;&lt;br /&gt;
SMART support is: Available - device has SMART capability.&lt;br /&gt;
SMART support is: Enabled&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well this is terrifying; it says both our disks are gonna fail within 24 hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
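# to repeat this check for several disks at once, here is a minimal shell sketch. smart_health and check_all_disks are my own hypothetical helper names (not part of smartmontools), and it assumes the verdict line reads exactly &amp;quot;SMART overall-health self-assessment test result:&amp;quot;, as in the output above&lt;br /&gt;
```shell
# smart_health: read `smartctl -H` output on stdin and print only the verdict
# (e.g. PASSED or FAILED); the trailing "!" that smartctl appends to FAILED is dropped.
smart_health() {
  sed -n 's/^SMART overall-health self-assessment test result: *//p' | tr -d '!'
}

# check_all_disks: run `smartctl -H` on every device given as an argument
# and print one "device: verdict" line per disk.
check_all_disks() {
  for disk in "$@"; do
    printf '%s: %s\n' "$disk" "$(smartctl -H "$disk" | smart_health)"
  done
}

# example (as root on hetzner2):
#   check_all_disks /dev/sda /dev/sdb
```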
# compare that to hetzner3, which says all is good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m not 100% convinced that this is true. I still want to initiate a test on the drives, but I&#039;m going to go ahead and pass this to hetzner support asap and ask them if there&#039;s a fee for them to replace our drives.&lt;br /&gt;
# oh, interesting. they have a walkthrough that says it&#039;s free via Server -&amp;gt; Technical -&amp;gt; Disk Failure https://robot.hetzner.com/support/index&lt;br /&gt;
## well, it lists two options&lt;br /&gt;
### Free: replacement drive is nearly new or used and tested, depending on what is in stock&lt;br /&gt;
### At cost: replacement drive is guaranteed to be nearly new (less than 1000 hours of operation); one-time fee € 41.18 (excl. VAT); may not be in stock&lt;br /&gt;
## we were given the option to hot-swap the drive while the system is running or to shut it down first. I&#039;m going to choose shutdown; that&#039;ll be simpler from the OS side, I think&lt;br /&gt;
## dang, it says they&#039;ll swap the drive within 2-4 hours.&lt;br /&gt;
# I&#039;ve never done this before, but it&#039;s a hardware raid. My understanding is that as soon as it comes up, it&#039;ll begin copying the data from one disk to the other disk. But, christ, if both disks are fucked, then which disk should I have them replace? Can I see which one is more fucked than the other?&lt;br /&gt;
# hetzner provides 4 docs for assistance on this&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#information-on-defective-drives&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/maintainance/nvme/#show-serial-number-of-a-specific-nvme-ssd&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/#creating-a-complete-smart-log&lt;br /&gt;
# that first doc says to run the command we just ran&lt;br /&gt;
# hmm..it says for more info we should look at the &amp;quot;Failed Attributes&amp;quot; – but we have none for either disk&lt;br /&gt;
# ok, the docs say we can get more info with -A&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so both disks flag &amp;quot;Percent_Lifetime_Remain&amp;quot; as FAILING_NOW. does that mean the drive isn&#039;t *actually* writing corrupt data, and it&#039;s literally just a wear counter that hit its threshold and said &amp;quot;yeah, you should probably replace the disk&amp;quot;?&lt;br /&gt;
# well, &amp;quot;Percent_Lifetime_Remain&amp;quot; doesn&#039;t appear in the Hetzner docs table, nor in the Wikipedia table it&#039;s sourced from https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#Known_ATA_S.M.A.R.T._attributes&lt;br /&gt;
# yeah, reddit suggests that means the drive &amp;quot;should be replaced soon&amp;quot; but not that it&#039;s actually detected as failing now https://www.reddit.com/r/homelab/comments/kaaqma/percent_lifetime_remain_failing_now/&lt;br /&gt;
# in that case, I guess it doesn&#039;t matter which disk we replace. But let&#039;s go ahead and get one replaced. I don&#039;t think this was the cause of the db corruption (I still think it&#039;s &amp;quot;shutting down the computer abruptly + a bug in old mariadb that prevents it from recovering&amp;quot;), but I would be stupid not to take a free replacement of a RAID1-mirrored disk that&#039;s alerting us that it&#039;s too old to be in prod.&lt;br /&gt;
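# a quick way to pull only the flagged attributes out of the long smartctl -A table is to print the rows whose WHEN_FAILED column (the 9th field) is not &amp;quot;-&amp;quot;. this is a minimal sketch, and failing_attrs is a hypothetical helper name. smartctl marks that column FAILING_NOW when the normalized VALUE is at or below THRESH (here 000 vs 001), and In_the_past when only WORST has crossed it&lt;br /&gt;
```shell
# failing_attrs: read `smartctl -A` (ATA) output on stdin and print the name and
# WHEN_FAILED status of every attribute row whose WHEN_FAILED field is not "-".
# Only rows whose first field is a numeric attribute ID are considered, which
# skips the banner and the column-header line.
failing_attrs() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $2, $9 }'
}

# example:
#   smartctl -A /dev/sda | failing_attrs
```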
# the second hetzner doc refers to NVMe drives. that&#039;s relevant on hetzner3 but not hetzner2. anyway, I do want to know how to check this on hetzner3 (even if I can&#039;t update the wiki with these docs right now)&lt;br /&gt;
# wow, the smartctl output looks very different for the NVMe drives on hetzner3 (Debian) than for the SATA drives on hetzner2 (CentOS); NVMe drives report the NVMe SMART/Health Information log instead of a table of ATA attributes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        39 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    6%&lt;br /&gt;
Data Units Read:                    152.358.379 [78,0 TB]&lt;br /&gt;
Data Units Written:                 52.125.092 [26,6 TB]&lt;br /&gt;
Host Read Commands:                 6.873.372.480&lt;br /&gt;
Host Write Commands:                1.362.559.127&lt;br /&gt;
Controller Busy Time:               22.226&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     17.245&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               39 Celsius&lt;br /&gt;
Temperature Sensor 2:               48 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        40 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    7%&lt;br /&gt;
Data Units Read:                    140.811.605 [72,0 TB]&lt;br /&gt;
Data Units Written:                 56.604.901 [28,9 TB]&lt;br /&gt;
Host Read Commands:                 1.304.073.899&lt;br /&gt;
Host Write Commands:                1.364.668.115&lt;br /&gt;
Controller Busy Time:               21.180&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     15.565&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               40 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that shows we&#039;re at 6% and 7% of rated endurance used on hetzner3, whereas on hetzner2 I guess we&#039;re at 100% (Percent_Lifetime_Remain has a normalized value of 000 there)&lt;br /&gt;
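# to pull just the &amp;quot;Percentage Used&amp;quot; number out of the NVMe health log above (the NVMe spec allows it to go above 100 once the rated endurance is exceeded), a minimal sketch; nvme_pct_used is a hypothetical helper name&lt;br /&gt;
```shell
# nvme_pct_used: read `smartctl -A` output for an NVMe device on stdin and
# print the "Percentage Used" value without the trailing % sign.
nvme_pct_used() {
  sed -n 's/^Percentage Used: *\([0-9][0-9]*\)%.*/\1/p'
}

# example (on hetzner3):
#   for d in /dev/nvme0n1 /dev/nvme1n1; do
#     echo "$d $(smartctl -A "$d" | nvme_pct_used)%"
#   done
```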
# the third hetzner doc refers to a software raid. actually, I thought we were using a hardware raid, but now I&#039;m not sure&lt;br /&gt;
# /proc/mdstat indicates that our raid is fine. `[UU]` means both members of an array are up; a degraded array would show an underscore for the missing member (eg `[U_]`)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat &lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sdb2[1] sda2[0]&lt;br /&gt;
	  523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
	  209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
	  bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[1] sda1[0]&lt;br /&gt;
	  33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
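# the same [UU] vs [U_] check can be scripted. this minimal sketch prints the name of every array whose status field contains an underscore (a missing or failed member); degraded_arrays is a hypothetical helper name, and it assumes the /proc/mdstat layout above, where the status field is on the line after the array name&lt;br /&gt;
```shell
# degraded_arrays: read /proc/mdstat (from the files given, or stdin) and print
# each md array whose member-status field, e.g. [UU], contains an "_".
degraded_arrays() {
  awk '/^md[0-9]+ :/ { name = $1 }
       /\[[U_]*_[U_]*]/ { print name }' "$@"
}

# example; prints nothing when every array is healthy:
#   degraded_arrays /proc/mdstat
```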
# ah crap, the process to bring the new drive back into the RAID is non-trivial https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
## first we have to partition the new drive exactly like the old drive, then add each partition into its RAID array, then update grub. And, of course, meanwhile we&#039;ll be running on one disk. So if we fuck up any of those steps, we lose everything. This could take me a few days (or weeks), and meanwhile the sites are all offline and our daily backups on backblaze are being deleted/rotated out of existence. Sadly, I think I&#039;m going to postpone this until after we get the sites back up.&lt;br /&gt;
# the last hetzner doc shows us how to get the serial number of our disks (which hetzner will ask-for when we tell them to swap it)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
ID_SERIAL_SHORT=154410FA336C&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
ID_SERIAL=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
ID_SERIAL_SHORT=154410FA4520&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
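# to collect the serials for every disk in one go (e.g. to paste into the Hetzner support request), a minimal sketch; serial_short and list_serials are hypothetical helper names, and it relies on the ID_SERIAL_SHORT property shown above&lt;br /&gt;
```shell
# serial_short: read `udevadm info --query=property` output on stdin and
# print the value of the ID_SERIAL_SHORT property.
serial_short() {
  sed -n 's/^ID_SERIAL_SHORT=//p'
}

# list_serials: print "device serial" for every device name given as an argument.
list_serials() {
  for disk in "$@"; do
    printf '%s %s\n' "$disk" "$(udevadm info --query=property --name "$disk" | serial_short)"
  done
}

# example (on hetzner2):
#   list_serials sda sdb
```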
# I went ahead and ran a SMART test; it says it&#039;ll take just 2 minutes to run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:07:55 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t short /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Short self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 2 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:08:18 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also kicked-off a long test, which I can check tomorrow&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:12 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -t long /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===&lt;br /&gt;
Sending command: &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot;.&lt;br /&gt;
Drive command &amp;quot;Execute SMART Extended self-test routine immediately in off-line mode&amp;quot; successful.&lt;br /&gt;
Testing has begun.&lt;br /&gt;
Please wait 5 minutes for test to complete.&lt;br /&gt;
Test will complete after Thu Apr 17 22:15:14 2025&lt;br /&gt;
&lt;br /&gt;
Use smartctl -X to abort test.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
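# once the tests finish, the results are kept in the drive&#039;s self-test log, which smartctl -l selftest prints with the newest run as entry &amp;quot;# 1&amp;quot;. a minimal sketch to check that the newest run passed; latest_selftest_ok is a hypothetical helper name, and it assumes the ATA log wording &amp;quot;Completed without error&amp;quot; for a passing test&lt;br /&gt;
```shell
# latest_selftest_ok: read `smartctl -l selftest` output on stdin and succeed
# (exit status 0) only if the newest log entry ("# 1") completed without error.
# Uses GNU grep's -m 1 to stop at the first matching entry.
latest_selftest_ok() {
  grep -m 1 '^# *1 ' | grep -q 'Completed without error'
}

# example:
#   for d in /dev/sda /dev/sdb; do
#     if smartctl -l selftest "$d" | latest_selftest_ok; then
#       echo "$d: last self-test passed"
#     else
#       echo "$d: last self-test did not pass"
#     fi
#   done
```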
# ok, then we have the filesystem. it looks like /var/lib/mysql/ lives on &#039;/&#039;, which is /dev/md2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# df -h /var/lib/mysql&lt;br /&gt;
Filesystem      Size  Used Avail Use% Mounted on&lt;br /&gt;
/dev/md2        197G  145G   43G  78% /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# fdisk -l /dev/md2&lt;br /&gt;
&lt;br /&gt;
Disk /dev/md2: 215.0 GB, 215024271360 bytes, 419969280 sectors&lt;br /&gt;
Units = sectors of 1 * 512 = 512 bytes&lt;br /&gt;
Sector size (logical/physical): 512 bytes / 4096 bytes&lt;br /&gt;
I/O size (minimum/optimal): 4096 bytes / 4096 bytes&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# lsblk /dev/md2&lt;br /&gt;
NAME MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT&lt;br /&gt;
md2    9:2    0 200.3G  0 raid1 /&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it won&#039;t let me check the filesystem while it&#039;s mounted&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# fsck /dev/md2&lt;br /&gt;
fsck from util-linux 2.23.2&lt;br /&gt;
e2fsck 1.42.9 (28-Dec-2013)&lt;br /&gt;
/dev/md2 is mounted.&lt;br /&gt;
e2fsck: Cannot continue, aborting.&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the check probably should be happening on boot, but I couldn&#039;t find it in dmesg&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i check&lt;br /&gt;
[    0.000000] Early table checksum verification disabled&lt;br /&gt;
[root@opensourceecology ~]# dmesg | grep -i fsck&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, instead we can just use tune2fs to get the info on the last check that was run&lt;br /&gt;
# looks like it ran today; probably when Marcin rebooted it https://unix.stackexchange.com/questions/400851/what-should-i-do-to-force-the-root-filesystem-check-and-optionally-a-fix-at-bo&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2&lt;br /&gt;
tune2fs 1.42.9 (28-Dec-2013)&lt;br /&gt;
Filesystem volume name:   &amp;lt;none&amp;gt;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Filesystem UUID:          af18bd25-f715-4003-b055-170a07591c60&lt;br /&gt;
Filesystem magic number:  0xEF53&lt;br /&gt;
Filesystem revision #:    1 (dynamic)&lt;br /&gt;
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize&lt;br /&gt;
Filesystem flags:         signed_directory_hash&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Filesystem OS type:       Linux&lt;br /&gt;
Inode count:              13131776&lt;br /&gt;
Block count:              52496160&lt;br /&gt;
Reserved block count:     2624808&lt;br /&gt;
Free blocks:              26575102&lt;br /&gt;
Free inodes:              12417672&lt;br /&gt;
First block:              0&lt;br /&gt;
Block size:               4096&lt;br /&gt;
Fragment size:            4096&lt;br /&gt;
Reserved GDT blocks:      1011&lt;br /&gt;
Blocks per group:         32768&lt;br /&gt;
Fragments per group:      32768&lt;br /&gt;
Inodes per group:         8192&lt;br /&gt;
Inode blocks per group:   512&lt;br /&gt;
Flex block group size:    16&lt;br /&gt;
Filesystem created:       Tue May 31 06:01:12 2016&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Last write time:          Thu Apr 17 17:39:00 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
Check interval:           0 (&amp;lt;none&amp;gt;)&lt;br /&gt;
Lifetime writes:          124 TB&lt;br /&gt;
Reserved blocks uid:      0 (user root)&lt;br /&gt;
Reserved blocks gid:      0 (group root)&lt;br /&gt;
First inode:              11&lt;br /&gt;
Inode size:               256&lt;br /&gt;
Required extra isize:     28&lt;br /&gt;
Desired extra isize:      28&lt;br /&gt;
Journal inode:            8&lt;br /&gt;
Default directory hash:   half_md4&lt;br /&gt;
Directory Hash Seed:      b9456d9f-1608-4444-99c2-02e6f327e42d&lt;br /&gt;
Journal backup:           inode blocks&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# both of the filesystems (/ and /boot) look fine (state: clean), though /boot hasn&#039;t been checked since 2016&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md1 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /boot&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              46&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Tue May 31 06:01:07 2016&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# tune2fs -l /dev/md2 | grep -iE &#039;state|error|mount|checked&#039;&lt;br /&gt;
Last mounted on:          /&lt;br /&gt;
Default mount options:    user_xattr acl&lt;br /&gt;
Filesystem state:         clean&lt;br /&gt;
Errors behavior:          Continue&lt;br /&gt;
Last mount time:          Thu Apr 17 17:39:11 2025&lt;br /&gt;
Mount count:              1&lt;br /&gt;
Maximum mount count:      -1&lt;br /&gt;
Last checked:             Thu Apr 17 17:39:00 2025&lt;br /&gt;
[root@opensourceecology ~]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
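# the same check can be wrapped so it fails loudly when a filesystem is not clean; a minimal sketch, where fs_clean is a hypothetical helper name. note that &amp;quot;clean&amp;quot; only reflects what the superblock records (no errors flagged); it is not a substitute for an offline fsck&lt;br /&gt;
```shell
# fs_clean: read `tune2fs -l` output on stdin, print the "Filesystem state"
# value, and return non-zero unless that value is exactly "clean".
fs_clean() {
  state=$(sed -n 's/^Filesystem state: *//p')
  printf '%s\n' "$state"
  [ "$state" = "clean" ]
}

# example (on hetzner2):
#   for md in /dev/md1 /dev/md2; do
#     printf '%s: ' "$md"; tune2fs -l "$md" | fs_clean || echo "  ^ needs attention"
#   done
```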
# well, so far I couldn&#039;t find any signs of corruption on the disk/fs level&lt;br /&gt;
# back to the db, I set the innodb_force_recovery option in the my.cnf file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# cp my.cnf my.cnf.20250417&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# vim my.cnf&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology etc]# diff my.cnf.20250417 my.cnf&lt;br /&gt;
1a2,5&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # attempt to recover corrupt db https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html&lt;br /&gt;
&amp;gt; innodb_force_recovery = 1&lt;br /&gt;
&amp;gt; &lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
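# the MySQL docs linked above advise taking a backup first and keeping innodb_force_recovery as low as possible (values of 4 or greater can permanently corrupt data files), so the level is best raised one step at a time. a minimal sketch for bumping it in place; set_force_recovery is a hypothetical helper name, the example assumes the file is /etc/my.cnf, and it only rewrites an existing innodb_force_recovery line (it does not add one)&lt;br /&gt;
```shell
# set_force_recovery FILE LEVEL: rewrite an existing "innodb_force_recovery = N"
# line in a my.cnf-style FILE so that it uses LEVEL instead.
# Lines that don't start with innodb_force_recovery are left untouched.
# Uses GNU sed's in-place -i flag.
set_force_recovery() {
  conf=$1
  level=$2
  sed -i "s/^innodb_force_recovery *=.*/innodb_force_recovery = $level/" "$conf"
}

# example (on hetzner2, as root, assuming the file is /etc/my.cnf):
#   set_force_recovery /etc/my.cnf 2
#   systemctl restart mariadb
```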
# it didn&#039;t come-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology etc]# systemctl restart mariadb&lt;br /&gt;
Job for mariadb.service failed because the control process exited with error code. See &amp;quot;systemctl status mariadb.service&amp;quot; and &amp;quot;journalctl -xe&amp;quot; for details.&lt;br /&gt;
[root@opensourceecology etc]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried changing it to recovery level 2 (innodb_force_recovery = 2); this time it got stuck &amp;quot;waiting for the background threads&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
250417 22:32:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql&lt;br /&gt;
250417 22:32:49 [Note] /usr/libexec/mysqld (mysqld 5.5.68-MariaDB) starting as process 14901 ...&lt;br /&gt;
250417 22:32:49 InnoDB: The InnoDB memory heap is disabled&lt;br /&gt;
250417 22:32:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins&lt;br /&gt;
250417 22:32:49 InnoDB: Compressed tables use zlib 1.2.7&lt;br /&gt;
250417 22:32:49 InnoDB: Using Linux native AIO&lt;br /&gt;
250417 22:32:49 InnoDB: Initializing buffer pool, size = 128.0M&lt;br /&gt;
250417 22:32:49 InnoDB: Completed initialization of buffer pool&lt;br /&gt;
250417 22:32:49 InnoDB: highest supported file format is Barracuda.&lt;br /&gt;
250417 22:32:49  InnoDB: Starting crash recovery from checkpoint LSN=625883462907&lt;br /&gt;
InnoDB: Restoring possible half-written data pages from the doublewrite buffer...&lt;br /&gt;
250417 22:32:49  InnoDB: Starting final batch to recover 11 pages from redo log&lt;br /&gt;
250417 22:32:49  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:50  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:51  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:52  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:53  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:54  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:55  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:56  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:57  InnoDB: Waiting for the background threads to start&lt;br /&gt;
250417 22:32:58  InnoDB: Waiting for the background threads to start&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it seems infinite. I don&#039;t know if it&#039;s going to time-out, but I&#039;m just going to leave it and come-back tomorrow.&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged-into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc. across a ton of scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what his future self will need 5 years from now in a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. When I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
# ...&lt;br /&gt;
# so Marcin is stressing the urgency of getting Catarina a sandbox on hetzner3, so she can rebuild osemain using some new theme that&#039;s not broken on the latest versions of wordpress, php, etc&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot on Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which site&#039;s CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306059</id>
		<title>Maltfield Log/2025 Q2</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q2&amp;diff=306059"/>
		<updated>2025-04-27T21:29:28Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Sun Apr 11, 2025 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the second quarter of the year 2025. I intentionally made this verbose to make future admins&#039; work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Sun Apr 11, 2025=&lt;br /&gt;
&lt;br /&gt;
# let&#039;s get Catarina that broken staging site for osemain on hetzner3&lt;br /&gt;
# Marcin still hasn&#039;t regained access to his ssh key (so he can update the ose keepass), but he did finally send me the password to our hetzner account&lt;br /&gt;
# so now I can order a second IPv4 address, as needed for obi &amp;amp; osemain to have two distinct sites on hetzner3&lt;br /&gt;
# I logged into hetzner https://robot.hetzner.com/server&lt;br /&gt;
# I also typed a &amp;quot;name&amp;quot; into the blank &amp;quot;name&amp;quot; fields for our two servers. The existing one is now called &amp;quot;hetzner2&amp;quot; and the new one &amp;quot;hetzner3&amp;quot;&lt;br /&gt;
# I clicked on the server for &amp;quot;hetzner3&amp;quot; and the tab &amp;quot;IPs&amp;quot;.&lt;br /&gt;
## Then I clicked on &amp;quot;Order additional IPs / Nets&amp;quot;&lt;br /&gt;
## I selected &amp;quot;One additional IP with costs (€ 1.70 max. per month / € 0.0027 per hour + € 4.90 once-off setup)&amp;quot;&lt;br /&gt;
## it required me to enter a reason (IPv4 is scarce) to which I wrote:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
we need to run two websites with the same domain name that are already running on our primary IPv4 address, and a client doesn&#039;t have IPv6 working at their office&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## and I clicked &amp;quot;Apply for IP/subnet in obligation&amp;quot;&lt;br /&gt;
## I got a message; looks like it needs human approval&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your request for additional IPs/subnets was successfully sent. We will send you an email as soon as your IP/subnet is ready.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I typed an email to Marcin and Catarina to notify them of this order&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
As authorized on our last call, I ordered an additional IPv4 address for your hetzner account.&lt;br /&gt;
&lt;br /&gt;
IPv4 addresses are scarce, and it appears that they need to approve it manually.&lt;br /&gt;
&lt;br /&gt;
The cost is €1.70 per month + € 4.90 once-off setup.&lt;br /&gt;
&lt;br /&gt;
This will allow us to run more than one website with the same domain off the same server. That will be needed for osemain and obi.&lt;br /&gt;
&lt;br /&gt;
Once you finish rebuilding those websites on hetzner3 to use a new not-broken theme, we can cancel this second IP address.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# before I finished typing ^ that email, I got an email from hetzner indicating that we have a new IP&lt;br /&gt;
# I refreshed the hetzner wui, and now I see the new IP&lt;br /&gt;
# ...&lt;br /&gt;
# following-up on the bus factor, I added Catarina &amp;amp; Tom&#039;s ssh keys to their authorized_keys files on hetzner3&lt;br /&gt;
## I sent them both emails asking them to confirm access&lt;br /&gt;
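# for future reference, a minimal sketch of what adding a key like this involves. It assumes the public key arrived out-of-band as a one-line .pub file, and the helper name add_authorized_key is hypothetical, not an existing OSE script. It creates ~/.ssh with the permissions sshd expects and skips the append if the exact key line is already there. Chown-ing the files to the user is left out&lt;br /&gt;

```shell
# add_authorized_key HOME_DIR PUBKEY_FILE  (hypothetical helper, not an OSE script)
# Appends the one-line public key in PUBKEY_FILE to HOME_DIR/.ssh/authorized_keys,
# creating the dir/file with the strict permissions sshd expects, and skipping
# the append if that exact key line is already present.
add_authorized_key() {
	home_dir="$1"
	key_file="$2"
	auth="$home_dir/.ssh/authorized_keys"

	mkdir -p "$home_dir/.ssh"
	chmod 700 "$home_dir/.ssh"
	touch "$auth"
	chmod 600 "$auth"

	# -x whole-line match, -F fixed string, -f read the pattern(s) from the key file
	if grep -qxFf "$key_file" "$auth"; then
		echo "key already present in $auth"
	else
		cat "$key_file" >> "$auth"
	fi
	# NOTE: when run as root, you would still need to chown -R user:group the .ssh dir
}
```
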
# I also emailed Marcin asking if he installed zulucrypt yet to try to recover his old ssh key&lt;br /&gt;
# update: within a few hours, Marcin had successfully decrypted and mounted his old veracrypt volume using zuluCrypt&lt;br /&gt;
# he created this article on the wiki https://wiki.opensourceecology.org/wiki/Zulucrypt&lt;br /&gt;
# I found that he had previously documented backups, luks, veracrypt, pgp, general cybersec, etc. across a ton of scattered articles. So I spent some time adding categories and &amp;quot;see also&amp;quot; sections to those articles, in hopes that he&#039;ll be able to do this more easily in the future&lt;br /&gt;
# I also asked him to please document what his future self will need 5 years from now in a README file next to the &#039;ose-veracrypt&#039; volume on his usb drive.&lt;br /&gt;
# Marcin confirmed that he was able to restore his ssh keys and ssh into hetzner3. awesome.&lt;br /&gt;
# ...&lt;br /&gt;
# I logged all my hours and sent an invoice to OSE for last month (Mar 2025)&lt;br /&gt;
# gah, I had obliterated half my 2025Q1 log. When I tried to restore it, I got a 413 error&lt;br /&gt;
# I checked php and nginx; it&#039;s 10M. How did I write &amp;gt;10 MB of text in one quarter?&lt;br /&gt;
# there are too many layers on this server; I checked the logs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Apr 11 22:18:20.306872 2025] [:error] [pid 13182] [client 127.0.0.1:56606] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;], referer: https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&lt;br /&gt;
HTTP/1.1 413 Request Entity Too Large&lt;br /&gt;
Message: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413)&lt;br /&gt;
Apache-Error: [file &amp;quot;apache2_util.c&amp;quot;] [line 271] [level 3] [client 127.0.0.1] ModSecurity: Request body no files data length is larger than the configured limit (1000000).. Deny with code (413) [hostname &amp;quot;wiki.opensourceecology.org&amp;quot;] [uri &amp;quot;/index.php&amp;quot;] [unique_id &amp;quot;Z-mVLLwDarHC@6u2-5xhBgAAAAg&amp;quot;]&lt;br /&gt;
127.0.0.1 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.0&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot;&lt;br /&gt;
146.70.199.124 - - [11/Apr/2025:22:18:20 +0000] &amp;quot;POST /index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=submit HTTP/1.1&amp;quot; 413 338 &amp;quot;https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2025_Q1&amp;amp;action=edit&amp;quot; &amp;quot;Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so it&#039;s modsecurity?&lt;br /&gt;
# gah, that&#039;s a lot of files to review&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# find .  |grep -i security&lt;br /&gt;
./conf.d/mod_security.wordpress.include&lt;br /&gt;
./conf.d/mod_security.conf&lt;br /&gt;
./conf.modules.d/10-mod_security.conf&lt;br /&gt;
./modsecurity.d&lt;br /&gt;
./modsecurity.d/activated_rules&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_42_tight_security.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_35_bad_robots.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_45_trojans.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_48_local_exceptions.conf.example&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_bad_robots.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_23_request_limits.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_sql_injection_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_49_inbound_blocking.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_60_correlation.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_50_outbound_malware.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_35_scanners.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_40_generic_attacks.data&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_50_outbound.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_47_common_exceptions.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_30_http_policy.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_20_protocol_violations.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_41_xss_attacks.conf&lt;br /&gt;
./modsecurity.d/activated_rules/modsecurity_crs_59_outbound_blocking.conf&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf.20181024.orig&lt;br /&gt;
./modsecurity.d/modsecurity_crs_10_config.conf&lt;br /&gt;
./modsecurity.d/do_not_log_passwords.conf&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like it&#039;s SecRequestBodyLimit http://stackoverflow.com/questions/13887812/ddg#14690797&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -irl &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf&lt;br /&gt;
modules/mod_security2.so&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s 13107200&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;BodyLimit&#039; *&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimit 13107200&lt;br /&gt;
conf.d/mod_security.conf:    SecRequestBodyLimitAction Reject&lt;br /&gt;
Binary file modules/mod_security2.so matches&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# docs say it&#039;s in bytes https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecRequestBodyLimit&lt;br /&gt;
# so 13107200 / 1024 / 1024 = 12.5 MB.&lt;br /&gt;
# jesus that&#039;s a lot of data; I&#039;m not gonna increase that in 4 places (nginx, apache, mod_security, php); let&#039;s just split it into two articles :(&lt;br /&gt;
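# note that the limit quoted in the 413 error above is 1000000 bytes, not the 13107200 that grep found. If I&#039;m reading the ModSecurity v2 reference manual right, the &amp;quot;no files data length&amp;quot; wording comes from a separate directive, SecRequestBodyNoFilesLimit. That directive caps the body size excluding file uploads, and a grep for &#039;BodyLimit&#039; can&#039;t match it because of the &amp;quot;NoFiles&amp;quot; in the middle. So the wall I actually hit was probably about 0.95 MiB, not 12.5 MiB&lt;br /&gt;
# a hedged sketch for converting these byte values and listing every request-size directive across the 4 layers. The directive names are the standard ones: nginx (client_max_body_size), Apache (LimitRequestBody), ModSecurity v2 (SecRequestBodyLimit, SecRequestBodyNoFilesLimit) and PHP (post_max_size, upload_max_filesize). The helper names are my own&lt;br /&gt;

```shell
# to_mib BYTES: print a byte count in MiB (1 MiB = 1024 * 1024 = 1048576 bytes)
to_mib() {
	awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# find_body_limits PATH...: list every request-body size directive under the
# given config files/dirs, for each layer a POST passes through on this stack:
#   nginx        client_max_body_size
#   Apache       LimitRequestBody
#   ModSecurity  SecRequestBodyLimit, SecRequestBodyNoFilesLimit
#   PHP          post_max_size, upload_max_filesize
# -I skips binary files (e.g. modules/mod_security2.so); missing paths are ignored
find_body_limits() {
	grep -rnIE 'client_max_body_size|LimitRequestBody|SecRequestBody(NoFiles)?Limit|post_max_size|upload_max_filesize' "$@" 2>/dev/null
}
```

# usage, e.g.: to_mib 1000000 prints 0.95, and find_body_limits /etc/nginx /etc/httpd /etc/php.ini /etc/php.d searches those paths. The paths are assumptions about where this server keeps its configs&lt;br /&gt;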
# ...&lt;br /&gt;
# so Marcin is stressing the urgency of getting Catarina a sandbox on hetzner3, so she can rebuild osemain using some new theme that&#039;s not broken on the latest versions of wordpress, php, etc&lt;br /&gt;
# I didn&#039;t want to do this site before the other lower-priority ones, but it&#039;s just a sandbox&lt;br /&gt;
# I realized I never made a CHG file for osemain&lt;br /&gt;
# looks like I first did a snapshot on Jan 31 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2025_Q1#Fri_Jan_31.2C_2025&lt;br /&gt;
# ugh, I just said I was &amp;quot;following the same guide as with the other sites&amp;quot;&lt;br /&gt;
## I was hoping to know which site&#039;s CHG to copy from&lt;br /&gt;
## I guess it makes the most sense to copy from obi, which already has both a static and dynamic site setup (untested)&lt;br /&gt;
# ok, I made a first draft of our osemain CHG to migrate to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_osemain_to_hetzner3&lt;br /&gt;
# oh, crap, I&#039;m going to remove&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2024_Q4&amp;diff=306058</id>
		<title>Maltfield Log/2024 Q4</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Maltfield_Log/2024_Q4&amp;diff=306058"/>
		<updated>2025-04-27T21:11:12Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Wed Oct 02, 2024 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;My work log from the fourth quarter of the year 2024. I intentionally made this verbose to make future admins&#039; work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
# [[Maltfield_Log]]&lt;br /&gt;
# [[User:Maltfield]]&lt;br /&gt;
# [[Special:Contributions/Maltfield]]&lt;br /&gt;
&lt;br /&gt;
=Tue Dec 31, 2024=&lt;br /&gt;
&lt;br /&gt;
# finally I got an email from the oshine theme support team with a link to download the required plugins&lt;br /&gt;
# it&#039;s just a link to a google drive file. The email was sent by BrandExponents via sendinblue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Received: by smtp-relay.sendinblue.com with ESMTP id 9b9d7d38-bd8e-4fc1-9b37-475be98908d8; Tue, 31 December 2024 04:46:17 +0000 (UTC)&lt;br /&gt;
...&lt;br /&gt;
From: &amp;quot;BrandExponents&amp;quot; &amp;lt;support@brandexponents.com&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the email isn&#039;t signed, so it would be trivial for someone malicious to have sent me this link&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------&lt;br /&gt;
 &lt;br /&gt;
Hi Michael,&lt;br /&gt;
&lt;br /&gt;
Please download the plugins from the link below:&lt;br /&gt;
https://drive.google.com/file/d/1xbs80Rz1hcZOPhl2O_Kh_zvz9jUEymId/view?usp=sharing&lt;br /&gt;
&lt;br /&gt;
Warm regards,&lt;br /&gt;
&lt;br /&gt;
--&lt;br /&gt;
Suman M., Tech Support Executive&lt;br /&gt;
&lt;br /&gt;
BrandExponents&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# to refresh: here are all the plugins required by the theme&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the link above contains a file (plugins.zip) with 8 .zip files inside it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ unzip plugins.zip&lt;br /&gt;
Archive:  plugins.zip&lt;br /&gt;
 extracting: be-portfolio-post.zip   &lt;br /&gt;
 extracting: masterslider.zip        &lt;br /&gt;
 extracting: meta-box.zip            &lt;br /&gt;
 extracting: meta-box-conditional-logic.zip  &lt;br /&gt;
 extracting: meta-box-show-hide.zip  &lt;br /&gt;
 extracting: meta-box-tabs.zip       &lt;br /&gt;
 extracting: revslider.zip           &lt;br /&gt;
 extracting: wpforms-lite.zip        &lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$&lt;br /&gt;
&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ sha256sum *&lt;br /&gt;
1dbf8c7b7ab07f964d50efb5cc737dd9cb90ff366e51ab3b2051310dfa87f461  be-portfolio-post.zip&lt;br /&gt;
7d7b88a0f60a7c98b67fb97cd6e319bf814c6156504e5f52275052b62310dfb4  masterslider.zip&lt;br /&gt;
4af937e0435fa7370b91c256f5b83aa6e41869ae64595ce8b043f8fdf9b34a6f  meta-box-conditional-logic.zip&lt;br /&gt;
357c4c51dc5e253a3f3b7f59726f94ec4529dfce88b9eb45671fe5ecb409e7a8  meta-box-show-hide.zip&lt;br /&gt;
20d0f5f13c23540bba617a232bfd0eee662e2b85022f3a752dad75fc214cd457  meta-box-tabs.zip&lt;br /&gt;
5afafc6e01ea7d1c1fb6e3c97b7cc711c39a137244f6a0946557f0bb2a66295e  meta-box.zip&lt;br /&gt;
345da9e15ba4b618f7b626858a700be697600e39abd9709905b9aad27e5a6491  plugins.zip&lt;br /&gt;
1a4e1230e7aac6b136d5af018acaede037b3bfdc4572dd838fac6fef0a2b25c3  revslider.zip&lt;br /&gt;
8f1a9e34c3d7f9fae8c9af1d8bfec08989f8175b93a54a8d0f2c4712c525a40f  wpforms-lite.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# that doesn&#039;t include the last 3 (Oshine Core, Oshine Modules, and Tatsu), but they previously sent us another file (again on google drive) with those&lt;br /&gt;
# so we now have all the plugins needed (hopefully without malware; there&#039;s no way to authenticate this crap)&lt;br /&gt;
# some of the plugins above, though, are superfluous&lt;br /&gt;
## meta-box is a free plugin that we can download from wordpress.org https://wordpress.org/plugins/meta-box/&lt;br /&gt;
## wpforms-lite is also free and available on wordpress.org https://wordpress.org/plugins/wpforms-lite/&lt;br /&gt;
# alright, I downloaded all the other oshine-made plugins from the links they gave me previously&lt;br /&gt;
# so here&#039;s our shitty TOFU 1/3 (Tor, exit in France)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ ls&lt;br /&gt;
be-portfolio-post.zip           meta-box-tabs.zip         plugins.zip&lt;br /&gt;
masterslider.zip                meta-box.zip              revslider.zip&lt;br /&gt;
meta-box-conditional-logic.zip  oshine-core-1.6.1.zip     tatsu-3.5.3.zip&lt;br /&gt;
meta-box-show-hide.zip          oshine-modules-3.3.8.zip  wpforms-lite.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ sha256sum *&lt;br /&gt;
1dbf8c7b7ab07f964d50efb5cc737dd9cb90ff366e51ab3b2051310dfa87f461  be-portfolio-post.zip&lt;br /&gt;
7d7b88a0f60a7c98b67fb97cd6e319bf814c6156504e5f52275052b62310dfb4  masterslider.zip&lt;br /&gt;
4af937e0435fa7370b91c256f5b83aa6e41869ae64595ce8b043f8fdf9b34a6f  meta-box-conditional-logic.zip&lt;br /&gt;
357c4c51dc5e253a3f3b7f59726f94ec4529dfce88b9eb45671fe5ecb409e7a8  meta-box-show-hide.zip&lt;br /&gt;
20d0f5f13c23540bba617a232bfd0eee662e2b85022f3a752dad75fc214cd457  meta-box-tabs.zip&lt;br /&gt;
5afafc6e01ea7d1c1fb6e3c97b7cc711c39a137244f6a0946557f0bb2a66295e  meta-box.zip&lt;br /&gt;
d3aba9dd7351476d58e3ffa3c29ccf7d3f0c05736fc688a382e95bca1154034c  oshine-core-1.6.1.zip&lt;br /&gt;
e54903207f51377350276ac50e2be5749848cd3e95513ed85fdfbf652f348e0b  oshine-modules-3.3.8.zip&lt;br /&gt;
345da9e15ba4b618f7b626858a700be697600e39abd9709905b9aad27e5a6491  plugins.zip&lt;br /&gt;
1a4e1230e7aac6b136d5af018acaede037b3bfdc4572dd838fac6fef0a2b25c3  revslider.zip&lt;br /&gt;
ad769858e414eb983a7fb94dc06e5b3c26958794d359b2224c8def6546d2361e  tatsu-3.5.3.zip&lt;br /&gt;
8f1a9e34c3d7f9fae8c9af1d8bfec08989f8175b93a54a8d0f2c4712c525a40f  wpforms-lite.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ du -sh *&lt;br /&gt;
56K	be-portfolio-post.zip&lt;br /&gt;
1.6M	masterslider.zip&lt;br /&gt;
8.0K	meta-box-conditional-logic.zip&lt;br /&gt;
8.0K	meta-box-show-hide.zip&lt;br /&gt;
8.0K	meta-box-tabs.zip&lt;br /&gt;
1.1M	meta-box.zip&lt;br /&gt;
18M	oshine-core-1.6.1.zip&lt;br /&gt;
1.7M	oshine-modules-3.3.8.zip&lt;br /&gt;
24M	plugins.zip&lt;br /&gt;
11M	revslider.zip&lt;br /&gt;
16M	tatsu-3.5.3.zip&lt;br /&gt;
11M	wpforms-lite.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
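# since this is TOFU anyway, it&#039;s worth at least recording these hashes. Then any later copy of the same files (e.g. on hetzner3 before installing) can be checked against them. A minimal sketch using GNU coreutils sha256sum; the SHA256SUMS file name and the helper names are my own choice, not anything the theme vendor provides. This only detects changes after the first download; it says nothing about whether the first download was authentic&lt;br /&gt;

```shell
# record_hashes DIR: write DIR/SHA256SUMS covering every .zip in DIR
record_hashes() {
	( cd "$1" || exit 1
	  sha256sum -- *.zip > SHA256SUMS )
}

# verify_hashes DIR: re-hash the files listed in DIR/SHA256SUMS; exits non-zero
# (and names the offending file) if any of them changed or went missing
verify_hashes() {
	( cd "$1" || exit 1
	  sha256sum --check --quiet SHA256SUMS )
}
```
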
# I don&#039;t know what we can do to better authenticate these files&lt;br /&gt;
# if any of these files contains malicious code (or an exploitable vulnerability), an attacker would be able to modify any of our other websites (well, only the upload dirs, but they could read the db creds and then inject something malicious into the other sites&#039; dbs), and attack our users on any of those websites&lt;br /&gt;
# I&#039;m thinking that I should try a wordpress install in a DispVM that downloads these plugins directly from within wordpress, and then diff those against what we just got above&lt;br /&gt;
# also, I should see if there&#039;s any sort of general scanner for malicious php code: something that can identify things like exec() calls, base32/64 blocks, or anything that tries to file_put_contents or file_get_contents from the public internet. I can&#039;t possibly read all of these plugins&#039; code, but I could probably read through an automated report of code blocks identified as high-risk&lt;br /&gt;
## here&#039;s a collection of &amp;quot;static code analysis&amp;quot; tools for php https://github.com/guardrailsio/awesome-php-security?tab=readme-ov-file#static-code-analysis&lt;br /&gt;
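# before reaching for one of those tools, a crude first pass could be a plain grep for the kinds of patterns listed above. This is only a sketch: the pattern list is my own guess at &amp;quot;high-risk&amp;quot;, not taken from any of those projects. It will flag plenty of legitimate code, and deliberately obfuscated code can evade it, so its output is a reading list, not a verdict&lt;br /&gt;

```shell
# scan_php DIR: print file:line:code for PHP lines matching patterns that often
# appear in malicious plugins (crude regex heuristic; expect false positives)
scan_php() {
	grep -rnE --include='*.php' \
		-e '\b(eval|exec|shell_exec|system|passthru|popen|proc_open|create_function)[[:space:]]*\(' \
		-e '\b(base64_decode|gzinflate|str_rot13)[[:space:]]*\(' \
		-e 'file_(get|put)_contents[[:space:]]*\([^)]*https?://' \
		"$1"
}
```

# usage, e.g.: unzip a plugin into its own dir, run scan_php on that dir, then read through each hit by hand&lt;br /&gt;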
# I went ahead and launched a Debian 12 disposable VM, installed wordpress from apt, and followed this guide https://wiki.debian.org/WordPress&lt;br /&gt;
## curiously, installing wordpress doesn&#039;t install mariadb-server, nor does it configure the db or create an apache vhost; that had to be done manually&lt;br /&gt;
# anyway, I copied-in the oshine theme that I had downloaded from themeforest, and activated it on http://localhost&lt;br /&gt;
# the theme automatically set itself to &amp;quot;developer mode&amp;quot; because (it says) it realized it&#039;s on &amp;quot;localhost&amp;quot;&lt;br /&gt;
# I don&#039;t know exactly what that means, but I couldn&#039;t find anywhere to enter a license key; so I guess that&#039;s one difference&lt;br /&gt;
# there&#039;s a notice at the top that tells me to install the required plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This theme requires the following plugins: BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu. This theme recommends the following plugins: BE GDPR, Master Slider, Meta Box Framework, Safe SVG, Slider Revolution and WPForms Lite. Begin installing plugins | Dismiss this notice &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I click &amp;quot;begin installing plugins&amp;quot;, then it brings me to a page listing a ton of plugins&lt;br /&gt;
# but if I click &amp;quot;Install&amp;quot; under any of them, then I&#039;m brought to a page asking for my ftp creds. what?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Connection Information&lt;br /&gt;
&lt;br /&gt;
 To perform the requested action, WordPress needs to access your web server. Please enter your FTP credentials to proceed. If you do not remember your credentials, you should contact your web host.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I thought maybe it was a permissions issue, but that looked fine.&lt;br /&gt;
# instead, it looks like I needed to add this to /etc/wordpress/config-localhost.php https://stackoverflow.com/questions/30688431/wordpress-needs-the-ftp-credentials-to-update-plugins#30690783&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
define(&#039;FS_METHOD&#039;, &#039;direct&#039;);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after that I tried to click the &amp;quot;Install&amp;quot; link again. This time it looked like it was working, but then it output some error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Installing Plugin: BE Portfolio Post Type&lt;br /&gt;
&lt;br /&gt;
Downloading installation package from https://brandexponents.com/oshin-plugins/be-portfolio-post.zip…&lt;br /&gt;
&lt;br /&gt;
Unpacking the package…&lt;br /&gt;
&lt;br /&gt;
The package could not be installed. PCLZIP_ERR_BAD_FORMAT (-10) : Invalid archive structure&lt;br /&gt;
&lt;br /&gt;
TGMPA v2.6.1&lt;br /&gt;
&lt;br /&gt;
Return to Required Plugins Installer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ah, the problem was that php-curl wasn&#039;t installed. I had to install it and then restart Apache for it to take effect&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo apt-get install php-curl&lt;br /&gt;
sudo systemctl restart apache2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, at least I was able to get the URL to the plugin; this is way more trustworthy than their Google Drive account! https://brandexponents.com/oshin-plugins/be-portfolio-post.zip&lt;br /&gt;
# here are all of them, one by one&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
https://brandexponents.com/oshin-plugins/be-portfolio-post.zip&lt;br /&gt;
https://brandexponents.com/thirdparty-plugins/meta-box-conditional-logic.zip&lt;br /&gt;
https://brandexponents.com/thirdparty-plugins/meta-box-show-hide.zip&lt;br /&gt;
https://brandexponents.com/thirdparty-plugins/meta-box-tabs.zip&lt;br /&gt;
https://brandexponents.com/oshin-plugins/oshine-core.zip&lt;br /&gt;
https://brandexponents.com/be-plugins/oshine-modules.zip&lt;br /&gt;
https://brandexponents.com/thirdparty-plugins/masterslider.zip&lt;br /&gt;
https://downloads.wordpress.org/plugin/meta-box.5.10.5.zip&lt;br /&gt;
https://downloads.wordpress.org/plugin/safe-svg.2.3.1.zip&lt;br /&gt;
https://brandexponents.com/thirdparty-plugins/revslider.zip&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpforms-lite.1.9.2.3.zip&lt;br /&gt;
https://brandexponents.com/oshin-plugins/be-gdpr.zip&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
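# a rough sketch (not run against these exact URLs) of fetching every plugin in the list above and recording SHA-256 checksums, so that the same files fetched later over a different path (Tor, another machine) can be compared for our 3TOFU process. The names urls.txt and tofu1/ are hypothetical, and wget is assumed to be installed&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: download each URL listed (one per line) in a text file, then print
# "sha256  filename" for every downloaded file, sorted by name for diffing.
set -euo pipefail

download_all() {
  local urlfile="$1" outdir="$2"
  mkdir -p "$outdir"
  # skip blank lines; report (but don't abort on) failures such as 404s
  grep -v '^[[:space:]]*$' "$urlfile" | while IFS= read -r url; do
    wget -nv --directory-prefix="$outdir" "$url" || echo "FAILED: $url" 1>/dev/stderr
  done
}

hash_all() {
  local dir="$1"
  ( cd "$dir" ; sha256sum -- * | sort -k 2 )
}

# Usage (hypothetical file names):
#   download_all urls.txt tofu1
#   hash_all tofu1 > tofu1.sha256
```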
# shit, this one failed with a 404 https://brandexponents.com/be-plugins/oshine-modules.zip&lt;br /&gt;
# this one too https://brandexponents.com/be-plugins/tatsu.zip&lt;br /&gt;
# I searched for ^ those and found this directory listing https://www.markeaandrews.com/wordpressfile/Oshine Buyers Package 5.0/Plugins/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 	Parent Directory 	 	- 	 &lt;br /&gt;
 	be-page-builder.zip 	2016-08-31 04:34 	1.6M	 &lt;br /&gt;
 	be-portfolio-post.zip 	2016-08-29 22:33 	7.3K	 &lt;br /&gt;
 	meta-box-conditional..&amp;gt;	2016-08-29 22:33 	8.1K	 &lt;br /&gt;
 	meta-box-show-hide.zip 	2016-08-29 22:33 	3.2K	 &lt;br /&gt;
 	meta-box-tabs.zip 	2016-08-29 22:33 	3.9K	 &lt;br /&gt;
 	oshine-modules.zip 	2016-12-22 07:23 	927K	 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I found this site, which claims to be a security review of the oshine theme. It actually marks it down for having included the plugin .zip files in the theme because &amp;quot;plugins are not allowed in themes.&amp;quot; – maybe that&#039;s why they were removed? https://themecheck.info/score/wordpress-theme-oshin-v6_4_1.html&lt;br /&gt;
# unfortunately WordPress unzips the plugins, so there&#039;s no original .zip left to checksum, which makes it hard to use them for our 3TOFU :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# ls&lt;br /&gt;
akismet		   index.php	 meta-box-conditional-logic  oshine-core&lt;br /&gt;
be-gdpr		   masterslider  meta-box-show-hide	     revslider&lt;br /&gt;
be-portfolio-post  meta-box	 meta-box-tabs		     wpforms-lite&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I just created a zip from the dir and got its checksum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# zip --recurse-paths revslider.zip revslider&lt;br /&gt;
...&lt;br /&gt;
  adding: revslider/public/assets/assets/svg/content/ic_clear_24px.svg (deflated 31%)&lt;br /&gt;
  adding: revslider/public/assets/assets/coloredbg.png (deflated 7%)&lt;br /&gt;
  adding: revslider/public/assets/assets/gridtile.png (deflated 9%)&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# &lt;br /&gt;
&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# du -sh revslider.zip &lt;br /&gt;
8.0M	revslider.zip&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# &lt;br /&gt;
&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# sha256sum revslider.zip &lt;br /&gt;
3754090d1573cf7b901a6a29166c4ae54035b60be599357bf34a2b3ccaeaf441  revslider.zip&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
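# note that a .zip also records timestamps, permissions and entry order, so a zip I build from the unpacked dir is unlikely to match the vendor&#039;s zip byte-for-byte, even if every file inside is identical. A rough sketch of an alternative (my assumption, not something I ran here): unzip both copies and compare a hash of the file contents instead&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: fingerprint the *contents* of a directory tree. Each regular file is
# hashed (with its path relative to the directory), the list is sorted by path
# in the C locale so the order is stable, and that list is hashed once more.
# Timestamps and permissions are ignored; renamed or changed files are not.
set -euo pipefail

tree_hash() {
  local dir="$1"
  ( cd "$dir" ; find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum ) \
    | sha256sum | cut -d ' ' -f 1
}

# Usage (hypothetical paths): unzip the vendor zip somewhere, then compare
#   tree_hash /var/lib/wordpress/wp-content/plugins/revslider
#   tree_hash ~/tofu1/revslider
```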
# unfortunately it differs from what I had just downloaded from Google Drive :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tofu1$ sha256sum revslider.zip &lt;br /&gt;
1a4e1230e7aac6b136d5af018acaede037b3bfdc4572dd838fac6fef0a2b25c3  revslider.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tofu1$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# quick check shows the version of revslider that BrandExponents sent me for download over Google Drive is v6.7.25&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tofu1$ head revslider/revslider.php &lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
/*&lt;br /&gt;
Plugin Name: Slider Revolution&lt;br /&gt;
Plugin URI: https://www.sliderrevolution.com/?utm_source=admin&amp;amp;utm_medium=button&amp;amp;utm_campaign=srusers&amp;amp;utm_content=info&lt;br /&gt;
Description: Slider Revolution - More than just a WordPress Slider&lt;br /&gt;
Author: ThemePunch&lt;br /&gt;
Text Domain: revslider&lt;br /&gt;
Domain Path: /languages&lt;br /&gt;
Version: 6.7.25&lt;br /&gt;
Author URI: https://themepunch.com/?utm_source=admin&amp;amp;utm_medium=button&amp;amp;utm_campaign=srusers&amp;amp;utm_content=info&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tofu1$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# whereas the one that wordpress just downloaded from their website is v6.5.15&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# head revslider/revslider.php &lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
/*&lt;br /&gt;
Plugin Name: Slider Revolution&lt;br /&gt;
Plugin URI: https://www.sliderrevolution.com/&lt;br /&gt;
Description: Slider Revolution - Premium responsive slider&lt;br /&gt;
Author: ThemePunch&lt;br /&gt;
Text Domain: revslider&lt;br /&gt;
Domain Path: /languages&lt;br /&gt;
Version: 6.5.15&lt;br /&gt;
Author URI: https://themepunch.com/&lt;br /&gt;
root@disp2713:/var/lib/wordpress/wp-content/plugins# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
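# a small sketch for pulling just the Version: header out of a plugin&#039;s main PHP file, which is quicker than reading the head of each file by eye (assumes GNU sed; the function name is made up)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: print the value of the first "Version:" header line in a WordPress
# plugin's main file (e.g. revslider/revslider.php), without surrounding spaces.
set -euo pipefail

plugin_version() {
  sed -n '/^[[:space:]]*Version:/{s/^[[:space:]]*Version:[[:space:]]*//;s/[[:space:]]*$//;p;q}' "$1"
}

# Usage: run it on both copies and compare, e.g.
#   plugin_version revslider/revslider.php
```

# on the two copies above, this should print 6.7.25 and 6.5.15 respectively&lt;br /&gt;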
# huh, I would have expected the copy on Google Drive to be more stale than the one they serve to their theme users. wow.&lt;br /&gt;
# oh, I found the license input: a beg notice informed me that I&#039;m running an outdated version of the theme and sent me to this page, which prompted me for the theme purchase code http://localhost/wp-admin/themes.php?page=be_register#be-welcome&lt;br /&gt;
# after entering the license, I was able to update the theme. Then it brought me to a different screen for installing plugins. It began installing, but this time it didn&#039;t print where it was downloading from. I fired up Wireshark, but I could only see the domain (the traffic was encrypted)&lt;br /&gt;
# let&#039;s see if I can decrypt it with this https://thelinuxcode.com/decrypt-ssl-tls-wireshark/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SSLKEYLOGFILE=/home/user/ssl-key.log&lt;br /&gt;
touch $SSLKEYLOGFILE&lt;br /&gt;
firefox https://localhost &amp;amp;&lt;br /&gt;
tail -f $SSLKEYLOGFILE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, I realized that won&#039;t matter because the connections are coming from WordPress, not Firefox&lt;br /&gt;
# I was able to get this working by setting up mitmproxy and configuring wordpress to use it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo apt-get install mitmproxy&lt;br /&gt;
sudo mitmproxy&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# in another window, I added mitmproxy&#039;s self-signed CA cert to the system&#039;s trusted CA store, so WordPress wouldn&#039;t throw an error when connecting through the proxy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p /usr/local/share/ca-certificates/extra&lt;br /&gt;
cp ~/.mitmproxy/mitmproxy-ca-cert.cer /usr/local/share/ca-certificates/extra/mitmproxy-ca-cert.crt&lt;br /&gt;
update-ca-certificates&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tested this with curl, and – sure enough – mitmproxy showed the request; it has a really nice ncurses TUI&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# initiate a request; you&#039;ll see it pop-up in the terminal with `mitmproxy` running&lt;br /&gt;
export HTTP_PROXY=http://127.0.0.1:8080&lt;br /&gt;
export HTTPS_PROXY=http://127.0.0.1:8080&lt;br /&gt;
&lt;br /&gt;
curl -L https://ddg.gg/?q=ose&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then, to get wordpress to go through the proxy, I added these lines to /etc/wordpress/config-localhost.php&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@disp2713:~# tail /etc/wordpress/config-localhost.php -n3&lt;br /&gt;
define(&#039;WP_PROXY_HOST&#039;, &#039;127.0.0.1&#039;);&lt;br /&gt;
define(&#039;WP_PROXY_PORT&#039;, &#039;8080&#039;);&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
root@disp2713:~# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
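# a minimal sketch of a hypothetical helper (not something I actually ran) for adding these define() lines idempotently: it skips constants that are already defined, and it inserts before the closing ?&amp;gt; tag, because anything appended after that tag is sent out as literal text instead of being parsed as PHP&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): add "define('NAME', VALUE);" to a WordPress
# config file unless NAME is already defined there. If the file has a line
# starting with "?>", the define is inserted before that line (GNU sed "i"),
# so it is parsed as PHP instead of being echoed as plain text.
set -euo pipefail

add_wp_define() {
  local file="$1" name="$2" value="$3"
  local line="define('${name}', ${value});"

  if grep -q "define('${name}'" "$file"; then
    echo "skipping ${name}: already defined in ${file}"
    return 0
  fi

  if grep -q '^?>' "$file"; then
    sed -i "/^?>/i ${line}" "$file"
  else
    echo "${line}" >> "$file"
  fi
}

# Usage (as root), mirroring the settings described above:
#   add_wp_define /etc/wordpress/config-localhost.php FS_METHOD "'direct'"
#   add_wp_define /etc/wordpress/config-localhost.php WP_PROXY_HOST "'127.0.0.1'"
#   add_wp_define /etc/wordpress/config-localhost.php WP_PROXY_PORT "'8080'"
```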
# then I went back to deactivate &amp;amp; delete all the plugins it had installed, and went to the page to install them again&lt;br /&gt;
&lt;br /&gt;
=Mon Dec 30, 2024=&lt;br /&gt;
&lt;br /&gt;
# yesterday Marcin decided to let go of fef and oswh, and I realized that we can trivially just turn them into static sites&lt;br /&gt;
# today I want to actually migrate those static sites from hetzner2 to hetzner3&lt;br /&gt;
# let&#039;s start with making a CHG page for the fef migration https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef&lt;br /&gt;
# hmm, I realized that I should set SITE_DOWN in nginx before taking the backups, but the backup now includes a static site dump made with wget, which requires the site to be online&lt;br /&gt;
## fortunately we can take the site down on nginx and then scrape it directly from apache, but that means we have to modify the port from what we did yesterday&lt;br /&gt;
# I found that if I use the hostname, it tries to connect to the WAN IP address, which blocks incoming connections on port 8000&lt;br /&gt;
&amp;lt;pre&amp;gt;[root@opensourceecology fef]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}:8000&amp;quot; &amp;quot;${vhost_name}:8000&amp;quot;&lt;br /&gt;
Both --no-clobber and --convert-links were specified, only --convert-links will be used.&lt;br /&gt;
--2024-12-30 18:16:22--  http://oswh.opensourceecology.org:8000/&lt;br /&gt;
Resolving oswh.opensourceecology.org (oswh.opensourceecology.org)... 2a01:4f8:172:209e::2, 138.201.84.243&lt;br /&gt;
Connecting to oswh.opensourceecology.org (oswh.opensourceecology.org)|2a01:4f8:172:209e::2|:8000... failed: Connection refused.&lt;br /&gt;
Connecting to oswh.opensourceecology.org (oswh.opensourceecology.org)|138.201.84.243|:8000... failed: Connection refused.&lt;br /&gt;
Converted 0 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.017s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.002s&lt;br /&gt;
[root@opensourceecology fef]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the fix is to use 127.0.0.1 and set the host header manually, but somehow this is breaking the recursive pull?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology fef]# time nice wget --header &amp;quot;Host: ${vhost_name}&amp;quot; --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;127.0.0.1:8000&amp;quot;&lt;br /&gt;
Both --no-clobber and --convert-links were specified, only --convert-links will be used.&lt;br /&gt;
--2024-12-30 18:25:45--  http://127.0.0.1:8000/&lt;br /&gt;
Connecting to 127.0.0.1:8000... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: unspecified [text/html]&lt;br /&gt;
Saving to: ‘127.0.0.1:8000/index.html’&lt;br /&gt;
&lt;br /&gt;
	[ &amp;lt;=&amp;gt;                                                                    ] 21,750      --.-K/s   in 0s      &lt;br /&gt;
&lt;br /&gt;
2024-12-30 18:25:45 (644 MB/s) - ‘127.0.0.1:8000/index.html’ saved [21750]&lt;br /&gt;
&lt;br /&gt;
FINISHED --2024-12-30 18:25:45--&lt;br /&gt;
Total wall clock time: 0.3s&lt;br /&gt;
Downloaded: 1 files, 21K in 0s (644 MB/s)&lt;br /&gt;
Converting 127.0.0.1:8000/index.html... 1-2&lt;br /&gt;
Converted 1 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.257s&lt;br /&gt;
user    0m0.003s&lt;br /&gt;
sys     0m0.000s&lt;br /&gt;
[root@opensourceecology fef]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried many changes to the command, but I think this might be a bug because it isn&#039;t even doing a &#039;--convert-links&#039;, even though I clearly specified it&lt;br /&gt;
# I tried to change /etc/hosts to hard-code fef.opensourceecology.org to 127.0.0.1, but that still won&#039;t work because the links it follows are on port 80 (not 8000)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology fef]# time nice wget --header &amp;quot;Host: ${vhost_name}&amp;quot; --recursive --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}:8000&amp;quot;&lt;br /&gt;
--2024-12-30 20:56:37--  http://fef.opensourceecology.org:8000/&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 127.0.0.1&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:8000... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: unspecified [text/html]&lt;br /&gt;
Saving to: ‘fef.opensourceecology.org:8000/index.html’&lt;br /&gt;
&lt;br /&gt;
    [ &amp;lt;=&amp;gt;                                                                    ] 21,750      --.-K/s   in 0s&lt;br /&gt;
&lt;br /&gt;
2024-12-30 20:56:38 (646 MB/s) - ‘fef.opensourceecology.org:8000/index.html’ saved [21750]&lt;br /&gt;
&lt;br /&gt;
Loading robots.txt; please ignore errors.&lt;br /&gt;
--2024-12-30 20:56:38--  http://fef.opensourceecology.org/robots.txt&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 127.0.0.1&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
...&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
--2024-12-30 20:53:42--  http://fef.opensourceecology.org/wp-content/plugins/cyclone-slider-2/templates/thumbnails/script.js?ver=2.10.0&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
--2024-12-30 20:53:42--  http://fef.opensourceecology.org/wp-content/plugins/cyclone-slider-2/js/client.js?ver=2.10.0&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
--2024-12-30 20:53:42--  http://fef.opensourceecology.org/wp-includes/js/wp-embed.min.js?ver=4.9.1&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
FINISHED --2024-12-30 20:53:42--&lt;br /&gt;
Total wall clock time: 0.5s&lt;br /&gt;
Downloaded: 3 files, 28K in 0s (711 MB/s)&lt;br /&gt;
Converting fef.opensourceecology.org:8000/index.html... 2-1&lt;br /&gt;
Converted 1 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.478s&lt;br /&gt;
user    0m0.011s&lt;br /&gt;
sys     0m0.007s&lt;br /&gt;
[root@opensourceecology fef]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# all right, this isn&#039;t working; I see two alternatives&lt;br /&gt;
## restrict nginx only to the server&#039;s IP&lt;br /&gt;
## figure out some sort of way to put the wordpress site in &amp;quot;read-only&amp;quot; mode&lt;br /&gt;
# I discovered a WordPress plugin specifically for generating static sites from its own content (e.g. for uploading to a &amp;quot;serverless&amp;quot; site @ a CDN) https://wordpress.org/plugins/simply-static/&lt;br /&gt;
# I also found another tool that we can use to generate a static site, which might not have the issues that wget does https://www.httrack.com/page/2/en/index.html&lt;br /&gt;
## it&#039;s in the official debian apt repos&lt;br /&gt;
## I gave it a local run, and it spat out one page that just infinitely redirects to itself. Fail.&lt;br /&gt;
# the internet says I can use WP_MAINTENANCE_MODE&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
define(&#039;WP_MAINTENANCE_MODE&#039;, true);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ah, nope. That&#039;s misinformation. The fucking article that mentioned it was probably ChatGPT outputting bullshit that doesn&#039;t actually exist https://wordpress.org/support/topic/wasnt-there-a-maintenance-mode-directive-in-wp-config-php/&lt;br /&gt;
# this is real, but not sure if it would help us https://developer.wordpress.org/reference/functions/wp_is_maintenance_mode/&lt;br /&gt;
# This SE answer suggests either revoking the WordPress DB user&#039;s write permissions on its tables, or adding a filter that basically MITMs DB queries and blocks anything that&#039;s not a SELECT https://wordpress.stackexchange.com/questions/243438/configure-wordpress-to-read-from-database-only-never-write&lt;br /&gt;
# ugh, a DDG search for making WordPress read-only doesn&#039;t return any first-page results about locking the DB&lt;br /&gt;
# here&#039;s the docs on LOCK in mysql https://dev.mysql.com/doc/refman/8.4/en/lock-tables.html&lt;br /&gt;
# I&#039;m not gonna play on hetzner2; let&#039;s try this on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [fef_db]&amp;gt; show tables;&lt;br /&gt;
+-----------------------+&lt;br /&gt;
| Tables_in_fef_db      |&lt;br /&gt;
+-----------------------+&lt;br /&gt;
| wp_ahm_events         |&lt;br /&gt;
| wp_ahm_pages          |&lt;br /&gt;
| wp_commentmeta        |&lt;br /&gt;
| wp_comments           |&lt;br /&gt;
| wp_links              |&lt;br /&gt;
| wp_options            |&lt;br /&gt;
| wp_postmeta           |&lt;br /&gt;
| wp_posts              |&lt;br /&gt;
| wp_term_relationships |&lt;br /&gt;
| wp_term_taxonomy      |&lt;br /&gt;
| wp_termmeta           |&lt;br /&gt;
| wp_terms              |&lt;br /&gt;
| wp_usermeta           |&lt;br /&gt;
| wp_users              |&lt;br /&gt;
+-----------------------+&lt;br /&gt;
14 rows in set (0,001 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK *;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;*&#039; at line 1&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK TABLES *;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;*&#039; at line 1&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK TABLES * READ;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;* READ&#039; at line 1&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK TABLES READ;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;READ&#039; at line 1&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK TABLE READ:&lt;br /&gt;
    -&amp;gt; ;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;READ:&#039; at line 1&lt;br /&gt;
MariaDB [fef_db]&amp;gt; LOCK TABLE fef_db.* READ;&lt;br /&gt;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;* READ&#039; at line 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wow, the actual command is FLUSH TABLES WITH READ LOCK;&lt;br /&gt;
## https://stackoverflow.com/questions/40046124/write-lock-all-tables-in-mysql-for-a-moment&lt;br /&gt;
## https://serverfault.com/questions/92279/how-do-i-freeze-stop-lock-a-mysql-database-then-how-do-i-enable-the-databse-bac&lt;br /&gt;
# that was super unclear&lt;br /&gt;
# ok, I confirmed it&#039;s working for this database&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [fef_db]&amp;gt; UPDATE wp_users SET user_nicename = &amp;quot;grinch&amp;quot; where user_login = &amp;quot;Maltfield&amp;quot;;&lt;br /&gt;
ERROR 1223 (HY000): Can&#039;t execute the query because you have a conflicting read lock&lt;br /&gt;
MariaDB [fef_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# shit, that locked all tables in all databases&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [store_db]&amp;gt; use oswh_db;&lt;br /&gt;
Reading table information for completion of table and column names&lt;br /&gt;
You can turn off this feature to get a quicker startup with -A&lt;br /&gt;
&lt;br /&gt;
Database changed&lt;br /&gt;
MariaDB [oswh_db]&amp;gt; UPDATE wp_users SET user_nicename = &amp;quot;grinch&amp;quot; where user_login = &amp;quot;Maltfield&amp;quot;;&lt;br /&gt;
ERROR 1223 (HY000): Can&#039;t execute the query because you have a conflicting read lock&lt;br /&gt;
MariaDB [oswh_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this doesn&#039;t seem very safe, and I&#039;m afraid that it would also break the WordPress static site scrape. Let&#039;s look into the nginx solution&lt;br /&gt;
# first we&#039;ll unlock the DBs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [oswh_db]&amp;gt; UNLOCK TABLES;&lt;br /&gt;
Query OK, 0 rows affected (0,000 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [oswh_db]&amp;gt; UPDATE wp_users SET user_nicename = &amp;quot;grinch&amp;quot; where user_login = &amp;quot;Maltfield&amp;quot;;&lt;br /&gt;
Query OK, 1 row affected (0,009 sec)&lt;br /&gt;
Rows matched: 1  Changed: 1  Warnings: 0&lt;br /&gt;
&lt;br /&gt;
MariaDB [oswh_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
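# for future reference: FLUSH TABLES WITH READ LOCK is global (every database on the server), whereas LOCK TABLES needs an explicit list of tables, which is why the wildcard attempts above failed. LOCK TABLES locks only last as long as the client session that took them, and while they are held that session can only access the tables it locked. A hypothetical helper (sketch, not tested against this server) that builds the statement from a list of table names&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: print a "LOCK TABLES t1 READ, t2 READ, ...;" statement for the table
# names given as arguments (one table per argument).
set -euo pipefail

lock_tables_sql() {
  local out="" t
  for t in "$@"; do
    out="${out:+${out}, }${t} READ"
  done
  printf 'LOCK TABLES %s;\n' "$out"
}

# Usage (hypothetical): feed it the table list of one database only
#   lock_tables_sql $(mysql -N -e 'SHOW TABLES' fef_db)
# then paste the output into an interactive mysql session on fef_db and keep
# that session open for as long as the lock is needed; UNLOCK TABLES releases it.
```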
# ok, I tested adding these lines to the &amp;quot;server{&amp;quot; block of /etc/nginx/sites-enabled/fef.opensourceecology.org, and I confirmed that a local curl on the server works while my attempt to load it in my web browser gets a 403&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   allow 127.0.0.1;&lt;br /&gt;
   deny all;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m going to test this on prod now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# # backup nginx config&lt;br /&gt;
[root@opensourceecology current]# cp /etc/nginx/conf.d/fef.opensourceecology.org.conf nginx_fef.opensourceecology.org.${stamp}.conf&lt;br /&gt;
[root@opensourceecology current]#&lt;br /&gt;
[root@opensourceecology current]# # restrict website to local requests only&lt;br /&gt;
[root@opensourceecology current]# grep &#039;deny&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf || sed -i &#039;s%^\(\s*\)server_name\(.*\)%\1server_name\2\n\tallow 127.0.0.1;\n\tdeny all;\n%&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf&lt;br /&gt;
[root@opensourceecology current]#&lt;br /&gt;
[root@opensourceecology current]# ls&lt;br /&gt;
nginx_fef.opensourceecology.org.20241230.conf&lt;br /&gt;
[root@opensourceecology current]# diff nginx_fef.opensourceecology.org.20241230.conf /etc/nginx/conf.d/fef.opensourceecology.org.conf&lt;br /&gt;
41a42,44&lt;br /&gt;
&amp;gt;       allow 127.0.0.1;&lt;br /&gt;
&amp;gt;       deny all;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# # verify config &amp;amp; reload&lt;br /&gt;
[root@opensourceecology current]# nginx -t &amp;amp;&amp;amp; systemctl reload nginx&lt;br /&gt;
...&lt;br /&gt;
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok&lt;br /&gt;
nginx: configuration file /etc/nginx/nginx.conf test is successful&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that I now can&#039;t load the prod fef site (on hetzner2) in my web browser (I get a 403)&lt;br /&gt;
# I reverted it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# cp nginx_fef.opensourceecology.org.20241230.conf /etc/nginx/conf.d/fef.opensourceecology.org.conf&lt;br /&gt;
cp: overwrite ‘/etc/nginx/conf.d/fef.opensourceecology.org.conf’? y&lt;br /&gt;
[root@opensourceecology current]# nginx -t &amp;amp;&amp;amp; systemctl reload nginx&lt;br /&gt;
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok&lt;br /&gt;
nginx: configuration file /etc/nginx/nginx.conf test is successful&lt;br /&gt;
[root@opensourceecology current]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s try the whole thing, with a wget static site dump while it&#039;s locked-down from the public internet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;fef.opensourceecology.org&#039;&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current&lt;br /&gt;
&lt;br /&gt;
# backup nginx config&lt;br /&gt;
cp /etc/nginx/conf.d/fef.opensourceecology.org.conf nginx_fef.opensourceecology.org.${stamp}.conf&lt;br /&gt;
&lt;br /&gt;
# restrict website to local requests only&lt;br /&gt;
grep &#039;deny&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf || sed -i &#039;s%^\(\s*\)server_name\(.*\)%\1server_name\2\n\tallow 127.0.0.1;\n\tdeny all;\n%&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf&lt;br /&gt;
&lt;br /&gt;
# verify config &amp;amp; reload&lt;br /&gt;
nginx -t &amp;amp;&amp;amp; systemctl reload nginx&lt;br /&gt;
&lt;br /&gt;
mkdir wget&lt;br /&gt;
pushd wget&lt;br /&gt;
time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}:8000&amp;quot;&lt;br /&gt;
&lt;br /&gt;
time nice wget --header &amp;quot;Host: ${vhost_name}&amp;quot; --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;127.0.0.1:8000&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# fuck, it&#039;s still failing by trying to load it on port 80&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}:8000&amp;quot;&lt;br /&gt;
Both --no-clobber and --convert-links were specified, only --convert-links will be used.&lt;br /&gt;
--2024-12-30 22:32:24--  http://fef.opensourceecology.org:8000/&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 127.0.0.1&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:8000... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 301 Moved Permanently&lt;br /&gt;
Location: http://fef.opensourceecology.org/ [following]&lt;br /&gt;
--2024-12-30 22:32:25--  http://fef.opensourceecology.org/&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 127.0.0.1&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|127.0.0.1|:80... failed: Connection refused.&lt;br /&gt;
Converted 0 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.211s&lt;br /&gt;
user    0m0.000s&lt;br /&gt;
sys     0m0.003s&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I removed the line from /etc/hosts, and now it fails on port 8000, because the hostname resolves to the public IPs again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}:8000&amp;quot;&lt;br /&gt;
Both --no-clobber and --convert-links were specified, only --convert-links will be used.&lt;br /&gt;
--2024-12-30 22:33:42--  http://fef.opensourceecology.org:8000/&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 2a01:4f8:172:209e::2, 138.201.84.243&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|2a01:4f8:172:209e::2|:8000... failed: Connection refused.&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|138.201.84.243|:8000... failed: Connection refused.&lt;br /&gt;
Converted 0 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.016s&lt;br /&gt;
user    0m0.002s&lt;br /&gt;
sys     0m0.001s&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
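# oh, duh: the whole point of the nginx restriction is so that we don&#039;t have to call it by IP address or port&lt;br /&gt;
# here&#039;s the original command; now it fails because the source IP isn&#039;t 127.0.0.1 (the hostname resolves to the server&#039;s public IPs, so the request comes from those)&lt;br /&gt;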
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot;&lt;br /&gt;
Both --no-clobber and --convert-links were specified, only --convert-links will be used.&lt;br /&gt;
--2024-12-30 22:34:53--  http://fef.opensourceecology.org/&lt;br /&gt;
Resolving fef.opensourceecology.org (fef.opensourceecology.org)... 2a01:4f8:172:209e::2, 138.201.84.243&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|2a01:4f8:172:209e::2|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 301 Moved Permanently&lt;br /&gt;
Location: https://fef.opensourceecology.org/ [following]&lt;br /&gt;
--2024-12-30 22:34:53--  https://fef.opensourceecology.org/&lt;br /&gt;
Connecting to fef.opensourceecology.org (fef.opensourceecology.org)|2a01:4f8:172:209e::2|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2024-12-30 22:34:53 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
Converted 0 files in 0 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m0.024s&lt;br /&gt;
user    0m0.006s&lt;br /&gt;
sys     0m0.003s&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I manually edited the config&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   server_name fef.opensourceecology.org;&lt;br /&gt;
   allow 127.0.0.1;&lt;br /&gt;
   allow [2a01:4f8:172:209e::2];&lt;br /&gt;
   allow 138.201.84.243;&lt;br /&gt;
   deny all;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, it didn&#039;t like that&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# nginx -t &amp;amp;&amp;amp; systemctl reload nginx&lt;br /&gt;
nginx: [warn] the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/ssl.openbuildinginstitute.org.include:11&lt;br /&gt;
nginx: [warn] the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11&lt;br /&gt;
nginx: [warn] the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/ssl.opensourceecology.org.include:11&lt;br /&gt;
nginx: [emerg] invalid parameter &amp;quot;[2a01:4f8:172:209e::2]&amp;quot; in /etc/nginx/conf.d/fef.opensourceecology.org.conf:43&lt;br /&gt;
nginx: configuration file /etc/nginx/nginx.conf test failed&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it was happy when I removed the square brackets from the IPv6 address&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   server_name fef.opensourceecology.org;&lt;br /&gt;
   allow 127.0.0.1;&lt;br /&gt;
   allow 2a01:4f8:172:209e::2;&lt;br /&gt;
   allow 138.201.84.243;&lt;br /&gt;
   deny all;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# cool, this time it worked!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
Converting fef.opensourceecology.org/wp-content/plugins/wp-simple-galleries/colorbox/themes/theme1/colorbox.css?ver=4.9.1.css... 22-0&lt;br /&gt;
Converted 88 files in 0.03 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m6.786s&lt;br /&gt;
user    0m0.109s&lt;br /&gt;
sys     0m0.109s&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology wget]# du -sh .&lt;br /&gt;
37M     .&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, so this works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;fef.opensourceecology.org&#039;&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current&lt;br /&gt;
&lt;br /&gt;
# backup nginx config&lt;br /&gt;
cp /etc/nginx/conf.d/fef.opensourceecology.org.conf nginx_fef.opensourceecology.org.${stamp}.conf&lt;br /&gt;
&lt;br /&gt;
# restrict website to local requests only&lt;br /&gt;
grep &#039;deny&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf || sed -i &#039;s%^\(\s*\)server_name\(.*\)%\1server_name\2\n\tallow 127.0.0.1;\n\tallow 2a01:4f8:172:209e::2;\n\tallow 138.201.84.243;\n\tdeny all;\n%&#039; /etc/nginx/conf.d/fef.opensourceecology.org.conf&lt;br /&gt;
&lt;br /&gt;
# verify config &amp;amp; reload&lt;br /&gt;
nginx -t &amp;amp;&amp;amp; systemctl reload nginx&lt;br /&gt;
&lt;br /&gt;
mkdir wget&lt;br /&gt;
pushd wget&lt;br /&gt;
time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
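# Before running the sed one-liner above against the live vhost config, you can dry-run it on a scratch copy. This checks that it inserts the allow/deny lines exactly once. Below is a minimal bash sketch, assuming GNU sed (for \s, \n and \t in the expression). The sample server block and the restrict_to_local helper name are hypothetical:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical dry run of the "restrict website to local requests" step.
set -eu

# scratch file standing in for /etc/nginx/conf.d/fef.opensourceecology.org.conf
conf="$(mktemp)"
printf 'server {\n\tserver_name fef.opensourceecology.org;\n}\n' | tee "$conf"

# same grep-or-sed guard as the real step: only insert the ACL if no 'deny' exists yet
restrict_to_local() {
	grep -q 'deny' "$1" || sed -i 's%^\(\s*\)server_name\(.*\)%\1server_name\2\n\tallow 127.0.0.1;\n\tallow 2a01:4f8:172:209e::2;\n\tallow 138.201.84.243;\n\tdeny all;\n%' "$1"
}

restrict_to_local "$conf"   # first run inserts the allow/deny lines after server_name
restrict_to_local "$conf"   # second run finds 'deny' and leaves the file alone
cat "$conf"
```

# Note that the IPv6 address is written without square brackets, which is the form nginx accepted above.&lt;br /&gt;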
# I did see some 403 errors in the wget output, and I wonder if my nginx rate limiting is causing this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
--2024-12-30 22:44:15--  https://fef.opensourceecology.org/wp-content/uploads/2015/08/chicks_7.jpg&lt;br /&gt;
Reusing existing connection to [fef.opensourceecology.org]:443.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2024-12-30 22:44:15 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
--2024-12-30 22:44:15--  https://fef.opensourceecology.org/wp-content/uploads/2015/08/chicks_7-150x150.jpg&lt;br /&gt;
Reusing existing connection to [fef.opensourceecology.org]:443.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: 32329 (32K) [image/jpeg]&lt;br /&gt;
Saving to: ‘fef.opensourceecology.org/wp-content/uploads/2015/08/chicks_7-150x150.jpg’&lt;br /&gt;
&lt;br /&gt;
100%[=======================================================================&amp;gt;] 32,329      --.-K/s   in 0s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after some fiddling with nginx, I realized (when I tailed the mod_security audit log) that this is DoS protection coming from Apache&#039;s mod_evasive&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology nginx]# tail -f /var/log/httpd/modsec_audit.log&lt;br /&gt;
...&lt;br /&gt;
--27b11513-F--&lt;br /&gt;
HTTP/1.1 403 Forbidden&lt;br /&gt;
Content-Length: 226&lt;br /&gt;
Content-Type: text/html; charset=iso-8859-1&lt;br /&gt;
&lt;br /&gt;
--27b11513-E--&lt;br /&gt;
&lt;br /&gt;
--27b11513-H--&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_evasive24.c&amp;quot;] [line 248] [level 3] client denied by server configuration: /var/www/html/www.opensourceecology.org/htdocs/wp-json&lt;br /&gt;
Stopwatch: 1735600807963014 4987 (- - -)&lt;br /&gt;
Stopwatch2: 1735600807963014 4987; combined=161, p1=116, p2=0, p3=4, p4=25, p5=15, sr=30, sw=1, l=0, gc=0&lt;br /&gt;
Response-Body-Transformed: Dechunked&lt;br /&gt;
Producer: ModSecurity for Apache/2.9.2 (http://www.modsecurity.org/); OWASP_CRS/2.2.9.&lt;br /&gt;
Server: Apache&lt;br /&gt;
Engine-Mode: &amp;quot;ENABLED&amp;quot;&lt;br /&gt;
&lt;br /&gt;
--27b11513-Z-&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I uncommented the line that whitelisted 127.0.0.1&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology conf.d]# diff mod_evasive.conf.20241230 mod_evasive.conf&lt;br /&gt;
71c71&lt;br /&gt;
&amp;lt;     #DOSWhitelist   127.0.0.1&lt;br /&gt;
---&lt;br /&gt;
&amp;gt;     DOSWhitelist   127.0.0.1&lt;br /&gt;
[root@opensourceecology conf.d]#&lt;br /&gt;
 &lt;br /&gt;
[root@opensourceecology conf.d]# systemctl reload httpd&lt;br /&gt;
[root@opensourceecology conf.d]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
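# The same manual edit could also be scripted so that re-running it is harmless. Below is a minimal bash sketch, assuming GNU sed. It edits a scratch file instead of the real mod_evasive.conf, and the sample directives in that file are made up for illustration:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical scripted version of "uncomment DOSWhitelist 127.0.0.1".
set -eu

conf="$(mktemp)"   # stand-in for mod_evasive.conf
printf '    DOSHashTableSize    3097\n    #DOSWhitelist   127.0.0.1\n' | tee "$conf"

# keep a dated backup first, like the mod_evasive.conf.20241230 copy in the diff
cp "$conf" "${conf}.$(date +%Y%m%d)"

# drop the leading '#' only on the 127.0.0.1 whitelist line; re-running is a no-op
sed -i 's/^\(\s*\)#\s*\(DOSWhitelist\s\+127\.0\.0\.1\)/\1\2/' "$conf"
cat "$conf"
```

# On the real file, httpd would still need the reload shown in the diff (systemctl reload httpd).&lt;br /&gt;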
# I ran wget again, and it made a huge difference (109 files converted instead of 88, and the mirror grew from 37M to 70M)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
.. nothing to do.&lt;br /&gt;
Converting fef.opensourceecology.org/wp-content/plugins/cyclone-slider-2/templates/default/style.css?ver=2.10.0.css... 2-0&lt;br /&gt;
Converting fef.opensourceecology.org/wp-content/plugins/wp-simple-galleries/colorbox/themes/theme1/colorbox.css?ver=4.9.1.css... 22-0&lt;br /&gt;
Converted 109 files in 0.03 seconds.&lt;br /&gt;
&lt;br /&gt;
real    0m21.372s&lt;br /&gt;
user    0m0.148s&lt;br /&gt;
sys     0m0.194s&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
[root@opensourceecology wget]# du -sh .&lt;br /&gt;
70M     .&lt;br /&gt;
[root@opensourceecology wget]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I ran it again and confirmed there are no 403s. Yay!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot; &amp;amp;&amp;gt; wget.log&lt;br /&gt;
...&lt;br /&gt;
[root@opensourceecology wget]# grep 403 wget.log &lt;br /&gt;
Length: 413054 (403K) [image/jpeg]&lt;br /&gt;
Length: 412969 (403K) [image/jpeg]&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
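# A bare grep for 403 also matches file sizes such as (403K), and that is all the two Length lines above are. Anchoring on the ERROR 403: prefix that wget printed earlier is more precise. Below is a minimal bash sketch, using sample lines copied from the wget output in this log:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Compare a naive '403' grep with one anchored on wget's error message.
set -eu

log="$(mktemp)"   # sample wget.log built from lines seen earlier in this log
printf '%s\n' \
	'HTTP request sent, awaiting response... 403 Forbidden' \
	'2024-12-30 22:44:15 ERROR 403: Forbidden.' \
	'Length: 413054 (403K) [image/jpeg]' | tee "$log"

# naive: also matches the (403K) file-size line
naive="$(grep -c '403' "$log")"
# precise: only wget's error lines; '|| true' keeps set -e happy when there are zero matches
errors="$(grep -c 'ERROR 403:' "$log" || true)"
echo "naive=${naive} errors=${errors}"
```

# On a clean run like the one above, the precise grep would report 0 while the naive one still matches the Length lines.&lt;br /&gt;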
# I formalized all of this into this section (but I still need to do another full run-through to test it) https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_deprecate_fef#Change_Steps&lt;br /&gt;
# alright, I went through the whole process. I had to add a couple of missing commands, but now it&#039;s good&lt;br /&gt;
# I confirmed that the fef static site on hetzner3 loads, and several spot checks suggest it&#039;s working fine&lt;br /&gt;
# here are the files; it&#039;s about a quarter of a gigabyte (245M). Not nothing, but tolerable as an archive.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/fef.opensourceecology.org # ls -lah&lt;br /&gt;
total 176M&lt;br /&gt;
d---r-x---  3 not-apache www-data 4,0K Dec 31 01:38 .&lt;br /&gt;
d---r-x---  9 not-apache www-data 4,0K Dec 31 01:37 ..&lt;br /&gt;
-r--------  1 not-apache www-data 176M Dec 31 01:38 fef.opensourceecology.org_files.20241231.tar.bz2&lt;br /&gt;
d---r-x--- 10 not-apache www-data 4,0K Dec 31 01:28 htdocs&lt;br /&gt;
-r--------  1 not-apache www-data  61K Dec 31 01:38 mysqldump_fef.opensourceecology.org.20241231.sql.bz2&lt;br /&gt;
-rw-r--r--  1 not-apache www-data  779 Dec 31 01:38 README.txt&lt;br /&gt;
root@hetzner3 /var/www/html/fef.opensourceecology.org # ls htdocs/&lt;br /&gt;
 2014                    &#039;index.html?p=189.html&#039;  &#039;index.html?p=447.html&#039;  &#039;index.html?p=553.html&#039;  &#039;index.html?p=77.html&#039;&lt;br /&gt;
 2015                    &#039;index.html?p=203.html&#039;  &#039;index.html?p=449.html&#039;  &#039;index.html?p=580.html&#039;   robots.txt&lt;br /&gt;
 category                &#039;index.html?p=215.html&#039;  &#039;index.html?p=458.html&#039;  &#039;index.html?p=610.html&#039;   the-farm-2&lt;br /&gt;
 index.html              &#039;index.html?p=218.html&#039;  &#039;index.html?p=46.html&#039;   &#039;index.html?p=612.html&#039;   the-farm-house&lt;br /&gt;
&#039;index.html?p=104.html&#039;  &#039;index.html?p=258.html&#039;  &#039;index.html?p=471.html&#039;  &#039;index.html?p=616.html&#039;   wp-content&lt;br /&gt;
&#039;index.html?p=119.html&#039;  &#039;index.html?p=264.html&#039;  &#039;index.html?p=475.html&#039;  &#039;index.html?p=618.html&#039;   wp-includes&lt;br /&gt;
&#039;index.html?p=122.html&#039;  &#039;index.html?p=338.html&#039;  &#039;index.html?p=486.html&#039;  &#039;index.html?p=632.html&#039;   wp-json&lt;br /&gt;
&#039;index.html?p=129.html&#039;  &#039;index.html?p=402.html&#039;  &#039;index.html?p=510.html&#039;  &#039;index.html?p=642.html&#039;  &#039;xmlrpc.php?rsd&#039;&lt;br /&gt;
&#039;index.html?p=143.html&#039;  &#039;index.html?p=411.html&#039;  &#039;index.html?p=511.html&#039;  &#039;index.html?p=64.html&#039;&lt;br /&gt;
&#039;index.html?p=160.html&#039;  &#039;index.html?p=424.html&#039;  &#039;index.html?p=512.html&#039;  &#039;index.html?p=663.html&#039;&lt;br /&gt;
&#039;index.html?p=169.html&#039;  &#039;index.html?p=438.html&#039;  &#039;index.html?p=542.html&#039;  &#039;index.html?p=674.html&#039;&lt;br /&gt;
&#039;index.html?p=173.html&#039;  &#039;index.html?p=43.html&#039;   &#039;index.html?p=546.html&#039;  &#039;index.html?p=67.html&#039;&lt;br /&gt;
root@hetzner3 /var/www/html/fef.opensourceecology.org # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/fef.opensourceecology.org # du -sh .&lt;br /&gt;
245M    .&lt;br /&gt;
root@hetzner3 /var/www/html/fef.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin sent an email saying he also wants:&lt;br /&gt;
## not to migrate store.opensourceecology.org (it was never set up)&lt;br /&gt;
## not to migrate seedhome.openbuildinginstitute.org (it was never set up)&lt;br /&gt;
## to convert microfactory.opensourceecology.org to a static site&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
We want to keep and migrate www.openbuildinginstitute.org which is still in&lt;br /&gt;
use. The rest can go:&lt;br /&gt;
&lt;br /&gt;
We have never actually put any real content on store.opensourceecology.org&lt;br /&gt;
, nor seed home.openbuildings institute.org&lt;br /&gt;
&amp;lt;http://seedhome.openbuildinginstitute.org/&amp;gt; - so these definitely do not&lt;br /&gt;
need to be migrated nor made static.&lt;br /&gt;
&lt;br /&gt;
As far as microfactory.open source ecology.org&lt;br /&gt;
&amp;lt;http://microfactory.opensourceecology.org/&amp;gt; - we can also turn that into a&lt;br /&gt;
static site. We do not plan to update this site.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# microfactory was another oshine site. I&#039;m not sure whether it&#039;ll migrate OK; it&#039;s blocked until the oshine devs give us the required plugins. But we have to sort that out anyway, because oshine is required by OBI, and Catarina wants to switch OSE to it too.&lt;br /&gt;
# anyway, here&#039;s our original list&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1. forum.opensourceecology.org&lt;br /&gt;
2. store.opensourceecology.org&lt;br /&gt;
3. microfactory.opensourceecology.org&lt;br /&gt;
4. fef.opensourceecology.org&lt;br /&gt;
5. oswh.opensourceecology.org&lt;br /&gt;
6. seedhome.openbuildinginstitute.org&lt;br /&gt;
7. www.openbuildinginstitute.org&lt;br /&gt;
8. www.opensourceecology.org&lt;br /&gt;
9. phplist.opensourceecology.org&lt;br /&gt;
10. wiki.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and here&#039;s what we have now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1. forum.opensourceecology.org&lt;br /&gt;
2. microfactory.opensourceecology.org # can convert to static site, if necessary&lt;br /&gt;
3. fef.opensourceecology.org # convert to static site&lt;br /&gt;
4. oswh.opensourceecology.org # convert to static site&lt;br /&gt;
5. www.openbuildinginstitute.org&lt;br /&gt;
6. www.opensourceecology.org&lt;br /&gt;
7. phplist.opensourceecology.org&lt;br /&gt;
8. wiki.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# continuing with the wiki, I got all the sed commands together to fix LocalSettings.php such that we could run the upgrade&lt;br /&gt;
# the upgrade took about 30 minutes (28m12s), and it ended with an error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
... log_id=212513&lt;br /&gt;
... log_id=212538&lt;br /&gt;
Completed migration, updated 212428 row(s) with 2283 new actor(s), 97 error(s)&lt;br /&gt;
Beginning migration of log_search&lt;br /&gt;
... target_author_id, ls_value=141 ls_log_id=145907&lt;br /&gt;
Completed migration, inserted 1 row(s) with 0 new actor(s), 0 error(s)&lt;br /&gt;
errors were encountered.&lt;br /&gt;
Modifying rev_text_id field of table revision ...done.&lt;br /&gt;
Modifying table site_stats ...done.&lt;br /&gt;
Populating ar_rev_id.&lt;br /&gt;
Populating ar_rev_id...&lt;br /&gt;
MediaWiki\Revision\RevisionAccessException from line 1296 of /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php: Main slot of revision not found in database. See T212428.&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1224): MediaWiki\Revision\RevisionStore-&amp;gt;constructSlotRecords()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1217): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1335): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#3 [internal function]: MediaWiki\Revision\RevisionStore-&amp;gt;MediaWiki\Revision\{closure}()&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(175): call_user_func()&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(117): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlots()&lt;br /&gt;
#6 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlot()&lt;br /&gt;
#7 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord-&amp;gt;getSlot()&lt;br /&gt;
#8 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1185): MediaWiki\Revision\RevisionRecord-&amp;gt;getContent()&lt;br /&gt;
#9 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1528): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#10 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache-&amp;gt;fetchOrRegenerate()&lt;br /&gt;
#11 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1167): WANObjectCache-&amp;gt;getWithSetCallback()&lt;br /&gt;
#12 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/BagOStuff.php(149): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#13 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1163): BagOStuff-&amp;gt;getWithSetCallback()&lt;br /&gt;
#14 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1106): MessageCache-&amp;gt;loadCachedMessagePageEntry()&lt;br /&gt;
#15 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1016): MessageCache-&amp;gt;getMsgFromNamespace()&lt;br /&gt;
#16 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(988): MessageCache-&amp;gt;getMessageForLang()&lt;br /&gt;
#17 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(927): MessageCache-&amp;gt;getMessageFromFallbackChain()&lt;br /&gt;
#18 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(1304): MessageCache-&amp;gt;get()&lt;br /&gt;
#19 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(862): Message-&amp;gt;fetchMessage()&lt;br /&gt;
#20 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(954): Message-&amp;gt;toString()&lt;br /&gt;
#21 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Title.php(661): Message-&amp;gt;text()&lt;br /&gt;
#22 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(213): Title::newMainPage()&lt;br /&gt;
#23 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(118): PopulateArchiveRevId::makeDummyRevisionRow()&lt;br /&gt;
#24 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(63): PopulateArchiveRevId::checkMysqlAutoIncrementBug()&lt;br /&gt;
#25 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/includes/LoggedUpdateMaintenance.php(45): PopulateArchiveRevId-&amp;gt;doDBUpdates()&lt;br /&gt;
#26 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(1377): LoggedUpdateMaintenance-&amp;gt;execute()&lt;br /&gt;
#27 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(512): DatabaseUpdater-&amp;gt;populateArchiveRevId()&lt;br /&gt;
#28 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(475): DatabaseUpdater-&amp;gt;runUpdates()&lt;br /&gt;
#29 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(181): DatabaseUpdater-&amp;gt;doUpdates()&lt;br /&gt;
#30 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(107): UpdateMediaWiki-&amp;gt;execute()&lt;br /&gt;
#31 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(253): require_once(&#039;...&#039;)&lt;br /&gt;
#32 {main}&lt;br /&gt;
&lt;br /&gt;
real    28m11,950s&lt;br /&gt;
user    0m0,153s&lt;br /&gt;
sys     0m0,282s&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# is the script idempotent? I tried running it again; this time it died with the same error after only 6 seconds&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # time nice sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
.php: Main slot of revision not found in database. See T212428.&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1224): MediaWiki\Revision\RevisionStore-&amp;gt;constructSlotRecords()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1217): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1335): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#3 [internal function]: MediaWiki\Revision\RevisionStore-&amp;gt;MediaWiki\Revision\{closure}()&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(175): call_user_func()&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(117): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlots()&lt;br /&gt;
#6 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlot()&lt;br /&gt;
#7 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord-&amp;gt;getSlot()&lt;br /&gt;
#8 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1185): MediaWiki\Revision\RevisionRecord-&amp;gt;getContent()&lt;br /&gt;
#9 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1528): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#10 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache-&amp;gt;fetchOrRegenerate()&lt;br /&gt;
#11 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1167): WANObjectCache-&amp;gt;getWithSetCallback()&lt;br /&gt;
#12 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/BagOStuff.php(149): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#13 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1163): BagOStuff-&amp;gt;getWithSetCallback()&lt;br /&gt;
#14 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1106): MessageCache-&amp;gt;loadCachedMessagePageEntry()&lt;br /&gt;
#15 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1016): MessageCache-&amp;gt;getMsgFromNamespace()&lt;br /&gt;
#16 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(988): MessageCache-&amp;gt;getMessageForLang()&lt;br /&gt;
#17 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(927): MessageCache-&amp;gt;getMessageFromFallbackChain()&lt;br /&gt;
#18 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(1304): MessageCache-&amp;gt;get()&lt;br /&gt;
#19 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(862): Message-&amp;gt;fetchMessage()&lt;br /&gt;
#20 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(954): Message-&amp;gt;toString()&lt;br /&gt;
#21 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Title.php(661): Message-&amp;gt;text()&lt;br /&gt;
#22 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(213): Title::newMainPage()&lt;br /&gt;
#23 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(118): PopulateArchiveRevId::makeDummyRevisionRow()&lt;br /&gt;
#24 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(63): PopulateArchiveRevId::checkMysqlAutoIncrementBug()&lt;br /&gt;
#25 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/includes/LoggedUpdateMaintenance.php(45): PopulateArchiveRevId-&amp;gt;doDBUpdates()&lt;br /&gt;
#26 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(1377): LoggedUpdateMaintenance-&amp;gt;execute()&lt;br /&gt;
#27 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(512): DatabaseUpdater-&amp;gt;populateArchiveRevId()&lt;br /&gt;
#28 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(475): DatabaseUpdater-&amp;gt;runUpdates()&lt;br /&gt;
#29 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(181): DatabaseUpdater-&amp;gt;doUpdates()&lt;br /&gt;
#30 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(107): UpdateMediaWiki-&amp;gt;execute()&lt;br /&gt;
#31 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(253): require_once(&#039;...&#039;)&lt;br /&gt;
#32 {main}&lt;br /&gt;
&lt;br /&gt;
real    0m6,596s&lt;br /&gt;
user    0m0,006s&lt;br /&gt;
sys     0m0,012s&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried the other command that I had to run from the last upgrade attempt&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # time nice sudo -u www-data php &amp;quot;${docrootDir}/maintenance/populateContentTables.php&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
... archive processed up to revision id 302402 of 302632 (3173 rows in 1.3402280807495 seconds)&lt;br /&gt;
... archive processed up to revision id 302632 of 302632 (3178 rows in 1.3433260917664 seconds)&lt;br /&gt;
Done populating archive table. Processed 3178 rows in 1.343334197998 seconds&lt;br /&gt;
Done. Processed 302790 rows in 13.325568199158 seconds&lt;br /&gt;
&lt;br /&gt;
real    0m13,413s&lt;br /&gt;
user    0m0,013s&lt;br /&gt;
sys     0m0,030s&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to load the wiki in the browser, but it has an error again&lt;br /&gt;
&lt;br /&gt;
=Sun Dec 29, 2024=&lt;br /&gt;
&lt;br /&gt;
# so yesterday I finished the upgrade of our old MediaWiki v1.30.0 to v1.35 on hetzner3&lt;br /&gt;
# I quickly noticed some errors while clicking around on the site, but honestly I&#039;m not sure I should attempt to fix them yet. It probably makes sense to first try the second upgrade to v1.43 and then try to fix things&lt;br /&gt;
# here are some commands for my upgrade plan (not tested yet)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;wiki.opensourceecology.org&#039;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=&amp;quot;/var/tmp/CHG_${stamp}_wiki_1.35-to-1.43&amp;quot;&lt;br /&gt;
mkdir -p &amp;quot;${chg_dir}/{pre,post}&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Add vhost files&lt;br /&gt;
vhost_backup_path=&amp;quot;${chg_dir}/pre/${vhost_name}.${stamp}&amp;quot;&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${vhost_backup_path}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir -p ${vhostDir}&lt;br /&gt;
rsync -av --progress /var/tmp/mediawiki/mediawiki-1.43.0/ ${docrootDir}/&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress ${vhost_backup_path}/LocalSettings.php ${vhostDir}/&lt;br /&gt;
rsync -av --progress ${vhost_backup_path}/htdocs/LocalSettings.php ${docrootDir}/&lt;br /&gt;
rsync -av --progress ${vhost_backup_path}/htdocs/images ${docrootDir}/&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# MediaWiki #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
vhost_dir=&amp;quot;/var/www/html/wiki.opensourceecology.org&amp;quot;&lt;br /&gt;
mw_docroot=&amp;quot;${vhost_dir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
chown not-apache:apache-admins &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
chmod 0040 &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${mw_docroot}/images&amp;quot; ] || mkdir &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${vhost_dir}/cache&amp;quot; ] || mkdir &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
# RUN UPGRADE&lt;br /&gt;
sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
sudo -u www-data php &amp;quot;${docrootDir}/maintenance/populateContentTables.php&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
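# The script above applies a fixed mode scheme: read-only 0050 dirs and 0040 files owned by not-apache:www-data, with 0770/0660 only on the dirs the web server must write to. A minimal sketch of an audit helper to spot anything that has drifted from that scheme (for example, files freshly extracted as root with 0644, like the images/ listing below) could look like this; audit_modes is a hypothetical name, and it assumes a find that supports exact-match -perm with an octal mode.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical audit helper: print every directory and file under $1 whose
# permission bits do not EXACTLY match the expected octal modes ($2 for dirs,
# $3 for files). Run it as root: with the 0050 scheme, the owner itself
# (not-apache) has no bits set and can't traverse the tree.
audit_modes() {
	local root="$1" want_dir="$2" want_file="$3"
	find "$root" -type d ! -perm "$want_dir" -print
	find "$root" -type f ! -perm "$want_file" -print
}
```
# run against the wiki vhost with 0050 0040, it would still list the intentionally-writable images/ and cache/ trees, so those need a separate check with 0770 0660&lt;br /&gt;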
# note that I realized it was really, really stupid to run the php scripts as root, even if they were 3TOFU&#039;d. We&#039;re probably ok because this server isn&#039;t publicly accessible, but in the future I&#039;ll make sure to run them as the &#039;www-data&#039; user&lt;br /&gt;
# ah crap, adding the quotes broke the syntax on the mkdir command, so I ended up with a single dir literally named &amp;quot;{pre,post}&amp;quot; instead of two dirs. That broke the move, which caused the data to get merged. Now I need to start over and do the upgrade to 1.35 again :(&lt;br /&gt;
# well, the merge shouldn&#039;t be an issue since the &#039;images/&#039; dir only contains two unimportant files (and the LocalSettings.php files are absent too)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # ls -lah mediawiki-1.43.0/images/&lt;br /&gt;
total 16K&lt;br /&gt;
drwxr-xr-x  2 root root  4,0K Dec 29 18:14 .&lt;br /&gt;
drwxr-xr-x 14 root root  4,0K Dec 29 18:14 ..&lt;br /&gt;
-rw-r--r--  1  501 staff  232 Dec  5 15:41 .htaccess&lt;br /&gt;
-rw-r--r--  1  501 staff   84 Dec  5 15:41 README&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, this worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # mkdir -p ${chg_dir}/{pre,post}&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${vhost_backup_path}&amp;quot;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # echo $vhost_backup_path &lt;br /&gt;
/var/tmp/CHG_20241229_182546_wiki_1.35-to-1.43/pre/wiki.opensourceecology.org.20241229_182546&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # ls -lah $vhost_backup_path &lt;br /&gt;
total 155M&lt;br /&gt;
d---r-x---  4 not-apache www-data      4,0K Dec 29 05:13 .&lt;br /&gt;
drwxr-xr-x  3 root       root          4,0K Dec 29 18:35 ..&lt;br /&gt;
drwxrwx---  2 www-data   www-data      4,0K Dec 29 05:10 cache&lt;br /&gt;
drwxr-xr-x 14 root       root          4,0K Dec 29 18:14 htdocs&lt;br /&gt;
----r-----  1 root       root           18K Dec 29 03:41 LocalSettings.20241228.php&lt;br /&gt;
----r-----  1 not-apache apache-admins  18K Dec 29 05:12 LocalSettings.php&lt;br /&gt;
-rw-r--r--  1 root       root          155M Dec 29 05:03 wiki-error.log&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # ls -lah /var/www/html&lt;br /&gt;
total 36K&lt;br /&gt;
d---r-x--- 8 not-apache www-data 4,0K Dec 29 18:35 .&lt;br /&gt;
drwxr-xr-x 3 root       root     4,0K Sep 25 02:37 ..&lt;br /&gt;
d---r-x--- 3 not-apache www-data 4,0K Dec 26 18:04 fef.opensourceecology.org&lt;br /&gt;
d---r-x--- 5 not-apache www-data 4,0K Jul 11  2018 forum.opensourceecology.org&lt;br /&gt;
----r----- 1 not-apache www-data  138 Mar  3  2018 .htpasswd&lt;br /&gt;
d---r-x--- 3 not-apache www-data 4,0K Dec 26 03:58 microfactory.opensourceecology.org&lt;br /&gt;
d---r-x--- 3 not-apache www-data 4,0K Dec 26 20:08 oswh.opensourceecology.org&lt;br /&gt;
d---r-x--- 4 not-apache www-data 4,0K Dec 26 20:44 seedhome.openbuildinginstitute.org&lt;br /&gt;
d---r-x--- 4 not-apache www-data 4,0K Dec 23 00:25 store.opensourceecology.org&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
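# The brace-expansion pitfall noted above is easy to reproduce in a scratch directory. This is a minimal sketch (the scratch path is a stand-in for ${chg_dir}): braces inside double quotes are passed to mkdir literally, while quoting only the variable part lets bash expand {pre,post} into two directories.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Reproduce the mkdir mistake: bash does NOT perform brace expansion inside
# double quotes, so the quoted form creates ONE directory named {pre,post}.
set -eu
scratch="$(mktemp -d)"            # hypothetical stand-in for ${chg_dir}
trap 'rm -rf "${scratch}"' EXIT   # remove the scratch dir when the script exits

# wrong: the braces are inside the quotes, so mkdir receives them literally
mkdir -p "${scratch}/wrong/{pre,post}"

# right: quote only the variable part and leave the braces unquoted
mkdir -p "${scratch}/right/"{pre,post}

ls "${scratch}/wrong"   # a single entry: {pre,post}
ls "${scratch}/right"   # two entries: post and pre
```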
# ok, first attempt to upgrade failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
Error: Missing one or more required PHP extensions. Please see&lt;br /&gt;
https://www.mediawiki.org/wiki/Manual:Installation_requirements#PHP&lt;br /&gt;
for help with installing them.&lt;br /&gt;
Please install or enable:&lt;br /&gt;
 * intl &amp;lt;https://www.php.net/intl&amp;gt;&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like we need &#039;php-intl&#039; now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # apt-cache search php | grep intl&lt;br /&gt;
php-intl - Internationalisation module for PHP [default]&lt;br /&gt;
php-symfony-polyfill-intl-grapheme - Symfony polyfill for intl&#039;s grapheme_* functions&lt;br /&gt;
php-symfony-polyfill-intl-icu - Symfony polyfill for intl&#039;s ICU-related data and classes&lt;br /&gt;
php-symfony-polyfill-intl-idn - Symfony polyfill for intl&#039;s idn_to_ascii and idn_to_utf8 functions&lt;br /&gt;
php-symfony-polyfill-intl-messageformatter - Symfony polyfill for intl&#039;s MessageFormatter class and related functions&lt;br /&gt;
php-symfony-polyfill-intl-normalizer - Symfony polyfill for intl&#039;s Normalizer class and related functions&lt;br /&gt;
php-twig-intl-extra - A Twig extension for Intl&lt;br /&gt;
php8.2-intl - Internationalisation module for PHP&lt;br /&gt;
php-symfony-intl - limited replacement layer for the PHP extension intl&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# next, it fails to load the MonoBook skin, even though the MonoBook dir is there&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 23&lt;br /&gt;
PHP Deprecated:  DefaultSettings.php is deprecated and will be removed. Use MainConfigSchema::listDefaultValues() or MainConfigSchema::getDefaultValue() instead. [Called from require_once in /var/www/html/wiki.opensourceecology.org/LocalSettings.php at line 49] in /var/www/html/wiki.opensourceecology.org/htdocs/includes/debug/MWDebug.php on line 385&lt;br /&gt;
Error: The MonoBook skin cannot be loaded. Check that all of its files are installed properly.&lt;br /&gt;
&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/GlobalFunctions.php(94): MediaWiki\Registration\ExtensionRegistry-&amp;gt;queue()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/LocalSettings.php(186): wfLoadSkin()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once(&#039;...&#039;)&lt;br /&gt;
#3 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(220): require_once(&#039;...&#039;)&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(83): require_once(&#039;...&#039;)&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(308): require_once(&#039;...&#039;)&lt;br /&gt;
#6 {main}&lt;br /&gt;
PHP Fatal error:  Error Loading extension. Unable to open file /MonoBook/skin.json: filemtime(): stat failed for /MonoBook/skin.json in /var/www/html/wiki.opensourceecology.org/htdocs/includes/registration/MissingExtensionException.php on line 102&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # ls -lah /var/www/html/wiki.opensourceecology.org/htdocs/skins/&lt;br /&gt;
total 28K&lt;br /&gt;
d---r-x---  6 not-apache www-data 4,0K Dec 29 18:14 .&lt;br /&gt;
d---r-x--- 14 not-apache www-data 4,0K Dec 29 18:38 ..&lt;br /&gt;
d---r-x--- 11 not-apache www-data 4,0K Dec 29 18:14 MinervaNeue&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Dec 29 18:14 MonoBook&lt;br /&gt;
----r-----  1 not-apache www-data 1,3K Dec  5 15:41 README&lt;br /&gt;
d---r-x---  6 not-apache www-data 4,0K Dec 29 18:14 Timeless&lt;br /&gt;
d---r-x---  9 not-apache www-data 4,0K Dec 29 18:14 Vector&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
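# huh, this seems to work fine (though note that in this test only &#039;echo&#039; runs as www-data; the &#039;php&#039; on the other side of the pipe runs as root)&lt;br /&gt;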
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # sudo -u www-data echo &amp;quot;&amp;lt;?php echo filemtime(&#039;/var/www/html/wiki.opensourceecology.org/htdocs/skins/MonoBook/skin.json&#039;); ?&amp;gt;&amp;quot; | php&lt;br /&gt;
1733413878root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there was very little info about this on the net, but the path starting with &#039;/MonoBook/...&#039; (i.e. an empty skins-directory prefix) and a thread talking about setting &#039;$wgExtensionDirectory&#039; had me guessing it was a path error&lt;br /&gt;
# I fixed it by adding this to the LocalSettings.php config https://www.mediawiki.org/wiki/Manual:$wgStyleDirectory&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$wgStyleDirectory = &amp;quot;$IP/skins&amp;quot;;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the docs say that it used to default to &amp;quot;{$IP}/skins&amp;quot; until 1.37, and then it began to default to &#039;null&#039; -- not sure why&lt;br /&gt;
# anyway now we get a similar error about the extensions&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 23&lt;br /&gt;
Error: The Renameuser extension cannot be loaded. Check that all of its files are installed properly.&lt;br /&gt;
&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/GlobalFunctions.php(57): MediaWiki\Registration\ExtensionRegistry-&amp;gt;queue()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/LocalSettings.php(313): wfLoadExtension()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once(&#039;...&#039;)&lt;br /&gt;
#3 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(220): require_once(&#039;...&#039;)&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(83): require_once(&#039;...&#039;)&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(308): require_once(&#039;...&#039;)&lt;br /&gt;
#6 {main}&lt;br /&gt;
PHP Fatal error:  Error Loading extension. Unable to open file /var/www/html/wiki.opensourceecology.org/htdocs/extensions/Renameuser/extension.json: filemtime(): stat failed for /var/www/html/wiki.opensourceecology.org/htdocs/extensions/Renameuser/extension.json in /var/www/html/wiki.opensourceecology.org/htdocs/includes/registration/MissingExtensionException.php on line 102&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I added this to LocalSettings.php&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$wgExtensionDirectory = &amp;quot;$IP/extensions&amp;quot;;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but we still get the same error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 23&lt;br /&gt;
Error: The Renameuser extension cannot be loaded. Check that all of its files are installed properly.&lt;br /&gt;
&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/GlobalFunctions.php(57): MediaWiki\Registration\ExtensionRegistry-&amp;gt;queue()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/LocalSettings.php(314): wfLoadExtension()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once(&#039;...&#039;)&lt;br /&gt;
#3 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(220): require_once(&#039;...&#039;)&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(83): require_once(&#039;...&#039;)&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(308): require_once(&#039;...&#039;)&lt;br /&gt;
#6 {main}&lt;br /&gt;
PHP Fatal error:  Error Loading extension. Unable to open file /var/www/html/wiki.opensourceecology.org/htdocs/extensions/Renameuser/extension.json: filemtime(): stat failed for /var/www/html/wiki.opensourceecology.org/htdocs/extensions/Renameuser/extension.json in /var/www/html/wiki.opensourceecology.org/htdocs/includes/registration/MissingExtensionException.php on line 102&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, there is no &#039;Renameuser&#039; dir&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # ls htdocs/extensions/&lt;br /&gt;
AbuseFilter   ConfirmEdit      InputBox     MultimediaViewer  PdfHandler   SecureLinkFixer        Thanks&lt;br /&gt;
CategoryTree  DiscussionTools  Interwiki    Nuke              Poem         SpamBlacklist          TitleBlacklist&lt;br /&gt;
Cite          Echo             Linter       OATHAuth          README       SyntaxHighlight_GeSHi  VisualEditor&lt;br /&gt;
CiteThisPage  Gadgets          LoginNotify  PageImages        ReplaceText  TemplateData           WikiEditor&lt;br /&gt;
CodeEditor    ImageMap         Math         ParserFunctions   Scribunto    TextExtracts&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
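# Both load failures above (MonoBook, then Renameuser) only surfaced one at a time, when update.php reached the offending line. A rough pre-flight sketch, assuming the one-name-per-call form used in our LocalSettings.php, lists every active wfLoadExtension()/wfLoadSkin() whose extension.json or skin.json is missing; check_loads is a hypothetical helper, not part of MediaWiki.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MediaWiki): for each active
#   wfLoadExtension( 'Name' );  or  wfLoadSkin( 'Name' );
# line in a LocalSettings.php, confirm that Name's extension.json / skin.json
# exists under the MediaWiki root. Commented-out lines are skipped, and the
# array form wfLoadExtensions( [ ... ] ) is NOT recognised.
set -u

check_loads() {
	local settings="$1" ip="$2" missing=0 entry kind name json
	# emit one "wfLoadExtension:Name" or "wfLoadSkin:Name" token per matching line
	for entry in $(sed -nE "s/^[[:space:]]*(wfLoad(Extension|Skin))[[:space:]]*\([[:space:]]*['\"]([^'\"]+)['\"].*/\1:\3/p" "$settings"); do
		kind="${entry%%:*}"   # wfLoadExtension or wfLoadSkin
		name="${entry#*:}"    # e.g. Renameuser
		if [ "$kind" = "wfLoadSkin" ]; then
			json="${ip}/skins/${name}/skin.json"
		else
			json="${ip}/extensions/${name}/extension.json"
		fi
		if [ ! -f "$json" ]; then
			echo "MISSING: ${kind} ${name} (${json})"
			missing=$((missing + 1))
		fi
	done
	# succeed only if nothing was missing
	[ "$missing" -eq 0 ]
}
```
# e.g. check_loads &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot; &amp;quot;${mw_docroot}&amp;quot; would have flagged the Renameuser line up front. It only checks that the files exist, so it would not have caught the MonoBook failure, which was a path (config) problem rather than a missing file.&lt;br /&gt;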
# I commented-out that line (Renameuser was merged into MediaWiki core in 1.40, so there is no separate extension to load anymore), and now it finally ran&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 23&lt;br /&gt;
&lt;br /&gt;
*******************************************************************************&lt;br /&gt;
NOTE: Do not run maintenance scripts directly, use maintenance/run.php instead!&lt;br /&gt;
      Running scripts directly has been deprecated in MediaWiki 1.40.&lt;br /&gt;
      It may not work for some (or any) scripts in the future.&lt;br /&gt;
*******************************************************************************&lt;br /&gt;
&lt;br /&gt;
MediaWiki 1.43.0 Updater&lt;br /&gt;
&lt;br /&gt;
Your composer.lock file is up to date with current dependencies!&lt;br /&gt;
Going to run database updates for osewiki_db-wiki_&lt;br /&gt;
Depending on the size of your database this may take a while!&lt;br /&gt;
Abort with control-c in the next five seconds (skip this countdown with --quick) ...0&lt;br /&gt;
Can not upgrade from versions older than 1.35, please upgrade to that version or later first.&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but, shit, it says it can&#039;t update from versions older than 1.35. Duh, that&#039;s why we updated it to 1.35 yesterday! Ugh&lt;br /&gt;
# I guess I didn&#039;t actually log in and confirm the version after the last update&lt;br /&gt;
# also, the message yelling at us at the top is because apparently we&#039;re supposed to use &#039;run.php&#039; since v1.40, so the command for our first upgrade (to 1.35) differs from the one for our second (to 1.43); this is what we should do, but it gets the same result https://www.mediawiki.org/wiki/Manual:Upgrading&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sudo -u www-data php &amp;quot;${docrootDir}/maintenance/run.php&amp;quot; &amp;quot;${docrootDir}/maintenance/update.php&amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 23&lt;br /&gt;
MediaWiki 1.43.0 Updater&lt;br /&gt;
&lt;br /&gt;
Your composer.lock file is up to date with current dependencies!&lt;br /&gt;
Going to run database updates for osewiki_db-wiki_&lt;br /&gt;
Depending on the size of your database this may take a while!&lt;br /&gt;
Abort with control-c in the next five seconds (skip this countdown with --quick) ...0&lt;br /&gt;
Can not upgrade from versions older than 1.35, please upgrade to that version or later first.&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
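# Because 1.35 has no maintenance/run.php while 1.40+ expects it, a small wrapper can pick the right invocation for whichever release is unpacked in the docroot. This is a sketch: mw_maint, MW_DRY_RUN and MW_USER are hypothetical names, and it assumes run.php accepts the script name followed by that script&#039;s options (as in php maintenance/run.php update).&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a MediaWiki maintenance script the way the
# unpacked release expects. 1.40+ ships maintenance/run.php and deprecates
# calling scripts directly; older releases such as 1.35 have no run.php.
mw_maint() {
	local docroot="$1" script="$2"
	shift 2
	local -a cmd
	if [ -f "${docroot}/maintenance/run.php" ]; then
		# 1.40+: run.php takes the script name (e.g. update) plus its options
		cmd=(php "${docroot}/maintenance/run.php" "${script}" "$@")
	else
		# pre-1.40: call maintenance/SCRIPT.php directly
		cmd=(php "${docroot}/maintenance/${script}.php" "$@")
	fi
	if [ "${MW_DRY_RUN:-0}" = "1" ]; then
		echo "${cmd[*]}"                            # only print what would run
	else
		sudo -u "${MW_USER:-www-data}" "${cmd[@]}"  # run as the web user, not root
	fi
}
```
# e.g. mw_maint &amp;quot;${docrootDir}&amp;quot; update --quick would call update.php directly on the 1.35 tree and go through run.php on the 1.43 tree, as www-data in both cases&lt;br /&gt;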
# well, the internet says I can verify the version of the wiki by checking the files. That&#039;s dumb, because we need to check the db&lt;br /&gt;
# others say to check Special:Version. Yeah, I could if the site were working; that&#039;s also dumb&lt;br /&gt;
# I found a query on SO to check the DB, but it says we&#039;re running MediaWiki v1.24.2? Wtf? https://stackoverflow.com/questions/19074643/determining-mediawiki-version-from-the-database-only&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [osewiki_db]&amp;gt; select max(ul_key) from wiki_updatelog where ul_key like &#039;updatelist-%&#039;;&lt;br /&gt;
+------------------------------+&lt;br /&gt;
| max(ul_key)                  |&lt;br /&gt;
+------------------------------+&lt;br /&gt;
| updatelist-1.24.2-1452650145 |&lt;br /&gt;
+------------------------------+&lt;br /&gt;
1 row in set (0,001 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [osewiki_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, at this point I guess I have to start back at recreating the site from the hetzner2 transfer, then re-do the first upgrade. But this time I&#039;ll be sure to check the DB and the Special:Version pages after the first upgrade before proceeding with the second&lt;br /&gt;
# I created a CHG ticket for this whole process https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_wiki_to_hetzner3&lt;br /&gt;
# I confirmed that Special:Version on the old wiki does, indeed, say v1.30.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Installed software&lt;br /&gt;
Product 	Version&lt;br /&gt;
MediaWiki 	1.30.0&lt;br /&gt;
PHP 	5.6.40 (apache2handler)&lt;br /&gt;
MariaDB 	5.5.68-MariaDB&lt;br /&gt;
ICU 	50.1.2&lt;br /&gt;
Entry point URLs&lt;br /&gt;
Entry point	URL&lt;br /&gt;
Article path	/wiki/$1&lt;br /&gt;
Script path	/&lt;br /&gt;
index.php	/index.php&lt;br /&gt;
api.php	/api.php&lt;br /&gt;
load.php	/load.php&lt;br /&gt;
Installed skins&lt;br /&gt;
Skin	Version	License	Description	Authors&lt;br /&gt;
Cologne Blue	–	GPL-2.0+	A lightweight skin with minimal formatting	Lee Daniel Crocker and others&lt;br /&gt;
Modern	–	GPL-2.0+	A blue/gray theme with sidebar and top bar. Derived from MonoBook	River Tarnell and others&lt;br /&gt;
MonoBook	–	GPL-2.0+	The classic MediaWiki skin since 2004, named after the black-and-white photo of a book in the page background	Gabriel Wicke and others&lt;br /&gt;
Vector	–	GPL-2.0+	Modern version of MonoBook with fresh look and many usability improvements	Trevor Parscal, Roan Kattouw and others&lt;br /&gt;
Installed extensions&lt;br /&gt;
Special pages&lt;br /&gt;
Extension	Version	License	Description	Authors&lt;br /&gt;
Confirm User Accounts	– (4fe25f7)	GPL-2.0+	Gives bureaucrats the ability to confirm account requests	Aaron Schulz&lt;br /&gt;
Interwiki	3.1 20160307	GPL-2.0+	Adds a special page to view and edit the interwiki table	Stephanie Amanda Stevens, Alexandre Emsenhuber, Robin Pepermans, Siebrand Mazeland, Platonides, Raimond Spekking, Sam Reed, Jack Phoenix, Calimonius the Estrange and others&lt;br /&gt;
Nuke	1.3.0	GPL-2.0+	Gives administrators the ability to mass delete pages	Brion Vibber and Jeroen De Dauw&lt;br /&gt;
Renameuser	–	GPL-2.0+	Adds a special page to rename a user (need renameuser right)	Ævar Arnfjörð Bjarmason and Aaron Schulz&lt;br /&gt;
Replace Text	1.2 (4426752)	GPL-2.0+	Provides a special page to allow administrators to do a global string find-and-replace on all the content pages of a wiki	Yaron Koren, Niklas Laxström and others&lt;br /&gt;
UserMerge	1.10.1 (4546537)	GPL-2.0+	Merges references from one user to another user in the wiki database - will also delete old users following merge. Requires usermerge privileges	Tim Laqua, Thomas Gries and Matthew April&lt;br /&gt;
Parser hooks&lt;br /&gt;
Extension	Version	License	Description	Authors&lt;br /&gt;
CategoryTree	– (850c018)	GPL-2.0+	Dynamically navigate the category structure	Daniel Kinzler&lt;br /&gt;
Cite	–	GPL-2.0+	Adds &amp;lt;ref[ name=id]&amp;gt; and &amp;lt;references/&amp;gt; tags, for citations	Ævar Arnfjörð Bjarmason, Andrew Garrett, Brion Vibber, Ed Sanders, Marius Hoch, Steve Sanbeg, Trevor Parscal and others&lt;br /&gt;
ParserFunctions	1.6.0	GPL-2.0+	Enhance parser with logical functions	Tim Starling, Robert Rohde, Ross McClure and Juraj Simlovic&lt;br /&gt;
Widgets	1.3.0 (fce5acc)	GPL-2.0+	Allows wiki administrators to add free-form widgets to the wiki by editing pages within the Widget namespace. Community-contributed widgets can be found on MediaWikiWidgets.org	Sergey Chernyshev, Yaron Koren and others&lt;br /&gt;
Spam prevention&lt;br /&gt;
Extension	Version	License	Description	Authors&lt;br /&gt;
ConfirmEdit	1.5.0	GPL-2.0+	Provides CAPTCHA techniques to protect against spam and password-guessing	Brion Vibber, Florian Schmidt, Sam Reed and others&lt;br /&gt;
Other&lt;br /&gt;
Extension	Version	License	Description	Authors&lt;br /&gt;
Gadgets	–	GPL-2.0+	Lets users select custom CSS and JavaScript gadgets in their preferences	Daniel Kinzler and Max Semenik&lt;br /&gt;
OATHAuth	0.2.2 (bed2e4b)	GPL-2.0+	Provides authentication support using HMAC based one-time passwords	Ryan Lane&lt;br /&gt;
ReCaptcha	–			&lt;br /&gt;
Installed libraries&lt;br /&gt;
Library	Version	License	Description	Authors&lt;br /&gt;
composer/semver	1.4.2	MIT	Semver library that offers utilities, version constraint parsing and validation.	Nils Adermann, Jordi Boggiano and Rob Bast&lt;br /&gt;
cssjanus/cssjanus	1.2.0	Apache-2.0	Convert CSS stylesheets between left-to-right and right-to-left.	&lt;br /&gt;
firebase/php-jwt	4.0.0	BSD-3-Clause	A simple library to encode and decode JSON Web Tokens (JWT) in PHP. Should conform to the current spec.	Neuman Vong and Anant Narayanan&lt;br /&gt;
james-heinrich/getid3	1.9.14	GPL	PHP script that extracts useful information from popular multimedia file formats	&lt;br /&gt;
justinrainbow/json-schema	5.2.1	MIT	A library to validate a json schema.	Bruno Prieto Reis, Justin Rainbow, Igor Wiedler and Robert Schönthal&lt;br /&gt;
liuggio/statsd-php-client	1.0.18	MIT	Statsd (Object Oriented) client library for PHP	Giulio De Donato&lt;br /&gt;
mediawiki/at-ease	1.1.0	GPL-2.0+	Safe replacement to @ for suppressing warnings.	Tim Starling and MediaWiki developers&lt;br /&gt;
monolog/monolog	1.22.1	MIT	Sends your logs to files, sockets, inboxes, databases and various web services	Jordi Boggiano&lt;br /&gt;
mustangostang/spyc	0.6.2	MIT	A simple YAML loader/dumper class for PHP	mustangostang&lt;br /&gt;
nmred/kafka-php	0.1.5	BSD-3-Clause	Kafka client for php	&lt;br /&gt;
oojs/oojs-ui	0.23.0	MIT	Provides library of common widgets, layouts, and windows.	Timo Tijhof, Bartosz Dziewoński, Ed Sanders, James D. Forrester, Kirsten Menger-Anderson, Rob Moen, Roan Kattouw, Trevor Parscal, Kunal Mehta and Prateek Saxena&lt;br /&gt;
oyejorge/less.php	1.7.0.14	Apache-2.0	PHP port of the Javascript version of LESS http://lesscss.org (Originally maintained by Josh Schmidt)	Matt Agar, Martin Jantošovič and Josh Schmidt&lt;br /&gt;
pear/console_getopt	1.4.1	BSD-2-Clause	More info available on: http://pear.php.net/package/Console_Getopt	Greg Beaver, Andrei Zmievski and Stig Bakken&lt;br /&gt;
pear/mail	1.4.1	BSD-2-Clause	Class that provides multiple interfaces for sending emails.	Chuck Hagenbuch, Richard Heyes and Aleksander Machniak&lt;br /&gt;
pear/mail_mime	1.10.1	BSD-3-clause	Mail_Mime provides classes to create MIME messages	Cipriano Groenendal and Aleksander Machniak&lt;br /&gt;
pear/mail_mime-decode	1.5.5.2	BSD-2-Clause	More info available on: http://pear.php.net/package/Mail_mimeDecode	Cipriano Groenendal and Aleksander Machniak&lt;br /&gt;
pear/net_smtp	1.7.3	PHP-3.01	An implementation of the SMTP protocol	Jon Parise and Chuck Hagenbuch&lt;br /&gt;
pear/net_socket	1.2.1	BSD-2-Clause	More info available on: http://pear.php.net/package/Net_Socket	Chuck Hagenbuch, Aleksander Machniak and Stig Bakken&lt;br /&gt;
pear/pear-core-minimal	1.10.3	BSD-3-Clause	Minimal set of PEAR core files to be used as composer dependency	Christian Weiske&lt;br /&gt;
pear/pear_exception	1.0.0	BSD-2-Clause	The PEAR Exception base class.	Helgi Thormar and Greg Beaver&lt;br /&gt;
pimple/pimple	3.0.2	MIT	Pimple, a simple Dependency Injection Container	Fabien Potencier&lt;br /&gt;
psr/log	1.0.2	MIT	Common interface for logging libraries	PHP-FIG&lt;br /&gt;
ruflin/elastica	5.1.0	MIT	Elasticsearch Client	Nicolas Ruflin&lt;br /&gt;
stil/gd-text	1.0.0	MIT	A class drawing multiline and aligned text on pictures. Uses GD extension.	&lt;br /&gt;
symfony/process	3.2.6	MIT	Symfony Process Component	Fabien Potencier and Symfony Community&lt;br /&gt;
wikimedia/assert	0.2.2	MIT	Provides runtime assertions	Daniel Kinzler&lt;br /&gt;
wikimedia/avro	1.7.7	Apache-2.0	A library for using Apache Avro with PHP.	Michael Glaesemann, Andy Wick, Saleem Shafi, A B, Doug Cutting and Tom White&lt;br /&gt;
wikimedia/base-convert	1.0.1	GPL-2.0+	Convert an arbitrarily-long string from one numeric base to another, optionally zero-padding to a minimum column width.	Brion Vibber and Tyler Romeo&lt;br /&gt;
wikimedia/cdb	1.4.1	GPL-2.0+	Constant Database (CDB) wrapper library for PHP. Provides pure-PHP fallback when dba_* functions are absent.	Daniel Kinzler, Tim Starling, Chad Horohoe and Ori Livneh&lt;br /&gt;
wikimedia/cldr-plural-rule-parser	1.0.0	GPL-2.0+	Evaluates plural rules specified in the CLDR project notation.	Tim Starling and Niklas Laxström&lt;br /&gt;
wikimedia/composer-merge-plugin	1.4.1	MIT	Composer plugin to merge multiple composer.json files	Bryan Davis&lt;br /&gt;
wikimedia/css-sanitizer	1.0.2	Apache-2.0	Classes to parse and sanitize CSS	Brad Jorsch&lt;br /&gt;
wikimedia/html-formatter	1.0.1	GPL-2.0+	Performs transformations of HTML by wrapping around libxml2 and working around its countless bugs.	MediaWiki contributors&lt;br /&gt;
wikimedia/ip-set	1.1.0	GPL-2.0+	Efficiently match IP addresses against a set of CIDR specifications.	Brandon Black&lt;br /&gt;
wikimedia/php-session-serializer	1.0.4	GPL-2.0+	Provides methods like PHP&#039;s session_encode and session_decode that don&#039;t mess with $_SESSION	Brad Jorsch&lt;br /&gt;
wikimedia/purtle	1.0.6	GPL-2.0+	Fast streaming RDF serializer	Daniel Kinzler, Thiemo Mättig and Stanislav Malyshev&lt;br /&gt;
wikimedia/relpath	2.0.0	MIT	Compute a relative filepath between two paths.	Ori Livneh&lt;br /&gt;
wikimedia/remex-html	1.0.1	MIT	Fast HTML 5 parser	Tim Starling&lt;br /&gt;
wikimedia/running-stat	1.1.0	GPL-2.0+	PHP implementations of online statistical algorithms	Ori Livneh&lt;br /&gt;
wikimedia/scoped-callback	1.0.0	GPL-2.0+	Class for asserting that a callback happens when a dummy object leaves scope	Aaron Schulz&lt;br /&gt;
wikimedia/testing-access-wrapper	1.0.0	GPL-2.0+	A simple helper class to access non-public elements of a class when testing.	Adam Roses Wight, Brad Jorsch and Gergő Tisza&lt;br /&gt;
wikimedia/textcat	1.2.0	LGPL-2.1	PHP port of the TextCat language guesser utility, see http://odur.let.rug.nl/~vannoord/TextCat/.	Stanislav Malyshev and Trey Jones&lt;br /&gt;
wikimedia/timestamp	1.0.0	GPL-2.0+	Creation, parsing, and conversion of timestamps	Tyler Romeo&lt;br /&gt;
wikimedia/utfnormal	1.1.0	GPL-2.0+	Contains Unicode normalization routines, including both pure PHP implementations and automatic use of the &#039;intl&#039; PHP extension when present	Brion Vibber&lt;br /&gt;
wikimedia/wait-condition-loop	1.0.1	GPL-2.0+	Wait loop that reaches a condition or times out	Aaron Schulz&lt;br /&gt;
wikimedia/wrappedstring	2.2.0	MIT	Automatically compact sequentially-outputted strings that share a common prefix / suffix pair.	Timo Tijhof&lt;br /&gt;
zordius/lightncandy	0.23	MIT	An extremely fast PHP implementation of handlebars ( http://handlebarsjs.com/ ) and mustache ( http://mustache.github.io/ ).	Zordius Chen&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then ran this on hetzner2, which returns the same 1.24.2 result even though Special:Version there says 1.30.0; so this db query is useless for determining the version&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# echo &amp;quot;select max(ul_key) from wiki_updatelog where ul_key like &#039;updatelist-%&#039;;&amp;quot; | mysql osewiki_db -uroot -p${mysqlPass}&lt;br /&gt;
max(ul_key)&lt;br /&gt;
updatelist-1.24.2-1452650145&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I think the only change I need to make to this process is the (time)stamp var, so that we use the last backup; no need to make a new one from hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
stamp=$(ls /var/tmp/backups_for_migration_from_hetzner2/ | grep -i wiki | grep -oE &amp;quot;[0-9]{8}&amp;quot; | sort | tail -n 1)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# Marcin just emailed me saying that they want to retire fef, but make a backup of it&lt;br /&gt;
# I revisited our forums deprecation CHG to see the commands to use wget to create a static site https://wiki.opensourceecology.org/wiki/CHG-2018-02-04_deprecate_vanilla_forums&lt;br /&gt;
# I drafted this to create a fef offline site&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;fef.opensourceecology.org&#039;&lt;br /&gt;
stamp=$(date +%Y%m%d)&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir -p &amp;quot;${backupDir_hetzner2}&amp;quot;/{current,old}&lt;br /&gt;
pushd &amp;quot;${backupDir_hetzner2}/current&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir wget&lt;br /&gt;
pushd wget&lt;br /&gt;
time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;fef.opensourceecology.org&amp;quot; &amp;quot;http://fef.opensourceecology.org&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I executed it on hetzner2: good news; the whole website is just 32 MB!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# du -sh fef.opensourceecology.org/&lt;br /&gt;
32M     fef.opensourceecology.org/&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also downloaded an XML export of the WordPress site. It doesn&#039;t appear to include the media files, and it was only 698 KB.&lt;br /&gt;
# even if I tell it to export the &amp;quot;media&amp;quot; dir, it doesn&#039;t include the actual files.&lt;br /&gt;
# anyway, we already have an actual backup of the db and WordPress files; it&#039;s only 180 MB&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology backups_for_migration_to_hetzner2]# du -sh fef.opensourceecology.org_20241226/current/*&lt;br /&gt;
179M    fef.opensourceecology.org_20241226/current/fef.opensourceecology.org_files.20241226.tar.gz&lt;br /&gt;
64K     fef.opensourceecology.org_20241226/current/mysqldump_fef.opensourceecology.org.20241226.sql.bz2&lt;br /&gt;
[root@opensourceecology backups_for_migration_to_hetzner2]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# All right, here&#039;s what I&#039;m thinking: we keep fef around as a static-site vhost, just like our forums. It will look the way it does now forever, without us having to worry about code rot; the trade-off is that we won&#039;t be able to edit it in the future&lt;br /&gt;
# In the vhost dir (outside the docroot), we also keep a copy of the XML export, the DB dump, and the files backup&lt;br /&gt;
# This trades disk space for time: we&#039;ll store more data, but we&#039;ll spend less time on it. We should be able to migrate this site much more easily, and it will never break again due to code rot&lt;br /&gt;
# These backups are lossy, but we should have everything we&#039;d need to re-import the site into WordPress in the future. It probably would never look the same, and it wouldn&#039;t be trivial, but it would be much easier than restoring from the static site alone&lt;br /&gt;
# At the end of the day, the site will require us to lug around 32 + 180 + 1 = 213 MB (static site + files/DB backup + XML export, rounded). I think that&#039;s fine.&lt;br /&gt;
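# To double-check that sum, &#039;du -c&#039; can add the pieces up itself, since it appends a grand-total line. Here&#039;s a minimal sketch of a (hypothetical) helper for that, assuming GNU du; the XML export lives on my laptop, so its ~1 MB still has to be added by hand&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# hypothetical helper: print the combined size of all paths passed to it&lt;br /&gt;
# (-c appends a final &amp;quot;total&amp;quot; line; -h makes the sizes human-readable)&lt;br /&gt;
total_size() {&lt;br /&gt;
  du -ch &amp;quot;$@&amp;quot; | tail -n 1&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage on hetzner2 (the mirror&#039;s date stamp is globbed because it isn&#039;t recorded above):&lt;br /&gt;
# total_size /var/tmp/backups_for_migration_to_hetzner3/fef.opensourceecology.org_*/current/wget /var/tmp/backups_for_migration_to_hetzner2/fef.opensourceecology.org_20241226/current&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;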
# ...&lt;br /&gt;
# Marcin also suggested that we do the same for oswh.opensourceecology.org&lt;br /&gt;
# I did the same for oswh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;oswh.opensourceecology.org&#039;&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current&lt;br /&gt;
&lt;br /&gt;
mkdir wget&lt;br /&gt;
pushd wget&lt;br /&gt;
time nice wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;${vhost_name}&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
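# The fef and oswh commands differ only in vhost_name, so if Marcin wants more sites converted, a (hypothetical) function like this sketch could wrap them. It reuses the same wget flags and backup dir layout as above, assumes bash (for the {current,old} brace expansion), and runs the download in a subshell so the caller&#039;s working dir doesn&#039;t change&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# hypothetical helper: make a static wget mirror of one vhost, with the same&lt;br /&gt;
# flags and backup dir layout as the fef/oswh commands above&lt;br /&gt;
make_static_mirror() {&lt;br /&gt;
  local vhost_name=&amp;quot;$1&amp;quot;&lt;br /&gt;
  local stamp&lt;br /&gt;
  stamp=&amp;quot;$(date +%Y%m%d)&amp;quot;&lt;br /&gt;
  local backup_dir=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
  mkdir -p &amp;quot;${backup_dir}&amp;quot;/{current,old} &amp;quot;${backup_dir}/current/wget&amp;quot; || return 1&lt;br /&gt;
&lt;br /&gt;
  # subshell, so the cd doesn&#039;t leak into the caller&#039;s shell&lt;br /&gt;
  (&lt;br /&gt;
    cd &amp;quot;${backup_dir}/current/wget&amp;quot; || exit 1&lt;br /&gt;
    # --domains keeps the crawl on this vhost; --convert-links rewrites links for offline browsing&lt;br /&gt;
    time nice wget --recursive --no-clobber --page-requisites --html-extension \&lt;br /&gt;
      --convert-links --domains &amp;quot;${vhost_name}&amp;quot; &amp;quot;http://${vhost_name}&amp;quot;&lt;br /&gt;
  )&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (as root on hetzner2):&lt;br /&gt;
# make_static_mirror oswh.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;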
# the static site is just ~3 MB (2.5M)!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# pwd&lt;br /&gt;
/var/tmp/backups_for_migration_to_hetzner3/oswh.opensourceecology.org_20241230/current/wget&lt;br /&gt;
[root@opensourceecology wget]# du -sh *&lt;br /&gt;
2.5M    oswh.opensourceecology.org&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the files and DB backups total ~50 MB&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wget]# du -sh /var/tmp/backups_for_migration_to_hetzner2/oswh.opensourceecology.org_20241226/current/*&lt;br /&gt;
1.1M    /var/tmp/backups_for_migration_to_hetzner2/oswh.opensourceecology.org_20241226/current/mysqldump_oswh.opensourceecology.org.20241226.sql.bz2&lt;br /&gt;
48M     /var/tmp/backups_for_migration_to_hetzner2/oswh.opensourceecology.org_20241226/current/oswh.opensourceecology.org_files.20241226.tar.gz&lt;br /&gt;
[root@opensourceecology wget]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and the XML export is 11 MB&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp6783:~/Downloads$ du -sh opensourcehardwaredocumentationjam.wordpress.2024-12-30.xml &lt;br /&gt;
11M	opensourcehardwaredocumentationjam.wordpress.2024-12-30.xml&lt;br /&gt;
user@disp6783:~/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so the total for oswh would be 3 + 50 + 11 = 64 MB&lt;br /&gt;
# I sent an email to Marcin &amp;amp; Catarina with this info&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ok, I have good news: the fef and oswh sites are small enough that we can just turn them into static sites.&lt;br /&gt;
&lt;br /&gt;
Basically I&#039;ll do what we did with the OSE vanilla forums.&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
&lt;br /&gt;
 + It&#039;s nearly negligible extra time to migrate in the future&lt;br /&gt;
 + It isn&#039;t generated dynamically, so it won&#039;t &amp;quot;rot&amp;quot; in the future&lt;br /&gt;
 + It doesn&#039;t require wordpress or themes or plugins&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
&lt;br /&gt;
 + They&#039;re generally bigger in size&lt;br /&gt;
 + You can&#039;t edit them (easily)&lt;br /&gt;
 + You can&#039;t go from a static site back to a wordpress site again&lt;br /&gt;
&lt;br /&gt;
However, to mitigate that last con, I&#039;ll also keep three backups:&lt;br /&gt;
&lt;br /&gt;
 1. An xml export from wordpress&lt;br /&gt;
 2. A wordpress DB backup&lt;br /&gt;
 3. A wordpress file backup&lt;br /&gt;
&lt;br /&gt;
I just confirmed that doing this for:&lt;br /&gt;
&lt;br /&gt;
 * fef.opensourceecology.org requires a total of ~250 MB&lt;br /&gt;
&lt;br /&gt;
 * oswh.opensourceecology.org requires a total of ~100 MB&lt;br /&gt;
&lt;br /&gt;
This is totally reasonable, and I think it&#039;s the best happy-medium.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you have any questions, and if you&#039;d like me to convert any of your other unused sites into static sites.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 12/29/24 22:50, Michael Altfield wrote:&lt;br /&gt;
&amp;gt; Yeah, making a backup of the data (post/page text and images) is easy.&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; The difficult &amp;amp; time-consuming part is trying to keep making it look the same after the code that renders it rots over time.&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; I&#039;ll look into:&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt;   [1] exporting the content from wordpress (which you can import into wordpress in the future if you ever want to restore the site)&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt;   [2] making an html static site copy (like we did with the forums)&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; Both are lossy in different ways. I&#039;ll let you know how big these are and if they&#039;re practical for the sites we&#039;re retiring.&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; So far we have the following on the not-migrate list:&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt;   [a] fef.opensourceecology.org&lt;br /&gt;
&amp;gt;   [b] oswh.opensourceecology.org&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; Here&#039;s the others:&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; 2. store.opensourceecology.org&lt;br /&gt;
&amp;gt; 3. microfactory.opensourceecology.org&lt;br /&gt;
&amp;gt; 6. seedhome.openbuildinginstitute.org&lt;br /&gt;
&amp;gt; 7. www.openbuildinginstitute.org&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; Are there any more you want me to not migrate?&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have &lt;br /&gt;
&amp;gt; changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; On 12/29/24 22:13, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt;&amp;gt; You&#039;re right about not using unpopular themes and as a practice go to only wordpress.org/plugins/.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Just had a long discussion with Catarina regarding archival footage from the past, which is what the fef site consists of. She has no plans to continue the site as it is. It appears that the best way to go forward is to simply download the pictures so she has a local backup. She can&#039;t find the originals from 10 years ago. We have the text from the site. What&#039;s the easiest way to download all the pictures and transfer to us - can you do that for us?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Not that I expect that we would want to do this - but we do have a backup from last year or whenever - so if we ever wanted to restore the site with a different theme - we could still do that, correct?&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; In any case, once we are done with the server migration, let&#039;s meet up so I can get caught up on how everything went, and get a refresher on how to access everything if you get hit by a bus. The main point is to migrate only - and later think about what additional improvements we may need after the apprenticeship is running. As long as we have the wiki, main site, and phplist, we&#039;re good for basic operations.&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; Marcin&lt;br /&gt;
&amp;gt;&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt; MJ&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sat Dec 28, 2024=&lt;br /&gt;
&lt;br /&gt;
# here&#039;s TOFU 3/3 on phpList and MediaWiki (ISP, exit in Ecuador)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Ecuador&lt;br /&gt;
2024-12-28&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-28&lt;br /&gt;
3c457f4960cfcbdf5e1e2fab6b6e3ade006e65a55cd93459e0ecede3d9c23a16  keys.txt&lt;br /&gt;
56e0a7fe2ff78e50b9c87b3d589f2d8df5bcab0018e454b2a4a239c8586bd8ee  mediawiki-1.43.0.tar.gz&lt;br /&gt;
cee3d020f25e9b3f66eef214a343be466a1a313db31344ad2a5e87ab0183883e  mediawiki-1.43.0.tar.gz.sig&lt;br /&gt;
9e17cb15dd75bbbd5dbb984eda674863c3b10ab72613cf8a39a00c3e11a8492a  phplist-3.6.14.zip&lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
pub   rsa4096/0x73F146FECF9D333C 2014-11-20 [SC] [expired: 2021-06-05]&lt;br /&gt;
	  Key fingerprint = F64E BF5F 2099 6AB5 14F1  98A8 73F1 46FE CF9D 333C&lt;br /&gt;
uid                             Tim Starling &amp;lt;tstarling@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa4096/0x1075249FCCC9CAAF 2014-11-20 [E] [expired: 2021-06-05]&lt;br /&gt;
pub   dsa1024/0xC119E1A64D70938E 2003-11-15 [SCA]&lt;br /&gt;
	  Key fingerprint = 4412 76E9 CCD1 5F44 F6D9  7D18 C119 E1A6 4D70 938E&lt;br /&gt;
uid                             Brion Vibber &amp;lt;brion@pobox.com&amp;gt;&lt;br /&gt;
sub   elg1024/0x6596FAD2965B3548 2003-11-15 [E]&lt;br /&gt;
pub   dsa1024/0x9B69B3109D3BB7B0 2011-10-24 [SC]&lt;br /&gt;
	  Key fingerprint = 1D98 867E 8298 2C8F E0AB  C25F 9B69 B310 9D3B B7B0&lt;br /&gt;
uid                             Sam Reed &amp;lt;reedy@wikimedia.org&amp;gt;&lt;br /&gt;
sub   elg2048/0x3BBB95CE2B08BFD2 2011-10-24 [E]&lt;br /&gt;
pub   rsa2048/0x72BC1C5D23107F8A 2014-04-29 [SC] [expires: 2026-04-29]&lt;br /&gt;
	  Key fingerprint = 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A&lt;br /&gt;
uid                             Chad Horohoe &amp;lt;chad@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             keybase.io/demon &amp;lt;demon@keybase.io&amp;gt;&lt;br /&gt;
sub   rsa2048/0x08CF4E7951361C13 2014-04-29 [E] [expires: 2026-04-29]&lt;br /&gt;
pub   rsa4096/0xF6DAD285018FAC02 2014-02-19 [SC] [expired: 2018-10-04]&lt;br /&gt;
	  Key fingerprint = 6237 D8D3 ECC1 AE91 8729  296F F6DA D285 018F AC02&lt;br /&gt;
uid                             Tyler Cipriani &amp;lt;tcipriani@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             Tyler Cipriani &amp;lt;tyler@tylercipriani.com&amp;gt;&lt;br /&gt;
uid                             [jpeg image of size 5098]&lt;br /&gt;
sub   rsa4096/0xB002E1FDEE737D83 2014-02-19 [E] [expired: 2018-10-04]&lt;br /&gt;
pub   rsa3072/0x26752EBB0D9E6218 2021-11-11 [SC]&lt;br /&gt;
	  Key fingerprint = 72D2 86F6 F8F0 3C78 F2C5  9C73 2675 2EBB 0D9E 6218&lt;br /&gt;
uid                             Amir Sarabadani &amp;lt;asarabadani@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa3072/0x4F889038CE86B378 2021-11-11 [E]&lt;br /&gt;
pub   rsa4096/0x361F943B15C08DD4 2015-05-22 [SC] [expired: 2020-05-20]&lt;br /&gt;
	  Key fingerprint = 80D1 13B7 67E3 D519 3672  5679 361F 943B 15C0 8DD4&lt;br /&gt;
uid                             Brian Wolff &amp;lt;bwolff@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             Brian Wolff (Bawolff) &amp;lt;bawolff@gmail.com&amp;gt;&lt;br /&gt;
sub   rsa4096/0xBF1629CD074D3DD8 2015-05-22 [E] [expired: 2020-05-20]&lt;br /&gt;
pub   rsa4096/0x131910E01605D9AA 2016-01-08 [SC] [expired: 2020-07-31]&lt;br /&gt;
	  Key fingerprint = C83A 8E4D 3C8F EB7C 8A3A  1998 1319 10E0 1605 D9AA&lt;br /&gt;
uid                             Mukunda Modell &amp;lt;twentyafterfour@gmail.com&amp;gt;&lt;br /&gt;
uid                             Mukunda Modell (WMF) &amp;lt;mmodell@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             [jpeg image of size 2928]&lt;br /&gt;
sub   rsa4096/0x5411F23A0C4E5EC1 2018-12-25 [A] [expired: 2020-12-24]&lt;br /&gt;
sub   rsa4096/0x02C99BB8AB1C6DD5 2018-12-25 [E] [expired: 2020-12-24]&lt;br /&gt;
sub   rsa4096/0x60AE06D4875BE862 2018-12-26 [S] [expired: 2019-12-26]&lt;br /&gt;
pub   rsa4096/0xA8F734246D73B586 2024-12-09 [SC] [expires: 2026-12-09]&lt;br /&gt;
	  Key fingerprint = E059 C034 E7A4 3058 3C25  2F4A A8F7 3424 6D73 B586&lt;br /&gt;
uid                             atieno njira &amp;lt;pnjira@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa4096/0xB65D063E68F2FC8A 2024-12-09 [E] [expires: 2026-12-09]&lt;br /&gt;
user@disp1764:/tmp/tmp.wJv4Y7wAMf$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the hashes all match; let&#039;s import that new MediaWiki key into our hetzner3 keyring&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/nginx # cd /var/tmp/mediawiki/&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # ls&lt;br /&gt;
keys.txt  mediawiki-1.35.0  mediawiki-1.35.0.tar.gz  mediawiki-1.35.0.tar.gz.sig&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # rm keys.txt &lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # f&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # wget https://www.mediawiki.org/keys/keys.txt&lt;br /&gt;
--2024-12-28 22:07:27--  https://www.mediawiki.org/keys/keys.txt&lt;br /&gt;
Resolving www.mediawiki.org (www.mediawiki.org)... 2a02:ec80:300:ed1a::1, 185.15.59.224&lt;br /&gt;
Connecting to www.mediawiki.org (www.mediawiki.org)|2a02:ec80:300:ed1a::1|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: 59336 (58K) [text/plain]&lt;br /&gt;
Saving to: ‘keys.txt’&lt;br /&gt;
&lt;br /&gt;
keys.txt                           100%[================================================================&amp;gt;]  57,95K  --.-KB/s    in 0,01s   &lt;br /&gt;
&lt;br /&gt;
2024-12-28 22:07:28 (5,02 MB/s) - ‘keys.txt’ saved [59336/59336]&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sha256sum keys.txt &lt;br /&gt;
3c457f4960cfcbdf5e1e2fab6b6e3ade006e65a55cd93459e0ecede3d9c23a16  keys.txt&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
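# Comparing 64-character hashes by eye is error-prone. Here&#039;s a minimal sketch of a (hypothetical) helper that makes sha256sum do the comparison against the hash recorded in our 3TOFU output instead; it assumes GNU coreutils, whose &#039;sha256sum --check&#039; reads &amp;quot;HASH  FILENAME&amp;quot; lines (two spaces) from stdin and exits non-zero on a mismatch&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# hypothetical helper: succeed only if FILE&#039;s sha256 equals the hash we recorded&lt;br /&gt;
# during 3TOFU; sha256sum prints &amp;quot;FILE: OK&amp;quot; or &amp;quot;FILE: FAILED&amp;quot;&lt;br /&gt;
check_3tofu() {&lt;br /&gt;
  local expected_hash=&amp;quot;$1&amp;quot; file=&amp;quot;$2&amp;quot;&lt;br /&gt;
  echo &amp;quot;${expected_hash}  ${file}&amp;quot; | sha256sum --check --strict -&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage, with the keys.txt hash from the 3TOFU output above:&lt;br /&gt;
# check_3tofu 3c457f4960cfcbdf5e1e2fab6b6e3ade006e65a55cd93459e0ecede3d9c23a16 keys.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;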
# the checksum matches our 3TOFU; let&#039;s import it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # gpg --import keys.txt &lt;br /&gt;
gpg: key 73F146FECF9D333C: &amp;quot;Tim Starling &amp;lt;tstarling@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key C119E1A64D70938E: 9 signatures not checked due to missing keys&lt;br /&gt;
gpg: key C119E1A64D70938E: &amp;quot;Brion Vibber &amp;lt;brion@pobox.com&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 9B69B3109D3BB7B0: &amp;quot;Sam Reed &amp;lt;reedy@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 72BC1C5D23107F8A: &amp;quot;Chad Horohoe &amp;lt;chad@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key F6DAD285018FAC02: 12 signatures not checked due to missing keys&lt;br /&gt;
gpg: key F6DAD285018FAC02: &amp;quot;Tyler Cipriani &amp;lt;tcipriani@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 26752EBB0D9E6218: &amp;quot;Amir Sarabadani &amp;lt;asarabadani@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 361F943B15C08DD4: 7 signatures not checked due to missing keys&lt;br /&gt;
gpg: key 361F943B15C08DD4: &amp;quot;Brian Wolff (Bawolff) &amp;lt;bawolff@gmail.com&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 131910E01605D9AA: &amp;quot;Mukunda Modell (WMF) &amp;lt;mmodell@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key A8F734246D73B586: public key &amp;quot;atieno njira &amp;lt;pnjira@wikimedia.org&amp;gt;&amp;quot; imported&lt;br /&gt;
gpg: Total number processed: 9&lt;br /&gt;
gpg:               imported: 1&lt;br /&gt;
gpg:              unchanged: 8&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # cp keys.txt /home/maltfield/&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # chown maltfield /home/maltfield/keys.txt &lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
logout&lt;br /&gt;
maltfield@hetzner3:~$ gpg --import keys.txt &lt;br /&gt;
gpg: key 73F146FECF9D333C: &amp;quot;Tim Starling &amp;lt;tstarling@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key C119E1A64D70938E: 9 signatures not checked due to missing keys&lt;br /&gt;
gpg: key C119E1A64D70938E: &amp;quot;Brion Vibber &amp;lt;brion@pobox.com&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 9B69B3109D3BB7B0: &amp;quot;Sam Reed &amp;lt;reedy@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 72BC1C5D23107F8A: &amp;quot;Chad Horohoe &amp;lt;chad@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key F6DAD285018FAC02: 12 signatures not checked due to missing keys&lt;br /&gt;
gpg: key F6DAD285018FAC02: &amp;quot;Tyler Cipriani &amp;lt;tcipriani@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 26752EBB0D9E6218: &amp;quot;Amir Sarabadani &amp;lt;asarabadani@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 361F943B15C08DD4: 7 signatures not checked due to missing keys&lt;br /&gt;
gpg: key 361F943B15C08DD4: &amp;quot;Brian Wolff (Bawolff) &amp;lt;bawolff@gmail.com&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key 131910E01605D9AA: &amp;quot;Mukunda Modell (WMF) &amp;lt;mmodell@wikimedia.org&amp;gt;&amp;quot; not changed&lt;br /&gt;
gpg: key A8F734246D73B586: public key &amp;quot;atieno njira &amp;lt;pnjira@wikimedia.org&amp;gt;&amp;quot; imported&lt;br /&gt;
gpg: Total number processed: 9&lt;br /&gt;
gpg:               imported: 1&lt;br /&gt;
gpg:              unchanged: 8&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
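# As a belt-and-braces check, we can also confirm that the newly imported key&#039;s full fingerprint matches the one shown in our 3TOFU output (E059C034E7A430583C252F4AA8F734246D73B586). Here&#039;s a sketch of a (hypothetical) helper that extracts it from gpg&#039;s machine-readable --with-colons output, where field 10 of the first &#039;fpr&#039; record is the primary key&#039;s fingerprint&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# hypothetical helper: print the primary-key fingerprint of a key in our keyring&lt;br /&gt;
# (with --with-colons, field 10 of the first &amp;quot;fpr&amp;quot; record is that fingerprint)&lt;br /&gt;
key_fpr() {&lt;br /&gt;
  gpg --with-colons --fingerprint &amp;quot;$1&amp;quot; 2&amp;gt;/dev/null | awk -F: &#039;/^fpr:/ { print $10; exit }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: compare against the fingerprint from the 3TOFU output&lt;br /&gt;
# expected=E059C034E7A430583C252F4AA8F734246D73B586&lt;br /&gt;
# [ &amp;quot;$(key_fpr A8F734246D73B586)&amp;quot; = &amp;quot;$expected&amp;quot; ] &amp;amp;&amp;amp; echo &amp;quot;fingerprint OK&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;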
# now let&#039;s download the latest version of MediaWiki too&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # wget https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz.sig&lt;br /&gt;
--2024-12-28 22:10:54--  https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz&lt;br /&gt;
Resolving releases.wikimedia.org (releases.wikimedia.org)... 2a02:ec80:300:ed1a::1, 185.15.59.224&lt;br /&gt;
Connecting to releases.wikimedia.org (releases.wikimedia.org)|2a02:ec80:300:ed1a::1|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: 91947888 (88M) [application/x-gzip]&lt;br /&gt;
Saving to: ‘mediawiki-1.43.0.tar.gz’&lt;br /&gt;
&lt;br /&gt;
mediawiki-1.43.0.tar.gz            100%[================================================================&amp;gt;]  87,69M  28,6MB/s    in 3,5s    &lt;br /&gt;
&lt;br /&gt;
2024-12-28 22:10:58 (25,1 MB/s) - ‘mediawiki-1.43.0.tar.gz’ saved [91947888/91947888]&lt;br /&gt;
&lt;br /&gt;
--2024-12-28 22:10:58--  https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz.sig&lt;br /&gt;
Reusing existing connection to [releases.wikimedia.org]:443.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: 566 [application/pgp-signature]&lt;br /&gt;
Saving to: ‘mediawiki-1.43.0.tar.gz.sig’&lt;br /&gt;
&lt;br /&gt;
mediawiki-1.43.0.tar.gz.sig        100%[================================================================&amp;gt;]     566  --.-KB/s    in 0s      &lt;br /&gt;
&lt;br /&gt;
2024-12-28 22:10:58 (13,7 MB/s) - ‘mediawiki-1.43.0.tar.gz.sig’ saved [566/566]&lt;br /&gt;
&lt;br /&gt;
FINISHED --2024-12-28 22:10:58--&lt;br /&gt;
Total wall clock time: 4,1s&lt;br /&gt;
Downloaded: 2 files, 88M in 3,5s (25,1 MB/s)&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that the hashes match our 3TOFU *and* the PGP signature is valid; great, this is trustworthy&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # sha256sum mediawiki-1.43.0.tar.gz*&lt;br /&gt;
56e0a7fe2ff78e50b9c87b3d589f2d8df5bcab0018e454b2a4a239c8586bd8ee  mediawiki-1.43.0.tar.gz&lt;br /&gt;
cee3d020f25e9b3f66eef214a343be466a1a313db31344ad2a5e87ab0183883e  mediawiki-1.43.0.tar.gz.sig&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # gpg --verify mediawiki-1.43.0.tar.gz.sig &lt;br /&gt;
gpg: assuming signed data in &#039;mediawiki-1.43.0.tar.gz&#039;&lt;br /&gt;
gpg: Signature made 2024-12-20T19:36:38 UTC&lt;br /&gt;
gpg:                using RSA key E059C034E7A430583C252F4AA8F734246D73B586&lt;br /&gt;
gpg: Good signature from &amp;quot;atieno njira &amp;lt;pnjira@wikimedia.org&amp;gt;&amp;quot; [unknown]&lt;br /&gt;
gpg: WARNING: This key is not certified with a trusted signature!&lt;br /&gt;
gpg:          There is no indication that the signature belongs to the owner.&lt;br /&gt;
Primary key fingerprint: E059 C034 E7A4 3058 3C25  2F4A A8F7 3424 6D73 B586&lt;br /&gt;
root@hetzner3 /var/tmp/mediawiki # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
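# Note that &#039;Good signature&#039; only tells us that the tarball was signed by *some* key in our keyring, which now holds every key from keys.txt. A stricter sketch (a hypothetical helper, assuming GnuPG 2.x&#039;s --status-fd interface) succeeds only if the signature is valid *and* was made by the exact fingerprint we pinned via 3TOFU; in the VALIDSIG status line, the first argument is the fingerprint of the key that made the signature (here, the primary key itself)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# hypothetical helper: succeed only if SIG is a valid signature over FILE&lt;br /&gt;
# made by the key with fingerprint FPR (not just any key in our keyring)&lt;br /&gt;
verify_signed_by() {&lt;br /&gt;
  local sig=&amp;quot;$1&amp;quot; file=&amp;quot;$2&amp;quot; fpr=&amp;quot;$3&amp;quot;&lt;br /&gt;
  # --status-fd 1 writes machine-readable &amp;quot;[GNUPG:] ...&amp;quot; lines to stdout&lt;br /&gt;
  gpg --status-fd 1 --verify &amp;quot;$sig&amp;quot; &amp;quot;$file&amp;quot; 2&amp;gt;/dev/null \&lt;br /&gt;
    | grep -q &amp;quot;^\[GNUPG:\] VALIDSIG ${fpr} &amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
# verify_signed_by mediawiki-1.43.0.tar.gz.sig mediawiki-1.43.0.tar.gz E059C034E7A430583C252F4AA8F734246D73B586&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;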
# I also downloaded the latest phpList release&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp # wget https://altushost-swe.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip&lt;br /&gt;
--2024-12-28 22:13:27--  https://altushost-swe.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip&lt;br /&gt;
Resolving altushost-swe.dl.sourceforge.net (altushost-swe.dl.sourceforge.net)... 79.142.76.130&lt;br /&gt;
Connecting to altushost-swe.dl.sourceforge.net (altushost-swe.dl.sourceforge.net)|79.142.76.130|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 301 Moved Permanently&lt;br /&gt;
Location: https://downloads.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip [following]&lt;br /&gt;
--2024-12-28 22:13:27--  https://downloads.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip&lt;br /&gt;
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 2606:4700::6812:d95, 2606:4700::6812:c95, 104.18.12.149, ...&lt;br /&gt;
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|2606:4700::6812:d95|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 302 Found&lt;br /&gt;
Location: https://kumisystems.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip?viasf=1 [following]&lt;br /&gt;
--2024-12-28 22:13:27--  https://kumisystems.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip?viasf=1&lt;br /&gt;
Resolving kumisystems.dl.sourceforge.net (kumisystems.dl.sourceforge.net)... 2a01:4f8:210:1057::2, 148.251.120.111&lt;br /&gt;
Connecting to kumisystems.dl.sourceforge.net (kumisystems.dl.sourceforge.net)|2a01:4f8:210:1057::2|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: 29850924 (28M) [application/octet-stream]&lt;br /&gt;
Saving to: ‘phplist-3.6.14.zip’&lt;br /&gt;
&lt;br /&gt;
phplist-3.6.14.zip                 100%[================================================================&amp;gt;]  28,47M  76,2MB/s    in 0,4s&lt;br /&gt;
&lt;br /&gt;
2024-12-28 22:13:29 (76,2 MB/s) - ‘phplist-3.6.14.zip’ saved [29850924/29850924]&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I... oh shit. The hash doesn&#039;t match our 3TOFU!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp # sha256sum  phplist-3.6.14.zip&lt;br /&gt;
938e9bdb64d8c042a04192e1fca42814d906e715ec9c2726756425a1be7e0791  phplist-3.6.14.zip&lt;br /&gt;
root@hetzner3 /var/tmp # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I deleted it and downloaded it again, and I got the same bad hash. This is crazy; I did the third TOFU less than an hour ago. Did they change something?!?&lt;br /&gt;
# I opened a thread about it in their forums; I&#039;m not touching this release until we get clarification, jesus https://discuss.phplist.org/t/psa-release-hash-changed-publishing-infrastructure-comprimise/9927&lt;br /&gt;
# Somehow it looks like I was downloading the old version? 3.6.15 is the latest version; I&#039;m not sure how that happened, because it says 3.6.15 was released in April. Well, that fucks up our 3TOFU, but this changing hash is still a concern.&lt;br /&gt;
# I&#039;m going to wait a few days before I touch phpList; let&#039;s see what the community has to say&lt;br /&gt;
# ...&lt;br /&gt;
# meanwhile, we got a response from the Oshine support team; they gave me links to their Google Drive to download their own plugins, but said they don&#039;t have download links for the third-party plugins. I asked for the slugs of those plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Can you please tell me the slugs of the other required plugins, so I can download them from wordpress.org&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&lt;br /&gt;
On 12/27/24 22:13, BrandExponents wrote:&lt;br /&gt;
&amp;gt; -----------------------------------------------------------&lt;br /&gt;
&amp;gt;   Hi Michael,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; We can only send you the direct download links of the plugins we own.&lt;br /&gt;
&amp;gt; The third party supporting plugins for Oshine theme can only be&lt;br /&gt;
&amp;gt; installed directly from your site&#039;s dashboard.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Here are the download links:&lt;br /&gt;
&amp;gt; Tatsu&lt;br /&gt;
&amp;gt; plugin: https://drive.google.com/file/d/1aaSZxC8AumyrFEoAd1bZega9rmI9ZMzW/view?usp=sharing&lt;br /&gt;
&amp;gt; Oshine&lt;br /&gt;
&amp;gt; Modules: https://drive.google.com/file/d/1oKJupmosWglpCfYDsRIWGoRV_cxSlrb6/view?usp=sharing&lt;br /&gt;
&amp;gt; Oshine&lt;br /&gt;
&amp;gt; Core: https://drive.google.com/file/d/1uGj9BgbMpNqnfJk6wE6goLq42vJwLDrD/view?usp=sharing&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do let me know if you have any queries or concerns.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Best regards, &amp;amp; Have a nice weekend!&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; -- &lt;br /&gt;
&amp;gt; Suman M., Tech Support Executive&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I&#039;m going to have to do a 3TOFU on these, but first let&#039;s figure out what they gave us and what we&#039;re still missing&lt;br /&gt;
# ugh, you can&#039;t even wget these shitty Google Drive URLs (you just get the HTML viewer page); this is going to be so annoying to verify with 3TOFU&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~$ wget https://drive.google.com/file/d/1aaSZxC8AumyrFEoAd1bZega9rmI9ZMzW/view?usp=sharing&lt;br /&gt;
--2024-12-28 22:53:03--  https://drive.google.com/file/d/1aaSZxC8AumyrFEoAd1bZega9rmI9ZMzW/view?usp=sharing&lt;br /&gt;
Resolving drive.google.com (drive.google.com)... 142.250.186.174&lt;br /&gt;
Connecting to drive.google.com (drive.google.com)|142.250.186.174|:443... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 200 OK&lt;br /&gt;
Length: unspecified [text/html]&lt;br /&gt;
Saving to: ‘view?usp=sharing’&lt;br /&gt;
&lt;br /&gt;
view?usp=sharing             [    &amp;lt;=&amp;gt;                            ]  91.42K   231KB/s    in 0.4s    &lt;br /&gt;
&lt;br /&gt;
2024-12-28 22:53:20 (231 KB/s) - ‘view?usp=sharing’ saved [93616]&lt;br /&gt;
&lt;br /&gt;
user@host:~$&lt;br /&gt;
&lt;br /&gt;
user@host:~$ head -c 2048 view\?usp\=sharing &lt;br /&gt;
&amp;lt;!DOCTYPE html&amp;gt;&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;script nonce=&amp;quot;IdpnSdj2tPFuOlj9hU9khQ&amp;quot;&amp;gt; window[&#039;_DRIVE_VIEWER_ctiming&#039;]={}; &amp;lt;/script&amp;gt;&amp;lt;meta name=&amp;quot;google&amp;quot; content=&amp;quot;notranslate&amp;quot;&amp;gt;&amp;lt;meta http-equiv=&amp;quot;X-UA-Compatible&amp;quot; content=&amp;quot;IE=edge;&amp;quot;&amp;gt;&amp;lt;style nonce=&amp;quot;o_Fo6LpKy3W204sfbq1iYA&amp;quot;&amp;gt;@font-face{font-family:&#039;Roboto&#039;;font-style:italic;font-weight:400;src:url(fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format(&#039;truetype&#039;);}@font-face{font-family:&#039;Roboto&#039;;font-style:normal;font-weight:300;src:url(fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmSU5fBBc9.ttf)format(&#039;truetype&#039;);}@font-face{font-family:&#039;Roboto&#039;;font-style:normal;font-weight:400;src:url(fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu4mxP.ttf)format(&#039;truetype&#039;);}@font-face{font-family:&#039;Roboto&#039;;font-style:normal;font-weight:500;src:url(fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmEU9fBBc9.ttf)format(&#039;truetype&#039;);}@font-face{font-family:&#039;Roboto&#039;;font-style:normal;font-weight:700;src:url(//fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmWUlfBBc9.ttf)format(&#039;truetype&#039;);}&amp;lt;/style&amp;gt;&amp;lt;meta name=&amp;quot;referrer&amp;quot; content=&amp;quot;origin&amp;quot;&amp;gt;&amp;lt;title&amp;gt;tatsu-3.5.3.zip - Google Drive&amp;lt;/title&amp;gt;&amp;lt;script nonce=&amp;quot;IdpnSdj2tPFuOlj9hU9khQ&amp;quot;&amp;gt;&lt;br /&gt;
		  window[&#039;_DRIVE_VIEWER_IVIS&#039;] = document.visibilityState;&lt;br /&gt;
		&amp;lt;/script&amp;gt;&amp;lt;meta property=&amp;quot;og:title&amp;quot; content=&amp;quot;tatsu-3.5.3.zip&amp;quot;&amp;gt;&amp;lt;meta property=&amp;quot;og:type&amp;quot; content=&amp;quot;article&amp;quot;&amp;gt;&amp;lt;meta property=&amp;quot;og:site_name&amp;quot; content=&amp;quot;Google Docs&amp;quot;&amp;gt;&amp;lt;meta property=&amp;quot;og:url&amp;quot; content=&amp;quot;https://drive.google.com/file/d/1aaSZxC8AumyrFEoAd1bZega9rmI9ZMzW/view?usp=sharing&amp;amp;amp;usp=embed_facebook&amp;quot;&amp;gt;&amp;lt;link rel=&amp;quot;shortcut icon&amp;quot; href=&amp;quot;https://ssl.gstatic.com/images/branding/product/1x/drive_2020q4_32dp.png&amp;quot;&amp;gt;&amp;lt;script nonce=&amp;quot;IdpnSdj2tPFuOlj9hU9khQ&amp;quot;&amp;gt; window[&#039;_DRIVE_VIEWER_ctiming&#039;][&#039;cls&#039;]=performance.now(); &amp;lt;/script&amp;gt;&amp;lt;link rel=&amp;quot;stylesheet&amp;quot; href=&amp;quot;https://fonts.googleapis.com/css?family=Google+Sans_old:300,400,500,700&amp;quot; nonce=&amp;quot;o_Fo6LpKy3W204sfbq1iYA&amp;quot;&amp;gt;&amp;lt;link rel=&amp;quot;stylesheet&amp;quot; href=&amp;quot;https://www.gstatic.com/_/apps-fileview/_/ss/k=apps-fileview.v.ovqSItYnX0g.L.W.O/am=MBg/d=0/rs=AO0039tz5VrQ5sjGTSjn86HXfczh2ZNxIg&amp;quot; data-id=&amp;quot;_cl&amp;quot; noncuser@host:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I loaded the first one in Tor Browser and downloaded it manually; here&#039;s what it gave me&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ ls&lt;br /&gt;
tatsu-3.5.3.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this is weird; there is no style.css file in this plugin&#039;s dir&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ unzip tatsu-3.5.3.zip &lt;br /&gt;
...&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ &lt;br /&gt;
&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ cd tatsu/&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tatsu$ ls&lt;br /&gt;
admin         changelog.txt  includes   LICENSE.txt            README.txt&lt;br /&gt;
builder       css            index.php  plugin-update-checker  tatsu.php&lt;br /&gt;
changelog.md  img            languages  public                 uninstall.php&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tatsu$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the best I found is the &#039;tatsu.php&#039; file, but its header doesn&#039;t give a slug&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tatsu$ head -n15 tatsu.php &lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
/**&lt;br /&gt;
 * Plugin Name:       Tatsu &lt;br /&gt;
 * Plugin URI:        http://www.brandexponents.com&lt;br /&gt;
 * Description:       A Powerful and Elegant Live Front End Website Builder for Wordpress.&lt;br /&gt;
 * Version:           3.5.3&lt;br /&gt;
 * Author:            Brand Exponents&lt;br /&gt;
 * Author URI:        http://www.brandexponents.com&lt;br /&gt;
 * License:           GPL-2.0+&lt;br /&gt;
 * License URI:       http://www.gnu.org/licenses/gpl-2.0.txt&lt;br /&gt;
 * Text Domain:       tatsu&lt;br /&gt;
 * Domain Path:       /languages&lt;br /&gt;
 */&lt;br /&gt;
&lt;br /&gt;
// If this file is called directly, abort.&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads/tatsu$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this is frustrating; I don&#039;t like ambiguity. let&#039;s try the second link; it gave us this &#039;oshine-modules-3.3.8.zip&#039; file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ ls&lt;br /&gt;
oshine-modules-3.3.8.zip  tatsu  tatsu-3.5.3.zip&lt;br /&gt;
user@host:~/.tb/tor-browser/Browser/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this one also doesn&#039;t have a style.css file. I guess the slug in this case is the folder name, which also matches the main plugin file&#039;s name minus the .php&lt;br /&gt;
# so now we have &#039;tatsu&#039; and &#039;oshine-modules&#039;&lt;br /&gt;
# the third link gets us &#039;oshine-core&#039;&lt;br /&gt;
# reminder: here&#039;s the list of plugins that oshine complained we need to install&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I&#039;d say we got the last 3. Now we need to find these:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I did a search on wordpress.org for the above plugins, but it&#039;s super ambiguous (and dangerous) to try to match these names to other plugins. I could guess, but I have no idea if it&#039;s the right plugin.&lt;br /&gt;
# For example, there&#039;s a plugin just called &amp;quot;Meta Box&amp;quot; – but wtf are these three other plugins?&lt;br /&gt;
# I sent another email asking for the slugs&lt;br /&gt;
# also, I read in their own documentation that the release that we downloaded from themeforest *should* include these paid plugins for free, but it didn&#039;t have the plugins dir.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
I&#039;m still waiting for the slugs for the remaining required plugins:&lt;br /&gt;
&lt;br /&gt;
 1. BE Portfolio Post Type&lt;br /&gt;
 2. Meta Box Conditional Logic&lt;br /&gt;
 3. Meta Box Show Hide&lt;br /&gt;
 4. Meta Box Tabs&lt;br /&gt;
&lt;br /&gt;
These &amp;quot;human readable&amp;quot; names for the plugins are not very useful for finding the actual plugins on wordpress.org/plugins/&lt;br /&gt;
&lt;br /&gt;
Also, your own documentation says that the plugins should have been included in your theme&#039;s .zip that I downloaded from themeforest&lt;br /&gt;
&lt;br /&gt;
 * https://oshine-knowledgebase.brandexponents.com/knowledge-base/install-activate-required-plugins/&lt;br /&gt;
&lt;br /&gt;
&amp;gt; NOTE: When you download the latest theme package from the downloads&lt;br /&gt;
&amp;gt; of your themeforest account, you will find the latest copy of the&lt;br /&gt;
&amp;gt; following plugins included in a folder named “plugins” after&lt;br /&gt;
&amp;gt; extracting the the zip.&lt;br /&gt;
&lt;br /&gt;
Why were the plugins absent? Can you please fix your latest release on themeforest to include the required plugins, as advertised?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# regarding oswh, Catarina responded saying the site was originally set up by Simone Cicero, and asked Marcin (CC&#039;d) if he wanted to buy another license.&lt;br /&gt;
# I don&#039;t think we should buy it if it&#039;s not actively updated, and the fact that their sales team didn&#039;t respond to my email that I sent 3 days ago isn&#039;t a great sign.&lt;br /&gt;
# Catarina also said that I can try to find some free wp theme for events&lt;br /&gt;
# I guess let&#039;s wait 2 weeks on oswh to see if Marcin or the Eventor author replies. If they don&#039;t, I&#039;ll just do some research and find and install a few FOSS WordPress event themes to replace this one&lt;br /&gt;
# ...&lt;br /&gt;
# ok, back to the wiki&lt;br /&gt;
# last time I finished setting up the vhost, but the web browser greeted me with an error telling me to install the mbstring php extension&lt;br /&gt;
# I installed `php-mbstring` from apt and added it to ansible&lt;br /&gt;
# now when I refresh the web browser, I get a blank page&lt;br /&gt;
# error logs show this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Sun Dec 29 03:30:00.336742 2024] [proxy_fcgi:error] [pid 447104:tid 447104] [client 127.0.0.1:0] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught Error: Undefined constant &amp;quot;NS_IMAGE&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php:58\nStack trace:\n#0 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once()\n#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(143): require_once(&#039;...&#039;)\n#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/WebStart.php(89): require_once(&#039;...&#039;)\n#3 /var/www/html/wiki.opensourceecology.org/htdocs/index.php(44): require(&#039;...&#039;)\n#4 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 58; PHP message: PHP Fatal error:  Uncaught Error: Class &amp;quot;WebRequest&amp;quot; not found in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php:63\nStack trace:\n#0 [internal function]: MediaWiki\\HeaderCallback::callback()\n#1 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php on line 63&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it&#039;s complaining about this line in particular&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_IMAGE] = true;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but wait, I realized the upgrade guide says I&#039;m supposed to run the update.php maintenance script, which I never did; let&#039;s do that.&lt;br /&gt;
# oh fuck, well that just fails with the same error lol&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # php htdocs/maintenance/update.php&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 5&lt;br /&gt;
PHP Fatal error:  Uncaught Error: Undefined constant &amp;quot;NS_IMAGE&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php:58&lt;br /&gt;
Stack trace:&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(143): require_once(&#039;...&#039;)&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(91): require_once(&#039;...&#039;)&lt;br /&gt;
#3 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(253): require_once(&#039;...&#039;)&lt;br /&gt;
#4 {main}&lt;br /&gt;
  thrown in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 58&lt;br /&gt;
PHP Fatal error:  Uncaught Error: Class &amp;quot;WebRequest&amp;quot; not found in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php:63&lt;br /&gt;
Stack trace:&lt;br /&gt;
#0 [internal function]: MediaWiki\HeaderCallback::callback()&lt;br /&gt;
#1 {main}&lt;br /&gt;
  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php on line 63&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
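# side note: the recurring HTTP_USER_AGENT warning from line 5 of LocalSettings.php is only a warning; php-cli simply has no user agent in $_SERVER. Line 5 isn&#039;t shown here. Assuming it reads $_SERVER[&#039;HTTP_USER_AGENT&#039;] directly, a minimal sketch of a fix is to fall back to an empty string with PHP&#039;s null-coalescing operator. The $userAgent variable below is a hypothetical stand-in.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// hypothetical: $userAgent stands in for whatever line 5 actually assigns&lt;br /&gt;
// php-cli has no HTTP_USER_AGENT key, so default to an empty string&lt;br /&gt;
$userAgent = $_SERVER[&#039;HTTP_USER_AGENT&#039;] ?? &#039;&#039;;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;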
# here&#039;s the block that we have in our LocalSettings.php file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Enable subpages in the main namespace&lt;br /&gt;
$wgNamespacesWithSubpages[NS_MAIN] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_TEMPLATE] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_CATEGORY] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_MEDIA] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_IMAGE] = true;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like this is how we set up subpages in MediaWiki. I found documentation on this setting here https://www.mediawiki.org/wiki/Manual:$wgNamespacesWithSubpages#Enabling-for-a-namespace&lt;br /&gt;
# but those docs use a slightly different syntax, something like this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$wgNamespacesWithSubpages = [&lt;br /&gt;
	NS_TALK =&amp;gt; true,&lt;br /&gt;
	NS_USER =&amp;gt; true,&lt;br /&gt;
	NS_USER_TALK =&amp;gt; true,&lt;br /&gt;
	NS_PROJECT =&amp;gt; true,&lt;br /&gt;
	NS_PROJECT_TALK =&amp;gt; true,&lt;br /&gt;
	NS_FILE_TALK =&amp;gt; true,&lt;br /&gt;
	NS_MEDIAWIKI =&amp;gt; true,&lt;br /&gt;
	NS_MEDIAWIKI_TALK =&amp;gt; true,&lt;br /&gt;
	NS_TEMPLATE =&amp;gt; true,&lt;br /&gt;
	NS_TEMPLATE_TALK =&amp;gt; true,&lt;br /&gt;
	NS_HELP =&amp;gt; true,&lt;br /&gt;
	NS_HELP_TALK =&amp;gt; true,&lt;br /&gt;
	NS_CATEGORY_TALK =&amp;gt; true&lt;br /&gt;
];&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so I changed our block to match this syntax&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # cp LocalSettings.php LocalSettings.20241228.php&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # vim LocalSettings.php&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # diff LocalSettings.20241228.php LocalSettings.php &lt;br /&gt;
54,58c54,60&lt;br /&gt;
&amp;lt; $wgNamespacesWithSubpages[NS_MAIN] = true;&lt;br /&gt;
&amp;lt; $wgNamespacesWithSubpages[NS_TEMPLATE] = true;&lt;br /&gt;
&amp;lt; $wgNamespacesWithSubpages[NS_CATEGORY] = true;&lt;br /&gt;
&amp;lt; $wgNamespacesWithSubpages[NS_MEDIA] = true;&lt;br /&gt;
&amp;lt; $wgNamespacesWithSubpages[NS_IMAGE] = true;&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; $wgNamespacesWithSubpages = [&lt;br /&gt;
&amp;gt;       NS_MAIN =&amp;gt; true,&lt;br /&gt;
&amp;gt;       NS_TEMPLATE =&amp;gt; true,&lt;br /&gt;
&amp;gt;       NS_CATEGORY =&amp;gt; true,&lt;br /&gt;
&amp;gt;       NS_MEDIA =&amp;gt; true,&lt;br /&gt;
&amp;gt;       NS_IMAGE =&amp;gt; true&lt;br /&gt;
&amp;gt; ];&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# unfortunately, I got the same error. I ended up just deleting the line with &#039;NS_IMAGE&#039;&lt;br /&gt;
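# a probably cleaner fix (a sketch, untested here) is to keep the original per-key syntax and swap NS_IMAGE for NS_FILE. NS_IMAGE appears to have been the legacy alias for NS_FILE (namespace 6, from when the File namespace was called Image), and it is gone in this release. The per-key form also adds to MediaWiki&#039;s default value, whereas assigning a whole new array replaces it. The docs list quoted above appears to be that default, so the array version would silently turn off subpages for talk and user pages.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Enable subpages in these namespaces, on top of MediaWiki&#039;s defaults&lt;br /&gt;
$wgNamespacesWithSubpages[NS_MAIN] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_TEMPLATE] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_CATEGORY] = true;&lt;br /&gt;
$wgNamespacesWithSubpages[NS_MEDIA] = true;&lt;br /&gt;
# assumed replacement: NS_IMAGE was the old name for NS_FILE&lt;br /&gt;
$wgNamespacesWithSubpages[NS_FILE] = true;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;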
# now I got a different error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # php htdocs/maintenance/update.php&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 5&lt;br /&gt;
PHP Warning:  require_once(/var/www/html/wiki.opensourceecology.org/htdocs/extensions/CategoryTree/CategoryTree.php): Failed to open stream: No such file or directory in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 62&lt;br /&gt;
PHP Fatal error:  Uncaught Error: Failed opening required &#039;/var/www/html/wiki.opensourceecology.org/htdocs/extensions/CategoryTree/CategoryTree.php&#039; (include_path=&#039;/var/www/html/wiki.opensourceecology.org/htdocs:/var/www/html/wiki.opensourceecology.org/htdocs/includes:/var/www/html/wiki.opensourceecology.org/htdocs/languages:/var/www/html/wiki.opensourceecology.org/htdocs/extensions/OpenID:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/console_getopt:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/mail:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/mail_mime:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/net_smtp:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/net_socket:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/pear-core-minimal/src:/var/www/html/wiki.opensourceecology.org/htdocs/vendor/pear/pear_exception:.:/usr/share/php&#039;) in /var/www/html/wiki.opensourceecology.org/LocalSettings.php:62&lt;br /&gt;
Stack trace:&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php(8): require_once()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(143): require_once(&#039;...&#039;)&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(91): require_once(&#039;...&#039;)&lt;br /&gt;
#3 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(253): require_once(&#039;...&#039;)&lt;br /&gt;
#4 {main}&lt;br /&gt;
  thrown in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 62&lt;br /&gt;
PHP Fatal error:  Uncaught Error: Class &amp;quot;WebRequest&amp;quot; not found in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php:63&lt;br /&gt;
Stack trace:&lt;br /&gt;
#0 [internal function]: MediaWiki\HeaderCallback::callback()&lt;br /&gt;
#1 {main}&lt;br /&gt;
  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php on line 63&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I had to comment out the line including the CategoryTree extension&lt;br /&gt;
# next error: I had to comment out the line including the CologneBlue skin&lt;br /&gt;
# then the Modern skin, then the MonoBook skin, then the Vector skin&lt;br /&gt;
# then I had to comment out the include for the ConfirmAccount extension, then the ConfirmEdit extension, then the Widgets extension&lt;br /&gt;
# while I&#039;m at it, I sent an email to Marcin asking if there are any new extensions he&#039;d want me to install, as that&#039;s important to know soon&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
Are there any MediaWiki extensions that you&#039;ve wanted to add in the past years?&lt;br /&gt;
&lt;br /&gt;
I&#039;ve started working on the upgrade process of the OSE wiki. Part of this includes downloading updates to your extensions, and installing new extensions that you may want.&lt;br /&gt;
&lt;br /&gt;
Please let me know if there are any new MediaWiki extensions that you want on hetzner3.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# next error: I had to comment out the include of the ParserFunctions extension, and the UserMerge extension&lt;br /&gt;
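# note on the pattern above: the &amp;quot;Failed to open stream&amp;quot; warning means the old PHP entry points (e.g. extensions/CategoryTree/CategoryTree.php) aren&#039;t in the new tree. Most extensions and skins dropped those files in favor of extension registration: LocalSettings.php calls wfLoadExtension() or wfLoadSkin(), and MediaWiki reads the component&#039;s extension.json or skin.json. Below is a sketch for a few components that ship in the MediaWiki tarball, assuming their directories exist under htdocs/extensions/ and htdocs/skins/. Non-bundled ones like UserMerge, ConfirmAccount, and Widgets still need to be downloaded first.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# extension registration replaces require_once of the old .php entry points&lt;br /&gt;
wfLoadExtension( &#039;ConfirmEdit&#039; );&lt;br /&gt;
wfLoadExtension( &#039;ParserFunctions&#039; );&lt;br /&gt;
# skins have an equivalent loader that reads skin.json&lt;br /&gt;
wfLoadSkin( &#039;MonoBook&#039; );&lt;br /&gt;
wfLoadSkin( &#039;Vector&#039; );&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;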
# oh, finally, something new; it says I&#039;m missing some $wgServer var?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # php htdocs/maintenance/update.php &lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 5&lt;br /&gt;
$wgServer must be set in LocalSettings.php. See &amp;lt;a href=&amp;quot;https://www.mediawiki.org/wiki/Manual:$wgServer&amp;quot;&amp;gt;https://www.mediawiki.org/wiki/Manual:$wgServer&amp;lt;/a&amp;gt;.root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like we just had it commented-out for some reason; I&#039;ll uncomment it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # grep wgServer LocalSettings.php &lt;br /&gt;
#$wgServer = &amp;quot;http://opensourceecology.org&amp;quot;;&lt;br /&gt;
#$wgScriptPath       = &amp;quot;$wgServer/w&amp;quot;;&lt;br /&gt;
  global $wgServer;&lt;br /&gt;
	$url = $wgServer . $url;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
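# note: the commented-out value points at the apex domain, while this wiki lives in its own wiki.opensourceecology.org vhost. Assuming it should keep answering at that hostname over https, the line probably wants to look more like this. The value is assumed and not checked against the old server&#039;s config.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# canonical base URL that MediaWiki uses when building absolute links (assumed value)&lt;br /&gt;
$wgServer = &amp;quot;https://wiki.opensourceecology.org&amp;quot;;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;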
# oh, it&#039;s doing something now! It flew by too fast for me to catch it all, but here are some snippets&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # php htdocs/maintenance/update.php&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_USER_AGENT&amp;quot; in /var/www/html/wiki.opensourceecology.org/LocalSettings.php on line 5&lt;br /&gt;
MediaWiki 1.35.0 Updater&lt;br /&gt;
&lt;br /&gt;
Your composer.lock file is up to date with current dependencies!&lt;br /&gt;
Going to run database updates for osewiki_db-wiki_&lt;br /&gt;
Depending on the size of your database this may take a while!&lt;br /&gt;
Abort with control-c in the next five seconds (skip this countdown with --quick) ... 0&lt;br /&gt;
...have ipb_id field in ipblocks table.&lt;br /&gt;
...have ipb_expiry field in ipblocks table.&lt;br /&gt;
...already have interwiki table&lt;br /&gt;
...indexes seem up to 20031107 standards.&lt;br /&gt;
...have rc_type field in recentchanges table.&lt;br /&gt;
...index new_name_timestamp already set on recentchanges table.&lt;br /&gt;
...have user_real_name field in user table.&lt;br /&gt;
...querycache table already exists.&lt;br /&gt;
...objectcache table already exists.&lt;br /&gt;
...categorylinks table already exists.&lt;br /&gt;
...have pagelinks; skipping old links table updates&lt;br /&gt;
...il_from OK&lt;br /&gt;
...have rc_ip field in recentchanges table.&lt;br /&gt;
...index PRIMARY already set on image table.&lt;br /&gt;
...have rc_id field in recentchanges table.&lt;br /&gt;
...have rc_patrolled field in recentchanges table.&lt;br /&gt;
...logging table already exists.&lt;br /&gt;
...have user_token field in user table.&lt;br /&gt;
...have wl_notificationtimestamp field in watchlist table.&lt;br /&gt;
...watchlist talk page rows already present.&lt;br /&gt;
...user table does not contain user_emailauthenticationtimestamp field.&lt;br /&gt;
...page table already exists.&lt;br /&gt;
...have log_params field in logging table.&lt;br /&gt;
...logging table has correct log_title encoding.&lt;br /&gt;
...have ar_rev_id field in archive table.&lt;br /&gt;
...have page_len field in page table.&lt;br /&gt;
...revision table does not contain inverse_timestamp field.&lt;br /&gt;
...have rev_text_id field in revision table.&lt;br /&gt;
...have rev_deleted field in revision table.&lt;br /&gt;
...have img_width field in image table.&lt;br /&gt;
...have img_metadata field in image table.&lt;br /&gt;
...have user_email_token field in user table.&lt;br /&gt;
...have ar_text_id field in archive table.&lt;br /&gt;
...page_namespace is already a full int (int(11)).&lt;br /&gt;
...ar_namespace is already a full int (int(11)).&lt;br /&gt;
...rc_namespace is already a full int (int(11)).&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
... 3223&lt;br /&gt;
Completed migration, updated 3183 row(s) with 725 new comment(s)&lt;br /&gt;
Beginning migration of ipblocks.ipb_reason to ipblocks.ipb_reason_id&lt;br /&gt;
... 118&lt;br /&gt;
... 282&lt;br /&gt;
... 438&lt;br /&gt;
... 507&lt;br /&gt;
Completed migration, updated 353 row(s) with 12 new comment(s)&lt;br /&gt;
Beginning migration of image.img_description to image.img_description_id&lt;br /&gt;
... 021SketchPortfolio_UJ_MartinBolton.JPG&lt;br /&gt;
... 04_precut2x6x9_1.png&lt;br /&gt;
... 080_FlexibleVinylflashing-10x52_5inch-withcut_3.png&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
... rev_id=302843&lt;br /&gt;
Completed migration, updated 299612 row(s) with 261 new actor(s), 0 error(s)&lt;br /&gt;
Beginning migration of archive.ar_user and archive.ar_user_text to archive.ar_actor&lt;br /&gt;
... ar_id=100&lt;br /&gt;
... ar_id=200&lt;br /&gt;
... ar_id=300&lt;br /&gt;
... ar_id=400&lt;br /&gt;
... ar_id=500&lt;br /&gt;
... ar_id=600&lt;br /&gt;
... ar_id=700&lt;br /&gt;
... ar_id=800&lt;br /&gt;
... ar_id=900&lt;br /&gt;
... ar_id=1000&lt;br /&gt;
... ar_id=1100&lt;br /&gt;
... ar_id=1200&lt;br /&gt;
... ar_id=1300&lt;br /&gt;
... ar_id=1400&lt;br /&gt;
... ar_id=1500&lt;br /&gt;
... ar_id=1600&lt;br /&gt;
... ar_id=1700&lt;br /&gt;
... ar_id=1800&lt;br /&gt;
... ar_id=1900&lt;br /&gt;
... ar_id=2001&lt;br /&gt;
... ar_id=2101&lt;br /&gt;
... ar_id=2201&lt;br /&gt;
... ar_id=2309&lt;br /&gt;
... ar_id=2426&lt;br /&gt;
... ar_id=2531&lt;br /&gt;
... ar_id=2634&lt;br /&gt;
... ar_id=2740&lt;br /&gt;
... ar_id=2840&lt;br /&gt;
... ar_id=2940&lt;br /&gt;
... ar_id=3040&lt;br /&gt;
... ar_id=3140&lt;br /&gt;
... ar_id=3223&lt;br /&gt;
Completed migration, updated 3183 row(s) with 3 new actor(s), 0 error(s)&lt;br /&gt;
Beginning migration of ipblocks.ipb_by and ipblocks.ipb_by_text to ipblocks.ipb_by_actor&lt;br /&gt;
... ipb_id=118&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Modifying rev_text_id field of table revision ...done.   &lt;br /&gt;
Modifying table site_stats ...done.&lt;br /&gt;
Populating ar_rev_id.&lt;br /&gt;
Populating ar_rev_id...&lt;br /&gt;
MediaWiki\Revision\RevisionAccessException from line 1296 of /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php: Main slot of revision not found in database. See T212428.&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1224): MediaWiki\Revision\RevisionStore-&amp;gt;constructSlotRecords()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1217): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1335): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#3 [internal function]: MediaWiki\Revision\RevisionStore-&amp;gt;MediaWiki\Revision\{closure}()&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(175): call_user_func()&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(117): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlots()&lt;br /&gt;
#6 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlot()&lt;br /&gt;
#7 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord-&amp;gt;getSlot()&lt;br /&gt;
#8 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1185): MediaWiki\Revision\RevisionRecord-&amp;gt;getContent()&lt;br /&gt;
#9 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1528): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#10 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache-&amp;gt;fetchOrRegenerate()&lt;br /&gt;
#11 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1167): WANObjectCache-&amp;gt;getWithSetCallback()&lt;br /&gt;
#12 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/BagOStuff.php(149): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#13 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1163): BagOStuff-&amp;gt;getWithSetCallback()&lt;br /&gt;
#14 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1106): MessageCache-&amp;gt;loadCachedMessagePageEntry()&lt;br /&gt;
#15 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1016): MessageCache-&amp;gt;getMsgFromNamespace()&lt;br /&gt;
#16 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(988): MessageCache-&amp;gt;getMessageForLang()&lt;br /&gt;
#17 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(927): MessageCache-&amp;gt;getMessageFromFallbackChain()&lt;br /&gt;
#18 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(1304): MessageCache-&amp;gt;get()&lt;br /&gt;
#19 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(862): Message-&amp;gt;fetchMessage()&lt;br /&gt;
#20 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(954): Message-&amp;gt;toString()&lt;br /&gt;
#21 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Title.php(661): Message-&amp;gt;text()&lt;br /&gt;
#22 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(213): Title::newMainPage()&lt;br /&gt;
#23 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(118): PopulateArchiveRevId::makeDummyRevisionRow()&lt;br /&gt;
#24 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/populateArchiveRevId.php(63): PopulateArchiveRevId::checkMysqlAutoIncrementBug()&lt;br /&gt;
#25 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/includes/LoggedUpdateMaintenance.php(45): PopulateArchiveRevId-&amp;gt;doDBUpdates()&lt;br /&gt;
#26 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(1377): LoggedUpdateMaintenance-&amp;gt;execute()&lt;br /&gt;
#27 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(512): DatabaseUpdater-&amp;gt;populateArchiveRevId()&lt;br /&gt;
#28 /var/www/html/wiki.opensourceecology.org/htdocs/includes/installer/DatabaseUpdater.php(475): DatabaseUpdater-&amp;gt;runUpdates()&lt;br /&gt;
#29 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(181): DatabaseUpdater-&amp;gt;doUpdates()&lt;br /&gt;
#30 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/doMaintenance.php(107): UpdateMediaWiki-&amp;gt;execute()&lt;br /&gt;
#31 /var/www/html/wiki.opensourceecology.org/htdocs/maintenance/update.php(253): require_once(&#039;...&#039;)&lt;br /&gt;
#32 {main}&lt;br /&gt;
You have mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# while that ran, I tried to look into the download URLs for the extensions (eventually needed for 3TOFU)&lt;br /&gt;
# it looks like some of these MediaWiki extension downloads are quite odd: the site generates a temporary snapshot tarball of the extension at download time? That&#039;s terrible for supply chain security.&lt;br /&gt;
# I looked at ConfirmEdit; this is the official page https://www.mediawiki.org/wiki/Extension:ConfirmEdit&lt;br /&gt;
# if you click the &amp;quot;download extension&amp;quot; link, it takes you to this page which asks you which version of mediawiki you&#039;re using https://www.mediawiki.org/wiki/Special:ExtensionDistributor/ConfirmEdit&lt;br /&gt;
# I select v1.42 (our latest version), and click the &amp;quot;continue&amp;quot; button&lt;br /&gt;
# that brings me to this page https://www.mediawiki.org/wiki/Special:ExtensionDistributor?extdistname=ConfirmEdit&amp;amp;extdistversion=REL1_42 which says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A snapshot of version 6673c5d of the ConfirmEdit extension for MediaWiki REL1_42 has been created. Your download should start automatically in 5 seconds.&lt;br /&gt;
&lt;br /&gt;
The URL for this snapshot is:&lt;br /&gt;
&lt;br /&gt;
	https://extdist.wmflabs.org/dist/extensions/ConfirmEdit-REL1_42-6673c5d.tar.gz&lt;br /&gt;
&lt;br /&gt;
You can use this link to download the extension on any computer, but please do not bookmark it, since its contents will not be updated, and it may be deleted at a later date. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# also, this is a bad example, because ConfirmEdit is one of those shipped with MediaWiki core now.&lt;br /&gt;
# from yesterday, here are the extensions we need&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Renameuser&lt;br /&gt;
UserMerge&lt;br /&gt;
ConfirmAccount&lt;br /&gt;
Widgets&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wait, shit, no, Renameuser is also part of core now https://www.mediawiki.org/wiki/Manual:Renameuser&lt;br /&gt;
# and so is CategoryTree https://www.mediawiki.org/wiki/Extension:CategoryTree&lt;br /&gt;
# and so is Parser Functions https://www.mediawiki.org/wiki/Extension:ParserFunctions&lt;br /&gt;
# alright, UserMerge is, in fact, a separate (non-bundled) extension https://www.mediawiki.org/wiki/Extension:UserMerge&lt;br /&gt;
# same thing as before; it generates a snapshot from the git repo :( https://www.mediawiki.org/wiki/Special:ExtensionDistributor?extdistname=UserMerge&amp;amp;extdistversion=REL1_42&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
A snapshot of version 41759d0 of the UserMerge extension for MediaWiki REL1_42 has been created. Your download should start automatically in 5 seconds.&lt;br /&gt;
&lt;br /&gt;
The URL for this snapshot is:&lt;br /&gt;
&lt;br /&gt;
	https://extdist.wmflabs.org/dist/extensions/UserMerge-REL1_42-41759d0.tar.gz&lt;br /&gt;
&lt;br /&gt;
You can use this link to download the extension on any computer, but please do not bookmark it, since its contents will not be updated, and it may be deleted at a later date. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, well, here&#039;s the actual repo https://gerrit.wikimedia.org/g/mediawiki/extensions/UserMerge&lt;br /&gt;
## it shows the branches on the left; looks like &#039;REL1_42&#039; is a branch for MediaWiki v1.42 https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/UserMerge/+/refs/heads/REL1_42&lt;br /&gt;
## the latest commit in ^ that branch is 41759d0c61377074d159f7d84130a095822bc7a3, the first 7 characters of which match the filename above &#039;41759d0&#039;&lt;br /&gt;
## here&#039;s the log for the REL1_42 branch; it shows it was last updated 5 weeks ago. looks like updates are infrequent enough that we might be able to use that tarball for a 3TOFU, even though they say it&#039;s ephemeral and shouldn&#039;t be bookmarked. I think it&#039;ll work https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/UserMerge/+log/refs/heads/REL1_42&lt;br /&gt;
# so here are the URLs to the extensions we need&lt;br /&gt;
## https://www.mediawiki.org/wiki/Extension:UserMerge&lt;br /&gt;
## https://www.mediawiki.org/wiki/Extension:ConfirmAccount&lt;br /&gt;
## https://www.mediawiki.org/wiki/Extension:Widgets&lt;br /&gt;
# and here are some releases that we can 3TOFU&lt;br /&gt;
## https://extdist.wmflabs.org/dist/extensions/UserMerge-REL1_42-41759d0.tar.gz&lt;br /&gt;
## https://extdist.wmflabs.org/dist/extensions/ConfirmAccount-REL1_42-7405319.tar.gz&lt;br /&gt;
## https://extdist.wmflabs.org/dist/extensions/Widgets-REL1_42-17dbd92.tar.gz&lt;br /&gt;
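# since all three tarballs share one naming pattern, the list above could be generated from (extension, short hash) pairs. a sketch, assuming the extdist URL layout stays exactly as in those three links (the site itself warns these URLs are ephemeral); `extdist_url` is a hypothetical helper&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Base URL and branch copied from the links above; the layout is an
# assumption inferred from those three URLs and may change without notice
EXTDIST_BASE='https://extdist.wmflabs.org/dist/extensions'
MW_BRANCH='REL1_42'

# Hypothetical helper: print the extdist snapshot URL for one extension
extdist_url() {
    local ext="$1" short_hash="$2"
    echo "${EXTDIST_BASE}/${ext}-${MW_BRANCH}-${short_hash}.tar.gz"
}

# The three extensions we still need, with the short hashes listed above
for pair in UserMerge:41759d0 ConfirmAccount:7405319 Widgets:17dbd92; do
    extdist_url "${pair%%:*}" "${pair##*:}"
done
```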
# I&#039;m not sure if we should actually be putting these in-place for the intermediate upgrade from MediaWiki v1.30.0 to v1.35&lt;br /&gt;
# to be safe, here&#039;s the version of the extensions for v1.35&lt;br /&gt;
## just kidding, there is no option to download the extension for anything older than v1.39. Well, shit, I guess we&#039;ll just have to install the extensions after the second upgrade to v1.42 :shrug:&lt;br /&gt;
# ok, the update finally finished. I should have prepended it with `time`, but I think that took like half an hour&lt;br /&gt;
# after it ran, I tried loading the website; again I just get a blank white page&lt;br /&gt;
# the error log complains about a call to the undefined function ini_set(); this reminds me of the wordpress bug. guess I have to open one for mediawiki too :/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; wiki.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sun Dec 29 04:39:22.406293 2024] [proxy_fcgi:error] [pid 447105:tid 447105] [client 127.0.0.1:0] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught Error: Call to undefined function ini_set() in /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/APCUBagOStuff.php:61\nStack trace:\n#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/objectcache/ObjectCache.php(308): APCUBagOStuff-&amp;gt;__construct()\n#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/registration/ExtensionRegistry.php(187): ObjectCache::makeLocalServerCache()\n#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/registration/ExtensionRegistry.php(224): ExtensionRegistry-&amp;gt;getCache()\n#3 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php(161): ExtensionRegistry-&amp;gt;loadFromQueue()\n#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/WebStart.php(89): require_once(&#039;...&#039;)\n#5 /var/www/html/wiki.opensourceecology.org/htdocs/index.php(44): require(&#039;...&#039;)\n#6 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/APCUBagOStuff.php on line 61; PHP message: PHP Fatal error:  Uncaught Error: Class &amp;quot;WebRequest&amp;quot; not found in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php:63\nStack trace:\n#0 [internal function]: MediaWiki\\HeaderCallback::callback()\n#1 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php on line 63&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# in the meantime, I&#039;ll add the same workaround to LocalSettings.php as I added to wordpress&lt;br /&gt;
# curiously, this next one (putenv) is unique to MediaWiki&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; wiki.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sun Dec 29 04:41:57.197364 2024] [proxy_fcgi:error] [pid 447103:tid 447103] [client 127.0.0.1:0] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught Error: Call to undefined function putenv() in /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php:167\nStack trace:\n#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/WebStart.php(89): require_once()\n#1 /var/www/html/wiki.opensourceecology.org/htdocs/index.php(44): require(&#039;...&#039;)\n#2 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/Setup.php on line 167; PHP message: PHP Fatal error:  Uncaught Error: Class &amp;quot;WebRequest&amp;quot; not found in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php:63\nStack trace:\n#0 [internal function]: MediaWiki\\HeaderCallback::callback()\n#1 {main}\n  thrown in /var/www/html/wiki.opensourceecology.org/htdocs/includes/HeaderCallback.php on line 63&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# also php_uname; damn, I&#039;m disappointed in Mediawiki&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; wiki.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sun Dec 29 04:43:59.298957 2024] [proxy_fcgi:error] [pid 447104:tid 447104] [client 127.0.0.1:0] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught Error: Call to undefined function php_uname() in /var/www/html/wiki.opensourceecology.org/htdocs/includes/GlobalFunctions.php:1290\nStack trace:\n#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/db/MWLBFactory.php(104): wfHostname()\n#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/ServiceWiring.php(306): MWLBFactory::applyDefaultConfig()\n#2 /var/www/html/wiki.opensourceecology.org/htdocs/vendor/wikimedia/services/src/ServiceContainer.php(445): Wikimedia\\Services\\ServiceContainer-&amp;gt;{closure}()\n#3 /var/www/html/wiki.opensourceecology.org/htdocs/vendor/wikimedia/services/src/ServiceContainer.php(416): Wikimedia\\Services\\ServiceContainer-&amp;gt;createService()\n#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/MediaWikiServices.php(679): Wikimedia\\Services\\ServiceContainer-&amp;gt;getService()\n#5 /var/www/html/wiki.opensourceecology.org/htdocs/includes/exception/MWExceptionHandler.php(135): MediaWiki\\MediaWikiServices-&amp;gt;getDBLoadBalancerFactory()\n#...&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now the LocalSettings.php starts with this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # head -n20 LocalSettings.php &lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
# fix mediawiki bugs&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
		function ini_set(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;putenv&#039;) ){&lt;br /&gt;
		function putenv(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;php_uname&#039;) ){&lt;br /&gt;
		function php_uname(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# the .htaccess file that ships with mediawiki to prevent XSS attacks on IE6&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
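# my guess (not verified in the logs above) is that hetzner3&#039;s PHP config lists these functions in `disable_functions`; on PHP 8 and newer, disabled functions behave exactly like undefined ones, which would explain both the fatal errors and why defining stubs in LocalSettings.php works at all. a minimal sketch to check a php.ini-style file for them; the helper name is hypothetical, and you would pass it whichever ini file the FPM pool actually loads (the directive can also be set in the pool config)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a php.ini-style file ($1) and function names,
# print the names that appear in its last active disable_functions line
disabled_php_functions() {
    local ini="$1" list fn
    shift
    # strip the key, spaces, tabs and quotes, leaving e.g. exec,ini_set,putenv
    list="$(grep -E '^[[:space:]]*disable_functions[[:space:]]*=' "$ini" \
        | tail -n 1 | cut -d= -f2- | tr -d ' \t"')"
    for fn in "$@"; do
        # pad with commas so e.g. putenv does not match a longer name
        case ",${list}," in
            *",${fn},"*) echo "$fn" ;;
        esac
    done
}
```

# e.g. `disabled_php_functions /path/to/php.ini ini_set putenv php_uname` (path is a placeholder) prints each of the three that is disabled&lt;br /&gt;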
# now when I load the site, I get this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;!DOCTYPE html&amp;gt;&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Internal error - Open Source Ecology&amp;lt;/title&amp;gt;&amp;lt;style&amp;gt;body { font-family: sans-serif; margin: 0; padding: 0.5em 2em; }&amp;lt;/style&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;errorbox mw-content-ltr&amp;quot;&amp;gt;[Z3DUd0uDMO6w_6GcLRKPZgAAAAU] 2024-12-29 04:47:51: Fatal exception of type &amp;amp;quot;MediaWiki\Revision\RevisionAccessException&amp;amp;quot;&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;!-- Set $wgShowExceptionDetails = true; at the bottom&lt;br /&gt;
of LocalSettings.php to show detailed debugging&lt;br /&gt;
information. --&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried tailing wikierror.log, but I didn&#039;t get anything useful, so I enabled the exception output by setting $wgShowExceptionDetails = true; at the bottom of LocalSettings.php, as the error page suggested&lt;br /&gt;
# next page load gave me this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[Z3DXAXouL7ks0Tm9gkHDXgAAAAQ] /?nocache=12 MediaWiki\Revision\RevisionAccessException from line 1296 of /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php: Main slot of revision not found in database. See T212428.&lt;br /&gt;
&lt;br /&gt;
Backtrace:&lt;br /&gt;
&lt;br /&gt;
#0 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1224): MediaWiki\Revision\RevisionStore-&amp;gt;constructSlotRecords()&lt;br /&gt;
#1 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1217): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#2 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionStore.php(1335): MediaWiki\Revision\RevisionStore-&amp;gt;loadSlotRecords()&lt;br /&gt;
#3 [internal function]: MediaWiki\Revision\RevisionStore-&amp;gt;MediaWiki\Revision\{closure}()&lt;br /&gt;
#4 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(175): call_user_func()&lt;br /&gt;
#5 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionSlots.php(117): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlots()&lt;br /&gt;
#6 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots-&amp;gt;getSlot()&lt;br /&gt;
#7 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord-&amp;gt;getSlot()&lt;br /&gt;
#8 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1185): MediaWiki\Revision\RevisionRecord-&amp;gt;getContent()&lt;br /&gt;
#9 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1528): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#10 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/wancache/WANObjectCache.php(1376): WANObjectCache-&amp;gt;fetchOrRegenerate()&lt;br /&gt;
#11 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1167): WANObjectCache-&amp;gt;getWithSetCallback()&lt;br /&gt;
#12 /var/www/html/wiki.opensourceecology.org/htdocs/includes/libs/objectcache/BagOStuff.php(149): MessageCache-&amp;gt;{closure}()&lt;br /&gt;
#13 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1163): BagOStuff-&amp;gt;getWithSetCallback()&lt;br /&gt;
#14 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1106): MessageCache-&amp;gt;loadCachedMessagePageEntry()&lt;br /&gt;
#15 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(1016): MessageCache-&amp;gt;getMsgFromNamespace()&lt;br /&gt;
#16 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(988): MessageCache-&amp;gt;getMessageForLang()&lt;br /&gt;
#17 /var/www/html/wiki.opensourceecology.org/htdocs/includes/cache/MessageCache.php(927): MessageCache-&amp;gt;getMessageFromFallbackChain()&lt;br /&gt;
#18 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(1304): MessageCache-&amp;gt;get()&lt;br /&gt;
#19 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(862): Message-&amp;gt;fetchMessage()&lt;br /&gt;
#20 /var/www/html/wiki.opensourceecology.org/htdocs/includes/language/Message.php(954): Message-&amp;gt;toString()&lt;br /&gt;
#21 /var/www/html/wiki.opensourceecology.org/htdocs/includes/Title.php(661): Message-&amp;gt;text()&lt;br /&gt;
#22 /var/www/html/wiki.opensourceecology.org/htdocs/includes/MediaWiki.php(131): Title::newMainPage()&lt;br /&gt;
#23 /var/www/html/wiki.opensourceecology.org/htdocs/includes/MediaWiki.php(151): MediaWiki-&amp;gt;parseTitle()&lt;br /&gt;
#24 /var/www/html/wiki.opensourceecology.org/htdocs/includes/MediaWiki.php(902): MediaWiki-&amp;gt;getTitle()&lt;br /&gt;
#25 /var/www/html/wiki.opensourceecology.org/htdocs/includes/MediaWiki.php(543): MediaWiki-&amp;gt;main()&lt;br /&gt;
#26 /var/www/html/wiki.opensourceecology.org/htdocs/index.php(53): MediaWiki-&amp;gt;run()&lt;br /&gt;
#27 /var/www/html/wiki.opensourceecology.org/htdocs/index.php(46): wfIndexMain()&lt;br /&gt;
#28 {main}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the &amp;quot;See T212428.&amp;quot; refers to this https://phabricator.wikimedia.org/T212428&lt;br /&gt;
# that says to run &#039;maintenance/populateContentTables.php&#039;, so I gave it a try&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # php htdocs/maintenance/populateContentTables.php &lt;br /&gt;
...&lt;br /&gt;
... archive processed up to revision id 302632 of 302632 (3178 rows in 2.0540862083435 seconds)&lt;br /&gt;
Done populating archive table. Processed 3178 rows in 2.0541031360626 seconds&lt;br /&gt;
Done. Processed 302790 rows in 16.495725154877 seconds&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after that, I refreshed the page, and the wiki loads! WooHoo!&lt;br /&gt;
# there&#039;s a huge banner at the top complaining about missing skins, and the wiki looks a bit fucked-up (probably because of that), but it&#039;s loading!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Whoops! The default skin for your wiki, defined in $wgDefaultSkin as Vector, is not available.&lt;br /&gt;
&lt;br /&gt;
Your installation seems to include the following skins. See Manual: Skin configuration for information how to enable them and choose the default.&lt;br /&gt;
&lt;br /&gt;
	monobook / MonoBook (disabled)&lt;br /&gt;
	timeless / Timeless (disabled)&lt;br /&gt;
	vector / Vector (disabled)&lt;br /&gt;
&lt;br /&gt;
If you have just installed MediaWiki&lt;br /&gt;
	You probably installed from git, or directly from the source code using some other method. This is expected. Try installing some skins from mediawiki.org&#039;s skin directory, by:&lt;br /&gt;
&lt;br /&gt;
		Downloading the tarball installer, which comes with several skins and extensions. You can copy and paste the skins/ directory from it.&lt;br /&gt;
		Downloading individual skin tarballs from mediawiki.org.&lt;br /&gt;
		Using Git to download skins.&lt;br /&gt;
&lt;br /&gt;
	Doing this should not interfere with your git repository if you&#039;re a MediaWiki developer.&lt;br /&gt;
&lt;br /&gt;
If you have just upgraded MediaWiki&lt;br /&gt;
	MediaWiki 1.24 and newer no longer automatically enables installed skins (see Manual: Skin autodiscovery). You can paste the following lines into LocalSettings.php to enable all installed skins:&lt;br /&gt;
&lt;br /&gt;
wfLoadSkin( &#039;MonoBook&#039; );&lt;br /&gt;
wfLoadSkin( &#039;Timeless&#039; );&lt;br /&gt;
wfLoadSkin( &#039;Vector&#039; );&lt;br /&gt;
&lt;br /&gt;
If you have just modified LocalSettings.php&lt;br /&gt;
	Double-check the skin names for typos.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# next I need to figure out the download URLs for the skins and put together a 3TOFU script for both the skins and the extensions&lt;br /&gt;
# and also attempt the upgrade from 1.35 to 1.42.&lt;br /&gt;
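# for that 3TOFU script, the core check is that the same file, fetched over three different network paths, hashes identically. a minimal sketch of just the comparison step (the downloading itself, e.g. over different VPN exits, is left out); the helper name is hypothetical&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every given file has the same SHA-256.
# Intended use: three copies of one tarball, each fetched from a different
# network vantage point (trust on first use, three times)
tofu_same_sha256() {
    [ "$#" -ge 2 ] || { echo "need at least two files"; return 2; }
    local first="" f h
    for f in "$@"; do
        h="$(sha256sum "$f")" || return 2
        h="${h%% *}"   # keep only the hash, drop the filename
        if [ -z "$first" ]; then
            first="$h"
        elif [ "$h" != "$first" ]; then
            echo "MISMATCH: $f"
            return 1
        fi
    done
    echo "OK: $first"
}
```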
# oh, it looks like the MonoBook, Timeless, and Vector skins ship with the release; I think the issue is that we were loading skins with require_once() when we need to use the wfLoadSkin() function&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # ls -lah htdocs/skins&lt;br /&gt;
total 24K&lt;br /&gt;
d---r-x---  5 not-apache www-data 4,0K Dec 26 23:53 .&lt;br /&gt;
d---r-x--- 14 not-apache www-data 4,0K Dec 28 04:23 ..&lt;br /&gt;
d---r-x---  6 not-apache www-data 4,0K Dec 26 23:53 MonoBook&lt;br /&gt;
----r-----  1 not-apache www-data 1,3K Sep 24  2020 README&lt;br /&gt;
d---r-x---  6 not-apache www-data 4,0K Dec 26 23:53 Timeless&lt;br /&gt;
d---r-x--- 10 not-apache www-data 4,0K Dec 26 23:53 Vector&lt;br /&gt;
root@hetzner3 /var/www/html/wiki.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, I fixed the skin issue by adding this to the LocalSettings.php file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# SKINS #&lt;br /&gt;
#########&lt;br /&gt;
&lt;br /&gt;
## Default skin: you can change the default skin. Use the internal symbolic&lt;br /&gt;
## names, ie &#039;vector&#039;, &#039;monobook&#039;:&lt;br /&gt;
# $wgDefaultSkin = &#039;monobook&#039;;&lt;br /&gt;
$wgDefaultSkin = &#039;Vector&#039;;&lt;br /&gt;
&lt;br /&gt;
wfLoadSkin( &#039;MonoBook&#039; );&lt;br /&gt;
wfLoadSkin( &#039;Timeless&#039; );&lt;br /&gt;
wfLoadSkin( &#039;Vector&#039; );&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
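# rather than hand-writing the wfLoadSkin() lines, they could be generated from the skin directories that actually exist. a sketch, assuming each skin that supports wfLoadSkin() ships a skin.json in its directory (the file wfLoadSkin() reads); the helper name is hypothetical&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one wfLoadSkin() line per skin directory in $1
# that contains a skin.json (older skins without one still need require_once)
print_wfloadskin_lines() {
    local skins_dir="$1" d
    for d in "$skins_dir"/*/; do
        [ -f "${d}skin.json" ] || continue
        printf "wfLoadSkin( '%s' );\n" "$(basename "$d")"
    done
}
```

# e.g. `print_wfloadskin_lines htdocs/skins` should print the same three lines as above for MonoBook, Timeless and Vector&lt;br /&gt;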
# now the wiki frontpage looks sane, except that the GVCS section says this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
{{#ifexist:Template:GVCS List/Main Page|/Main Page|}}|}}}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# there are some other things broken; let&#039;s do the second upgrade and then we can try to fix them one-by-one&lt;br /&gt;
&lt;br /&gt;
=Fri Dec 27, 2024=&lt;br /&gt;
# Here&#039;s TOFU 2/3 (VPN, exit in Germany)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Germany&lt;br /&gt;
2024-12-27&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-27&lt;br /&gt;
3c457f4960cfcbdf5e1e2fab6b6e3ade006e65a55cd93459e0ecede3d9c23a16  keys.txt&lt;br /&gt;
56e0a7fe2ff78e50b9c87b3d589f2d8df5bcab0018e454b2a4a239c8586bd8ee  mediawiki-1.43.0.tar.gz&lt;br /&gt;
cee3d020f25e9b3f66eef214a343be466a1a313db31344ad2a5e87ab0183883e  mediawiki-1.43.0.tar.gz.sig&lt;br /&gt;
9e17cb15dd75bbbd5dbb984eda674863c3b10ab72613cf8a39a00c3e11a8492a  phplist-3.6.14.zip&lt;br /&gt;
user@disp4681:/tmp/tmp.5fvnR1Pc6h$ &lt;br /&gt;
&lt;br /&gt;
user@disp4681:/tmp/tmp.5fvnR1Pc6h$ gpg --with-fingerprint  --with-subkey-fingerprint --keyid-format 0xlong keys.txt &lt;br /&gt;
gpg: WARNING: no command supplied.  Trying to guess what you mean ...&lt;br /&gt;
pub   rsa4096/0x73F146FECF9D333C 2014-11-20 [SC] [expired: 2021-06-05]&lt;br /&gt;
	  Key fingerprint = F64E BF5F 2099 6AB5 14F1  98A8 73F1 46FE CF9D 333C&lt;br /&gt;
uid                             Tim Starling &amp;lt;tstarling@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa4096/0x1075249FCCC9CAAF 2014-11-20 [E] [expired: 2021-06-05]&lt;br /&gt;
pub   dsa1024/0xC119E1A64D70938E 2003-11-15 [SCA]&lt;br /&gt;
	  Key fingerprint = 4412 76E9 CCD1 5F44 F6D9  7D18 C119 E1A6 4D70 938E&lt;br /&gt;
uid                             Brion Vibber &amp;lt;brion@pobox.com&amp;gt;&lt;br /&gt;
sub   elg1024/0x6596FAD2965B3548 2003-11-15 [E]&lt;br /&gt;
pub   dsa1024/0x9B69B3109D3BB7B0 2011-10-24 [SC]&lt;br /&gt;
	  Key fingerprint = 1D98 867E 8298 2C8F E0AB  C25F 9B69 B310 9D3B B7B0&lt;br /&gt;
uid                             Sam Reed &amp;lt;reedy@wikimedia.org&amp;gt;&lt;br /&gt;
sub   elg2048/0x3BBB95CE2B08BFD2 2011-10-24 [E]&lt;br /&gt;
pub   rsa2048/0x72BC1C5D23107F8A 2014-04-29 [SC] [expires: 2026-04-29]&lt;br /&gt;
	  Key fingerprint = 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A&lt;br /&gt;
uid                             Chad Horohoe &amp;lt;chad@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             keybase.io/demon &amp;lt;demon@keybase.io&amp;gt;&lt;br /&gt;
sub   rsa2048/0x08CF4E7951361C13 2014-04-29 [E] [expires: 2026-04-29]&lt;br /&gt;
pub   rsa4096/0xF6DAD285018FAC02 2014-02-19 [SC] [expired: 2018-10-04]&lt;br /&gt;
	  Key fingerprint = 6237 D8D3 ECC1 AE91 8729  296F F6DA D285 018F AC02&lt;br /&gt;
uid                             Tyler Cipriani &amp;lt;tcipriani@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             Tyler Cipriani &amp;lt;tyler@tylercipriani.com&amp;gt;&lt;br /&gt;
uid                             [jpeg image of size 5098]&lt;br /&gt;
sub   rsa4096/0xB002E1FDEE737D83 2014-02-19 [E] [expired: 2018-10-04]&lt;br /&gt;
pub   rsa3072/0x26752EBB0D9E6218 2021-11-11 [SC]&lt;br /&gt;
	  Key fingerprint = 72D2 86F6 F8F0 3C78 F2C5  9C73 2675 2EBB 0D9E 6218&lt;br /&gt;
uid                             Amir Sarabadani &amp;lt;asarabadani@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa3072/0x4F889038CE86B378 2021-11-11 [E]&lt;br /&gt;
pub   rsa4096/0x361F943B15C08DD4 2015-05-22 [SC] [expired: 2020-05-20]&lt;br /&gt;
	  Key fingerprint = 80D1 13B7 67E3 D519 3672  5679 361F 943B 15C0 8DD4&lt;br /&gt;
uid                             Brian Wolff &amp;lt;bwolff@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             Brian Wolff (Bawolff) &amp;lt;bawolff@gmail.com&amp;gt;&lt;br /&gt;
sub   rsa4096/0xBF1629CD074D3DD8 2015-05-22 [E] [expired: 2020-05-20]&lt;br /&gt;
pub   rsa4096/0x131910E01605D9AA 2016-01-08 [SC] [expired: 2020-07-31]&lt;br /&gt;
	  Key fingerprint = C83A 8E4D 3C8F EB7C 8A3A  1998 1319 10E0 1605 D9AA&lt;br /&gt;
uid                             Mukunda Modell &amp;lt;twentyafterfour@gmail.com&amp;gt;&lt;br /&gt;
uid                             Mukunda Modell (WMF) &amp;lt;mmodell@wikimedia.org&amp;gt;&lt;br /&gt;
uid                             [jpeg image of size 2928]&lt;br /&gt;
sub   rsa4096/0x5411F23A0C4E5EC1 2018-12-25 [A] [expired: 2020-12-24]&lt;br /&gt;
sub   rsa4096/0x02C99BB8AB1C6DD5 2018-12-25 [E] [expired: 2020-12-24]&lt;br /&gt;
sub   rsa4096/0x60AE06D4875BE862 2018-12-26 [S] [expired: 2019-12-26]&lt;br /&gt;
pub   rsa4096/0xA8F734246D73B586 2024-12-09 [SC] [expires: 2026-12-09]&lt;br /&gt;
	  Key fingerprint = E059 C034 E7A4 3058 3C25  2F4A A8F7 3424 6D73 B586&lt;br /&gt;
uid                             atieno njira &amp;lt;pnjira@wikimedia.org&amp;gt;&lt;br /&gt;
sub   rsa4096/0xB65D063E68F2FC8A 2024-12-09 [E] [expires: 2026-12-09]&lt;br /&gt;
user@disp4681:/tmp/tmp.5fvnR1Pc6h$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
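# the TOFU output above mixes the exit location, dates and download progress bars in with the actual checksums. to compare runs from different exits, a sketch that keeps only the sha256sum-style lines and compares them; it assumes each run was saved to its own text file, and both helper names are hypothetical&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only sha256sum-style lines (64 lowercase hex
# digits, two spaces, filename) from a saved TOFU run, sorted so the order
# in which files were hashed does not matter
tofu_hashes() {
    grep -E '^[0-9a-f]{64}  ' "$1" | sort
}

# Hypothetical helper: succeed only if two saved runs list the same
# non-empty set of hash/filename pairs
tofu_runs_agree() {
    local a b
    a="$(tofu_hashes "$1")"
    b="$(tofu_hashes "$2")"
    [ -n "$a" ] || return 1
    [ "$a" = "$b" ]
}
```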
# ...&lt;br /&gt;
# I began the process to copy the wiki from hetzner2 to hetzner3&lt;br /&gt;
# man, this is a big site; it took over 10 minutes just to create a gzip&#039;d tarball of the site&#039;s files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
...&lt;br /&gt;
/var/www/html/wiki.opensourceecology.org/cache/l10n_cache-it.cdb&lt;br /&gt;
/var/www/html/wiki.opensourceecology.org/cache/l10n_cache-uk.cdb&lt;br /&gt;
&lt;br /&gt;
real    11m37.448s&lt;br /&gt;
user    10m48.352s&lt;br /&gt;
sys     0m40.263s&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology current]# du -sh ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2}&lt;br /&gt;
15G     /var/tmp/backups_for_migration_to_hetzner2/wiki.opensourceecology.org_20241228/current/wiki.opensourceecology.org_files.20241228.tar.gz&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
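# since this tarball is 15G and has to cross the network, a cheap integrity check on both ends could catch a truncated copy before the restore. a sketch using only tar and gzip; the helper name is hypothetical&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a .tar.gz end to end and print its entry count.
# tar -z fails on a corrupt or truncated gzip stream; pipefail (set in a
# subshell so it does not leak) keeps wc from masking that failure
count_targz_entries() {
    ( set -o pipefail; tar -tzf "$1" | wc -l )
}
```

# running it on hetzner2 before the copy and on hetzner3 after should print the same count; comparing `sha256sum` of the file on both hosts is the stricter check&lt;br /&gt;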
# curiously the mysqldump fails, something about LOCK TABLES&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
mysqldump: Got error: 1044: &amp;quot;Access denied for user &#039;osewiki_user&#039;@&#039;localhost&#039; to database &#039;osewiki_db&#039;&amp;quot; when using LOCK TABLES&lt;br /&gt;
&lt;br /&gt;
real    0m0.007s&lt;br /&gt;
user    0m0.001s&lt;br /&gt;
sys     0m0.005s&lt;br /&gt;
[root@opensourceecology current]#&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# stack overflow says I can bypass this by using the `--single-transaction` flag, but per the man page below, that only gives a consistent dump of InnoDB tables (MyISAM or MEMORY tables may still change mid-dump). The lock is needed to ensure a consistent state https://stackoverflow.com/questions/70376243/mysqldump-got-error-1044-access-denied-for-user-usernamelocalhost-to-dat&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
	   ·   --single-transaction&lt;br /&gt;
&lt;br /&gt;
		   This option sends a START TRANSACTION SQL statement to the server before dumping data. It is useful&lt;br /&gt;
		   only with transactional tables such as InnoDB, because then it dumps the consistent state of the&lt;br /&gt;
		   database at the time when BEGIN was issued without blocking any applications.&lt;br /&gt;
&lt;br /&gt;
		   When using this option, you should keep in mind that only InnoDB tables are dumped in a consistent&lt;br /&gt;
		   state. For example, any MyISAM or MEMORY tables dumped while using this option may still change&lt;br /&gt;
		   state.&lt;br /&gt;
&lt;br /&gt;
		   While a --single-transaction dump is in process, to ensure a valid dump file (correct table&lt;br /&gt;
		   contents and binary log coordinates), no other connection should use the following statements:&lt;br /&gt;
		   ALTER TABLE, CREATE TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE. A consistent read is not&lt;br /&gt;
		   isolated from those statements, so use of them on a table to be dumped can cause the SELECT that is&lt;br /&gt;
		   performed by mysqldump to retrieve the table contents to obtain incorrect contents or fail.&lt;br /&gt;
&lt;br /&gt;
		   The --single-transaction option and the --lock-tables option are mutually exclusive because LOCK&lt;br /&gt;
		   TABLES causes any pending transactions to be committed implicitly.&lt;br /&gt;
&lt;br /&gt;
		   This option is not supported for MySQL Cluster tables; the results cannot be guaranteed to be&lt;br /&gt;
		   consistent due to the fact that the NDBCLUSTER storage engine supports only the READ_COMMITTED&lt;br /&gt;
		   transaction isolation level. You should always use NDB backup and restore instead.&lt;br /&gt;
&lt;br /&gt;
		   To dump large tables, you should combine the --single-transaction option with --quick.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
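# whether --single-transaction would have been safe here depends on the storage engines, since the man page only promises a consistent snapshot for InnoDB tables. a sketch that flags non-InnoDB tables from a tab-separated listing; the helper name is hypothetical, and the mysql query in the comment is one assumed way to produce that listing (credentials omitted)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of "table_name TAB engine" on stdin and
# print every table that is not InnoDB (views report engine NULL, so they
# are listed too). One assumed way to produce the input:
#   mysql -N -B -e "SELECT table_name, engine FROM information_schema.tables WHERE table_schema = 'osewiki_db'"
non_innodb_tables() {
    awk -F '\t' 'NF == 0 { next } $2 != "InnoDB" { print $1 }'
}
```

# if that prints nothing, `mysqldump --single-transaction --quick` should give a consistent dump without needing the LOCK TABLES grant&lt;br /&gt;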
# another site said I could GRANT this user the LOCK TABLES privilege on all the db&#039;s tables to fix this issue https://michaelrigart.be/mysqldump-1044-access-denied-using-lock-tables/&lt;br /&gt;
# I tried this (executed in the mysql shell as root)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GRANT SELECT,LOCK TABLES ON osewiki_db.* TO &#039;osewiki_user&#039;@&#039;localhost&#039;;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well that worked; I wonder if the only reason it didn&#039;t throw the same error for the wordpress sites is because the wordpress sites are stupid small (and inactive) by comparison&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
real    6m53.476s&lt;br /&gt;
user    7m4.351s&lt;br /&gt;
sys     0m2.092s&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
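# a quick way to confirm the dump finished cleanly before copying it over: bzip2 can test the archive, and (assuming comments were not disabled) mysqldump ends its output with a `-- Dump completed` line. a sketch; the helper name is hypothetical and it decompresses the whole file, so expect it to take a few minutes on this dump&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the .sql.bz2 file is a valid bzip2
# stream and its last line is mysqldump's closing "-- Dump completed" comment
check_sql_bz2() {
    local f="$1" last
    bzip2 -t "$f" || return 1
    last="$(bzip2 -dc "$f" | tail -n 1)"
    case "$last" in
        '-- Dump completed'*) return 0 ;;
        *) echo "dump looks truncated: $f"; return 1 ;;
    esac
}
```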
# the upgrade guide on mediawiki actually doesn&#039;t say we should copy the files from the latest release to override the old release. Instead it says we should start from the files in the new release, and then just copy a few files/dirs from the old install into it&lt;br /&gt;
## https://www.mediawiki.org/wiki/Manual:Upgrading&lt;br /&gt;
## LocalSettings.php&lt;br /&gt;
## htdocs/LocalSettings.php (we have a custom one here because we keep our real LocalSettings.php outside the docroot)&lt;br /&gt;
## htdocs/images/&lt;br /&gt;
## htdocs/extensions/&lt;br /&gt;
## htdocs/skins/&lt;br /&gt;
# I&#039;m going to skip the extensions for now; after the second upgrade, I&#039;ll re-visit installing the updated versions of these extensions.&lt;br /&gt;
# for now let&#039;s just see if I can even do the first upgrade; here are the commands to put the vhost files in place&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
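# STEP 2: Add vhost files&lt;br /&gt;
# move the old vhost dir aside, then extract the hetzner2 tarball into the cwd&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf &amp;quot;${backupFileName_files_hetzner2}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# start from a clean copy of the new release (per Manual:Upgrading)&lt;br /&gt;
mkdir -p &amp;quot;${vhostDir}&amp;quot;&lt;br /&gt;
rsync -av --progress /var/tmp/mediawiki/mediawiki-1.35.0/ &amp;quot;${docrootDir}/&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# then copy only the config and uploads over from the old install;&lt;br /&gt;
# tar stripped the leading &#039;/&#039; when the archive was created, so the old paths are relative to the cwd&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/LocalSettings.php &amp;quot;${vhostDir}/&amp;quot;&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php &amp;quot;${docrootDir}/&amp;quot;&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/htdocs/images &amp;quot;${docrootDir}/&amp;quot;&lt;br /&gt;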
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# while I was waiting, I looked into the extensions; here&#039;s what we have installed on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# ls -lah htdocs/extensions/&lt;br /&gt;
total 104K&lt;br /&gt;
d---r-x--- 25 not-apache apache 4.0K May 24  2018 .&lt;br /&gt;
d---r-x--- 17 not-apache apache 4.0K Jul 28  2019 ..&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K May 24  2018 CategoryTree&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Dec  8  2017 Cite&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Dec  8  2017 CiteThisPage&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K May 24  2018 ConfirmAccount&lt;br /&gt;
d---r-x--- 13 not-apache apache 4.0K Dec  8  2017 ConfirmEdit&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Dec  8  2017 Gadgets&lt;br /&gt;
d---r-x---  3 not-apache apache 4.0K Dec  8  2017 ImageMap&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K Dec  8  2017 InputBox&lt;br /&gt;
d---r-x---  3 not-apache apache 4.0K Dec  8  2017 Interwiki&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K Dec  8  2017 LocalisationUpdate&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Dec  8  2017 Nuke&lt;br /&gt;
d---r-x--- 12 not-apache apache 4.0K May 24  2018 OATHAuth&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Dec  8  2017 ParserFunctions&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Dec  8  2017 PdfHandler&lt;br /&gt;
d---r-x---  3 not-apache apache 4.0K Dec  8  2017 Poem&lt;br /&gt;
----r-----  1 not-apache apache 1.1K Dec  8  2017 README&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K Dec  8  2017 Renameuser&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K May 24  2018 ReplaceText&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Dec  8  2017 SpamBlacklist&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K Dec  8  2017 SyntaxHighlight_GeSHi&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Dec  8  2017 TitleBlacklist&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K May 24  2018 UserMerge&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K May 24  2018 Widgets&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K Dec  8  2017 WikiEditor&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it looks like we might not be using all of these&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# grep wfLoadExtension LocalSettings.php&lt;br /&gt;
wfLoadExtensions([ &#039;ConfirmEdit&#039;, &#039;ConfirmEdit/ReCaptcha&#039; ]);&lt;br /&gt;
wfLoadExtension( &#039;Cite&#039; );&lt;br /&gt;
wfLoadExtension( &#039;Interwiki&#039; );&lt;br /&gt;
wfLoadExtension( &#039;Gadgets&#039; );&lt;br /&gt;
wfLoadExtension( &#039;ReplaceText&#039; );&lt;br /&gt;
wfLoadExtension( &#039;Renameuser&#039; );&lt;br /&gt;
wfLoadExtension( &#039;UserMerge&#039; );&lt;br /&gt;
wfLoadExtension( &#039;Nuke&#039; );&lt;br /&gt;
wfLoadExtension( &#039;OATHAuth&#039; );&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# grep wfLoadExtension LocalSetti^Chp&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# grep require LocalSettings.php&lt;br /&gt;
require_once( &amp;quot;$IP/includes/DefaultSettings.php&amp;quot; );&lt;br /&gt;
require_once(&amp;quot;{$IP}/extensions/CategoryTree/CategoryTree.php&amp;quot;);&lt;br /&gt;
require_once &amp;quot;$IP/skins/CologneBlue/CologneBlue.php&amp;quot;;&lt;br /&gt;
require_once &amp;quot;$IP/skins/Modern/Modern.php&amp;quot;;&lt;br /&gt;
require_once &amp;quot;$IP/skins/MonoBook/MonoBook.php&amp;quot;;&lt;br /&gt;
require_once &amp;quot;$IP/skins/Vector/Vector.php&amp;quot;;&lt;br /&gt;
#require_once( &amp;quot;$IP/extensions/CCAgreement/CCAgreement.php&amp;quot; );&lt;br /&gt;
# This extension and directory requires an admin to confirm a user before their account is created&lt;br /&gt;
require_once &amp;quot;$IP/extensions/ConfirmAccount/ConfirmAccount.php&amp;quot;;&lt;br /&gt;
require_once(&amp;quot;$IP/extensions/Widgets/Widgets.php&amp;quot;);&lt;br /&gt;
require_once( &amp;quot;$IP/extensions/ParserFunctions/ParserFunctions.php&amp;quot;  );&lt;br /&gt;
[root@opensourceecology wiki.opensourceecology.org]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
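# to make this check repeatable, here is a minimal sketch of a shell function that pulls the extension names out of a LocalSettings.php; the function name extensions_in_localsettings is hypothetical, and it assumes extensions are only loaded via wfLoadExtension()/wfLoadExtensions() or via require_once of &amp;quot;$IP/extensions/NAME/...&amp;quot; (commented-out lines and skins are ignored)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: list the extensions that a LocalSettings.php loads.
# Assumes only two loading styles:
#   wfLoadExtension( 'Name' ) / wfLoadExtensions([ 'A', 'B/Sub' ])
#   require_once "$IP/extensions/Name/Name.php"
# Commented-out lines (leading #) are skipped; skins are not matched.
# Sub-modules such as ConfirmEdit/ReCaptcha collapse to their parent name.
# Usage sketch (from the vhost dir): extensions_in_localsettings LocalSettings.php
extensions_in_localsettings() {
	local settings_file="$1"

	grep -v '^[[:space:]]*#' "${settings_file}" \
		| grep -oE "wfLoadExtensions?\(.*\)|/extensions/[A-Za-z0-9_]+/" \
		| grep -oE "'[^']+'|/extensions/[A-Za-z0-9_]+/" \
		| sed -E "s#^'##; s#'\$##; s#^/extensions/##; s#/.*\$##" \
		| awk '!seen[$0]++'   # de-duplicate, keeping first-seen order
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;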
# also, it looks like the MediaWiki core tarball already ships with a bunch of these (so we don&#039;t have to worry about installing them)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # ls -lah /var/tmp/mediawiki/mediawiki-1.35.0/extensions/&lt;br /&gt;
total 124K&lt;br /&gt;
drwxr-xr-x 30 root      root      4,0K Dec 26 23:53 .&lt;br /&gt;
drwxr-xr-x 14 root      root      4,0K Dec 26 23:53 ..&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 CategoryTree&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 Cite&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 CiteThisPage&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 CodeEditor&lt;br /&gt;
drwxr-xr-x 13 root      root      4,0K Dec 26 23:53 ConfirmEdit&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 Gadgets&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 ImageMap&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 InputBox&lt;br /&gt;
drwxr-xr-x  4 root      root      4,0K Dec 26 23:53 Interwiki&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 LocalisationUpdate&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 MultimediaViewer&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 Nuke&lt;br /&gt;
drwxr-xr-x  8 root      root      4,0K Dec 26 23:53 OATHAuth&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 PageImages&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 ParserFunctions&lt;br /&gt;
drwxr-xr-x  4 root      root      4,0K Dec 26 23:53 PdfHandler&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 Poem&lt;br /&gt;
-rw-rw-r--  1 maltfield maltfield 1,1K Sep 11  2020 README&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 Renameuser&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 ReplaceText&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 Scribunto&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 SecureLinkFixer&lt;br /&gt;
drwxr-xr-x  7 root      root      4,0K Dec 26 23:53 SpamBlacklist&lt;br /&gt;
drwxr-xr-x  8 root      root      4,0K Dec 26 23:53 SyntaxHighlight_GeSHi&lt;br /&gt;
drwxr-xr-x  7 root      root      4,0K Dec 26 23:53 TemplateData&lt;br /&gt;
drwxr-xr-x  5 root      root      4,0K Dec 26 23:53 TextExtracts&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 TitleBlacklist&lt;br /&gt;
drwxr-xr-x 11 root      root      4,0K Dec 26 23:53 VisualEditor&lt;br /&gt;
drwxr-xr-x  6 root      root      4,0K Dec 26 23:53 WikiEditor&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as far as I can tell, here are the extensions we&#039;re using on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Cite&lt;br /&gt;
Interwiki&lt;br /&gt;
Gadgets&lt;br /&gt;
ReplaceText&lt;br /&gt;
Renameuser&lt;br /&gt;
UserMerge&lt;br /&gt;
Nuke&lt;br /&gt;
OATHAuth&lt;br /&gt;
CategoryTree&lt;br /&gt;
ConfirmAccount&lt;br /&gt;
Widgets&lt;br /&gt;
ParserFunctions&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and if we remove the extensions that already ship with core, here are the ones we still need to find the latest versions of&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Renameuser&lt;br /&gt;
UserMerge&lt;br /&gt;
ConfirmAccount&lt;br /&gt;
Widgets&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
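# the same comparison can be done mechanically; a minimal sketch, assuming the two listings above are passed in as space-separated strings (not_bundled is a hypothetical helper, not an existing tool)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: plain set difference. Prints each name from the first
# list that is NOT in the second list, preserving the order of the first list.
#   $1 = space-separated extensions loaded on hetzner2
#   $2 = space-separated extensions bundled in the MediaWiki tarball
not_bundled() {
	local used="$1" bundled="$2" ext
	for ext in ${used}; do
		# ${bundled} is intentionally unquoted so it splits into one name per line;
		# grep -x matches whole lines, -F fixed strings, -q exit status only
		if ! printf '%s\n' ${bundled} | grep -qxF -- "${ext}"; then
			echo "${ext}"
		fi
	done
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;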
# It looks like Renameuser was merged into core in 1.40, so we can scratch that one off https://www.mediawiki.org/wiki/Manual:Renameuser&lt;br /&gt;
# UserMerge is still an extension, and it&#039;s actively developed https://www.mediawiki.org/wiki/Extension:UserMerge&lt;br /&gt;
# ConfirmAccount is still an extension, and it&#039;s actively developed https://www.mediawiki.org/wiki/Extension:ConfirmAccount&lt;br /&gt;
# Widgets is still an extension, and it&#039;s actively developed https://www.mediawiki.org/wiki/Extension:Widgets&lt;br /&gt;
# I had some typos in the permissions section at the end of my previous plan; here&#039;s the updated version&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;wiki.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;osewiki_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: BACKUP DB&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current/&lt;br /&gt;
mv ${backupDir_hetzner2}/current/* ${backupDir_hetzner2}/old/&lt;br /&gt;
&lt;br /&gt;
time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
# STEP 3: BACKUP FILES&lt;br /&gt;
time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# STEP 4: COPY TO HETZNER3&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mkdir -p ${backupDir_hetzner3}/{current,old}&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mv ${backupDir_hetzner3}/current/* ${backupDir_hetzner3}/old/&lt;br /&gt;
rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner3 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;wiki.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;osewiki_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 1: ADD DB&lt;br /&gt;
&lt;br /&gt;
# create backup before we start changing the sql file&lt;br /&gt;
pushd ${backupDir_hetzner3}/current&lt;br /&gt;
cp ${backupFileName_db_hetzner2} ${backupFileName_db_hetzner2}.orig&lt;br /&gt;
&lt;br /&gt;
# extract .sql.bz2 -&amp;gt; .sql&lt;br /&gt;
bzip2 -dc ${backupFileName_db_hetzner2} &amp;gt; db.sql&lt;br /&gt;
&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;DROP DATABASE IF EXISTS ${dbName};&amp;quot; &lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;CREATE DATABASE ${dbName}; USE ${dbName};&amp;quot;&lt;br /&gt;
 time nice mysql ${dbName} -uroot -p${mysqlPass} &amp;lt; &amp;quot;db.sql&amp;quot;&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;GRANT ALL ON ${dbName}.* TO &#039;${dbUser}&#039;@&#039;localhost&#039; IDENTIFIED BY &#039;${dbPass}&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: Add vhost files&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
&lt;br /&gt;
mkdir -p ${vhostDir}&lt;br /&gt;
rsync -av --progress /var/tmp/mediawiki/mediawiki-1.35.0/ ${docrootDir}/&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/LocalSettings.php ${vhostDir}/&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/htdocs/LocalSettings.php ${docrootDir}/&lt;br /&gt;
rsync -av --progress var/www/html/wiki.opensourceecology.org/htdocs/images ${docrootDir}/&lt;br /&gt;
&lt;br /&gt;
# UPDATE OLD EXTENSIONS&lt;br /&gt;
&lt;br /&gt;
# TODO&lt;br /&gt;
&lt;br /&gt;
# INSTALL NEW EXTENSIONS&lt;br /&gt;
&lt;br /&gt;
# TODO&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# MediaWiki #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
vhost_dir=&amp;quot;/var/www/html/wiki.opensourceecology.org&amp;quot;&lt;br /&gt;
mw_docroot=&amp;quot;${vhost_dir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
chown not-apache:apache-admins &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
chmod 0040 &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${mw_docroot}/images&amp;quot; ] || mkdir &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${vhost_dir}/cache&amp;quot; ] || mkdir &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I pushed the wiki vhost out with ansible, but apache failed to restart&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # systemctl restart apache2 varnish nginx&lt;br /&gt;
Job for apache2.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status apache2.service&amp;quot; and &amp;quot;journalctl -xeu apache2.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# looks like I&#039;m missing a file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # journalctl --no-pager -u apache2&lt;br /&gt;
...&lt;br /&gt;
Dec 28 04:58:06 hetzner3 systemd[1]: Starting apache2.service - The Apache HTTP Server...&lt;br /&gt;
Dec 28 04:58:06 hetzner3 apachectl[43030]: apache2: Syntax error on line 232 of /etc/apache2/apache2.conf: Syntax error on line 22 of /etc/apache2/sites-enabled/wiki.opensourceecology.org.conf: Could not open configuration file /etc/apache2/conf-available/wiki.virtualhost.include: No such file or directory&lt;br /&gt;
Dec 28 04:58:06 hetzner3 apachectl[43019]: Action &#039;start&#039; failed.&lt;br /&gt;
Dec 28 04:58:06 hetzner3 apachectl[43019]: The Apache error log may have more information.&lt;br /&gt;
Dec 28 04:58:06 hetzner3 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Dec 28 04:58:06 hetzner3 systemd[1]: apache2.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Dec 28 04:58:06 hetzner3 systemd[1]: Failed to start apache2.service - The Apache HTTP Server.&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/wiki.opensourceecology.org_20241228/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
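# on Debian, running &amp;quot;apache2ctl configtest&amp;quot; before restarting reports this kind of syntax error without taking apache down; as a complementary, hypothetical check (missing_includes is not an existing tool), this sketch lists Include targets in a vhost config that don&#039;t exist on disk, assuming absolute paths with no wildcards&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: report Include targets in an Apache config file that
# do not exist on disk. Assumes absolute paths without wildcards.
# IncludeOptional is skipped on purpose: Apache tolerates missing optional files.
# Usage sketch: missing_includes /etc/apache2/sites-enabled/wiki.opensourceecology.org.conf
missing_includes() {
	local conf_file="$1" target
	awk 'tolower($1) == "include" { print $2 }' "${conf_file}" \
		| tr -d '"' \
		| while read -r target; do
			[ -e "${target}" ] || echo "missing: ${target}"
		done
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;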
# I had to spend some time fixing some issues in the wiki config in ansible https://github.com/OpenSourceEcology/ansible/commit/d070595d4284874d49b735292fc7c9f1dd83a085&lt;br /&gt;
# when I finally got apache and nginx configs resolved, loading the new wiki site on hetzner3 gives this error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MediaWiki 1.35 internal error&lt;br /&gt;
&lt;br /&gt;
Installing some PHP extensions is required.&lt;br /&gt;
Required components&lt;br /&gt;
&lt;br /&gt;
You are missing a required extension to PHP that MediaWiki requires to run. Please install:&lt;br /&gt;
&lt;br /&gt;
	mbstring (more information)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I should be able to install that from apt, and then I can proceed with upgrading the wiki&lt;br /&gt;
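# before retrying, it may be worth checking for other missing modules in one pass; a minimal sketch, assuming the php CLI&#039;s &amp;quot;php -m&amp;quot; output (one module per line) reflects what the web server loads, which is not guaranteed since the CLI and the web server can use different php.ini files. missing_php_modules is a hypothetical helper, and the Debian package name php-mbstring in the comment is an assumption&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read "php -m" style output (one module name per line)
# on stdin and print every required module (given as arguments) that is absent.
# Usage sketch: php -m | missing_php_modules mbstring intl xml
# Assumed Debian fix for a missing module: apt-get install php-mbstring
missing_php_modules() {
	local loaded module
	loaded="$(cat)"
	for module in "$@"; do
		# -i case-insensitive, -x whole line, -F fixed string, -q exit status only
		printf '%s\n' "${loaded}" | grep -qixF -- "${module}" || echo "${module}"
	done
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;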
&lt;br /&gt;
=Thr Dec 26, 2024=&lt;br /&gt;
# I got a generic human-generated message back from oshine support, but they didn&#039;t answer my question. I replied, asking again for the slugs of the required plugins&lt;br /&gt;
# I could either sift through their source code to find out where it downloads its dependency plugins, or wait. In the interest of time, I&#039;m going to wait a bit longer&lt;br /&gt;
# ...&lt;br /&gt;
# let&#039;s proceed with the next wordpress site that&#039;s *not* using the oshine theme&lt;br /&gt;
# here are the active themes on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename *htdocs/wp-content)&amp;quot;&lt;br /&gt;
[root@opensourceecology ~]# for wordpress_site in $wordpress_sites; do wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;; echo $wp_docroot; sudo -u wp -i wp --path=&amp;quot;${wp_docroot}&amp;quot; theme list 2&amp;gt;/dev/null;  done | grep -Ei &#039;^/var/| active &#039;&lt;br /&gt;
/var/www/html/www.openbuildinginstitute.org/htdocs&lt;br /&gt;
| oshin           | active   | available | 4.3.1   |&lt;br /&gt;
/var/www/html/oswh.opensourceecology.org/htdocs&lt;br /&gt;
| Eventor         | active   | none   | 1.7     |&lt;br /&gt;
/var/www/html/d3d.opensourceecology.org/htdocs&lt;br /&gt;
/var/www/html/microfactory.opensourceecology.org/htdocs&lt;br /&gt;
| oshin           | active   | none   | 6.5     |&lt;br /&gt;
/var/www/html/seedhome.openbuildinginstitute.org/htdocs&lt;br /&gt;
| twentyseventeen | active   | none   | 1.4     |&lt;br /&gt;
/var/www/html/staging.openbuildinginstitute.org/htdocs&lt;br /&gt;
| oshin           | active   | none      | 4.3.1   |&lt;br /&gt;
/var/www/html/store.opensourceecology.org/htdocs&lt;br /&gt;
| oshin           | active   | none   | 6.6.4.4 |&lt;br /&gt;
/var/www/html/fef.opensourceecology.org/htdocs&lt;br /&gt;
| simplephotoRes      | active   | none      | 2.0     |&lt;br /&gt;
/var/www/html/www.opensourceecology.org/htdocs&lt;br /&gt;
| enigmatic              | active   | none      | 3.5     |&lt;br /&gt;
/var/www/html/staging.opensourceecology.org/htdocs&lt;br /&gt;
| enigmatic              | active   | none      | 3.5     |&lt;br /&gt;
/var/www/html/3dp.opensourceecology.org/htdocs&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
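# to pull just the site/theme pairs out of output like the above (e.g. to list the sites not using oshin), here is a minimal awk sketch; active_themes is a hypothetical helper, and it assumes wp-cli&#039;s table rows look like the &amp;quot;| name | active | ... |&amp;quot; lines shown&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the combined "docroot path / wp theme list" output
# into "docroot theme" pairs, one per site that has an active theme.
# Usage sketch: (for-loop above) | active_themes | awk '$2 != "oshin"'
active_themes() {
	awk '
		/^\/var\// { site = $0; next }      # remember the current docroot
		index($0, "| active ") > 0 {
			split($0, cols, "|")           # cols[2] is the theme name column
			theme = cols[2]
			gsub(/ /, "", theme)           # strip the table padding
			print site, theme
		}
	'
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;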
## d3d and 3dp had no output because they&#039;re not real sites (they were never set up, and we&#039;re not migrating them; we used microfactory.opensourceecology.org as the vhost instead of d3d/3dp)&lt;br /&gt;
# the other sites we have that are *not* currently using oshine are:&lt;br /&gt;
## oswh.opensourceecology.org&lt;br /&gt;
## seedhome.openbuildinginstitute.org&lt;br /&gt;
## there are a couple of staging sites, but I don&#039;t think I&#039;ll migrate these; it&#039;s better to use the staging server&lt;br /&gt;
## fef.opensourceecology.org&lt;br /&gt;
## www.opensourceecology.org, but this one we&#039;re migrating to oshine, so it doesn&#039;t count&lt;br /&gt;
# again, here&#039;s the order of our migrations&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1. forum.opensourceecology.org&lt;br /&gt;
2. store.opensourceecology.org&lt;br /&gt;
3. microfactory.opensourceecology.org&lt;br /&gt;
4. fef.opensourceecology.org&lt;br /&gt;
5. oswh.opensourceecology.org&lt;br /&gt;
6. seedhome.openbuildinginstitute.org&lt;br /&gt;
7. www.openbuildinginstitute.org&lt;br /&gt;
8. www.opensourceecology.org&lt;br /&gt;
9. phplist.opensourceecology.org&lt;br /&gt;
10. wiki.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# therefore, let&#039;s first do fef, then oswh, then seedhome.&lt;br /&gt;
# here&#039;s the script to migrate fef from hetzner2 to hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;fef.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;fef_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: BACKUP DB&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current/&lt;br /&gt;
mv ${backupDir_hetzner2}/current/* ${backupDir_hetzner2}/old/&lt;br /&gt;
&lt;br /&gt;
time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
# STEP 3: BACKUP FILES&lt;br /&gt;
time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# STEP 4: COPY TO HETZNER3&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mkdir -p ${backupDir_hetzner3}/{current,old}&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mv ${backupDir_hetzner3}/current/* ${backupDir_hetzner3}/old/&lt;br /&gt;
rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner3 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;fef.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;fef_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 1: ADD DB&lt;br /&gt;
&lt;br /&gt;
# create backup before we start changing the sql file&lt;br /&gt;
pushd ${backupDir_hetzner3}/current&lt;br /&gt;
cp ${backupFileName_db_hetzner2} ${backupFileName_db_hetzner2}.orig&lt;br /&gt;
&lt;br /&gt;
# extract .sql.bz2 -&amp;gt; .sql&lt;br /&gt;
bzip2 -dc ${backupFileName_db_hetzner2} &amp;gt; db.sql&lt;br /&gt;
&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;DROP DATABASE IF EXISTS ${dbName};&amp;quot; &lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;CREATE DATABASE ${dbName}; USE ${dbName};&amp;quot;&lt;br /&gt;
 time nice mysql ${dbName} -uroot -p${mysqlPass} &amp;lt; &amp;quot;db.sql&amp;quot;&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;GRANT ALL ON ${dbName}.* TO &#039;${dbUser}&#039;@&#039;localhost&#039; IDENTIFIED BY &#039;${dbPass}&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: Add vhost files&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
mv var/www/html/${vhost_name} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# remove &#039;.svn&#039; dirs (we no longer use svn, for security)&lt;br /&gt;
find ${docrootDir} -iname &#039;.svn&#039; -exec rm -rf &#039;{}&#039; \;&lt;br /&gt;
&lt;br /&gt;
# add wordpress bug fix&lt;br /&gt;
# is the bug fix already present? (grep -q reports a match through its exit status only)&lt;br /&gt;
if ! grep -q &#039;https://core.trac.wordpress.org/ticket/48693&#039; ${vhostDir}/wp-config.php ; then&lt;br /&gt;
	# the bug fix is absent; add it&lt;br /&gt;
&lt;br /&gt;
	backup_filename=&amp;quot;wp-config.`date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;`.php&amp;quot;&lt;br /&gt;
	mv ${vhostDir}/wp-config.php ${vhostDir}/${backup_filename}&lt;br /&gt;
&lt;br /&gt;
	cat &amp;gt; ${vhostDir}/wp-config.php &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
&lt;br /&gt;
# fix wordpress bugs&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/62693&lt;br /&gt;
if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
		function ini_set(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;chmod&#039;) ){&lt;br /&gt;
		function chmod(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
	tail -n +2 ${vhostDir}/${backup_filename} &amp;gt;&amp;gt; ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
ls&lt;br /&gt;
vim ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
# UPDATE CORE&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress /var/tmp/wordpress/core/wordpress/ ${docrootDir}&lt;br /&gt;
&lt;br /&gt;
# UPDATE OLD PLUGINS&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${docrootDir}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
		plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
		rm -rf ${plugin_path};&lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# INSTALL NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
new_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security wps-hide-login raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin in ${new_plugins}; do&lt;br /&gt;
		plugin_path=&amp;quot;${docrootDir}/wp-content/plugins/${plugin}&amp;quot;&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
				rm -rf ${plugin_path};&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# UPDATE/INSTALL THEMES&lt;br /&gt;
&lt;br /&gt;
for theme_path in $(find &amp;quot;${docrootDir}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf ${theme_path};&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a ${source_path}/ &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# ACTIVATE NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
activate_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security&amp;quot;&lt;br /&gt;
for plugin in ${activate_plugins}; do&lt;br /&gt;
	sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate ${plugin}&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as with yesterday, I then uncommented the fef host in the ansible provision.yml file and re-ran ansible-playbook&lt;br /&gt;
# then on the server, I restarted the web services&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # systemctl restart apache2 varnish nginx&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I updated my local comp&#039;s /etc/hosts&lt;br /&gt;
# I loaded the site, updated the db with the WUI&lt;br /&gt;
# I&#039;m able to load the admin dashboard, but when I try to load the frontpage of the site on hetzner3, I just get an error box that says&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The theme directory &amp;quot;simplephotoRes&amp;quot; does not exist.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# in the admin dashboard, if I click &amp;quot;Appearance&amp;quot; -&amp;gt; &amp;quot;Themes&amp;quot;, I get an error at the top&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The active theme is broken. Reverting to the default theme.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s the themes dir on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# ls -lah /var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/&lt;br /&gt;
total 72K&lt;br /&gt;
d---r-x--- 17 not-apache apache 4.0K Jan  3  2018 .&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Oct  3  2018 ..&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Aug 16  2015 ArtWorksResponsive&lt;br /&gt;
d---r-x---  8 not-apache apache 4.0K Aug 16  2015 gk-portfolio&lt;br /&gt;
d---r-x--- 11 not-apache apache 4.0K Aug 16  2015 gridsby&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Aug 16  2015 gridthemeresponsive&lt;br /&gt;
----r-----  1 not-apache apache   28 Jun  5  2014 index.php&lt;br /&gt;
d---r-x---  9 not-apache apache 4.0K Aug 16  2015 portfolio-press&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Aug 16  2015 simplephotoRes&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K Aug 16  2015 sketch&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K Jan  3  2018 twentyeleven&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K May  7  2015 twentyfifteen&lt;br /&gt;
d---r-x---  9 not-apache apache 4.0K May  7  2015 twentyfourteen&lt;br /&gt;
d---r-x---  5 not-apache apache 4.0K Jan  3  2018 twentyseventeen&lt;br /&gt;
d---r-x---  7 not-apache apache 4.0K Jan  3  2018 twentysixteen&lt;br /&gt;
d---r-x---  4 not-apache apache 4.0K Jan  3  2018 twentyten&lt;br /&gt;
d---r-x---  8 not-apache apache 4.0K May  7  2015 twentythirteen&lt;br /&gt;
d---r-x---  6 not-apache apache 4.0K Jan  3  2018 twentytwelve&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # ls -lah /var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/&lt;br /&gt;
total 56K&lt;br /&gt;
d---r-x--- 13 not-apache www-data 4,0K Dec 26 18:05 .&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Dec 26 18:10 ..&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K May 27  2016 gk-portfolio&lt;br /&gt;
----r-----  1 not-apache www-data   28 Jun  5  2014 index.php&lt;br /&gt;
d---r-x---  9 not-apache www-data 4,0K Dec 26  2018 portfolio-press&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Mar  5  2018 sketch&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 16 13:09 twentyeleven&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 16 13:28 twentyfifteen&lt;br /&gt;
d---r-x---  9 not-apache www-data 4,0K Jul 16 13:23 twentyfourteen&lt;br /&gt;
d---r-x---  5 not-apache www-data 4,0K Jul 16 13:29 twentyseventeen&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Jul 16 13:29 twentysixteen&lt;br /&gt;
d---r-x---  4 not-apache www-data 4,0K Jul 15 17:17 twentyten&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Jul 16 13:20 twentythirteen&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Jul 16 13:17 twentytwelve&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
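# comparing the two listings above, ArtWorksResponsive, gridsby and gridthemeresponsive are also missing on hetzner3, not just simplephotoRes (which is the active theme). Here is a minimal sketch for doing that comparison mechanically. It assumes you first save the output of &amp;quot;ls -1&amp;quot; for each themes dir to a text file. missing_entries and the file names are hypothetical:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# missing_entries OLD_LIST NEW_LIST
# Print every line of OLD_LIST that does not appear in NEW_LIST.
# -f reads the patterns from NEW_LIST, -F treats them as fixed strings,
# -x requires a whole-line match, and -v inverts the selection.
missing_entries() {
	grep -vxF -f "$2" "$1"
}

# usage, e.g.:
#   (on hetzner2) ls -1 .../htdocs/wp-content/themes/ > themes_hetzner2.txt
#   (on hetzner3) ls -1 .../htdocs/wp-content/themes/ > themes_hetzner3.txt
#   missing_entries themes_hetzner2.txt themes_hetzner3.txt
```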
# so, yeah, that &#039;simplephotoRes&#039; dir is missing&lt;br /&gt;
# worse, I get no results searching for this theme on wordpress.org https://wordpress.org/themes/search/simplephotoRes/&lt;br /&gt;
# looks like my notes from Sep show that I was unable to download this theme https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Thr_Sep_26.2C_2024&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
portfolio-press&lt;br /&gt;
		https://downloads.wordpress.org/theme/portfolio-press.2.8.0.zip&lt;br /&gt;
simplephotoRes&lt;br /&gt;
		null&lt;br /&gt;
		 Invalid slug provided&lt;br /&gt;
		 null&lt;br /&gt;
sketch&lt;br /&gt;
		https://downloads.wordpress.org/theme/sketch.1.2.4.zip&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# 0 results on ddg https://duckduckgo.com/?q=%22simplephotoRes%22+%22wordpress%22&amp;amp;t=ftsa&amp;amp;ia=web&lt;br /&gt;
# if I log in to the old fef site&#039;s wp admin dashboard and go to &amp;quot;Appearance&amp;quot; -&amp;gt; &amp;quot;Themes&amp;quot;, it lists the theme&#039;s details:&lt;br /&gt;
## Human-readable theme name = &amp;quot;Simple Photo Responsive&amp;quot;&lt;br /&gt;
## Version = 2.0&lt;br /&gt;
## Author = Marios Lublinski&lt;br /&gt;
## Author website = www.dessign.net&lt;br /&gt;
# I went to that website, but it looks like a scam. Lots of AI crap. Maybe the author sold it to some AI scam landing page?&lt;br /&gt;
# I checked the style.css file directly on hetzner2, which includes a URL for the theme&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology simplephotoRes]$ head style.css &lt;br /&gt;
/*&lt;br /&gt;
&lt;br /&gt;
Theme Name: Simple Photo Responsive&lt;br /&gt;
&lt;br /&gt;
Theme URI: http://www.dessign.net/simplephototheme/&lt;br /&gt;
&lt;br /&gt;
Description: Simple Photo Theme for WordPress is stylish, customizable, simple, and readable. Perfect for any illustrator, designer and blogger. &lt;br /&gt;
&lt;br /&gt;
Version: 2.0&lt;br /&gt;
&lt;br /&gt;
[maltfield@opensourceecology simplephotoRes]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
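# here is a minimal sketch for reading a single header field out of a theme&#039;s style.css. This is handy when head cuts off fields like Author. It assumes the &amp;quot;Field: value&amp;quot; layout shown above. theme_header is a hypothetical helper, not a wp-cli command:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# theme_header FIELD FILE
# Print the value of a WordPress theme header field (e.g. "Theme URI" or
# "Version") from a style.css file. Hypothetical helper, not part of wp-cli.
# Only the first match is printed; whitespace before the field name and
# after the colon is stripped.
theme_header() {
	local field="$1" file="$2"
	sed -n "s/^[[:space:]]*${field}:[[:space:]]*//p" "$file" | head -n 1
}

# usage, e.g. from inside the old theme dir on hetzner2:
#   theme_header 'Theme URI' style.css
#   theme_header Author style.css
```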
# unfortunately, that URL ends up redirecting to a totally unrelated page about an AI podcast (scam?). The curl below only shows the first hop, which is an http-&amp;gt;https redirect to the same path&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp1974:~$ curl -i http://www.dessign.net/simplephototheme/&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Thu, 26 Dec 2024 18:41:17 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://www.dessign.net/simplephototheme/&lt;br /&gt;
X-ac: 2.hhn _atomic_ams BYPASS&lt;br /&gt;
Alt-Svc: h3=&amp;quot;:443&amp;quot;; ma=86400&lt;br /&gt;
&lt;br /&gt;
&amp;lt;html&amp;gt;&lt;br /&gt;
&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;301 Moved Permanently&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;&lt;br /&gt;
&amp;lt;body&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;h1&amp;gt;301 Moved Permanently&amp;lt;/h1&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;hr&amp;gt;&amp;lt;center&amp;gt;nginx&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;&lt;br /&gt;
user@disp1974:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, the theme-update step above deleted this theme, because we didn&#039;t have an up-to-date copy of it to replace it with.&lt;br /&gt;
# the risk is high that a theme that hasn&#039;t been updated could just break due to incompatibility with wordpress core, other plugins, or php (we saw this with oshine on our other sites iirc)&lt;br /&gt;
# of course, there&#039;s an even higher risk that there&#039;s an unpatched security vulnerability with running an old theme that&#039;s not updated&lt;br /&gt;
# I want to know how old this thing is, so I grepped the whole theme dir for 4-digit numbers. Looks like it includes a version of jquery that was published in Mar 2012, so this theme probably hasn&#039;t been updated in over 12 years D:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology simplephotoRes]# grep -irE &#039;[0-9]{4}&#039;&lt;br /&gt;
Binary file images/logo.png matches&lt;br /&gt;
Binary file images/slide-prev.png matches&lt;br /&gt;
Binary file images/slide-next.png matches&lt;br /&gt;
Binary file images/bg-img.jpg matches&lt;br /&gt;
Binary file images/menu-divider.jpg matches&lt;br /&gt;
Binary file images/google-plus-icon.png matches&lt;br /&gt;
Binary file images/dribbble-icon.png matches&lt;br /&gt;
Binary file images/search-icon.jpg matches&lt;br /&gt;
Binary file images/twitter-icon.png matches&lt;br /&gt;
Binary file images/facebook-icon.png matches&lt;br /&gt;
Binary file images/pinterest-icon.png matches&lt;br /&gt;
footer.php:        © 2012 Simple Photo WordPress. Design by &amp;lt;a href=&amp;quot;http://www.dessign.net&amp;quot;&amp;gt;Dessign&amp;lt;/a&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
header.php:&amp;lt;html xmlns=&amp;quot;http://www.w3.org/1999/xhtml&amp;quot; xmlns:v=&amp;quot;urn:schemas-microsoft-com:vml&amp;quot;&amp;gt;&lt;br /&gt;
header.php:      start_custom_slider(&#039;5000&#039;);&lt;br /&gt;
settings.php:  &amp;lt;td&amp;gt;&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;custom_background_color&amp;quot; class=&amp;quot;ss_text&amp;quot; value=&amp;quot;&amp;lt;?php echo stripslashes(stripslashes(get_option($shortname.&#039;_custom_background_color&#039;,&#039;&#039;))); ?&amp;gt;&amp;quot; /&amp;gt; &amp;lt;small&amp;gt;e.g.: #27292a&amp;lt;/small&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
settings.php:  &amp;lt;td&amp;gt;&amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;slideshow_timeout&amp;quot; class=&amp;quot;ss_text&amp;quot; value=&amp;quot;&amp;lt;?php echo stripslashes(stripslashes(get_option($shortname.&#039;_slideshow_timeout&#039;,&#039;&#039;))); ?&amp;gt;&amp;quot; /&amp;gt; &amp;lt;small&amp;gt;e.g.: 7000&amp;lt;/small&amp;gt;&amp;lt;/td&amp;gt;&lt;br /&gt;
style.css:#commentform input[type=submit] { background-color: #161616; color: #fff; border: 1px solid #6E6E6E; padding: 3px 5px; }&lt;br /&gt;
js/_notes/dwsync.xml:&amp;lt;file name=&amp;quot;jquery-latest.js&amp;quot; server=&amp;quot;dessign.net//httpdocs/&amp;quot; local=&amp;quot;129896034000000000&amp;quot; remote=&amp;quot;129958450800000000&amp;quot; /&amp;gt;&lt;br /&gt;
js/_notes/dwsync.xml:&amp;lt;file name=&amp;quot;scripts.js&amp;quot; server=&amp;quot;dessign.net//httpdocs/&amp;quot; local=&amp;quot;129900749400000000&amp;quot; remote=&amp;quot;129958450800000000&amp;quot; /&amp;gt;&lt;br /&gt;
js/jquery-latest.js: * Copyright 2011, John Resig&lt;br /&gt;
js/jquery-latest.js: * Copyright 2011, The Dojo Foundation&lt;br /&gt;
js/jquery-latest.js: * Date: Wed Mar 21 12:46:34 2012 -0700&lt;br /&gt;
js/jquery-latest.js:    // Prioritize #id over &amp;lt;tag&amp;gt; to avoid XSS via location.hash (#9521)&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
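# here is a less noisy variant of the date search above, as a sketch that assumes GNU grep. -I skips binary files (all those image matches). -o prints only the matched text. The pattern only accepts 19xx/20xx on word boundaries, so hex colors like #27292a and timeouts like 5000 are ignored. newest_year is a hypothetical name:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# newest_year DIR
# Print the most recent 4-digit year (1900-2099) mentioned in any text file
# under DIR. Binary files are skipped (-I), only the matched text is printed
# (-o), and filenames are suppressed (-h). Requires GNU grep for -I and \b.
newest_year() {
	grep -rIhoE '\b(19|20)[0-9]{2}\b' "$1" | sort -n | tail -n 1
}

# usage, e.g. from inside the old theme dir on hetzner2:
#   newest_year .
```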
# anyway, let&#039;s see if it&#039;s at all possible to use this theme on the latest version of wp and php. I&#039;ll copy it in manually from the hetzner2 backup files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # rsync -av --progress var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/simplephotoRes /var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/&lt;br /&gt;
...&lt;br /&gt;
sent 442.547 bytes  received 557 bytes  886.208,00 bytes/sec&lt;br /&gt;
total size is 440.589  speedup is 0,99&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/fef.opensourceecology.org_20241226/current # &lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
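# here is a sketch for auditing the result of the permissions script above. It assumes the intent is that nothing under a wordpress vhost has a write bit except wp-content/uploads and wp-content/tmp. It relies on GNU find, where -perm /0222 means &amp;quot;any write bit set&amp;quot;. find_writable_outside_uploads is a hypothetical name. Empty output means the read-only layout is intact:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# find_writable_outside_uploads DIR
# List every file or directory under DIR that has any write bit set
# (user, group, or other), skipping wp-content/uploads and wp-content/tmp,
# which are intentionally writable. Uses GNU find's "-perm /MODE".
find_writable_outside_uploads() {
	find "$1" \( -path '*/wp-content/uploads' -o -path '*/wp-content/tmp' \) -prune \
		-o -perm /0222 -print
}

# usage, e.g. on hetzner3 after running the permissions script:
#   find_writable_outside_uploads /var/www/html/fef.opensourceecology.org
```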
# wordpress had already deactivated the missing theme, so I had to go in and reactivate it. As soon as I clicked &amp;quot;Activate&amp;quot;, I got a critical error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
There has been a critical error on this website. Please check your site admin email inbox for instructions.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I literally can&#039;t even load anything on the site anymore; the whole wp admin dashboard area is broken&lt;br /&gt;
# if I tail the logs, then I see why&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; fef.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Thu Dec 26 19:04:48.907704 2024] [proxy_fcgi:error] [pid 1904236:tid 1904282] [client 185.213.155.175:0] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught ArgumentCountError: Too few arguments to function WP_Widget::__construct(), 0 passed in /var/www/html/fef.opensourceecology.org/htdocs/wp-includes/class-wp-widget-factory.php on line 62 and at least 2 expected in /var/www/html/fef.opensourceecology.org/htdocs/wp-includes/class-wp-widget.php:163\nStack trace:\n#0 /var/www/html/fef.opensourceecology.org/htdocs/wp-includes/class-wp-widget-factory.php(62): WP_Widget-&amp;gt;__construct()\n#1 /var/www/html/fef.opensourceecology.org/htdocs/wp-includes/widgets.php(123): WP_Widget_Factory-&amp;gt;register()\n#2 /var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/simplephotoRes/functions.php(227): register_widget()\n#3 /var/www/html/fef.opensourceecology.org/htdocs/wp-settings.php(668): include(&#039;...&#039;)\n#4 /var/www/html/fef.opensourceecology.org/wp-config.php(112): require_once(&#039;...&#039;)\n#5 /var/www/html/fef.opensourceecology.org/htdocs/wp-load.php(55): require_once(&#039;...&#039;)\n#6 /var/www/html/fef.opensourceecology.org/htdocs/wp-ad...&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
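# a likely cause, stated as an assumption since I haven&#039;t read the theme&#039;s functions.php: themes from that era often gave their widget classes a PHP4-style constructor (a method with the same name as the class). PHP 8 no longer treats such a method as a constructor. If hetzner3 runs PHP 8.x, WP_Widget_Factory then instantiates the widget class with no arguments. That falls through to the parent WP_Widget::__construct(), which would raise this kind of ArgumentCountError. Here is a heuristic grep for that pattern. find_php4_widget_ctors is a hypothetical name:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# find_php4_widget_ctors DIR
# Heuristic: print "FILE: CLASS" for every class under DIR that extends
# WP_Widget and also defines a method with the same name as the class
# (a PHP4-style constructor, which PHP 8 ignores). PHP method names are
# case-insensitive, hence grep -i for the method lookup.
find_php4_widget_ctors() {
	local extends_re='class[[:space:]]+([A-Za-z0-9_]+)[[:space:]]+extends[[:space:]]+WP_Widget'
	grep -rlE "$extends_re" --include='*.php' "$1" | while read -r file; do
		# extract every class name in this file that extends WP_Widget
		for cls in $(sed -nE "s/.*${extends_re}.*/\\1/p" "$file"); do
			if grep -qiE "function[[:space:]]+${cls}[[:space:]]*\\(" "$file"; then
				echo "${file}: ${cls}"
			fi
		done
	done
}

# usage, e.g. on hetzner2:
#   find_php4_widget_ctors /var/www/html/fef.opensourceecology.org/htdocs/wp-content/themes/simplephotoRes
```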
# yeah, I&#039;d say this theme is totally not going to work with the latest version of wordpress.&lt;br /&gt;
# I sent an email to Marcin &amp;amp; Catarina asking what theme they want to use for fef on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Marcin,&lt;br /&gt;
Hi Catarina,&lt;br /&gt;
&lt;br /&gt;
I&#039;m sorry to inform you that you need to find a new theme for your fef website:&lt;br /&gt;
&lt;br /&gt;
 * https://fef.opensourceecology.org/&lt;br /&gt;
&lt;br /&gt;
The fef website uses the theme &#039;simplephotoRes&#039; = &amp;quot;Simple Photo Responsive&amp;quot;&lt;br /&gt;
&lt;br /&gt;
I can&#039;t find any information about this theme (unless you know of a source I missed). The theme doesn&#039;t exist on wordpress.org (moreover, it uses an invalid slug name)&lt;br /&gt;
&lt;br /&gt;
 * https://wordpress.org/themes/simplephotoRes/&lt;br /&gt;
&lt;br /&gt;
The theme&#039;s style.css file says it was developed by &amp;quot;Marios Lublinski&amp;quot;. The URL of the theme is http://www.dessign.net/simplephototheme/.&lt;br /&gt;
&lt;br /&gt;
That website (dessign.net) just looks like a landing page full of scam AI products, and the page specific to the theme just redirects to some unrelated podcast about AI.&lt;br /&gt;
&lt;br /&gt;
Because I can&#039;t find any info about this theme anywhere on the Internet, I don&#039;t know exactly when it was last updated, but I did find a version of jquery in it with a version that was released in March 2012, so I think it&#039;s safe to say that this theme hasn&#039;t been updated in over 12 years.&lt;br /&gt;
&lt;br /&gt;
It&#039;s generally a bad idea to use themes and plugins that are not popular and not updated, because they can break with future versions of wordpress, your other plugins, or future versions of PHP. Not to mention that they could have security vulnerabilities. In fact, I did come across a known security vulnerability in one of the plugins that oshine requires you to use (tatsu). I think the only reason you weren&#039;t hacked is that I prevent wordpress from editing its own files, and newly uploaded files on your web server cannot execute PHP. This breaks a lot of wordpress admin actions, but it is also probably the reason your site is still online, despite not having been updated in many years.&lt;br /&gt;
&lt;br /&gt;
Anyway, I did try to migrate this old theme to your new server, and it does cause a critical error; it appears to be incompatible with the latest version of wordpress. It&#039;s not going to work on hetzner3.&lt;br /&gt;
&lt;br /&gt;
I highly recommend only using themes that can be downloaded directly from wordpress.org/themes/, that have hundreds of thousands of active installations, and that are actively developed.&lt;br /&gt;
&lt;br /&gt;
Please let me know what theme you&#039;d like me to install &amp;amp; activate for fef on hetzner3.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, well, I&#039;d say fef is stuck now pending their response&lt;br /&gt;
# ...&lt;br /&gt;
# the next wp site on the list that doesn&#039;t run oshine (which we&#039;re also stuck on) is oswh.opensourceecology.org&lt;br /&gt;
# here&#039;s our script for the initial migration of oswh from hetzner2 to hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;oswh.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;oswh_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: BACKUP DB&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current/&lt;br /&gt;
mv ${backupDir_hetzner2}/current/* ${backupDir_hetzner2}/old/&lt;br /&gt;
&lt;br /&gt;
time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
# STEP 3: BACKUP FILES&lt;br /&gt;
time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# STEP 4: COPY TO HETZNER3&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mkdir -p ${backupDir_hetzner3}/{current,old}&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mv ${backupDir_hetzner3}/current/* ${backupDir_hetzner3}/old/&lt;br /&gt;
rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner3 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;oswh.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;oswh_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 1: ADD DB&lt;br /&gt;
&lt;br /&gt;
# create backup before we start changing the sql file&lt;br /&gt;
pushd ${backupDir_hetzner3}/current&lt;br /&gt;
cp ${backupFileName_db_hetzner2} ${backupFileName_db_hetzner2}.orig&lt;br /&gt;
&lt;br /&gt;
# extract .sql.bz2 -&amp;gt; .sql&lt;br /&gt;
bzip2 -dc ${backupFileName_db_hetzner2} &amp;gt; db.sql&lt;br /&gt;
&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;DROP DATABASE IF EXISTS ${dbName};&amp;quot; &lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;CREATE DATABASE ${dbName}; USE ${dbName};&amp;quot;&lt;br /&gt;
 time nice mysql ${dbName} -uroot -p${mysqlPass} &amp;lt; &amp;quot;db.sql&amp;quot;&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;GRANT ALL ON ${dbName}.* TO &#039;${dbUser}&#039;@&#039;localhost&#039; IDENTIFIED BY &#039;${dbPass}&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: Add vhost files&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
mv var/www/html/${vhost_name} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# remove &#039;.svn&#039; dirs (we no longer use svn, for security)&lt;br /&gt;
find ${docrootDir} -iname &#039;.svn&#039; -exec rm -rf &#039;{}&#039; \;&lt;br /&gt;
&lt;br /&gt;
# add wordpress bug fix&lt;br /&gt;
# is the bug fix already present?&lt;br /&gt;
if ! grep -q &#039;https://core.trac.wordpress.org/ticket/48693&#039; ${vhostDir}/wp-config.php; then&lt;br /&gt;
	# the bug fix is absent; add it&lt;br /&gt;
&lt;br /&gt;
	backup_filename=&amp;quot;wp-config.`date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;`.php&amp;quot;&lt;br /&gt;
	mv ${vhostDir}/wp-config.php ${vhostDir}/${backup_filename}&lt;br /&gt;
&lt;br /&gt;
	cat &amp;gt; ${vhostDir}/wp-config.php &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
&lt;br /&gt;
# fix wordpress bugs&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/62693&lt;br /&gt;
if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
		function ini_set(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;chmod&#039;) ){&lt;br /&gt;
		function chmod(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
	tail -n +2 ${vhostDir}/${backup_filename} &amp;gt;&amp;gt; ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
ls&lt;br /&gt;
vim ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
# UPDATE CORE&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress /var/tmp/wordpress/core/wordpress/ ${docrootDir}&lt;br /&gt;
&lt;br /&gt;
# UPDATE OLD PLUGINS&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${docrootDir}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
		plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
		rm -rf ${plugin_path};&lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# INSTALL NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
new_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security wps-hide-login raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin in ${new_plugins}; do&lt;br /&gt;
		plugin_path=&amp;quot;${docrootDir}/wp-content/plugins/${plugin}&amp;quot;&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
				rm -rf ${plugin_path};&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# UPDATE/INSTALL THEMES&lt;br /&gt;
&lt;br /&gt;
for theme_path in $(find &amp;quot;${docrootDir}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf ${theme_path};&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a ${source_path}/ &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# ACTIVATE NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
activate_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security&amp;quot;&lt;br /&gt;
for plugin in ${activate_plugins}; do&lt;br /&gt;
	sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate ${plugin}&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as before, I ran ^ that, fixed permissions, pushed with ansible, restarted web services, updated my /etc/hosts, logged-in, and did a db upgrade in the wui&lt;br /&gt;
# after that, we have the same issue as fef; theme doesn&#039;t exist (this time it&#039;s a different theme)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The theme directory &amp;quot;Eventor&amp;quot; does not exist.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the theme doesn&#039;t exist on wordpress.org https://wordpress.org/themes/eventor/&lt;br /&gt;
# here&#039;s the theme info (taken from hetzner2)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology themes]# head Eventor/style.css &lt;br /&gt;
/*&lt;br /&gt;
Theme Name: Eventor&lt;br /&gt;
Support URI: http://www.themeskingdom.com/support/&lt;br /&gt;
Description: &lt;br /&gt;
Author: Themeskingdom&lt;br /&gt;
Author URI: http://www.themeskingdom.com/support/&lt;br /&gt;
Version: 1.7&lt;br /&gt;
License: GNU General Public License v2.0&lt;br /&gt;
License URI: http://www.gnu.org/licenses/gpl-2.0.html&lt;br /&gt;
Theme URI: http://www.themeskingdom.com/&lt;br /&gt;
[root@opensourceecology themes]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# man, that website is shit. I couldn&#039;t find &amp;quot;eventor&amp;quot; on the author&#039;s site (probably I could if I clicked a thousand times, but there&#039;s no way to ctrl+f for it: you have to mouse over an image, and it&#039;s javascript, so you have to click and click and click to load more before you can see all of the available themes)&lt;br /&gt;
# I ended up giving up and typing out the URL manually by trial and error, and I found the theme&#039;s page https://themeskingdom.com/purchase-options/eventor/&lt;br /&gt;
# alright, so it&#039;s a paid theme with no public download link&lt;br /&gt;
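# Why the theme went missing: the UPDATE/INSTALL THEMES loop in the script above runs rm -rf on every theme directory first and only repopulates it if a same-named copy exists under /var/tmp/wordpress/themes, so a theme that was never staged there (like this paid one) is simply deleted. Below is a minimal pre-flight sketch that assumes that same staging layout. unstaged_dirs is a hypothetical helper, not part of the migration scripts. It lists the theme or plugin directories that the loops would delete without reinstalling:&lt;br /&gt;

```shell
# Hypothetical pre-flight helper (not part of the original migration scripts).
# Prints each (non-hidden) subdirectory of INSTALLED_DIR that has no
# same-named directory in STAGING_DIR, i.e. each theme/plugin that the
# update loops would rm -rf without reinstalling.
#
# Usage: unstaged_dirs INSTALLED_DIR STAGING_DIR
#   unstaged_dirs "${docrootDir}/wp-content/themes" /var/tmp/wordpress/themes
#   unstaged_dirs "${docrootDir}/wp-content/plugins" /var/tmp/wordpress/plugins
unstaged_dirs() {
	local installed_dir="$1"
	local staging_dir="$2"
	local path name
	for path in "${installed_dir}"/*/; do
		# an unmatched glob stays literal; skip it
		[ -d "${path}" ] || continue
		name="$(basename "${path}")"
		if [ ! -d "${staging_dir}/${name}" ]; then
			echo "${name}"
		fi
	done
}
```

# e.g. running unstaged_dirs &amp;quot;${docrootDir}/wp-content/themes&amp;quot; /var/tmp/wordpress/themes before the loop should have printed Eventor. It only compares directory names, so any staged directory with the same name counts as a replacement.&lt;br /&gt;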
# I sent an email to Marcin &amp;amp; Catarina asking if they had license info&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi Marcin,&lt;br /&gt;
Hi Catarina,&lt;br /&gt;
&lt;br /&gt;
Do you have any license info for the &amp;quot;Eventor&amp;quot; theme?&lt;br /&gt;
&lt;br /&gt;
It looks like oswh uses a paid theme named &amp;quot;eventor&amp;quot;&lt;br /&gt;
&lt;br /&gt;
 * https://oswh.opensourceecology.org/&lt;br /&gt;
&lt;br /&gt;
I was unable to download the latest version of the Eventor theme from wordpress.org&lt;br /&gt;
&lt;br /&gt;
 * https://wordpress.org/themes/eventor/&lt;br /&gt;
&lt;br /&gt;
I did manage to find the private website for the theme here:&lt;br /&gt;
&lt;br /&gt;
 * https://themeskingdom.com/purchase-options/eventor/&lt;br /&gt;
&lt;br /&gt;
Somehow it says that it&#039;s $35 for a lifetime plan and $59 for a one-year plan. That doesn&#039;t make sense.&lt;br /&gt;
&lt;br /&gt;
Anyway, can you please forward me any info I&#039;d need to download the latest version of this theme?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also sent an email to their support, CCing Marcin &amp;amp; Catarina&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello,&lt;br /&gt;
&lt;br /&gt;
Where can I download the latest version of your Eventor theme?&lt;br /&gt;
&lt;br /&gt;
We purchased a copy of your Eventor theme years ago, and we need to download the latest version.&lt;br /&gt;
&lt;br /&gt;
Please let us know where we can download the latest version of the Eventor theme.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, so now we&#039;re stuck on oswh too&lt;br /&gt;
# ...&lt;br /&gt;
# our next website (that doesn&#039;t use oshine) is seedhome.openbuildinginstitute.org&lt;br /&gt;
# here&#039;s our script to copy it from hetzner2 to hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;seedhome.openbuildinginstitute.org&#039;&lt;br /&gt;
dbName=&#039;seedhome_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: BACKUP DB&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current/&lt;br /&gt;
mv ${backupDir_hetzner2}/current/* ${backupDir_hetzner2}/old/&lt;br /&gt;
&lt;br /&gt;
time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
# STEP 3: BACKUP FILES&lt;br /&gt;
time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# STEP 4: COPY TO HETZNER3&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mkdir -p ${backupDir_hetzner3}/{current,old}&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mv ${backupDir_hetzner3}/current/* ${backupDir_hetzner3}/old/&lt;br /&gt;
rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner3 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;seedhome.openbuildinginstitute.org&#039;&lt;br /&gt;
dbName=&#039;seedhome_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 1: ADD DB&lt;br /&gt;
&lt;br /&gt;
# create backup before we start changing the sql file&lt;br /&gt;
pushd ${backupDir_hetzner3}/current&lt;br /&gt;
cp ${backupFileName_db_hetzner2} ${backupFileName_db_hetzner2}.orig&lt;br /&gt;
&lt;br /&gt;
# extract .sql.bz2 -&amp;gt; .sql&lt;br /&gt;
bzip2 -dc ${backupFileName_db_hetzner2} &amp;gt; db.sql&lt;br /&gt;
&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;DROP DATABASE IF EXISTS ${dbName};&amp;quot; &lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;CREATE DATABASE ${dbName}; USE ${dbName};&amp;quot;&lt;br /&gt;
 time nice mysql ${dbName} -uroot -p${mysqlPass} &amp;lt; &amp;quot;db.sql&amp;quot;&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;GRANT ALL ON ${dbName}.* TO &#039;${dbUser}&#039;@&#039;localhost&#039; IDENTIFIED BY &#039;${dbPass}&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: Add vhost files&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
mv var/www/html/${vhost_name} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# remove &#039;.svn&#039; dirs (we no longer use svn, for security)&lt;br /&gt;
find ${docrootDir} -iname &#039;.svn&#039; -exec rm -rf &#039;{}&#039; \;&lt;br /&gt;
&lt;br /&gt;
# add wordpress bug fix&lt;br /&gt;
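# is the bug fix already present? (check grep&#039;s exit status directly; wrapping grep in $(...) tries to run the matched line as a command, so the fix was re-added on every run)&lt;br /&gt;
if ! grep -q &#039;https://core.trac.wordpress.org/ticket/48693&#039; &amp;quot;${vhostDir}/wp-config.php&amp;quot;; then&lt;br /&gt;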
	# the bug fix is absent; add it&lt;br /&gt;
&lt;br /&gt;
	backup_filename=&amp;quot;wp-config.`date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;`.php&amp;quot;&lt;br /&gt;
	mv ${vhostDir}/wp-config.php ${vhostDir}/${backup_filename}&lt;br /&gt;
&lt;br /&gt;
	cat &amp;gt; ${vhostDir}/wp-config.php &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
&lt;br /&gt;
# fix wordpress bugs&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/62693&lt;br /&gt;
if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
		function ini_set(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;chmod&#039;) ){&lt;br /&gt;
		function chmod(){&lt;br /&gt;
				return;&lt;br /&gt;
		}&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
	tail -n +2 ${vhostDir}/${backup_filename} &amp;gt;&amp;gt; ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
ls&lt;br /&gt;
vim ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
# UPDATE CORE&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress /var/tmp/wordpress/core/wordpress/ ${docrootDir}&lt;br /&gt;
&lt;br /&gt;
# UPDATE OLD PLUGINS&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${docrootDir}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
		plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
		rm -rf ${plugin_path};&lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# INSTALL NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
new_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security wps-hide-login raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin in ${new_plugins}; do&lt;br /&gt;
		plugin_path=&amp;quot;${docrootDir}/wp-content/plugins/${plugin}&amp;quot;&lt;br /&gt;
		source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
		if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
				echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
				rm -rf ${plugin_path};&lt;br /&gt;
				rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
		fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# UPDATE/INSTALL THEMES&lt;br /&gt;
&lt;br /&gt;
for theme_path in $(find &amp;quot;${docrootDir}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf ${theme_path};&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a ${source_path}/ &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# ACTIVATE NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
activate_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security&amp;quot;&lt;br /&gt;
for plugin in ${activate_plugins}; do&lt;br /&gt;
	sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate ${plugin}&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# as before, I ran ^ that, fixed permissions, pushed with ansible, restarted web services, updated my /etc/hosts, and tried to load it in the browser https://seedhome.openbuildinginstitute.org/&lt;br /&gt;
# this time, instead of being presented with the wp login page, I was presented with the OSE forums site; interesting&lt;br /&gt;
# nginx logs appear to show the correct vhost&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/log/nginx # tail -f access.log error.log */*log&lt;br /&gt;
...&lt;br /&gt;
==&amp;gt; seedhome.openbuildinginstitute.org/access.log &amp;lt;==&lt;br /&gt;
185.213.155.175 - - [26/Dec/2024:20:57:34 +0000] &amp;quot;GET / HTTP/1.1&amp;quot; 200 151284 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot; &amp;quot;-&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apache, however, logged the request under the wrong vhost (forum)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/log/apache2 # tail -f error.log access.log */*log&lt;br /&gt;
...&lt;br /&gt;
==&amp;gt; forum.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
185.213.155.175 - - [26/Dec/2024:20:58:51 +0000] &amp;quot;GET / HTTP/1.1&amp;quot; 200 16464 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
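# oh, there&#039;s no vhost at all for this site in apache; hmm. That is presumably why the forum showed up: when no ServerName matches, apache falls back to the first vhost defined for that address:port, and 00-forum.opensourceecology.org.conf sorts first in sites-enabled&lt;br /&gt;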
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled # ls -lah&lt;br /&gt;
total 24K&lt;br /&gt;
drwxr-xr-x 2 root root 4,0K Dec 26 20:10 .&lt;br /&gt;
drwxr-xr-x 8 root root 4,0K Dec 26 20:46 ..&lt;br /&gt;
lrwxrwxrwx 1 root root   61 Sep 25 01:47 00-forum.opensourceecology.org.conf -&amp;gt; /etc/apache2/sites-available/forum.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   59 Dec 26 18:20 fef.opensourceecology.org.conf -&amp;gt; /etc/apache2/sites-available/fef.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   68 Dec 26 04:29 microfactory.opensourceecology.org.conf -&amp;gt; /etc/apache2/sites-available/microfactory.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   60 Dec 26 20:10 oswh.opensourceecology.org.conf -&amp;gt; /etc/apache2/sites-available/oswh.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   61 Sep 27 04:47 store.opensourceecology.org.conf -&amp;gt; /etc/apache2/sites-available/store.opensourceecology.org.conf&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled # ls -lah ../sites-available/&lt;br /&gt;
total 52K&lt;br /&gt;
drwxr-xr-x 2 root root 4,0K Dec 26 20:10 .&lt;br /&gt;
drwxr-xr-x 8 root root 4,0K Dec 26 20:46 ..&lt;br /&gt;
-rw-r--r-- 1 root root 1,3K Jul  7 13:26 000-default.conf&lt;br /&gt;
-rw-r--r-- 1 root root  868 Oct  4 05:52 000-maintenance.conf&lt;br /&gt;
-rw-r--r-- 1 root root  870 Oct  4 04:57 000-maintenance.conf.3593326.2024-10-04@05:53:55~&lt;br /&gt;
-rw-r--r-- 1 root root 6,1K Jul 18 05:26 default-ssl.conf&lt;br /&gt;
-rw-r--r-- 1 root root  952 Dec 26 18:19 fef.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  964 Sep 25 01:46 forum.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1006 Dec 26 04:28 microfactory.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  958 Dec 26 20:09 oswh.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  964 Oct  5 04:19 store.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,2K Oct  5 04:18 store.opensourceecology.org.conf.3944920.2024-10-05@04:20:17~&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh shit, the non-errors that always show up at the end of the ansible runs (restart fails) made me miss the actual error&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ansible-playbook provision.yml &lt;br /&gt;
...&lt;br /&gt;
TASK [maltfield.apache : sites-available/ item .conf] **********************&lt;br /&gt;
ok: [hetzner3] =&amp;gt; (item=forum.opensourceecology.org)&lt;br /&gt;
ok: [hetzner3] =&amp;gt; (item=store.opensourceecology.org)&lt;br /&gt;
ok: [hetzner3] =&amp;gt; (item=microfactory.opensourceecology.org)&lt;br /&gt;
ok: [hetzner3] =&amp;gt; (item=fef.opensourceecology.org)&lt;br /&gt;
ok: [hetzner3] =&amp;gt; (item=oswh.opensourceecology.org)&lt;br /&gt;
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option&lt;br /&gt;
failed: [hetzner3] (item=seedhome.openbuildinginstitute.org) =&amp;gt; {&amp;quot;ansible_loop_var&amp;quot;: &amp;quot;item&amp;quot;, &amp;quot;changed&amp;quot;: false, &amp;quot;item&amp;quot;: &amp;quot;seedhome.openbuildinginstitute.org&amp;quot;, &amp;quot;msg&amp;quot;: &amp;quot;Could not find or access &#039;seedhome.openbuildinginstitute.org.conf.j2&#039;\nSearched in:\n\t/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.apache/templates/seedhome.openbuildinginstitute.org.conf.j2\n\t/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.apache/seedhome.openbuildinginstitute.org.conf.j2\n\t/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.apache/tasks/templates/seedhome.openbuildinginstitute.org.conf.j2\n\t/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.apache/tasks/seedhome.openbuildinginstitute.org.conf.j2\n\t/home/user/sandbox_local/ansible/hetzner3/templates/seedhome.openbuildinginstitute.org.conf.j2\n\t/home/user/sandbox_local/ansible/hetzner3/seedhome.openbuildinginstitute.org.conf.j2 on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option&amp;quot;}&lt;br /&gt;
&lt;br /&gt;
PLAY RECAP *********************************************************************&lt;br /&gt;
hetzner3                   : ok=60   changed=9    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   &lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
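# Below is a minimal sketch for not missing that again. It assumes the playbook output is saved to a file (e.g. ansible-playbook provision.yml | tee provision.log), and ansible_failed_tasks is a hypothetical helper. It prints only the TASK headers that produced a failed: or fatal: result, each followed by those result lines:&lt;br /&gt;

```shell
# Hypothetical helper: print each TASK header from saved ansible-playbook
# output that has at least one failed:/fatal: result, followed by those
# result lines (indented). Reads the files given as arguments, or stdin.
ansible_failed_tasks() {
	awk '
		/^TASK \[/ { task = $0; shown = 0; next }
		/^(failed|fatal): \[/ {
			if (!shown) { print task; shown = 1 }
			print "  " $0
		}
	' "$@"
}
```

# on the run above, ansible_failed_tasks provision.log would print just the sites-available/ item .conf task and its seedhome failed: line. Failures that ansible reports as ...ignoring (ignore_errors) are listed too, since only the failed:/fatal: prefix is checked.&lt;br /&gt;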
# ah, I gave the template the wrong domain (opensourceecology.org instead of openbuildinginstitute.org)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3/roles/maltfield.apache/templates$ ls | grep -i seedhome&lt;br /&gt;
seedhome.opensourceecology.org.conf.j2&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3/roles/maltfield.apache/templates$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I renamed the template file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3/roles/maltfield.apache/templates$ git mv seedhome.opensourceecology.org.conf.j2 seedhome.openbuildinginstitute.org.conf.j2 &lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3/roles/maltfield.apache/templates$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the next ansible run put the file in-place&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled # ls -lah ../sites-available/&lt;br /&gt;
total 56K&lt;br /&gt;
drwxr-xr-x 2 root root 4,0K Dec 26 21:05 .&lt;br /&gt;
drwxr-xr-x 8 root root 4,0K Dec 26 21:05 ..&lt;br /&gt;
-rw-r--r-- 1 root root 1,3K Jul  7 13:26 000-default.conf&lt;br /&gt;
-rw-r--r-- 1 root root  868 Oct  4 05:52 000-maintenance.conf&lt;br /&gt;
-rw-r--r-- 1 root root  870 Oct  4 04:57 000-maintenance.conf.3593326.2024-10-04@05:53:55~&lt;br /&gt;
-rw-r--r-- 1 root root 6,1K Jul 18 05:26 default-ssl.conf&lt;br /&gt;
-rw-r--r-- 1 root root  952 Dec 26 18:19 fef.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  964 Sep 25 01:46 forum.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1006 Dec 26 04:28 microfactory.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  958 Dec 26 20:09 oswh.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  982 Dec 26 21:04 seedhome.openbuildinginstitute.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root  964 Oct  5 04:19 store.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,2K Oct  5 04:18 store.opensourceecology.org.conf.3944920.2024-10-05@04:20:17~&lt;br /&gt;
root@hetzner3 /etc/apache2/sites-enabled # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but the restart failed&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/seedhome.openbuildinginstitute.org_20241226/current # systemctl restart apache2 varnish nginx&lt;br /&gt;
Job for apache2.service failed because the control process exited with error code.&lt;br /&gt;
See &amp;quot;systemctl status apache2.service&amp;quot; and &amp;quot;journalctl -xeu apache2.service&amp;quot; for details.&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/seedhome.openbuildinginstitute.org_20241226/current # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/seedhome.openbuildinginstitute.org_20241226/current # systemctl status apache2&lt;br /&gt;
× apache2.service - The Apache HTTP Server&lt;br /&gt;
	 Loaded: loaded (/lib/systemd/system/apache2.service; enabled; preset: enabled)&lt;br /&gt;
	 Active: failed (Result: exit-code) since Thu 2024-12-26 21:06:00 UTC; 1min 3s ago&lt;br /&gt;
   Duration: 11min 43.849s&lt;br /&gt;
	   Docs: https://httpd.apache.org/docs/2.4/&lt;br /&gt;
	Process: 2767355 ExecStart=/usr/sbin/apachectl start (code=exited, status=1/FAILURE)&lt;br /&gt;
		CPU: 61ms&lt;br /&gt;
&lt;br /&gt;
Dec 26 21:06:00 hetzner3 systemd[1]: Starting apache2.service - The Apache HTTP Server...&lt;br /&gt;
Dec 26 21:06:00 hetzner3 apachectl[2767369]: AH00112: Warning: DocumentRoot [/var/www/html/seedhome.opensourceecology.org/htdocs] does not &amp;gt;&lt;br /&gt;
Dec 26 21:06:00 hetzner3 apachectl[2767369]: (2)No such file or directory: AH02291: Cannot access directory &#039;/var/log/apache2/seedhome.open&amp;gt;&lt;br /&gt;
Dec 26 21:06:00 hetzner3 apachectl[2767369]: AH00014: Configuration check failed&lt;br /&gt;
Dec 26 21:06:00 hetzner3 apachectl[2767355]: Action &#039;start&#039; failed.&lt;br /&gt;
Dec 26 21:06:00 hetzner3 apachectl[2767355]: The Apache error log may have more information.&lt;br /&gt;
Dec 26 21:06:00 hetzner3 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE&lt;br /&gt;
Dec 26 21:06:00 hetzner3 systemd[1]: apache2.service: Failed with result &#039;exit-code&#039;.&lt;br /&gt;
Dec 26 21:06:00 hetzner3 systemd[1]: Failed to start apache2.service - The Apache HTTP Server.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
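# Side note: systemctl restart stops apache before its config is checked on start, so a broken vhost takes apache down (note the 11min Duration above). The minimal sketch below gates the restart on apache2ctl configtest, which should fail with the same AH00014 error while the old apache keeps serving. restart_web_stack is a hypothetical name:&lt;br /&gt;

```shell
# Hypothetical wrapper: only restart the web stack if apache's config test
# passes, so a broken vhost doesn't stop the currently running apache.
restart_web_stack() {
	if ! apache2ctl configtest; then
		echo "apache2ctl configtest failed; not restarting anything" 1>/dev/stderr
		return 1
	fi
	systemctl restart apache2 varnish nginx
}
```
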
# oh, the typo was in the file as well; I fixed it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ git diff&lt;br /&gt;
diff --git a/hetzner3/roles/maltfield.apache/templates/seedhome.openbuildinginstitute.org.conf.j2 b/hetzner3/roles/maltfield.apache/templates/seedhome.openbuildinginstitute.org.conf.j2&lt;br /&gt;
index 509cd37..8905bf9 100644&lt;br /&gt;
--- a/hetzner3/roles/maltfield.apache/templates/seedhome.openbuildinginstitute.org.conf.j2&lt;br /&gt;
+++ b/hetzner3/roles/maltfield.apache/templates/seedhome.openbuildinginstitute.org.conf.j2&lt;br /&gt;
@@ -1,7 +1,7 @@&lt;br /&gt;
 #  ansible_managed &lt;br /&gt;
 &lt;br /&gt;
 ################################################################################&lt;br /&gt;
-# File:    seedhome.opensourceecology.org.conf&lt;br /&gt;
+# File:    seedhome.openbuildinginstitute.org.conf&lt;br /&gt;
 # Version: 0.3&lt;br /&gt;
 # Purpose: localhost-only-listening, http-only, name-based-vhost for serving&lt;br /&gt;
 #          traffic to varnish&lt;br /&gt;
@@ -11,15 +11,15 @@&lt;br /&gt;
 ################################################################################&lt;br /&gt;
 &lt;br /&gt;
 &amp;lt;VirtualHost 127.0.0.1:8000&amp;gt;&lt;br /&gt;
-       ServerName seedhome.opensourceecology.org&lt;br /&gt;
-       DocumentRoot &amp;quot;/var/www/html/seedhome.opensourceecology.org/htdocs&amp;quot;&lt;br /&gt;
+       ServerName seedhome.openbuildinginstitute.org&lt;br /&gt;
+       DocumentRoot &amp;quot;/var/www/html/seedhome.openbuildinginstitute.org/htdocs&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
-       CustomLog &amp;quot;/var/log/apache2/seedhome.opensourceecology.org/access.log&amp;quot; combined&lt;br /&gt;
-       ErrorLog &amp;quot;/var/log/apache2/seedhome.opensourceecology.org/error.log&amp;quot;&lt;br /&gt;
+       CustomLog &amp;quot;/var/log/apache2/seedhome.openbuildinginstitute.org/access.log&amp;quot; combined&lt;br /&gt;
+       ErrorLog &amp;quot;/var/log/apache2/seedhome.openbuildinginstitute.org/error.log&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
		Include &#039;conf-available/wordpress.virtualhost.include&#039;&lt;br /&gt;
 &lt;br /&gt;
-       &amp;lt;Directory &amp;quot;/var/www/html/seedhome.opensourceecology.org/htdocs&amp;quot;&amp;gt;&lt;br /&gt;
+       &amp;lt;Directory &amp;quot;/var/www/html/seedhome.openbuildinginstitute.org/htdocs&amp;quot;&amp;gt;&lt;br /&gt;
				Include &#039;conf-available/wordpress.directory.include&#039;&lt;br /&gt;
				Options +FollowSymLinks&lt;br /&gt;
		&amp;lt;/Directory&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
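# The wrong domain had to be fixed in two places: the template file name and its contents. The minimal sketch below lists both kinds of leftovers at once. It assumes a local checkout of the ansible repo, and stale_hostname_refs is a hypothetical helper:&lt;br /&gt;

```shell
# Hypothetical helper: list files in an ansible checkout whose name or
# contents still mention an old hostname. Skips the .git directory.
# Usage: stale_hostname_refs REPO_DIR OLD_HOSTNAME
stale_hostname_refs() {
	local repo="$1"
	local old="$2"
	# file names containing the old hostname (dots are literal in -name globs)
	find "${repo}" -name .git -prune -o -name "*${old}*" -print
	# file contents containing it (-F: fixed string, so dots are not regex wildcards)
	grep -rlF --exclude-dir=.git -- "${old}" "${repo}"
}
```

# e.g. stale_hostname_refs ~/sandbox_local/ansible/hetzner3 seedhome.opensourceecology.org before re-running ansible. A file is printed twice if both its name and its contents match.&lt;br /&gt;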
# I re-ran ansible, and this time the manual restarts worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/seedhome.openbuildinginstitute.org_20241226/current # systemctl restart apache2 varnish nginx&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/seedhome.openbuildinginstitute.org_20241226/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I refreshed the page and, jesus, this is just another wordpress site that we never set up.&lt;br /&gt;
# I logged in to the wp admin dashboard and ran the db update&lt;br /&gt;
# now, unlike the other sites before, I loaded the site and it works! I mean, it&#039;s just a fresh install of a new site with the Twenty Seventeen theme active, but at least nothing is broken.&lt;br /&gt;
# I finished setting up the site with the plugins, per the CHG for store https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3#Change_Steps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
	&amp;quot;Media&amp;quot; -&amp;gt; &amp;quot;Library&amp;quot;&lt;br /&gt;
		Upload https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v3-1blarge.jpg&lt;br /&gt;
		Upload https://wiki.opensourceecology.org/wiki/File:1day.jpg&lt;br /&gt;
	&amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;ActivityPub&amp;quot; -&amp;gt; &amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;Enable profiles by type&amp;quot; = &amp;quot;Blog profile only&amp;quot;&lt;br /&gt;
	&amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;ActivityPub&amp;quot; -&amp;gt; &amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;Blog-Profile&amp;quot; -&amp;gt; &amp;quot;Change profile ID&amp;quot; = &amp;quot;ose&amp;quot;&lt;br /&gt;
	&amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;ActivityPub&amp;quot; -&amp;gt; &amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;Blog-Profile&amp;quot; -&amp;gt; &amp;quot;Change Header Image&amp;quot; = Select &amp;quot;1day.jpg&amp;quot;, cropped such that the bottom is exactly the bottom of Catarina&#039;s white coat&lt;br /&gt;
	&amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;General&amp;quot; -&amp;gt; &amp;quot;Choose a Site Icon&amp;quot; -&amp;gt; Select &amp;quot;OSE-logo-blueprint-bg-v3-1blarge.jpg&amp;quot;, cropped such that there is only a small buffer on the left &amp;amp; right of the text &lt;br /&gt;
&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; tick the box that said &amp;quot;enable login security policies&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; tick the box that said &amp;quot;Activate password policies&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; change &amp;quot;Passwords must be X characters minimum&amp;quot; to &amp;quot;20&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; uncheck &amp;quot;Password must contain at least one uppercase and one lowercase character. &amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; uncheck &amp;quot;Password must contain at least one numeric character (0-9).&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; uncheck &amp;quot;Password must contain at least one special character, i.e., a character that is not a letter or a umber, such as ( , ? € ! @ # * etc&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; check &amp;quot;Reset password on first login &amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; check &amp;quot;Do not send password reset links &amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; check &amp;quot;Activate failed login policies &amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; change &amp;quot;When a user is locked&amp;quot; from &amp;quot;it can be only unlocked by the administrator&amp;quot; to &amp;quot;unlock it after 60 minutes&amp;quot;&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login Security Policies&amp;quot; -&amp;gt; uncheck &amp;quot;Require blocked users to reset password on unblock. &amp;quot;&lt;br /&gt;
	Click the &amp;quot;Save Changes&amp;quot; button&lt;br /&gt;
	&amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login page hardening&amp;quot; -&amp;gt; in the input form next to &amp;quot;Login page URL&amp;quot;, I enter &amp;quot;ose-hidden-login&amp;quot;&lt;br /&gt;
	Click the &amp;quot;Save Changes&amp;quot; button&lt;br /&gt;
&lt;br /&gt;
	&amp;quot;Settings&amp;quot; -&amp;gt; &amp;quot;Google Authenticator&amp;quot; -&amp;gt; Check every box under &amp;quot;Roles requiring Google Authenticator Enabled&amp;quot;&lt;br /&gt;
	Click &amp;quot;Save Changes&amp;quot; Button&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, I changed a few things, like &#039;s/ose/obi/&#039; for activitypub&lt;br /&gt;
# anyway, I&#039;d say this site is done. Yay! I think that&#039;s our second finished site, after the static forum site X_x&lt;br /&gt;
# ...&lt;br /&gt;
# ok, well, that does it for all our wordpress sites; all the other ones are stuck in some way&lt;br /&gt;
# next-up is phpList&lt;br /&gt;
# looks like I never actually did a 3TOFU of phpList?&lt;br /&gt;
# I did a 3TOFU of mediawiki and wordpress when I started this project back in August, but phpList is absent https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Sun_Aug_04.2C_2024&lt;br /&gt;
# I was hoping that maybe 3TOFU wouldn&#039;t be necessary (because their releases were signed), but it looks like I opened an issue about that in Sep of 2023. The lead dev said it&#039;s a good idea, but they&#039;re still not signing their releases https://github.com/phpList/phplist3/issues/987&lt;br /&gt;
# looks like the latest version of phpList is v3.6.14. Currently we&#039;re running v3.3.3.&lt;br /&gt;
# https://altushost-swe.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip&lt;br /&gt;
# Here&#039;s a 3TOFU script for the latest version of phpList&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
REMOTE_FILES=&amp;quot;https://altushost-swe.dl.sourceforge.net/project/phplist/phplist/3.6.14/phplist-3.6.14.zip&amp;quot;&lt;br /&gt;
&lt;br /&gt;
CURL=&amp;quot;/usr/bin/curl --retry 5 --retry-all-errors&amp;quot;&lt;br /&gt;
PYTHON=&amp;quot;/usr/bin/python3&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if [[ &amp;quot;`whoami`&amp;quot; == &amp;quot;amnesia&amp;quot; ]] ; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
	PYTHON=&amp;quot;/usr/bin/torify ${PYTHON}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	${CURL} --progress-bar -O ${file}&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s TOFU 1/3 (Tor, exit in Germany)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Congratulations. This browser is configured to use Tor.&lt;br /&gt;
2024-12-26&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-26&lt;br /&gt;
9e17cb15dd75bbbd5dbb984eda674863c3b10ab72613cf8a39a00c3e11a8492a  phplist-3.6.14.zip&lt;br /&gt;
user@host:/tmp/user/1000/tmp.ehfWbAUg29$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
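# A single TOFU hash only becomes meaningful once it is compared with the hashes from the other runs, which are taken over different network paths and on different days. Below is a minimal sketch of that comparison step. It uses a hypothetical helper named tofu_compare that is not part of any existing tool.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original log): compare the sha256 hashes
# recorded by independent TOFU downloads of the same file.
# Prints MATCH and returns 0 only if every hash is identical.
# Otherwise it prints the first mismatching pair and returns 1.
# Returns 2 if fewer than two hashes are given.
tofu_compare() {
	if [ "$#" -lt 2 ]; then
		echo "usage: tofu_compare HASH1 HASH2 [HASH3 ...]"
		return 2
	fi
	local first="$1"
	shift
	local h
	for h in "$@"; do
		if [ "${h}" != "${first}" ]; then
			echo "MISMATCH: ${first} != ${h}"
			return 1
		fi
	done
	echo "MATCH"
	return 0
}

# example, once all three runs are done:
#   tofu_compare HASH_RUN1 HASH_RUN2 HASH_RUN3
```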
# ...&lt;br /&gt;
# well, until the 3TOFU on phpList is done, I think all that remains is mediawiki. Fortunately, that&#039;s one of the most secure packages we run, because its releases are signed!&lt;br /&gt;
# when I last checked on this in August, the latest LTS of MediaWiki was 1.39 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Sun_Aug_04.2C_2024&lt;br /&gt;
# good news: a newer LTS has been released since then (1.43), which just came out 5 days ago. Good timing! It&#039;s going to be supported until Dec 2027 https://www.mediawiki.org/wiki/Version_lifecycle&lt;br /&gt;
# here&#039;s the blog post from *2 days ago* about the release https://www.pro.wiki/news/what-new-features-in-mediawiki-1-43-release&lt;br /&gt;
## shit, the blog post says you can&#039;t upgrade directly to v1.43 from v1.34 or earlier. To avoid data loss, we have to update to v1.35 first.&lt;br /&gt;
## It looks like we&#039;re running MediaWiki v1.30.0 on hetzner2&lt;br /&gt;
## So we know we need to upgrade to v1.35 first, but the next question is: can we upgrade directly from v1.30.0 to v1.35?&lt;br /&gt;
## here&#039;s the page that lists all the versions of MediaWiki https://en.wikipedia.org/wiki/MediaWiki_version_history&lt;br /&gt;
## unfortunately, the link to the release notes for v1.35 is broken https://phabricator.wikimedia.org/source/mediawiki/browse/REL1_35/RELEASE-NOTES-1.35&lt;br /&gt;
## I guess we&#039;ll have to download the release and read the UPGRADE file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget https://releases.wikimedia.org/mediawiki/1.35/mediawiki-1.35.0.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, cool, the UPGRADE notes say that upgrading &amp;quot;from 1.3 or earlier ... should generally go smoothly&amp;quot; (though, read carefully, that section is about the ancient 1.3 release, not 1.30)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp1974:~$ grep -A9 1.3 mediawiki-1.35.0/UPGRADE &lt;br /&gt;
== Upgrading from 1.3 or earlier ==&lt;br /&gt;
&lt;br /&gt;
This should generally go smoothly.&lt;br /&gt;
&lt;br /&gt;
If you keep your LocalSettings.php, you may need to change the style paths to&lt;br /&gt;
match the newly rearranged skin modules. Change these lines:&lt;br /&gt;
  $wgStylePath        = &amp;quot;$wgScriptPath/stylesheets&amp;quot;;&lt;br /&gt;
  $wgStyleDirectory   = &amp;quot;$IP/stylesheets&amp;quot;;&lt;br /&gt;
  $wgLogo             = &amp;quot;$wgStylePath/images/wiki.png&amp;quot;;&lt;br /&gt;
&lt;br /&gt;
--&lt;br /&gt;
Note that the 1.3 beta releases included a potential vulnerability if PHP is&lt;br /&gt;
configured with register_globals on and the includes directory is served to the&lt;br /&gt;
Web. For general safety, turn register_globals *off* if you don&#039;t _really_ need&lt;br /&gt;
it for another package.&lt;br /&gt;
&lt;br /&gt;
If your hosting provider turns it on and you can&#039;t turn it off yourself, send&lt;br /&gt;
them a kind note explaining that it can expose their servers and their customers&lt;br /&gt;
to attacks.&lt;br /&gt;
&lt;br /&gt;
== Upgrading from 1.2 or earlier ==&lt;br /&gt;
user@disp1974:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
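# The grep pattern 1.3 above is an unanchored regex in which the dot matches any character, so it is not a precise way to find the section for a given version. Below is a minimal sketch that lists every section heading, so you can see which version ranges the file actually covers. It assumes the UPGRADE file keeps the == Upgrading from X == heading style shown above, and the helper name is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line number and text of every line that
# starts with "== Upgrading from " in a MediaWiki UPGRADE file.
# Assumes the heading format seen in mediawiki-1.35.0/UPGRADE.
list_upgrade_sections() {
	grep -n -E '^== Upgrading from ' "$1"
}

# example:
#   list_upgrade_sections mediawiki-1.35.0/UPGRADE
```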
# the security note sounds a bit alarming, and more so because I didn&#039;t see anywhere that we disable register_globals on hetzner2 or hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# grep -ir globals /etc/php.ini&lt;br /&gt;
; globals: GET, POST, COOKIE, ENV and SERVER. There is a performance penalty&lt;br /&gt;
; variables_order directive. It does not mean it will leave the super globals&lt;br /&gt;
; http://php.net/auto-globals-jit&lt;br /&gt;
auto_globals_jit = On&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# not to worry, though. It seems that PHP removed register_globals entirely in PHP 5.4 https://www.a2hosting.com/kb/developer-corner/php/using-php.ini-directives/php-register-globals-directive&lt;br /&gt;
# and we&#039;re using php 5.6 on hetzner2 already&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology current]# rpm -qa | grep -i php | grep -i mysql&lt;br /&gt;
php56w-mysql-5.6.40-1.w7.x86_64&lt;br /&gt;
[root@opensourceecology current]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
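# Grepping for the installed php56w-mysql package is an indirect way to learn the PHP version. Below is a minimal sketch of a more direct check. It assumes the php CLI on the server is the same version that Apache runs. The PHP_VERSION string is passed to a hypothetical helper that succeeds only for 5.4.0 or newer, since register_globals was removed in PHP 5.4.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: given a PHP version string such as 5.6.40, return 0 if
# it is 5.4.0 or newer (register_globals was removed in PHP 5.4), else 1.
php_lacks_register_globals() {
	local version="$1"
	local major="${version%%.*}"
	local rest="${version#*.}"
	local minor="${rest%%.*}"
	if [ "${major}" -gt 5 ]; then
		return 0
	fi
	if [ "${major}" -eq 5 ]; then
		if [ "${minor}" -ge 4 ]; then
			return 0
		fi
	fi
	return 1
}

# example:
#   if php_lacks_register_globals "$(php -r 'echo PHP_VERSION;')"; then echo "register_globals is gone"; fi
```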
# alright, so first we need to figure out how to migrate mediawiki from hetzner2 to hetzner3. Then upgrade it to v1.35. Then upgrade it to v1.43.&lt;br /&gt;
# first, let&#039;s download and verify the releases for 1.35 and 1.43&lt;br /&gt;
# I know that I&#039;ve previously 3TOFU&#039;d the MediaWiki release signing key, but it looks like I never imported it into my keyring on hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ gpg --list-keys&lt;br /&gt;
/home/maltfield/.gnupg/pubring.kbx&lt;br /&gt;
----------------------------------&lt;br /&gt;
pub   rsa2048 2018-05-31 [SC]&lt;br /&gt;
	  63AF7AA15067C05616FDDD88A3A2E8F226F0BC06&lt;br /&gt;
uid           [ unknown] WP-CLI Releases &amp;lt;releases@wp-cli.org&amp;gt;&lt;br /&gt;
sub   rsa2048 2018-05-31 [E]&lt;br /&gt;
&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here are the 3TOFUs I did in the past for the keys.txt file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
2024-08-04&lt;br /&gt;
2e943991a469cb28f4906148b2c3517ab6d5a9285e5342e2312c9f70e643955c  keys.txt&lt;br /&gt;
&lt;br /&gt;
2024-08-07&lt;br /&gt;
2e943991a469cb28f4906148b2c3517ab6d5a9285e5342e2312c9f70e643955c  keys.txt&lt;br /&gt;
&lt;br /&gt;
2024-09-10&lt;br /&gt;
2e943991a469cb28f4906148b2c3517ab6d5a9285e5342e2312c9f70e643955c  keys.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
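# Before importing keys.txt into the keyring on hetzner3, it is worth re-checking that the copy about to be imported is byte-identical to the one the three past TOFU runs agreed on. Below is a minimal sketch of that check. It assumes keys.txt, the release tarball and its detached signature are already downloaded to the current directory. The signature filename mediawiki-1.35.0.tar.gz.sig and the helper name are both assumptions.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: import a GPG key file only if its sha256 matches the
# value recorded by earlier 3TOFU runs.
# Returns 1 on a hash mismatch without touching the keyring.
# Otherwise it returns the exit status of gpg --import.
import_if_tofu_matches() {
	local keys_file="$1"
	local expected_sha256="$2"
	local actual_sha256
	actual_sha256="$(sha256sum "${keys_file}" | cut -d' ' -f1)"
	if [ "${actual_sha256}" != "${expected_sha256}" ]; then
		echo "MISMATCH: ${keys_file} has sha256 ${actual_sha256}, refusing to import"
		return 1
	fi
	gpg --import "${keys_file}"
}

# example (the .sig filename is an assumption):
#   import_if_tofu_matches keys.txt 2e943991a469cb28f4906148b2c3517ab6d5a9285e5342e2312c9f70e643955c
#   gpg --verify mediawiki-1.35.0.tar.gz.sig mediawiki-1.35.0.tar.gz
```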
# and here&#039;s what I get now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cd /var&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Wed Dec 25, 2024=&lt;br /&gt;
&lt;br /&gt;
# I&#039;m still waiting to hear back on the plugin question for store.opensourceecology.org from Marcin &amp;amp; Catarina.&lt;br /&gt;
# ...&lt;br /&gt;
# in the meantime, I&#039;m going to push forward and start working on microfactory.opensourceecology.org&lt;br /&gt;
# I haven&#039;t even begun to migrate this from hetzner2 to hetzner3 yet, so first I copied the steps from my CHG ticket for store, changing just a few of the vars https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3#Change_Steps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner2 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;microfactory.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;microfactory_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: BACKUP DB&lt;br /&gt;
mkdir -p ${backupDir_hetzner2}/{current,old}&lt;br /&gt;
pushd ${backupDir_hetzner2}/current/&lt;br /&gt;
mv ${backupDir_hetzner2}/current/* ${backupDir_hetzner2}/old/&lt;br /&gt;
&lt;br /&gt;
time nice mysqldump -u&amp;quot;${dbUser}&amp;quot; -p&amp;quot;${dbPass}&amp;quot; ${dbName} | bzip2 -c &amp;gt; ${backupDir_hetzner2}/current/${backupFileName_db_hetzner2}&lt;br /&gt;
&lt;br /&gt;
# STEP 3: BACKUP FILES&lt;br /&gt;
time nice tar -czvf ${backupDir_hetzner2}/current/${backupFileName_files_hetzner2} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# STEP 4: COPY TO HETZNER3&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mkdir -p ${backupDir_hetzner3}/{current,old}&lt;br /&gt;
ssh -p 32415 maltfield@hetzner3 sudo mv ${backupDir_hetzner3}/current/* ${backupDir_hetzner3}/old/&lt;br /&gt;
rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
&lt;br /&gt;
####################&lt;br /&gt;
# run on hetzner3 #&lt;br /&gt;
####################&lt;br /&gt;
&lt;br /&gt;
sudo su -&lt;br /&gt;
&lt;br /&gt;
# DECLARE VARIABLES&lt;br /&gt;
vhost_name=&#039;microfactory.opensourceecology.org&#039;&lt;br /&gt;
dbName=&#039;microfactory_db&#039;&lt;br /&gt;
 dbUser=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
 dbPass=&amp;quot;CHANGEME&amp;quot;&lt;br /&gt;
&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
stamp=`date +%Y%m%d`&lt;br /&gt;
backupDir_hetzner2=&amp;quot;/var/tmp/backups_for_migration_to_hetzner3/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupDir_hetzner3=&amp;quot;/var/tmp/backups_for_migration_from_hetzner2/${vhost_name}_${stamp}&amp;quot;&lt;br /&gt;
backupFileName_db_hetzner2=&amp;quot;mysqldump_${vhost_name}.${stamp}.sql.bz2&amp;quot;&lt;br /&gt;
backupFileName_files_hetzner2=&amp;quot;${vhost_name}_files.${stamp}.tar.gz&amp;quot;&lt;br /&gt;
vhostDir=&amp;quot;/var/www/html/${vhost_name}&amp;quot;&lt;br /&gt;
docrootDir=&amp;quot;${vhostDir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 1: ADD DB&lt;br /&gt;
&lt;br /&gt;
# create backup before we start changing the sql file&lt;br /&gt;
pushd ${backupDir_hetzner3}/current&lt;br /&gt;
cp ${backupFileName_db_hetzner2} ${backupFileName_db_hetzner2}.orig&lt;br /&gt;
&lt;br /&gt;
# extract .sql.bz2 -&amp;gt; .sql&lt;br /&gt;
bzip2 -dc ${backupFileName_db_hetzner2} &amp;gt; db.sql&lt;br /&gt;
&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;DROP DATABASE IF EXISTS ${dbName};&amp;quot; &lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;CREATE DATABASE ${dbName}; USE ${dbName};&amp;quot;&lt;br /&gt;
 time nice mysql ${dbName} -uroot -p${mysqlPass} &amp;lt; &amp;quot;db.sql&amp;quot;&lt;br /&gt;
 time nice mysql -uroot -p${mysqlPass} -sNe &amp;quot;GRANT ALL ON ${dbName}.* TO &#039;${dbUser}&#039;@&#039;localhost&#039; IDENTIFIED BY &#039;${dbPass}&#039;; FLUSH PRIVILEGES;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# STEP 2: Add vhost files&lt;br /&gt;
mv &amp;quot;${vhostDir}&amp;quot; &amp;quot;${backupDir_hetzner3}/old/${vhost_name}.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&amp;quot;&lt;br /&gt;
tar -xzvf ${backupFileName_files_hetzner2}&lt;br /&gt;
mv var/www/html/${vhost_name} ${vhostDir}&lt;br /&gt;
&lt;br /&gt;
# remove &#039;.svn&#039; dirs (we no longer use svn, for security)&lt;br /&gt;
find ${docrootDir} -iname &#039;.svn&#039; -exec rm -rf &#039;{}&#039; \;&lt;br /&gt;
&lt;br /&gt;
# add wordpress bug fix&lt;br /&gt;
# is the bug fix already present?&lt;br /&gt;
if [[ ! $(grep &#039;https://core.trac.wordpress.org/ticket/48693&#039; ${vhostDir}/wp-config.php) ]]; then&lt;br /&gt;
	# the bug fix is absent; add it&lt;br /&gt;
&lt;br /&gt;
	backup_filename=&amp;quot;wp-config.`date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;`.php&amp;quot;&lt;br /&gt;
	mv ${vhostDir}/wp-config.php ${vhostDir}/${backup_filename}&lt;br /&gt;
&lt;br /&gt;
	cat &amp;gt; ${vhostDir}/wp-config.php &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
&lt;br /&gt;
# fix wordpress bugs&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
# * https://core.trac.wordpress.org/ticket/62693&lt;br /&gt;
if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
        function ini_set(){&lt;br /&gt;
                return;&lt;br /&gt;
        }&lt;br /&gt;
}&lt;br /&gt;
if( ! function_exists(&#039;chmod&#039;) ){&lt;br /&gt;
        function chmod(){&lt;br /&gt;
                return;&lt;br /&gt;
        }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
	tail -n +2 ${vhostDir}/${backup_filename} &amp;gt;&amp;gt; ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
ls&lt;br /&gt;
vim ${vhostDir}/wp-config.php&lt;br /&gt;
&lt;br /&gt;
# UPDATE CORE&lt;br /&gt;
&lt;br /&gt;
rsync -av --progress /var/tmp/wordpress/core/wordpress/ ${docrootDir}&lt;br /&gt;
&lt;br /&gt;
# UPDATE OLD PLUGINS&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${docrootDir}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
        plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
        source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
        echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
        rm -rf ${plugin_path};&lt;br /&gt;
        if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
                rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# INSTALL NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
new_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security wps-hide-login raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin in ${new_plugins}; do&lt;br /&gt;
        plugin_path=&amp;quot;${docrootDir}/wp-content/plugins/${plugin}&amp;quot;&lt;br /&gt;
        source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
        if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
                echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
                rm -rf ${plugin_path};&lt;br /&gt;
                rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# UPDATE/INSTALL THEMES&lt;br /&gt;
&lt;br /&gt;
for theme_path in $(find &amp;quot;${docrootDir}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf ${theme_path};&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a ${source_path}/ &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# SET PERMISSIONS&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
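# quote the pattern so the shell cannot glob-expand it against files in the current dir&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;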
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
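# quote the pattern so the shell cannot glob-expand it against files in the current dir&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;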
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
 &lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# ACTIVATE NEW PLUGINS&lt;br /&gt;
&lt;br /&gt;
activate_plugins=&amp;quot;activitypub aurora-heatmap melapress-login-security&amp;quot;&lt;br /&gt;
for plugin in ${activate_plugins}; do&lt;br /&gt;
	sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate ${plugin}&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
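# A long, hand-maintained list like new_plugins can easily pick up repeated entries when it is copied between CHG tickets. Below is a minimal sketch of a hypothetical dedupe_list helper (not part of the CHG). It drops duplicates while keeping the first occurrence of each name in its original order.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print a whitespace-separated list with duplicate
# entries removed, keeping the first occurrence of each in its original order.
# $1 is deliberately left unquoted so it is split into words, so entries must
# not contain spaces or glob characters (true for WordPress plugin slugs).
dedupe_list() {
	printf '%s\n' $1 | awk '!seen[$0]++' | paste -sd ' ' -
}

# example:
#   new_plugins="$(dedupe_list "${new_plugins}")"
```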
# I updated /etc/hosts and tried to load the login page for the new site on hetzner3, but I got a cert error https://microfactory.opensourceecology.org/wp-login.php&lt;br /&gt;
# I checked, and the cert seems valid: microfactory.opensourceecology.org is already in its list of Domains. Actually I don&#039;t know how that is, but I&#039;ll take it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # certbot certificates&lt;br /&gt;
Saving debug log to /var/log/letsencrypt/letsencrypt.log&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
Found the following certs:&lt;br /&gt;
  Certificate Name: openbuildinginstitute.org&lt;br /&gt;
    Serial Number: 37201823a671e8c6da8373cddd4efde6c6a&lt;br /&gt;
    Key Type: RSA&lt;br /&gt;
    Domains: www.openbuildinginstitute.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org&lt;br /&gt;
    Expiry Date: 2025-03-11 08:00:42+00:00 (VALID: 75 days)&lt;br /&gt;
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem&lt;br /&gt;
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem&lt;br /&gt;
  Certificate Name: opensourceecology.org&lt;br /&gt;
    Serial Number: 3ec0988ac3af1baa0909ce9a9f4a6409c21&lt;br /&gt;
    Key Type: RSA&lt;br /&gt;
    Domains: fef.opensourceecology.org awstats.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org staging.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org www.opensourceecology.org&lt;br /&gt;
    Expiry Date: 2025-03-11 08:00:47+00:00 (VALID: 75 days)&lt;br /&gt;
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem&lt;br /&gt;
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current #&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I tried to reload nginx, and then I refreshed the page. That made the cert error go away, but now I get a 404 from nginx; I guess I never set up the nginx/varnish/apache vhost configs with ansible?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # nginx -t&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
2024/12/26 04:08:14 [warn] 1428644#1428644: the &amp;quot;ssl&amp;quot; directive is deprecated, use the &amp;quot;listen ... ssl&amp;quot; directive instead in /etc/nginx/conf.d/https.opensourceecology.org.include:12&lt;br /&gt;
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok&lt;br /&gt;
nginx: configuration file /etc/nginx/nginx.conf test is successful&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # systemctl reload nginx&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current #   &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, it&#039;s pretty bare in the nginx configs dir&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # ls -lah /etc/nginx/sites-enabled/&lt;br /&gt;
total 20K&lt;br /&gt;
drwxr-xr-x 2 root root 4,0K Sep 27 04:47 .&lt;br /&gt;
drwxr-xr-x 8 root root 4,0K Oct  5 03:45 ..&lt;br /&gt;
-rw-r--r-- 1 root root  659 Sep 24 04:16 00-default.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,8K Sep 24 04:16 awstats.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   59 Sep 24 04:17 forum.opensourceecology.org.conf -&amp;gt; /etc/nginx/sites-available/forum.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,5K Sep 24 04:16 munin.opensourceecology.org.conf&lt;br /&gt;
lrwxrwxrwx 1 root root   59 Sep 27 04:47 store.opensourceecology.org.conf -&amp;gt; /etc/nginx/sites-available/store.opensourceecology.org.conf&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # ls -lah /etc/nginx/sites-available/&lt;br /&gt;
total 24K&lt;br /&gt;
drwxr-xr-x 2 root root 4,0K Oct  5 03:45 .&lt;br /&gt;
drwxr-xr-x 8 root root 4,0K Oct  5 03:45 ..&lt;br /&gt;
-rw-r--r-- 1 root root 2,4K Mar 14  2023 default&lt;br /&gt;
-rw-r--r-- 1 root root 1,6K Sep 24 04:16 forum.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,6K Oct  5 03:44 store.opensourceecology.org.conf&lt;br /&gt;
-rw-r--r-- 1 root root 1,6K Sep 27 04:46 store.opensourceecology.org.conf.3932816.2024-10-05@03:45:29~&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I commented-out the &#039;microfactory.opensourceecology.org&#039; domain from the ansible provision.yml file, and re-ran `ansible-playbook provision.yml`. It failed with an ssh error :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ansible-playbook provision.yml &lt;br /&gt;
[WARNING]: While constructing a mapping from&lt;br /&gt;
/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.nginx/tasks/main.yml,&lt;br /&gt;
line 8, column 3, found a duplicate dict key (command). Using last defined&lt;br /&gt;
value only.&lt;br /&gt;
[WARNING]: While constructing a mapping from /home/user/sandbox_local/ansible/h&lt;br /&gt;
etzner3/roles/maltfield.varnish/tasks/main.yml, line 107, column 3, found a&lt;br /&gt;
duplicate dict key (command). Using last defined value only.&lt;br /&gt;
&lt;br /&gt;
PLAY [hetzner3] ****************************************************************&lt;br /&gt;
&lt;br /&gt;
TASK [Gathering Facts] *********************************************************&lt;br /&gt;
[WARNING]: Unhandled error in Python interpreter discovery for host hetzner3:&lt;br /&gt;
SSH authentication is incorrect&lt;br /&gt;
fatal: [hetzner3]: UNREACHABLE! =&amp;gt; {&amp;quot;changed&amp;quot;: false, &amp;quot;msg&amp;quot;: &amp;quot;SSH authentication is incorrect&amp;quot;, &amp;quot;unreachable&amp;quot;: true}&lt;br /&gt;
&lt;br /&gt;
PLAY RECAP *********************************************************************&lt;br /&gt;
hetzner3                   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   &lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# curiously, it appears that everything&#039;s working great with just ssh&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ssh hetzner3&lt;br /&gt;
Linux hetzner3 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64&lt;br /&gt;
&lt;br /&gt;
The programs included with the Debian GNU/Linux system are free software;&lt;br /&gt;
the exact distribution terms for each program are described in the&lt;br /&gt;
individual files in /usr/share/doc/*/copyright.&lt;br /&gt;
&lt;br /&gt;
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent&lt;br /&gt;
permitted by applicable law.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, I don&#039;t think ansible is using my `~/.ssh/config` file; connecting with the same parameters as in the ansible `hosts` file fails&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ cat hosts &lt;br /&gt;
hetzner3 ansible_port=32415 ansible_host=144.76.164.201 ansible_user=maltfield&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ssh -p32415 maltfield@144.76.164.201&lt;br /&gt;
maltfield@144.76.164.201: Permission denied (publickey).&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, so the issue was that I had switched to a distinct ssh-agent socket file for OSE, to prevent my keys for one org from being forwarded onto a different org&#039;s servers&lt;br /&gt;
# I fixed this by re-creating the ssh-agent socket file.&lt;br /&gt;
## Note that I had to wrap it in `eval`; otherwise the env variables just get printed instead of actually being set in the current shell&lt;br /&gt;
## Also note that I then had to actually do an `ssh hetzner3` to prompt for my passphrase and add the (decrypted) key to the agent; ansible still wouldn&#039;t work the first time until I did this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ echo $SSH_AUTH_SOCK &lt;br /&gt;
/tmp/ssh-XXXXXXw7kxUB/agent.807&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ rm /home/user/cloud/.ssh/identities/ose/ose-agent; eval `ssh-agent -a /home/user/cloud/.ssh/identities/ose/ose-agent`&lt;br /&gt;
Agent pid 41866&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ echo $SSH_AUTH_SOCK &lt;br /&gt;
/home/user/cloud/.ssh/identities/ose/ose-agent&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ansible-playbook provision.yml &lt;br /&gt;
[WARNING]: While constructing a mapping from&lt;br /&gt;
/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.nginx/tasks/main.yml,&lt;br /&gt;
line 8, column 3, found a duplicate dict key (command). Using last defined&lt;br /&gt;
value only.&lt;br /&gt;
[WARNING]: While constructing a mapping from /home/user/sandbox_local/ansible/h&lt;br /&gt;
etzner3/roles/maltfield.varnish/tasks/main.yml, line 107, column 3, found a&lt;br /&gt;
duplicate dict key (command). Using last defined value only.&lt;br /&gt;
&lt;br /&gt;
PLAY [hetzner3] ****************************************************************&lt;br /&gt;
&lt;br /&gt;
TASK [Gathering Facts] *********************************************************&lt;br /&gt;
[WARNING]: Unhandled error in Python interpreter discovery for host hetzner3:&lt;br /&gt;
SSH authentication is incorrect&lt;br /&gt;
fatal: [hetzner3]: UNREACHABLE! =&amp;gt; {&amp;quot;changed&amp;quot;: false, &amp;quot;msg&amp;quot;: &amp;quot;SSH authentication is incorrect&amp;quot;, &amp;quot;unreachable&amp;quot;: true}&lt;br /&gt;
&lt;br /&gt;
PLAY RECAP *********************************************************************&lt;br /&gt;
hetzner3                   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   &lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ssh hetzner3&lt;br /&gt;
Enter passphrase for key &#039;/home/user/.ssh/identities/ose/id_rsa.ose&#039;: &lt;br /&gt;
Enter passphrase for key &#039;/home/user/.ssh/identities/ose/id_rsa.ose&#039;: &lt;br /&gt;
Linux hetzner3 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64&lt;br /&gt;
&lt;br /&gt;
The programs included with the Debian GNU/Linux system are free software;&lt;br /&gt;
the exact distribution terms for each program are described in the&lt;br /&gt;
individual files in /usr/share/doc/*/copyright.&lt;br /&gt;
&lt;br /&gt;
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent&lt;br /&gt;
permitted by applicable law.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
logout&lt;br /&gt;
Connection to 144.76.164.201 closed.&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ssh hetzner3&lt;br /&gt;
Linux hetzner3 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64&lt;br /&gt;
&lt;br /&gt;
The programs included with the Debian GNU/Linux system are free software;&lt;br /&gt;
the exact distribution terms for each program are described in the&lt;br /&gt;
individual files in /usr/share/doc/*/copyright.&lt;br /&gt;
&lt;br /&gt;
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent&lt;br /&gt;
permitted by applicable law.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
logout&lt;br /&gt;
Connection to 144.76.164.201 closed.&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$&lt;br /&gt;
&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ ansible-playbook provision.yml &lt;br /&gt;
[WARNING]: While constructing a mapping from&lt;br /&gt;
/home/user/sandbox_local/ansible/hetzner3/roles/maltfield.nginx/tasks/main.yml,&lt;br /&gt;
line 8, column 3, found a duplicate dict key (command). Using last defined&lt;br /&gt;
value only.&lt;br /&gt;
[WARNING]: While constructing a mapping from /home/user/sandbox_local/ansible/h&lt;br /&gt;
etzner3/roles/maltfield.varnish/tasks/main.yml, line 107, column 3, found a&lt;br /&gt;
duplicate dict key (command). Using last defined value only.&lt;br /&gt;
&lt;br /&gt;
PLAY [hetzner3] ****************************************************************&lt;br /&gt;
&lt;br /&gt;
TASK [Gathering Facts] *********************************************************&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
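# the agent-socket fix above could be wrapped in a small shell function. This is a minimal sketch, assuming the same socket path as above (the function name is hypothetical). It relies on &amp;quot;ssh-add -l&amp;quot; exiting with status 2 when it can&#039;t reach an agent, and it uses &amp;quot;ssh-add&amp;quot; to load the key up front instead of a throw-away &amp;quot;ssh hetzner3&amp;quot; login&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: make sure the OSE-only ssh-agent is listening on its fixed socket&lt;br /&gt;
ose_agent() {&lt;br /&gt;
  local sock=&amp;quot;${1:-/home/user/cloud/.ssh/identities/ose/ose-agent}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
  # ssh-add -l exits 0 (keys loaded), 1 (no keys loaded) or 2 (can&#039;t contact an agent)&lt;br /&gt;
  SSH_AUTH_SOCK=&amp;quot;${sock}&amp;quot; ssh-add -l &amp;gt;/dev/null 2&amp;gt;&amp;amp;1&lt;br /&gt;
  if [ $? -eq 2 ]; then&lt;br /&gt;
    # a leftover socket file makes &amp;quot;ssh-agent -a&amp;quot; fail to bind, so remove it first&lt;br /&gt;
    rm -f &amp;quot;${sock}&amp;quot;&lt;br /&gt;
    # eval sets SSH_AUTH_SOCK and SSH_AGENT_PID in the current shell (instead of just printing them)&lt;br /&gt;
    eval &amp;quot;$(ssh-agent -a &amp;quot;${sock}&amp;quot;)&amp;quot; &amp;gt;/dev/null&lt;br /&gt;
  fi&lt;br /&gt;
  export SSH_AUTH_SOCK=&amp;quot;${sock}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   ose_agent&lt;br /&gt;
#   ssh-add /home/user/.ssh/identities/ose/id_rsa.ose   # prompts for the passphrase&lt;br /&gt;
#   ansible-playbook provision.yml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;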
# ansible finished but with an error on the restart&lt;br /&gt;
# I tried reloading the page in the web browser, but I got the same 404 error from apache&lt;br /&gt;
# I manually restarted the stack&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # systemctl restart apache2 varnish nginx&lt;br /&gt;
root@hetzner3 /var/tmp/backups_for_migration_from_hetzner2/microfactory.opensourceecology.org_20241226/current # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then I refreshed the page, and now I see a wordpress login page :D https://microfactory.opensourceecology.org/wp-login.php&lt;br /&gt;
# I was able to login using the same creds from hetzner2, and then it dumped me to the upgrade page. I clicked the &amp;quot;Update WordPress Database&amp;quot; button&lt;br /&gt;
# I then loaded the front page, and it&#039;s very broken; there are shortcodes that aren&#039;t being resolved&lt;br /&gt;
# this is actually a site where we *do* want to keep the oshine theme set up, so we should figure out which plugins it needs (and how to get them)&lt;br /&gt;
# the top of the page yells at us about the plugins we&#039;re missing&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This theme requires the following plugins: BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu.&lt;br /&gt;
&lt;br /&gt;
This theme recommends the following plugins: BE GDPR, Master Slider, Meta Box Framework, Safe SVG and Slider Revolution.&lt;br /&gt;
&lt;br /&gt;
The following recommended plugin is currently inactive: WPForms Lite.&lt;br /&gt;
&lt;br /&gt;
Begin installing plugins | Activate installed plugin | Dismiss this notice &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so that&#039;s 7 required plugins that we&#039;re missing&lt;br /&gt;
# the shortcodes that are broken on the front page are for &amp;quot;tatsu&amp;quot;, so I kinda want to look into that first&lt;br /&gt;
# it sucks that they don&#039;t give the slug for these plugins to avoid ambiguity, but I just googled &amp;quot;tatsu&amp;quot; and -- uhh -- the second link is an article from BleepingComputer&lt;br /&gt;
## https://www.bleepingcomputer.com/news/security/hackers-target-tatsu-wordpress-plugin-in-millions-of-attacks/&lt;br /&gt;
# as far as I can tell, our hetzner2 server would be vulnerable if we hadn&#039;t explicitly hardened our permissions so that the webserver can&#039;t edit its own files outside of the uploads dir (and the uploads dir doesn&#039;t allow php to be executed inside of it)&lt;br /&gt;
## this is exactly the same reason that we can&#039;t simply press the &amp;quot;install&amp;quot; button to install these plugins. I know it&#039;s annoying, but clearly it&#039;s saved our ass.&lt;br /&gt;
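# for reference, here&#039;s a minimal sketch for double-checking that hardening on a docroot: it lists everything a given user can write to outside of wp-content/uploads, so ideally it prints nothing when run as the webserver user. It assumes GNU find (for &amp;quot;-writable&amp;quot;) and a webserver user of &amp;quot;www-data&amp;quot; (the Debian default; adjust if hetzner3 uses a different user); the function name is hypothetical&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print paths under a docroot that the *current* user can write to,&lt;br /&gt;
# pruning (skipping) the wp-content/uploads dir entirely&lt;br /&gt;
writable_outside_uploads() {&lt;br /&gt;
  local docroot=&amp;quot;${1%/}&amp;quot;   # strip any trailing slash so the -path match works&lt;br /&gt;
  find &amp;quot;${docroot}&amp;quot; -path &amp;quot;${docroot}/wp-content/uploads&amp;quot; -prune -o -writable -print&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: sudo doesn&#039;t pass shell functions through, so run the same find as the webserver user directly&lt;br /&gt;
#   docrootDir=&#039;/var/www/html/store.opensourceecology.org/htdocs&#039;&lt;br /&gt;
#   sudo -u www-data find &amp;quot;${docrootDir}&amp;quot; -path &amp;quot;${docrootDir}/wp-content/uploads&amp;quot; -prune -o -writable -print&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;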
# I tried searching wordpress.org/plugins for even the first plugin in their list (BE Portfolio Post Type), but the results were super unclear as to which plugin I need https://wordpress.org/plugins/search/BE+Portfolio+Post+Type/&lt;br /&gt;
# I sent an email to oshine (&amp;quot;BrandExponents&amp;quot; &amp;lt;support@brandexponents.com&amp;gt;) asking for them to translate these ambiguous human names to unambiguous plugin slugs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hi,&lt;br /&gt;
&lt;br /&gt;
Can you please tell me:&lt;br /&gt;
&lt;br /&gt;
1. What is the plugin slug&lt;br /&gt;
2. What is the download URL&lt;br /&gt;
&lt;br /&gt;
for all of the required plugins for the oshine theme (BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu)&lt;br /&gt;
&lt;br /&gt;
I&#039;ve tried searching wordpress.org/plugins for ^ these human-readable plugin names, but it&#039;s ambiguous. I&#039;d prefer the actual slugs, so I can make sure I know which plugin to download.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PS Your KB is inaccessible; it gives an error http://www.brandexponents.com/oshine-knowledgebase/&lt;br /&gt;
&lt;br /&gt;
  Forbidden&lt;br /&gt;
  &lt;br /&gt;
  You don&#039;t have permission to access this resource.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# otherwise, just clicking-through the links in the header nav bar, I realized that I get a 404 if I click Workshops -&amp;gt; Kansas Community College - Feb 11, 2019&lt;br /&gt;
# I do *not* get a 404 when doing the same on the hetzner2 site&lt;br /&gt;
# all of the other pages appeared to load, albeit totally broken (likely to be fixed once we install those required plugins)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sun Dec 22, 2024=&lt;br /&gt;
&lt;br /&gt;
# I tried-out the &#039;include-mastodon-feed&#039; plugin; it works fine (though it requires javascript)&lt;br /&gt;
# I think we should install it on all sites, but leave it deactivated for now&lt;br /&gt;
# ...&lt;br /&gt;
# I enabled the &#039;melapress-login-security&#039; plugin&lt;br /&gt;
# after enabling it, it redirected me to the page at wp dashboard -&amp;gt; Login Security&lt;br /&gt;
## I ticked the box that said &amp;quot;enable login security policies&amp;quot;&lt;br /&gt;
## I ticked the box that said &amp;quot;Activate password policies&amp;quot;&lt;br /&gt;
## I changed the &amp;quot;Passwords must be X characters minimum&amp;quot; to &amp;quot;20&amp;quot;&lt;br /&gt;
## I unchecked &amp;quot;Password must contain at least one uppercase and one lowercase character.&amp;quot;&lt;br /&gt;
## I unchecked &amp;quot;Password must contain at least one numeric character (0-9).&amp;quot;&lt;br /&gt;
## I unchecked &amp;quot;Password must contain at least one special character, i.e., a character that is not a letter or a number, such as ( , ? € ! @ # * etc&amp;quot;&lt;br /&gt;
## I checked &amp;quot;Reset password on first login&amp;quot;&lt;br /&gt;
## I checked &amp;quot;Do not send password reset links&amp;quot;&lt;br /&gt;
## I checked &amp;quot;Activate failed login policies&amp;quot;&lt;br /&gt;
## I changed &amp;quot;When a user is locked&amp;quot; from &amp;quot;it can be only unlocked by the administrator&amp;quot; to &amp;quot;unlock it after 60 minutes&amp;quot;&lt;br /&gt;
## I unchecked &amp;quot;Require blocked users to reset password on unblock.&amp;quot;&lt;br /&gt;
## I clicked the &amp;quot;Save Changes&amp;quot; button&lt;br /&gt;
# I clicked on wp dashboard -&amp;gt; &amp;quot;Login Security&amp;quot; -&amp;gt; &amp;quot;Login page hardening&amp;quot;&lt;br /&gt;
## In the input form next to &amp;quot;Login page URL&amp;quot;, I entered &amp;quot;ose-hidden-login&amp;quot;&lt;br /&gt;
## I clicked the &amp;quot;Save Changes&amp;quot; button&lt;br /&gt;
# I think that covers all the plugins that we want to enable on all sites. I sent an email to Marcin &amp;amp; Catarina asking them to approve my plan&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
Hey Catarina,&lt;br /&gt;
&lt;br /&gt;
Please let me know if you approve the following changes to all of your wordpress sites after they&#039;re migrated to hetzner3:&lt;br /&gt;
&lt;br /&gt;
[1] Install and activate &#039;melapress-login-security&#039;&lt;br /&gt;
&lt;br /&gt;
This plugin replaces &#039;rename-wp-login&#039; and &#039;force-strong-passwords&#039;, both of which are no longer available.&lt;br /&gt;
&lt;br /&gt;
[2] Install and activate &#039;activitypub&#039;&lt;br /&gt;
&lt;br /&gt;
This turns your wordpress site into a decentralized social media site, allowing accounts on the fediverse (eg mastodon, lemmy, threads, flipboard, friendica, peertube, pixelfed, etc) to follow you with ActivityPub.&lt;br /&gt;
&lt;br /&gt;
You probably won&#039;t notice a difference, but it&#039;s a good feature to enable going forward.&lt;br /&gt;
&lt;br /&gt;
[3] Install and activate &#039;aurora-heatmap&#039;&lt;br /&gt;
&lt;br /&gt;
This is a freemium plugin that lets you see where users have clicked as they browse your site. It&#039;s all locally-stored, privacy-friendly analytics that can be helpful in understanding how your users interact with your website.&lt;br /&gt;
&lt;br /&gt;
Moreover, I would also like to install (but not activate) the following plugins (so that you an enable them later, if needed) on all of your wordpress sites:&lt;br /&gt;
&lt;br /&gt;
	wps-hide-login&lt;br /&gt;
	raw-html&lt;br /&gt;
	related-posts-by-taxonomy&lt;br /&gt;
	smart-slider-3&lt;br /&gt;
	spam-destroyer&lt;br /&gt;
	coinpayments-payment-gateway-for-woocommerce&lt;br /&gt;
	woocommerce-gateway-stripe&lt;br /&gt;
	wpfront-notification-bar&lt;br /&gt;
	wordpress-seo&lt;br /&gt;
	wp-pgp-encrypted-emails&lt;br /&gt;
	woo-multi-currency&lt;br /&gt;
	woocommerce-multilingual&lt;br /&gt;
	include-mastodon-feed&lt;br /&gt;
	bulk-media-register&lt;br /&gt;
	enable-media-replace&lt;br /&gt;
	regenerate-thumbnails&lt;br /&gt;
	wp-qrcode&lt;br /&gt;
	wp-pgp-encrypted-emails&lt;br /&gt;
	woo-multi-currency&lt;br /&gt;
	woocommerce-multilingual&lt;br /&gt;
	include-mastodon-feed&lt;br /&gt;
	wp-2fa&lt;br /&gt;
	advanced-nocaptcha-recaptcha&lt;br /&gt;
	hcaptcha-for-forms-and-more&lt;br /&gt;
	leaflet-map&lt;br /&gt;
	extensions-leaflet-map&lt;br /&gt;
	wpforms-lite&lt;br /&gt;
&lt;br /&gt;
If there&#039;s any additional plugins that you&#039;d like me to make available (even to leave disabled), now is the time to let me know so I can add it to the list and the migration script/steps.&lt;br /&gt;
&lt;br /&gt;
Please let me know if you have any questions or concerns about this plan or if it looks good, and I can proceed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
https://www.michaelaltfield.net&lt;br /&gt;
PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&lt;br /&gt;
Note: If you cannot reach me via email, please check to see if I have changed my email address by visiting my website at https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
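# note that the plugin list in the email above repeats four slugs (wp-pgp-encrypted-emails, woo-multi-currency, woocommerce-multilingual and include-mastodon-feed). When that list goes into the migration script, a minimal sketch like this drops the repeats while keeping the original order (the function name and the plugins_to_install variable are hypothetical)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print each argument once, in first-seen order&lt;br /&gt;
dedupe_slugs() {&lt;br /&gt;
  # one slug per line; awk prints a line only the first time it sees it&lt;br /&gt;
  printf &#039;%s\n&#039; &amp;quot;$@&amp;quot; | awk &#039;!seen[$0]++&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage, eg feeding a space-separated list into a &amp;quot;wp plugin install&amp;quot; loop:&lt;br /&gt;
#   for plugin in $(dedupe_slugs ${plugins_to_install}); do ...; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;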
# I found commands to activate the plugins, and I added them to the CHG ticket https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3#Change_Steps&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ docrootDir=&#039;/var/www/html/store.opensourceecology.org/htdocs&#039;&lt;br /&gt;
sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate activitypub&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Plugin &#039;activitypub&#039; is already active.&lt;br /&gt;
Success: Plugin already activated.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&lt;br /&gt;
maltfield@hetzner3:~$ sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin deactivate activitypub&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Plugin &#039;activitypub&#039; deactivated.&lt;br /&gt;
Success: Deactivated 1 of 1 plugins.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&lt;br /&gt;
maltfield@hetzner3:~$ sudo -u wp -i wp --path=&amp;quot;${docrootDir}&amp;quot; plugin activate activitypub&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Plugin &#039;activitypub&#039; activated.&lt;br /&gt;
Success: Activated 1 of 1 plugins.&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
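# wp-cli&#039;s &amp;quot;plugin activate&amp;quot; accepts more than one slug, and (as shown above) it only warns if a plugin is already active, so re-running it should be harmless. Here&#039;s a minimal sketch for activating the three plugins proposed in the email in one go (the function name is hypothetical)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: activate one or more plugins on a docroot, as the &#039;wp&#039; user&lt;br /&gt;
activate_plugins() {&lt;br /&gt;
  local docroot=&amp;quot;$1&amp;quot;&lt;br /&gt;
  shift&lt;br /&gt;
  sudo -u wp -i wp --path=&amp;quot;${docroot}&amp;quot; plugin activate &amp;quot;$@&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   docrootDir=&#039;/var/www/html/store.opensourceecology.org/htdocs&#039;&lt;br /&gt;
#   activate_plugins &amp;quot;${docrootDir}&amp;quot; melapress-login-security activitypub aurora-heatmap&lt;br /&gt;
#&lt;br /&gt;
# the &amp;quot;Undefined array key HTTP_HOST&amp;quot; warnings above come from the vcaching plugin running under the CLI;&lt;br /&gt;
# wp-cli&#039;s global --skip-plugins=vcaching parameter would probably silence them (untested)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;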
# I went through the entire migration process/script for the store site https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3#Change_Steps&lt;br /&gt;
## I found and fixed several bugs&lt;br /&gt;
## I obviously didn&#039;t do the actual DNS changes; adding those steps to the article is still TODO&lt;br /&gt;
## I also didn&#039;t verify the backups, but that&#039;s pretty straight-forward&lt;br /&gt;
# ok, I think the only things blocking the migration of store are adding the DNS steps and getting Marcin/Catarina to approve the process&lt;br /&gt;
&lt;br /&gt;
=Fri Dec 20, 2024=&lt;br /&gt;
&lt;br /&gt;
# updated logs&lt;br /&gt;
# wordpress plugins research&lt;br /&gt;
&lt;br /&gt;
=Sat Dec 14, 2024=&lt;br /&gt;
&lt;br /&gt;
# I checked up on the Amazon Glacier job, and I managed to download its output after it finished and before it deleted itself!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp4042:~$ aws glacier list-jobs --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
&amp;quot;JobList&amp;quot;: [&lt;br /&gt;
	{&lt;br /&gt;
		&amp;quot;JobId&amp;quot;: &amp;quot;Y66F8y-ft3r8ILhMUHth3DbDWwoMZCm0uPXC9R9_dCj74D_0cUwoX5btOTpLh9Vf4eNJS6KPP5JyujUiZ1WG6ciFGgQL&amp;quot;,&lt;br /&gt;
		&amp;quot;Action&amp;quot;: &amp;quot;InventoryRetrieval&amp;quot;,&lt;br /&gt;
		&amp;quot;VaultARN&amp;quot;: &amp;quot;arn:aws:glacier:us-west-2:REDACTED:vaults/deleteMeIn2020&amp;quot;,&lt;br /&gt;
		&amp;quot;CreationDate&amp;quot;: &amp;quot;2024-12-14T01:59:59.138Z&amp;quot;,&lt;br /&gt;
		&amp;quot;Completed&amp;quot;: true,&lt;br /&gt;
		&amp;quot;StatusCode&amp;quot;: &amp;quot;Succeeded&amp;quot;,&lt;br /&gt;
		&amp;quot;StatusMessage&amp;quot;: &amp;quot;Succeeded&amp;quot;,&lt;br /&gt;
		&amp;quot;InventorySizeInBytes&amp;quot;: 24418,&lt;br /&gt;
		&amp;quot;CompletionDate&amp;quot;: &amp;quot;2024-12-14T05:43:14.278Z&amp;quot;,&lt;br /&gt;
		&amp;quot;InventoryRetrievalParameters&amp;quot;: {&lt;br /&gt;
			&amp;quot;Format&amp;quot;: &amp;quot;JSON&amp;quot;&lt;br /&gt;
		}&lt;br /&gt;
	}&lt;br /&gt;
]&lt;br /&gt;
}&lt;br /&gt;
user@disp4042:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;Y66F8y-ft3r8ILhMUHth3DbDWwoMZCm0uPXC9R9_dCj74D_0cUwoX5btOTpLh9Vf4eNJS6KPP5JyujUiZ1WG6ciFGgQL&amp;quot; ./output.json&lt;br /&gt;
{&lt;br /&gt;
&amp;quot;status&amp;quot;: 200,&lt;br /&gt;
&amp;quot;acceptRanges&amp;quot;: &amp;quot;bytes&amp;quot;,&lt;br /&gt;
&amp;quot;contentType&amp;quot;: &amp;quot;application/json&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp4042:~$ du -sh output.json &lt;br /&gt;
24K	output.json&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
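# since the job output only sticks around for a limited time after the job completes, here&#039;s a minimal sketch that polls the job and downloads the output as soon as it&#039;s ready. It uses the same aws cli commands as above plus &amp;quot;aws glacier describe-job&amp;quot;; the function name, the job_id variable and the 15-minute poll interval are hypothetical&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: wait for a glacier job to complete, then save its output&lt;br /&gt;
# args: account-id region vault-name job-id output-file&lt;br /&gt;
wait_and_get_job_output() {&lt;br /&gt;
  local account=&amp;quot;$1&amp;quot; region=&amp;quot;$2&amp;quot; vault=&amp;quot;$3&amp;quot; job=&amp;quot;$4&amp;quot; outfile=&amp;quot;$5&amp;quot;&lt;br /&gt;
  local completed&lt;br /&gt;
  while true; do&lt;br /&gt;
    # --query Completed --output json prints just the boolean: true or false&lt;br /&gt;
    completed=&amp;quot;$(aws glacier describe-job --account-id &amp;quot;${account}&amp;quot; --region &amp;quot;${region}&amp;quot; \&lt;br /&gt;
      --vault-name &amp;quot;${vault}&amp;quot; --job-id &amp;quot;${job}&amp;quot; --query Completed --output json)&amp;quot; || return 1&lt;br /&gt;
    [ &amp;quot;${completed}&amp;quot; = &amp;quot;true&amp;quot; ] &amp;amp;&amp;amp; break&lt;br /&gt;
    sleep 900   # 15 minutes&lt;br /&gt;
  done&lt;br /&gt;
  aws glacier get-job-output --account-id &amp;quot;${account}&amp;quot; --region &amp;quot;${region}&amp;quot; \&lt;br /&gt;
    --vault-name &amp;quot;${vault}&amp;quot; --job-id &amp;quot;${job}&amp;quot; &amp;quot;${outfile}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   wait_and_get_job_output REDACTED us-west-2 deleteMeIn2020 &amp;quot;${job_id}&amp;quot; ./output.json&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;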
# alright, looks like the vault is made up of 65 Archives, which we can now delete&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp4042:~$ archive_ids=$(jq .ArchiveList[].ArchiveId &amp;lt; output.json)&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp4042:~$ echo $archive_ids | tr &amp;quot; &amp;quot; &amp;quot;\n&amp;quot; | wc -l&lt;br /&gt;
65&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
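# note that without &amp;quot;-r&amp;quot;, jq prints each ArchiveId as a JSON string, so the IDs keep their surrounding double quotes (visible in the &amp;quot;Deleting Archive&amp;quot; output below), and those quotes then become part of the --archive-id value. Here&#039;s a minimal sketch of the same count-and-delete steps using &amp;quot;jq length&amp;quot; and &amp;quot;jq -r&amp;quot; (raw output), reading one ID per line; the function names are hypothetical&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helpers for a glacier inventory file (the JSON saved by get-job-output)&lt;br /&gt;
count_archives() {&lt;br /&gt;
  jq &#039;.ArchiveList | length&#039; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
list_archive_ids() {&lt;br /&gt;
  # -r prints raw strings, ie without the surrounding JSON double quotes&lt;br /&gt;
  jq -r &#039;.ArchiveList[].ArchiveId&#039; &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
#   count_archives output.json&lt;br /&gt;
#   list_archive_ids output.json | while IFS= read -r archive_id; do&lt;br /&gt;
#     echo &amp;quot;Deleting Archive: ${archive_id}&amp;quot;&lt;br /&gt;
#     aws glacier delete-archive --archive-id=&amp;quot;${archive_id}&amp;quot; --vault-name &#039;deleteMeIn2020&#039; --account-id REDACTED --region us-west-2&lt;br /&gt;
#   done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;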
# I iterated through each of these archives and told AWS to delete them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp4042:~$ for archive_id in ${archive_ids}; do&lt;br /&gt;
echo &amp;quot;Deleting Archive: ${archive_id}&amp;quot;&lt;br /&gt;
aws glacier delete-archive --archive-id=${archive_id} --vault-name &#039;deleteMeIn2020&#039; --account-id REDACTED --region us-west-2&lt;br /&gt;
done&lt;br /&gt;
Deleting Archive: &amp;quot;qZJWJ57sBb9Nsz0lPGKruocLivg8SVJ40UiZznG308wSPAS0vXyoYIOJekP81YwlTmci-eWETvsy4Si2e5xYJR0oVUNLadwPVkbkPmEWI1t75fbJM_6ohrNjNkwlyWPLW-lgeOaynA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;lEJNkWsTF-zZ1fj_2XDVrgbTFGhthkMo0FsLyCb7EM18JrQ-SimUAhAi7HtkrTZMT-wuYSDupFGDVzh87cZlzxRXrex_9NHtTkQyp93A2gICb9zOLDViUr8gHJO6AcyN-R9j2yiIDw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;fOeCrDHiQUrbvZoyT-jkSQH_euCAhtRcy8wetvONgUWyJBYzxM7AMmbc4YJzRuroL57hVmIUDQRHS-deAo3WG0esgBU52W2qes-47L1-VkczCpYkeGQjlNFGXaKE7ZeZ6jgZ3hBnpw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;zr-OjFat_oTJ4k_bMRdczuqDL_GNBpbgTVcHYSg6N-vTWvCe9FNgxJXrFeT26eL2LiXMEpijzaretHvFdyFYQarfZZzcFr0GEEB2O4rVEjtslkGuhbHfWMIGFbQZXQgmjE9aKl4EpA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;Rb3XtAMEDXlx4KSEZ_-OdA121VJ4jHPEPHIGr33GUJ7wbixaxIzSa5gXV-2i_7-AH-_KUCuLMQbmMPxRN7an7xmMr3PHlzdZMXQj1YTFlJC0g2BT2_F1HJf8h6IocDcR-7EJQeFTqw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;P9wIGNBbLaAoz7xGht6Y4k7j33nGgPmg0RQ4sesN2tImQLjFN1dtkooVGrBnQqbPt8YhgvwUXv8eO_N72KRjS3RrZQYvkGxAQ9uPcJ-zaDOG8kII7l4p7UzGfaroO63ZreHItIW4GA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;o-naX0m4kQde-2i-8JZbEESi7r8OlFjIoDjgbQSXT_zt9L_e7qOH3HQ1R7ViQC3i7M0lVLbODsGZm9w9HfI3tHYKb2R1T_WWBwMxFuC_OhYiPX8uepTvvBg2Mg6KysP9H3zNzwGSZw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;mxeiPukWr03RpfDr49IRdJUaJNjIWQM4gdz8S8k3-_1VetpneyWZbwEVKCB1uMTYpPy0L6HZgZP7vJ6b7gz1oeszMnlzZR0-W6Rgt4O0BZ_mwgtGHRKOH0SIpMJHRnePaq9SBR9gew&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;TOZBeL9sYVRtzy7gsAC1d930vcOhEBaABsh1ejb6vvad_NVSLu_1v0UvWqwkkf7x_8CCu6_WxolooSClZMhQOA21J_0_HP9GxvPkUvdSOeqmHjuANbIS82IRBOjFT4zFUoZnPhcVUg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;LdlFgzhEnxVsuGMZU4d2c_rfMTGM_3iCvLUZZSpGmmLArCQLs8HxjWLwfDDeKPKEarvSgXOVA-Evy4Ep5WAzESoofG5jdCidL5OispSfHElpPu-60xbmNvQt9neLGZrwa3C_iESGiw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;6GHR8GlRG4EIlkA7O_Ta6BAXN3BQ7HmP0V7TgOp6bOa4cxuIlbHkmCd3I2lUSNwfG1penWOibFvvDhzgcihdmUMtCLepT3rl6HtFR5Lv-ro5mIegCcWQJOUDT0FRfsb7e7IkAze02Q&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;lryfyQFE4NbtWg5Q6uTq8Qqyc-y9il9WYe7lHs8H2lzFSBADOJQmCIgp6FxrkiaCcwnSMIReJPWyWcR4UOnurxwONhw8fojEHQTTeOpkf6fgfWBAPP9P6GOZZ0v8d8Jz_-QFVaV6Bw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;O19wuK1PL_Wwf59-fjQuVP2Con0LXLf5Mk9xQA3HDPw4y1ZdwjYdFzmhZdaMUtGX666YKKjJu823l2C6seOTLg1ZbXZVTqQjZTeZGkQdCSRQdxyo3pEPWE2Iqpgb61FCiIETdCANUQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;6ShVCMDoqdhc4wg84L1bXaq3O2InX-qB9Q9NMRH-xJQ0_TSlIN5b3fysow9-_RuNYc2lK958NrwFiIEa7Q0bVaT9LaZQH8WtoTqnX3DN2xJhb4_KUdu6iUaDdJUoPfsSXtC7xvPb-w&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;0M5MSxjrlWJiT0XrncbVBITR__anuTLeOhcq9XvqsX0Q1koa0K0bH-wrZOQO7YsqqPv5Te3AUXPOCzIO6F0g5DQ2tOZq8E_YHX0XmMGjnOfeHIV9m_5GiCQAi3PrUuWM3C4cApTs7A&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;fwR6U5jX2T9N4mc14YQNMoA52vXICj-vvgIvYyDO5Qcv-pNeuXarT4gpzIy-XjuuF4KXkp9BXD13AA3hsau9PfW0ypy874m7arznCaMZO8ajm3NIicawZMiHGEikWw82EGY0z4VDIQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;EZG83EoQ65jxe4ye0-0qszEqRjLE3lAb2Vi7vZ2eYvj1bVJnTc5kvfWgTxl4_w2G1PPk4pn6g2dIsYXosWk3OqWNaWNcYEOHEkNREHycnTpcl0rBkWJoimt9fCKLJCF7FiGavWUMSw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;5xqn4AAJhxnOHLJMfkvGX3Qksj5BTiEyHURILglfH0TPh_GfvbZNHqzdYIW-8sMtJ8OQ1GnnFqAOpty5mMwOSEjaokWkrQhEZK9-q7FBKDXXglAlqQKEJpd2UcTQI47zBEmGRasm-A&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;3XL4MENpH6i2Dp6micFWmMR2-qim3D1LQGiyQHME_5_A5jAbepw7WDEJOS2m2gIudSXfCuyclHTqzZYEpr6RwTGIEmYGw1jQ-EDPWYzjGTGDJcwWZEiklTmhLgvezqfyeSnQsdQZtA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;g8RFNrkxynpQ8Yt9y4KyJra09dhxd3fIJxDlaUeDYBe615j7XON8gAdHMAQVerPQ4VF10obcuHnp64-kJFMmkG722hrlp3QBKy262CD4CcSUTSk3m070Mz6q3xySkcPzqRyxDwjtYg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;ktHLXVqR5UxOoXEO5uRNMrIq4Jf2XrA6VmLQ0qgirJUeCler9Zcej90Qyg9bHvhQJPreilT4jwuW08oy7rZD_jnjd_2rcdZ11Y5Zl3V25lSKdRPM-b21o21kaBEr_ihhlIxOmPqJXg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;iUmKTuLdEX3By9oHoqPtJd4KpEQ_2xh5PKV4LPuwBDcXyZmtt4zfq96djdQar1HwYmIh64bXEGqP7kGc0hk0ZtWZc12TtFUL0zohEbKBYr2VFZCQHjmc461TMLskKsOiyd6HbuKUWg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;6gmWP3OdBIdlRuPIbNpJj8AiaR-2Y4FaPTneD6ZwZY2352Wfp6_1YNha4qvO1lapuITAhjdh-GzKY5ybgJag8O4eh8jjtBKuOg3nrjbABpeS7e6Djc-7PEiMKskaeldv5M52gHFUiA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;4Ti7ZVFaexAncEDgak5Evp97aQk45VLA6cix3OCEB1cuGM6akGq2pINO8bzUjhEV8nvpqLLqoa_MSxPWTFl4uQ8sPUCDqG0vayB8PhYHcyNES09BQR9cE2HlR7qfxMDl5Ue946jcCw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;GSWslpTGXPiYW5-gJZ4aLrFQGfDiTifPcqsSbh8CZc6T4K8_udBkSrNV0GNZQB9eLoRrUC5cXYT06FSvZ8kltgM61VUVYOXvO0ox4jYH68_sjHnkUmimk8itpa34hBC_c0zS0ZFRLQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;3nDMsn_-0igfg6ncwMqx3-UxQLi-ug6LEoBxqyZKsMhd83PPoJk1cqn6QFib2GeyIgJzfCZoTlwrpe9O0_GnrM7u_mUEOsiKTCXP0NadvULehNcUx-2lWQpyRrCiDg5fcBb-f7tY0g&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;CnSvT3qmkPPY7exbsftSC-Ci71aqjLjiL1eUa3hYo3OfVkF4s2SQ8n39rH5KaQwo3GTHeJZOVoBTW9vMEf2ufYKc9e_eVAfVcmG-bLgncRQrrV-DlE2hYglzdAalx3H5OXBY8jlD9Q&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;WWIYVa-hqJzMS8q1UNNZIKfLx1V3w3lzqpCLWwflYBM7yRocX2CEyFA-aY2EKJt0hRLTshgLXE3L3Sni8bYabDLBrV2Gehgq9reRTRhn8cxoKks4f1NmZwCCTSs6L4bQuJnjjNvOKw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;XQYjqYnyYbKQWIzc1BSWQpn2K8mIoPQQH-bnoje7dB3BGCbzTjbEATGYSV1qJMbeUhiT_b7lwDiZzW1ZEbHVCgMDrWxCswG3eTZxiFdSwym7rELpFh5eC7XQlxuHjHocLY2zbUhYvg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;kn9SKSliFV1eHh_ax1Z9rEWXR5ETF3bhdoy6IuyItI3w63rBgxaNVNk5AFJLpcR2muktNFmsSEp8QucM-B4tMdFD6PtE4K8xPJe_Cvhv3G4e2TPKn3d9HMD5Bx3XjTimGHK6rHnz0A&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;4-Rebjng1gztwjx1x5L0Z1uErelURR5vmCUGD3sEW6rBQRUHRjyEQWL22JAm6YPpCoBwIxzVDPyC2NvSofxx2InjmixAUoQsyy3zAgGoW0nSlqNQPfeF1hkRdOCyIDutfMTQ1keEQw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;OVSNJHSIy5f1WRnisLdZ9ElWY4AjdgZwFqk3vDISCtypn5AHVo7wDGOAL76SpF0XzAd-yLgD3fIzf7mvgR4maA_HCANBhIP7Sdvhi7MLMjLnXLoKoHuKayBok_VLNRFfT5XORaTemA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;N1TB1zWhwJq20nTRNcIzVIRL9ms1KnszY0C4XAKhfTgtuWaV1SFWlqaA0xb6NjbX6N3XDisuP0bke-I0G_8RbsFQ_PcRTwRZzNEbr4LOU4WFhLM86s-FjDwjdJHmgyttfMh_1K9RLQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;wJyG1vWz9IcB8-mnLm9bY3do9KIsxNY9nQ8ClQaOALesN-k3R5GU11p7Q3sVeStelg9IzWvburDcVFdHmJIYHC9RuRbuSZbk_rQvxxrkhtDcviu4i9_hN4SnPHvV3i0hITuiEFGpkA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;hPtzfNk9SSUpI-_KihUEQOb89sbrK3tr0-3au-pe7al_e8qetM7uQEbNTH4_oWPqD2yajF79XPXxi4wkqAcQjoAN4IhnkPVb846wODKTpFXkRs9V8lz6nW0t_GdR2c9uYXf-xM_MpQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;osvrVQsHYSGCO30f0kO9aneACAA8h80KBmqfBMqDG3RioepW6ndLlNBcSvhfQ2nrcWBwLabIn4A7Rkr7sjbddViPo92viBh4lyZdyDwVcm6Pp1hQv-p2j0vldxYLWpyLDflQ8QRn4A&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;OtlG0WN4qd8kIg3xRQvRHoAzICwHRg6S3I8df5r_VRKaUNzJCsnwbO8Z9RiJPAAqqqVqg9I_GKhnt7txvEdUjx5s9hLywWm_OcRm5Lj_rJV_dupUwVlTG8HsdnCIwFseGa1JD5bviw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;2PAyQClvhEMhO-TxdAvV9Qdqa_Lvh4webx9hHIXbVnQQHJxMlhWPikmVpr1zTQRgy23r-WcOouH6gLKQ7WBRSH5yM8q5f8gb0Z2anOAwdR4A9DtxqDIVtI78-7Bs3Bf2b0fYbPQCWw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;Gn7a5jzeimXwa3su0i02OAK2XFmK9faX2WZx77Zq_tOx6j7ihpFEnkuF97Dpo66NgF7M24orh50kMSphvzLex_NbP9tDNoOI8mYG0-7GzOmNSmw9NaZpMLGn9NAVKbxs0byJ3YkquA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;UqxNCpEu1twmhb9qLPxpQXMBv6yLyR37rZ1T_1tQjdl8x0RwukdIoOEGcmpHwdtrJgTA2OrWZ3ZYTncxkXojwWAOROW-wJ4SJANFfxwvGfueFNUSn17qTggcqeE43I5P1xmlxb25wg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;NR3Z9zdD2rW0NG1y3QW735TzykIivP_cnFDMCNX6RcIPh0mRb_6QiC5qy1GrBTIoroorfzaGDIKQ0BY18jbcR3XfEzfcmrZ1FiT1YvQw-c1ag6vT46-noPvmddZ_zyy2O1ItIygI6Q&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;3wjWOHj9f48-L180QRkg7onI5CbZcmaaqinUYJZRheCox-hc021rQ3Tl1Houf0s5W-qzk6HVRz3wkilQI_TAi2PXWaFUMibz00DAQfGj9ZQKeSOlxE_3qsIRcmYsYo-TMaU2UsSqNA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;OfCmIMVetV8SxOBYUGFWldcHWFaFuGeLrYYm3A4YrvUU93zBrCLkOoBssToY1QIt_ZGwIueTgyoLTADetpfgswaoou_CwD8xfqss1hQAbQ7CaKW6sQHD-kcw4ii-D1h22lap95AZ4g&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;PLs1lsB4c1dV3YaBG1y2SN3OEWmtImJVlz6CA6IknA6y3R8yfQV3FXcLXWC_YpczM6t05xigcynA7m1A6GkuHIyTDOr6-DCOLlEvxDHmFrA4hrzJkl2pLquNWJ9yc-JC83ZV4SkM-Q&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;QwTHHmRo-NpqTTe2uy87GgB2MVydnz--3-3Z5u_0gdh5FPxEl2YSyjmJy3CKNDmJaNtrmwLeRF4_GubyZFc-CzlWl6OqZmINkCVSz34wY-k336C8HUOoKm5tPV3riSYaPb7WjjXwNQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;EmeH9kAWeVAyMa68pIknrJ135ZyXKB8WcjVKGQ58cVQE4Q98SMsX1OerOA4-_Q6epBJ_hgUT7ztFQ5d6PNiPRJ3H8uUIqXG3pkve5MaeA_cqAqvu4apBhU2HgALb1iS3NKy5IRdeUg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;NX074yaGa7FGL_QH5W9-mZ9TVmi0T2428B1lW8aEck6Ydjk3H3W6pgQTisOqE9B7azs1jykJ_IL-fdbkLzhAmrpWNGJBq5hVjfMNSP-Msm976Mf7mnXe6Z6QDkO5PVXaFsNZ1EzNyw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;uHk-GTBb6LVulxOkgs_ZYdF-cvKubUpvdP7hoS9Cqduw8YPInJaHB4LbBHpIxOL1idfYoMm-h4YI_Jq8qN3EnOBHiAjqUEwJAstagfMEvk2E38IlNLu_5J_09E0JM7MZXc4RSEZfNA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;n8UslfWy3wmFYZNYJF3PfuxVoLNORVes-IunJoyzKJDYMNqmkwybrG9KVGoL4sbRspq0Tqmccn87hLGZ_A7kjBB6fvnWuAOjALhNinbDe-RkESPVPWN6464vfCIf3BI3NhK0_nzCNw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;lzlnWYAQWMFp32BM163QS_8kb9wJ_kqaal2XmVb_rXLRDDXhSogYZCanA7oWyi3IdlWECd8R3KT3s50gJo8_kckLtq2uUUjG3Yl1wJuvXQfVh1AwzPOtLlyldqXmDoiVFzw-NrkpIw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;WimEI6ABJtXx6j4jVK4lrVKWbPmI1iPZLErplyB7ChN6MSOH3yMOeANT7L3O6BBI4G17WjSIKE6EN6YgMP9OdgxF4XjHyyGUuWNwqy-nnIETKyp7YrFuuBkSiSloBhZkC6DRqpdrww&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;k-Q9oBnWeC3P7zOEN6IMEVFjl3EwPkqi5kbEvEqby4zKEpb_aDj4f88Us1X7QBvG3Pi8GUriEnNlXXlNH5s4-4cBfQryVjY_MOAnSakhgCLXs-srczsWIZvtkkMsh4XFiBpVzYao3w&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;Y7_nQQC2uSB7XXEfd_zaKpp_gqGPZ_TQTXDPLmSP8k77n9NImLnTL7apkE6AlJopAkgmPiLOaTgIXc4_mSkUFp5teSOxdPxk19Cvs2fL9S1Yv5U7wihZfrsrwNffyZl289J59G-UBg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;zW6rdGwDojNoD-CjYUf8p_tbX2UMPHedXwUAM4GxNRkO0GoE1Ax5rpGr38LTnzZ_rCX-4F3kdJiAm1ahm-CfAzefUxenayuoS6cg384s5UHbZGsD2QpogBj9EJDDWlzrj8hr8DPC1Q&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;W9argz7v3GxZUkItGwILf1QRot8BNbE4kOJVvUrwOGs72KGog0QCGc8PV-3cWUvhfxkCFLuoZE7qJCQmT2Cc_LvaV46hWFFvgs5TFBdIySr2jeil-d8cYR5oN9zAvkYCGuvDlXmgxw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;T01giTZdzpQVhhijB47T97HEtIYDHTG7sVy5mpfUbaxBaGq5fU5C1aKleXpwTKOz7_aTiWAlkeM5rM3Lg_SS3qMI1JBeZR7l8M4W5a4JmFw3MVneRYZIC9JIuTO46F91SIa8JH1vgw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;GPXAjVjuNoyBjKU_Zx5wAxcjtLhsHwHXxKPuKDugGK3-jxNezXUG27MnJ5yDLay6yVJhZ_h3gCwlkd2y2gokIre6CK2wf2Ms3fk_m0BVkGI_Qx1PDRb7RL6P5l7yYeL1HWMhUfLFuw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;o2o3n7hTDoRKwTwD2IN9bHRT7Ox1K1A3ZaauheIlvQyhycJgy4mSqRusieHvYihijY9hqWaIXXDLQMAn6xa55idBgAWkuLe_Px1xNeaae7uy3nNUOb5GgPoDr8YJ9lollj3cKd_iNw&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;EM59cEYkyc2dTk6AZjPOWDG4ftxqTxXIM5RAMCgB8xP_wMaawcz8TY8ojij-zF9qve7Ae0grQqxe1R74HLA6Yh3R7UHMueMPThlUhpW_r2atTntGZTOkJVjevCoyCkG3P23wDckMXA&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;vgOBLcSW8orHNlOyfPHT071fxpWtCu27wjyHoNx1Lq8V727HYmLX7JZyRXEBpszEYSKIdSU-X1DT1kzDlUeb5amFbcBU3E0s4qSja8fXz769bM89SwSNQ4gWYYgiqUqar6EbJZS2-A&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;rmnC1D8xwBN4PI90pLKD-Och9gluScd7c_3tVF9dLOlEPB8Lp_f2Y0m6YwGnmQkpkc43hrPwoaYTzQWOJRMBbhN0vdLc4RT1DRhfCE68HrQM7YzsEYY7ANf4h_lfFAE7mx1JGv16Cg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;DBhGz8QPdNcS4Zp-jLpSOfE4sOk7tiLCQsA_rCJs9nQ1722YiaXeOLSThCvFn9RaUqfRj0UmomPE9_A2bdXbxuNi2Re3Q1uxav8HHR7RkJKdNMzYeTWbbpaDSNGyJZgYVKujqrT8TQ&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;8lqDSeF3uiA9PcGb3W4hC1LwUBwT1oPLGColzsnKBq-0RpKZ4aVBMqcpKXlu5oYDGSrM4KjXnEk6ksgRkAtLuimSOWiMKUf-Nq34aDAVG5e1gmRIs8fgE2ghDaa6ToGi3os5-Q48zg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;Q9FdpUiXA32lWVEa1Xdr0vigTynWsUX5nLkzvg6QCP7LsrWpOykIHrzSZIRdSubWKlJkZ5JR6eZgln7DnPV_Wso5JcFjRMP2L0TgpAMPoOSPsrP9uet3pLQPWr_ZP7aWISR7XLjhdg&amp;quot;&lt;br /&gt;
Deleting Archive: &amp;quot;e2bM8CyOWaQWtrZXk3OahrVeNpateHCkkyBd4UkBPuaJPz-HNdlnrVMA9M4nZdhqTdNMVfpsLK0HEIWxBT8zVJaHKMRdDWKSs9rb86gxyDjyIX6m4oSlik3I6EC1_ZMFhmpYKlMrPQ&amp;quot;&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then logged into the AWS Console WUI, and it still shows that the &amp;quot;last inventory date&amp;quot; was &amp;quot;August 1, 2018&amp;quot; and a size of 285.3 GB (as of the last inventory)&lt;br /&gt;
# Perhaps we just need to wait. We&#039;ll probably get a partial charge for December; hopefully the charge for January will be $0.&lt;br /&gt;
# I sent Marcin an email&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
I just initiated deleting the 65 archives that make-up this AWS Glacier vault.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s never use Glacier again. It was jaw-droppingly difficult to delete this. You can&#039;t just delete a vault, you have to first query the AWS API to schedule a job that generates an inventory of the vault. The job takes hours or days to run, and then you need to query the API again to download its result, which is deleted 24 hours after it completes.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Amazon_Glacier#Delete_from_Glacier&lt;br /&gt;
&lt;br /&gt;
After a lot of research and three tries, I finally managed to download the inventory report, which listed 65 archives. Today I iterated through the report and submitted the delete query for all 65 of these archives.&lt;br /&gt;
&lt;br /&gt;
I just logged-into the AWS Console WUI, and it still shows that the &amp;quot;last inventory date&amp;quot; was &amp;quot;August 1, 2018&amp;quot; and still shows a size of 285.3 GB (as of last inventory).&lt;br /&gt;
&lt;br /&gt;
Please let me know if you&#039;re still being charged by Amazon for the month of January 2025.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
Senior Technology Advisor&lt;br /&gt;
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B&lt;br /&gt;
&lt;br /&gt;
Open Source Ecology&lt;br /&gt;
www.opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
On 10/4/24 15:16, Marcin Jakubowski wrote:&lt;br /&gt;
&amp;gt; 1. Yes, delete the vault.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; 2. Thanks, good insights - i&#039;ll look into those more closely to see what&lt;br /&gt;
&amp;gt; would fit best.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; MJ&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
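# For future reference, here is a minimal Python sketch of the last step described in the email: iterating through the inventory report and submitting a delete for each archive. It assumes the downloaded inventory is Glacier&#039;s JSON format, with an &amp;quot;ArchiveList&amp;quot; whose entries each carry an &amp;quot;ArchiveId&amp;quot;. The delete call is passed in as a function (for example, a wrapper around boto3&#039;s Glacier delete_archive). Both helper names are hypothetical, and this is not the exact script I ran.&lt;br /&gt;

```python
import json
from typing import Callable, Iterable, List


def archive_ids_from_inventory(inventory_json: str) -> List[str]:
    """Return every ArchiveId listed in a Glacier vault inventory (JSON format)."""
    inventory = json.loads(inventory_json)
    # The inventory-retrieval job output lists the vault's archives under "ArchiveList"
    return [archive["ArchiveId"] for archive in inventory.get("ArchiveList", [])]


def delete_all(archive_ids: Iterable[str], delete_fn: Callable[[str], None]) -> int:
    """Call delete_fn once per archive ID and return how many deletes were submitted.

    delete_fn is a hypothetical hook; with boto3 it could wrap
    client.delete_archive(vaultName=..., archiveId=archive_id).
    """
    count = 0
    for archive_id in archive_ids:
        delete_fn(archive_id)
        count += 1
    return count
```
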
# ...&lt;br /&gt;
# I checked munin, and now I see that the &amp;quot;process info&amp;quot; charts have legends with all of the processes we&#039;ve listed. Data for some of the processes is empty, but this is pretty good. In my experience, the most important ones are apache &amp;amp; mysql&lt;br /&gt;
## for some reason the per-process I/O usage chart shows 0 for all processes, but we&#039;ve got loads of other charts on these processes, including: thread count, process count, memory usage, cpu usage, and context switches&lt;br /&gt;
# we still only have the 5 mysql charts; I think we&#039;ve reached the point of diminishing returns on that effort. This is good enough.&lt;br /&gt;
# I think that crosses off all the TODO items for munin; we have beautiful charts now ☺&lt;br /&gt;
# ...&lt;br /&gt;
# returning to wordpress...&lt;br /&gt;
# Here&#039;s TOFU 3/3 (ISP, exit in Ecuador)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Ecuador&lt;br /&gt;
2024-12-14&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Core&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Plugins &lt;br /&gt;
. . . . . . . . . jq: error (at &amp;lt;stdin&amp;gt;:0): Cannot index array with string &amp;quot;1.0.17&amp;quot;&lt;br /&gt;
. . . . . . . . . . . . . . . . . . . . . &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
null&lt;br /&gt;
null&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
null&lt;br /&gt;
null&lt;br /&gt;
&lt;br /&gt;
https://downloads.wordpress.org/release/wordpress-6.7.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wps-hide-login.1.9.17.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/melapress-login-security.2.0.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/activitypub.4.4.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/aurora-heatmap.1.7.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/raw-html.1.6.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/smart-slider-3.3.5.1.25.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/spam-destroyer.2.1.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wordpress-seo.24.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/bulk-media-register.1.40.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/enable-media-replace.4.1.5.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-qrcode.1.1.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-2fa.2.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/leaflet-map.3.4.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/extensions-leaflet-map.4.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpforms-lite.1.9.2.3.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-14&lt;br /&gt;
8b1f9a708838b8710b4198da1116689197e0a6134e0a1a5e786500576383034f  activitypub.4.4.0.zip&lt;br /&gt;
101f645a8f4becdf0394c27195679fe6d134063fde6bd851dc1d57217db5e0e9  advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
873928dd3e940064f5dcac8b74335a9760823147388f472bb755ce5a804eaf53  aurora-heatmap.1.7.0.zip&lt;br /&gt;
5dc1fff3c3e664774ea51d52477e28c060e0b6733a47c6fb5db800eba3a4ea0f  bulk-media-register.1.40.zip&lt;br /&gt;
ad98e83a3bce28612025010d5bca77dd2d29f1df539f2667865d6d959f67e3e0  enable-media-replace.4.1.5.zip&lt;br /&gt;
1a53bdcd1ddb160d5807dc17a0f9e474402e22c899b3a9af486c9d5f0d2c4b36  extensions-leaflet-map.4.4.zip&lt;br /&gt;
27f1ab1e3f5274335d48d0cadaabdef98284880b0324771890d36a1f562fb44a  hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
bb0e885969df637767d64d02504d8defb1184db24cd0ade0111ef55ef63c81b9  include-mastodon-feed.1.9.9.zip&lt;br /&gt;
13d906d4677dc3da617752fbe9e7540f0bf84128c0fae43598a10b876dac4217  leaflet-map.3.4.1.zip&lt;br /&gt;
fd1593eefe2fa546926ce0765e7d9944e24c1aca0f9cf2606d3136f4b60cb1b5  melapress-login-security.2.0.1.zip&lt;br /&gt;
33f186b028bd2428af17199d06f9c9c2b6d5e5f5a7d8ddc47c23cfc7cff35ff4  plugin.json&lt;br /&gt;
f2cfaf226788dddd8744e723fe1ef53ef0984f956c4fa2678f932f0d8b72116c  raw-html.1.6.4.zip&lt;br /&gt;
757f29991412ef63a099c4fe77a921d23b51097ddb207dff669fbf24ace6a7d6  regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
4f0e6f6505b8eb39b53dd971e8dba8fe98c65a56a7bb24443f4a513c7940f193  related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
ebd87841f73bb7946216ae4827a413dcc97fc5094cee2f8ddb6dea7eff356358  smart-slider-3.3.5.1.25.zip&lt;br /&gt;
41bcae0e3cd94b73d7b5761527e68acb9111cb28080dd68f2f83a82cfd87f210  spam-destroyer.2.1.4.zip&lt;br /&gt;
aa52f9a4c8bbe856fe045e5c76ffedae3573374ee43435de78e1561d8e0169a9  woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
fbe62fc4ec4b91915024c126d9b86b3798c283f60d95435f3e6e1226ddd722aa  woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
75f4e9cb71e583ca3f8b19691b5754adb9c981580762137f82443e1eec468f9c  wordpress-6.7.1.zip&lt;br /&gt;
f9ce7a98840dd4bf490d955320a68ac553c767ba7f0eeae6e4f067be5a927ef3  wordpress-seo.24.0.zip&lt;br /&gt;
feda19ad71ea22abe4dbcff422f6e0e6c8315f26a7d246099967a5eea17b4d38  wp-2fa.2.8.0.zip&lt;br /&gt;
130ba1a4f2396a8e183b8ce732c9bc8a3cf6698890f6f216550188e78e082fda  wpforms-lite.1.9.2.3.zip&lt;br /&gt;
6e1d71809f4421463fc19c5c119c5e49788cd3676b730f7980e3dcd209520a1c  wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
e3cb9db45795a8caed13e00414ce7f43d2bb517a35b88cda98ad91b6871b46e2  wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
e50735bcda4e85df1e522fda113ae24fd973f000e75154472544d4bcf51491f1  wp-qrcode.1.1.1.zip&lt;br /&gt;
bedfe5b456f5a5b3b6d4b29dd6577f6b8492f4594a192678555691e8403a56d7  wps-hide-login.1.9.17.1.zip&lt;br /&gt;
user@disp1555:/tmp/tmp.Yx1OD1gy5z$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# OK, a diff of these checksums against the ones from the past 2 days shows that they all match, except plugin.json. That file is just metadata that we won&#039;t be using, so it doesn&#039;t matter that it differs, as long as the .zip files match&lt;br /&gt;
# I rsync&#039;d these 25 3TOFU&#039;d .zip files (24 WordPress plugins plus WordPress core) up to hetzner3 and then copied them into our dir with the other plugins&lt;br /&gt;
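# The 3TOFU comparison above reduces to this: parse each run&#039;s sha256sum output and flag any file whose hash differs between runs, or that is missing from a run. Here is a minimal Python sketch. It assumes each run&#039;s output was saved as plain &amp;quot;HASH  FILENAME&amp;quot; text. The function names are hypothetical; this is not the script I actually used.&lt;br /&gt;

```python
from typing import Dict, List


def parse_sha256sum(text: str) -> Dict[str, str]:
    """Map filename -> hex digest from `sha256sum` output ("HASH  NAME" per line)."""
    digests = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and non-checksum lines such as the date header
        if len(parts) != 2 or len(parts[0]) != 64:
            continue
        digest, name = parts
        # sha256sum prefixes the name with "*" in binary mode; drop it
        digests[name.strip().lstrip("*")] = digest.lower()
    return digests


def mismatches(runs: List[str]) -> List[str]:
    """Return sorted filenames that are missing from a run or whose digests differ."""
    parsed = [parse_sha256sum(run) for run in runs]
    all_names = set().union(*parsed)
    return sorted(
        name for name in all_names
        # A name missing from a run contributes None, so it also counts as a mismatch
        if len({run.get(name) for run in parsed}) != 1
    )
```

## fed the three saved outputs, this should report only plugin.json, matching the manual diff above&lt;br /&gt;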
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # ls -lah | grep maltfield&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  229K Dec 14 16:51 activitypub.4.4.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  244K Dec 14 16:52 advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  133K Dec 14 16:51 aurora-heatmap.1.7.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   50K Dec 14 16:51 bulk-media-register.1.40.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  348K Dec 14 16:52 enable-media-replace.4.1.5.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  2,4M Dec 14 16:52 extensions-leaflet-map.4.4.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  1,4M Dec 14 16:52 hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   13K Dec 14 16:52 include-mastodon-feed.1.9.9.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   72K Dec 14 16:52 leaflet-map.3.4.1.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  415K Dec 14 16:51 melapress-login-security.2.0.1.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   16K Dec 14 16:51 raw-html.1.6.4.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   78K Dec 14 16:52 regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   92K Dec 14 16:51 related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  3,7M Dec 14 16:51 smart-slider-3.3.5.1.25.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  631K Dec 14 16:51 spam-destroyer.2.1.4.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  1,4M Dec 14 16:51 woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  5,9M Dec 14 16:52 woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   28M Dec 14 16:51 wordpress-6.7.1.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  4,0M Dec 14 16:51 wordpress-seo.24.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield 1008K Dec 14 16:52 wp-2fa.2.8.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   11M Dec 14 16:52 wpforms-lite.1.9.2.3.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  805K Dec 14 16:51 wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  627K Dec 14 16:52 wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield  246K Dec 14 16:52 wp-qrcode.1.1.1.zip&lt;br /&gt;
-rw-r--r--  1 maltfield maltfield   50K Dec 14 16:51 wps-hide-login.1.9.17.1.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I extracted these new ones&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # new_plugin_files=$(ls -lah | grep maltfield | awk {&#039;print $9}&#039;)&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # echo $new_plugin_files &lt;br /&gt;
activitypub.4.4.0.zip advanced-nocaptcha-recaptcha.7.5.0.zip aurora-heatmap.1.7.0.zip bulk-media-register.1.40.zip enable-media-replace.4.1.5.zip extensions-leaflet-map.4.4.zip hcaptcha-for-forms-and-more.4.8.0.zip include-mastodon-feed.1.9.9.zip leaflet-map.3.4.1.zip melapress-login-security.2.0.1.zip raw-html.1.6.4.zip regenerate-thumbnails.3.1.6.zip related-posts-by-taxonomy.2.7.6.zip smart-slider-3.3.5.1.25.zip spam-destroyer.2.1.4.zip woocommerce-gateway-stripe.9.0.0.zip woocommerce-multilingual.5.3.9.zip wordpress-6.7.1.zip wordpress-seo.24.0.zip wp-2fa.2.8.0.zip wpforms-lite.1.9.2.3.zip wpfront-notification-bar.3.4.2.zip wp-pgp-encrypted-emails.0.8.0.zip wp-qrcode.1.1.1.zip wps-hide-login.1.9.17.1.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # for file in $new_plugin_files; do unzip $file; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and I copied these new plugins into the store site, so I could enable and demo them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # for file in $new_plugin_files; do dir=$(echo $file | cut -d. -f1); rsync -av --progress $dir /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/; done&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # ls -lah /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/&lt;br /&gt;
total 152K&lt;br /&gt;
d---r-x--- 36 not-apache www-data 4,0K Dec 14 17:36 .&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 23 15:15 ..&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Dec  9 14:46 activitypub&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Nov 19 14:31 advanced-nocaptcha-recaptcha&lt;br /&gt;
d---r-x---  4 not-apache www-data 4,0K Jul 10 22:16 akismet&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K Mar 25  2024 aurora-heatmap&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K Nov  5 23:00 bulk-media-register&lt;br /&gt;
d---r-x---  3 not-apache www-data 4,0K Sep 27 21:51 classic-editor&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Nov 21  2022 coingate-for-woocommerce&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 25 08:28 contact-form-7&lt;br /&gt;
drwxr-xr-x 10 root       root     4,0K Nov 19 12:47 enable-media-replace&lt;br /&gt;
drwxr-xr-x 11 root       root     4,0K Nov 21 20:40 extensions-leaflet-map&lt;br /&gt;
d---r-x---  3 not-apache www-data 4,0K Jul  4  2022 google-authenticator&lt;br /&gt;
d---r-x---  4 not-apache www-data 4,0K Apr 23  2021 google-authenticator-encourage-user-activation&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Dec  1 10:01 hcaptcha-for-forms-and-more&lt;br /&gt;
----r-----  1 not-apache www-data 2,3K Apr  9  2019 hello.php&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Nov 24 16:52 include-mastodon-feed&lt;br /&gt;
----r-----  1 not-apache www-data   28 Apr  9  2019 index.php&lt;br /&gt;
drwxr-xr-x  6 root       root     4,0K Jul 22 02:54 leaflet-map&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Dec 10 15:15 melapress-login-security&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Sep 27 07:22 meta-box&lt;br /&gt;
drwxr-xr-x  3 root       root     4,0K Nov 11 15:00 raw-html&lt;br /&gt;
drwxr-xr-x  6 root       root     4,0K Aug 14  2023 regenerate-thumbnails&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K Nov 18 11:47 related-posts-by-taxonomy&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Nov 21 07:31 smart-slider-3&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Apr 18  2024 spam-destroyer&lt;br /&gt;
d---r-x---  8 not-apache www-data 4,0K Mar 17  2024 ssl-insecure-content-fixer&lt;br /&gt;
d---r-x---  4 not-apache www-data 4,0K Oct 21  2019 vcaching&lt;br /&gt;
d---r-x--- 13 not-apache www-data 4,0K Sep 25 13:56 woocommerce&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Dec 12 19:24 woocommerce-gateway-stripe&lt;br /&gt;
drwxr-xr-x 11 root       root     4,0K Dec  9 12:14 woocommerce-multilingual&lt;br /&gt;
drwxr-xr-x 13 root       root     4,0K Sep 24 07:41 wordpress-seo&lt;br /&gt;
drwxr-xr-x  6 root       root     4,0K Nov 19 14:34 wp-2fa&lt;br /&gt;
drwxr-xr-x  9 root       root     4,0K Dec  3 13:12 wpforms-lite&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Nov 12 04:39 wpfront-notification-bar&lt;br /&gt;
drwxr-xr-x 11 root       root     4,0K May 25  2021 wp-pgp-encrypted-emails&lt;br /&gt;
drwxr-xr-x  3 root       root     4,0K Jul 27  2023 wp-qrcode&lt;br /&gt;
drwxr-xr-x  6 root       root     4,0K Oct  9 09:23 wps-hide-login&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
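# Note that deriving each directory name with &amp;quot;cut -d. -f1&amp;quot; only works when the slug contains no dots and a dot separates it from the version. For the core archive, &amp;quot;wordpress-6.7.1.zip&amp;quot; becomes &amp;quot;wordpress-6&amp;quot;, but unzip never creates that directory, because the core zip extracts to &amp;quot;wordpress/&amp;quot;. A more robust approach is to read the top-level directory names directly from each .zip. Here is a minimal Python sketch; the helpers are hypothetical, not what I ran.&lt;br /&gt;

```python
import zipfile
from pathlib import Path
from typing import Dict, Set


def top_level_dirs(zip_path: Path) -> Set[str]:
    """Return the top-level entry names inside a .zip (e.g. {"smart-slider-3"})."""
    with zipfile.ZipFile(zip_path) as archive:
        # Every member path starts with the directory it will be extracted into
        return {name.split("/", 1)[0] for name in archive.namelist() if name}


def plugin_dirs(zip_dir: Path) -> Dict[str, Set[str]]:
    """Map each .zip file name in zip_dir to the top-level dir(s) it extracts to."""
    return {
        zip_path.name: top_level_dirs(zip_path)
        for zip_path in sorted(zip_dir.glob("*.zip"))
    }
```
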
# and I fixed the permissions (the newly copied plugin dirs were owned by root:root), per https://wiki.opensourceecology.org/wiki/Hetzner3#Restore_State_.28snapshot_.26_test.29&lt;br /&gt;
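# For reference, the existing plugins in the listing above use an unusual mode. The owner (not-apache) has no permissions at all, and only the www-data group can read and traverse, whereas the freshly copied plugins were root:root with world-readable modes. Here is a small Python sketch (a hypothetical helper) that converts the symbolic modes from that ls listing into octal to make the difference explicit. The exact fix commands are on the linked Hetzner3 page, not here.&lt;br /&gt;

```python
def symbolic_to_octal(mode: str) -> int:
    """Convert an `ls -l` mode string like "d---r-x---" to its permission bits.

    Only the last nine rwx characters are used; the leading file-type character
    is ignored, and setuid/setgid/sticky flags (s, t) are not handled.
    """
    perms = mode[-9:]
    if len(perms) != 9:
        raise ValueError(f"expected at least 9 permission characters: {mode!r}")
    value = 0
    for char, bit in zip(perms, "rwxrwxrwx"):
        # Shift left one bit and set it if this position holds its letter
        value = value * 2 + (1 if char == bit else 0)
    return value
```

## e.g. d---r-x--- is 0o050 and ----r----- is 0o040, versus 0o755 for drwxr-xr-x&lt;br /&gt;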
# I activated the &#039;activitypub&#039; plugin. One obvious change is that we&#039;ll probably want a single site-wide ActivityPub user that combines all posts into one feed, as opposed to a separate ActivityPub user/feed for each user/author on wordpress. We&#039;d want the latter if we actually had a lot of users posting content every month, but since we only have a few posts per year, it makes sense to combine them.&lt;br /&gt;
# after this change, it says that the endpoint to follow is &#039;store.opensourceecology.org@store.opensourceecology.org&#039; or &#039;https://store.opensourceecology.org/@store.opensourceecology.org&#039;&lt;br /&gt;
## well that&#039;s annoying, but I can change it&lt;br /&gt;
## I went to the &amp;quot;Blog-Profile&amp;quot; subsettings and changed it from &#039;store.opensourceecology.org&#039; to &#039;OpenSourceEverything&#039; https://store.opensourceecology.org/wp-admin/options-general.php?page=activitypub&amp;amp;tab=blog-profile&lt;br /&gt;
## so now the blog-wide ActivityPub endpoint is &#039;opensourceeverything@store.opensourceecology.org&#039; or &#039;https://store.opensourceecology.org/@opensourceeverything&#039;&lt;br /&gt;
## unfortunately, if you load &#039;https://store.opensourceecology.org/@opensourceeverything&#039; in a web browser, it just redirects to the frontpage (&#039;/&#039;)&lt;br /&gt;
### I imagine that if you search this URL in a mastodon instance, you&#039;d see a feed of posts – but this site isn&#039;t public yet (we&#039;re hardcoding the IP with /etc/hosts), so I can&#039;t test that until after the migration&lt;br /&gt;
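# For context: when someone searches a handle like &#039;opensourceeverything@store.opensourceecology.org&#039; in Mastodon, their server resolves it with a WebFinger (RFC 7033) lookup on the handle&#039;s domain. It does not load the profile URL the way a browser does. So once the site is public, that lookup can be tested directly. Here is a minimal Python sketch that builds the lookup URL. The helper name is hypothetical, and I&#039;m assuming the ActivityPub plugin answers at the standard /.well-known/webfinger path.&lt;br /&gt;

```python
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger (RFC 7033) lookup URL for a fediverse handle.

    "user@domain" (optionally with a leading "@") becomes
    https://domain/.well-known/webfinger?resource=acct:user@domain
    with the resource value percent-encoded.
    """
    user, _, domain = handle.lstrip("@").partition("@")
    if not user or not domain:
        raise ValueError(f"expected user@domain, got {handle!r}")
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"
```

## after the migration, fetching that URL should return a small JSON document whose links point at the blog&#039;s ActivityPub actor&lt;br /&gt;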
# ActivityPub lets us set up an avatar and banner photo; I&#039;ll just copy what we&#039;re doing on our other socials&lt;br /&gt;
## Looks like facebook and X use the photo from the 2012 brick production run, with Catarina in a white coat doing some angle grinding with sparks flying https://wiki.opensourceecology.org/wiki/File:1day.jpg&lt;br /&gt;
## and the avatar is the logo from August 2011 https://wiki.opensourceecology.org/wiki/File:Open-source-ecology.png&lt;br /&gt;
## curiously the OSE_Logo article on the wiki suggests that a new icon from 2014 should be used, even though it&#039;s not what we&#039;re using on our socials in 2024 https://wiki.opensourceecology.org/wiki/OSE_Logo#Official_and_Current_Logos&lt;br /&gt;
### https://wiki.opensourceecology.org/wiki/File:OSE_logo_2014-blue.png&lt;br /&gt;
## I&#039;m going to go with the first one from Aug 2011 (what we&#039;re using everywhere else)&lt;br /&gt;
## shit, no, that file is actually very small. It&#039;s only 546 × 345. Maybe that&#039;s why we replaced it.&lt;br /&gt;
## oh, here&#039;s a bigger version of the same. There are 4 versions: some are square and some are rectangular; some have a drop shadow and some don&#039;t. The square ones are 3,780 × 3,780 and the rectangular ones are 3,780 × 2,598&lt;br /&gt;
### https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v3-1blarge.jpg&lt;br /&gt;
### https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v2-1b-large.jpg&lt;br /&gt;
### https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v3-1large.jpg&lt;br /&gt;
### https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v2-1large.jpg&lt;br /&gt;
## the official mastodon docs say that the avatar is 400x400 px, so we want a square image. 400 px is also taller than our main logo&#039;s 345 px https://docs.joinmastodon.org/user/profile/&lt;br /&gt;
### history shows this used to be 120x120 and it was increased to 400x400 in 2017 https://github.com/mastodon/mastodon/issues/3807&lt;br /&gt;
## so I&#039;m thinking we should use the square one (without a drop shadow for easier readability as a thumbnail and consistency with facebook branding). That&#039;s this https://wiki.opensourceecology.org/wiki/File:OSE-logo-blueprint-bg-v3-1blarge.jpg&lt;br /&gt;
## the 1day.jpg banner is 5,184x2,912 px, which should be fine for the banner photo. I&#039;m not going to bother to resize it; any autocropping should do fine, I think&lt;br /&gt;
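# A quick sanity check on those dimensions: this minimal Python sketch does the aspect-preserving crop/scale arithmetic. It assumes the 400x400 px avatar size above and a 1500x500 px (3:1) header size, which I&#039;m taking from the same Mastodon docs page. The helpers are hypothetical; they only compute sizes and don&#039;t resize anything.&lt;br /&gt;

```python
from typing import Tuple


def fit_within(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, to fit max_w x max_h.

    Images that already fit are returned unchanged (never upscaled).
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return round(width * scale), round(height * scale)


def center_crop(width: int, height: int, ratio_w: int, ratio_h: int) -> Tuple[int, int]:
    """Largest (width, height) crop with aspect ratio ratio_w:ratio_h."""
    if width * ratio_h > height * ratio_w:
        # Too wide for the target ratio: keep full height, trim the sides
        return height * ratio_w // ratio_h, height
    # Too tall (or exact): keep full width, trim top and bottom
    return width, width * ratio_h // ratio_w
```

## e.g. the 5,184x2,912 banner cropped to 3:1 is 5184x1728, which scales to exactly 1500x500. By contrast, the small 546 × 345 logo only fits a 400x400 box as 400x253, which is another reason to prefer the square 3,780 × 3,780 version&lt;br /&gt;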
# the media library for &#039;store&#039; is actually empty, so I uploaded both of these as the first images for the site&lt;br /&gt;
## fuck, I got errors&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 OSE-logo-blueprint-bg-v3-1blarge.jpg The server cannot process the image. This can happen if the server is busy or does not have enough resources to complete the task. Uploading a smaller image may help. Suggested maximum size is 2560 pixels.&lt;br /&gt;
&lt;br /&gt;
1day.jpg The server cannot process the image. This can happen if the server is busy or does not have enough resources to complete the task. Uploading a smaller image may help. Suggested maximum size is 2560 pixels. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# gross, looks like this is another wordpress bug&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sat Dec 14 19:52:42.215184 2024] [proxy_fcgi:error] [pid 249572:tid 249594] [client 146.70.184.187:0] AH01071: Got error &#039;PHP message: PHP Notice:  PHP Request Startup: file created in the system&#039;s temporary directory in Unknown on line 0; PHP message: PHP Deprecated:  strpos(): Passing null to parameter #1 ($haystack) of type string is deprecated in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/functions.php on line 7300; PHP message: PHP Deprecated:  str_replace(): Passing null to parameter #3 ($subject) of type array|string is deprecated in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/functions.php on line 2189; PHP message: PHP Fatal error:  Uncaught Error: Call to undefined function chmod() in /var/www/html/store.opensourceecology.org/htdocs/wp-admin/includes/file.php:1043\nStack trace:\n#0 /var/www/html/store.opensourceecology.org/htdocs/wp-admin/includes/file.php(1105): _wp_handle_upload()\n#1 /var/www/html/store.opensourceecology.org/htdocs/wp-admin/includes/media.php(306): wp_handle_upload()\n#2 /var/www/html/store.opensourceecology.org/htdocs/wp-admin/includes/ajax-actions.php(2632): media_handle_upload()\n#3 /var/www/html/store.opensourceecology.org/htdocs/wp-admin/async-upload.php(33): wp_ajax_upload_attachment()\n#4 {main}\n  thrown in /var/www/html/store.opensourceecology.org/htdocs/wp-admin/includes/file.php on line 1043&#039;, referer: https://store.opensourceecology.org/wp-admin/upload.php  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated wp-config.php with a hack to fix it, like I did with the other bug (on ini_set)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # diff wp-config.php.20241214 wp-config.php&lt;br /&gt;
8a9,13&lt;br /&gt;
&amp;gt; if( ! function_exists(&#039;chmod&#039;) ){&lt;br /&gt;
&amp;gt;       function chmod(){&lt;br /&gt;
&amp;gt;               return;&lt;br /&gt;
&amp;gt;       }&lt;br /&gt;
&amp;gt; }&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
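A minimal sketch for confirming the cause before reaching for a polyfill like the one above. It assumes the error comes from PHP&#039;s `disable_functions` setting: since PHP 8, a disabled function is treated as undefined, which matches the &amp;quot;Call to undefined function chmod()&amp;quot; message. `php_fn_disabled` is a hypothetical helper. It only greps config files for `disable_functions` or php-fpm&#039;s `php_admin_value[disable_functions]`, and it does not query PHP itself.&lt;br /&gt;

```shell
# Hypothetical helper: exit 0 if the PHP function named in $1 appears in a
# disable_functions directive in any of the config files that follow.
# It only reads config text; it does not ask a running PHP what is disabled.
php_fn_disabled() {
  local fn="$1"; shift
  grep -hE '^[[:space:]]*(php_admin_value\[disable_functions\]|disable_functions)[[:space:]]*=' "$@" \
    | sed -E 's/^[^=]*=[[:space:]]*//' \
    | tr ', "' '\n\n\n' \
    | grep -qxF -- "$fn"
}
```

For example, `php_fn_disabled chmod /etc/php/*/fpm/pool.d/*.conf /etc/php/*/fpm/php.ini` exits 0 if `chmod` is disabled anywhere in those files. The paths assume Debian&#039;s php-fpm layout.&lt;br /&gt;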
# I tried again, and this time the upload was successful. So, yeah, it&#039;s definitely a dumb wordpress bug&lt;br /&gt;
# I reported the bug here https://core.trac.wordpress.org/ticket/62693&lt;br /&gt;
# And submitted a PR https://github.com/WordPress/wordpress-develop/pull/7352&lt;br /&gt;
# anyway, now that the images are available, I set them in the activitypub settings&lt;br /&gt;
## the banner photo setting has an interactive crop; I set the bottom of the crop at the bottom of Catarina&#039;s white coat, so that the crane&#039;s hook is just visible at the top. It&#039;s not precise, but it&#039;s fine for this&lt;br /&gt;
## oh, apparently we can&#039;t even set the avatar for the blog profile; the plugin just re-uses the &amp;quot;Site Icon&amp;quot; from the general settings https://store.opensourceecology.org/wp-admin/options-general.php&lt;br /&gt;
### this site didn&#039;t have anything set, so I used the generic OSE logo per above &lt;br /&gt;
### I did crop away some of the width, so it&#039;s easier to see on small renderings&lt;br /&gt;
# I enabled the &amp;quot;Aurora Heatmap&amp;quot; plugin, and it looks like the default settings should be fine&lt;br /&gt;
&lt;br /&gt;
=Fri Dec 13, 2024=&lt;br /&gt;
&lt;br /&gt;
# here&#039;s TOFU 2/3 (VPN, exit in Sweden) for the new wordpress plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Sweden&lt;br /&gt;
2024-12-13&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Core&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Plugins &lt;br /&gt;
	. . . . . . . . . jq: error (at &amp;lt;stdin&amp;gt;:0): Cannot index array with string &amp;quot;1.0.17&amp;quot;&lt;br /&gt;
. . . . . . . . . . . . . . . . . . . . . &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
	null&lt;br /&gt;
	null&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
	null&lt;br /&gt;
	null&lt;br /&gt;
&lt;br /&gt;
https://downloads.wordpress.org/release/wordpress-6.7.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wps-hide-login.1.9.17.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/melapress-login-security.2.0.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/activitypub.4.4.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/aurora-heatmap.1.7.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/raw-html.1.6.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/smart-slider-3.3.5.1.25.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/spam-destroyer.2.1.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wordpress-seo.24.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/bulk-media-register.1.40.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/enable-media-replace.4.1.5.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-qrcode.1.1.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-2fa.2.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/leaflet-map.3.4.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/extensions-leaflet-map.4.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpforms-lite.1.9.2.3.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-13&lt;br /&gt;
8b1f9a708838b8710b4198da1116689197e0a6134e0a1a5e786500576383034f  activitypub.4.4.0.zip&lt;br /&gt;
101f645a8f4becdf0394c27195679fe6d134063fde6bd851dc1d57217db5e0e9  advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
873928dd3e940064f5dcac8b74335a9760823147388f472bb755ce5a804eaf53  aurora-heatmap.1.7.0.zip&lt;br /&gt;
5dc1fff3c3e664774ea51d52477e28c060e0b6733a47c6fb5db800eba3a4ea0f  bulk-media-register.1.40.zip&lt;br /&gt;
ad98e83a3bce28612025010d5bca77dd2d29f1df539f2667865d6d959f67e3e0  enable-media-replace.4.1.5.zip&lt;br /&gt;
1a53bdcd1ddb160d5807dc17a0f9e474402e22c899b3a9af486c9d5f0d2c4b36  extensions-leaflet-map.4.4.zip&lt;br /&gt;
27f1ab1e3f5274335d48d0cadaabdef98284880b0324771890d36a1f562fb44a  hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
bb0e885969df637767d64d02504d8defb1184db24cd0ade0111ef55ef63c81b9  include-mastodon-feed.1.9.9.zip&lt;br /&gt;
13d906d4677dc3da617752fbe9e7540f0bf84128c0fae43598a10b876dac4217  leaflet-map.3.4.1.zip&lt;br /&gt;
fd1593eefe2fa546926ce0765e7d9944e24c1aca0f9cf2606d3136f4b60cb1b5  melapress-login-security.2.0.1.zip&lt;br /&gt;
db016ec3c115ec20c1f0fba87b48b5eddee3a11f30d573b8a266a01077ee7ee1  plugin.json&lt;br /&gt;
f2cfaf226788dddd8744e723fe1ef53ef0984f956c4fa2678f932f0d8b72116c  raw-html.1.6.4.zip&lt;br /&gt;
757f29991412ef63a099c4fe77a921d23b51097ddb207dff669fbf24ace6a7d6  regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
4f0e6f6505b8eb39b53dd971e8dba8fe98c65a56a7bb24443f4a513c7940f193  related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
ebd87841f73bb7946216ae4827a413dcc97fc5094cee2f8ddb6dea7eff356358  smart-slider-3.3.5.1.25.zip&lt;br /&gt;
41bcae0e3cd94b73d7b5761527e68acb9111cb28080dd68f2f83a82cfd87f210  spam-destroyer.2.1.4.zip&lt;br /&gt;
aa52f9a4c8bbe856fe045e5c76ffedae3573374ee43435de78e1561d8e0169a9  woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
fbe62fc4ec4b91915024c126d9b86b3798c283f60d95435f3e6e1226ddd722aa  woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
75f4e9cb71e583ca3f8b19691b5754adb9c981580762137f82443e1eec468f9c  wordpress-6.7.1.zip&lt;br /&gt;
f9ce7a98840dd4bf490d955320a68ac553c767ba7f0eeae6e4f067be5a927ef3  wordpress-seo.24.0.zip&lt;br /&gt;
feda19ad71ea22abe4dbcff422f6e0e6c8315f26a7d246099967a5eea17b4d38  wp-2fa.2.8.0.zip&lt;br /&gt;
130ba1a4f2396a8e183b8ce732c9bc8a3cf6698890f6f216550188e78e082fda  wpforms-lite.1.9.2.3.zip&lt;br /&gt;
6e1d71809f4421463fc19c5c119c5e49788cd3676b730f7980e3dcd209520a1c  wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
e3cb9db45795a8caed13e00414ce7f43d2bb517a35b88cda98ad91b6871b46e2  wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
e50735bcda4e85df1e522fda113ae24fd973f000e75154472544d4bcf51491f1  wp-qrcode.1.1.1.zip&lt;br /&gt;
bedfe5b456f5a5b3b6d4b29dd6577f6b8492f4594a192678555691e8403a56d7  wps-hide-login.1.9.17.1.zip&lt;br /&gt;
user@disp7639:/tmp/tmp.FcejZlvblB$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
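The reason to collect the same checksum listing from several vantage points (TOFU 2/3 here) is to compare them afterwards. This is a minimal sketch. It assumes each listing was saved to a text file in `sha256sum`&#039;s format: 64 hex characters, two spaces, then the filename. `tofu_compare` and the filenames in the example are hypothetical. Only checksum lines are compared, so the date lines and shell prompts in a capture are ignored.&lt;br /&gt;

```shell
# Hypothetical helper: compare the checksum lines of two sha256sum-style
# listings. Prints any differing lines (via diff) and returns non-zero
# if the listings disagree.
tofu_compare() {
  local a b rc
  a=$(mktemp); b=$(mktemp)
  # keep only "hash  filename" lines; dedupe and sort for a stable diff
  grep -E '^[0-9a-f]{64}  ' "$1" | sort -u > "$a"
  grep -E '^[0-9a-f]{64}  ' "$2" | sort -u > "$b"
  diff "$a" "$b"; rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

`tofu_compare tofu1.txt tofu2.txt` prints nothing and exits 0 when the two listings agree. Any changed hash or missing or extra file is printed by `diff`, and the function exits non-zero.&lt;br /&gt;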
# ...&lt;br /&gt;
# I checked munin to see if my changes yesterday had filled the empty charts with data&lt;br /&gt;
# 3 out of 5 of the mysql charts now have data. The two that are still empty are &amp;quot;MySQL InnoDB free tablespace&amp;quot; and &amp;quot;MySQL slow queries&amp;quot;. It&#039;s quite possible they have no data simply because this server is idle&lt;br /&gt;
# I also wanted to see about adding all the other possible mysql charts, so I made a backup of the munin dir, deleted all the plugins/ symlinks for mysql, and recreated them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # tar -czf /var/tmp/munin.20241213.tar.gz /etc/munin/*&lt;br /&gt;
tar: Removing leading `/&#039; from member names&lt;br /&gt;
tar: Removing leading `/&#039; from hard link targets&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # du -sh /var/tmp/munin.20241213.tar.gz &lt;br /&gt;
28K     /var/tmp/munin.20241213.tar.gz&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# shit, I just accidentally deleted all the mysql plugins&lt;br /&gt;
# I was *trying* to delete the symlinks from /etc/munin/plugins. And to be safe, I even created a backup of this dir first. But I was in the wrong screen session and didn&#039;t notice my pwd&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # tar -czf /var/tmp/munin.20241213.tar.gz /etc/munin/*&lt;br /&gt;
tar: Removing leading `/&#039; from member names&lt;br /&gt;
tar: Removing leading `/&#039; from hard link targets&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # du -sh /var/tmp/munin.20241213.tar.gz&lt;br /&gt;
28K     /var/tmp/munin.20241213.tar.gz&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # rm -f mysql_*&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# fortunately I was able to restore them using apt. Note the package isn&#039;t &#039;munin&#039; but &#039;munin-plugins-core&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # sudo apt-get -o Dpkg::Options::=&amp;quot;--force-confmiss&amp;quot; install --reinstall munin-plugins-core&lt;br /&gt;
Reading package lists... Done&lt;br /&gt;
Building dependency tree... Done&lt;br /&gt;
Reading state information... Done&lt;br /&gt;
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 71 not upgraded.&lt;br /&gt;
Need to get 0 B/242 kB of archives.&lt;br /&gt;
After this operation, 0 B of additional disk space will be used.&lt;br /&gt;
(Reading database ... 66512 files and directories currently installed.)&lt;br /&gt;
Preparing to unpack .../munin-plugins-core_2.0.73-1_all.deb ...&lt;br /&gt;
Unpacking munin-plugins-core (2.0.73-1) over (2.0.73-1) ...&lt;br /&gt;
Setting up munin-plugins-core (2.0.73-1) ...&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
-rwxr-xr-x 1 root root  43K Mar 21  2023 mysql_&lt;br /&gt;
-rwxr-xr-x 1 root root 1,8K Mar 21  2023 mysql_bytes&lt;br /&gt;
-rwxr-xr-x 1 root root 5,6K Mar 21  2023 mysql_innodb&lt;br /&gt;
-rwxr-xr-x 1 root root 2,6K Mar 21  2023 mysql_queries&lt;br /&gt;
-rwxr-xr-x 1 root root 1,5K Mar 21  2023 mysql_slowqueries&lt;br /&gt;
-rwxr-xr-x 1 root root 1,6K Mar 21  2023 mysql_threads&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
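The accident above happened because `rm -f mysql_*` acted on whatever directory the shell happened to be in. This is a minimal sketch of a safer alternative, assuming the goal is only to remove munin&#039;s plugin symlinks. It names the directory explicitly and deletes only symbolic links, so the real plugin scripts can&#039;t match (they are regular files in /usr/share/munin/plugins). `remove_mysql_plugin_links` is a hypothetical helper.&lt;br /&gt;

```shell
# Hypothetical helper: delete only symlinks named mysql* in the given
# directory (default /etc/munin/plugins), printing each one removed.
# Regular files never match because of -type l.
remove_mysql_plugin_links() {
  local dir="${1:-/etc/munin/plugins}"
  find "$dir" -maxdepth 1 -type l -name 'mysql*' -print -delete
}
```

Running it against /usr/share/munin/plugins by mistake would delete nothing, since everything matching there is a regular file (per the `ls -lah` listing above).&lt;br /&gt;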
# ok, now I *actually* deleted the symlinks and recreated them to be the complete set of possible charts&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # tar -czf /var/tmp/munin.20241213.tar.gz /etc/munin/*&lt;br /&gt;
tar: Removing leading `/&#039; from member names&lt;br /&gt;
tar: Removing leading `/&#039; from hard link targets&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # du -sh /var/tmp/munin.20241213.tar.gz&lt;br /&gt;
28K     /var/tmp/munin.20241213.tar.gz&lt;br /&gt;
root@hetzner3 /usr/share/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Sep 25 01:47 mysql_ -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   36 Sep 25 01:47 mysql_bytes -&amp;gt; /usr/share/munin/plugins/mysql_bytes&lt;br /&gt;
lrwxrwxrwx 1 root root   37 Sep 25 01:47 mysql_innodb -&amp;gt; /usr/share/munin/plugins/mysql_innodb&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Sep 25 01:47 mysql_isam_space_ -&amp;gt; /usr/share/munin/plugins/mysql_isam_space_&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Sep 25 01:47 mysql_queries -&amp;gt; /usr/share/munin/plugins/mysql_queries&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Sep 25 01:47 mysql_slowqueries -&amp;gt; /usr/share/munin/plugins/mysql_slowqueries&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Sep 25 01:47 mysql_threads -&amp;gt; /usr/share/munin/plugins/mysql_threads&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # rm -f mysql*&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # for i in `/usr/share/munin/plugins/mysql_ suggest`; do ln -sf /usr/share/munin/plugins/mysql_ $i; done&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 binlog_groupcommit -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 bin_relay_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 commands -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 connections -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 files_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_bpool -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_bpool_act -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_insert_buf -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_io -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_io_pend -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_rows -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_semaphores -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_tnx -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 myisam_indexes -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 network_traffic -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 qcache -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 qcache_mem -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 replication -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 select_types -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 slow -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 sorts -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 table_locks -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 tmp_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I applied and regenerated munin&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well now I only see 1 chart on the munin wui = &amp;quot;MySQL InnoDB free tablespace&amp;quot; :( And it&#039;s still empty too&lt;br /&gt;
# let&#039;s wait and see if the others come back?&lt;br /&gt;
# actually, it looks like there&#039;s a better way to do this https://www.thesysadmin.rocks/2020/06/24/installing-munin-on-ubuntu-20-04-with-mysql-plugin/&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
munin-node-configure  --shell | grep mysql&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the above command should output a bunch of `ln` commands with exactly what we need to copy &amp;amp; paste &amp;amp; create the mysql symlinks&lt;br /&gt;
# note this might just be all the &amp;quot;subcharts&amp;quot; for &#039;mysql_&#039;, but it&#039;ll miss all the other &#039;mysql_*&#039; files (eg &#039;mysql_queries&#039; links to &#039;mysql_queries&#039;, not &#039;mysql_&#039; like the rest)&lt;br /&gt;
# to get those last ones, we&#039;d want something like&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ln -s /usr/share/munin/plugins/mysql_* /etc/munin/plugins&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after some hours, the whole mysql section in munin disappeared; let&#039;s try that again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 binlog_groupcommit -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 bin_relay_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 commands -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 connections -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 files_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_bpool -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_bpool_act -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_insert_buf -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_io -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_io_pend -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_rows -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_semaphores -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 innodb_tnx -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 myisam_indexes -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 network_traffic -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 qcache -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 qcache_mem -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 replication -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 select_types -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 slow -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 sorts -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 table_locks -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 13 20:09 tmp_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i mysql | awk &#039;{print $9}&#039;&lt;br /&gt;
binlog_groupcommit&lt;br /&gt;
bin_relay_log&lt;br /&gt;
commands&lt;br /&gt;
connections&lt;br /&gt;
files_tables&lt;br /&gt;
innodb_bpool&lt;br /&gt;
innodb_bpool_act&lt;br /&gt;
innodb_insert_buf&lt;br /&gt;
innodb_io&lt;br /&gt;
innodb_io_pend&lt;br /&gt;
innodb_log&lt;br /&gt;
innodb_rows&lt;br /&gt;
innodb_semaphores&lt;br /&gt;
innodb_tnx&lt;br /&gt;
myisam_indexes&lt;br /&gt;
network_traffic&lt;br /&gt;
qcache&lt;br /&gt;
qcache_mem&lt;br /&gt;
replication&lt;br /&gt;
select_types&lt;br /&gt;
slow&lt;br /&gt;
sorts&lt;br /&gt;
table_locks&lt;br /&gt;
tmp_tables&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # for file in $(ls -lah | grep -i mysql | awk &#039;{print $9}&#039;); do rm -f $file; done&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ln -s /usr/share/munin/plugins/mysql_* /etc/munin/plugins&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:18 mysql_ -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   36 Dec 14 01:18 mysql_bytes -&amp;gt; /usr/share/munin/plugins/mysql_bytes&lt;br /&gt;
lrwxrwxrwx 1 root root   37 Dec 14 01:18 mysql_innodb -&amp;gt; /usr/share/munin/plugins/mysql_innodb&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Dec 14 01:18 mysql_queries -&amp;gt; /usr/share/munin/plugins/mysql_queries&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Dec 14 01:18 mysql_slowqueries -&amp;gt; /usr/share/munin/plugins/mysql_slowqueries&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Dec 14 01:18 mysql_threads -&amp;gt; /usr/share/munin/plugins/mysql_threads&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # munin-node-configure  --shell 2&amp;gt;&amp;amp;1 | grep mysql&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # munin-node-configure  --suggest 2&amp;gt;&amp;amp;1 | grep mysql&lt;br /&gt;
mysql_                     | yes  | no [DBI connect(&#039;mysql;mysql_read_default_file=/etc/mysql/debian.cnf;mysql_connect_timeout=5&#039;,&#039;munin&#039;,...) failed: Access denied for user &#039;munin&#039;@&#039;localhost&#039; (using password: NO)]&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# unfortunately `munin-node-configure --shell` produced no output there at the end, and `--suggest` shows the mysql_ plugin can&#039;t connect to the database as &#039;munin&#039;. So I&#039;m just going to redo what I did before, but this time actually add the missing &#039;mysql_&#039; prefix to the symlink name&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # for i in `/usr/share/munin/plugins/mysql_ suggest`; do ln -sf /usr/share/munin/plugins/mysql_ mysql_$i; done&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:18 mysql_ -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_binlog_groupcommit -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_bin_relay_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   36 Dec 14 01:18 mysql_bytes -&amp;gt; /usr/share/munin/plugins/mysql_bytes&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_commands -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_connections -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_files_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   37 Dec 14 01:18 mysql_innodb -&amp;gt; /usr/share/munin/plugins/mysql_innodb&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_bpool -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_bpool_act -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_insert_buf -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_io -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_io_pend -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_rows -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_semaphores -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_innodb_tnx -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_myisam_indexes -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_network_traffic -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_qcache -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_qcache_mem -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Dec 14 01:18 mysql_queries -&amp;gt; /usr/share/munin/plugins/mysql_queries&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_replication -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_select_types -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_slow -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Dec 14 01:18 mysql_slowqueries -&amp;gt; /usr/share/munin/plugins/mysql_slowqueries&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_sorts -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_table_locks -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Dec 14 01:18 mysql_threads -&amp;gt; /usr/share/munin/plugins/mysql_threads&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Dec 14 01:27 mysql_tmp_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # sudo -u munin /usr/bin/munin-cron&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
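The first attempt produced links named `commands`, `slow`, and so on, which didn&#039;t produce the expected charts. Adding the `mysql_` prefix fixed this because a wildcard plugin like `mysql_` takes its graph name from the part of the symlink name after the prefix. The sketch below is a minimal check for links that point at the wildcard script but lack the prefix. It assumes the plugin directory layout shown above. `find_misnamed_wildcard_links` is hypothetical.&lt;br /&gt;

```shell
# Hypothetical helper: print the names of symlinks in a munin plugins dir
# that point at the wildcard plugin (basename == prefix, default "mysql_")
# but whose own name does not start with that prefix.
find_misnamed_wildcard_links() {
  local dir="${1:-/etc/munin/plugins}" prefix="${2:-mysql_}"
  local link target
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    target=$(readlink "$link")
    [ "$(basename "$target")" = "$prefix" ] || continue
    case "$(basename "$link")" in
      "$prefix"*) ;;                 # correctly prefixed
      *) basename "$link" ;;         # missing the prefix
    esac
  done
}
```

`find_misnamed_wildcard_links /etc/munin/plugins mysql_` prints one misnamed link per line. It prints nothing when every link to `mysql_` carries the prefix.&lt;br /&gt;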
# well now the UI has a &amp;quot;mysql&amp;quot; section again, but there are only 5 charts.&lt;br /&gt;
# As before, 3 out of 5 of the charts have data (but now with a big chunk of the last several hours of data missing)&lt;br /&gt;
# for some reason all our new charts are missing too&lt;br /&gt;
# I guess I&#039;ll wait until tomorrow. If they show up, then I&#039;ll want to document how I created these extra charts on &#039;hetzner3&#039; (or just add that to ansible). If they don&#039;t show up, then forget about it; these 5 charts are probably good enough.&lt;br /&gt;
# I confirmed that the two varnish uptime charts are now visible and have data; that&#039;s fixed&lt;br /&gt;
# I also checked the &amp;quot;process info&amp;quot; charts, which now have data for the &amp;quot;apache2&amp;quot; charts, but the &amp;quot;mysqld&amp;quot; ones are empty. I&#039;m pretty sure this is because the process name is &amp;quot;mariadbd&amp;quot; (not &amp;quot;mysqld&amp;quot;)&lt;br /&gt;
# I expanded this to cover more processes, so it now reads&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[proc]&lt;br /&gt;
env.procname apache2|mariadbd|nginx|varnishd|varnishlog|varnishncsa|wazuh-db|wazuh-remoted|wazuh-syscheckd|wazuh-analysisd|wazuh-authd|python3|journalctl|systemd|kworker|unattended-upgrade&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
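This is a minimal sketch for checking which names in the `env.procname` list actually match a running process. It assumes the names are compared against process names like those printed by `ps -eo comm=`. How the `proc` plugin really matches isn&#039;t shown here, so treat this as an approximation. `missing_procs` is a hypothetical helper. One caveat: Linux truncates `comm` to 15 characters, so this check will always report a long name such as `unattended-upgrade` as missing.&lt;br /&gt;

```shell
# Hypothetical helper: given a pipe-separated list of process names ($1)
# and a file with one running process name per line ($2), print each
# listed name that does not appear exactly in the file.
missing_procs() {
  local name
  local IFS='|'          # split $1 on the pipes
  for name in $1; do
    grep -qxF -- "$name" "$2" || printf '%s\n' "$name"
  done
}
```

For example, run `ps -eo comm= | sort -u > /tmp/procs.txt`, then `missing_procs &#039;apache2|mariadbd|nginx&#039; /tmp/procs.txt`.&lt;br /&gt;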
# I&#039;ll have to check in on this tomorrow to see if the charts have updated&lt;br /&gt;
# ...&lt;br /&gt;
# I wanted to check in on the glacier inventory job that I kicked off yesterday, but (as with the one I kicked off a few months ago) it shows no record of it!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp4042:~$ aws glacier get-job-output --account-id OBFUSCATED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (ResourceNotFoundException) when calling the GetJobOutput operation: The job ID was not found: ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&lt;br /&gt;
user@disp4042:~$ aws glacier list-jobs --account-id OBFUSCATED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
&amp;quot;JobList&amp;quot;: []&lt;br /&gt;
}&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh wait, that was the job ID from a few months ago. But I get the same result from the one from yesterday&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&amp;quot; ./output.json&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# jesus christ, SO says that the jobs may expire after 24 hours https://stackoverflow.com/questions/45112105/aws-glacier-job-id-was-not-found&lt;br /&gt;
# so the jobs can take many days to run, and they expire within 24 hours. They really do make this as hard as fucking possible to delete vaults! It really should be criminal how hard they make it. This is madness!&lt;br /&gt;
# oh, one other comment in ^ that thread worth revisiting:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
@OrestGulman Are you sure that you have data stored in the Amazon Glacier service? Please note that this is different to storing data in Amazon S3 with a &#039;Glacier&#039; class&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we actually are trying to delete data in Amazon S3 in the glacier class. I think? Anyway, I found the vault in the &amp;quot;S3 Glacier&amp;quot; service of the Console WUI. Could that be adding a complication?&lt;br /&gt;
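# One way to settle that question from the CLI: objects in S3 that merely use a Glacier storage class are listed by &#039;aws s3api list-objects-v2&#039; with a &#039;StorageClass&#039; such as GLACIER or DEEP_ARCHIVE, whereas vaults in the standalone S3 Glacier service are listed by &#039;aws glacier list-vaults&#039;. Since the &#039;aws glacier&#039; commands in this log accept &#039;deleteMeIn2020&#039; as a vault name and report a Glacier vault ARN for it, this is most likely a standalone Glacier vault. Below is a minimal sketch; the function names are hypothetical, and the bucket is whatever bucket you want to check.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical sketch: tell a standalone S3 Glacier vault apart from S3 objects
# that merely use a Glacier storage class.
# AWS defaults to the real aws CLI but can be overridden, e.g. for testing.
AWS="${AWS:-aws}"

# vault names in the standalone S3 Glacier service (what "aws glacier" talks to)
list_glacier_vaults() {
	$AWS glacier list-vaults --account-id - --region us-west-2 \
		--query 'VaultList[].VaultName' --output text
}

# keys in one S3 bucket whose storage class is in the Glacier family
list_glacier_class_objects() {
	local bucket="$1"
	$AWS s3api list-objects-v2 --bucket "$bucket" \
		--query "Contents[?StorageClass=='GLACIER' || StorageClass=='DEEP_ARCHIVE' || StorageClass=='GLACIER_IR'].Key" \
		--output text
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;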
# anyway, for now, I just kicked-off another job&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp4042:~$ aws glacier initiate-job --job-parameters &#039;{&amp;quot;Type&amp;quot;: &amp;quot;inventory-retrieval&amp;quot;}&#039; --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
&amp;quot;location&amp;quot;: &amp;quot;/REDACTED/vaults/deleteMeIn2020/jobs/Y66F8y-ft3r8ILhMUHth3DbDWwoMZCm0uPXC9R9_dCj74D_0cUwoX5btOTpLh9Vf4eNJS6KPP5JyujUiZ1WG6ciFGgQL&amp;quot;,&lt;br /&gt;
&amp;quot;jobId&amp;quot;: &amp;quot;Y66F8y-ft3r8ILhMUHth3DbDWwoMZCm0uPXC9R9_dCj74D_0cUwoX5btOTpLh9Vf4eNJS6KPP5JyujUiZ1WG6ciFGgQL&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp4042:~$ aws glacier list-jobs --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
&amp;quot;JobList&amp;quot;: [&lt;br /&gt;
	{&lt;br /&gt;
		&amp;quot;JobId&amp;quot;: &amp;quot;Y66F8y-ft3r8ILhMUHth3DbDWwoMZCm0uPXC9R9_dCj74D_0cUwoX5btOTpLh9Vf4eNJS6KPP5JyujUiZ1WG6ciFGgQL&amp;quot;,&lt;br /&gt;
		&amp;quot;Action&amp;quot;: &amp;quot;InventoryRetrieval&amp;quot;,&lt;br /&gt;
		&amp;quot;VaultARN&amp;quot;: &amp;quot;arn:aws:glacier:us-west-2:REDACTED:vaults/deleteMeIn2020&amp;quot;,&lt;br /&gt;
		&amp;quot;CreationDate&amp;quot;: &amp;quot;2024-12-14T01:59:59.138Z&amp;quot;,&lt;br /&gt;
		&amp;quot;Completed&amp;quot;: false,&lt;br /&gt;
		&amp;quot;StatusCode&amp;quot;: &amp;quot;InProgress&amp;quot;,&lt;br /&gt;
		&amp;quot;InventoryRetrievalParameters&amp;quot;: {&lt;br /&gt;
			&amp;quot;Format&amp;quot;: &amp;quot;JSON&amp;quot;&lt;br /&gt;
		}&lt;br /&gt;
	}&lt;br /&gt;
]&lt;br /&gt;
}&lt;br /&gt;
user@disp4042:~$ &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
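# Since a job&#039;s output only stays downloadable for roughly 24 hours after the job completes (per the Stack Overflow thread above), it may be easier to let a script poll the job and download the output as soon as it finishes, instead of checking back by hand. Below is a minimal sketch using &#039;aws glacier describe-job&#039; and &#039;aws glacier get-job-output&#039;. The function name, the default 15-minute poll interval and the output path are arbitrary assumptions. It assumes credentials that are allowed to describe and read the job, and it has to keep running until the job completes (e.g. inside a &#039;screen&#039; session). Glacier can alternatively publish job completion to an SNS topic (the &#039;SNSTopic&#039; job parameter), which avoids polling altogether.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll a Glacier job until it finishes, then download its
# output immediately (the output is only kept for a limited time afterwards).
# AWS defaults to the real aws CLI but can be overridden, e.g. for testing.
AWS="${AWS:-aws}"

wait_and_fetch_job() {
	local vault="$1" job_id="$2" out="${3:-./output.json}" interval="${4:-900}"
	local status
	while true; do
		# "--account-id -" = the account that owns the configured credentials
		status="$($AWS glacier describe-job --account-id - --region us-west-2 \
			--vault-name "$vault" --job-id "$job_id" \
			--query StatusCode --output text)" || return 1
		case "$status" in
			Succeeded) break ;;
			Failed) echo "ERROR: job ${job_id} failed" >/dev/stderr; return 1 ;;
			*) echo "INFO: job status is ${status}; checking again in ${interval}s" ;;
		esac
		sleep "$interval"
	done
	# fetch the output right away, before it expires
	$AWS glacier get-job-output --account-id - --region us-west-2 \
		--vault-name "$vault" --job-id "$job_id" "$out"
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. &#039;wait_and_fetch_job deleteMeIn2020 &amp;quot;${JOB_ID}&amp;quot; ./output.json&#039;, where JOB_ID is the &#039;jobId&#039; printed by &#039;initiate-job&#039;&lt;br /&gt;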
&lt;br /&gt;
=Thu Dec 12, 2024=&lt;br /&gt;
&lt;br /&gt;
# I wanted to follow-up with the glacier deletion task&lt;br /&gt;
# A couple months ago (on Oct 4), I documented that Marcin gave me auth to delete the 285.3 GB &#039;deleteMeIn2020&#039; vault from our amazon glacier account, which has been costing us $1.03/mo https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q4#Fri_Oct_04.2C_2024&lt;br /&gt;
# It&#039;s, like, unbelievably complicated to delete this vault&lt;br /&gt;
# First, before I can delete it, Amazon forces me to generate a recent inventory&lt;br /&gt;
# An inventory can only be created via the API, and it takes time for the job to finish&lt;br /&gt;
# I last checked on the inventory (that I initiated on Oct 4) on Oct 6, and it wasn&#039;t yet ready https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q4#Sun_Oct_06.2C_2024&lt;br /&gt;
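# As a sanity check on the $1.03/mo figure: assuming a storage price of about $0.0036 per GB-month for Glacier vault storage in us-west-2 (my assumption, not taken from the bill; check current AWS pricing), 285.3 GB comes out to about $1.03/mo, which matches. Below is a tiny sketch of that arithmetic; the helper name is hypothetical.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: monthly storage cost = size (GB) x price (USD per GB-month).
# The default price of 0.0036 is an assumption, not a figure from the AWS bill.
glacier_monthly_cost() {
	local size_gb="$1" price="${2:-0.0036}"
	awk -v s="$size_gb" -v p="$price" 'BEGIN { printf "%.2f\n", s * p }'
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. &#039;glacier_monthly_cost 285.3&#039; prints 1.03&lt;br /&gt;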
# ok, here are our commands to get the inventory job&#039;s result&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo apt-get -y install awscli&lt;br /&gt;
&lt;br /&gt;
  aws configure set aws_access_key_id &#039;REDACTED&#039;&lt;br /&gt;
  aws configure set aws_secret_access_key &#039;REDACTED&#039;&lt;br /&gt;
&lt;br /&gt;
aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# unfortunately, it says the keys that I used (which I extracted from hetzner2:/root/backups/glacierTest.py) don&#039;t have permission to see the job&#039;s output, even though those are the keys that I used to initiate the job!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp5233:~$ sudo apt-get -y install awscli&lt;br /&gt;
Reading package lists... Done&lt;br /&gt;
Building dependency tree... Done&lt;br /&gt;
Reading state information... Done&lt;br /&gt;
awscli is already the newest version (2.9.19-1).&lt;br /&gt;
The following packages were automatically installed and are no longer required:&lt;br /&gt;
  chromium-common chromium-sandbox libc++1-16 libc++abi1-16&lt;br /&gt;
  libcommons-compress-java librnp0 libunwind-16 libwpe-1.0-1&lt;br /&gt;
  libwpebackend-fdo-1.0-1 linux-image-6.1.0-10-amd64&lt;br /&gt;
  linux-image-6.1.0-11-amd64 linux-image-6.1.0-13-amd64&lt;br /&gt;
  linux-image-6.1.0-17-amd64 linux-image-6.1.0-18-amd64&lt;br /&gt;
  linux-image-6.1.0-20-amd64 linux-image-6.1.0-21-amd64&lt;br /&gt;
  linux-image-6.1.0-22-amd64 linux-image-6.1.0-23-amd64&lt;br /&gt;
  linux-image-6.1.0-25-amd64 linux-image-6.1.0-26-amd64&lt;br /&gt;
Use &#039;sudo apt autoremove&#039; to remove them.&lt;br /&gt;
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.&lt;br /&gt;
user@disp5233:~$  &lt;br /&gt;
&lt;br /&gt;
user@disp5233:~$   aws configure set aws_access_key_id &#039;REDACTED&#039;&lt;br /&gt;
user@disp5233:~$   aws configure set aws_secret_access_key &#039;REDACTED&#039;&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp5233:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (AccessDeniedException) when calling the GetJobOutput operation: User: arn:aws:iam::REDACTED:user/backup-cron is not authorized to perform: glacier:GetJobOutput on resource: arn:aws:glacier:us-west-2:REDACTED:vaults/deleteMeIn2020&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I also found some aws cred file in the root user&#039;s home, but it had the exact same creds as the glacierTest.py file :(&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /root/.aws/credentials &lt;br /&gt;
[default]&lt;br /&gt;
aws_access_key_id = REDACTED&lt;br /&gt;
aws_secret_access_key = REDACTED&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I logged into the aws console as my personal &#039;maltfield&#039; user (using creds in my personal keepass)&lt;br /&gt;
# I clicked on my username in the top-right -&amp;gt; &amp;quot;Security Credentials&amp;quot;&lt;br /&gt;
# the WUI showed that I have one set of Access Keys that were created &amp;quot;2449 days ago&amp;quot;&lt;br /&gt;
# I checked the notes of my keepass and found a set of keys with the same key id&lt;br /&gt;
# I reconfigured my aws cli on my local dispVM to use these creds, but I got the same error (though this time for user = &#039;maltfield&#039;)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp5233:~$   aws configure set aws_access_key_id &#039;REDACTED&#039;&lt;br /&gt;
user@disp5233:~$   aws configure set aws_secret_access_key &#039;REDACTED&#039;&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp5233:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (AccessDeniedException) when calling the GetJobOutput operation: User: arn:aws:iam::REDACTED:user/maltfield is not authorized to perform: glacier:GetJobOutput on resource: arn:aws:glacier:us-west-2:REDACTED:vaults/deleteMeIn2020&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, I realized that I was executing the above command literally with &#039;REDACTED&#039; as the &#039;--account-id&#039; – when I swapped that with the actual numerical account ID, I got the same results for the &#039;maltfield&#039; user&lt;br /&gt;
# ...but when I reverted back to the original creds (for the &#039;backup-cron&#039; user), I got a different error saying that the job ID was not found. Is it possible that there&#039;s a very narrow window where the job needs to be queried? This is impossible!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp5233:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (ResourceNotFoundException) when calling the GetJobOutput operation: The job ID was not found: ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
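# Two things might have shortened this detour. First, &#039;aws sts get-caller-identity&#039; prints which IAM user (and account) the currently-configured keys belong to. Second, the Glacier API accepts a single hyphen as &#039;--account-id&#039;, meaning &amp;quot;the account that owns these credentials&amp;quot;, so the numeric ID never has to be pasted in (or accidentally left as &#039;REDACTED&#039;). Below is a minimal sketch; the helper name is hypothetical and the region is the one used throughout this log.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: show which IAM identity the configured keys belong to,
# then list the vault's jobs using "--account-id -" (the keys' own account).
# AWS defaults to the real aws CLI but can be overridden, e.g. for testing.
AWS="${AWS:-aws}"

whoami_and_list_jobs() {
	local vault="$1"
	# prints an ARN like arn:aws:iam::ACCOUNT_ID:user/backup-cron
	$AWS sts get-caller-identity --query Arn --output text || return 1
	$AWS glacier list-jobs --account-id - --region us-west-2 --vault-name "$vault"
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;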
# in the WUI, I switched to the S3 Glacier service https://us-west-2.console.aws.amazon.com/glacier/home?region=us-west-2#/vaults&lt;br /&gt;
# I clicked on the &#039;deleteMeIn2020&#039; vault, and it still says the last inventory date is &amp;quot;August 1, 2018, 02:41:31 (UTC-05:00)&amp;quot;. wtf?&lt;br /&gt;
# well a query for all jobs shows no jobs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp5233:~$ aws glacier list-jobs --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
	&amp;quot;JobList&amp;quot;: []&lt;br /&gt;
}&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, I initiated a new inventory job AGAIN&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp5233:~$ aws glacier initiate-job --job-parameters &#039;{&amp;quot;Type&amp;quot;: &amp;quot;inventory-retrieval&amp;quot;}&#039; --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
	&amp;quot;location&amp;quot;: &amp;quot;/REDACTED/vaults/deleteMeIn2020/jobs/tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&amp;quot;,&lt;br /&gt;
	&amp;quot;jobId&amp;quot;: &amp;quot;tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp5233:~$ aws glacier list-jobs --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
	&amp;quot;JobList&amp;quot;: [&lt;br /&gt;
		{&lt;br /&gt;
			&amp;quot;JobId&amp;quot;: &amp;quot;tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&amp;quot;,&lt;br /&gt;
			&amp;quot;Action&amp;quot;: &amp;quot;InventoryRetrieval&amp;quot;,&lt;br /&gt;
			&amp;quot;VaultARN&amp;quot;: &amp;quot;arn:aws:glacier:us-west-2:REDACTED:vaults/deleteMeIn2020&amp;quot;,&lt;br /&gt;
			&amp;quot;CreationDate&amp;quot;: &amp;quot;2024-12-12T18:07:24.156Z&amp;quot;,&lt;br /&gt;
			&amp;quot;Completed&amp;quot;: false,&lt;br /&gt;
			&amp;quot;StatusCode&amp;quot;: &amp;quot;InProgress&amp;quot;,&lt;br /&gt;
			&amp;quot;InventoryRetrievalParameters&amp;quot;: {&lt;br /&gt;
				&amp;quot;Format&amp;quot;: &amp;quot;JSON&amp;quot;&lt;br /&gt;
			}&lt;br /&gt;
		}&lt;br /&gt;
	]&lt;br /&gt;
}&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp5233:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (InvalidParameterValueException) when calling the GetJobOutput operation: The job is not currently available for download: tnLbYFxINicZDRYy06ri1dlxiVX8wVKLMKrQmiyatMuhfs26ggw8o_nMzc2VGpWjF8Z9IDqnXclrdq9B3pFc2X5n99qN&lt;br /&gt;
user@disp5233:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I guess now we wait some days and hopefully it doesn&#039;t disappear again before we have a chance to check up on it?&lt;br /&gt;
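# For reference, here is my understanding of the remaining steps (per the Glacier documentation) once an inventory finally downloads. First, delete every archive listed in the inventory. Then wait for Glacier to compute a new inventory showing the vault empty (it does this roughly once a day). Only then delete the vault. Below is a minimal sketch of those steps. It assumes &#039;jq&#039; is installed and that the inventory was saved as &#039;./output.json&#039; in Glacier&#039;s JSON inventory format (an &#039;ArchiveList&#039; array of objects with an &#039;ArchiveId&#039;); the function names are hypothetical. It passes &#039;--archive-id=VALUE&#039; as one token because archive IDs can begin with a hyphen, which the CLI would otherwise treat as an option.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical sketch: delete every archive listed in a downloaded Glacier
# inventory, then (later) delete the emptied vault. Requires jq.
# AWS defaults to the real aws CLI but can be overridden, e.g. for testing.
AWS="${AWS:-aws}"

# print one ArchiveId per line from an inventory-retrieval JSON file
list_archive_ids() {
	jq -r '.ArchiveList[].ArchiveId' "$1"
}

delete_all_archives() {
	local vault="$1" inventory="$2" id
	list_archive_ids "$inventory" | while read -r id; do
		# "--archive-id=VALUE" as one token: archive IDs may start with "-"
		$AWS glacier delete-archive --account-id - --region us-west-2 \
			--vault-name "$vault" --archive-id="$id" \
			|| echo "WARNING: failed to delete archive ${id}" >/dev/stderr
	done
}

# only succeeds once a newer inventory shows the vault empty
delete_vault() {
	$AWS glacier delete-vault --account-id - --region us-west-2 --vault-name "$1"
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. run &#039;delete_all_archives deleteMeIn2020 ./output.json&#039; first, then &#039;delete_vault deleteMeIn2020&#039; a day or two later&lt;br /&gt;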
# ...&lt;br /&gt;
# returning to store.opensourceecology.org, the theme issues are now resolved&lt;br /&gt;
# the site is still pretty broken, but it was never really working anyway, so that&#039;s as good as it&#039;ll get&lt;br /&gt;
# what does remain, however, is finding a replacement for our now-unavailable security plugins &#039;rename-wp-login&#039; and &#039;force-strong-passwords&#039;&lt;br /&gt;
# currently the login is exposed on the normal wp-login.php, which isn&#039;t good https://store.opensourceecology.org/wp-login.php&lt;br /&gt;
# I recently set up a wordpress website which uses two plugins&lt;br /&gt;
## melapress-login-security https://wordpress.org/plugins/melapress-login-security/&lt;br /&gt;
## wp-2fa https://wordpress.org/plugins/wp-2fa/&lt;br /&gt;
# it looks like I did install wps-hide-login, but I ended up deactivating it because &#039;melapress-login-security&#039; includes this (as well as forcing strong passwords, feeding two birds with one scone)&lt;br /&gt;
# one thing I didn&#039;t like so much about the &#039;wp-2fa&#039; plugin was that it doesn&#039;t have a &amp;quot;relaxed&amp;quot; mode.&lt;br /&gt;
# Currently we&#039;re using a &#039;google-authenticator&#039; plugin https://wordpress.org/plugins/google-authenticator/&lt;br /&gt;
## this plugin hasn&#039;t been updated in 2 years, but it has &amp;gt;30,000 active installations, and it appears to be working fine still&lt;br /&gt;
## what I really like about it is that it has a checkbox for &amp;quot;Relaxed mode allows for more time drifting on your phone clock (±4 min).&amp;quot;&lt;br /&gt;
### this setting is found on the Users -&amp;gt; Profile page&lt;br /&gt;
### in my experience, 2FA tends to cause security issues for websites through loss of availability, because users constantly get locked out by time-sync issues on their phones. For our purposes, I think the happy medium between security and convenience is to accept several codes from the past (and future) few minutes. I think there&#039;s very little risk in this, but it&#039;s surprising how few TOTP implementations allow it&lt;br /&gt;
### oh, actually, it&#039;s been years since they&#039;ve done a release, but the last commit was only 11 months ago, so that&#039;s not terrible https://github.com/ivankruchkoff/google-authenticator&lt;br /&gt;
### I&#039;m not a huge fan of this plugin otherwise; it&#039;s not actively updated. But if I could find an equally lightweight alternative, I&#039;d switch&lt;br /&gt;
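# A TOTP code is derived from the current 30-second time step, so a &amp;quot;relaxed mode&amp;quot; that tolerates clock drift just means also accepting the codes for a few neighboring steps. Below is a minimal sketch of that check (RFC 6238 with HMAC-SHA1 and 6 digits, via &#039;openssl&#039;). For simplicity it takes the shared secret as hex rather than the usual base32, and the function names are hypothetical. The default window of 8 steps (8 x 30 s = 4 min) is chosen to mirror the plugin&#039;s &amp;quot;±4 min&amp;quot; wording, not taken from its code.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "relaxed" TOTP check: RFC 6238 with HMAC-SHA1,
# 30-second steps and 6 digits. The secret is passed as hex for simplicity
# (authenticator apps usually show it as base32). Requires openssl.

# print the 6-digit code for a hex key and a time-step counter
totp_code() {
	local hexkey="$1" counter="$2" msg hmac offset part
	# 8-byte big-endian counter, written as \xHH escapes
	msg="$(printf '%016x' "$counter" | sed 's/\(..\)/\\x\1/g')"
	# msg holds only \xHH escapes, so it is safe to use as the printf format
	hmac="$(printf "$msg" | openssl dgst -sha1 -mac HMAC -macopt "hexkey:${hexkey}" | awk '{print $NF}')"
	# dynamic truncation: offset = low 4 bits of the last (20th) byte
	offset=$(( 16#${hmac:39:1} ))
	# read 4 bytes at that offset and drop the top bit (mod 2^31)
	part=$(( 16#${hmac:offset*2:8} % 2147483648 ))
	printf '%06d\n' $(( part % 1000000 ))
}

# succeed if the code matches any step within +/- window steps of "now"
totp_verify_relaxed() {
	local hexkey="$1" code="$2" now="${3:-$(date +%s)}" window="${4:-8}"
	local step=$(( now / 30 )) d=$(( -window )) c
	while [ "$d" -le "$window" ]; do
		c=$(( step + d ))
		if [ "$c" -ge 0 ]; then
			if [ "$(totp_code "$hexkey" "$c")" = "$code" ]; then
				return 0
			fi
		fi
		d=$(( d + 1 ))
	done
	return 1
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. with the RFC 4226 test key (ASCII &#039;12345678901234567890&#039;, hex 3132333435363738393031323334353637383930), &#039;totp_code &amp;quot;$key&amp;quot; 1&#039; prints 287082. &#039;totp_verify_relaxed &amp;quot;$key&amp;quot; 287082 119 2&#039; succeeds because step 1 is within 2 steps of step 3 (119 / 30).&lt;br /&gt;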
# I figured I&#039;d go ahead now and open a feature request to add a &amp;quot;relaxed&amp;quot; mode to the alternative &#039;wp-2fa&#039; that I&#039;ve already tested, but I couldn&#039;t fucking find the forge!&lt;br /&gt;
## I opened a support ticket on their wordpress plugin page asking for a link to their vcs forge (eg github) https://wordpress.org/support/topic/where-is-the-vcs-forge/&lt;br /&gt;
# anyway, I&#039;ll go ahead and 3TOFU all of them, but now I&#039;m leaning towards not using melapress&#039; &#039;wp-2fa&#039;, while still using melapress&#039; &#039;melapress-login-security&#039;&lt;br /&gt;
# in poking around melapress, I saw they also have a plugin for hCaptcha = &#039;advanced-nocaptcha-recaptcha&#039; https://wordpress.org/plugins/advanced-nocaptcha-recaptcha/&lt;br /&gt;
## I hate reCAPTCHA and cloudflare&#039;s often fails, but I&#039;ve had pretty good experiences with hCaptcha. This supports all of them&lt;br /&gt;
## previously I&#039;ve used &#039;hcaptcha-for-forms-and-more&#039; and had a good experience, so I&#039;ll add both to the 3TOFU https://wordpress.org/plugins/hcaptcha-for-forms-and-more/&lt;br /&gt;
## in the site where I used &#039;hcaptcha-for-forms-and-more&#039; before, they used &#039;wpforms-lite&#039;, so I&#039;ll add that too https://wordpress.org/plugins/wpforms-lite/&lt;br /&gt;
### looks like OSE is using &#039;contact-form-7&#039;, which is actively developed still https://wordpress.org/plugins/contact-form-7/&lt;br /&gt;
### despite the name, I found that &#039;wpforms-lite&#039; actually is *less* lightweight than &#039;contact-form-7&#039; in benchmark tests&lt;br /&gt;
#### https://wphive.com/plugins/wpforms-lite/&lt;br /&gt;
#### https://wphive.com/plugins/contact-form-7/&lt;br /&gt;
### contact-form-7 is a bit more popular (10+ million active installs vs 6+ million), but &#039;wpforms-lite&#039; has a bit better reviews (4.9/5 vs 4/5)&lt;br /&gt;
### oh, it looks like contact-form-7 only supports reCAPTCHA whereas wpforms-lite has built-in support for hCaptcha and cf too&lt;br /&gt;
### I&#039;ll go ahead and add them both to the 3TOFU just in case we need it, but it&#039;s probably best to stick to contact-form-7&lt;br /&gt;
# let&#039;s do a new 3TOFU of some candidate plugins, after which I can demo them and see what we want to use. I&#039;m also going to add some others that we may or may not use&lt;br /&gt;
## wps-hide-login&lt;br /&gt;
## melapress-login-security&lt;br /&gt;
## activitypub&lt;br /&gt;
## aurora-heatmap&lt;br /&gt;
## raw-html&lt;br /&gt;
## related-posts-by-taxonomy&lt;br /&gt;
## smart-slider-3&lt;br /&gt;
## spam-destroyer&lt;br /&gt;
## coinpayments-payment-gateway-for-woocommerce&lt;br /&gt;
## woocommerce-gateway-stripe&lt;br /&gt;
## wpfront-notification-bar&lt;br /&gt;
## wordpress-seo&lt;br /&gt;
## wp-pgp-encrypted-emails&lt;br /&gt;
## woo-multi-currency&lt;br /&gt;
## woocommerce-multilingual&lt;br /&gt;
## include-mastodon-feed&lt;br /&gt;
## bulk-media-register&lt;br /&gt;
## enable-media-replace&lt;br /&gt;
## regenerate-thumbnails&lt;br /&gt;
## wp-qrcode&lt;br /&gt;
## wp-2fa&lt;br /&gt;
## advanced-nocaptcha-recaptcha&lt;br /&gt;
## hcaptcha-for-forms-and-more&lt;br /&gt;
## leaflet-map&lt;br /&gt;
## extensions-leaflet-map&lt;br /&gt;
## wpforms-lite&lt;br /&gt;
# in the past few months, I had to upgrade my personal wordpress site&lt;br /&gt;
# to do this securely, I wrote a script that spits out a script that can be used for 3TOFU of all the installed themes &amp;amp; plugins&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~$ cat /usr/local/bin/wordpress_3tofu.sh &lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#set -x&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    wordpress_3tofu.sh&lt;br /&gt;
# Version: 0.1&lt;br /&gt;
# Purpose: Generates a list of 3TOFU commands to verify the latest versions of &lt;br /&gt;
#          all currently-installed wordpress themes and plugins&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-09-28&lt;br /&gt;
# Updated: 2024-09-28&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  SETTINGS                                    #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  FUNCTIONS                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  MAIN BODY                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
#####################&lt;br /&gt;
# DECLARE VARIABLES #&lt;br /&gt;
#####################&lt;br /&gt;
&lt;br /&gt;
# space-delimited list of URLs for 3TOFU&lt;br /&gt;
REMOTE_FILES=&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
CURL=$(which curl) || { echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1; }&lt;br /&gt;
GREP=$(which grep) || { echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1; }&lt;br /&gt;
&lt;br /&gt;
########&lt;br /&gt;
# CORE #&lt;br /&gt;
########&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# PLUGINS #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
# get list of plugins&lt;br /&gt;
plugins=$(sudo -u wp -i wp --path=/var/www/html/wordpress/htdocs --format=csv plugin list | $GREP -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&lt;br /&gt;
&lt;br /&gt;
##########&lt;br /&gt;
# THEMES #&lt;br /&gt;
##########&lt;br /&gt;
&lt;br /&gt;
# get list of themes&lt;br /&gt;
themes=$(sudo -u wp -i wp --path=/var/www/html/wordpress/htdocs --format=csv theme list | $GREP -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&lt;br /&gt;
&lt;br /&gt;
###################&lt;br /&gt;
# OUTPUT COMMANDS #&lt;br /&gt;
###################&lt;br /&gt;
&lt;br /&gt;
# HEADER&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    3tofu.sh&lt;br /&gt;
# Purpose: Execute these commands on 3 distinct machines (or VMs) on 3 distinct&lt;br /&gt;
#          days using 3 distinct networks exiting from 3 distinct countries&lt;br /&gt;
# &lt;br /&gt;
#          For more info on 3TOFU (and why this is important), see:&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
#&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: $(date -u --rfc-3339=seconds)&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
JQ=$(which jq) || { echo &amp;quot;ERROR: Cannot find &#039;jq&#039;&amp;quot;; exit 1; }&lt;br /&gt;
CURL=&amp;quot;$(which curl) --retry 5 --retry-all-errors&amp;quot; || { echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1; }&lt;br /&gt;
GREP=$(which grep) || { echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1; }&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;&amp;quot;&lt;br /&gt;
WARNINGS=&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if [ &amp;quot;$(whoami)&amp;quot; = &amp;quot;amnesia&amp;quot; ]; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# CORE&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo &amp;quot;INFO: Determining Latest Version of Wordpress Core&amp;quot;&lt;br /&gt;
json=$($CURL -s &amp;quot;https://api.wordpress.org/core/version-check/1.7/&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &#039;[.offers[]|select(.response==&amp;quot;upgrade&amp;quot;)][0].download&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# PLUGINS&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;plugins=&#039;${plugins}&#039;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Plugins \n\t&amp;quot;&lt;br /&gt;
for plugin in $plugins; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
&lt;br /&gt;
	$CURL -so plugin.json &amp;quot;https://api.wordpress.org/plugins/info/1.0/${plugin}.json&amp;quot;&lt;br /&gt;
	latest_version=$($JQ -r .version plugin.json)&lt;br /&gt;
	url=$($JQ -r &amp;quot;.versions.\&amp;quot;${latest_version}\&amp;quot;&amp;quot; plugin.json)&lt;br /&gt;
	&lt;br /&gt;
	if [ &amp;quot;${url}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$($JQ -r .error plugin.json);&lt;br /&gt;
		description=$($JQ -r .description plugin.json);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} ${url}&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# THEMES&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;themes=&#039;${themes}&#039;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Themes \n\t&amp;quot;&lt;br /&gt;
for theme in $themes; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
	json=$($CURL -s &amp;quot;https://api.wordpress.org/themes/info/1.2/?action=theme_information&amp;amp;slug=${theme}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
	latest_version=$(echo &amp;quot;$json&amp;quot; | $JQ -r .version)&lt;br /&gt;
	&lt;br /&gt;
	if [ &amp;quot;${latest_version}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(echo &amp;quot;$json&amp;quot; | $JQ -r .error);&lt;br /&gt;
		description=$(echo &amp;quot;$json&amp;quot; | $JQ -r .description);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download theme ${theme}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;$json&amp;quot; | $JQ -r &amp;quot;.download_link&amp;quot;)&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# WARNINGS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -e &amp;quot;${WARNINGS}&amp;quot;&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# DOWNLOAD PAYLOADS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	echo &amp;quot;${file}&amp;quot;&lt;br /&gt;
	${CURL} --progress-bar -O &amp;quot;${file}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# FINISH&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
exit 0&lt;br /&gt;
user@host:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
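# The generated script only prints the checksums; the actual 3TOFU check is comparing the &#039;sha256sum&#039; output across the three runs. Below is a minimal sketch of that comparison. It assumes the &#039;sha256sum&#039; lines from each run were saved to their own text file (without the date line); the helper name and filenames are hypothetical.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the "sha256sum *" output saved from three
# independent 3TOFU runs. Returns 0 only if all three agree.
compare_3tofu() {
	local mismatches
	# every "hash  filename" line should appear once per file, i.e. exactly
	# 3 times in total; any other count is a mismatch (order doesn't matter)
	mismatches="$(sort "$1" "$2" "$3" | uniq -c | awk '$1 != 3')"
	if [ -n "$mismatches" ]; then
		echo "MISMATCH (count, hash, file):"
		echo "$mismatches"
		return 1
	fi
	echo "OK: all three runs agree"
}
```
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# e.g. &#039;compare_3tofu run1.txt run2.txt run3.txt&#039;; any line it prints is a file whose hash (or presence) differs between runs and should not be trusted&lt;br /&gt;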
# this script works fine on my personal server, which uses a single wp multisite install, but it needs to be updated for OSE&#039;s multiple independent wordpress installs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/local/bin # sudo -u wp -i wp --path=/var/www/html/wordpress/htdocs --format=csv plugin list | $GREP -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;&lt;br /&gt;
-bash: -vE: command not found&lt;br /&gt;
Error: This does not seem to be a WordPress installation.&lt;br /&gt;
The used path is: /var/www/html/wordpress/htdocs/&lt;br /&gt;
Pass --path=`path/to/wordpress` or run `wp core download`.&lt;br /&gt;
root@hetzner3 /usr/local/bin # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
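# with the correct docroot (and plain &#039;grep&#039;, since &#039;$GREP&#039; isn&#039;t set in an interactive shell), and ignoring the spam of PHP warnings, this does work&lt;br /&gt;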
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/local/bin # sudo -u wp -i wp --path=&amp;quot;/var/www/html/store.opensourceecology.org/htdocs&amp;quot; --format=csv plugin list | grep -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
PHP Warning:  wp_update_plugins(): An unexpected error occurred. Something may be wrong with WordPress.org or this server&amp;amp;#8217;s configuration. If you continue to have problems, please try the &amp;lt;a href=&amp;quot;https://wordpress.org/support/forums/&amp;quot;&amp;gt;support forums&amp;lt;/a&amp;gt;. (WordPress could not establish a secure connection to WordPress.org. Please contact your server administrator.) in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/functions.php on line 6085&lt;br /&gt;
Warning: wp_update_plugins(): An unexpected error occurred. Something may be wrong with WordPress.org or this server&amp;amp;#8217;s configuration. If you continue to have problems, please try the &amp;lt;a href=&amp;quot;https://wordpress.org/support/forums/&amp;quot;&amp;gt;support forums&amp;lt;/a&amp;gt;. (WordPress could not establish a secure connection to WordPress.org. Please contact your server administrator.) in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/functions.php on line 6085&lt;br /&gt;
akismet classic-editor contact-form-7 google-authenticator-encourage-user-activation google-authenticator hello meta-box ssl-insecure-content-fixer vcaching woocommerce coingate-for-woocommerce root@hetzner3 /usr/local/bin # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can throw it in a loop that dynamically gets all of the sites&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	echo $wp_docroot&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s an execution (currently we&#039;ve restored only 1 wordpress vhost on hetzner3)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/local/bin # wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename *htdocs/wp-content)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
		wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
		echo $wp_docroot&lt;br /&gt;
done&lt;br /&gt;
/var/www/html/store.opensourceecology.org/htdocs&lt;br /&gt;
root@hetzner3 /usr/local/bin # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
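# a hedged sketch (not what currently runs on hetzner3) of a more defensive version of the discovery loop above. Quoting the -wholename pattern means find, not the shell, expands the wildcard (unquoted, the shell could glob it against whatever directory we happen to be in), and -print0 with read -d &#039;&#039; keeps any vhost path containing spaces intact. The helper name list_wp_docroots is hypothetical&lt;br /&gt;

```shell
#!/bin/bash
# Sketch (hypothetical helper): print the docroot of every WordPress site
# found under a base directory, assuming the VHOST/htdocs/wp-content
# layout used on hetzner3. Defaults to /var/www/html.
list_wp_docroots() {
	local base="${1:-/var/www/html}"
	# the pattern is quoted so that find, not the shell, expands the '*'
	find "${base}" -type d -wholename '*htdocs/wp-content' -print0 |
		while IFS= read -r -d '' wp_content; do
			dirname "${wp_content}"
		done
}

# usage (sketch):
#   list_wp_docroots /var/www/html | while IFS= read -r wp_docroot; do
#   	echo "${wp_docroot}"
#   done
```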
# we can update this loop to get all the plugins for every site as follows&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
plugins=&#039;&#039;&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
	echo &amp;quot;${wordpress_site}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	echo &amp;quot;${wp_docroot}&amp;quot;&lt;br /&gt;
	echo &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	plugins=&amp;quot;${plugins} $(sudo -u wp -i wp --path=&amp;quot;${wp_docroot}&amp;quot; --format=csv plugin list 2&amp;gt;/dev/null | grep -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
echo ${plugins}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s an execution&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /usr/local/bin # plugins=&#039;&#039;&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename *htdocs/wp-content)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
		echo $wordpress_site;&lt;br /&gt;
&lt;br /&gt;
		wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
		vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
		echo $wp_docroot&lt;br /&gt;
		echo $vhost_dir&lt;br /&gt;
&lt;br /&gt;
		plugins=&amp;quot;${plugins} $(sudo -u wp -i wp --path=&amp;quot;${wp_docroot}&amp;quot; --format=csv plugin list 2&amp;gt;/dev/null | grep -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
echo ${plugins}&lt;br /&gt;
/var/www/html/store.opensourceecology.org/htdocs/wp-content&lt;br /&gt;
/var/www/html/store.opensourceecology.org/htdocs&lt;br /&gt;
/var/www/html/store.opensourceecology.org&lt;br /&gt;
akismet classic-editor contact-form-7 google-authenticator-encourage-user-activation google-authenticator hello meta-box ssl-insecure-content-fixer vcaching woocommerce coingate-for-woocommerce&lt;br /&gt;
root@hetzner3 /usr/local/bin # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
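# once more than one wordpress vhost is restored, this loop will append the same slug once per site (e.g. akismet), so the generated 3TOFU script would fetch it repeatedly. A hedged sketch of de-duplicating the list; the helpers slugs_from_csv and dedupe_words are hypothetical, and slugs_from_csv assumes the first CSV row is the header and the first column is the slug (it skips row 1 rather than grepping for ^name,). The same dedupe_words step would also catch repeats in a hand-maintained list&lt;br /&gt;

```shell
#!/bin/bash
# Sketch (hypothetical helpers). Assumes `wp plugin list --format=csv`
# prints a header row first and the plugin slug in the first column.
slugs_from_csv() {
	awk -F, 'NR > 1 { print $1 }'
}

# Collapse a whitespace-delimited word list into unique words, keeping
# the order in which each word was first seen, joined by single spaces.
dedupe_words() {
	printf '%s\n' "$1" | tr -s '[:space:]' '\n' |
		awk 'NF { if (!seen[$0]++) print }' | paste -s -d ' ' -
}

# usage (sketch), inside the existing per-site loop:
#   plugins="${plugins} $(sudo -u wp -i wp --path="${wp_docroot}" --format=csv plugin list 2>/dev/null | slugs_from_csv)"
# and once after the loop:
#   plugins="$(dedupe_words "${plugins}")"
```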
# ok, here&#039;s the updated script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ cat /usr/local/bin/wordpress_3tofu.sh &lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#set -x&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    wordpress_3tofu.sh&lt;br /&gt;
# Version: 0.2&lt;br /&gt;
# Purpose: Generates a list of 3TOFU commands to verify the latest versions of &lt;br /&gt;
#          all currently-installed wordpress themes and plugins&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-09-28&lt;br /&gt;
# Updated: 2024-12-12&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  SETTINGS                                    #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  FUNCTIONS                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  MAIN BODY                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
#####################&lt;br /&gt;
# DECLARE VARIABLES #&lt;br /&gt;
#####################&lt;br /&gt;
&lt;br /&gt;
# space-delimited list of URLs for 3TOFU&lt;br /&gt;
REMOTE_FILES=&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
CURL=$(which curl) || { echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1; }&lt;br /&gt;
GREP=$(which grep) || { echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1; }&lt;br /&gt;
&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
########&lt;br /&gt;
# CORE #&lt;br /&gt;
########&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# PLUGINS #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
# get list of plugins&lt;br /&gt;
plugins=&#039;&#039;&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	plugins=&amp;quot;${plugins} $(sudo -u wp -i wp --path=&amp;quot;${wp_docroot}&amp;quot; --format=csv plugin list 2&amp;gt;/dev/null | $GREP -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
##########&lt;br /&gt;
# THEMES #&lt;br /&gt;
##########&lt;br /&gt;
&lt;br /&gt;
# get list of themes&lt;br /&gt;
themes=&#039;&#039;&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	themes=&amp;quot;${themes} $(sudo -u wp -i wp --path=&amp;quot;${wp_docroot}&amp;quot; --format=csv theme list 2&amp;gt;/dev/null | $GREP -vE &#039;^name,&#039; | cut -d, -f1 | tr &amp;quot;\n&amp;quot; &amp;quot; &amp;quot;)&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###################&lt;br /&gt;
# OUTPUT COMMANDS #&lt;br /&gt;
###################&lt;br /&gt;
&lt;br /&gt;
# HEADER&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    3tofu.sh&lt;br /&gt;
# Purpose: Execute these commands on 3 distinct machines (or VMs) on 3 distinct&lt;br /&gt;
#          days using 3 distinct networks exiting from 3 distinct countries&lt;br /&gt;
# &lt;br /&gt;
#          For more info on 3TOFU (and why this is important), see:&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
#&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: $(date -u --rfc-3339=seconds)&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
JQ=$(which jq) || (echo &amp;quot;ERROR: Cannot find &#039;jq&#039;&amp;quot;; exit 1)&lt;br /&gt;
CURL=&amp;quot;$(which curl) --retry 5 --retry-all-errors&amp;quot; || (echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1)&lt;br /&gt;
GREP=$(which grep) || (echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;&amp;quot;&lt;br /&gt;
WARNINGS=&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if [[ &amp;quot;`whoami`&amp;quot; == &amp;quot;amnesia&amp;quot; ]]; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
	PYTHON=&amp;quot;/usr/bin/torify ${PYTHON}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# CORE&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo &amp;quot;INFO: Determining Latest Version of Wordpress Core&amp;quot;&lt;br /&gt;
json=$($CURL -s &amp;quot;https://api.wordpress.org/core/version-check/1.7/&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &#039;[.offers[]|select(.response==&amp;quot;upgrade&amp;quot;)][0].download&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# PLUGINS&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;plugins=&#039;${plugins}&#039;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Plugins \n\t&amp;quot;&lt;br /&gt;
for plugin in $plugins; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
&lt;br /&gt;
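	# use ${CURL} and ${JQ} so the request is torified (in tails) and retried&lt;br /&gt;
	${CURL} -so plugin.json &amp;quot;https://api.wordpress.org/plugins/info/1.0/${plugin}.json&amp;quot;&lt;br /&gt;
	latest_version=$(${JQ} -r .version plugin.json)&lt;br /&gt;
	url=$(${JQ} -r &amp;quot;.versions.\&amp;quot;${latest_version}\&amp;quot;&amp;quot; plugin.json 2&amp;gt;/dev/null)&lt;br /&gt;
	&lt;br /&gt;
	# an empty url means jq itself failed (e.g. &amp;quot;Cannot index array&amp;quot;), so treat it like null&lt;br /&gt;
	if [ -z &amp;quot;${url}&amp;quot; ] || [ &amp;quot;${url}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(${JQ} -r .error plugin.json);&lt;br /&gt;
		description=$(${JQ} -r .description plugin.json);&lt;br /&gt;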
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} ${url}&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# THEMES&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;themes=&#039;${themes}&#039;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Themes \n\t&amp;quot;&lt;br /&gt;
for theme in $themes; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
	json=$($CURL -s &amp;quot;https://api.wordpress.org/themes/info/1.2/?action=theme_information&amp;amp;slug=${theme}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
	latest_version=$(echo &amp;quot;${json}&amp;quot; | $JQ -r .version)&lt;br /&gt;
	&lt;br /&gt;
	if [ -z &amp;quot;${latest_version}&amp;quot; ] || [ &amp;quot;${latest_version}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(echo &amp;quot;${json}&amp;quot; | $JQ -r .error);&lt;br /&gt;
		description=$(echo &amp;quot;${json}&amp;quot; | $JQ -r .description);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download theme ${theme}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &amp;quot;.download_link&amp;quot;)&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# WARNINGS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -e &amp;quot;${WARNINGS}&amp;quot;&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# DOWNLOAD PAYLOADS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	echo &amp;quot;${file}&amp;quot;&lt;br /&gt;
	${CURL} --progress-bar -O &amp;quot;${file}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# FINISH&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
exit 0&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
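# one hedged observation about the plugin loop in the script above: it only treats the literal string null as a failure, but when jq itself errors (e.g. &amp;quot;Cannot index array with string&amp;quot;, which happens when a plugin&#039;s versions field is a JSON array rather than an object) jq prints nothing, so url is empty, no WARNING is recorded, and that plugin is silently left out of REMOTE_FILES. A minimal sketch of a stricter check (url_missing is a hypothetical helper name)&lt;br /&gt;

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed when jq produced no usable
# download URL. Covers both jq printing the literal string "null" and
# jq printing nothing at all because it hit an error.
url_missing() {
	[ -z "$1" ] || [ "$1" = "null" ]
}

# usage (sketch) inside the generated plugin loop:
#   url=$($JQ -r ".versions.\"${latest_version}\"" plugin.json 2>/dev/null)
#   if url_missing "${url}"; then
#   	WARNINGS="${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}"
#   else
#   	REMOTE_FILES="${REMOTE_FILES} ${url}"
#   fi
```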
# and an execution shows it&#039;s working (again, this script outputs another script that we&#039;ll copy-and-paste onto some VM for 3TOFU)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ sudo /usr/local/bin/wordpress_3tofu.sh &lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    3tofu.sh&lt;br /&gt;
# Purpose: Execute these commands on 3 distinct machines (or VMs) on 3 distinct&lt;br /&gt;
#          days using 3 distinct networks exiting from 3 distinct countries&lt;br /&gt;
# &lt;br /&gt;
#          For more info on 3TOFU (and why this is important), see:&lt;br /&gt;
#           * https://tech.michaelaltfied.net/3tofu&lt;br /&gt;
#&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-12-12 21:14:30+00:00&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
JQ=$(which jq) || (echo &amp;quot;ERROR: Cannot find &#039;jq&#039;&amp;quot;; exit 1)&lt;br /&gt;
CURL=&amp;quot;$(which curl) --retry 5 --retry-all-errors&amp;quot; || (echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1)&lt;br /&gt;
GREP=$(which grep) || (echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;&amp;quot;&lt;br /&gt;
WARNINGS=&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if  &amp;quot;`whoami`&amp;quot; == &amp;quot;amnesia&amp;quot;  ; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
	PYTHON=&amp;quot;/usr/bin/torify ${PYTHON}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;INFO: Determining Latest Version of Wordpress Core&amp;quot;&lt;br /&gt;
json=$($CURL -s &amp;quot;https://api.wordpress.org/core/version-check/1.7/&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &#039;[.offers[]|select(.response==&amp;quot;upgrade&amp;quot;)][0].download&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
plugins=&#039; akismet classic-editor contact-form-7 google-authenticator-encourage-user-activation google-authenticator hello meta-box ssl-insecure-content-fixer vcaching woocommerce coingate-for-woocommerce &#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Plugins \n\t&amp;quot;&lt;br /&gt;
for plugin in $plugins; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
&lt;br /&gt;
	json=$(curl -so plugin.json https://api.wordpress.org/plugins/info/1.0/${plugin}.json)&lt;br /&gt;
	latest_version=$(cat plugin.json | jq -r .version)&lt;br /&gt;
	url=$(cat plugin.json | jq -r &amp;quot;.versions.\&amp;quot;${latest_version}\&amp;quot;&amp;quot;)&lt;br /&gt;
	&lt;br /&gt;
	if [ &amp;quot;${url}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(cat plugin.json | jq -r .error);&lt;br /&gt;
		description=$(cat plugin.json | jq -r .description);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} ${url}&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
themes=&#039; oshin storefront twentyeleven twentyfifteen twentyfourteen twentynineteen twentyseventeen twentysixteen twentyten twentythirteen twentytwelve &#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Themes \n\t&amp;quot;&lt;br /&gt;
for theme in $themes; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
	json=$($CURL -s &amp;quot;https://api.wordpress.org/themes/info/1.2/?action=theme_information&amp;amp;slug=${theme}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
	latest_version=$(echo $json | $JQ -r .version)&lt;br /&gt;
	&lt;br /&gt;
	if [ &amp;quot;${latest_version}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(echo $json | $JQ -r .error);&lt;br /&gt;
		description=$(echo $json | $JQ -r .description);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download theme ${theme}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo $json | $JQ -r &amp;quot;.download_link&amp;quot;)&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
echo -e &amp;quot;${WARNINGS}&amp;quot;&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	echo &amp;quot;${file}&amp;quot;&lt;br /&gt;
	${CURL} --progress-bar -O &amp;quot;${file}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
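# the generated script ends with sha256sum *, and 3TOFU only gives us confidence if all three runs print the same checksums. A hedged sketch for comparing them, assuming each run&#039;s sha256sum lines are saved to their own text file (the file names and the compare_tofu_runs helper are hypothetical). It prints every file whose hash differs between runs, or is missing from one of them, and returns non-zero if there is any mismatch&lt;br /&gt;

```shell
#!/bin/bash
# Sketch: compare the "sha256sum *" output saved from each 3TOFU run.
# Assumes one text file per run, each holding sha256sum lines
# ("HASH  FILENAME"); other lines (e.g. the date) are skipped.
# Prints "MISMATCH: FILENAME" for every file whose hash differs between
# runs or is absent from a run, and returns 1 if there was any mismatch.
# usage (sketch): compare_tofu_runs tofu1.txt tofu2.txt tofu3.txt
compare_tofu_runs() {
	awk -v runs="$#" '
		# skip anything that does not start with a 64-hex-char hash
		length($1) != 64 { next }
		{
			hash = $1
			name = $0
			sub(/^[0-9a-f]+ [ *]/, "", name)   # strip "HASH  " or "HASH *"
			count[name]++
			if (!(name in first)) first[name] = hash
			else if (first[name] != hash) bad[name] = 1
		}
		END {
			status = 0
			for (name in count) {
				if (count[name] != runs || (name in bad)) {
					print "MISMATCH: " name
					status = 1
				}
			}
			exit status
		}' "$@"
}
```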
# ok, so that will be great in the future when we need to 3TOFU updates to plugins &amp;amp; themes that are already installed on the server; for now, let&#039;s hack it with a manually-defined list of the new plugins (above) so that we can 3TOFU them&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#set -x&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    wordpress_3tofu.sh&lt;br /&gt;
# Version: 0.1&lt;br /&gt;
# Purpose: Generates a list of 3TOFU commands to verify the latest versions of &lt;br /&gt;
#          all currently-installed wordpress themes and plugins&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-09-28&lt;br /&gt;
# Updated: 2024-09-28&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  SETTINGS                                    #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  FUNCTIONS                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  MAIN BODY                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
#####################&lt;br /&gt;
# DECLARE VARIABLES #&lt;br /&gt;
#####################&lt;br /&gt;
&lt;br /&gt;
# space-delimited list of URLs for 3TOFU&lt;br /&gt;
REMOTE_FILES=&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
CURL=$(which curl) || { echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1; }&lt;br /&gt;
GREP=$(which grep) || { echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1; }&lt;br /&gt;
&lt;br /&gt;
########&lt;br /&gt;
# CORE #&lt;br /&gt;
########&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# PLUGINS #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
# get list of plugins&lt;br /&gt;
plugins=&amp;quot;wps-hide-login melapress-login-security activitypub aurora-heatmap raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&amp;quot;&lt;br /&gt;
&lt;br /&gt;
###################&lt;br /&gt;
# OUTPUT COMMANDS #&lt;br /&gt;
###################&lt;br /&gt;
&lt;br /&gt;
# HEADER&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    3tofu.sh&lt;br /&gt;
# Purpose: Execute these commands on 3 distinct machines (or VMs) on 3 distinct&lt;br /&gt;
#          days using 3 distinct networks exiting from 3 distinct countries&lt;br /&gt;
# &lt;br /&gt;
#          For more info on 3TOFU (and why this is important), see:&lt;br /&gt;
#           * https://tech.michaelaltfield.net/3tofu&lt;br /&gt;
#&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: $(date -u --rfc-3339=seconds)&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
JQ=$(which jq) || (echo &amp;quot;ERROR: Cannot find &#039;jq&#039;&amp;quot;; exit 1)&lt;br /&gt;
CURL=&amp;quot;$(which curl) --retry 5 --retry-all-errors&amp;quot; || (echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1)&lt;br /&gt;
GREP=$(which grep) || (echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;&amp;quot;&lt;br /&gt;
WARNINGS=&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if [[ &amp;quot;`whoami`&amp;quot; == &amp;quot;amnesia&amp;quot; ]]; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
	PYTHON=&amp;quot;/usr/bin/torify ${PYTHON}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# CORE&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo &amp;quot;INFO: Determining Latest Version of Wordpress Core&amp;quot;&lt;br /&gt;
json=$($CURL -s &amp;quot;https://api.wordpress.org/core/version-check/1.7/&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &#039;[.offers[]|select(.response==&amp;quot;upgrade&amp;quot;)][0].download&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# PLUGINS&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;plugins=&#039;${plugins}&#039;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Plugins \n\t&amp;quot;&lt;br /&gt;
for plugin in $plugins; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
&lt;br /&gt;
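	# use ${CURL} and ${JQ} so the request is torified (in tails) and retried&lt;br /&gt;
	${CURL} -so plugin.json &amp;quot;https://api.wordpress.org/plugins/info/1.0/${plugin}.json&amp;quot;&lt;br /&gt;
	latest_version=$(${JQ} -r .version plugin.json)&lt;br /&gt;
	url=$(${JQ} -r &amp;quot;.versions.\&amp;quot;${latest_version}\&amp;quot;&amp;quot; plugin.json 2&amp;gt;/dev/null)&lt;br /&gt;
	&lt;br /&gt;
	# an empty url means jq itself failed (e.g. &amp;quot;Cannot index array&amp;quot;), so treat it like null&lt;br /&gt;
	if [ -z &amp;quot;${url}&amp;quot; ] || [ &amp;quot;${url}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(${JQ} -r .error plugin.json);&lt;br /&gt;
		description=$(${JQ} -r .description plugin.json);&lt;br /&gt;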
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} ${url}&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# WARNINGS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
echo -e &amp;quot;${WARNINGS}&amp;quot;&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# DOWNLOAD PAYLOADS&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	echo &amp;quot;${file}&amp;quot;&lt;br /&gt;
	${CURL} --progress-bar -O &amp;quot;${file}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
# FINISH&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
exit 0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I created ^ this script on some DispVM and executed it; it spat out our 3TOFU script&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp1594:~$ vim wordpress_3tofu.sh&lt;br /&gt;
user@disp1594:~$ &lt;br /&gt;
user@disp1594:~$ chmod +x wordpress_3tofu.sh &lt;br /&gt;
user@disp1594:~$ ./wordpress_3tofu.sh &lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    3tofu.sh&lt;br /&gt;
# Purpose: Execute these commands on 3 distinct machines (or VMs) on 3 distinct&lt;br /&gt;
#          days using 3 distinct networks exiting from 3 distinct countries&lt;br /&gt;
# &lt;br /&gt;
#          For more info on 3TOFU (and why this is important), see:&lt;br /&gt;
#           * https://tech.michaelaltfied.net/3tofu&lt;br /&gt;
#&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-12-12 20:45:59+00:00&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
JQ=$(which jq) || (echo &amp;quot;ERROR: Cannot find &#039;jq&#039;&amp;quot;; exit 1)&lt;br /&gt;
CURL=&amp;quot;$(which curl) --retry 5 --retry-all-errors&amp;quot; || (echo &amp;quot;ERROR: Cannot find &#039;curl&#039;&amp;quot;; exit 1)&lt;br /&gt;
GREP=$(which grep) || (echo &amp;quot;ERROR: Cannot find &#039;grep&#039;&amp;quot;; exit 1)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;&amp;quot;&lt;br /&gt;
WARNINGS=&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# in tails, we must torify&lt;br /&gt;
if  &amp;quot;`whoami`&amp;quot; == &amp;quot;amnesia&amp;quot;  ; then&lt;br /&gt;
	CURL=&amp;quot;/usr/bin/torify ${CURL}&amp;quot;&lt;br /&gt;
	PYTHON=&amp;quot;/usr/bin/torify ${PYTHON}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
tmpDir=`mktemp -d`&lt;br /&gt;
pushd &amp;quot;${tmpDir}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# first get some info about our internet connection&lt;br /&gt;
${CURL} -s https://ifconfig.co/country | head -n1&lt;br /&gt;
${CURL} -s https://check.torproject.org | grep Congratulations | head -n1&lt;br /&gt;
&lt;br /&gt;
# and today&#039;s date&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;INFO: Determining Latest Version of Wordpress Core&amp;quot;&lt;br /&gt;
json=$($CURL -s &amp;quot;https://api.wordpress.org/core/version-check/1.7/&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
REMOTE_FILES=&amp;quot;${REMOTE_FILES} $(echo &amp;quot;${json}&amp;quot; | $JQ -r &#039;[.offers[]|select(.response==&amp;quot;upgrade&amp;quot;)][0].download&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
plugins=&#039;wps-hide-login melapress-login-security activitypub aurora-heatmap raw-html related-posts-by-taxonomy smart-slider-3 spam-destroyer coinpayments-payment-gateway-for-woocommerce woocommerce-gateway-stripe wpfront-notification-bar wordpress-seo wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed bulk-media-register enable-media-replace regenerate-thumbnails wp-qrcode wp-pgp-encrypted-emails woo-multi-currency woocommerce-multilingual include-mastodon-feed wp-2fa advanced-nocaptcha-recaptcha hcaptcha-for-forms-and-more leaflet-map extensions-leaflet-map wpforms-lite&#039;&lt;br /&gt;
echo -ne &amp;quot;INFO: Determining Latest Version of Wordpress Plugins \n\t&amp;quot;&lt;br /&gt;
for plugin in $plugins; do&lt;br /&gt;
	echo -n &#039;. &#039;&lt;br /&gt;
&lt;br /&gt;
	json=$(curl -so plugin.json https://api.wordpress.org/plugins/info/1.0/${plugin}.json)&lt;br /&gt;
	latest_version=$(cat plugin.json | jq -r .version)&lt;br /&gt;
	url=$(cat plugin.json | jq -r &amp;quot;.versions.\&amp;quot;${latest_version}\&amp;quot;&amp;quot;)&lt;br /&gt;
	&lt;br /&gt;
	if [ &amp;quot;${url}&amp;quot; = &amp;quot;null&amp;quot; ]; then&lt;br /&gt;
		error=$(cat plugin.json | jq -r .error);&lt;br /&gt;
		description=$(cat plugin.json | jq -r .description);&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\nWARNING: Failed to download plugin ${plugin}&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$error&amp;quot;&lt;br /&gt;
		WARNINGS=&amp;quot;${WARNINGS}\n\t$description&amp;quot;&lt;br /&gt;
	else&lt;br /&gt;
		REMOTE_FILES=&amp;quot;${REMOTE_FILES} ${url}&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
	&lt;br /&gt;
done&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
echo -e &amp;quot;${WARNINGS}&amp;quot;&lt;br /&gt;
echo&lt;br /&gt;
&lt;br /&gt;
# get the file&lt;br /&gt;
for file in ${REMOTE_FILES}; do&lt;br /&gt;
	echo &amp;quot;${file}&amp;quot;&lt;br /&gt;
	${CURL} --progress-bar -O &amp;quot;${file}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# checksum&lt;br /&gt;
date -u +&amp;quot;%Y-%m-%d&amp;quot;&lt;br /&gt;
sha256sum *&lt;br /&gt;
&lt;br /&gt;
user@disp1594:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied and pasted ^ that script into a Whonix DispVM&lt;br /&gt;
# here&#039;s our TOFU 1/3 (Tor, exit in Poland)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Congratulations. This browser is configured to use Tor.&lt;br /&gt;
2024-12-12&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Core&lt;br /&gt;
INFO: Determining Latest Version of Wordpress Plugins &lt;br /&gt;
	. . . . . . . . . jq: error (at &amp;lt;stdin&amp;gt;:0): Cannot index array with string &amp;quot;1.0.17&amp;quot;&lt;br /&gt;
. . . . . . . . . . . . . . . . . . . . . &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
	null&lt;br /&gt;
	null&lt;br /&gt;
&lt;br /&gt;
WARNING: Failed to download plugin woo-multi-currency&lt;br /&gt;
	null&lt;br /&gt;
	null&lt;br /&gt;
&lt;br /&gt;
https://downloads.wordpress.org/release/wordpress-6.7.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wps-hide-login.1.9.17.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/melapress-login-security.2.0.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/activitypub.4.4.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/aurora-heatmap.1.7.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/raw-html.1.6.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/smart-slider-3.3.5.1.25.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/spam-destroyer.2.1.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wordpress-seo.24.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/bulk-media-register.1.40.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/enable-media-replace.4.1.5.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-qrcode.1.1.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/include-mastodon-feed.1.9.9.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wp-2fa.2.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/leaflet-map.3.4.1.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/extensions-leaflet-map.4.4.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
https://downloads.wordpress.org/plugin/wpforms-lite.1.9.2.3.zip&lt;br /&gt;
######################################################################### 100.0%&lt;br /&gt;
2024-12-12&lt;br /&gt;
8b1f9a708838b8710b4198da1116689197e0a6134e0a1a5e786500576383034f  activitypub.4.4.0.zip&lt;br /&gt;
101f645a8f4becdf0394c27195679fe6d134063fde6bd851dc1d57217db5e0e9  advanced-nocaptcha-recaptcha.7.5.0.zip&lt;br /&gt;
873928dd3e940064f5dcac8b74335a9760823147388f472bb755ce5a804eaf53  aurora-heatmap.1.7.0.zip&lt;br /&gt;
5dc1fff3c3e664774ea51d52477e28c060e0b6733a47c6fb5db800eba3a4ea0f  bulk-media-register.1.40.zip&lt;br /&gt;
ad98e83a3bce28612025010d5bca77dd2d29f1df539f2667865d6d959f67e3e0  enable-media-replace.4.1.5.zip&lt;br /&gt;
1a53bdcd1ddb160d5807dc17a0f9e474402e22c899b3a9af486c9d5f0d2c4b36  extensions-leaflet-map.4.4.zip&lt;br /&gt;
27f1ab1e3f5274335d48d0cadaabdef98284880b0324771890d36a1f562fb44a  hcaptcha-for-forms-and-more.4.8.0.zip&lt;br /&gt;
bb0e885969df637767d64d02504d8defb1184db24cd0ade0111ef55ef63c81b9  include-mastodon-feed.1.9.9.zip&lt;br /&gt;
13d906d4677dc3da617752fbe9e7540f0bf84128c0fae43598a10b876dac4217  leaflet-map.3.4.1.zip&lt;br /&gt;
fd1593eefe2fa546926ce0765e7d9944e24c1aca0f9cf2606d3136f4b60cb1b5  melapress-login-security.2.0.1.zip&lt;br /&gt;
923f38397284dceda1028a12c01e78bde22e0d0fecfdd8b95e52cfcc04e47342  plugin.json&lt;br /&gt;
f2cfaf226788dddd8744e723fe1ef53ef0984f956c4fa2678f932f0d8b72116c  raw-html.1.6.4.zip&lt;br /&gt;
757f29991412ef63a099c4fe77a921d23b51097ddb207dff669fbf24ace6a7d6  regenerate-thumbnails.3.1.6.zip&lt;br /&gt;
4f0e6f6505b8eb39b53dd971e8dba8fe98c65a56a7bb24443f4a513c7940f193  related-posts-by-taxonomy.2.7.6.zip&lt;br /&gt;
ebd87841f73bb7946216ae4827a413dcc97fc5094cee2f8ddb6dea7eff356358  smart-slider-3.3.5.1.25.zip&lt;br /&gt;
41bcae0e3cd94b73d7b5761527e68acb9111cb28080dd68f2f83a82cfd87f210  spam-destroyer.2.1.4.zip&lt;br /&gt;
aa52f9a4c8bbe856fe045e5c76ffedae3573374ee43435de78e1561d8e0169a9  woocommerce-gateway-stripe.9.0.0.zip&lt;br /&gt;
fbe62fc4ec4b91915024c126d9b86b3798c283f60d95435f3e6e1226ddd722aa  woocommerce-multilingual.5.3.9.zip&lt;br /&gt;
75f4e9cb71e583ca3f8b19691b5754adb9c981580762137f82443e1eec468f9c  wordpress-6.7.1.zip&lt;br /&gt;
f9ce7a98840dd4bf490d955320a68ac553c767ba7f0eeae6e4f067be5a927ef3  wordpress-seo.24.0.zip&lt;br /&gt;
feda19ad71ea22abe4dbcff422f6e0e6c8315f26a7d246099967a5eea17b4d38  wp-2fa.2.8.0.zip&lt;br /&gt;
130ba1a4f2396a8e183b8ce732c9bc8a3cf6698890f6f216550188e78e082fda  wpforms-lite.1.9.2.3.zip&lt;br /&gt;
6e1d71809f4421463fc19c5c119c5e49788cd3676b730f7980e3dcd209520a1c  wpfront-notification-bar.3.4.2.zip&lt;br /&gt;
e3cb9db45795a8caed13e00414ce7f43d2bb517a35b88cda98ad91b6871b46e2  wp-pgp-encrypted-emails.0.8.0.zip&lt;br /&gt;
e50735bcda4e85df1e522fda113ae24fd973f000e75154472544d4bcf51491f1  wp-qrcode.1.1.1.zip&lt;br /&gt;
bedfe5b456f5a5b3b6d4b29dd6577f6b8492f4594a192678555691e8403a56d7  wps-hide-login.1.9.17.1.zip&lt;br /&gt;
user@host:/tmp/user/1000/tmp.THquHNCCMu$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
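# since this is only TOFU 1 of 3, the checksum listings will have to be compared once all three runs are in. Comparing them by eye is error-prone, so here is a minimal sketch. The helper names are hypothetical, and it assumes each run&#039;s output was saved to its own file. It keeps only the sha256sum lines and drops plugin.json, which is just the script&#039;s leftover API response and will probably differ between runs anyway. It then checks that every listing matches the first one, regardless of line order&lt;br /&gt;

```shell
#!/usr/bin/env bash
# print the "sha256  filename" lines of one saved listing, sorted, without
# plugin.json (the script's leftover API response, not a download)
tofu_lines() {
	grep -E '^[0-9a-f]{64}  ' "$1" | grep -v '  plugin\.json$' | sort
}

# succeed only if every listing has the same checksums as the first one
compare_tofus() {
	if [ "$#" -lt 2 ]; then
		echo "usage: compare_tofus FILE FILE [FILE...]"
		return 2
	fi
	local expected file
	expected=$(tofu_lines "$1")
	for file in "$@"; do
		if [ "$(tofu_lines "$file")" != "$expected" ]; then
			echo "MISMATCH: ${file} differs from ${1}"
			return 1
		fi
	done
	echo "OK: all $# listings match"
}

# usage (hypothetical file names):
#   compare_tofus tofu1.txt tofu2.txt tofu3.txt
```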
# well that looks good, except for the failure of the &#039;woo-multi-currency&#039; plugin&lt;br /&gt;
## looks like the API&#039;s &amp;quot;versions&amp;quot; list has no download URL for the latest version (2.2.4)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@host:~$ json=$(curl -so plugin.json https://api.wordpress.org/plugins/info/1.0/woo-multi-currency.json)&lt;br /&gt;
user@host:~$ echo $json&lt;br /&gt;
&lt;br /&gt;
user@host:~$ ls&lt;br /&gt;
Desktop    Downloads  Pictures     Public     Videos&lt;br /&gt;
Documents  Music      plugin.json  Templates&lt;br /&gt;
user@host:~$ &lt;br /&gt;
&lt;br /&gt;
user@host:~$ cat plugin.json | jq -r .version&lt;br /&gt;
2.2.4&lt;br /&gt;
user@host:~$ &lt;br /&gt;
&lt;br /&gt;
user@host:~$ cat plugin.json | jq -r &amp;quot;.versions.2.2.4&amp;quot;&lt;br /&gt;
jq: error: Invalid numeric literal at EOF at line 1, column 6 (while parsing &#039;.2.2.4&#039;) at &amp;lt;top-level&amp;gt;, line 1:&lt;br /&gt;
.versions.2.2.4         &lt;br /&gt;
jq: error: syntax error, unexpected LITERAL, expecting $end (Unix shell quoting issues?) at &amp;lt;top-level&amp;gt;, line 1:&lt;br /&gt;
.versions.2.2.4         &lt;br /&gt;
jq: 2 compile errors&lt;br /&gt;
user@host:~$ &lt;br /&gt;
&lt;br /&gt;
user@host:~$ cat plugin.json | jq -r &amp;quot;.versions&amp;quot;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;2.1.12&amp;quot;: &amp;quot;https://downloads.wordpress.org/plugin/woo-multi-currency.2.1.12.zip&amp;quot;,&lt;br /&gt;
  &amp;quot;2.1.14&amp;quot;: &amp;quot;https://downloads.wordpress.org/plugin/woo-multi-currency.2.1.14.zip&amp;quot;,&lt;br /&gt;
  &amp;quot;2.1.7&amp;quot;: &amp;quot;https://downloads.wordpress.org/plugin/woo-multi-currency.2.1.7.zip&amp;quot;,&lt;br /&gt;
  &amp;quot;2.1.8&amp;quot;: &amp;quot;https://downloads.wordpress.org/plugin/woo-multi-currency.2.1.8.zip&amp;quot;,&lt;br /&gt;
  &amp;quot;2.1.9&amp;quot;: &amp;quot;https://downloads.wordpress.org/plugin/woo-multi-currency.2.1.9.zip&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@host:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
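# for future reference, here is a minimal sketch of the lookup the download script needs. The function name get_plugin_url is hypothetical. jq can&#039;t parse a key like 2.2.4 after a dot, which is what caused the compile errors above, so the key has to be bound to a variable (or quoted, as in .versions[&amp;quot;2.1.9&amp;quot;]). The sketch assumes jq is installed and that the JSON has the shape seen above: a top-level &amp;quot;version&amp;quot; string plus a &amp;quot;versions&amp;quot; object mapping each version to its zip URL. It also returns nothing when &amp;quot;versions&amp;quot; isn&#039;t an object. The earlier &amp;quot;Cannot index array&amp;quot; error suggests that can happen, though I haven&#039;t confirmed which plugin triggered it&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a plugin-info JSON document (as returned by
# https://api.wordpress.org/plugins/info/1.0/SLUG.json) on stdin and print
# the download URL of its latest version. Prints nothing when "versions"
# lacks that key (the woo-multi-currency case above) or is not an object.
# Assumes jq is installed.
get_plugin_url() {
	# '.versions.2.2.4' is a jq syntax error, so bind the version to $v and
	# index with it; '// empty' turns a null lookup into no output at all.
	jq -r '(.version // "") as $v
		| if (.versions | type) == "object" then .versions[$v] // empty else empty end'
}

# usage (needs network access):
#   curl -s https://api.wordpress.org/plugins/info/1.0/woo-multi-currency.json | get_plugin_url
```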
# whatever, this plugin wasn&#039;t so important. Let&#039;s just ignore &amp;amp; skip this plugin&lt;br /&gt;
# by next week we should have all 3 TOFUs. If they match, I&#039;ll go ahead and copy them to hetzner3, figure out which ones we want to use, and then finish off the store.opensourceecology.org migration script CHG steps&lt;br /&gt;
# ...&lt;br /&gt;
# I still have 1 remaining TODO item on the hetzner3 backups, which was to wait some weeks and then verify that the backups cron/lifecycle rules are working&lt;br /&gt;
# well, it&#039;s been &amp;gt;11 weeks since I set up backups on hetzner3 https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Sun_Sep_22.2C_2024&lt;br /&gt;
# here&#039;s what we see in the bucket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # sudo rclone lsl b2:ose-server-backups | grep hetzner3&lt;br /&gt;
2258493547 2024-12-10 07:29:09.905000000 daily_hetzner3_20241210_072828.tar.gpg&lt;br /&gt;
2266009707 2024-12-11 07:58:32.491000000 daily_hetzner3_20241211_075750.tar.gpg&lt;br /&gt;
2272696427 2024-12-12 07:25:19.985000000 daily_hetzner3_20241212_072443.tar.gpg&lt;br /&gt;
1782579309 2024-10-01 08:05:13.312000000 monthly_hetzner3_20241001_080447.tar.gpg&lt;br /&gt;
1986529389 2024-11-01 07:37:07.030000000 monthly_hetzner3_20241101_073631.tar.gpg&lt;br /&gt;
2195302509 2024-12-01 07:45:57.990000000 monthly_hetzner3_20241201_074518.tar.gpg&lt;br /&gt;
2104289388 2024-11-18 07:23:52.227000000 weekly_hetzner3_20241118_072314.tar.gpg&lt;br /&gt;
2153154668 2024-11-25 07:50:49.139000000 weekly_hetzner3_20241125_075009.tar.gpg&lt;br /&gt;
2202644588 2024-12-02 08:05:51.395000000 weekly_hetzner3_20241202_080510.tar.gpg&lt;br /&gt;
2251448428 2024-12-09 07:31:51.872000000 weekly_hetzner3_20241209_073110.tar.gpg&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
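# to repeat this retention check later without counting by hand, here is a minimal sketch (count_backups is a hypothetical name). It tallies the rclone lsl listing by the daily_/weekly_/monthly_ filename prefixes. It assumes filenames contain no spaces, since it takes the object name from the 4th field&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Tally `rclone lsl` output (on stdin) by retention class. Each lsl line
# is: size, date, time, name; the class is the name's prefix before "_".
count_backups() {
	awk '{ split($4, part, "_"); count[part[1]]++ }
	     END { for (k in count) print k, count[k] }' | sort
}

# usage, on hetzner3:
#   sudo rclone lsl b2:ose-server-backups | grep hetzner3 | count_backups
# for the listing above this prints "daily 3", "monthly 3", "weekly 4"
```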
# so we&#039;ve got 3 monthlies already. that&#039;s great&lt;br /&gt;
# we have the past 3 daily backups and nothing more. so that suggests that recent backups are working fine, and the lifecycle rules are too; great&lt;br /&gt;
# and we have 4 weeklies, which sounds correct; all looks great!&lt;br /&gt;
# ...&lt;br /&gt;
# another item I had pending on my TODO list was to verify munin after some time to ensure that the charts were getting populated with data &lt;br /&gt;
# obviously this server is basically idle, but any lines at all are fine&lt;br /&gt;
# apache looks good&lt;br /&gt;
## apache logs were actually empty on hetzner2 for some reason, and there were only two charts: apache processes and apache volume (both empty)&lt;br /&gt;
## on hetzner3, we have 3 charts: apache access (accesses per second, mostly zero with a couple of spikes), apache processes (it shows a solid 48-49 idle servers and a solid 100 &amp;quot;free slots&amp;quot; over the past few months), and apache volume (mostly zero with occasional spikes to ~2k-18k bytes per second)&lt;br /&gt;
# we have some data for nginx, but not as many charts as we used to have&lt;br /&gt;
## on hetzner2, it was called &amp;quot;webserver&amp;quot; whereas on hetzner3 it&#039;s called &amp;quot;nginx&amp;quot; for some reason&lt;br /&gt;
## on hetzner2, we had 4 charts: nginx requests, nginx requests, nginx status, nginx status&lt;br /&gt;
## oh wait, no, that&#039;s the two charts duplicated; it&#039;s the same data.&lt;br /&gt;
## on hetzner3 we also have these 2 charts. They&#039;re near zero, but both have spikes up to ~0.2-1.2 requests per second and 1-9 connections per second&lt;br /&gt;
# our varnish charts on hetzner3 are also working&lt;br /&gt;
## we have 9 charts on hetzner2 and 8 charts on hetzner3&lt;br /&gt;
## looks like we&#039;re missing the &amp;quot;uptime&amp;quot; chart on hetzner3. I&#039;ve read this can be useful just to monitor if varnish has some issue that causes it to restart its child processes, so it would probably be good to add that one, if trivial&lt;br /&gt;
## all the other charts have data, except that the &amp;quot;misbehaviour&amp;quot; chart is all zeros. I&#039;m guessing that&#039;s simply because there have been no misbehaviours, but I&#039;m not sure.&lt;br /&gt;
## curiously, this section on hetzner2 used to be called &#039;varnish4&#039; whereas on hetzner3 it&#039;s called &amp;quot;webserver&amp;quot;&lt;br /&gt;
# I did notice that all of the mysql charts are empty on hetzner3, so we should probably investigate that too&lt;br /&gt;
# the &amp;quot;process info&amp;quot; charts are also all empty, so we should check on that&lt;br /&gt;
# well, I confirmed that the varnish_uptime plugin is in place&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i varnish&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# for comparison, here&#039;s what we have on hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# ls -lah | grep -i varnish&lt;br /&gt;
-rwxr-xr-x 1 root root  26K Mar  3  2018 varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_backend_traffic -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_backend_traffic.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_bad -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_bad.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_expunge -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_expunge.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_hit_rate -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_hit_rate.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_memory_usage -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_memory_usage.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_objects -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_objects.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_request_rate -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_request_rate.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_threads -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_threads.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_transfer_rates -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_transfer_rates.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root    9 Mar  3  2018 varnish_uptime -&amp;gt; varnish4_&lt;br /&gt;
lrwxrwxrwx 1 root root   33 Mar  3  2018 varnish_uptime.bak -&amp;gt; /usr/share/munin/plugins/varnish_&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, it looks like the newer plugin version on hetzner3 has split &amp;quot;varnish_uptime&amp;quot; into two distinct uptimes. Judging by the chart titles, one is the management (parent) process and the other is the child process&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# grep -A3 &#039;uptime&#039; varnish_uptime&lt;br /&gt;
		&#039;uptime&#039; =&amp;gt; {&lt;br /&gt;
				&#039;title&#039; =&amp;gt; &#039;Varnish uptime&#039;,&lt;br /&gt;
				&#039;vlabel&#039; =&amp;gt; &#039;days&#039;,&lt;br /&gt;
				&#039;scale&#039; =&amp;gt; &#039;no&#039;,&lt;br /&gt;
				&#039;values&#039; =&amp;gt; {&lt;br /&gt;
						&#039;uptime&#039; =&amp;gt; {&lt;br /&gt;
								&#039;type&#039; =&amp;gt; &#039;GAUGE&#039;,&lt;br /&gt;
								&#039;cdef&#039; =&amp;gt; &#039;uptime,86400,/&#039;&lt;br /&gt;
						}&lt;br /&gt;
				}&lt;br /&gt;
		},&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # grep -A3 uptime varnish_uptime &lt;br /&gt;
		&#039;main_uptime&#039; =&amp;gt; {&lt;br /&gt;
				&#039;type&#039; =&amp;gt; &#039;MAIN&#039;,&lt;br /&gt;
				&#039;title&#039; =&amp;gt; &#039;Varnish Child uptime&#039;,&lt;br /&gt;
				&#039;vlabel&#039; =&amp;gt; &#039;days&#039;,&lt;br /&gt;
				&#039;scale&#039; =&amp;gt; &#039;no&#039;,&lt;br /&gt;
				&#039;values&#039; =&amp;gt; {&lt;br /&gt;
						&#039;uptime&#039; =&amp;gt; {&lt;br /&gt;
								&#039;type&#039; =&amp;gt; &#039;GAUGE&#039;,&lt;br /&gt;
								&#039;cdef&#039; =&amp;gt; &#039;uptime,86400,/&#039;&lt;br /&gt;
						},&lt;br /&gt;
				}&lt;br /&gt;
		},&lt;br /&gt;
		&#039;mgt_uptime&#039; =&amp;gt; {&lt;br /&gt;
				&#039;type&#039; =&amp;gt; &#039;MGT&#039;,&lt;br /&gt;
				&#039;title&#039; =&amp;gt; &#039;Varnish Management uptime&#039;,&lt;br /&gt;
				&#039;vlabel&#039; =&amp;gt; &#039;days&#039;,&lt;br /&gt;
				&#039;scale&#039; =&amp;gt; &#039;no&#039;,&lt;br /&gt;
				&#039;values&#039; =&amp;gt; {&lt;br /&gt;
						&#039;uptime&#039; =&amp;gt; {&lt;br /&gt;
								&#039;type&#039; =&amp;gt; &#039;GAUGE&#039;,&lt;br /&gt;
								&#039;cdef&#039; =&amp;gt; &#039;uptime,86400,/&#039;&lt;br /&gt;
						},&lt;br /&gt;
				}&lt;br /&gt;
		},&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I updated the symlinks to include these two charts in hetzner3&#039;s munin plugins dir&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah varnish_uptime &lt;br /&gt;
lrwxrwxrwx 1 root root 34 Sep 25 01:47 varnish_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # rm -f varnish_uptime&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ln -s /usr/share/munin/plugins/varnish5_ varnish_main_uptime&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ln -s /usr/share/munin/plugins/varnish5_ varnish_mgt_uptime&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep -i varnish_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_backend_traffic -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_bad -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_expunge -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_hit_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_main_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_memory_usage -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Dec 13 00:03 varnish_mgt_uptime -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_objects -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_request_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_threads -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
lrwxrwxrwx 1 root root   34 Sep 25 01:47 varnish_transfer_rate -&amp;gt; /usr/share/munin/plugins/varnish5_&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # service munin-node restart&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
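# the same two symlinks can be created idempotently with a small helper, sketched below. link_varnish_aspects, PLUGIN_DIR and TARGET are hypothetical names, and the defaults mirror the hetzner3 paths above. It relies on munin&#039;s wildcard-plugin convention: every varnish_* link above points at the same varnish5_ script, which apparently picks the graph to report from the part of its link name after varnish_&lt;br /&gt;

```shell
#!/usr/bin/env bash
# (Re)create munin wildcard-plugin symlinks for the given varnish aspects.
# ln -sf replaces an existing link, so re-running is harmless. The old
# varnish_uptime link still has to be removed separately (rm -f).
link_varnish_aspects() {
	local plugin_dir=${PLUGIN_DIR:-/etc/munin/plugins}
	local target=${TARGET:-/usr/share/munin/plugins/varnish5_}
	local aspect
	for aspect in "$@"; do
		ln -sf "$target" "${plugin_dir}/varnish_${aspect}" || return 1
	done
}

# usage, as root on hetzner3 (then restart munin-node):
#   link_varnish_aspects main_uptime mgt_uptime
```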
# a manual test run looks good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # munin-run varnish_main_uptime&lt;br /&gt;
uptime.value 6635471&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # munin-run varnish_mgt_uptime&lt;br /&gt;
uptime.value 6635477&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
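# to read those raw values: the plugin&#039;s cdef &#039;uptime,86400,/&#039; is RRD-style reverse Polish notation for uptime / 86400. In other words, it converts seconds to days, matching the vlabel &#039;days&#039;. Here is a quick sketch of the same arithmetic (seconds_to_days is a hypothetical helper)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Same arithmetic as the munin cdef 'uptime,86400,/': seconds -> days,
# rounded to one decimal place.
seconds_to_days() {
	awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

seconds_to_days 6635471   # varnish_main_uptime above; prints 76.8
seconds_to_days 6635477   # varnish_mgt_uptime above; prints 76.8
```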
# I refreshed the munin WUI and, yep, I now see two additional charts. One&#039;s named &amp;quot;Varnish Child Uptime&amp;quot; and the other &amp;quot;Varnish Management Uptime&amp;quot;. Perfect.&lt;br /&gt;
# varnish was the lowest-hanging fruit, but the most important missing charts are the mysql ones. In fact, all of our mysql charts are missing. What gives?&lt;br /&gt;
# first, our plugins dirs look very different on the two servers&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# ls -lah | grep -i mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 bin_relay_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 commands -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 connections -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 files_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_bpool -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_bpool_act -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_insert_buf -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_io -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_io_pend -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_log -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_rows -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_semaphores -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 innodb_tnx -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 myisam_indexes -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 mysql_ -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Oct  8  2019 mysql_queries -&amp;gt; /usr/share/munin/plugins/mysql_queries&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 network_traffic -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   28 Oct  8  2019 ps_mysqld -&amp;gt; /usr/share/munin/plugins/ps_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 qcache -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 qcache_mem -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 replication -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 select_types -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 slow -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 sorts -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 table_locks -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Oct  8  2019 tmp_tables -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # ls -lah | grep mysql&lt;br /&gt;
lrwxrwxrwx 1 root root   31 Sep 25 01:47 mysql_ -&amp;gt; /usr/share/munin/plugins/mysql_&lt;br /&gt;
lrwxrwxrwx 1 root root   36 Sep 25 01:47 mysql_bytes -&amp;gt; /usr/share/munin/plugins/mysql_bytes&lt;br /&gt;
lrwxrwxrwx 1 root root   37 Sep 25 01:47 mysql_innodb -&amp;gt; /usr/share/munin/plugins/mysql_innodb&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Sep 25 01:47 mysql_isam_space_ -&amp;gt; /usr/share/munin/plugins/mysql_isam_space_&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Sep 25 01:47 mysql_queries -&amp;gt; /usr/share/munin/plugins/mysql_queries&lt;br /&gt;
lrwxrwxrwx 1 root root   42 Sep 25 01:47 mysql_slowqueries -&amp;gt; /usr/share/munin/plugins/mysql_slowqueries&lt;br /&gt;
lrwxrwxrwx 1 root root   38 Sep 25 01:47 mysql_threads -&amp;gt; /usr/share/munin/plugins/mysql_threads&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
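# to see exactly which plugin names one server has and the other lacks, here is a minimal sketch (missing_plugins is a hypothetical name). It diffs two saved name lists, e.g. the output of ls /etc/munin/plugins on each server. Keep in mind that the two servers name their links differently (e.g. hetzner2&#039;s commands -&amp;gt; mysql_ vs. hetzner3&#039;s mysql_bytes), so the output is a starting point rather than a verdict&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Print plugin names listed in file $1 but not in file $2 (one name per
# line in each). comm needs sorted input, so both lists are sorted into
# temp files first.
missing_plugins() {
	local a b
	a=$(mktemp)
	b=$(mktemp)
	sort -u "$1" > "$a"
	sort -u "$2" > "$b"
	comm -23 "$a" "$b"
	rm -f "$a" "$b"
}

# usage:
#   ls /etc/munin/plugins > hetzner2.txt    (on hetzner2)
#   ls /etc/munin/plugins > hetzner3.txt    (on hetzner3)
#   missing_plugins hetzner2.txt hetzner3.txt
```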
# there are a lot of different plugins here, but let&#039;s just test the &amp;quot;mysql_queries&amp;quot; one, which exists on both servers&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# munin-run mysql_queries&lt;br /&gt;
delete.value 17433&lt;br /&gt;
insert.value 20779&lt;br /&gt;
replace.value 206033&lt;br /&gt;
select.value 94566392&lt;br /&gt;
update.value 39021&lt;br /&gt;
cache_hits.value 0&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # munin-run mysql_queries&lt;br /&gt;
mysqladmin: connect to server at &#039;localhost&#039; failed&lt;br /&gt;
error: &#039;Access denied for user &#039;munin&#039;@&#039;localhost&#039; (using password: NO)&#039;&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s an auth error&lt;br /&gt;
# if I check the munin configs, then it&#039;s pretty obvious. I have a password defined for hetzner2 and nothing for hetzner3. I imagine a password might not be necessary if we allow passwordless auth from localhost or something, but I guess I never set that up?&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology plugins]# cat ../plugin-conf.d/zzz-ose &lt;br /&gt;
# ose-specific configs go here per this doc&lt;br /&gt;
#  * http://guide.munin-monitoring.org/en/latest/plugin/use.html#configuring&lt;br /&gt;
&lt;br /&gt;
[nginx_wiki.opensourceecology.org_*]&lt;br /&gt;
env.url https://wiki.opensourceecology.org/nginx_status&lt;br /&gt;
env.graph_title graph title&lt;br /&gt;
env.graph_info graph info goes here&lt;br /&gt;
&lt;br /&gt;
[nginx_www.opensourceecology.org_*]&lt;br /&gt;
env.url https://www.opensourceecology.org/nginx_status&lt;br /&gt;
&lt;br /&gt;
[mysql*]&lt;br /&gt;
user root&lt;br /&gt;
group wheel&lt;br /&gt;
env.mysqlopts -u munin_user -pREDACTED&lt;br /&gt;
&lt;br /&gt;
[multips_memory]&lt;br /&gt;
env.names varnishd mysqld httpd varnishlog systemd-journal rsyslogd b2 nginx munin munin-node ssh sshd openvpn tuned ossec-analysisd bash vim screen tail gpg gpg2 polkitd tuned&lt;br /&gt;
[root@opensourceecology plugins]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # cat ../plugin-conf.d/zzz-myconf &lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    zzz-myconf&lt;br /&gt;
# Version: 0.1&lt;br /&gt;
# Purpose: Munin custom config&lt;br /&gt;
#          we set custom munin configs in &#039;zzz-myconf&#039; per this doc&lt;br /&gt;
#           * http://guide.munin-monitoring.org/en/latest/plugin/use.html#configuring&lt;br /&gt;
# Author:  Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2024-09-14&lt;br /&gt;
# Updated: 2024-09-14&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
[nginx_*]&lt;br /&gt;
env.url http://127.0.0.1/nginx_status&lt;br /&gt;
&lt;br /&gt;
[apache_*]&lt;br /&gt;
env.url   http://127.0.0.1:%d/server-status?auto&lt;br /&gt;
env.ports 8000&lt;br /&gt;
&lt;br /&gt;
[mysql*]&lt;br /&gt;
env.mysqlopts -u munin&lt;br /&gt;
env.mysqluser munin&lt;br /&gt;
&lt;br /&gt;
[multips_memory]&lt;br /&gt;
env.names mysqld apache2 cache-main ossec-analysisd wazuh-db ossec-syscheckd munin-node munin-html munin-update nginx ssh sshd pickup journalctl trivial-rewrite systemd-journal varnishd wazuh-modulesd unattended-upgrades bash sudo tcpdump varnishlog ossec-remoted systemd-logind su dhclient ossec-authd ossec-execd ossec-logcollector ossec-maild ossec-monitord screen (sd-pam) systemd cleanup qmgr tlsmgr rewrite bounce defer trace verify flush proxymap proxywrite smtp relay showq error retry discard local virtual lmtp anvil scache submission grep tail rsyslogd systemd-udevd dbus-daemon gpg gpg2 cleanMega backup.sh rclone b2 chown chmod tar rm mv python agetty sh qemu-ga gpg-agent awstats.pl openvpn tuned vim polkitd&lt;br /&gt;
&lt;br /&gt;
[varnish*]&lt;br /&gt;
user root&lt;br /&gt;
env.varnishstat /usr/bin/varnishstat&lt;br /&gt;
&lt;br /&gt;
[proc]&lt;br /&gt;
env.procname mysqld apache2&lt;br /&gt;
root@hetzner3 /etc/munin/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
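# to spot this kind of misconfiguration faster on other servers, here is a rough sketch (check_mysql_opts is a hypothetical name). For each munin plugin-conf.d file, it reports whether the env.mysqlopts line under a [mysql*] section passes a password. It deliberately doesn&#039;t print the options themselves, since they may contain the password. It also can&#039;t see credentials supplied some other way (e.g. a client option file or passwordless socket auth)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# For each given munin plugin-conf.d file, report whether env.mysqlopts
# under a [mysql...] section includes a password (-pSECRET or --password).
check_mysql_opts() {
	awk '
		# a new [section] header: remember whether it is a mysql one
		/^\[/ { in_mysql = ($0 ~ /^\[mysql/); next }
		in_mysql == 1 {
			if ($1 == "env.mysqlopts") {
				status = ($0 ~ / -p| --password/) ? "has password" : "NO PASSWORD"
				print FILENAME ": env.mysqlopts " status
			}
		}
	' "$@"
}

# usage:
#   check_mysql_opts /etc/munin/plugin-conf.d/*
```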
# checking my logs on the wiki, it looks like I first&lt;br /&gt;
&lt;br /&gt;
=Wed Dec 11, 2024=&lt;br /&gt;
&lt;br /&gt;
# Catarina responded to my mail from yesterday, confirming that I should reset the unused store.opensourceecology.org wordpress site to some free wordpress theme for now, and use the license that it would otherwise consume for www.opensourceecology.org instead&lt;br /&gt;
# I updated my /etc/hosts to point to the hetzner3 server for store.opensourceecology.org&lt;br /&gt;
# I checked http://store.opensourceecology.org/ in my web browser. it still just returns the generic wordpress critical error message&lt;br /&gt;
# I was able to login to the wp admin wui on the hetzner3 store site&lt;br /&gt;
# I changed the theme from &#039;oshine&#039; to &#039;twenty seventeen&#039;&lt;br /&gt;
# now when I load the store site&#039;s frontpage, the error has disappeared, but it still says &amp;quot;oshine&amp;quot; at the top and the body is lots of broken shortcodes&lt;br /&gt;
# if I go to Settings -&amp;gt; Reading, then I can see that the &amp;quot;Homepage&amp;quot; is set to &amp;quot;Home v37&amp;quot;&lt;br /&gt;
# I changed the store homepage to &amp;quot;Sample Page&amp;quot;&lt;br /&gt;
# that&#039;s the best page I found. It still has broken tatsu shortcodes in the footer, but at least the word &amp;quot;oshine&amp;quot; doesn&#039;t appear anywhere.&lt;br /&gt;
# this site is temporary, and these actions are going to get overwritten at the time we actually rsync from live to hetzner3 during the migration, so I need a place to record each of these little changes as part of the migration playbook&lt;br /&gt;
# I created a wiki CHG &amp;quot;ticket&amp;quot; for the migration of store.opensourceecology.org from hetzner2 to hetzner3 https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_migrate_store_to_hetzner3&lt;br /&gt;
# I&#039;m using wp-cli to more easily get the list of theme and plugin versions&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# sudo -u wp -i wp --path=/var/www/html/store.opensourceecology.org/htdocs/ plugin list&lt;br /&gt;
...&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+&lt;br /&gt;
| name                                           | status   | update | version |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+&lt;br /&gt;
| akismet                                        | inactive | none   | 4.1.1   |&lt;br /&gt;
| be-gdpr                                        | active   | none   | 1.1.2   |&lt;br /&gt;
| be-portfolio-post                              | active   | none   | 1.1     |&lt;br /&gt;
| classic-editor                                 | inactive | none   | 1.4     |&lt;br /&gt;
| colorhub                                       | active   | none   | 1.0.5   |&lt;br /&gt;
| contact-form-7                                 | active   | none   | 5.1.1   |&lt;br /&gt;
| force-strong-passwords                         | active   | none   | 1.8.0   |&lt;br /&gt;
| google-authenticator                           | active   | none   | 0.48    |&lt;br /&gt;
| google-authenticator-encourage-user-activation | active   | none   | 0.2     |&lt;br /&gt;
| hello                                          | inactive | none   | 1.7.1   |&lt;br /&gt;
| masterslider                                   | active   | none   | 3.2.7   |&lt;br /&gt;
| meta-box                                       | active   | none   | 4.17.3  |&lt;br /&gt;
| meta-box-conditional-logic                     | active   | none   | 1.6.4   |&lt;br /&gt;
| meta-box-show-hide                             | active   | none   | 1.1.0   |&lt;br /&gt;
| meta-box-tabs                                  | active   | none   | 1.1.1   |&lt;br /&gt;
| oshine-core                                    | active   | none   | 1.3.7   |&lt;br /&gt;
| oshine-modules                                 | active   | none   | 2.2.9   |&lt;br /&gt;
| redux-vendor-support                           | active   | none   | 1.0.1   |&lt;br /&gt;
| rename-wp-login                                | active   | none   | 2.5.5   |&lt;br /&gt;
| revslider                                      | active   | none   | 5.4.8.3 |&lt;br /&gt;
| ssl-insecure-content-fixer                     | active   | none   | 2.7.2   |&lt;br /&gt;
| tatsu                                          | active   | none   | 2.9.3.3 |&lt;br /&gt;
| typehub                                        | active   | none   | 1.4.3   |&lt;br /&gt;
| vcaching                                       | active   | none   | 1.6.9   |&lt;br /&gt;
| woocommerce                                    | active   | none   | 3.5.7   |&lt;br /&gt;
| coingate-for-woocommerce                       | active   | none   | 1.2.2   |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # sudo -u wp -i wp --path=/var/www/html/store.opensourceecology.org/htdocs/ plugin list&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| name                                           | status   | update | version | update_version | auto_update |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| akismet                                        | inactive | none   | 5.3.3   |                | off         |&lt;br /&gt;
| classic-editor                                 | inactive | none   | 1.6.5   |                | off         |&lt;br /&gt;
| contact-form-7                                 | active   | none   | 5.9.8   |                | off         |&lt;br /&gt;
| google-authenticator-encourage-user-activation | active   | none   | 0.2     |                | off         |&lt;br /&gt;
| google-authenticator                           | active   | none   | 0.54    |                | off         |&lt;br /&gt;
| hello                                          | inactive | none   | 1.7.1   |                | off         |&lt;br /&gt;
| meta-box                                       | active   | none   | 5.10.2  |                | off         |&lt;br /&gt;
| ssl-insecure-content-fixer                     | active   | none   | 2.7.2   |                | off         |&lt;br /&gt;
| vcaching                                       | active   | none   | 1.8.3   |                | off         |&lt;br /&gt;
| woocommerce                                    | active   | none   | 9.3.3   |                | off         |&lt;br /&gt;
| coingate-for-woocommerce                       | inactive | none   | 2.1.1   |                | off         |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here&#039;s what I got so far&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
update plugin &#039;akismet&#039; from v4.1.1 to v5.3.3&lt;br /&gt;
uninstall plugin &#039;be-gdpr&#039;&lt;br /&gt;
uninstall plugin &#039;be-portfolio-post&#039;&lt;br /&gt;
update plugin &#039;classic-editor&#039; from v1.4 to v1.6.5&lt;br /&gt;
uninstall plugin &#039;colorhub&#039;&lt;br /&gt;
update plugin &#039;contact-form-7&#039; from v5.1.1 to v5.9.8&lt;br /&gt;
uninstall plugin &#039;force-strong-passwords&#039;&lt;br /&gt;
update plugin &#039;google-authenticator&#039; from v0.48 to v0.54&lt;br /&gt;
uninstall plugin &#039;masterslider&#039;&lt;br /&gt;
update plugin &#039;meta-box&#039; from v4.17.3 to v5.10.2&lt;br /&gt;
uninstall plugin &#039;meta-box-conditional-logic&#039;&lt;br /&gt;
uninstall plugin &#039;meta-box-show-hide&#039;&lt;br /&gt;
uninstall plugin &#039;meta-box-tabs&#039;&lt;br /&gt;
uninstall plugin &#039;oshine-core&#039;&lt;br /&gt;
uninstall plugin &#039;oshine-modules&#039;&lt;br /&gt;
uninstall plugin &#039;redux-vendor-support&#039;&lt;br /&gt;
uninstall plugin &#039;rename-wp-login&#039;&lt;br /&gt;
uninstall plugin &#039;revslider&#039;&lt;br /&gt;
uninstall plugin &#039;tatsu&#039;&lt;br /&gt;
uninstall plugin &#039;typehub&#039;&lt;br /&gt;
update plugin &#039;vcaching&#039; from v1.6.9 to v1.8.3&lt;br /&gt;
update plugin &#039;woocommerce&#039; from v3.5.7 to v9.3.3&lt;br /&gt;
update plugin &#039;coingate-for-woocommerce&#039; from v1.2.2 to v2.1.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
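# a minimal sketch of how the action list above could be generated instead of written by hand. It assumes each server&#039;s plugin list has first been reduced to &amp;quot;name version&amp;quot; lines (the commented wp-cli pipeline at the bottom is one untested way to do that), and plugin_actions is a hypothetical helper, not part of wp-cli&lt;br /&gt;

```shell
#!/usr/bin/env bash
# plugin_actions OLD_LIST NEW_LIST   (hypothetical helper)
# Each argument is a newline-separated list of "name version" pairs, e.g.
# from hetzner2 (OLD) and hetzner3 (NEW). Prints one CHG-style action for
# every OLD plugin that is missing on NEW or has a different version there.
plugin_actions() {
	local old_list=$1 new_list=$2 line name ver
	local -A new_ver=()
	local -            # keep the 'set -f' below local to this function
	local IFS=$'\n'    # split the lists on newlines only
	set -f             # never glob-expand list entries

	# index the NEW versions by plugin name
	for line in $new_list; do
		name=${line%% *}
		ver=${line##* }
		new_ver[$name]=$ver
	done

	# walk OLD in order, so the output follows the hetzner2 listing
	for line in $old_list; do
		name=${line%% *}
		ver=${line##* }
		if [ -z "${new_ver[$name]+set}" ]; then
			printf "uninstall plugin '%s'\n" "$name"
		elif [ "${new_ver[$name]}" != "$ver" ]; then
			printf "update plugin '%s' from v%s to v%s\n" "$name" "$ver" "${new_ver[$name]}"
		fi
	done
}

# One possible (untested) way to build each list on its server:
#   sudo -u wp -i wp --path=/var/www/html/store.opensourceecology.org/htdocs/ \
#     plugin list --fields=name,version --format=csv | tail -n +2 | tr ',' ' '
```

# the same function should work for the theme lists too, since wp theme list accepts the same --fields and --format options; note that it doesn&#039;t report plugins that exist only on hetzner3, and it can&#039;t suggest replacement plugins&lt;br /&gt;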
# but I think I&#039;m going to have to add some other plugins. For example, we need something to replace the now-defunct &#039;rename-wp-login&#039; and &#039;force-strong-passwords&#039; plugins&lt;br /&gt;
# I also got the theme info on hetzner2 &amp;amp; hetzner3 to diff and add to the ticket&lt;br /&gt;
## hetzner2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# sudo -u wp -i wp --path=/var/www/html/store.opensourceecology.org/htdocs/ theme list&lt;br /&gt;
...&lt;br /&gt;
+-----------------+----------+--------+---------+&lt;br /&gt;
| name            | status   | update | version |&lt;br /&gt;
+-----------------+----------+--------+---------+&lt;br /&gt;
| oshin           | active   | none   | 6.6.4.4 |&lt;br /&gt;
| storefront      | inactive | none   | 2.4.5   |&lt;br /&gt;
| twentyeleven    | inactive | none   | 3.2     |&lt;br /&gt;
| twentyfifteen   | inactive | none   | 2.4     |&lt;br /&gt;
| twentyfourteen  | inactive | none   | 2.6     |&lt;br /&gt;
| twentynineteen  | inactive | none   | 1.3     |&lt;br /&gt;
| twentyseventeen | inactive | none   | 2.1     |&lt;br /&gt;
| twentysixteen   | inactive | none   | 1.9     |&lt;br /&gt;
| twentyten       | inactive | none   | 2.8     |&lt;br /&gt;
| twentythirteen  | inactive | none   | 2.8     |&lt;br /&gt;
| twentytwelve    | inactive | none   | 2.9     |&lt;br /&gt;
+-----------------+----------+--------+---------+&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # sudo -u wp -i wp --path=/var/www/html/store.opensourceecology.org/htdocs/ theme list&lt;br /&gt;
...&lt;br /&gt;
+-----------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| name            | status   | update | version | update_version | auto_update |&lt;br /&gt;
+-----------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| oshin           | inactive | none   | 7.2.1   |                | off         |&lt;br /&gt;
| storefront      | inactive | none   | 4.6.0   |                | off         |&lt;br /&gt;
| twentyeleven    | inactive | none   | 4.7     |                | off         |&lt;br /&gt;
| twentyfifteen   | inactive | none   | 3.8     |                | off         |&lt;br /&gt;
| twentyfourteen  | inactive | none   | 4.0     |                | off         |&lt;br /&gt;
| twentynineteen  | inactive | none   | 2.9     |                | off         |&lt;br /&gt;
| twentyseventeen | active   | none   | 3.7     |                | off         |&lt;br /&gt;
| twentysixteen   | inactive | none   | 3.3     |                | off         |&lt;br /&gt;
| twentyten       | inactive | none   | 4.2     |                | off         |&lt;br /&gt;
| twentythirteen  | inactive | none   | 4.2     |                | off         |&lt;br /&gt;
| twentytwelve    | inactive | none   | 4.3     |                | off         |&lt;br /&gt;
+-----------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I then spent some time updating the CHG script&lt;br /&gt;
# I got stuck a bit on the rsync&lt;br /&gt;
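# first I tried pulling from hetzner2 while logged in to hetzner3, but that failed because sudo on hetzner2 requires a password, and there&#039;s no tty to enter it over rsync&#039;s ssh connection (this attempt also mistakenly had the ssh port prefixed to the remote path, as the &amp;quot;32415/var/tmp/...&amp;quot; in the output shows)&lt;br /&gt;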
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # rsync -avvv --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; maltfield@138.201.84.223:32415${backupDir_hetzner2}/current/* ${backupDir_hetzner3}/current/&lt;br /&gt;
opening connection using: ssh -p 32415 -l maltfield 138.201.84.223 &amp;quot;sudo rsync&amp;quot; --server --sender -vvvlogDtpre.iLsfxCIvu . &amp;quot;32415/var/tmp/backups_for_migration_to_hetzner3/store.opensourceecology.org_20241212/current/*&amp;quot;  (12 args)&lt;br /&gt;
sudo: no tty present and no askpass program specified&lt;br /&gt;
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]&lt;br /&gt;
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [Receiver=3.2.7]&lt;br /&gt;
[Receiver] _exit_cleanup(code=12, file=io.c, line=231): about to call exit(12)&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# instead, I decided to push from hetzner2 to hetzner3&lt;br /&gt;
# first I updated the hetzner2:/etc/hosts file to hard-code the IP address for &amp;quot;hetzner3&amp;quot;, so this will go smoother in the future&lt;br /&gt;
# ok, this worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[maltfield@opensourceecology ~]$ rsync -av --progress --rsync-path=&amp;quot;sudo rsync&amp;quot; -e &amp;quot;ssh -p 32415&amp;quot; ${backupDir_hetzner2}/current/* maltfield@hetzner3:${backupDir_hetzner3}/current/&lt;br /&gt;
sending incremental file list&lt;br /&gt;
mysqldump_store.opensourceecology.org.20241211.sql.bz2&lt;br /&gt;
      1,406,610 100%  109.18MB/s    0:00:00 (xfr#1, to-chk=1/2)&lt;br /&gt;
store.opensourceecology.org_files.20241211.tar.gz&lt;br /&gt;
    184,205,865 100%   92.26MB/s    0:00:01 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 185,658,021 bytes  received 54 bytes  53,045,164.29 bytes/sec&lt;br /&gt;
total size is 185,612,475  speedup is 1.00&lt;br /&gt;
[maltfield@opensourceecology ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Tue Dec 10, 2024=&lt;br /&gt;
&lt;br /&gt;
# last week we got an email from the let&#039;s encrypt expiry bot saying that our cert is going to expire soon&lt;br /&gt;
# we don&#039;t normally get these, so it stood out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hello,&lt;br /&gt;
&lt;br /&gt;
Your certificate (or certificates) for the names listed below will expire in 19 days (on 2024-12-24). Please make sure to renew your certificate before then, or visitors to your web site will encounter errors.&lt;br /&gt;
&lt;br /&gt;
We recommend renewing certificates automatically when they have a third of their total lifetime left. For Let&#039;s Encrypt&#039;s current 90-day certificates, that means renewing 30 days before expiration. See https://letsencrypt.org/docs/integration-guide/ for details.&lt;br /&gt;
&lt;br /&gt;
awstats.openbuildinginstitute.org&lt;br /&gt;
awstats.opensourceecology.org&lt;br /&gt;
fef.opensourceecology.org&lt;br /&gt;
forum.opensourceecology.org&lt;br /&gt;
microfactory.opensourceecology.org&lt;br /&gt;
munin.opensourceecology.org&lt;br /&gt;
openbuildinginstitute.org&lt;br /&gt;
opensourceecology.org&lt;br /&gt;
oswh.opensourceecology.org&lt;br /&gt;
phplist.opensourceecology.org&lt;br /&gt;
seedhome.openbuildinginstitute.org&lt;br /&gt;
staging.opensourceecology.org&lt;br /&gt;
store.opensourceecology.org&lt;br /&gt;
wiki.opensourceecology.org&lt;br /&gt;
www.openbuildinginstitute.org&lt;br /&gt;
www.opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
For details about when we send these emails, please visit: https://letsencrypt.org/docs/expiration-emails/ In particular, note that this reminder email is still sent if you&#039;ve obtained a slightly different certificate by adding or removing names. If you&#039;ve replaced this certificate with a newer one that covers more or fewer names than the list above, you may be able to ignore this message.&lt;br /&gt;
&lt;br /&gt;
For any questions or support, please visit: https://community.letsencrypt.org/ Unfortunately, we can&#039;t provide support by email.&lt;br /&gt;
&lt;br /&gt;
To learn more about the latest technical and organizational updates from Let&#039;s Encrypt, sign up for our newsletter: https://letsencrypt.org/opt-in/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# I checked the website in my browser and confirmed that I see a cert that says it&#039;s going to expire soon&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Not Before Wed, 25 Sep 2024 17:04:09 GMT&lt;br /&gt;
Not After Tue, 24 Dec 2024 17:04:08 GMT&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# It looks like here&#039;s the cron file for the cert renewal&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# cat /etc/cron.d/letsencrypt &lt;br /&gt;
# once a month, update our letsencrypt cert&lt;br /&gt;
20 4 13 * * root /root/bin/letsencrypt/renew.sh &amp;amp;&amp;gt;&amp;gt; /var/log/letsEncryptRenew.log&lt;br /&gt;
[root@opensourceecology log]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# the latest log entry isn&#039;t dated, but the last time it ran, it appears to have decided that Dec 24 was too far away&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The following certificates are not due for renewal yet:&lt;br /&gt;
  /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem expires on 2024-12-24 (skipped)&lt;br /&gt;
  /etc/letsencrypt/live/opensourceecology.org/fullchain.pem expires on 2024-12-24 (skipped)&lt;br /&gt;
No renewals were attempted.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# It looks like the last time cron logged running it was on 2024-11-13, which makes sense&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology log]# grep letsencrypt cron*&lt;br /&gt;
[root@opensourceecology log]# zgrep letsencrypt cron*&lt;br /&gt;
cron-20241113.gz:Nov 13 04:20:01 opensourceecology CROND[1103]: (root) CMD (/root/bin/letsencrypt/renew.sh &amp;amp;&amp;gt;&amp;gt; /var/log/letsEncryptRenew.log)&lt;br /&gt;
[root@opensourceecology log]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# oh, checking the cron again, it is set to run only on the 13th of every month&lt;br /&gt;
# I checked our new server, which has this (installed by debian)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cat /etc/cron.d/certbot &lt;br /&gt;
# /etc/cron.d/certbot: crontab entries for the certbot package&lt;br /&gt;
#&lt;br /&gt;
# Upstream recommends attempting renewal twice a day&lt;br /&gt;
#&lt;br /&gt;
# Eventually, this will be an opportunity to validate certificates&lt;br /&gt;
# haven&#039;t been revoked, etc.  Renewal will only occur if expiration&lt;br /&gt;
# is within 30 days.&lt;br /&gt;
#&lt;br /&gt;
# Important Note!  This cronjob will NOT be executed if you are&lt;br /&gt;
# running systemd as your init system.  If you are running systemd,&lt;br /&gt;
# the cronjob.timer function takes precedence over this cronjob.  For&lt;br /&gt;
# more details, see the systemd.timer manpage, or use systemctl show&lt;br /&gt;
# certbot.timer.&lt;br /&gt;
SHELL=/bin/sh&lt;br /&gt;
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin&lt;br /&gt;
&lt;br /&gt;
0 */12 * * * root test -x /usr/bin/certbot -a \! -d /run/systemd/system &amp;amp;&amp;amp; perl -e &#039;sleep int(rand(43200))&#039; &amp;amp;&amp;amp; certbot -q renew --no-random-sleep-on-renew&lt;br /&gt;
root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# wow, so the certbot package notes that upstream recommends attempting renewal twice per day. By comparison, once per month seems ridiculous&lt;br /&gt;
# I went ahead and changed it to do once per day&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# vim letsencrypt &lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology cron.d]# cat letsencrypt &lt;br /&gt;
# once a month, update our letsencrypt cert&lt;br /&gt;
20 4 * * * root /root/bin/letsencrypt/renew.sh &amp;amp;&amp;gt;&amp;gt; /var/log/letsEncryptRenew.log&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
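# a small worked example of why the monthly schedule cut it close, assuming certbot only renews once 30 or fewer days remain (the threshold described in the Debian cron file above) and GNU date for the arithmetic; days_left is a hypothetical helper, not part of renew.sh. It counts the days left on the cert expiring 2024-12-24 at each run of the old &amp;quot;13th of the month&amp;quot; schedule&lt;br /&gt;

```shell
#!/usr/bin/env bash
# days_left CHECK_DATE EXPIRY_DATE   (hypothetical helper; needs GNU date)
# Whole days from CHECK_DATE to EXPIRY_DATE, both YYYY-MM-DD, computed in UTC.
days_left() {
	local check expiry
	check=$(date -u -d "$1" +%s)
	expiry=$(date -u -d "$2" +%s)
	echo $(( (expiry - check) / 86400 ))
}

expiry=2024-12-24
renew_threshold=30   # assumed certbot default: renew within 30 days of expiry

# old schedule "20 4 13 * *": one check on the 13th of each month
for check in 2024-10-13 2024-11-13 2024-12-13; do
	left=$(days_left "$check" "$expiry")
	if [ "$left" -le "$renew_threshold" ]; then
		echo "$check: $left days left, renew"
	else
		echo "$check: $left days left, skip"
	fi
done
```

# by the same arithmetic, the old schedule would still have renewed on Dec 13, 11 days before expiry, which is after Let&#039;s Encrypt&#039;s 19-day reminder went out; with the new daily schedule, the first run inside the threshold should be around 2024-11-24, 30 days before expiry&lt;br /&gt;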
# and I gave it a manual run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology cron.d]# /root/bin/letsencrypt/renew.sh&lt;br /&gt;
...&lt;br /&gt;
Cert is due for renewal, auto-renewing...&lt;br /&gt;
Plugins selected: Authenticator webroot, Installer None&lt;br /&gt;
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org&lt;br /&gt;
Renewing an existing certificate for fef.opensourceecology.org and 11 more domains&lt;br /&gt;
Performing the following challenges:&lt;br /&gt;
http-01 challenge for awstats.opensourceecology.org&lt;br /&gt;
http-01 challenge for fef.opensourceecology.org&lt;br /&gt;
http-01 challenge for forum.opensourceecology.org&lt;br /&gt;
http-01 challenge for microfactory.opensourceecology.org&lt;br /&gt;
http-01 challenge for munin.opensourceecology.org&lt;br /&gt;
http-01 challenge for opensourceecology.org&lt;br /&gt;
http-01 challenge for oswh.opensourceecology.org&lt;br /&gt;
http-01 challenge for phplist.opensourceecology.org&lt;br /&gt;
http-01 challenge for staging.opensourceecology.org&lt;br /&gt;
http-01 challenge for store.opensourceecology.org&lt;br /&gt;
http-01 challenge for wiki.opensourceecology.org&lt;br /&gt;
http-01 challenge for www.opensourceecology.org&lt;br /&gt;
Using the webroot path /var/www/html/staging.opensourceecology.org/htdocs for all unmatched domains.&lt;br /&gt;
Waiting for verification...&lt;br /&gt;
Cleaning up challenges&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
new certificate deployed without reload, fullchain is&lt;br /&gt;
/etc/letsencrypt/live/opensourceecology.org/fullchain.pem&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
Congratulations, all renewals succeeded: &lt;br /&gt;
  /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem (success)&lt;br /&gt;
  /etc/letsencrypt/live/opensourceecology.org/fullchain.pem (success)&lt;br /&gt;
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -&lt;br /&gt;
Redirecting to /bin/systemctl reload nginx.service&lt;br /&gt;
[root@opensourceecology cron.d]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I confirmed that the cert has been updated&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp2766:~$ echo -n | openssl s_client -showcerts -connect opensourceecology.org:443 | sed -ne &#039;/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p&#039; &amp;gt; mycert.pem&lt;br /&gt;
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1&lt;br /&gt;
verify return:1&lt;br /&gt;
depth=1 C = US, O = Let&#039;s Encrypt, CN = R10&lt;br /&gt;
verify return:1&lt;br /&gt;
depth=0 CN = fef.opensourceecology.org&lt;br /&gt;
verify return:1&lt;br /&gt;
DONE&lt;br /&gt;
user@disp2766:~$ openssl x509 -text -in mycert.pem | less&lt;br /&gt;
user@disp2766:~$ openssl x509 -text -in mycert.pem | grep &#039;Not &#039;&lt;br /&gt;
			Not Before: Dec 10 23:19:37 2024 GMT&lt;br /&gt;
			Not After : Mar 10 23:19:36 2025 GMT&lt;br /&gt;
user@disp2766:~$ &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
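# rather than reading the Not After line by eye, openssl x509 also has a -checkend option that exits non-zero if the certificate expires within the given number of seconds. A minimal sketch wrapping it (cert_needs_renewal and its 30-day default are hypothetical, chosen to mirror certbot&#039;s renewal window, and are not part of our renew.sh)&lt;br /&gt;

```shell
#!/usr/bin/env bash
# cert_needs_renewal PEM_FILE [DAYS]   (hypothetical helper)
# Prints "renew" if the first certificate in PEM_FILE expires within DAYS
# days (default 30), otherwise "ok". An unreadable file also prints "renew",
# because openssl exits non-zero in that case too.
cert_needs_renewal() {
	local pem=$1 days=${2:-30}
	# -checkend N exits 0 only if the cert is still valid N seconds from now
	if openssl x509 -noout -checkend $(( days * 86400 )) -in "$pem" >/dev/null 2>/dev/null; then
		echo ok
	else
		echo renew
	fi
}

# Example: mycert.pem as saved by the s_client command above; with
# -showcerts the leaf cert comes first, which is the one x509 reads.
#   cert_needs_renewal mycert.pem 30
```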
# so it looks like nothing was broken here; we just hadn&#039;t been receiving these alerts before. This is the first one we&#039;ve gotten since I set up the google group that forwards admin emails to its members (including Marcin and me)&lt;br /&gt;
# I wrote Marcin to let him know that this isn&#039;t actually a problem, but that I&#039;ve updated the cron anyway&lt;br /&gt;
# ...&lt;br /&gt;
# returning to the hetzner3 project where I left off on my last work day (2024-10-07), I checked up on hetzner2&#039;s use of the oshine theme, which I thought was in use on the following sites:&lt;br /&gt;
## obi&lt;br /&gt;
## osemain&lt;br /&gt;
## microfactory&lt;br /&gt;
## store&lt;br /&gt;
# I confirmed that we are using the &#039;oshine&#039; theme on obi, microfactory, and store, but osemain is using Enigmatic&lt;br /&gt;
# iirc, Catarina asked me to switch osemain to oshine.&lt;br /&gt;
# most importantly, we only have two licenses for oshine.&lt;br /&gt;
## store was never setup&lt;br /&gt;
## microfactory&#039;s last blog post was for an event in Feb 2019&lt;br /&gt;
## obi&#039;s last workshop listing is from 2014, so that&#039;s 10 years old&lt;br /&gt;
# I sent an email to Catarina and Marcin asking how they want to allocate the licenses they *do* have among these sites on the new server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Catarina &amp;amp; Marcin,&lt;br /&gt;
&lt;br /&gt;
I think we don&#039;t have enough theme licenses for our wordpress sites.&lt;br /&gt;
&lt;br /&gt;
When we last visited this in August, I downloaded:&lt;br /&gt;
&lt;br /&gt;
 [a] two licenses of oshine&lt;br /&gt;
&lt;br /&gt;
 [b] one license of enigmatic&lt;br /&gt;
&lt;br /&gt;
Unfortunately, you have three sites using oshine:&lt;br /&gt;
&lt;br /&gt;
 [1] openbuildinginstitute.org&lt;br /&gt;
&lt;br /&gt;
 [2] microfactory.opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
 [3] store.opensourceecology.org&lt;br /&gt;
&lt;br /&gt;
I think you also mentioned that you might want to switch www.opensourceecology.org to Oshine? Well we definitely don&#039;t have a license for that.&lt;br /&gt;
&lt;br /&gt;
I tried to find some way to copy the license keys from hetzner2 (our old/live prod server), but I couldn&#039;t find a way to view the currently-used license keys for each site. And when I install the latest version of the Oshine theme on the new servers, it requests a license key for each site.&lt;br /&gt;
&lt;br /&gt;
If you&#039;d like, I can just setup:&lt;br /&gt;
&lt;br /&gt;
 * openbuildinginstitute.org with oshine #1&lt;br /&gt;
&lt;br /&gt;
 * microfactory.opensourceecology.org with oshine #2&lt;br /&gt;
&lt;br /&gt;
 * www.opensourceecology.org with enigmatic&lt;br /&gt;
&lt;br /&gt;
 * change store.opensourceecology.org to use some free theme like twentytwentyfour&lt;br /&gt;
&lt;br /&gt;
Or you could buy more licenses and I&#039;ll set up the sites with those.&lt;br /&gt;
&lt;br /&gt;
Please let me know how you want to allocate your wordpress licenses for your sites.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you, &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ah, shortly after I sent that email, I found a third oshine license in my files&lt;br /&gt;
# so the only issue is if we still want to use oshine for www.opensourceecology.org – then we&#039;d need to buy a new license or take the one away from store.opensourceecology.org&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Update: sorry, I found the third oshine license:&lt;br /&gt;
&lt;br /&gt;
1. Purchased 2016 by &amp;quot;Catarina Mota&amp;quot;&lt;br /&gt;
&lt;br /&gt;
2. Purchased 2018 by &amp;quot;Open Source Ecology&amp;quot;&lt;br /&gt;
&lt;br /&gt;
3. Purchased 2019 by &amp;quot;Catarina Mota&amp;quot;&lt;br /&gt;
&lt;br /&gt;
So I think the only issue is that we&#039;d have to either buy a new license for www.opensourceecology.org, if that&#039;s a change you&#039;d like to make.&lt;br /&gt;
&lt;br /&gt;
Or use the license that&#039;s currently in-use by store.opensourceecology.org for www.opensourceecology.org and set store.opensourceecology.org to some free theme.&lt;br /&gt;
&lt;br /&gt;
Please let me know how I should proceed in allocating these oshine licenses. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ...&lt;br /&gt;
# anyway, until we hear back about how to handle these oshine licenses, I think we&#039;re basically blocked on store.opensourceecology.org. I don&#039;t want to associate one of our precious licenses with the newly installed server&#039;s wordpress config if we&#039;re not going to use it.&lt;br /&gt;
# refresher: here&#039;s our site migration order list&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1. forum.opensourceecology.org&lt;br /&gt;
2. store.opensourceecology.org&lt;br /&gt;
3. microfactory.opensourceecology.org&lt;br /&gt;
4. fef.opensourceecology.org&lt;br /&gt;
5. oswh.opensourceecology.org&lt;br /&gt;
6. seedhome.openbuildinginstitute.org&lt;br /&gt;
7. www.openbuildinginstitute.org&lt;br /&gt;
8. www.opensourceecology.org&lt;br /&gt;
9. phplist.opensourceecology.org&lt;br /&gt;
10. wiki.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, next is microfactory, which is yet another oshine ambiguity for now&lt;br /&gt;
# after that we have fef&lt;br /&gt;
# I logged into the fef wp admin wui dashboard https://fef.opensourceecology.org/&lt;br /&gt;
## I confirmed that fef is using some theme called &amp;quot;Simple Photo Responsive&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Mon Oct 07, 2024=&lt;br /&gt;
# I installed the latest version of the &#039;oshine&#039; theme to the store.opensourceecology.org wordpress site&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rsync -av --progress /var/tmp/wordpress/themes/oshin /var/www/html/store.opensourceecology.org/htdocs/wp-content/themes/&lt;br /&gt;
&lt;br /&gt;
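# find every wordpress wp-content dir (the pattern is quoted so the shell can&#039;t glob-expand it first)&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;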
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	# default: owned by not-apache, read-only for the www-data group, no access for others&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	# wp-config.php sits one level above the docroot; only apache-admins may read it&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	# wp-content/uploads must be writable by the web server group&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	# and so must wp-content/tmp&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
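# the loop above applies a fixed per-path ownership and permission policy. A minimal sketch of that policy as a lookup function, which could be compared against stat -c &#039;%U:%G %a&#039; to spot-check a site afterwards; expected_mode is hypothetical, and it assumes wp-config.php sits in the vhost directory one level above htdocs, as the loop does&lt;br /&gt;

```shell
#!/usr/bin/env bash
# expected_mode PATH TYPE   (hypothetical helper)
# TYPE is "f" for a file or "d" for a directory. Prints the owner:group and
# octal mode (in stat's %a style, without a leading zero) that the
# permissions loop assigns to PATH.
expected_mode() {
	local path=$1 type=$2
	case "$path" in
		*/wp-config.php)
			# outside the docroot, readable only by the apache-admins group
			echo "not-apache:apache-admins 40" ;;
		*/wp-content/uploads|*/wp-content/uploads/*|*/wp-content/tmp|*/wp-content/tmp/*)
			# the only places the web server group may write
			if [ "$type" = d ]; then echo "not-apache:www-data 770"; else echo "not-apache:www-data 660"; fi ;;
		*)
			# everything else is read-only for the web server group
			if [ "$type" = d ]; then echo "not-apache:www-data 50"; else echo "not-apache:www-data 40"; fi ;;
	esac
}

# Example spot-check on the server (the path is illustrative):
#   f=/var/www/html/store.opensourceecology.org/wp-config.php
#   [ "$(stat -c '%U:%G %a' "$f")" = "$(expected_mode "$f" f)" ] || echo "mismatch: $f"
```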
# ok, after that, loading it in the browser still yields a blank page, but that&#039;s just because it&#039;s cached https://store.opensourceecology.org/&lt;br /&gt;
# if I just append a bullshit GET variable on the end, then it loads https://store.opensourceecology.org/?nocache=2&lt;br /&gt;
# the site is finally loading, but it&#039;s all fucked&lt;br /&gt;
# I&#039;m not seeing any 403 errors in the network tab of firefox on-load, so I&#039;m pretty sure the issue is just missing plugins&lt;br /&gt;
# for example, there&#039;s a bunch of shortcodes being displayed raw, like this one at the top&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[tatsu_section bg_color= “rgba(29,29,29,1)” bg_image= “http://brandexponents.com/oshine-lite/v37/wp-content/uploads/sites/44/2018/02/home-hero.jpeg” bg_repeat= “no-repeat” bg_attachment= “scroll” bg_position= “center center” bg_size= “cover” bg_animation= “none” padding= ‘{“d”:”200px 0% 200px 0% “}’ margin= “0px 0px 0px 0px” border= “0px 0px px 0px” border_color= “” bg_video= “0” bg_video_mp4_src= “” bg_video_ogg_src= “” bg_video_webm_src= “” bg_overlay= “1” overlay_color= “rgba(0,0,0,0.1)” full_screen= “1” section_id= “” section_class= “” section_title= “” offset_section= “” offset_value= “0” full_screen_header_scheme= “background–dark” hide_in= “0” bg_stretch= “1” key= “REDACTED”]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, if I log in to the WUI and go to appearance -&amp;gt; themes, now it does show the oshine theme https://store.opensourceecology.org/wp-admin/themes.php&lt;br /&gt;
# I deactivated it and reactivated it, and I got a message at the top&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This theme requires the following plugins: BE Portfolio Post Type, Meta Box Conditional Logic, Meta Box Show Hide, Meta Box Tabs, Oshine Core, Oshine Modules and Tatsu. This theme recommends the following plugins: BE GDPR, Master Slider, Safe SVG, Slider Revolution and WPForms Lite. Begin installing plugins | Dismiss this notice &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# If I click to customize the theme, it has a tab &amp;quot;Install Plugins&amp;quot;&lt;br /&gt;
# If I click the &amp;quot;Install Plugins&amp;quot; tab, it yells at me with big red text&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Please provide a valid purchase code of the theme in order to install plugins and import demo&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# on another DispVM, I logged into the hetzner2 store.opensourceecology.org&lt;br /&gt;
# clicked appearance -&amp;gt; themes -&amp;gt; oshin -&amp;gt; customize&lt;br /&gt;
## no, that didn&#039;t work&lt;br /&gt;
# clicked &amp;quot;Oshine Options&amp;quot; in the left-hand navbar -&amp;gt; &lt;br /&gt;
## no, I went through all the settings and couldn&#039;t find it there either :(&lt;br /&gt;
# I mean, I have the keys already downloaded, but I&#039;d like to keep them consistent. We have two keys and I don&#039;t know which was used for store.opensourceecology.org. This is dumb. Why do they make it so hard to find?&lt;br /&gt;
# I tried pulling it out of the DB, but I didn&#039;t find anything obvious in the options table named &amp;quot;*oshine*&amp;quot;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_name like &#039;%oshine%&#039; limit 100;&lt;br /&gt;
+-----------+---------------------------------+---------------------------------------------------------------------------------------------------+----------+&lt;br /&gt;
| option_id | option_name                     | option_value                                                                                      | autoload |&lt;br /&gt;
+-----------+---------------------------------+---------------------------------------------------------------------------------------------------+----------+&lt;br /&gt;
|       347 | external_updates-oshine-core    | O:8:&amp;quot;stdClass&amp;quot;:3:{s:9:&amp;quot;lastCheck&amp;quot;;i:1728339766;s:14:&amp;quot;checkedVersion&amp;quot;;s:5:&amp;quot;1.3.7&amp;quot;;s:6:&amp;quot;update&amp;quot;;N;} | no       |&lt;br /&gt;
|       349 | external_updates-oshine-modules | O:8:&amp;quot;stdClass&amp;quot;:3:{s:9:&amp;quot;lastCheck&amp;quot;;i:1728339766;s:14:&amp;quot;checkedVersion&amp;quot;;s:5:&amp;quot;2.2.9&amp;quot;;s:6:&amp;quot;update&amp;quot;;N;} | no       |&lt;br /&gt;
|       352 | oshine_redux_to_colorhub        | 1                                                                                                 | yes      |&lt;br /&gt;
|       355 | oshine_redux_to_typehub         | 1                                                                                                 | yes      |&lt;br /&gt;
+-----------+---------------------------------+---------------------------------------------------------------------------------------------------+----------+&lt;br /&gt;
4 rows in set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I realized it&#039;s probably easier to just search the mysqldump file&lt;br /&gt;
## got it!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/hetzner2-www-20240926/root/backups/sync/daily_hetzner2_20240926_072001/mysqldump # grep -ir &#039;purchase&#039; mysqldump.20240926_072001b.sql | grep -ir code | less&lt;br /&gt;
...&lt;br /&gt;
193,&#039;be_themes_purchase_data&#039;,&#039;a:2:{s:8:\&amp;quot;last_tab\&amp;quot;;s:0:\&amp;quot;\&amp;quot;;s:19:\&amp;quot;theme_purchase_code\&amp;quot;;s:36:\&amp;quot;REDACTED\&amp;quot;;}&#039;,&#039;yes&#039;),(194,&#039;be_themes_purchase_data-transients&#039;,&#039;a:2:{s:14:\&amp;quot;changed_values\&amp;quot;;a:0:{}s:9:\&amp;quot;last_save\&amp;quot;;i:1471011372;}&#039;,&#039;yes&#039;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# actually, that still doesn&#039;t tell me which site it&#039;s for&lt;br /&gt;
# I think that&#039;s the wrong site&#039;s db, because I see nothing when I query just the store wordpress db&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_name like &#039;%be%&#039; and option_value like &#039;%theme_purchase_code%&#039; limit 100;&lt;br /&gt;
Empty set (0.00 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_value like &#039;%theme_purchase_code%&#039; limit 100;&lt;br /&gt;
Empty set (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_value like &#039;%theme_purchase_data%&#039; limit 100;&lt;br /&gt;
Empty set (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_value like &#039;%themes_purchase_data%&#039; limit 100;&lt;br /&gt;
Empty set (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; select * from wp_options where option_value like &#039;%theme_purchase%&#039; limit 100;&lt;br /&gt;
Empty set (0.01 sec)&lt;br /&gt;
&lt;br /&gt;
MariaDB [store_db]&amp;gt; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
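# one way to find out which site&#039;s database that be_themes_purchase_data row came from is to track the &amp;quot;USE&amp;quot; statements in the dump. This is an untested sketch. It assumes the dump was made with mysqldump&#039;s --all-databases (or --databases) option, which writes a &amp;quot;USE `dbname`;&amp;quot; line before each database&#039;s tables&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remember the most recent &amp;quot;USE `db`;&amp;quot; line, and print it for every line that mentions the option&lt;br /&gt;
awk &#039;/^USE `/ { db = $2 } /be_themes_purchase_data/ { print db }&#039; mysqldump.20240926_072001b.sql | sort -u&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;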
# I exported the data from the hetzner2 store theme too, but it wasn&#039;t there&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp928:~/Downloads$ du -sh *&lt;br /&gt;
28K	redux_options_be_themes_data_backup_07-10-2024.json&lt;br /&gt;
user@disp928:~/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# fuck it, I&#039;m just going to use these alphabetically&lt;br /&gt;
# which sites use this?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology hetzner3]# nice find /var/www/html -type d -iname oshin&lt;br /&gt;
/var/www/html/www.openbuildinginstitute.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
/var/www/html/d3d.opensourceecology.org/htdocs/wp-content/themes/oshine_6.5/Oshine Buyers Package 6.5/oshin&lt;br /&gt;
/var/www/html/d3d.opensourceecology.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
/var/www/html/microfactory.opensourceecology.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
/var/www/html/staging.openbuildinginstitute.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
/var/www/html/store.opensourceecology.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
/var/www/html/3dp.opensourceecology.org/htdocs/wp-content/themes/oshin&lt;br /&gt;
[root@opensourceecology hetzner3]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# d3d and 3dp are both broken with cert errors right now&lt;br /&gt;
# right, my notes say these were two sites whose domain names Marcin abandoned. We eventually built microfactory.opensourceecology.org instead&lt;br /&gt;
# ugh, power went out&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sun Oct 06, 2024=&lt;br /&gt;
# I checked on the status of the inventory job for our &#039;deleteMeIn2020&#039; Glacier vault; it looks like it&#039;s still unavailable. I guess I&#039;ll give it a week or so before trying again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp8678:~$  aws configure set aws_access_key_id &#039;REDACTED&#039;&lt;br /&gt;
user@disp8678:~$  &lt;br /&gt;
&lt;br /&gt;
user@disp8678:~$  aws configure set aws_secret_access_key &#039;REDACTED&#039;&lt;br /&gt;
user@disp8678:~$ &lt;br /&gt;
&lt;br /&gt;
user@disp8678:~$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (ResourceNotFoundException) when calling the GetJobOutput operation: The job ID was not found: ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&lt;br /&gt;
user@disp8678:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
# I returned to fixing the vhost config to temporarily permit traffic to wp-login.php, but I kept getting 429 errors from wordpress.org&lt;br /&gt;
# This has been a frustrating, recurring issue for many months. I finally filed a bug report https://meta.trac.wordpress.org/ticket/7792#ticket&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Title: Too Many &amp;quot;429 Too Many Requests&amp;quot; Errors (Nginx Misconfiguration causing False-Positives)&lt;br /&gt;
&lt;br /&gt;
For the past ~6 months, I have frequently been unable to access content on wordpress.org.&lt;br /&gt;
&lt;br /&gt;
If I&#039;m lucky, then when I&#039;m browsing wordpress documentation pages, I&#039;m able to load the main html file with the content, but the website is horribly mis-rendered because many dependent assets don&#039;t load (eg css files, images, javascript, etc) due to &amp;quot;429 Too Many Requests&amp;quot; errors.&lt;br /&gt;
&lt;br /&gt;
If I&#039;m unlucky, even the main page doesn&#039;t load at all, due to &amp;quot;429 Too Many Requests&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Usually, I start-off being able to load one or more pages, but as I click around the website trying to find the page that I need, I eventually get this error.&lt;br /&gt;
&lt;br /&gt;
I am not a bot. I am a human. I&#039;m just trying to load reference documentation as I develop a WordPress plugin. This has been extremely frustrating, and has forced me to use third-party websites and to &amp;quot;guess&amp;quot; PHP functions, attributes, and return values as I develop, reducing my productivity.&lt;br /&gt;
&lt;br /&gt;
Since the Snowden revelations of 2013, it&#039;s become clear that many at-risk users should not be using the Internet without using privacy-protections like Tor. For security and privacy reasons, I do not access the internet without passing my traffic through Tor or a VPN. To prevent discrimination against at-risk folks, it&#039;s important that WordPress servers do not block traffic from shared networks, such as VPNs or Tor exit nodes.&lt;br /&gt;
&lt;br /&gt;
It appears that nginx&#039;s settings are too strict, and lots of good users are getting caught in the dragnet.&lt;br /&gt;
&lt;br /&gt;
Whatever the current nginx config is, please double it to fix these false-positives.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, I updated the apache config and pushed it with ansible&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ git diff&lt;br /&gt;
diff --git a/hetzner3/roles/maltfield.apache/templates/security.virtualhost.include.j2 b/hetzner3/roles/maltfield.apache/templates/security.virtualhost.include.j2&lt;br /&gt;
index c0575a3..c413c74 100644&lt;br /&gt;
--- a/hetzner3/roles/maltfield.apache/templates/security.virtualhost.include.j2&lt;br /&gt;
+++ b/hetzner3/roles/maltfield.apache/templates/security.virtualhost.include.j2&lt;br /&gt;
@@ -2,12 +2,12 @@&lt;br /&gt;
 &lt;br /&gt;
 ################################################################################&lt;br /&gt;
 # File:    security.virtualhost.include&lt;br /&gt;
-# Version: 0.2&lt;br /&gt;
+# Version: 0.3&lt;br /&gt;
 # Purpose: File includes some common security-hardening that&#039;s intended to be&lt;br /&gt;
 #          Include()d into other vhost files&#039; &amp;lt;VirtualHost&amp;gt; blocks&lt;br /&gt;
 # Author:  Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
 # Created: 2024-09-14&lt;br /&gt;
-# Updated: 2024-09-24&lt;br /&gt;
+# Updated: 2024-10-06&lt;br /&gt;
 ################################################################################&lt;br /&gt;
 &lt;br /&gt;
        # don&#039;t execute any php files inside uploads directories&lt;br /&gt;
@@ -56,7 +56,10 @@&lt;br /&gt;
 &lt;br /&gt;
    # block access to &#039;wp-login.php&#039; from brute-forcers;&lt;br /&gt;
        # see wp plugin &#039;rename-wp-login&#039;&lt;br /&gt;
-   &amp;lt;LocationMatch &amp;quot;.*wp-login.php&amp;quot;&amp;gt;&lt;br /&gt;
-               Require all denied&lt;br /&gt;
-   &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
+       # TODO: 2024-10: we need to re-enable this after we find a replacement for the&lt;br /&gt;
+       #                (now-deprecated) &#039;rename-wp-login&#039; wordpress plugin&lt;br /&gt;
+       #               * https://wordpress.org/plugins/rename-wp-login/&lt;br /&gt;
+#   &amp;lt;LocationMatch &amp;quot;.*wp-login.php&amp;quot;&amp;gt;&lt;br /&gt;
+#              Require all denied&lt;br /&gt;
+#   &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now I&#039;m able to load the login page, but when I do, I still get 403 errors on a few of the dependent requests&lt;br /&gt;
## https://store.opensourceecology.org/wp-admin/images/wordpress-logo.svg?ver=20131107&lt;br /&gt;
## https://store.opensourceecology.org/wp-includes/images/w-logo-blue-white-bg.png&lt;br /&gt;
# the mod_security log shows the 403 response for these images, but I&#039;m not sure why it&#039;s happening&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/log/apache2 # cat modsec_audit.log&lt;br /&gt;
...&lt;br /&gt;
--69ede32a-H--&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_authz_core.c&amp;quot;] [line 879] [level 3] AH01630: client denied by server configuration: /var/www/html/store.opensourceecology.org/htdocs/wp-admin/images/wordpress-logo.svg&lt;br /&gt;
Stopwatch: 1728256656835561 1333 (- - -)&lt;br /&gt;
Stopwatch2: 1728256656835561 1333; combined=33, p1=31, p2=0, p3=1, p4=0, p5=1, sr=0, sw=0, l=0, gc=0&lt;br /&gt;
Response-Body-Transformed: Dechunked&lt;br /&gt;
Producer: ModSecurity for Apache/2.9.7 (http://www.modsecurity.org/).&lt;br /&gt;
Server: Apache&lt;br /&gt;
Engine-Mode: &amp;quot;ENABLED&amp;quot;&lt;br /&gt;
&lt;br /&gt;
--69ede32a-Z--&lt;br /&gt;
...&lt;br /&gt;
--6a3d866d-H--&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_authz_core.c&amp;quot;] [line 879] [level 3] AH01630: client denied by server configuration: /var/www/html/store.opensourceecology.org/htdocs/wp-includes/images/w-logo-blue-white-bg.png&lt;br /&gt;
Stopwatch: 1728256938397718 1105 (- - -)&lt;br /&gt;
Stopwatch2: 1728256938397718 1105; combined=27, p1=23, p2=0, p3=0, p4=0, p5=3, sr=0, sw=1, l=0, gc=0&lt;br /&gt;
Response-Body-Transformed: Dechunked&lt;br /&gt;
Producer: ModSecurity for Apache/2.9.7 (http://www.modsecurity.org/).&lt;br /&gt;
Server: Apache&lt;br /&gt;
Engine-Mode: &amp;quot;ENABLED&amp;quot;&lt;br /&gt;
&lt;br /&gt;
--6a3d866d-Z--&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, I wonder if this is it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ grep -irC4 &#039;require all&#039; | less&lt;br /&gt;
...&lt;br /&gt;
roles/maltfield.apache/templates/security.virtualhost.include.j2-       &amp;lt;LocationMatch &amp;quot;/images/&amp;quot;&amp;gt;&lt;br /&gt;
roles/maltfield.apache/templates/security.virtualhost.include.j2-               SetHandler !&lt;br /&gt;
roles/maltfield.apache/templates/security.virtualhost.include.j2:               Require all denied&lt;br /&gt;
roles/maltfield.apache/templates/security.virtualhost.include.j2-       &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# so one thing we changed from hetzner2 was the logic for preventing php scripts from being executed inside user-uploadable directories.&lt;br /&gt;
# in hetzner2, we used mod_php, so this was done with &#039;php_flag engine off&#039; -- but that doesn&#039;t work with php-fpm&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
        # don&#039;t execute any php files inside the uploads directory&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/&amp;quot;&amp;gt;&lt;br /&gt;
                php_flag engine off&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Order Deny,Allow&lt;br /&gt;
                Deny from All&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&lt;br /&gt;
        # block dot files, such as svn files from checking out wp core&lt;br /&gt;
        &amp;lt;LocationMatch .*\.(svn|git|hg|bzr|cvs|ht)/.*&amp;gt;&lt;br /&gt;
                Deny From All&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&lt;br /&gt;
        # block access to &#039;wp-login.php&#039; from brute-forcers; see wp plugin &#039;rename-wp-login&#039;&lt;br /&gt;
        &amp;lt;LocationMatch .*wp-login.php&amp;gt;&lt;br /&gt;
                Deny From All&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I think the idea behind &amp;quot;Require all denied&amp;quot; was that it would catch anything that wasn&#039;t already sent off to the php-fpm proxy. php files should be handled by php-fpm first, so if apache still sees the request at this point, we should deny it&lt;br /&gt;
# but this only makes sense for filenames ending in php (or matching our more complex regex above). It doesn&#039;t make sense for the whole /images/ directory, where apache needs to serve static files&lt;br /&gt;
# alright, I made these changes, which fixed it. basically we want to &amp;quot;SetHandler !&amp;quot; on everything, but only &amp;quot;Require all denied&amp;quot; for the .php files&lt;br /&gt;
## Note: I wasn&#039;t able to figure out what &amp;quot;SetHandler !&amp;quot; does. Bing/ddg return no results, and Google just ignores the exclamation mark in queries, so it&#039;s literally not possible to search for. But I did find lots of results about how to use SetHandler in Apache to point to php-fpm. My best guess is that this sets the handler to &#039;null&#039; or something, which would override any previous setting that told apache to send the request to php-fpm (or some other cgi proxy)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
        # don&#039;t execute any php files inside uploads directories&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/&amp;quot;&amp;gt;&lt;br /&gt;
                SetHandler !&lt;br /&gt;
-               Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Require all denied&lt;br /&gt;
@@ -21,7 +20,6 @@&lt;br /&gt;
 &lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/uploadimages/&amp;quot;&amp;gt;&lt;br /&gt;
                SetHandler !&lt;br /&gt;
-               Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/uploadimages/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Require all denied&lt;br /&gt;
@@ -29,7 +27,6 @@&lt;br /&gt;
 &lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/images/&amp;quot;&amp;gt;&lt;br /&gt;
                SetHandler !&lt;br /&gt;
-               Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/images/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Require all denied&lt;br /&gt;
@@ -38,7 +35,6 @@&lt;br /&gt;
        # don&#039;t execute php files in W3 Total Cache&#039;s tmp dir&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/cache/&amp;quot;&amp;gt;&lt;br /&gt;
                SetHandler !&lt;br /&gt;
-               Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/cache/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Require all denied&lt;br /&gt;
@@ -46,17 +42,22 @@&lt;br /&gt;
 &lt;br /&gt;
        # block dot (hidden) files&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/\.(?!well\-known)&amp;quot;&amp;gt;&lt;br /&gt;
+               SetHandler !&lt;br /&gt;
                Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
        # block config files&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;config.php&amp;quot;&amp;gt;&lt;br /&gt;
+               SetHandler !&lt;br /&gt;
                Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
    # block access to &#039;wp-login.php&#039; from brute-forcers;&lt;br /&gt;
        # see wp plugin &#039;rename-wp-login&#039;&lt;br /&gt;
-   &amp;lt;LocationMatch &amp;quot;.*wp-login.php&amp;quot;&amp;gt;&lt;br /&gt;
-               Require all denied&lt;br /&gt;
-   &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
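# for comparison, the Apache 2.4 docs describe &amp;quot;SetHandler None&amp;quot; as the way to override an earlier SetHandler directive. Below is an untested sketch of the uploads rule written that way. It assumes two things: our php-fpm hand-off is done with a SetHandler inside a &amp;lt;FilesMatch&amp;gt; block (not ProxyPassMatch), and &amp;lt;Location&amp;gt;/&amp;lt;LocationMatch&amp;gt; sections are merged after &amp;lt;Files&amp;gt;/&amp;lt;FilesMatch&amp;gt; sections, so the handler set here should win&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
        # serve anything under uploads as a static file instead of passing it to php-fpm&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/&amp;quot;&amp;gt;&lt;br /&gt;
                SetHandler None&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
        # and refuse to serve script-like files from there at all&lt;br /&gt;
        &amp;lt;LocationMatch &amp;quot;/wp-content/uploads/.*(?i)\.(cgi|shtml|php3?|phps|phtml)$&amp;quot;&amp;gt;&lt;br /&gt;
                Require all denied&lt;br /&gt;
        &amp;lt;/LocationMatch&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;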
# cool, I&#039;m now able to log in to store.opensourceecology.org on hetzner3 with my old creds&lt;br /&gt;
# the dashboard is littered with alerts:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Action Scheduler: 3 past-due actions found; something may be wrong. Read documentation »&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
WooCommerce database update required&lt;br /&gt;
&lt;br /&gt;
WooCommerce has been updated! To keep things running smoothly, we have to update your database to the newest version. The database update process runs in the background and may take a little while, so please be patient. Advanced users can alternatively update via WP CLI.&lt;br /&gt;
&lt;br /&gt;
Update WooCommerce Database Learn more about updates&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 Geolocation has not been configured.&lt;br /&gt;
&lt;br /&gt;
You must enter a valid license key on the MaxMind integration settings page in order to use the geolocation service. If you do not need geolocation for shipping or taxes, you should change the default customer location on the general settings page. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The plugin be-gdpr/be-gdpr.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin be-portfolio-post/be-portfolio-post.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin coingate-for-woocommerce/coingate.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin colorhub/colorhub.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin force-strong-passwords/slt-force-strong-passwords.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin masterslider/masterslider.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin meta-box-conditional-logic/meta-box-conditional-logic.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin meta-box-show-hide/meta-box-show-hide.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin meta-box-tabs/meta-box-tabs.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin oshine-core/oshine-core.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin oshine-modules/oshine-modules.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin redux-vendor-support/redux-vendor-support.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin rename-wp-login/rename-wp-login.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin revslider/revslider.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin tatsu/tatsu.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&lt;br /&gt;
The plugin typehub/typehub.php has been deactivated due to an error: Plugin file does not exist.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I kicked off the WooCommerce db upgrade&lt;br /&gt;
# ugh, Akismet isn&#039;t activated. We have 2,213 comments in the queue&lt;br /&gt;
# if I click on &#039;themes&#039; in the WUI, then I get a notice at the top&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
The active theme is broken. Reverting to the default theme.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it says that &#039;oshine&#039; is the active theme&lt;br /&gt;
# alright, I had already downloaded these files earlier&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner3$ ls&lt;br /&gt;
13757819-enigmatic-responsive-multipurpose-wp-theme-license.txt&lt;br /&gt;
28755060-oshine-creative-multipurpose-wordpress-theme-license.txt&lt;br /&gt;
47932235-oshine-creative-multipurpose-wordpress-theme-license.txt&lt;br /&gt;
52287820-oshine-creative-multipurpose-wordpress-theme-license.txt&lt;br /&gt;
backup-restore-test&lt;br /&gt;
themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip&lt;br /&gt;
themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
themeforest-4EaAhtH1-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
user@ose:~/tmp/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# unfortunately, these are paid themes, and I have to coordinate with Catarina to get an OTP every time I log in, so I can&#039;t 3TOFU these :( I&#039;ll just have to 1TOFU them&lt;br /&gt;
# apparently the two oshine theme zips have identical contents but different names&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner3$ sha256sum *.zip&lt;br /&gt;
ed0628d0e57bb4e44b1af24eb235c6c384433c9ca94806c11b881e16f7f2b74a  themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip&lt;br /&gt;
7506d6759ff1ee3f66d6135176537f12067ce86f2d5ba045c125f20df6240789  themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
7506d6759ff1ee3f66d6135176537f12067ce86f2d5ba045c125f20df6240789  themeforest-4EaAhtH1-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
user@ose:~/tmp/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
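# with more files, GNU coreutils can group identical ones automatically. Sorting by hash and having uniq compare only the first 64 characters (the length of a hex sha256 digest) prints each set of duplicates together and skips the unique files. For the three zips above, that should print just the two oshine lines. A minimal sketch, assuming GNU sha256sum, sort, and uniq&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sha256sum -- *.zip | sort | uniq -w 64 --all-repeated=separate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;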
# I rsync&#039;d these files up to hetzner3&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@ose:~/tmp/hetzner3$ rsync -av --progress themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip hetzner3:&lt;br /&gt;
Enter passphrase for key &#039;/home/user/.ssh/id_rsa&#039;: &lt;br /&gt;
Enter passphrase for key &#039;/home/user/.ssh/id_rsa&#039;: &lt;br /&gt;
sending incremental file list&lt;br /&gt;
themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip&lt;br /&gt;
     10,582,975 100%  318.17kB/s    0:00:32 (xfr#1, to-chk=0/1)&lt;br /&gt;
&lt;br /&gt;
sent 10,585,730 bytes  received 35 bytes  201,633.62 bytes/sec&lt;br /&gt;
total size is 10,582,975  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner3$ &lt;br /&gt;
&lt;br /&gt;
user@ose:~/tmp/hetzner3$ rsync -av --progress themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip hetzner3:&lt;br /&gt;
Enter passphrase for key &#039;/home/user/.ssh/id_rsa&#039;: &lt;br /&gt;
sending incremental file list&lt;br /&gt;
themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
     11,394,173 100%  996.24kB/s    0:00:11 (xfr#1, to-chk=0/1)&lt;br /&gt;
&lt;br /&gt;
sent 11,397,129 bytes  received 35 bytes  303,924.37 bytes/sec&lt;br /&gt;
total size is 11,394,173  speedup is 1.00&lt;br /&gt;
user@ose:~/tmp/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I copied them over to our other dir with all the themes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # ls /home/maltfield/*.zip&lt;br /&gt;
/home/maltfield/themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip&lt;br /&gt;
/home/maltfield/themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # rsync -av --progress /home/maltfield/*.zip .&lt;br /&gt;
sending incremental file list&lt;br /&gt;
themeforest-2XwUOcbo-enigmatic-responsive-multipurpose-wp-theme-wordpress-theme.zip&lt;br /&gt;
     10.582.975 100%  670,76MB/s    0:00:00 (xfr#1, to-chk=1/2)&lt;br /&gt;
themeforest-3JjZqZRr-oshine-creative-multipurpose-wordpress-theme-wordpress-theme.zip&lt;br /&gt;
     11.394.173 100%  329,28MB/s    0:00:00 (xfr#2, to-chk=0/2)&lt;br /&gt;
&lt;br /&gt;
sent 21.982.821 bytes  received 54 bytes  43.965.750,00 bytes/sec&lt;br /&gt;
total size is 21.977.148  speedup is 1,00 &lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # shred -u /home/maltfield/*.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # chown root:root themeforest-*&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # chmod 0400 themeforest-*&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Fri Oct 04, 2024=&lt;br /&gt;
# Marcin gave me the go-ahead to delete the &#039;deleteMeIn2020&#039; vault from our AWS Glacier account&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1. Yes, delete the vault.&lt;br /&gt;
&lt;br /&gt;
2. Thanks, good insights - i&#039;ll look into those more closely to see what&lt;br /&gt;
would fit best.&lt;br /&gt;
&lt;br /&gt;
MJ&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ah ffs, I logged into the Amazon WUI, but when I clicked &amp;quot;delete&amp;quot; on the vault, it gave me an error saying I have to delete all the archives in the vault first&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This vault is not empty&lt;br /&gt;
Vaults can be deleted only if there are no archives in the vault as of the last inventory it computed and there have been no writes to the vault since the last inventory. To delete all archives in the vault, use the REST API, the AWS SDK for Java, the AWS SDK for .NET or the AWS CLI. &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently this can only be done via the API (eg with the AWS CLI or an SDK)!?! The error links to https://docs.aws.amazon.com/console/glacier/using-aws-sdk&lt;br /&gt;
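# for reference, emptying and then deleting a vault from the AWS CLI should look roughly like the following. This is an untested sketch. It assumes awscli v2 and jq are installed and that credentials were already set with &amp;quot;aws configure&amp;quot;; &amp;quot;--account-id -&amp;quot; means &amp;quot;the account that owns these credentials&amp;quot;. Run the steps one at a time rather than as a single script, since the inventory job can take hours&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
vault=&amp;quot;deleteMeIn2020&amp;quot;&lt;br /&gt;
region=&amp;quot;us-west-2&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# 1. ask Glacier for an inventory of the vault; this prints the new jobId&lt;br /&gt;
aws glacier initiate-job --account-id - --region &amp;quot;${region}&amp;quot; --vault-name &amp;quot;${vault}&amp;quot; --job-parameters &#039;{&amp;quot;Type&amp;quot;: &amp;quot;inventory-retrieval&amp;quot;}&#039; --query jobId --output text&lt;br /&gt;
&lt;br /&gt;
# 2. paste the jobId from step 1 below (INVENTORY_JOB_ID is a hypothetical placeholder),&lt;br /&gt;
#    poll until StatusCode is &amp;quot;Succeeded&amp;quot;, then download the inventory&lt;br /&gt;
job_id=&amp;quot;INVENTORY_JOB_ID&amp;quot;&lt;br /&gt;
aws glacier describe-job --account-id - --region &amp;quot;${region}&amp;quot; --vault-name &amp;quot;${vault}&amp;quot; --job-id &amp;quot;${job_id}&amp;quot; --query StatusCode --output text&lt;br /&gt;
aws glacier get-job-output --account-id - --region &amp;quot;${region}&amp;quot; --vault-name &amp;quot;${vault}&amp;quot; --job-id &amp;quot;${job_id}&amp;quot; inventory.json&lt;br /&gt;
&lt;br /&gt;
# 3. delete every archive listed in the inventory; archive IDs may start with a dash,&lt;br /&gt;
#    so they&#039;re passed with &amp;quot;=&amp;quot; to keep the CLI from parsing them as options&lt;br /&gt;
jq -r &#039;.ArchiveList[].ArchiveId&#039; inventory.json | while read -r archive_id; do&lt;br /&gt;
	aws glacier delete-archive --account-id - --region &amp;quot;${region}&amp;quot; --vault-name &amp;quot;${vault}&amp;quot; --archive-id=&amp;quot;${archive_id}&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# 4. delete the vault; per the error above, this only succeeds once Glacier has&lt;br /&gt;
#    computed a new inventory (roughly once a day) that shows no archives&lt;br /&gt;
aws glacier delete-vault --account-id - --region &amp;quot;${region}&amp;quot; --vault-name &amp;quot;${vault}&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# note: as far as I know, Glacier only promises to keep a job ID for at least 24 hours after the job completes, so the inventory should be downloaded promptly. That may be why the get-job-output call in the Oct 06 entry above returned &amp;quot;The job ID was not found&amp;quot;&lt;br /&gt;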
# we do have some &#039;glacier.py&#039; script on our old server, but it complains about missing module(s)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology backups]# ls&lt;br /&gt;
backup.old.20180115.sh    backup.settings.20221028  glacierRestore        sync&lt;br /&gt;
backupReport.sh           backup.sh                 glacierTest.py        sync.old&lt;br /&gt;
backupReport.sh.20221028  backup.sh.20221028        ose-backups-cron.key&lt;br /&gt;
backup.settings           cleanLocal.pl             README.txt&lt;br /&gt;
[root@opensourceecology backups]#&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology backups]# glacier.py &lt;br /&gt;
Traceback (most recent call last):&lt;br /&gt;
  File &amp;quot;/root/bin/glacier.py&amp;quot;, line 36, in &amp;lt;module&amp;gt;&lt;br /&gt;
    import boto.glacier&lt;br /&gt;
ImportError: No module named boto.glacier&lt;br /&gt;
[root@opensourceecology backups]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, at least Debian has the CLI in its repos&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~/Downloads$ apt-cache search awscli&lt;br /&gt;
awscli - Unified command line interface to Amazon Web Services&lt;br /&gt;
user@disp3919:~/Downloads$&lt;br /&gt;
&lt;br /&gt;
user@disp3919:~/Downloads$ sudo apt-get install awscli&lt;br /&gt;
Reading package lists... Done&lt;br /&gt;
Building dependency tree... Done&lt;br /&gt;
Reading state information... Done&lt;br /&gt;
awscli is already the newest version (2.9.19-1).&lt;br /&gt;
The following packages were automatically installed and are no longer required:&lt;br /&gt;
  librnp0 libwpe-1.0-1 libwpebackend-fdo-1.0-1 linux-image-6.1.0-10-amd64&lt;br /&gt;
  linux-image-6.1.0-11-amd64 linux-image-6.1.0-13-amd64&lt;br /&gt;
  linux-image-6.1.0-17-amd64 linux-image-6.1.0-18-amd64&lt;br /&gt;
  linux-image-6.1.0-20-amd64 linux-image-6.1.0-21-amd64&lt;br /&gt;
  linux-image-6.1.0-22-amd64&lt;br /&gt;
Use &#039;sudo apt autoremove&#039; to remove them.&lt;br /&gt;
0 upgraded, 0 newly installed, 0 to remove and 17 not upgraded.&lt;br /&gt;
user@disp3919:~/Downloads$ aws &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I was able to auth with some creds I found on hetzner2:/root/backups/glacierTest.py&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~/Downloads$  aws configure set aws_access_key_id &#039;REDACTED&#039;&lt;br /&gt;
user@disp3919:~/Downloads$  aws configure set aws_secret_access_key &#039;REDACTED&#039;&lt;br /&gt;
user@disp3919:~/Downloads$ aws sts get-caller-identity&lt;br /&gt;
{&lt;br /&gt;
    &amp;quot;UserId&amp;quot;: &amp;quot;REDACTED&amp;quot;,&lt;br /&gt;
    &amp;quot;Account&amp;quot;: &amp;quot;REDACTED&amp;quot;,&lt;br /&gt;
    &amp;quot;Arn&amp;quot;: &amp;quot;arn:aws:iam::REDACTED:user/backup-cron&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@disp3919:~/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# apparently we now have to request an inventory of the vault, then iterate through that inventory and delete every archive it lists https://gist.github.com/veuncent/ac21ae8131f24d3971a621fac0d95be5&lt;br /&gt;
# creating an inventory can take hours or days; let&#039;s initiate it now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~/Downloads$ aws glacier initiate-job --job-parameters &#039;{&amp;quot;Type&amp;quot;: &amp;quot;inventory-retrieval&amp;quot;}&#039; --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020&lt;br /&gt;
{&lt;br /&gt;
    &amp;quot;location&amp;quot;: &amp;quot;/099400651767/vaults/deleteMeIn2020/jobs/ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot;,&lt;br /&gt;
    &amp;quot;jobId&amp;quot;: &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
user@disp3919:~/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
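# a minimal sketch, not what I actually ran: instead of retrying get-job-output by hand, the job can be polled with &#039;aws glacier describe-job&#039;, whose StatusCode is InProgress, Succeeded, or Failed. The function name and the 15-minute default interval are hypothetical. The region is assumed to be us-west-2 as above, and &#039;--account-id -&#039; tells Glacier to use the account that owns the credentials&lt;br /&gt;

```shell
# Hypothetical helper (not from the original log): poll a Glacier job until it finishes.
# Usage: wait_for_job VAULT_NAME JOB_ID [SECONDS_BETWEEN_POLLS]
# Returns 0 if the job succeeded, 1 if it failed, 2 if the aws call itself failed.
wait_for_job() {
    vault="$1"
    job_id="$2"
    interval="${3:-900}"   # assumed default: poll every 15 minutes
    while true; do
        # --query/--output reduce describe-job's JSON to just the StatusCode string
        status=$(aws glacier describe-job --account-id - --region us-west-2 \
            --vault-name "$vault" --job-id "$job_id" \
            --query StatusCode --output text) || return 2
        case "$status" in
            Succeeded) return 0 ;;
            Failed) return 1 ;;
        esac
        echo "job is $status; checking again in ${interval}s"
        sleep "$interval"
    done
}
```

# usage would be something like: wait_for_job deleteMeIn2020 JOB_ID, then run the get-job-output command once it returns 0&lt;br /&gt;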
# I guess now we wait a few days for the job to complete before we can download it, parse it, and then delete all of the objects it identifies per https://gist.github.com/veuncent/ac21ae8131f24d3971a621fac0d95be5&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~/Downloads$ aws glacier get-job-output --account-id REDACTED --region us-west-2 --vault-name deleteMeIn2020 --job-id &amp;quot;ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&amp;quot; ./output.json&lt;br /&gt;
&lt;br /&gt;
An error occurred (InvalidParameterValueException) when calling the GetJobOutput operation: The job is not currently available for download: ucc6VDVVygGXS3EnMRVtzyqDpunVE81S91S_mUHuFL7-bfeMgVr6SxsVB3-_8g1Fs_NMdr_kV0rFCd_JFZU17EbUYXoS&lt;br /&gt;
user@disp3919:~/Downloads$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
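# once get-job-output succeeds, ./output.json should hold the inventory, and the remaining step in the gist is to delete each archive it lists. A minimal sketch, under these assumptions:&lt;br /&gt;
## the inventory is Glacier&#039;s JSON format, with each archive under ArchiveList[].ArchiveId&lt;br /&gt;
## python3 is available to parse it (Debian&#039;s awscli package depends on python3 anyway)&lt;br /&gt;
## the vault is in us-west-2&lt;br /&gt;
# the function names are hypothetical. Archive IDs can begin with a dash, so the ID is passed as --archive-id=VALUE; otherwise the CLI could mistake it for an option&lt;br /&gt;

```shell
# Hypothetical helpers (not from the original log).

# list_archive_ids: read a Glacier inventory (JSON) on stdin and print one
# ArchiveId per line.
list_archive_ids() {
    python3 -c '
import json, sys
for archive in json.load(sys.stdin)["ArchiveList"]:
    print(archive["ArchiveId"])
'
}

# delete_all_archives VAULT_NAME: delete every archive listed in the inventory
# on stdin. Assumes the vault is in us-west-2, like the commands above.
delete_all_archives() {
    vault="$1"
    ids=$(list_archive_ids) || return 1
    # Assumption: archive IDs never contain whitespace or glob characters,
    # so unquoted word splitting over $ids yields one ID per iteration.
    for archive_id in $ids; do
        # --archive-id=VALUE keeps an ID that starts with '-' from being parsed as an option
        aws glacier delete-archive --account-id - --region us-west-2 \
            --vault-name "$vault" --archive-id="$archive_id" || return 1
    done
}
```

# usage would be something like: cat output.json | delete_all_archives deleteMeIn2020. Glacier only updates a vault&#039;s inventory about once a day, and the error above says deletion is checked against the last inventory. So &#039;aws glacier delete-vault --account-id - --region us-west-2 --vault-name deleteMeIn2020&#039; will probably keep failing until the next inventory has been computed&lt;br /&gt;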
...&lt;br /&gt;
# after much debugging, I figured out why store.opensourceecology.org gives different results for a `curl` coming from my laptop vs the server&lt;br /&gt;
# I found that the `curl` from my laptop was making it to nginx -&amp;gt; varnish -&amp;gt; apache&lt;br /&gt;
# but the logs were mysteriously absent for varnish &amp;amp; apache when I did the curl from the machine itself&lt;br /&gt;
# I even did a tcpdump, but I only saw a tiny blip of traffic when doing the command locally&lt;br /&gt;
# here&#039;s why: the server returns an http -&amp;gt; https redirect to store.opensourceecology.org. When the *server*&#039;s curl follows that redirect, it does a public DNS lookup, which still points at hetzner2, so the follow-up request goes to hetzner2!&lt;br /&gt;
# I updated the /etc/hosts file to prevent this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cd /etc&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # vim hosts&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc # diff hosts.20241004 hosts&lt;br /&gt;
2a3,13&lt;br /&gt;
&amp;gt; 127.0.0.1 forum.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 store.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 microfactory.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 fef.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 oswh.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 seedhome.openbuildinginstitute.org&lt;br /&gt;
&amp;gt; 127.0.0.1 www.openbuildinginstitute.org&lt;br /&gt;
&amp;gt; 127.0.0.1 www.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 phplist.opensourceecology.org&lt;br /&gt;
&amp;gt; 127.0.0.1 wiki.opensourceecology.org&lt;br /&gt;
&amp;gt;&lt;br /&gt;
3a15&lt;br /&gt;
&amp;gt;&lt;br /&gt;
root@hetzner3 /etc # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
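# an alternative that avoids editing /etc/hosts is curl&#039;s &#039;--resolve HOST:PORT:ADDRESS&#039; option. It pins a name to an address for that one curl invocation, and it should also cover redirects that -L follows back to the same host and port. A redirect to a different hostname still goes to public DNS. Unlike &#039;-H Host:&#039; against https://localhost, it also sends the real hostname in TLS SNI. A minimal sketch; the function name is hypothetical&lt;br /&gt;

```shell
# Hypothetical helper: fetch https://HOST/PATH from this machine's own nginx
# without touching /etc/hosts.
# --resolve pins HOST:443 to 127.0.0.1 for this curl run, so redirects that -L
# follows back to the same host and port also stay local.
curl_local() {
    host="$1"
    path="${2:-/}"
    curl -iLk --resolve "$host:443:127.0.0.1" "https://$host$path"
}
```

# e.g. curl_local store.opensourceecology.org /index.php&lt;br /&gt;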
# ok, now it&#039;s stuck in an infinite redirect loop. It keeps bouncing between https://opensourceecology.org and https://www.opensourceecology.org/, adding and removing the trailing slash each time&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ curl -iLkH &#039;Host: store.opensourceecology.org&#039; https://localhost/index.php?nocache=local5&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:23:38 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://opensourceecology.org&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:22:29 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://www.opensourceecology.org/&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:23:39 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://opensourceecology.org&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
...&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:22:29 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://www.opensourceecology.org/&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:23:39 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 162&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Location: https://opensourceecology.org&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
curl: (47) Maximum (50) redirects followed&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
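# to see a loop like this as one hop per line instead of 50 full header dumps, here is a minimal sketch (hypothetical function name). It asks curl for only the status code and the next Location, via its %{http_code} and %{redirect_url} --write-out variables, and follows at most N hops by hand. Each hop uses the system resolver, so /etc/hosts applies just as it does for the curl above&lt;br /&gt;

```shell
# Hypothetical helper: print "STATUS NEXT_URL" for each hop of a redirect chain,
# following at most MAX_HOPS redirects (default 10).
# Usage: trace_redirects URL [MAX_HOPS]
trace_redirects() {
    url="$1"
    max_hops="${2:-10}"
    hop=0
    while [ "$hop" -lt "$max_hops" ]; do
        [ -n "$url" ] || break
        # -o /dev/null discards the body; -w prints only the status and the
        # Location curl would follow (empty when the response is not a redirect)
        line=$(curl -sk -o /dev/null -w '%{http_code} %{redirect_url}' "$url")
        echo "$line"
        url=${line#* }   # everything after the first space
        hop=$((hop + 1))
    done
}
```

# e.g. trace_redirects https://opensourceecology.org 6&lt;br /&gt;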
# ok, so it looks like it&#039;s getting picked-up by the default site&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/nginx # grep -ir 301 *&lt;br /&gt;
nginx.conf:                     return 301 https://$host$request_uri;&lt;br /&gt;
nginx.conf.1282157.2024-09-28@23:10:52~:                        return 301 https://$host$request_uri;&lt;br /&gt;
sites-enabled/00-default.conf:  return 301 https://opensourceecology.org;&lt;br /&gt;
root@hetzner3 /etc/nginx # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ah shit, yeah, nginx isn&#039;t even listening on 127.0.0.1 lol&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/nginx # less sites-enabled/store.opensourceecology.org.conf &lt;br /&gt;
# Ansible managed&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    store.opensourceecology.org.conf&lt;br /&gt;
# Version: 0.2&lt;br /&gt;
# Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
#          protection, and passing to varnish cache (varnish then passes to&lt;br /&gt;
#          apache)&lt;br /&gt;
# Author:  Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2019-04-09&lt;br /&gt;
# Updated: 2024-09-14&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
server {&lt;br /&gt;
&lt;br /&gt;
        access_log /var/log/nginx/store.opensourceecology.org/access.log main;&lt;br /&gt;
        error_log /var/log/nginx/store.opensourceecology.org/error.log;&lt;br /&gt;
&lt;br /&gt;
   include conf.d/secure.include;&lt;br /&gt;
   include conf.d/https.opensourceecology.org.include;&lt;br /&gt;
&lt;br /&gt;
   listen 144.76.164.201:443;&lt;br /&gt;
   listen [2a01:4f8:200:40d7::2]:443;&lt;br /&gt;
&lt;br /&gt;
   server_name store.opensourceecology.org;&lt;br /&gt;
&lt;br /&gt;
        #############&lt;br /&gt;
        # SITE_DOWN #&lt;br /&gt;
        #############&lt;br /&gt;
        # uncomment this block &amp;amp;&amp;amp; restart nginx prior to apache work to display the&lt;br /&gt;
        # &amp;quot;SITE DOWN&amp;quot; webpage for our clients&lt;br /&gt;
&lt;br /&gt;
#       root /var/www/html/SITE_DOWN/htdocs/;&lt;br /&gt;
#   index index.html index.htm;&lt;br /&gt;
#&lt;br /&gt;
#       # force all requests to load exactly this page&lt;br /&gt;
#       location / {&lt;br /&gt;
#               try_files $uri /index.html;&lt;br /&gt;
#       }&lt;br /&gt;
&lt;br /&gt;
        ###################&lt;br /&gt;
        # SEND TO VARNISH #&lt;br /&gt;
        ###################&lt;br /&gt;
&lt;br /&gt;
   location / {&lt;br /&gt;
      proxy_pass http://127.0.0.1:6081;&lt;br /&gt;
      proxy_set_header X-Real-IP $remote_addr;&lt;br /&gt;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;&lt;br /&gt;
      proxy_set_header X-Forwarded-Proto https;&lt;br /&gt;
      proxy_set_header X-Forwarded-Port 443;&lt;br /&gt;
      proxy_set_header Host $host;&lt;br /&gt;
   }&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, nginx itself is listening on 127.0.0.1 (via the 0.0.0.0:443 wildcard socket), but this server block is not&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/nginx # netstat -plan | grep -i 443&lt;br /&gt;
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      3914728/nginx: mast &lt;br /&gt;
tcp        0      0 144.76.164.201:4443     0.0.0.0:*               LISTEN      3914728/nginx: mast &lt;br /&gt;
tcp       25      0 144.76.164.201:51710    104.21.40.220:443       CLOSE_WAIT  15751/wazuh-modules &lt;br /&gt;
tcp        0      0 127.0.0.1:80            127.0.0.1:54436         TIME_WAIT   -                   &lt;br /&gt;
tcp        0      0 127.0.0.1:54432         127.0.0.1:80            TIME_WAIT   -                   &lt;br /&gt;
tcp6       0      0 :::443                  :::*                    LISTEN      3914728/nginx: mast &lt;br /&gt;
tcp6       0      0 2a01:4f8:200:40d7::4443 :::*                    LISTEN      3914728/nginx: mast &lt;br /&gt;
tcp6      25      0 2a01:4f8:200:40d7:49016 2606:4700:3033::ac4:443 CLOSE_WAIT  15751/wazuh-modules &lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
root@hetzner3 /etc/nginx # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, 00-default.conf is: it listens on every address (a plain &#039;listen 443&#039;), which is why it picks up these requests instead&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/nginx # grep -ir listen&lt;br /&gt;
sites-available/forum.opensourceecology.org.conf:# Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
sites-available/forum.opensourceecology.org.conf:   listen 144.76.164.201:443;&lt;br /&gt;
sites-available/forum.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:443;&lt;br /&gt;
sites-available/default:        listen 80 default_server;&lt;br /&gt;
sites-available/default:        listen [::]:80 default_server;&lt;br /&gt;
sites-available/default:        # listen 443 ssl default_server;&lt;br /&gt;
sites-available/default:        # listen [::]:443 ssl default_server;&lt;br /&gt;
sites-available/default:#       listen 80;&lt;br /&gt;
sites-available/default:#       listen [::]:80;&lt;br /&gt;
sites-available/store.opensourceecology.org.conf:# Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
sites-available/store.opensourceecology.org.conf:   listen 144.76.164.201:443;&lt;br /&gt;
sites-available/store.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:443;&lt;br /&gt;
nginx.conf.1282157.2024-09-28@23:10:52~:                listen 80;&lt;br /&gt;
nginx.conf.1282157.2024-09-28@23:10:52~:                listen [::]:80;&lt;br /&gt;
nginx.conf:             listen 80;&lt;br /&gt;
nginx.conf:             listen [::]:80;&lt;br /&gt;
nginx.conf.85740.2024-09-24@04:17:16~:#         listen     localhost:110;&lt;br /&gt;
nginx.conf.85740.2024-09-24@04:17:16~:#         listen     localhost:143;&lt;br /&gt;
sites-enabled/00-default.conf:  listen 443;&lt;br /&gt;
sites-enabled/00-default.conf:  listen [::]:443;&lt;br /&gt;
sites-enabled/awstats.opensourceecology.org.conf:# Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
sites-enabled/awstats.opensourceecology.org.conf:   listen 144.76.164.201:443;&lt;br /&gt;
sites-enabled/awstats.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:443;&lt;br /&gt;
sites-enabled/awstats.opensourceecology.org.conf:   listen 144.76.164.201:4443;&lt;br /&gt;
sites-enabled/awstats.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:4443;&lt;br /&gt;
sites-enabled/munin.opensourceecology.org.conf:# Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
sites-enabled/munin.opensourceecology.org.conf:   listen 144.76.164.201:443;&lt;br /&gt;
sites-enabled/munin.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:443;&lt;br /&gt;
sites-enabled/munin.opensourceecology.org.conf:   listen 144.76.164.201:4443;&lt;br /&gt;
sites-enabled/munin.opensourceecology.org.conf:   listen [2a01:4f8:200:40d7::2]:4443;&lt;br /&gt;
root@hetzner3 /etc/nginx # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
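# this also explains the netstat output above. Per the nginx docs for &#039;listen&#039;, if any server block listens on the wildcard address for a port (here 00-default.conf&#039;s plain &#039;listen 443&#039;), nginx bind()s only *:443. It then picks a server block by the address that accepted the connection. The store block only names 144.76.164.201 and 2a01:4f8:200:40d7::2, so connections to 127.0.0.1:443 can only match 00-default.conf&lt;br /&gt;
# also, sites-enabled/store.opensourceecology.org.conf is missing from the grep output entirely. That is presumably because it is a symlink, and grep -r does not follow symlinks inside directories&lt;br /&gt;
# a minimal sketch (hypothetical function name) that lists only the active listen directives of enabled sites, following symlinks&lt;br /&gt;

```shell
# Hypothetical helper: print the active (uncommented) listen directives of every
# enabled site, prefixed with the file they come from.
# Usage: active_listens [DIR]   (defaults to /etc/nginx/sites-enabled)
active_listens() {
    dir="${1:-/etc/nginx/sites-enabled}"
    # -R recurses AND follows symlinks (plain -r skips symlinked sites);
    # -H prefixes each match with its file name.
    # Lines whose first non-blank character is '#' cannot match, so
    # commented-out listen lines are skipped.
    grep -RH '^[[:space:]]*listen[[:space:]]' "$dir"
}
```
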
# this is actually the same as our hetzner2 config&lt;br /&gt;
# I updated the nginx config in ansible and pushed it out again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
diff --git a/hetzner3/roles/maltfield.nginx/templates/store.opensourceecology.org.conf.j2 b/hetzner3/roles/maltfield.nginx/templates/store.opensourceecology.org.conf.j2&lt;br /&gt;
index f4b62cd..f750651 100644&lt;br /&gt;
--- a/hetzner3/roles/maltfield.nginx/templates/store.opensourceecology.org.conf.j2&lt;br /&gt;
+++ b/hetzner3/roles/maltfield.nginx/templates/store.opensourceecology.org.conf.j2&lt;br /&gt;
@@ -2,13 +2,13 @@&lt;br /&gt;
 &lt;br /&gt;
 ################################################################################&lt;br /&gt;
 # File:    store.opensourceecology.org.conf&lt;br /&gt;
-# Version: 0.2&lt;br /&gt;
+# Version: 0.3&lt;br /&gt;
 # Purpose: Internet-listening web server for truncating https, basic DOS&lt;br /&gt;
 #          protection, and passing to varnish cache (varnish then passes to&lt;br /&gt;
 #          apache)&lt;br /&gt;
 # Author:  Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
 # Created: 2019-04-09&lt;br /&gt;
-# Updated: 2024-09-14&lt;br /&gt;
+# Updated: 2024-10-04&lt;br /&gt;
 ################################################################################&lt;br /&gt;
 &lt;br /&gt;
 server {&lt;br /&gt;
@@ -19,6 +19,8 @@ server {&lt;br /&gt;
    include conf.d/secure.include;&lt;br /&gt;
    include conf.d/https.opensourceecology.org.include;&lt;br /&gt;
 &lt;br /&gt;
+   listen 127.0.0.1:443;&lt;br /&gt;
+   listen [::1]:443;&lt;br /&gt;
    listen {{ ansible_default_ipv4.address }}:443;&lt;br /&gt;
    listen [{{ ansible_default_ipv6.address }}]:443;&lt;br /&gt;
 &lt;br /&gt;
user@personal:~/sandbox_local/ansible/hetzner3$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# and, well, the good/bad news is that now the curl from the local machine is just as broken as the curl from my laptop&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ curl -iLkH &#039;Host: store.opensourceecology.org&#039; https://localhost/index.php?nocache=local6&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:46:30 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Redirect-By: WordPress&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Location: https://store.opensourceecology.org/?nocache=local6&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 89&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:46:30 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 5&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Last-Modified: Fri, 04 Oct 2024 04:49:23 GMT&lt;br /&gt;
ETag: &amp;quot;5-6239f651921da&amp;quot;&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
Pragma: public&lt;br /&gt;
Cache-Control: public, max-age=300&lt;br /&gt;
X-Varnish: 98500&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
true&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the &#039;true&#039; is obviously coming from &#039;index.html&#039;, so my first thought was just to get rid of that file&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # rm index.html&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# but now we&#039;re back to the empty response body (Content-Length: 0) again&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ curl -iLkH &#039;Host: store.opensourceecology.org&#039; https://localhost/index.php?nocache=local7&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:49:48 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Redirect-By: WordPress&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Location: https://store.opensourceecology.org/?nocache=local7&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 94&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 03:49:49 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Link: &amp;lt;https://store.opensourceecology.org/wp-json/&amp;gt;; rel=&amp;quot;https://api.w.org/&amp;quot;, &amp;lt;https://store.opensourceecology.org/wp-json/wp/v2/pages/2796&amp;gt;; rel=&amp;quot;alternate&amp;quot;; title=&amp;quot;JSON&amp;quot;; type=&amp;quot;application/json&amp;quot;, &amp;lt;https://store.opensourceecology.org/&amp;gt;; rel=shortlink&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 97&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# varnish logs look fine; it basically just calls the backend&lt;br /&gt;
# apache logs indicate that it did figure out which file to serve with php&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sat Oct 05 03:56:05.631466 2024] [authz_core:debug] [pid 3909393:tid 3909439] mod_authz_core.c(733): [client 127.0.0.1:0] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
[Sat Oct 05 03:56:05.631539 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909439] mod_proxy_fcgi.c(123): [client 127.0.0.1:0] AH01060: set r-&amp;gt;filename to proxy:fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Sat Oct 05 03:56:05.631557 2024] [proxy:debug] [pid 3909393:tid 3909439] mod_proxy.c(1465): [client 127.0.0.1:0] AH01143: Running scheme fcgi handler (attempt 0)&lt;br /&gt;
[Sat Oct 05 03:56:05.631571 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909439] mod_proxy_fcgi.c(1078): [client 127.0.0.1:0] AH01076: url: fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php proxyname: (null) proxyport: 0&lt;br /&gt;
[Sat Oct 05 03:56:05.631584 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909439] mod_proxy_fcgi.c(1087): [client 127.0.0.1:0] AH01078: serving URL fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Sat Oct 05 03:56:05.631597 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(2797): AH00942: FCGI: has acquired connection for (*:80)&lt;br /&gt;
[Sat Oct 05 03:56:05.631610 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(3242): [client 127.0.0.1:0] AH00944: connecting fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php to localhost:8000&lt;br /&gt;
[Sat Oct 05 03:56:05.631624 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(3309): [client 127.0.0.1:0] AH02545: fcgi: has determined UDS as /run/php/php8.2-fpm.sock (for localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:05.631638 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(3450): [client 127.0.0.1:0] AH00947: connecting /var/www/html/store.opensourceecology.org/htdocs/index.php to /run/php/php8.2-fpm.sock:0 (localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:05.631673 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(3832): AH02823: FCGI: connection established with Unix domain socket /run/php/php8.2-fpm.sock (localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:06.720816 2024] [proxy:debug] [pid 3909393:tid 3909439] proxy_util.c(2813): AH00943: FCGI: has released connection for (*:80)&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
127.0.0.1 - - [05/Oct/2024:03:56:05 +0000] &amp;quot;GET /index.php?nocache=local10 HTTP/1.1&amp;quot; 301 436 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Sat Oct 05 03:56:06.725670 2024] [authz_core:debug] [pid 3909393:tid 3909441] mod_authz_core.c(733): [client 127.0.0.1:0] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
[Sat Oct 05 03:56:06.725738 2024] [authz_core:debug] [pid 3909393:tid 3909441] mod_authz_core.c(733): [client 127.0.0.1:0] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
[Sat Oct 05 03:56:06.725854 2024] [authz_core:debug] [pid 3909393:tid 3909441] mod_authz_core.c(733): [client 127.0.0.1:0] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
[Sat Oct 05 03:56:06.725886 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909441] mod_proxy_fcgi.c(123): [client 127.0.0.1:0] AH01060: set r-&amp;gt;filename to proxy:fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Sat Oct 05 03:56:06.725895 2024] [proxy:debug] [pid 3909393:tid 3909441] mod_proxy.c(1465): [client 127.0.0.1:0] AH01143: Running scheme fcgi handler (attempt 0)&lt;br /&gt;
[Sat Oct 05 03:56:06.725901 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909441] mod_proxy_fcgi.c(1078): [client 127.0.0.1:0] AH01076: url: fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php proxyname: (null) proxyport: 0&lt;br /&gt;
[Sat Oct 05 03:56:06.725928 2024] [proxy_fcgi:debug] [pid 3909393:tid 3909441] mod_proxy_fcgi.c(1087): [client 127.0.0.1:0] AH01078: serving URL fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Sat Oct 05 03:56:06.725935 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(2797): AH00942: FCGI: has acquired connection for (*:80)&lt;br /&gt;
[Sat Oct 05 03:56:06.725941 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(3242): [client 127.0.0.1:0] AH00944: connecting fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php to localhost:8000&lt;br /&gt;
[Sat Oct 05 03:56:06.725950 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(3309): [client 127.0.0.1:0] AH02545: fcgi: has determined UDS as /run/php/php8.2-fpm.sock (for localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:06.725959 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(3450): [client 127.0.0.1:0] AH00947: connecting /var/www/html/store.opensourceecology.org/htdocs/index.php to /run/php/php8.2-fpm.sock:0 (localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:06.726002 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(3832): AH02823: FCGI: connection established with Unix domain socket /run/php/php8.2-fpm.sock (localhost:8000)&lt;br /&gt;
[Sat Oct 05 03:56:07.778759 2024] [proxy:debug] [pid 3909393:tid 3909441] proxy_util.c(2813): AH00943: FCGI: has released connection for (*:80)&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
127.0.0.1 - - [05/Oct/2024:03:56:06 +0000] &amp;quot;GET /?nocache=local10 HTTP/1.1&amp;quot; 200 586 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# this suggests that it might do this if the theme dir is empty? that would likely apply in our case https://serverfault.com/a/766146&lt;br /&gt;
# oh, it *does* load if I try &#039;/wp-admin/&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ curl -iLkH &#039;Host: store.opensourceecology.org&#039; https://localhost/wp-admin/&lt;br /&gt;
...&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Sat, 05 Oct 2024 04:24:26 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 1516&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Expires: Wed, 11 Jan 1984 05:00:00 GMT&lt;br /&gt;
Cache-Control: no-cache, must-revalidate, max-age=0&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Vary: Accept-Encoding&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 98551&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!DOCTYPE html&amp;gt;&lt;br /&gt;
&amp;lt;html lang=&amp;quot;en-US&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;head&amp;gt;&lt;br /&gt;
        &amp;lt;meta name=&amp;quot;viewport&amp;quot; content=&amp;quot;width=device-width&amp;quot; /&amp;gt;&lt;br /&gt;
        &amp;lt;meta http-equiv=&amp;quot;Content-Type&amp;quot; content=&amp;quot;text/html; charset=UTF-8&amp;quot; /&amp;gt;&lt;br /&gt;
        &amp;lt;meta name=&amp;quot;robots&amp;quot; content=&amp;quot;noindex,nofollow&amp;quot; /&amp;gt;&lt;br /&gt;
        &amp;lt;title&amp;gt;WordPress &amp;amp;rsaquo; Update&amp;lt;/title&amp;gt;&lt;br /&gt;
        &amp;lt;link rel=&#039;stylesheet&#039; id=&#039;dashicons-css&#039; href=&#039;https://store.opensourceecology.org/wp-includes/css/dashicons.min.css?ver=6.6.1&#039; type=&#039;text/css&#039; media=&#039;all&#039; /&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&#039;stylesheet&#039; id=&#039;buttons-css&#039; href=&#039;https://store.opensourceecology.org/wp-includes/css/buttons.min.css?ver=6.6.1&#039; type=&#039;text/css&#039; media=&#039;all&#039; /&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&#039;stylesheet&#039; id=&#039;forms-css&#039; href=&#039;https://store.opensourceecology.org/wp-admin/css/forms.min.css?ver=6.6.1&#039; type=&#039;text/css&#039; media=&#039;all&#039; /&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&#039;stylesheet&#039; id=&#039;l10n-css&#039; href=&#039;https://store.opensourceecology.org/wp-admin/css/l10n.min.css?ver=6.6.1&#039; type=&#039;text/css&#039; media=&#039;all&#039; /&amp;gt;&lt;br /&gt;
&amp;lt;link rel=&#039;stylesheet&#039; id=&#039;install-css&#039; href=&#039;https://store.opensourceecology.org/wp-admin/css/install.min.css?ver=6.6.1&#039; type=&#039;text/css&#039; media=&#039;all&#039; /&amp;gt;&lt;br /&gt;
&amp;lt;/head&amp;gt;&lt;br /&gt;
&amp;lt;body class=&amp;quot;wp-core-ui&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;p id=&amp;quot;logo&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;https://wordpress.org/&amp;quot;&amp;gt;WordPress&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
        &amp;lt;h1&amp;gt;Database Update Required&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;WordPress has been updated! Next and final step is to update your database to the newest version.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The database update process may take a little while, so please be patient.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p class=&amp;quot;step&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;button button-large button-primary&amp;quot; href=&amp;quot;upgrade.php?step=1&amp;amp;amp;backto=%2Fwp-admin%2F&amp;quot;&amp;gt;Update WordPress Database&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
                        &amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;&lt;br /&gt;
maltfield@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I loaded that in the web browser, and it told me a wordpress database update was needed. I just pressed the button -- it didn&#039;t even prompt me to auth&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Your WordPress database has been successfully updated!&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I clicked &amp;quot;Continue&amp;quot;&lt;br /&gt;
# that redirected me here, and I immediately got &#039;403 forbidden&#039; https://store.opensourceecology.org/wp-login.php?redirect_to=https%3A%2F%2Fstore.opensourceecology.org%2Fwp-admin%2F&amp;amp;reauth=1&lt;br /&gt;
# that would be because we block access to &#039;wp-login.php&#039;, since we were using a plugin to rename it; we&#039;ll have to temporarily disable that block until we replace that (now deprecated) plugin&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Thr Oct 03, 2024=&lt;br /&gt;
# I sent an invoice (AS-0106) to OSE for 67 hours in Sep 2024&lt;br /&gt;
...&lt;br /&gt;
# continuing to debug store.opensourceecology.org, I see that it&#039;s redirecting from &#039;/index.php&#039; to &#039;/&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~$ curl -i https://store.opensourceecology.org/index.php&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 04:47:39 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Redirect-By: WordPress&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Location: https://store.opensourceecology.org/&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 131132 98385&lt;br /&gt;
Age: 88&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
user@disp3919:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
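# the &#039;X-Redirect-By: WordPress&#039; header above shows that this 301 is issued by WordPress itself (wp_redirect() adds that header), not by apache or varnish. When comparing several of these responses, it can help to boil saved &#039;curl -i&#039; or &#039;curl -iL&#039; output down to just the status lines and redirect headers. Below is a minimal sketch of that; the helper name &#039;redirect_summary&#039; is hypothetical and awk is assumed to be installed&lt;br /&gt;

```shell
# Hypothetical helper: summarize a redirect chain from saved 'curl -i' / 'curl -iL'
# output read on stdin. Prints one line per HTTP status line, plus any Location
# and X-Redirect-By headers. Header names are matched case-insensitively, and the
# trailing CR of HTTP line endings is stripped before fields are split.
redirect_summary() {
	awk '
	{ sub(/\r$/, "") }
	toupper($1) ~ /^HTTP\// { print "status", $2 }
	tolower($1) == "location:" { print "location", $2 }
	tolower($1) == "x-redirect-by:" { print "redirect-by", $2 }
	'
}
```

# usage, e.g.: curl -siLk -H &#039;Host: store.opensourceecology.org&#039; https://localhost/index.php | redirect_summary&lt;br /&gt;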
# if I try to hit &#039;index.html&#039;, I don&#039;t get a redirect -- I just get a 404&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~$ curl -i https://store.opensourceecology.org/index.html&lt;br /&gt;
HTTP/1.1 404 Not Found&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 04:48:25 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
Expires: Wed, 11 Jan 1984 05:00:00 GMT&lt;br /&gt;
Cache-Control: no-cache, must-revalidate, max-age=0&lt;br /&gt;
Link: &amp;lt;https://store.opensourceecology.org/wp-json/&amp;gt;; rel=&amp;quot;https://api.w.org/&amp;quot;&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 131134&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
&lt;br /&gt;
user@disp3919:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# if I create it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # cp is_hetzner3 index.html&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # ls -lah is_hetzner3 index.html &lt;br /&gt;
----r----- 1 root       root     5 Oct  4 04:49 index.html&lt;br /&gt;
----r----- 1 not-apache www-data 5 Sep 27 04:44 is_hetzner3&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # chown not-apache:www-data index.html &lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# then it works&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~$ curl -i https://store.opensourceecology.org/index.html&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 04:49:51 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 5&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Last-Modified: Fri, 04 Oct 2024 04:49:23 GMT&lt;br /&gt;
ETag: &amp;quot;5-6239f651921da&amp;quot;&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
Pragma: public&lt;br /&gt;
Cache-Control: public, max-age=300&lt;br /&gt;
X-Varnish: 98387&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
true&lt;br /&gt;
user@disp3919:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# hmm, so apache is just refusing to serve &#039;index.php&#039; files? What about &#039;something.php&#039;?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # echo &amp;quot;&amp;lt;?php echo &#039;it works&#039;; ?&amp;gt;&amp;quot; &amp;gt; something.php&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # chown root:www-data something.php &lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # chmod 0040 something.php &lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# it works, so it&#039;s something specific to &#039;index.php&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~$ curl -i https://store.opensourceecology.org/something.php&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 04:52:35 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 8&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 98390&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
it worksuser@disp3919:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
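# for reference, the probe files above are readable by their group only. Mode 0040 is what &#039;ls -l&#039; shows as &#039;----r-----&#039;, so a non-root owner such as &#039;not-apache&#039; gets no access, while members of the &#039;www-data&#039; group can read but not write. A minimal sketch of applying and checking that mode follows; the &#039;lock_group_read&#039; name is hypothetical, and &#039;stat -c&#039; assumes GNU coreutils&lt;br /&gt;

```shell
# Hypothetical helper: restrict FILE to group-read only (mode 0040), optionally
# chown it first, then print the resulting symbolic and octal mode.
# Usage: lock_group_read FILE [OWNER:GROUP]
lock_group_read() {
	if [ -n "$2" ]; then
		chown "$2" "$1" || return 1
	fi
	chmod 0040 "$1" || return 1
	# GNU coreutils stat: %A = symbolic mode (as in ls -l), %a = octal mode
	stat -c '%A %a' "$1"
}
```

# usage, e.g. as root: lock_group_read something.php root:www-data (the same ownership and mode as the chown/chmod above)&lt;br /&gt;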
# I&#039;d think it&#039;s an issue with the DirectoryIndex, but this looks good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/apache2 # grep -ir &#039;index.php&#039; *&lt;br /&gt;
conf-available/wordpress.directory.include:#            RewriteRule . /index.php [L]&lt;br /&gt;
mods-available/dir.conf:DirectoryIndex index.html index.cgi index.pl index.php index.xhtml index.htm&lt;br /&gt;
root@hetzner3 /etc/apache2 # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/apache2 # ls -lah mods-enabled/dir.conf &lt;br /&gt;
lrwxrwxrwx 1 root root 26 Sep 25 01:24 mods-enabled/dir.conf -&amp;gt; ../mods-available/dir.conf&lt;br /&gt;
root@hetzner3 /etc/apache2 # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
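# one side effect of that DirectoryIndex order is that mod_dir serves the first listed file that exists. So while the test &#039;index.html&#039; created above is in htdocs, a request for &#039;/&#039; would normally get that file instead of WordPress&#039;s &#039;index.php&#039;. Below is a rough sketch of that lookup order; the &#039;first_index&#039; helper is hypothetical and ignores the rest of mod_dir&#039;s behavior (rewrites, non-file entries, etc.)&lt;br /&gt;

```shell
# Hypothetical sketch of mod_dir's DirectoryIndex lookup order:
# print the first candidate name that exists as a regular file in DIR,
# or return non-zero if none of them exist.
# Usage: first_index DIR NAME...
first_index() {
	dir=$1
	shift
	for name in "$@"; do
		if [ -f "$dir/$name" ]; then
			printf '%s\n' "$name"
			return 0
		fi
	done
	return 1
}
```

# usage, e.g.: first_index /var/www/html/store.opensourceecology.org/htdocs index.html index.cgi index.pl index.php index.xhtml index.htm&lt;br /&gt;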
# I checked the old server, but I didn&#039;t see anything that we&#039;re missing in the new server&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;index.php&#039; conf&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;index.php&#039; conf.d&lt;br /&gt;
conf.d/php.conf:# Add index.php to the list of files that will be served as directory&lt;br /&gt;
conf.d/php.conf:DirectoryIndex index.php&lt;br /&gt;
conf.d/00-wiki.opensourceecology.org.conf:	Alias /wiki /var/www/html/wiki.opensourceecology.org/htdocs/index.php&lt;br /&gt;
conf.d/mod_evasive.conf:    #   http://security.lss.hr/index.php?page=details&amp;amp;ID=LSS-2005-01-01&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;index.php&#039; conf.modules.d/&lt;br /&gt;
[root@opensourceecology httpd]# grep -ir &#039;index.php&#039; modsecurity.d/&lt;br /&gt;
[root@opensourceecology httpd]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I changed wp-config.php to have WP_DEBUG set to &#039;true&#039;, but it didn&#039;t print anything extra. It seems like the error is occurring before wordpress even loads&lt;br /&gt;
# I set LogLevel of apache.conf to &#039;debug&#039;, and this popped-up&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; forum.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
127.0.0.1 - - [04/Oct/2024:05:10:02 +0000] &amp;quot;GET /server-status?auto HTTP/1.1&amp;quot; 200 1202 &amp;quot;-&amp;quot; &amp;quot;munin/2.0.73 (libwww-perl/6.68)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; forum.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Fri Oct 04 05:10:03.564975 2024] [authz_core:debug] [pid 3581402:tid 3581414] mod_authz_core.c(815): [client 127.0.0.1:32934] AH01626: authorization result of Require all denied: denied&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# well, that&#039;s an unrelated issue with munin, but it seems that the requests to &#039;/server-status&#039; are getting sent to the wrong vhost (forum.opensourceecology.org) and also denied access&lt;br /&gt;
# here&#039;s the actual output when I do the curl&lt;br /&gt;
## first, it outputs this immediately, then it pauses for maybe 10 seconds&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/error.log &amp;lt;==&lt;br /&gt;
[Fri Oct 04 05:11:53.426292 2024] [authz_core:debug] [pid 3581402:tid 3581422] mod_authz_core.c(733): [client 81.17.16.91:0] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
[Fri Oct 04 05:11:53.426458 2024] [proxy_fcgi:debug] [pid 3581402:tid 3581422] mod_proxy_fcgi.c(123): [client 81.17.16.91:0] AH01060: set r-&amp;gt;filename to proxy:fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Fri Oct 04 05:11:53.426496 2024] [proxy:debug] [pid 3581402:tid 3581422] mod_proxy.c(1465): [client 81.17.16.91:0] AH01143: Running scheme fcgi handler (attempt 0)&lt;br /&gt;
[Fri Oct 04 05:11:53.426517 2024] [proxy_fcgi:debug] [pid 3581402:tid 3581422] mod_proxy_fcgi.c(1078): [client 81.17.16.91:0] AH01076: url: fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php proxyname: (null) proxyport: 0&lt;br /&gt;
[Fri Oct 04 05:11:53.426535 2024] [proxy_fcgi:debug] [pid 3581402:tid 3581422] mod_proxy_fcgi.c(1087): [client 81.17.16.91:0] AH01078: serving URL fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
[Fri Oct 04 05:11:53.426586 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(2797): AH00942: FCGI: has acquired connection for (*:80)&lt;br /&gt;
[Fri Oct 04 05:11:53.426612 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(3242): [client 81.17.16.91:0] AH00944: connecting fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php to localhost:8000&lt;br /&gt;
[Fri Oct 04 05:11:53.426658 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(3309): [client 81.17.16.91:0] AH02545: fcgi: has determined UDS as /run/php/php8.2-fpm.sock (for localhost:8000)&lt;br /&gt;
[Fri Oct 04 05:11:53.426718 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(3450): [client 81.17.16.91:0] AH00947: connecting /var/www/html/store.opensourceecology.org/htdocs/index.php to /run/php/php8.2-fpm.sock:0 (localhost:8000)&lt;br /&gt;
[Fri Oct 04 05:11:53.426793 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(3832): AH02823: FCGI: connection established with Unix domain socket /run/php/php8.2-fpm.sock (localhost:8000)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## after maybe 10 seconds, it outputs this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Oct 04 05:12:03.646185 2024] [proxy:debug] [pid 3581402:tid 3581422] proxy_util.c(2813): AH00943: FCGI: has released connection for (*:80)&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
81.17.16.91 - - [04/Oct/2024:05:11:53 +0000] &amp;quot;GET /index.php?nocache=6 HTTP/1.1&amp;quot; 301 430 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
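# to put a number on that pause, note that the AH00942 (&#039;has acquired connection&#039;) and AH00943 (&#039;has released connection&#039;) lines bracket the time apache held its connection to php-fpm for this request. Below is a small sketch that prints the difference in seconds; for the two lines above it gives 10.219599, i.e. about 10.2 seconds. The &#039;fcgi_elapsed&#039; helper is hypothetical, and it assumes the debug log format shown above, one request per input, and no midnight rollover&lt;br /&gt;

```shell
# Hypothetical helper: read apache debug log lines on stdin and print the seconds
# between the AH00942 ("has acquired connection") and AH00943 ("has released
# connection") lines. Assumes the time is the 4th whitespace-separated field,
# e.g. "[Fri Oct 04 05:11:53.426586 2024]" gives "05:11:53.426586",
# and that both lines fall on the same day.
fcgi_elapsed() {
	awk '
	function secs(hms,   t) {
		split(hms, t, ":")
		return t[1] * 3600 + t[2] * 60 + t[3]
	}
	/AH00942:/ { start = secs($4) }
	/AH00943:/ { printf "%.6f\n", secs($4) - start }
	'
}
```
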
# so it sounds like maybe this is an issue with the php-fpm config?&lt;br /&gt;
# I tried to hit apache through the cli on the server itself and, oh, I get the payload as desired&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # curl -iLH &#039;Host: store.opensourceecology.org&#039; 127.0.0.1:8000/index.php&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;script&amp;gt;&lt;br /&gt;
        //jQuery(document).ready(function(){&lt;br /&gt;
                        // });&lt;br /&gt;
&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;root@hetzne&lt;br /&gt;
You have new mail in /var/mail/root&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I loosened the error reporting settings in php.ini, and then I got it to spit this out when I curl from my laptop&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--2d1eeb03-H--&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_authz_core.c&amp;quot;] [line 733] [level 7] AH01625: authorization result of &amp;lt;RequireAny&amp;gt;: granted (directive limited to other methods)&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_proxy_fcgi.c&amp;quot;] [line 123] [level 7] AH01060: set r-&amp;gt;filename to proxy:fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_proxy.c&amp;quot;] [line 1465] [level 7] AH01143: Running scheme fcgi handler (attempt 0)&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_proxy_fcgi.c&amp;quot;] [line 1078] [level 7] AH01076: url: fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php proxyname: (null) proxyport: 0&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_proxy_fcgi.c&amp;quot;] [line 1087] [level 7] AH01078: serving URL fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php&lt;br /&gt;
Apache-Error: [file &amp;quot;proxy_util.c&amp;quot;] [line 3242] [level 7] AH00944: connecting fcgi://localhost/var/www/html/store.opensourceecology.org/htdocs/index.php to localhost:8000&lt;br /&gt;
Apache-Error: [file &amp;quot;proxy_util.c&amp;quot;] [line 3309] [level 7] AH02545: fcgi: has determined UDS as /run/php/php8.2-fpm.sock (for localhost:8000)&lt;br /&gt;
Apache-Error: [file &amp;quot;proxy_util.c&amp;quot;] [line 3450] [level 7] AH00947: connecting /var/www/html/store.opensourceecology.org/htdocs/index.php to /run/php/php8.2-fpm.sock:0 (localhost:8000)&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_proxy_fcgi.c&amp;quot;] [line 911] [level 3] AH01071: Got error &#039;PHP message: PHP Fatal error:  Uncaught Error: Call to undefined function ini_set() in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/load.php:590\\nStack trace:\\n#0 /var/www/html/store.opensourceecology.org/htdocs/wp-settings.php(82): wp_debug_mode()\\n#1 /var/www/html/store.opensourceecology.org/wp-config.php(105): require_once(&#039;...&#039;)\\n#2 /var/www/html/store.opensourceecology.org/htdocs/wp-load.php(55): require_once(&#039;...&#039;)\\n#3 /var/www/html/store.opensourceecology.org/htdocs/wp-blog-header.php(13): require_once(&#039;...&#039;)\\n#4 /var/www/html/store.opensourceecology.org/htdocs/index.php(17): require(&#039;...&#039;)\\n#5 {main}\\n  thrown in /var/www/html/store.opensourceecology.org/htdocs/wp-includes/load.php on line 590&#039;&lt;br /&gt;
Apache-Handler: proxy:unix:/run/php/php8.2-fpm.sock|fcgi://localhost&lt;br /&gt;
Stopwatch: 1728019471413382 53626 (- - -)&lt;br /&gt;
Stopwatch2: 1728019471413382 53626; combined=41, p1=21, p2=18, p3=1, p4=0, p5=1, sr=0, sw=0, l=0, gc=0&lt;br /&gt;
Response-Body-Transformed: Dechunked&lt;br /&gt;
Producer: ModSecurity for Apache/2.9.7 (http://www.modsecurity.org/).&lt;br /&gt;
Server: Apache&lt;br /&gt;
Engine-Mode: &amp;quot;ENABLED&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# yeah, so this is the wordpress bug that I submitted a PR for last month&lt;br /&gt;
## https://github.com/WordPress/wordpress-develop/pull/7352&lt;br /&gt;
## https://core.trac.wordpress.org/ticket/62047&lt;br /&gt;
## https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
# after a brief dialog with the wordpress devs, a workaround until they merge the fix is to define a no-op ini_set() function in wp-config.php&lt;br /&gt;
## from personal experience, I found it&#039;s best to wrap this in a conditional to make sure the function doesn&#039;t exist yet&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # cp wp-config.php wp-config.php.20241003&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # vim wp-config.php&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # diff wp-config.php.20241003 wp-config.php&lt;br /&gt;
1a2,9&lt;br /&gt;
&amp;gt; &lt;br /&gt;
&amp;gt; # fix wordpress bug https://core.trac.wordpress.org/ticket/48693&lt;br /&gt;
&amp;gt; if( ! function_exists(&#039;ini_set&#039;) ){&lt;br /&gt;
&amp;gt;       function ini_set(){&lt;br /&gt;
&amp;gt;               return;&lt;br /&gt;
&amp;gt;       }&lt;br /&gt;
&amp;gt; }&lt;br /&gt;
&amp;gt; &lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
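# for reference: since PHP 8.0, a function listed in `disable_functions` is treated as if it does not exist. That is why the function_exists() guard above lets wp-config.php declare its own stub. To confirm what the FPM config actually disables, a small hypothetical shell helper like the one below could check the directive. It is only a sketch. The php.ini path in the usage note is an assumption (Debian&#039;s usual location for php8.2-fpm), and the helper ignores any per-pool php_admin_value overrides.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if PHP function $1 appears in the
# disable_functions directive of the php.ini file at path $2.
is_php_function_disabled() {
	local func="$1" ini="$2" list
	# the last uncommented disable_functions line wins; keep only its value,
	# minus spaces, tabs and double quotes
	list=$(grep -E '^[[:space:]]*disable_functions[[:space:]]*=' "$ini" \
		| tail -n 1 | cut -d= -f2- | tr -d ' \t"')
	# pad with commas so only whole function names match
	case ",${list}," in
		*",${func},"*) return 0 ;;
		*) return 1 ;;
	esac
}
```

# e.g. `is_php_function_disabled ini_set /etc/php/8.2/fpm/php.ini` exits 0 if ini_set is on the list.&lt;br /&gt;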
# after that, curl from my laptop is back to returning blank pages. Is it flapping?&lt;br /&gt;
# alright, let me see if I can harden php back up again, but this time with errors actually getting logged. I&#039;ll update something.php to write to the error log&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org # cd htdocs/&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # vim something.php &lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # cat something.php &lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
error_log( &amp;quot;executing something.php&amp;quot; );&lt;br /&gt;
echo &#039;it works&#039;;&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# ok, it&#039;s visible&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Fri Oct 04 05:47:06.734936 2024] [proxy_fcgi:error] [pid 3581402:tid 3581452] [client 127.0.0.1:36330] AH01071: Got error &#039;PHP message: executing something.php&#039;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# first I reduced the apache log level back down to &#039;warn&#039;. The PHP message is still logged; looks good&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[Fri Oct 04 05:48:39.732669 2024] [proxy_fcgi:error] [pid 3591906:tid 3591913] [client 127.0.0.1:41102] AH01071: Got error &#039;PHP message: executing something.php&#039;&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; store.opensourceecology.org/access.log &amp;lt;==&lt;br /&gt;
127.0.0.1 - - [04/Oct/2024:05:48:39 +0000] &amp;quot;GET /something.php HTTP/1.1&amp;quot; 200 302 &amp;quot;-&amp;quot; &amp;quot;curl/7.88.1&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
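# mod_proxy_fcgi relays each PHP error_log() call into apache&#039;s error log wrapped in an AH01071 &amp;quot;Got error&amp;quot; entry, as shown above. A hypothetical sed one-liner can strip that wrapper and leave just the PHP text. It is a sketch that assumes one message per log line, ending in the closing quote, as in the lines above.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: print only the PHP message text from AH01071 lines of an apache
# error log, i.e. everything between "Got error 'PHP message: " and the
# closing quote at the end of the line.
php_messages() {
	sed -n "s/.*AH01071: Got error 'PHP message: \(.*\)'\$/\1/p" "$1"
}
```

# e.g. `php_messages /var/log/apache2/store.opensourceecology.org/error.log`. The error log path is assumed from the access.log name in the tail output above.&lt;br /&gt;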
# I obliterated my manual changes by pushing ansible&#039;s apache &amp;amp; php roles&lt;br /&gt;
# cool, I confirmed that both curl on my laptop and on the server produce the logs after restarting both apache2 &amp;amp; php8.2-fpm&lt;br /&gt;
# but for some reason, my laptop still gets a response body of just &#039;true&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user@disp3919:~$ curl -iL https://store.opensourceecology.org/index.php?nocache=19&lt;br /&gt;
HTTP/1.1 301 Moved Permanently&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 05:56:15 GMT&lt;br /&gt;
Content-Type: text/html; charset=UTF-8&lt;br /&gt;
Content-Length: 0&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Redirect-By: WordPress&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Location: https://store.opensourceecology.org/?nocache=19&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
X-Varnish: 98469&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
HTTP/1.1 200 OK&lt;br /&gt;
Server: nginx&lt;br /&gt;
Date: Fri, 04 Oct 2024 05:56:16 GMT&lt;br /&gt;
Content-Type: text/html&lt;br /&gt;
Content-Length: 5&lt;br /&gt;
Connection: keep-alive&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Last-Modified: Fri, 04 Oct 2024 04:49:23 GMT&lt;br /&gt;
ETag: &amp;quot;5-6239f651921da&amp;quot;&lt;br /&gt;
X-Content-Type-Options: nosniff&lt;br /&gt;
X-XSS-Protection: 1; mode=block&lt;br /&gt;
X-Frame-Options: deny&lt;br /&gt;
Referrer-Policy: no-referrer-when-downgrade&lt;br /&gt;
Pragma: public&lt;br /&gt;
Cache-Control: public, max-age=300&lt;br /&gt;
X-Varnish: 131203&lt;br /&gt;
Age: 0&lt;br /&gt;
Via: 1.1 varnish (Varnish/7.1)&lt;br /&gt;
Accept-Ranges: bytes&lt;br /&gt;
Strict-Transport-Security: max-age=15552001&lt;br /&gt;
Public-Key-Pins: pin-sha256=&amp;quot;UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c=&amp;quot;; pin-sha256=&amp;quot;YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg=&amp;quot;; pin-sha256=&amp;quot;C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M=&amp;quot;; pin-sha256=&amp;quot;Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys=&amp;quot;; pin-sha256=&amp;quot;lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU=&amp;quot;; pin-sha256=&amp;quot;K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=&amp;quot;; pin-sha256=&amp;quot;Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o=&amp;quot;; pin-sha256=&amp;quot;EGn6R6CqT4z3ERscrqNl7q7RC//zJmDe9uBhS/rnCHU=&amp;quot;; pin-sha256=&amp;quot;NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ=&amp;quot;; pin-sha256=&amp;quot;fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A=&amp;quot;; pin-sha256=&amp;quot;oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo=&amp;quot;; pin-sha256=&amp;quot;0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo=&amp;quot;; pin-sha256=&amp;quot;MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA=&amp;quot;; pin-sha256=&amp;quot;OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU=&amp;quot;; max-age=3600; includeSubDomains; report-uri=&amp;quot;http://opensourceecology.org/hpkp-report&amp;quot;&lt;br /&gt;
&lt;br /&gt;
true&lt;br /&gt;
user@disp3919:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
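# one clue: both responses above carry an X-Varnish header with a single ID, plus Age: 0. Varnish adds a second ID (the ID of the request that populated the cache) only on a cache hit, so both responses look like misses that varnish fetched from the backend. The hypothetical helper below classifies a saved `curl -i` dump. It is a sketch and only inspects the last X-Varnish header in the file.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: classify the final response in a saved `curl -i` dump (which may
# hold several responses when -L followed redirects) as a varnish hit/miss.
varnish_cache_status() {
	local header
	# the last X-Varnish header belongs to the final response; drop the CR
	header=$(grep -i '^X-Varnish:' "$1" | tail -n 1 | tr -d '\r')
	if [ -z "$header" ]; then
		echo "no-varnish"
		return 0
	fi
	# word-split the header value: one ID on a miss, two on a hit
	set -- ${header#*:}
	if [ "$#" -ge 2 ]; then
		echo "hit"
	else
		echo "miss"
	fi
}
```

# e.g. save the output of `curl -siL` for the store URL to a file and pass that file&#039;s path to varnish_cache_status.&lt;br /&gt;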
# but I get the actual html on the local machine&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # curl -iLH &#039;Host: store.opensourceecology.org&#039; 127.0.0.1:8000/index.php&lt;br /&gt;
...&lt;br /&gt;
/* ]]&amp;gt; */&lt;br /&gt;
&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;script type=&#039;text/javascript&#039; src=&#039;https://store.opensourceecology.org/wp-content/themes/oshin/js/script.js?ver=5.0&#039;&amp;gt;&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;script type=&#039;text/javascript&#039; src=&#039;https://store.opensourceecology.org/wp-includes/js/wp-embed.min.js?ver=5.1.1&#039;&amp;gt;&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;!-- Option Panel Custom JavaScript --&amp;gt;&lt;br /&gt;
&amp;lt;script&amp;gt;&lt;br /&gt;
        //jQuery(document).ready(function(){&lt;br /&gt;
                        // });&lt;br /&gt;
&amp;lt;/script&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&lt;br /&gt;
&amp;lt;/html&amp;gt;root@hetzner3 ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Wed Oct 02, 2024=&lt;br /&gt;
# Marcin sent me a few emails in the past months asking about OSE&#039;s use of Amazon Glacier&lt;br /&gt;
# Today he sent a message saying that he got charged $1.03, and isn&#039;t sure why&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Michael,&lt;br /&gt;
&lt;br /&gt;
I&#039;m getting charged $1.03 for Glacier. Can we cancel that?&lt;br /&gt;
&lt;br /&gt;
Marcin&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# It took me a while to authenticate&lt;br /&gt;
## first I tried to login with my &#039;maltfield&#039; aws user, but aws rejected my creds (stored in my personal keepass)&lt;br /&gt;
## eventually I realized I had to click &amp;quot;Sign in using root user email&amp;quot; -- and then I could auth using the creds stored in the shared keepass&lt;br /&gt;
# after logging-in, I went to the &amp;quot;Billing and Cost Management&amp;quot; app https://us-east-1.console.aws.amazon.com/costmanagement/home?region=us-west-2#/home&lt;br /&gt;
# on this page, there was a link that said &amp;quot;Last month&#039;s total cost: $1.03&amp;quot;. Yep, that&#039;s all accounted-for. I clicked it.&lt;br /&gt;
# the next page showed a joke of a chart with one bar on a bar graph that said &amp;quot;$1.03&amp;quot;. And the bar was labeled &amp;quot;Total Cost&amp;quot;&lt;br /&gt;
# I had to click on the dropdown menu for &amp;quot;Dimension&amp;quot; and set it to &amp;quot;Service&amp;quot; -- then it listed 4 items&lt;br /&gt;
## Glacier - $1.03&lt;br /&gt;
## S3 - $0.00&lt;br /&gt;
## Tax - $0.00&lt;br /&gt;
# So I switched over to the &amp;quot;Glacier&amp;quot; app https://us-east-1.console.aws.amazon.com/glacier/home?region=us-east-1&lt;br /&gt;
## Curiously, it listed 0 vaults&lt;br /&gt;
## but there was a note at the top recommending S3&#039;s Glacier storage classes instead, so I clicked over to the &amp;quot;S3&amp;quot; app&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
We recommend that you use Glacier storage classes in Amazon S3 for archival storage&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# here I saw one bucket called &amp;quot;oseserverbackups&amp;quot; in &amp;quot;US West (Oregon) us-west-2&amp;quot;&lt;br /&gt;
# the bucket had one 34.0 byte file in it called &amp;quot;test.txt&amp;quot;. That&#039;s it!&lt;br /&gt;
## this file was created July 6, 2018, 19:18:03 (UTC-05:00)&lt;br /&gt;
## I downloaded it; it has one line of text&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
some file destined for s3 this is&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
## I deleted the &#039;test.txt&#039; file object from the s3 bucket&lt;br /&gt;
# I then deleted the (now empty) &#039;oseserverbackups&#039; bucket&lt;br /&gt;
# unconvinced that that was the issue, I went back to the &amp;quot;glacier&amp;quot; app. This time I cycled through a few of the regions until I got to &amp;quot;us-west-2&amp;quot;, which showed one vault named &amp;quot;deleteMeIn2020&amp;quot;&lt;br /&gt;
# I clicked on it, and it said&lt;br /&gt;
## this vault was created March 29, 2018, 16:36:06 (UTC-05:00)&lt;br /&gt;
## this vault was last inventoried August 1, 2018, 02:41:31 (UTC-05:00)&lt;br /&gt;
## this vault is 285.3 GB  (as of last inventory)&lt;br /&gt;
# well, it&#039;s after 2020. So I think we should delete it.&lt;br /&gt;
# I sent an email to Marcin asking for a confirmation before I delete it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Hey Marcin,&lt;br /&gt;
&lt;br /&gt;
You have a 285.3 GB vault in Amazon Glacier&#039;s us-west-2 region.&lt;br /&gt;
&lt;br /&gt;
I logged into your AWS account today and did some digging. I found this 285.3 GB vault named &#039;deleteMeIn2020&#039;. I created this vault in 2018 Q1. It contains a final backup of files from hetzner1. I created it as part of the hetzner2 migration project, thinking that we should delete it in 2020 if we never needed to restore anything from it for 2 years.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/CHG-2018-07-06_hetzner1_deprecation&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2018_Q1#Sat_Mar_31.2C_2018&lt;br /&gt;
&lt;br /&gt;
Well, 2020 came and went, and four more years have passed since. I think you can safely delete the &#039;deleteMeIn2020&#039; vault.&lt;br /&gt;
&lt;br /&gt;
By the way, I also deleted a 34-byte test file named &#039;test.txt&#039; from an S3 bucket called &#039;oseserverbackups&#039; in us-west-2. It was the only file in the bucket, so I deleted the file and then the empty bucket.&lt;br /&gt;
&lt;br /&gt;
Would you like me to proceed with deleting the 285.3 GB &#039;deleteMeIn2020&#039; Glacier vault from your AWS account?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Thank you,&lt;br /&gt;
&lt;br /&gt;
Michael Altfield&lt;br /&gt;
Senior Technology Advisor&lt;br /&gt;
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B&lt;br /&gt;
&lt;br /&gt;
Open Source Ecology&lt;br /&gt;
www.opensourceecology.org&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
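# note for when Marcin confirms: Glacier only lets you delete a vault that had no archives as of its last inventory, so the archives have to go first. The rough sequence is:&lt;br /&gt;
## start an inventory-retrieval job with `aws glacier initiate-job`&lt;br /&gt;
## wait for the job to finish (usually a few hours)&lt;br /&gt;
## download the inventory with `aws glacier get-job-output`&lt;br /&gt;
## delete each archive&lt;br /&gt;
## run `aws glacier delete-vault` after the next inventory (roughly a day later)&lt;br /&gt;
# The sketch below covers only the archive-deletion step. The function names are my own, and by default it is a dry run that only prints the aws commands.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Print the ArchiveId of every archive in a Glacier inventory JSON file,
# i.e. the file written by `aws glacier get-job-output` for an
# inventory-retrieval job.
list_archive_ids() {
	grep -o '"ArchiveId": *"[^"]*"' "$1" | sed 's/^"ArchiveId": *"\(.*\)"$/\1/'
}

# Delete every archive listed in the inventory file from the given vault.
# Defaults to a dry run (DRY_RUN=1) that only prints the aws commands.
delete_vault_archives() {
	local vault="$1" inventory="$2" run=""
	if [ "${DRY_RUN:-1}" = "1" ]; then
		run="echo"
	fi
	list_archive_ids "$inventory" | while read -r archive_id; do
		# "--archive-id=VALUE" form, in case an ID starts with a hyphen
		$run aws glacier delete-archive --account-id - \
			--vault-name "$vault" --archive-id="$archive_id"
	done
}
```

# e.g. `DRY_RUN=0 delete_vault_archives deleteMeIn2020 inventory.json`, but only after Marcin confirms.&lt;br /&gt;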
# meanwhile, I tried to figure out why I couldn&#039;t log in as &#039;maltfield&#039;, and I realized that, ffs, we don&#039;t have IAM set up for our account?? Maybe Marcin deleted it when trying to eliminate costs? IAM is free, though..&lt;br /&gt;
# ok, I found my &#039;maltfield&#039; user under &amp;quot;Security Credentials&amp;quot; -&amp;gt; &amp;quot;Access Management&amp;quot; -&amp;gt; &amp;quot;Users&amp;quot;&lt;br /&gt;
# it says my last console sign-in was 424 days ago&lt;br /&gt;
# I went to my user&#039;s settings, selected the MFA token, and selected &amp;quot;Resync&amp;quot; -- then entered two consecutive OTPs&lt;br /&gt;
# I tried to login, and this time it let me in. Well that was annoying.&lt;br /&gt;
# I opened cloudtrail and reviewed the latest account events https://us-east-1.console.aws.amazon.com/cloudtrailv2/home?region=us-east-1#/events?ReadOnly=false&lt;br /&gt;
## the most recent event was the &#039;root&#039; user resyncing the MFA token of the &#039;maltfield&#039; user&lt;br /&gt;
## before that we have two ConsoleLogin for today&lt;br /&gt;
## before that &#039;mjakubowski&#039; user has a MakePayment event (and some other payment related events) on Sep 19&lt;br /&gt;
## before that we have a bunch of login &amp;amp; mfa-related entries for Marcin&#039;s user on Sep 06, 14, 17, and 19.&lt;br /&gt;
## and that&#039;s where the log ends; looks like we just get 90 days of logs for free.&lt;br /&gt;
...&lt;br /&gt;
# hetzner responded to my support inquiry about how they handle failed disks&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Dear Mr Altfield&lt;br /&gt;
&lt;br /&gt;
Unfortunately it&#039;s an unmanaged root server monitoring is your responsibility I&#039;m afraid. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you have a problem please open a ticket in your robot account. &lt;br /&gt;
&lt;br /&gt;
Please click on &amp;quot;Servers&amp;quot; from the menu on the left and then select the corresponding server. Under the &amp;quot;Support&amp;quot; tab, you can choose &amp;quot;Hard drive is broken&amp;quot;. Please follow the instructions.&lt;br /&gt;
&lt;br /&gt;
https://docs.hetzner.com/robot/dedicated-server/troubleshooting/serial-numbers-and-information-on-defective-hard-drives/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our DC is 24/7 available and we exchange broken hardware as soon as possible for free.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hetzner clients can use the Server Monitoring System to monitor their servers and have an email sent to them when the status of one of the monitored services changes: &lt;br /&gt;
&lt;br /&gt;
https://docs.hetzner.com/robot/dedicated-server/security/system-monitor/&lt;br /&gt;
&lt;br /&gt;
https://docs.hetzner.com/robot/dedicated-server/raid/software-raid/#email-notification-when-a-drive-in-a-software-raid-fails&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Please use hetzner-status:&lt;br /&gt;
&lt;br /&gt;
https://www.hetzner-status.de/en.html&lt;br /&gt;
&lt;br /&gt;
This web page publishes announcements and current fault reports from our datacenters. Would you like to receive email notification of fault reports? Log on as exclusive Hetzner client in your administrations interface. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you have any questions please do not hesitate to contact us. &lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
Jan Kolb&lt;br /&gt;
&lt;br /&gt;
Sales&lt;br /&gt;
&lt;br /&gt;
Hetzner Online GmbH&lt;br /&gt;
Sigmundstrasse 135&lt;br /&gt;
90431 Nürnberg&lt;br /&gt;
Tel: +49 911 234 226-927&lt;br /&gt;
Fax: +49 9831 505-3&lt;br /&gt;
sales@hetzner.com&lt;br /&gt;
www.hetzner.com&lt;br /&gt;
&lt;br /&gt;
Register Court: Registergericht Ansbach, HRB 6089&lt;br /&gt;
CEO: Martin Hetzner, Stephan Konvickova, Günther Müller&lt;br /&gt;
&lt;br /&gt;
For the purposes of this communication, we may save some &lt;br /&gt;
of your personal data. For information on our data privacy &lt;br /&gt;
policy, please see: www.hetzner.com/datenschutzhinweis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
09/29/2024 21:23 - REDACTED@opensourceecology.org REDACTED@opensourceecology.org wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Hi Hetzner,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Can you please tell us more about the process of disk failure on our new dedicated&lt;br /&gt;
&amp;gt; server plan (Server Auction #2443019)?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Specifically, if a disk fails, does Hetzner cover the cost of replacing the disk?&lt;br /&gt;
&amp;gt; Or do we have to pay a fee? If so, how much?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; And does Hetzner have some system in-place that monitors the hardware for disk&lt;br /&gt;
&amp;gt; failure? Or do we have to monitor this in software and alert Hetnzer that a disk&lt;br /&gt;
&amp;gt; is failing? If Hetzner does monitor for disk failure, how does it do it?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; Senior Technology Advisor&lt;br /&gt;
&amp;gt; PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Open Source Ecology&lt;br /&gt;
&amp;gt; www.opensourceecology.org&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# the linked docs don&#039;t actually mention mdadm, which I set up earlier to monitor our disks and send us email alerts&lt;br /&gt;
# instead, hetzner mentions `smartctl`, which is included in the debian package `smartmontools` -- which wasn&#039;t even installed!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/mdadm # sudo apt-get install smartmontools&lt;br /&gt;
...&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # smartctl -H /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # smartctl -H /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# we can get more information with the `-A` argument&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/mdadm # smartctl -A /dev/nvme0n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        36 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    3%&lt;br /&gt;
Data Units Read:                    142.729.615 [73,0 TB]&lt;br /&gt;
Data Units Written:                 20.452.874 [10,4 TB]&lt;br /&gt;
Host Read Commands:                 6.862.184.005&lt;br /&gt;
Host Write Commands:                876.931.661&lt;br /&gt;
Controller Busy Time:               15.948&lt;br /&gt;
Power Cycles:                       28&lt;br /&gt;
Power On Hours:                     16.350&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      159&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               36 Celsius&lt;br /&gt;
Temperature Sensor 2:               45 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # smartctl -A /dev/nvme1n1&lt;br /&gt;
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-21-amd64] (local build)&lt;br /&gt;
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF SMART DATA SECTION ===&lt;br /&gt;
SMART/Health Information (NVMe Log 0x02)&lt;br /&gt;
Critical Warning:                   0x00&lt;br /&gt;
Temperature:                        34 Celsius&lt;br /&gt;
Available Spare:                    100%&lt;br /&gt;
Available Spare Threshold:          10%&lt;br /&gt;
Percentage Used:                    3%&lt;br /&gt;
Data Units Read:                    130.064.348 [66,5 TB]&lt;br /&gt;
Data Units Written:                 24.932.683 [12,7 TB]&lt;br /&gt;
Host Read Commands:                 1.276.781.490&lt;br /&gt;
Host Write Commands:                879.017.438&lt;br /&gt;
Controller Busy Time:               14.879&lt;br /&gt;
Power Cycles:                       23&lt;br /&gt;
Power On Hours:                     14.678&lt;br /&gt;
Unsafe Shutdowns:                   5&lt;br /&gt;
Media and Data Integrity Errors:    0&lt;br /&gt;
Error Information Log Entries:      149&lt;br /&gt;
Warning  Comp. Temperature Time:    0&lt;br /&gt;
Critical Comp. Temperature Time:    0&lt;br /&gt;
Temperature Sensor 1:               34 Celsius&lt;br /&gt;
Temperature Sensor 2:               37 Celsius&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
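# rather than eyeballing this output, a check like the sketch below could flag the NVMe fields that usually matter:&lt;br /&gt;
## a non-zero Critical Warning&lt;br /&gt;
## Available Spare dropping below its threshold&lt;br /&gt;
## any Media and Data Integrity Errors&lt;br /&gt;
# It is a hypothetical helper, not part of smartmontools. It strips the locale digit separators seen above. It also deliberately ignores Error Information Log Entries, which are non-zero on both drives even though both PASSED. For ongoing monitoring, smartmontools also ships the smartd daemon.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Print the value of one "Name:   value" field from `smartctl -A` output
# ($1), with spaces, % signs and digit-group separators removed.
smart_field() {
	printf '%s\n' "$1" | grep -m1 "^$2:" | cut -d: -f2- | tr -d ' %.,'
}

# Sketch: print any worrying NVMe health fields; return 1 if any were found,
# 2 if the text does not look like an NVMe SMART/Health log.
nvme_health_problems() {
	local out="$1" found=0 crit spare thresh media
	crit=$(smart_field "$out" "Critical Warning")
	spare=$(smart_field "$out" "Available Spare")
	thresh=$(smart_field "$out" "Available Spare Threshold")
	media=$(smart_field "$out" "Media and Data Integrity Errors")
	if [ -z "$crit" ] || [ -z "$spare" ] || [ -z "$thresh" ] || [ -z "$media" ]; then
		echo "not an NVMe SMART/Health log?"
		return 2
	fi
	if [ "$crit" != "0x00" ]; then
		echo "critical warning flags set: $crit"
		found=1
	fi
	if [ "$spare" -lt "$thresh" ]; then
		echo "available spare ${spare}% is below threshold ${thresh}%"
		found=1
	fi
	if [ "$media" -gt 0 ]; then
		echo "media and data integrity errors: $media"
		found=1
	fi
	return "$found"
}
```

# e.g. `nvme_health_problems &amp;quot;$(smartctl -A /dev/nvme0n1)&amp;quot;` prints nothing and exits 0 for output like the above.&lt;br /&gt;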
# oh nvm, their third link describes mdadm alerts for monitoring our software raid&lt;br /&gt;
# they also said to check /etc/default/mdadm, which I didn&#039;t do before&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /etc/mdadm # cat /etc/default/mdadm &lt;br /&gt;
# mdadm Debian configuration&lt;br /&gt;
#&lt;br /&gt;
# You can run &#039;dpkg-reconfigure mdadm&#039; to modify the values in this file, if&lt;br /&gt;
# you want. You can also change the values here and changes will be preserved.&lt;br /&gt;
# Do note that only the values are preserved; the rest of the file is&lt;br /&gt;
# rewritten.&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# AUTOCHECK:&lt;br /&gt;
#   should mdadm run periodic redundancy checks over your arrays? See&lt;br /&gt;
#   /etc/cron.d/mdadm.&lt;br /&gt;
AUTOCHECK=true&lt;br /&gt;
&lt;br /&gt;
# AUTOSCAN:&lt;br /&gt;
#   should mdadm check once a day for degraded arrays? See&lt;br /&gt;
#   /etc/cron.daily/mdadm.&lt;br /&gt;
AUTOSCAN=true&lt;br /&gt;
&lt;br /&gt;
# START_DAEMON:&lt;br /&gt;
#   should mdadm start the MD monitoring daemon during boot?&lt;br /&gt;
START_DAEMON=true&lt;br /&gt;
&lt;br /&gt;
# DAEMON_OPTIONS:&lt;br /&gt;
#   additional options to pass to the daemon.&lt;br /&gt;
DAEMON_OPTIONS=&amp;quot;--syslog&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# VERBOSE:&lt;br /&gt;
#   if this variable is set to true, mdadm will be a little more verbose e.g.&lt;br /&gt;
#   when creating the initramfs.&lt;br /&gt;
VERBOSE=false&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# note that &amp;quot;AUTOCHECK&amp;quot;, &amp;quot;AUTOSCAN&amp;quot;, and &amp;quot;START_DAEMON&amp;quot; are all enabled, so we&#039;re all good here.&lt;br /&gt;
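# a hypothetical helper to re-check these defaults after future ansible runs is sketched below. Email alerts also depend on a MAILADDR line in /etc/mdadm/mdadm.conf. Running `mdadm --monitor --scan --oneshot --test` should send one test alert per array to confirm delivery.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: report any of the Debian mdadm monitoring switches that are not
# set to true in the given defaults file; return 1 if any are missing.
check_mdadm_defaults() {
	local file="${1:-/etc/default/mdadm}" key missing=0
	for key in AUTOCHECK AUTOSCAN START_DAEMON; do
		if ! grep -q "^${key}=true\$" "$file"; then
			echo "${key} is not set to true in ${file}"
			missing=1
		fi
	done
	return "$missing"
}
```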
...&lt;br /&gt;
# ok, back to updating wordpress.&lt;br /&gt;
# first, I&#039;m just going to unzip all these (now TOFU-verified) .zip files and make sure there are no zip bombs&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cd /var/tmp/wordpress/themes/&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # ls&lt;br /&gt;
bouquet.1.2.5.zip          sketch.1.2.4.zip      twentyfifteen.3.8.zip   twentyseventeen.3.7.zip  twentythirteen.4.2.zip&lt;br /&gt;
gk-portfolio.1.5.3.zip     storefront.4.6.0.zip  twentyfourteen.4.0.zip  twentysixteen.3.3.zip    twentytwelve.4.3.zip&lt;br /&gt;
portfolio-press.2.8.0.zip  twentyeleven.4.7.zip  twentynineteen.2.9.zip  twentyten.4.2.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # for file in $(ls *.zip); do unzip $file; done&lt;br /&gt;
...&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # ls&lt;br /&gt;
bouquet                 portfolio-press.2.8.0.zip  twentyeleven           twentyfourteen.4.0.zip   twentysixteen          twentythirteen.4.2.zip&lt;br /&gt;
bouquet.1.2.5.zip       sketch                     twentyeleven.4.7.zip   twentynineteen           twentysixteen.3.3.zip  twentytwelve&lt;br /&gt;
gk-portfolio            sketch.1.2.4.zip           twentyfifteen          twentynineteen.2.9.zip   twentyten              twentytwelve.4.3.zip&lt;br /&gt;
gk-portfolio.1.5.3.zip  storefront                 twentyfifteen.3.8.zip  twentyseventeen          twentyten.4.2.zip&lt;br /&gt;
portfolio-press         storefront.4.6.0.zip       twentyfourteen         twentyseventeen.3.7.zip  twentythirteen&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/themes # cd ../plugins/&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # for file in $(ls *.zip); do unzip $file; done&lt;br /&gt;
...&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # ls&lt;br /&gt;
akismet                                                 jetpack                               vcaching&lt;br /&gt;
akismet.5.3.3.zip                                       jetpack.13.8.1.zip                    vcaching.1.8.3.zip&lt;br /&gt;
black-studio-tinymce-widget                             meta-box                              w3-total-cache&lt;br /&gt;
black-studio-tinymce-widget.2.7.3.zip                   meta-box.5.10.2.zip                   w3-total-cache.2.7.6.zip&lt;br /&gt;
chartbeat                                               ml-slider                             wonderm00ns-simple-facebook-open-graph-tags&lt;br /&gt;
chartbeat.2.0.7.zip                                     ml-slider.3.91.0.zip                  wonderm00ns-simple-facebook-open-graph-tags.3.3.3.zip&lt;br /&gt;
classic-editor                                          open-in-new-window-plugin             woocommerce&lt;br /&gt;
classic-editor.1.6.5.zip                                open-in-new-window-plugin.3.0.zip     woocommerce.9.3.3.zip&lt;br /&gt;
coingate-for-woocommerce                                post-types-order                      wordpress-importer&lt;br /&gt;
coingate-for-woocommerce.2.1.1.zip                      post-types-order.2.2.6.zip            wordpress-importer.0.8.2.zip&lt;br /&gt;
contact-form-7                                          revision-control                      wordpress-seo&lt;br /&gt;
contact-form-7.5.9.8.zip                                revision-control.2.3.2.zip            wordpress-seo.23.5.zip&lt;br /&gt;
duplicate-page                                          shareaholic                           wpautop-control&lt;br /&gt;
duplicate-page.4.5.zip                                  shareaholic.9.7.12.zip                wpautop-control.1.6.zip&lt;br /&gt;
duplicate-post                                          share-on-diaspora                     wp-memory-usage&lt;br /&gt;
duplicate-post.4.5.zip                                  share-on-diaspora.0.7.9.zip           wp-memory-usage.1.2.10.zip&lt;br /&gt;
google-authenticator                                    shariff                               wp-optimize&lt;br /&gt;
google-authenticator.0.54.zip                           shariff.4.6.14.zip                    wp-optimize.3.6.0.zip&lt;br /&gt;
google-authenticator-encourage-user-activation          ssl-insecure-content-fixer            wp-smushit&lt;br /&gt;
google-authenticator-encourage-user-activation.0.2.zip  ssl-insecure-content-fixer.2.7.2.zip  wp-smushit.3.16.6.zip&lt;br /&gt;
insert-headers-and-footers                              varnish-http-purge                    wp-super-cache&lt;br /&gt;
insert-headers-and-footers.2.2.2.zip                    varnish-http-purge.5.2.2.zip          wp-super-cache.1.12.4.zip&lt;br /&gt;
root@hetzner3 /var/tmp/wordpress/plugins # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
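# for a more systematic zip bomb check, each archive&#039;s declared uncompressed size can be read from the summary line of `unzip -l` before extracting. The sketch below assumes a hypothetical 200 MiB cap per archive. It trusts the sizes recorded inside the archive itself, so it is a sanity check rather than a guarantee.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical cap on how far a single theme/plugin zip may expand (200 MiB).
MAX_UNZIPPED_BYTES=$((200 * 1024 * 1024))

# Read `unzip -l` output on stdin and print the total uncompressed size,
# i.e. the first column of the final summary line.
listing_total_bytes() {
	tail -n 1 | awk '{print $1}'
}

# Extract a zip into the current dir only if its declared size is within the cap.
safe_unzip() {
	local zip="$1" total
	total=$(unzip -l "$zip" | listing_total_bytes)
	case "$total" in
		''|*[!0-9]*)
			echo "refusing ${zip}: could not read its size"
			return 1 ;;
	esac
	if [ "$total" -gt "$MAX_UNZIPPED_BYTES" ]; then
		echo "refusing ${zip}: declares ${total} bytes uncompressed"
		return 1
	fi
	unzip -q "$zip"
}
```

# e.g. in place of the plain loop: `for file in *.zip; do safe_unzip &amp;quot;$file&amp;quot;; done`&lt;br /&gt;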
# ok, that looks good. now let&#039;s see if we can script copying these themes over as needed&lt;br /&gt;
## and, to err on the side of caution, I&#039;m going to intentionally delete any theme or plugin dir, even if we don&#039;t have one to replace it.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wp_docroot=&amp;quot;/var/www/html/store.opensourceecology.org/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for theme_path in $(find &amp;quot;${wp_docroot}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf &amp;quot;${theme_path}&amp;quot;&lt;br /&gt;
	rsync -av --progress &amp;quot;/var/tmp/wordpress/themes/${theme}/&amp;quot; &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# after execution, looks like it worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # ls -lah themes/&lt;br /&gt;
total 68K&lt;br /&gt;
d---r-x--- 16 not-apache www-data 4,0K Oct  3 04:02 .&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 23 15:15 ..&lt;br /&gt;
----r-----  1 not-apache www-data   28 Jun  5  2014 index.php&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Oct  3 04:02 oshin&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K May 16 08:29 storefront&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Jul 16 13:09 twentyeleven&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Jul 16 13:28 twentyfifteen&lt;br /&gt;
drwxr-xr-x  9 root       root     4,0K Jul 16 13:23 twentyfourteen&lt;br /&gt;
drwxr-xr-x  9 root       root     4,0K Jul 16 13:30 twentynineteen&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K Jul 16 13:29 twentyseventeen&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:29 twentysixteen&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Jul 15 17:17 twentyten&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:20 twentythirteen&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:17 twentytwelve&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Oct  3 04:02 twentytwentyfour&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Oct  3 04:02 twentytwentythree&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Oct  3 04:02 twentytwentytwo&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh, wait, no. it created some silly empty dirs for the themes that didn&#039;t have a source to copy from (e.g. oshin)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # ls -lah themes/oshin/&lt;br /&gt;
total 8,0K&lt;br /&gt;
drwxr-xr-x  2 root       root     4,0K Oct  3 04:02 .&lt;br /&gt;
d---r-x--- 16 not-apache www-data 4,0K Oct  3 04:02 ..&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
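# a quicker way to spot these empty leftovers than eyeballing ls is find&#039;s &amp;quot;-empty&amp;quot; test. this is a small sketch: the helper name is hypothetical, and &amp;quot;-printf&amp;quot; assumes GNU find (as on Debian)&lt;br /&gt;
```shell
# List the immediate subdirectories of a wp-content/themes or wp-content/plugins
# dir that are empty, e.g. leftovers from an rsync that had no source to copy from.
# list_empty_subdirs is a hypothetical helper name; -printf requires GNU find.
list_empty_subdirs() {
	find "$1" -mindepth 1 -maxdepth 1 -type d -empty -printf '%f\n' | sort
}

# usage:
# list_empty_subdirs /var/www/html/store.opensourceecology.org/htdocs/wp-content/themes
```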
# let&#039;s wrap the rsync in a condition, so that it only runs when there&#039;s a source dir to copy from. also drop rsync&#039;s verbose &amp;amp; progress flags, so we can see the whole output&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for theme_path in $(find &amp;quot;${wp_docroot}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
	rm -rf &amp;quot;${theme_path}&amp;quot;&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a &amp;quot;${source_path}/&amp;quot; &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
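# since the same loop gets reused for plugins below, it could also be written once as a function with a dry-run mode that only prints what it would do. this is a hedged sketch, not what was actually run here: the function name, the DRY_RUN variable, and the optional third argument are all hypothetical, and it assumes the pristine copies live in /var/tmp/wordpress/themes/ and /var/tmp/wordpress/plugins/ as above&lt;br /&gt;
```shell
# Replace every theme or plugin dir under wp-content with its pristine copy,
# deleting dirs that have no pristine copy (same behavior as the loop above).
# With DRY_RUN=1 it only prints "replace NAME" or "delete NAME".
# Hypothetical helper; dir names containing newlines are not handled.
replace_from_pristine() {
	kind="$1"                            # "themes" or "plugins"
	docroot="$2"                         # e.g. /var/www/html/store.opensourceecology.org/htdocs
	source_root="${3:-/var/tmp/wordpress}"

	find "${docroot}/wp-content/${kind}" -mindepth 1 -maxdepth 1 -type d |
	while IFS= read -r target_path; do
		name="$(basename "${target_path}")"
		source_path="${source_root}/${kind}/${name}"

		if [ "${DRY_RUN:-0}" = "1" ]; then
			if [ -d "${source_path}" ]; then
				echo "replace ${name}"
			else
				echo "delete ${name}"
			fi
			continue
		fi

		rm -rf "${target_path}"
		if [ -d "${source_path}" ]; then
			rsync -a "${source_path}/" "${target_path}/"
		fi
	done
}

# usage: preview first, then run for real
# DRY_RUN=1 replace_from_pristine plugins /var/www/html/store.opensourceecology.org/htdocs
```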
# here&#039;s the execution; that&#039;s better&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # for theme_path in $(find &amp;quot;${wp_docroot}/wp-content/themes&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
        theme=$(basename &amp;quot;${theme_path}&amp;quot;)&lt;br /&gt;
        source_path=&amp;quot;/var/tmp/wordpress/themes/${theme}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
        echo &amp;quot;${theme}&amp;quot;&lt;br /&gt;
        rm -rf ${theme_path};&lt;br /&gt;
        if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
                rsync -a ${source_path}/ &amp;quot;${theme_path}/&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
done&lt;br /&gt;
twentytwelve&lt;br /&gt;
twentysixteen&lt;br /&gt;
storefront&lt;br /&gt;
twentyseventeen&lt;br /&gt;
twentyfourteen&lt;br /&gt;
twentyeleven&lt;br /&gt;
twentytwentythree&lt;br /&gt;
oshin&lt;br /&gt;
twentytwentyfour&lt;br /&gt;
twentythirteen&lt;br /&gt;
twentyten&lt;br /&gt;
twentyfifteen&lt;br /&gt;
twentynineteen&lt;br /&gt;
twentytwentytwo&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # ls -lah themes/&lt;br /&gt;
total 52K&lt;br /&gt;
d---r-x--- 12 not-apache www-data 4,0K Oct  3 04:04 .&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 23 15:15 ..&lt;br /&gt;
----r-----  1 not-apache www-data   28 Jun  5  2014 index.php&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K May 16 08:29 storefront&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Jul 16 13:09 twentyeleven&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Jul 16 13:28 twentyfifteen&lt;br /&gt;
drwxr-xr-x  9 root       root     4,0K Jul 16 13:23 twentyfourteen&lt;br /&gt;
drwxr-xr-x  9 root       root     4,0K Jul 16 13:30 twentynineteen&lt;br /&gt;
drwxr-xr-x  5 root       root     4,0K Jul 16 13:29 twentyseventeen&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:29 twentysixteen&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Jul 15 17:17 twentyten&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:20 twentythirteen&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Jul 16 13:17 twentytwelve&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# now let&#039;s do the plugins with this&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wp_docroot=&amp;quot;/var/www/html/store.opensourceecology.org/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${wp_docroot}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
	plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
	source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
	&lt;br /&gt;
	echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
	rm -rf &amp;quot;${plugin_path}&amp;quot;&lt;br /&gt;
	if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
		rsync -a &amp;quot;${source_path}/&amp;quot; &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
	fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# I actually messed this up, and I had to restore the original plugins dir from the backup; easy enough&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rsync -av --progress /var/tmp/hetzner2-www-20240926/root/backups/sync/daily_hetzner2_20240926_072001/www/var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/ /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# alright, here&#039;s the run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # wp_docroot=&amp;quot;/var/www/html/store.opensourceecology.org/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for plugin_path in $(find &amp;quot;${wp_docroot}/wp-content/plugins&amp;quot; -mindepth 1 -maxdepth 1 -type d); do&lt;br /&gt;
        plugin=$(basename &amp;quot;${plugin_path}&amp;quot;)&lt;br /&gt;
        source_path=&amp;quot;/var/tmp/wordpress/plugins/${plugin}&amp;quot;&lt;br /&gt;
        &lt;br /&gt;
        echo &amp;quot;${plugin}&amp;quot;&lt;br /&gt;
        rm -rf ${plugin_path};&lt;br /&gt;
        if [ -d &amp;quot;${source_path}&amp;quot; ]; then&lt;br /&gt;
                rsync -a ${source_path}/ &amp;quot;${plugin_path}/&amp;quot;&lt;br /&gt;
        fi&lt;br /&gt;
done&lt;br /&gt;
meta-box-show-hide&lt;br /&gt;
classic-editor&lt;br /&gt;
be-portfolio-post&lt;br /&gt;
colorhub&lt;br /&gt;
ssl-insecure-content-fixer&lt;br /&gt;
oshine-core&lt;br /&gt;
tatsu&lt;br /&gt;
revslider&lt;br /&gt;
redux-vendor-support&lt;br /&gt;
akismet&lt;br /&gt;
rename-wp-login&lt;br /&gt;
meta-box-tabs&lt;br /&gt;
google-authenticator&lt;br /&gt;
coingate-for-woocommerce&lt;br /&gt;
be-gdpr&lt;br /&gt;
google-authenticator-encourage-user-activation&lt;br /&gt;
typehub&lt;br /&gt;
meta-box&lt;br /&gt;
woocommerce&lt;br /&gt;
meta-box-conditional-logic&lt;br /&gt;
contact-form-7&lt;br /&gt;
vcaching&lt;br /&gt;
force-strong-passwords&lt;br /&gt;
masterslider&lt;br /&gt;
oshine-modules&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content #&lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # ls -lah plugins/&lt;br /&gt;
total 56K&lt;br /&gt;
d---r-x--- 12       1012       48 4,0K Oct  3 04:09 .&lt;br /&gt;
d---r-x---  7 not-apache www-data 4,0K Jul 23 15:15 ..&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Jul 10 22:16 akismet&lt;br /&gt;
drwxr-xr-x  3 root       root     4,0K Sep 27 21:51 classic-editor&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Nov 21  2022 coingate-for-woocommerce&lt;br /&gt;
drwxr-xr-x  7 root       root     4,0K Jul 25 08:28 contact-form-7&lt;br /&gt;
drwxr-xr-x  3 root       root     4,0K Jul  4  2022 google-authenticator&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Apr 23  2021 google-authenticator-encourage-user-activation&lt;br /&gt;
----r-----  1       1012       48 2,3K Apr  9  2019 hello.php&lt;br /&gt;
----r-----  1       1012       48   28 Apr  9  2019 index.php&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Sep 27 07:22 meta-box&lt;br /&gt;
drwxr-xr-x  8 root       root     4,0K Mar 17  2024 ssl-insecure-content-fixer&lt;br /&gt;
drwxr-xr-x  4 root       root     4,0K Oct 21  2019 vcaching&lt;br /&gt;
drwxr-xr-x 13 root       root     4,0K Sep 25 13:56 woocommerce&lt;br /&gt;
root@hetzner3 /var/www/html/store.opensourceecology.org/htdocs/wp-content # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# with that, I tried wp-cli again, but it gave us an empty plugin list?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wp@hetzner3:~$ wp --path=/var/www/html/store.opensourceecology.org/htdocs plugin list&lt;br /&gt;
+------+--------+--------+---------+----------------+-------------+&lt;br /&gt;
| name | status | update | version | update_version | auto_update |&lt;br /&gt;
+------+--------+--------+---------+----------------+-------------+&lt;br /&gt;
+------+--------+--------+---------+----------------+-------------+&lt;br /&gt;
wp@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# oh shoot, I forgot to update permissions. I&#039;ll do that now&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
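# to double-check that nothing under the vhost dirs still carries an owner or group that doesn&#039;t exist on hetzner3 (like the bare &amp;quot;1012&amp;quot; and &amp;quot;48&amp;quot; on plugins/ in the listing above, which came from the hetzner2 backup), find&#039;s &amp;quot;-nouser&amp;quot; and &amp;quot;-nogroup&amp;quot; tests can be used. a hedged sketch; the helper name is hypothetical&lt;br /&gt;
```shell
# Print every path under the given dir whose uid or gid has no matching entry
# in the local passwd/group databases (ls shows these as bare numbers).
list_orphaned_owners() {
	find "$1" \( -nouser -o -nogroup \) -print
}

# usage: no output means every file maps to a known user and group
# list_orphaned_owners /var/www/html/store.opensourceecology.org
```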
# ok, then I retry wp-cli; it works!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wp@hetzner3:~$ wp --path=/var/www/html/store.opensourceecology.org/htdocs plugin list&lt;br /&gt;
PHP Warning:  Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
Warning: Undefined array key &amp;quot;HTTP_HOST&amp;quot; in /var/www/html/store.opensourceecology.org/htdocs/wp-content/plugins/vcaching/vcaching.php on line 196&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| name                                           | status   | update | version | update_version | auto_update |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
| akismet                                        | inactive | none   | 5.3.3   |                | off         |&lt;br /&gt;
| classic-editor                                 | inactive | none   | 1.6.5   |                | off         |&lt;br /&gt;
| contact-form-7                                 | active   | none   | 5.9.8   |                | off         |&lt;br /&gt;
| google-authenticator-encourage-user-activation | active   | none   | 0.2     |                | off         |&lt;br /&gt;
| google-authenticator                           | active   | none   | 0.54    |                | off         |&lt;br /&gt;
| hello                                          | inactive | none   | 1.7.1   |                | off         |&lt;br /&gt;
| meta-box                                       | active   | none   | 5.10.2  |                | off         |&lt;br /&gt;
| ssl-insecure-content-fixer                     | active   | none   | 2.7.2   |                | off         |&lt;br /&gt;
| vcaching                                       | active   | none   | 1.8.3   |                | off         |&lt;br /&gt;
| woocommerce                                    | active   | none   | 9.3.3   |                | off         |&lt;br /&gt;
| coingate-for-woocommerce                       | inactive | none   | 2.1.1   |                | off         |&lt;br /&gt;
+------------------------------------------------+----------+--------+---------+----------------+-------------+&lt;br /&gt;
wp@hetzner3:~$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# unfortunately, I get a blank page when I try to load store.opensourceecology.org in my web browser&lt;br /&gt;
# nginx is fine, but the varnish logs show that apache is returning a 403&lt;br /&gt;
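# to pull just the denied filesystem paths out of apache&#039;s error log (the AH01630 authz_core lines, like the one at the top of the log excerpt that follows), a small filter helps. this is a sketch: the helper name is hypothetical, and since the error log&#039;s path isn&#039;t shown here, you&#039;d pipe in whichever file apache writes its errors to&lt;br /&gt;
```shell
# Read apache error-log lines on stdin and print the filesystem path from each
# "AH01630: client denied by server configuration" entry, de-duplicated.
denied_paths() {
	sed -n 's/.*AH01630: client denied by server configuration: \([^,]*\).*/\1/p' | sort -u
}
```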
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Thu Oct 03 04:19:37.076411 2024] [authz_core:error] [pid 3116759:tid 3116768] [client 81.17.16.77:0] AH01630: client denied by server configuration: /var/www/html/store.opensourceecology.org/htdocs/wp-includes/images/w-logo-blue-white-bg.png, referer: https://store.opensourceecology.org/&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; modsec_audit.log &amp;lt;==&lt;br /&gt;
--fd8c6d25-A--&lt;br /&gt;
[03/Oct/2024:04:19:37.076625 +0000] Zv4bWZVyO5GHCka9cecUKwAAAEE 127.0.0.1 40720 127.0.0.1 8000&lt;br /&gt;
--fd8c6d25-B--&lt;br /&gt;
GET /wp-includes/images/w-logo-blue-white-bg.png HTTP/1.1&lt;br /&gt;
X-Real-IP: 81.17.16.77&lt;br /&gt;
X-Forwarded-Proto: https&lt;br /&gt;
X-Forwarded-Port: 443&lt;br /&gt;
Host: store.opensourceecology.org&lt;br /&gt;
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36&lt;br /&gt;
Accept: image/avif,image/webp,*/*&lt;br /&gt;
Accept-Language: en-US,en;q=0.5&lt;br /&gt;
Referer: https://store.opensourceecology.org/&lt;br /&gt;
Sec-Fetch-Dest: image&lt;br /&gt;
Sec-Fetch-Mode: no-cors&lt;br /&gt;
Sec-Fetch-Site: same-origin&lt;br /&gt;
Sec-GPC: 1&lt;br /&gt;
Pragma: no-cache&lt;br /&gt;
Accept-Encoding: gzip&lt;br /&gt;
hash: #store.opensourceecology.org&lt;br /&gt;
X-Varnish: 98343&lt;br /&gt;
&lt;br /&gt;
--fd8c6d25-F--&lt;br /&gt;
HTTP/1.1 403 Forbidden&lt;br /&gt;
X-Frame-Options: SAMEORIGIN&lt;br /&gt;
Content-Length: 199&lt;br /&gt;
Content-Type: text/html; charset=iso-8859-1&lt;br /&gt;
&lt;br /&gt;
--fd8c6d25-E--&lt;br /&gt;
&amp;lt;!DOCTYPE HTML PUBLIC &amp;quot;-//IETF//DTD HTML 2.0//EN&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&lt;br /&gt;
&amp;lt;title&amp;gt;403 Forbidden&amp;lt;/title&amp;gt;&lt;br /&gt;
&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Forbidden&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;You don&#039;t have permission to access this resource.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&lt;br /&gt;
&lt;br /&gt;
--fd8c6d25-H--&lt;br /&gt;
Apache-Error: [file &amp;quot;mod_authz_core.c&amp;quot;] [line 879] [level 3] AH01630: client denied by server configuration: /var/www/html/store.opensourceecology.org/htdocs/wp-includes/images/w-logo-blue-white-bg.png&lt;br /&gt;
Stopwatch: 1727929177076046 856 (- - -)&lt;br /&gt;
Stopwatch2: 1727929177076046 856; combined=26, p1=24, p2=0, p3=0, p4=0, p5=2, sr=0, sw=0, l=0, gc=0&lt;br /&gt;
Response-Body-Transformed: Dechunked&lt;br /&gt;
Producer: ModSecurity for Apache/2.9.7 (http://www.modsecurity.org/).&lt;br /&gt;
Server: Apache&lt;br /&gt;
Engine-Mode: &amp;quot;ENABLED&amp;quot;&lt;br /&gt;
&lt;br /&gt;
--fd8c6d25-Z--&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305938</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305938"/>
		<updated>2025-04-24T17:49:36Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: updated CHG start time to 2025-04-30 11:00 UTC&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:56 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved the start time of this CHG&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes, time is perfect at 6 am. Any day.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 12:38 PM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; When would be a good time to replace the second disk on hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; If we enable daily reboots on hetzner2 at 10:40 UTC, then I propose next&lt;br /&gt;
&amp;gt; week on Wednesday 2025-04-30 11:00 UTC, which is:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;     * 13:00 in Germany (where the server lives)&lt;br /&gt;
&amp;gt;     * 06:00 here in Ecuador, and&lt;br /&gt;
&amp;gt;     * 06:00 at FeF&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; For details about what this change entails, and expected downtime,&lt;br /&gt;
&amp;gt; please see the change ticket:&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;   *&lt;br /&gt;
&amp;gt; https://wiki.opensourceecology.org/wiki/CHG-2025-XX-XX_replace_hetzner2_sda&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Please let me know if you approve this change, if the suggested time is&lt;br /&gt;
&amp;gt; agreeable to you, and if you have any questions.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-30 11:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-30 06:00 Kansas City, US&lt;br /&gt;
* = 2025-04-30 06:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250430T110000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
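&lt;br /&gt;
The local times above can be double-checked with GNU date and the system tz database. This is a minimal sketch: the function name is hypothetical, and America/Chicago (Kansas City) and Europe/Berlin (Germany, where the server lives) are the tz names assumed for these locations&lt;br /&gt;
```shell
# Print the CHG start time in each time zone of interest.
# Requires GNU date (for -d) and the tz database (/usr/share/zoneinfo).
chg_local_times() {
	start_utc="$1"    # e.g. "2025-04-30 11:00 UTC"
	for zone in America/Chicago America/Guayaquil Europe/Berlin; do
		printf '%s %s\n' "${zone}" "$(TZ="${zone}" date -d "${start_utc}" '+%Y-%m-%d %H:%M')"
	done
}

chg_local_times "2025-04-30 11:00 UTC"
```
With the tz database installed, this prints 06:00 for both American zones and 13:00 for Berlin, matching the times in the email above.&lt;br /&gt;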
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs EOL CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our two redundant disks are expected to fail within 24 hours, so we should replace them immediately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
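&lt;br /&gt;
To pull just the remaining-life figure out of output like the above, a small awk filter on attribute 202 is enough. This is a sketch that assumes, consistent with the &amp;quot;0% life left&amp;quot; in the emails above, that the normalized VALUE column (000 here) is the remaining life, and that on these drives the RAW_VALUE of 100 is instead the percent used. The function name is hypothetical&lt;br /&gt;
```shell
# Read `smartctl -A` output on stdin and print the normalized VALUE of
# attribute 202 (Percent_Lifetime_Remain) as a plain number, e.g. 0.
lifetime_remaining() {
	awk '$2 == "Percent_Lifetime_Remain" { print $4 + 0 }'
}

# usage:
# for disk in /dev/sda /dev/sdb; do echo "${disk} $(smartctl -A "${disk}" | lifetime_remaining)"; done
```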
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, because we&#039;ll only have one complete copy of the data (no redundancy). This is especially risky because both of the disks currently report that they&#039;re expected to fail within 24 hours.&lt;br /&gt;
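&lt;br /&gt;
While the mirror rebuilds, its progress shows up in /proc/mdstat as a &amp;quot;recovery = N%&amp;quot; field. Here is a hedged sketch of a filter that prints just that field (the function name is hypothetical), which could be run periodically, for example under watch&lt;br /&gt;
```shell
# Print the rebuild progress field(s) from /proc/mdstat-style text on stdin,
# e.g. "recovery = 12.6%". Prints nothing when no array is rebuilding.
rebuild_progress() {
	grep -oE '(recovery|resync) *= *[0-9.]+%'
}

# usage:
# cat /proc/mdstat | rebuild_progress
```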
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
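&lt;br /&gt;
In /proc/mdstat, each array&#039;s member status appears as a bracketed field like [UU] for a healthy two-disk RAID1, with an underscore marking a missing member (e.g. [_U]). Before intentionally failing sda, it&#039;s worth confirming that no array is already degraded. A hedged sketch of that check; the function name is hypothetical&lt;br /&gt;
```shell
# Succeed (exit 0) only if no md array in the given mdstat file is degraded,
# i.e. no member-status field such as [UU] contains a "_".
all_arrays_healthy() {
	! grep -qE '\[U*_[U_]*\]' "${1:-/proc/mdstat}"
}

# usage:
# all_arrays_healthy || echo "a RAID array is already degraded; stop here"
```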
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
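&lt;br /&gt;
&amp;quot;A sane size&amp;quot; can be made explicit by checking the size column that rclone ls prints before each path. This is a hedged sketch: the function name and the threshold in the usage line are hypothetical, so pick a minimum based on the size of recent backups&lt;br /&gt;
```shell
# Read `rclone ls` output ("SIZE PATH" per line) on stdin and succeed only if
# at least one listed object is at least MIN_BYTES (the first argument) large.
backup_is_sane() {
	awk -v min="$1" '$1 >= min + 0 { found = 1 } END { exit !found }'
}

# usage, after sourcing /root/backups/backup.settings as above
# (the 1 GB minimum is only an example):
# ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | grep "$(date "+%Y%m%d")" | backup_is_sane 1000000000
```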
&lt;br /&gt;
At the scheduled time (early afternoon in Germany, and also very shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
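&lt;br /&gt;
During the earlier [[CHG-2025-04-24_replace_hetzner2_sdb|sdb replacement]], Hetzner also disconnected the wrong disk. Because of that, it may be worth failing loudly if /dev/sda still reports the old serial before any partition table is overwritten. Below is a minimal sketch of such a guard. It assumes that &amp;lt;code&amp;gt;udevadm info --query=property&amp;lt;/code&amp;gt; prints an &amp;lt;code&amp;gt;ID_SERIAL=&amp;lt;/code&amp;gt; line in the same Model_Serial form that was entered in the Hetzner request. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers; not existing OSE tooling.

# Print the ID_SERIAL value from "udevadm info --query=property" output on stdin.
# ID_SERIAL_SHORT lines are not matched because "=" must follow ID_SERIAL directly.
udev_serial() {
  sed -n 's/^ID_SERIAL=//p'
}

# Succeed only if the disk described on stdin no longer reports OLD_SERIAL.
# Returns 1 if the old serial is still present, 2 if no serial was found.
assert_replaced() {
  local old_serial="$1" current
  current=$(udev_serial)
  if [ -z "$current" ]; then
    echo "no ID_SERIAL found; is the disk present?"
    return 2
  fi
  if [ "$current" = "$old_serial" ]; then
    echo "disk still reports the old serial ${old_serial}; STOP"
    return 1
  fi
  echo "disk serial is now ${current}"
}

# Example usage (serial of the failing sda, as entered in the Hetzner request):
#   udevadm info --query=property --name sda | assert_replaced Crucial_CT250MX200SSD1_154410FA336C
```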
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the healthy disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
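&lt;br /&gt;
Before adding the new partitions to the RAID, it may be worth confirming that the copy produced an identical layout on both disks. Below is a minimal sketch. It assumes fresh &amp;lt;code&amp;gt;sfdisk --dump&amp;lt;/code&amp;gt; output of each disk has been saved to a file; the function names and file names are hypothetical. The sketch replaces every /dev/sdX device name with a placeholder and then compares the two dumps. Any difference in partition start, size, or type makes it fail. Header lines such as the disk label-id are compared too.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers; not existing OSE tooling.

# Print an sfdisk dump with every /dev/sdX name replaced by /dev/DISK,
# so that dumps of two different disks can be compared line by line.
normalize_dump() {
  sed -E 's#/dev/sd[a-z]+#/dev/DISK#g' "$1"
}

# Succeed only if two sfdisk dumps describe the same partition layout.
same_layout() {
  [ "$(normalize_dump "$1")" = "$(normalize_dump "$2")" ]
}

# Example usage (file names are assumptions):
#   sfdisk --dump /dev/sda > "${chg_dir}/sda_parttable_new.dump"
#   sfdisk --dump /dev/sdb > "${chg_dir}/sdb_parttable_new.dump"
#   same_layout "${chg_dir}/sda_parttable_new.dump" "${chg_dir}/sdb_parttable_new.dump" || echo "layouts differ; STOP"
```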
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
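Install the grub bootloader onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hetzner2 runs CentOS 7, where the GRUB2 installer is named grub2-install&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;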
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
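&lt;br /&gt;
The loop above relies on someone reading its output. Below is a minimal sketch of an automatic check; the function name is hypothetical. It assumes the /proc/mdstat format seen during these CHGs. In that format, a missing mirror member shows as an underscore in the status field (e.g. [U_]), and a rebuild prints a &amp;quot;recovery =&amp;quot; or &amp;quot;resync =&amp;quot; progress line.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper; not existing OSE tooling.

# Succeed only if the mdstat file (default /proc/mdstat) shows every array
# with all members up and no rebuild in progress.
raid_is_synced() {
  local mdstat_file="${1:-/proc/mdstat}"
  [ -r "$mdstat_file" ] || return 2   # unreadable file: cannot claim it is synced
  # a missing member shows up as "_" in the status field, e.g. [U_]
  if grep -Eq '\[[U_]*_[U_]*]' "$mdstat_file"; then
    return 1
  fi
  # a running rebuild prints a progress line such as "recovery = 96.5%"
  if grep -Eq '(recovery|resync) *=' "$mdstat_file"; then
    return 1
  fi
  return 0
}

# Example usage: poll every 5 minutes until the mirror is fully rebuilt
#   until raid_is_synced; do date; cat /proc/mdstat; echo; sleep 300; done
```

The &amp;lt;code&amp;gt;until&amp;lt;/code&amp;gt; form keeps the same 5-minute polling interval as the loop above, but it stops on its own once no array is degraded or rebuilding.&lt;br /&gt;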
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that grub is still functioning as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305937</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305937"/>
		<updated>2025-04-24T17:47:28Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* See Also */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 16:18 UTC==&lt;br /&gt;
&lt;br /&gt;
The new disk has been fully in sync with the old (failing) disk since sometime between 15:15 and 15:20 UTC&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SMART now reports /dev/sdb as PASSED, while /dev/sda is still FAILED&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Full SMART attributes of both disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I&#039;m marking this change as completed successfully. Next, we replace the other failing disk. See [[CHG-2025-XX-XX_replace_hetzner2_sda]]&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner messed up and disconnected &#039;&#039;&#039;both&#039;&#039;&#039; disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a result, /dev/sda was listed as /dev/sdc, the new drive was /dev/sdb, and dmesg was being spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the WUI to trigger a reboot and, thankfully, the server came up again. I immediately took down all the web services while I investigated the damage and triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add it to the RAID. At the time of writing, both swap and boot are synced (and grub is installed on the new disk), and the root partition on the new disk is still syncing (currently at 35% and writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
When the backup finished uploading, I put the web services back online and typed this status message.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
The form says we should expect the new disk to be inserted within 2-4 hours. One part of the form said the swap would happen without downtime, but the (required) checkbox at the bottom asked me to acknowledge that downtime is required, so it&#039;s ambiguous.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
Because the RAID wasn&#039;t degraded, mdadm refused to hot-remove the active sdb partitions, so I first had to mark each of them as failed&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, a database corruption event took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says both of our redundant disks are going to fail within 24 hours, so we should replace them immediately&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating grub should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable as we&#039;ll only have one disk (no redundancy). This is worse because both of the disks currently say they&#039;re going to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
During Germany&#039;s morning (shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
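# mark all sdb partitions as failed; mdadm refuses to hot-remove an active member&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# remove all sdb partitions from our software RAID&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;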
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted; meanwhile, watch the kernel log:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, gather some information about it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
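&lt;br /&gt;
To make the serial-number comparison less error-prone, here is a minimal sketch (not part of the original procedure) that extracts ID_SERIAL from the udevadm property output. The function name disk_serial is hypothetical, and it assumes udevadm reports the serial in the same model_serial form used in this ticket (e.g. Crucial_CT250MX200SSD1_154410FA4520).&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the ID_SERIAL value from
# "udevadm info --query=property" output read on stdin.
disk_serial() {
  awk -F= '$1 == "ID_SERIAL" { print $2 }'
}

# Usage on the server (as root), assuming the failed sdb is the drive
# whose serial was entered in the Hetzner support request above:
#   old_sdb='Crucial_CT250MX200SSD1_154410FA4520'
#   new_sdb=$(udevadm info --query=property --name sdb | disk_serial)
#   if [ "$new_sdb" = "$old_sdb" ]; then echo 'sdb was NOT replaced'; fi
```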
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s back them up:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sda) to the new disk (sdb):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
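&lt;br /&gt;
To confirm that the copy worked, here is a minimal sketch (not from the original runbook) that keeps only the partition lines of sfdisk -d output and replaces each device name with a neutral label, so the layouts of both disks can be compared with diff. The function name normalize_dump is hypothetical, and it assumes the /dev/sdXN : start=... line format printed by sfdisk -d, which may differ slightly between util-linux versions.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: keep only the partition lines of "sfdisk -d" output
# read on stdin, replacing the device name (e.g. /dev/sda1) with part1, so
# that dumps taken from two different disks can be compared directly.
normalize_dump() {
  sed -n 's|^/dev/[a-z]*\([0-9]*\) *: *|part\1 : |p'
}

# Usage on the server (as root), reusing chg_dir from the backup step:
#   sfdisk -d /dev/sda | normalize_dump > "${chg_dir}/sda_layout.txt"
#   sfdisk -d /dev/sdb | normalize_dump > "${chg_dir}/sdb_layout.txt"
#   if diff "${chg_dir}/sda_layout.txt" "${chg_dir}/sdb_layout.txt"; then echo 'partition tables match'; fi
```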
&lt;br /&gt;
Tell the kernel to re-read the new disk&#039;s partition table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive&#039;s partitions to the RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
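Install the GRUB boot loader onto the new disk with &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;, so the server can still boot from it if the other disk fails (on CentOS 7 the command may be named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;):&lt;br /&gt;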
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Run this command to monitor the progress of the RAID rebuild:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still works as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether a revert is even possible. It would require asking Hetzner to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after the database corruption event in April)&lt;br /&gt;
# [[CHG-2025-04-30_replace_hetzner2_sda]] replacement of the second disk&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-XX-XX_replace_hetzner2_sda&amp;diff=305936</id>
		<title>CHG-2025-XX-XX replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-XX-XX_replace_hetzner2_sda&amp;diff=305936"/>
		<updated>2025-04-24T17:46:58Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Maltfield moved page CHG-2025-XX-XX replace hetzner2 sda to CHG-2025-04-30 replace hetzner2 sda: marcin approved the date &amp;amp; time of this CHG&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[CHG-2025-04-30 replace hetzner2 sda]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305935</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305935"/>
		<updated>2025-04-24T17:46:58Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: Maltfield moved page CHG-2025-XX-XX replace hetzner2 sda to CHG-2025-04-30 replace hetzner2 sda: marcin approved the date &amp;amp; time of this CHG&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) in [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database would not start because it was corrupt, and a bug in MariaDB prevented it from recovering. Because hetzner2 runs an end-of-life (EOL) release of CentOS, we cannot update MariaDB. While I don&#039;t think the corruption was caused by disk failure, SMART reports that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
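&lt;br /&gt;
To pull the remaining-life figure out of this output without reading the whole table, here is a minimal sketch (not part of the original ticket) that prints the normalized VALUE of attribute 202 (Percent_Lifetime_Remain) from smartctl -A output. The function name lifetime_remaining is hypothetical, and it assumes that on these Crucial SSDs the normalized VALUE is the percentage of rated life remaining (000 on both disks above, while RAW_VALUE reads 100), which appears to be the 0% life left figure mentioned in the email.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the normalized VALUE (4th column) of SMART
# attribute 202 from "smartctl -A" output read on stdin; exit non-zero if
# the attribute is missing. Assumes VALUE = percent of rated life remaining.
lifetime_remaining() {
  awk '$1 == 202 { print $4 + 0; found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on the server (as root):
#   smartctl -A /dev/sda | lifetime_remaining
```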
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror might take a day or more. During this time we&#039;ll be vulnerable because only one disk holds the data (no redundancy). The risk is higher than usual because SMART reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second (redundant) disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Early in the German morning (shortly after our daily backups complete), run these commands to mark the drive as failed and remove it from our RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# mark all sda partitions as failed, then remove them from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted; meanwhile, watch the kernel log:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, gather some information about it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s back them up:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the new disk&#039;s partition table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive&#039;s partitions to the RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
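Install the GRUB boot loader onto the new disk with &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;, so the server can still boot from it if the other disk fails (on CentOS 7 the command may be named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;):&lt;br /&gt;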
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Run this command to monitor the progress of the RAID rebuild:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
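&lt;br /&gt;
Instead of judging the monitoring loop by eye, the wait can be automated. The following minimal sketch (not part of the original procedure) succeeds only when the /proc/mdstat text shows no resync, recovery, or reshape in progress and no array status with a missing member such as [U_]. The function name raid_in_sync is hypothetical, and it assumes the two-disk RAID1 arrays (md0, md1, md2) used on hetzner2.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and exit 0 only
# if no array is rebuilding and no member-status field contains "_"
# (e.g. [U_] or [_U] means a mirror half is missing).
raid_in_sync() {
  awk '
    /recovery|resync|reshape/ { busy = 1 }
    /\[U*_[U_]*]/ { degraded = 1 }
    END { exit (busy || degraded) }
  '
}

# Usage on the server: poll every 5 minutes until the rebuild finishes
#   until cat /proc/mdstat | raid_in_sync; do date; sleep 300; done
```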
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still works as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether a revert is even possible. It would require asking Hetzner to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305934</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305934"/>
		<updated>2025-04-24T17:32:15Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: marcin approved buying a new disk for hetzner2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 17:37 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved purchasing a new disk for this replacement&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
On Thu, Apr 24, 2025, 9:37 AM Michael Altfield &amp;lt;me4eapr@disroot.org&amp;gt; wrote:&lt;br /&gt;
&lt;br /&gt;
&amp;gt; Hey Marcin,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Would you authorize spending €41.18 on a new disk for your server?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Update: Your websites are back online. The RAID is still syncing.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; I was a bit disappointed to learn that hetzner replaced a disk with 0%&lt;br /&gt;
&amp;gt; &amp;quot;life left&amp;quot; with a disk with 4% &amp;quot;life left&amp;quot;. That&#039;s what we get for&lt;br /&gt;
&amp;gt; choosing the free disk replacement..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; The &amp;quot;free&amp;quot; option said it would replace it with a &amp;quot;Replacement drive&lt;br /&gt;
&amp;gt; nearly new or used and tested; depends on what is in stock.&amp;quot; Obviously&lt;br /&gt;
&amp;gt; they didn&#039;t give us a &amp;quot;nearly new&amp;quot; drive..&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Your other disk is also at 0% &amp;quot;life left&amp;quot;. I was already planning on&lt;br /&gt;
&amp;gt; replacing that one next week too, but I would recommend that you pay for&lt;br /&gt;
&amp;gt; a new drive for this one. The cost listed on the website is €41.18.&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Do you authorize me selecting €41.18 for the replacement of /dev/sda on&lt;br /&gt;
&amp;gt; hetzner2?&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Thank you,&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Michael Altfield&lt;br /&gt;
&amp;gt; https://www.michaelaltfield.net&lt;br /&gt;
&amp;gt; PGP Fingerprint: 0465 E42F 7120 6785 E972  644C FE1B 8449 4E64 0D41&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; Note: If you cannot reach me via email, please check to see if I have&lt;br /&gt;
&amp;gt; changed my email address by visiting my website at&lt;br /&gt;
&amp;gt; https://email.michaelaltfield.net&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) in [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database would not start because it was corrupt, and a bug in MariaDB prevented it from recovering. Because hetzner2 runs an end-of-life (EOL) release of CentOS, we cannot update MariaDB. While I don&#039;t think the corruption was caused by disk failure, SMART reports that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
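&lt;br /&gt;
To pull only the failing attributes out of long smartctl -A output like the above, a small filter can help. This is a minimal sketch, assuming the 10-column layout shown above (WHEN_FAILED is the 9th column). The function name smart_failing_attrs is hypothetical and not part of smartmontools.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID, name and WHEN_FAILED for every SMART attribute
# whose WHEN_FAILED column is not "-" (e.g. FAILING_NOW or In_the_past).
# Reads `smartctl -A` output on stdin; assumes the column layout shown above.
smart_failing_attrs() {
	awk '
		# attribute rows start with a numeric ID; header and banner lines do not
		$1 ~ /^[0-9]+$/ {
			if ($9 != "-") print $1, $2, $9
		}
	'
}

# usage (as root, requires smartmontools):
#   smartctl -A /dev/sda | smart_failing_attrs
```
&lt;br /&gt;
For the output above, this would list only attribute 202 (Percent_Lifetime_Remain, FAILING_NOW) for each disk.&lt;br /&gt;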
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable because we&#039;ll only have one in-sync disk (no redundancy). The risk is higher than usual because both disks currently report that they will fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
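&lt;br /&gt;
A healthy two-disk RAID1 shows [UU] for each array in /proc/mdstat; an underscore (for example [_U]) marks a missing member. As a sketch, the hypothetical helper below lists any array whose member-status field contains an underscore, so an empty result means every array is complete. It assumes the usual mdstat layout of an &amp;quot;mdN : active raid1 ...&amp;quot; line followed by a line ending in the status field.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: list md arrays with a missing member in /proc/mdstat.
# Assumes the usual layout: an "mdN : active raid1 ..." line followed by a
# line ending in a member-status field such as [UU] or [_U].
md_degraded() {
	awk '
		# remember which array the following lines belong to
		/^md[0-9]+ :/ { md = $1 }
		# member-status field: U = in sync, _ = missing or failed
		$NF ~ /^\[[U_]+\]$/ {
			if (index($NF, "_") != 0) print md
		}
	'
}

# usage:
#   cat /proc/mdstat | md_degraded
```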
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
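&lt;br /&gt;
The grep above shows whether today&#039;s files exist, but judging &amp;quot;a sane size&amp;quot; is left to the eye. A minimal sketch of an automated check, assuming the usual &amp;quot;size path&amp;quot; lines printed by rclone ls: the hypothetical function below sums the sizes of files whose path contains a date stamp and fails if the total is below a threshold. The threshold is an assumption; choose it from the sizes of recent backups.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the sizes of files in `rclone ls` output (stdin)
# whose path contains STAMP, print the total, and return non-zero if the total
# is smaller than MIN_BYTES (an assumed threshold, not a value from our config).
backup_size_ok() {
	local stamp="$1" min_bytes="$2" total
	total="$(awk -v stamp="$stamp" '
		{
			# strip the leading size column so only the path is searched
			path = $0
			sub(/^ *[0-9]+ /, "", path)
			if (index(path, stamp) != 0) sum += $1
		}
		END { printf "%.0f\n", sum }
	')"
	echo "${stamp}: ${total} bytes"
	[ "$total" -ge "$min_bytes" ]
}

# usage (after sourcing backup.settings as above; 1000000000 is only an example threshold):
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_size_ok "$(date '+%Y%m%d')" 1000000000
```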
&lt;br /&gt;
In the morning, German time (and very shortly after our daily backups complete), execute these commands to remove the failing drive (sda) from our RAID arrays&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
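&lt;br /&gt;
The six commands above follow a fixed pattern (md0 holds sda1, md1 holds sda2, md2 holds sda3). As a sketch, the same steps can be written as a loop; mdadm&#039;s manage mode accepts --fail and --remove for the same device in one invocation. The function name and the RUN switch are hypothetical; by default it only prints the commands so they can be reviewed first.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: fail and remove partitions 1-3 of DISK from md0-md2,
# assuming the mapping mdN = DISK(N+1) used on hetzner2.
# Prints the commands by default; set RUN=1 to actually execute them as root.
detach_disk_from_md() {
	local disk="$1" n part cmd
	for n in 0 1 2; do
		part="/dev/${disk}$((n + 1))"
		cmd="mdadm --manage /dev/md${n} --fail ${part} --remove ${part}"
		if [ "${RUN:-0}" = "1" ]; then
			# intentionally unquoted: split the string back into arguments
			$cmd
		else
			echo "$cmd"
		fi
	done
}

# usage:
#   detach_disk_from_md sda          # review the commands
#   RUN=1 detach_disk_from_md sda    # execute them
```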
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;At cost&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is unchanged and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
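&lt;br /&gt;
To make the serial-number comparison explicit rather than visual, here is a minimal sketch. It assumes the ID_SERIAL property printed by udevadm has the same model_serial form as the string entered in the Hetzner support form above; the function and variable names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers: extract ID_SERIAL from `udevadm info --query=property`
# output (stdin) and check that it differs from the serial we asked Hetzner to replace.
OLD_SDA_SERIAL="Crucial_CT250MX200SSD1_154410FA336C"

id_serial() {
	# print the value of the ID_SERIAL= line (ignores ID_SERIAL_SHORT)
	awk -F= '$1 == "ID_SERIAL" { print $2; exit }'
}

disk_was_swapped() {
	local new_serial
	new_serial="$(id_serial)"
	echo "current sda serial: ${new_serial}"
	# 2 = no serial found, 1 = still the old disk, 0 = a different disk
	[ -n "$new_serial" ] || return 2
	[ "$new_serial" != "$OLD_SDA_SERIAL" ]
}

# usage:
#   udevadm info --query=property --name=sda | disk_was_swapped
```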
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it onto the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
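&lt;br /&gt;
Before adding the new partitions to the RAID, it can be worth confirming that the copy produced the same layout. A minimal sketch, assuming sfdisk&#039;s dump format: the hypothetical helpers below drop comment lines and the fields that legitimately differ between disks (label-id, device, and the /dev/sdX names) and then compare the rest.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare two `sfdisk -d` dumps, ignoring comment lines,
# the label-id/device header fields and the /dev/sdX device names.
normalize_sfdisk_dump() {
	sed -E \
		-e '/^#/d' \
		-e '/^(label-id|device):/d' \
		-e 's#/dev/sd[a-z]+#/dev/sdX#g' \
		"$1"
}

same_partition_layout() {
	local a b
	a="$(normalize_sfdisk_dump "$1")"
	b="$(normalize_sfdisk_dump "$2")"
	if [ "$a" = "$b" ]; then
		echo "partition layouts match"
	else
		echo "partition layouts DIFFER: $1 vs $2"
		return 1
	fi
}

# usage (the "after" dump file name is hypothetical):
#   sfdisk -d /dev/sda > "${chg_dir}/sda_parttable_after.dump"
#   same_partition_layout "${chg_dir}/sda_parttable_after.dump" "${chg_dir}/sdb_parttable_mbr.bak"
```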
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
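&lt;br /&gt;
To confirm that the kernel now sees the new partitions, one option is to read /proc/partitions. A small sketch, assuming its standard &amp;quot;major minor #blocks name&amp;quot; columns; the helper name and its optional file argument are hypothetical. After the re-read, sda should list the same partition numbers as sdb.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the partitions of DISK that the kernel knows about,
# read from /proc/partitions (or from FILE, which makes it easy to test).
kernel_partitions() {
	local disk="$1" file="${2:-/proc/partitions}"
	# 4th column is the device name; match DISK followed by a partition number
	awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4 }' "$file"
}

# usage:
#   kernel_partitions sda
#   kernel_partitions sdb
```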
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
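Install GRUB&#039;s boot code onto the new disk. On CentOS 7 (which hetzner2 runs), the GRUB 2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; rather than &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;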
&amp;lt;/pre&amp;gt;&lt;br /&gt;
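&lt;br /&gt;
A quick way to spot-check that GRUB&#039;s boot code landed in the new disk&#039;s first sector, assuming a BIOS/MBR setup as the partition-table backup names above suggest: GRUB 2&#039;s boot sector image embeds the ASCII string &amp;quot;GRUB&amp;quot;, so searching the first 512 bytes for it is a rough heuristic, not a full verification. The helper name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the first 512-byte sector of DEV contains the
# string "GRUB" (a rough heuristic for "GRUB boot code is installed here").
mbr_has_grub() {
	local dev="$1"
	dd if="$dev" bs=512 count=1 2>/dev/null | grep -a -q 'GRUB'
}

# usage (as root): compare the new disk with the surviving one
#   mbr_has_grub /dev/sda; echo "sda: $?"
#   mbr_has_grub /dev/sdb; echo "sdb: $?"
```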
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
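&lt;br /&gt;
While the rebuild runs, /proc/mdstat shows a progress line with a percentage and an estimated finish= time. As a sketch, the hypothetical filter below reduces that to one line per rebuilding array (name, percent, estimated time remaining), which is easier to scan in the loop above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read /proc/mdstat on stdin and print one line per array
# that is rebuilding, in the form: NAME PERCENT ETA
md_rebuild_progress() {
	awk '
		# remember which array the following lines belong to
		/^md[0-9]+ :/ { md = $1 }
		# progress lines look like: [==>....]  recovery = 12.6% (...) finish=18.7min speed=...
		/(recovery|resync) *=/ {
			pct = "?"; eta = "?"
			if (match($0, /[0-9.]+%/)) pct = substr($0, RSTART, RLENGTH)
			if (match($0, /finish=[^ ]+/)) eta = substr($0, RSTART + 7, RLENGTH - 7)
			print md, pct, eta
		}
	'
}

# usage: replaces the plain cat in the monitoring loop above
#   while true; do date; cat /proc/mdstat | md_rebuild_progress; echo; sleep 300; done
```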
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that grub is still functioning as-expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
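&lt;br /&gt;
To avoid rebooting while a rebuild is still running, the &amp;quot;sync is complete&amp;quot; check can be scripted. A minimal sketch: the hypothetical function below succeeds only when /proc/mdstat shows no recovery/resync/reshape/check activity and every member-status field is all U.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if no md array is rebuilding/resyncing and no
# member-status field (e.g. [_U]) shows a missing disk. Reads /proc/mdstat on stdin.
md_sync_idle() {
	awk '
		BEGIN { busy = 0 }
		/recovery|resync|reshape|check/ { busy = 1 }
		$NF ~ /^\[[U_]+\]$/ {
			if (index($NF, "_") != 0) busy = 1
		}
		END { exit busy }
	'
}

# usage:
#   if cat /proc/mdstat | md_sync_idle; then echo "safe to reboot"; else echo "still syncing"; fi
```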
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=Hetzner3&amp;diff=305933</id>
		<title>Hetzner3</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=Hetzner3&amp;diff=305933"/>
		<updated>2025-04-24T16:57:55Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: add CHG for osemain migration&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This article is about the migration from &amp;quot;[[Hetzner2]]&amp;quot; to &amp;quot;Hetzner3&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For more general information about the OSE Server, &#039;&#039;&#039;you probably want to see [[OSE Server]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
=Initial Provisioning=&lt;br /&gt;
&lt;br /&gt;
{{Hint|For a verbose log of the project to provision the Hetzner3 server, see [[Maltfield_Log/2024_Q3]], [[Maltfield_Log/2024_Q4]], and [[Maltfield_Log/2025_Q1]]}}&lt;br /&gt;
&lt;br /&gt;
==OS Install==&lt;br /&gt;
&lt;br /&gt;
We used hetzner&#039;s [https://docs.hetzner.com/robot/dedicated-server/operating-systems/installimage/ installimage] tool to install Debian 12 on hetzner3.&lt;br /&gt;
&lt;br /&gt;
We kept all the defaults, except the hostname.&lt;br /&gt;
&lt;br /&gt;
The two NVMe disks were set up in a software RAID1 with a 32G swap, a 1G &#039;/boot&#039;, and the rest for &#039;/&#039;.&lt;br /&gt;
&lt;br /&gt;
==Initial Hardening==&lt;br /&gt;
&lt;br /&gt;
After the OS&#039;s first boot, I ([[User:Maltfield|Michael Altfield]]) ran a quick set of commands to create a user for myself, do basic ssh hardening, and set up a basic firewall that blocks everything except ssh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
adduser maltfield --disabled-password --gecos &#039;&#039;&lt;br /&gt;
groupadd sshaccess&lt;br /&gt;
gpasswd -a maltfield sshaccess&lt;br /&gt;
mkdir /home/maltfield/.ssh/&lt;br /&gt;
echo &amp;quot;ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== maltfield@ose&amp;quot; &amp;gt; /home/maltfield/.ssh/authorized_keys&lt;br /&gt;
chown -R maltfield:maltfield /home/maltfield/.ssh&lt;br /&gt;
chmod -R 0600 /home/maltfield/.ssh&lt;br /&gt;
chmod 0700 /home/maltfield/.ssh&lt;br /&gt;
&lt;br /&gt;
# without this, apt-get may get stuck&lt;br /&gt;
export DEBIAN_FRONTEND=noninteractive&lt;br /&gt;
&lt;br /&gt;
apt-get update&lt;br /&gt;
apt-get -y install iptables iptables-persistent&lt;br /&gt;
apt-get -y purge nftables&lt;br /&gt;
&lt;br /&gt;
update-alternatives --set iptables /usr/sbin/iptables-legacy&lt;br /&gt;
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy&lt;br /&gt;
update-alternatives --set arptables /usr/sbin/arptables-legacy&lt;br /&gt;
update-alternatives --set ebtables /usr/sbin/ebtables-legacy&lt;br /&gt;
&lt;br /&gt;
iptables -A INPUT -i lo -j ACCEPT&lt;br /&gt;
iptables -A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j DROP&lt;br /&gt;
iptables -A INPUT -p icmp -j ACCEPT&lt;br /&gt;
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT&lt;br /&gt;
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT&lt;br /&gt;
iptables -A INPUT -j DROP&lt;br /&gt;
&lt;br /&gt;
iptables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT&lt;br /&gt;
iptables -A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT&lt;br /&gt;
iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT&lt;br /&gt;
iptables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT&lt;br /&gt;
iptables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT&lt;br /&gt;
iptables -A OUTPUT -m limit --limit 5/min -j LOG --log-prefix &amp;quot;iptables denied: &amp;quot; --log-level 7&lt;br /&gt;
iptables -A OUTPUT -j DROP&lt;br /&gt;
&lt;br /&gt;
ip6tables -A INPUT -i lo -j ACCEPT&lt;br /&gt;
ip6tables -A INPUT -s ::1/128 -d ::1/128 -j DROP&lt;br /&gt;
ip6tables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT&lt;br /&gt;
ip6tables -A INPUT -j DROP&lt;br /&gt;
&lt;br /&gt;
ip6tables -A OUTPUT -s ::1/128 -d ::1/128 -j ACCEPT&lt;br /&gt;
ip6tables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT&lt;br /&gt;
ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT&lt;br /&gt;
ip6tables -A OUTPUT -m owner --uid-owner 42 -j ACCEPT&lt;br /&gt;
ip6tables -A OUTPUT -m owner --uid-owner 1000 -j ACCEPT&lt;br /&gt;
ip6tables -A OUTPUT -j DROP&lt;br /&gt;
&lt;br /&gt;
iptables-save &amp;gt; /etc/iptables/rules.v4&lt;br /&gt;
ip6tables-save &amp;gt; /etc/iptables/rules.v6&lt;br /&gt;
&lt;br /&gt;
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.orig.`date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;`&lt;br /&gt;
grep &#039;Port 32415&#039; /etc/ssh/sshd_config || echo &#039;Port 32415&#039; &amp;gt;&amp;gt; /etc/ssh/sshd_config&lt;br /&gt;
grep &#039;AllowGroups sshaccess&#039; /etc/ssh/sshd_config || echo &#039;AllowGroups sshaccess&#039; &amp;gt;&amp;gt; /etc/ssh/sshd_config&lt;br /&gt;
grep &#039;PermitRootLogin no&#039; /etc/ssh/sshd_config || echo &#039;PermitRootLogin no&#039; &amp;gt;&amp;gt; /etc/ssh/sshd_config&lt;br /&gt;
grep &#039;PasswordAuthentication no&#039; /etc/ssh/sshd_config || echo &#039;PasswordAuthentication no&#039; &amp;gt;&amp;gt; /etc/ssh/sshd_config&lt;br /&gt;
systemctl restart sshd.service&lt;br /&gt;
&lt;br /&gt;
apt-get -y upgrade&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
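&lt;br /&gt;
Because the commands above append settings to the end of sshd_config, it is worth confirming the settings sshd actually uses. Per sshd_config(5), for most keywords the first value obtained wins, so an earlier line (or a file pulled in by an Include) can silently override an appended one. &amp;quot;sshd -T&amp;quot; prints the effective configuration as lowercase &amp;quot;keyword value&amp;quot; lines. Below is a minimal sketch of a check against that output (the function name is hypothetical); running it before closing the current ssh session reduces the risk of a lockout.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: check `sshd -T` output (stdin) for the hardening settings
# applied above. Prints each problem found and exits non-zero if there are any.
check_sshd_effective() {
	awk '
		BEGIN { bad = 0 }
		$1 == "port" { port[$2] = 1 }
		$1 == "permitrootlogin" { prl = $2 }
		$1 == "passwordauthentication" { pwa = $2 }
		# allowgroups may list several groups; look for sshaccess among them
		$1 == "allowgroups" {
			if (index(" " $0 " ", " sshaccess ") != 0) ag = 1
		}
		END {
			if (!("32415" in port)) { print "Port 32415 is not active"; bad = 1 }
			if (prl != "no") { print "PermitRootLogin is " prl; bad = 1 }
			if (pwa != "no") { print "PasswordAuthentication is " pwa; bad = 1 }
			if (ag != 1) { print "AllowGroups does not include sshaccess"; bad = 1 }
			exit bad
		}
	'
}

# usage (as root):
#   sshd -T | check_sshd_effective
```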
&lt;br /&gt;
After all the packages updated, I gave my new user sudo permission&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail ~ # cp /etc/sudoers /etc/sudoers.20240731.orig&lt;br /&gt;
root@mail ~ # &lt;br /&gt;
&lt;br /&gt;
root@mail ~ # visudo&lt;br /&gt;
root@mail ~ # &lt;br /&gt;
&lt;br /&gt;
root@mail ~ # diff /etc/sudoers.20240731.orig /etc/sudoers&lt;br /&gt;
47a48&lt;br /&gt;
&amp;gt; maltfield ALL=(ALL:ALL) NOPASSWD:ALL&lt;br /&gt;
root@mail ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Ansible==&lt;br /&gt;
&lt;br /&gt;
After basic, manual hardening was done, we used Ansible to further provision and configure Hetzner3.&lt;br /&gt;
&lt;br /&gt;
The Ansible playbook that we use is called &amp;lt;code&amp;gt;provision.yml&amp;lt;/code&amp;gt;. It contains some public and many custom ansible roles. All of this is available on our GitHub:&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible&lt;br /&gt;
&lt;br /&gt;
First, we used ansible to push-out only the highest-priority roles for hardening the server: dev-sec.ssh-hardening, mikegleasonjr.firewall, maltfield.wazuh, maltfield.unattended-upgrades&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Sat_Sep_14.2C_2024&lt;br /&gt;
&lt;br /&gt;
The ssh role didn&#039;t create new sshd keys with our hardened specifications, so I did this manually&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -czvf /etc/ssh.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).tar.gz /etc/ssh/*&lt;br /&gt;
&lt;br /&gt;
cd /etc/ssh/&lt;br /&gt;
# enter no passphrase for each command individually (-N can automate this, but only on some distros [centos but not debian])&lt;br /&gt;
ssh-keygen -f /etc/ssh/ssh_host_rsa_key -t rsa -b 4096 -o -a 100&lt;br /&gt;
ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa -b 521 -o -a 100&lt;br /&gt;
ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -t ed25519 -a 100&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
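&lt;br /&gt;
To confirm that the regenerated host keys have the intended sizes, &amp;quot;ssh-keygen -l -f&amp;quot; prints each key&#039;s bit length, fingerprint, comment and type. A small sketch that checks those lines against the sizes requested above (RSA 4096, ECDSA 521); the helper name is hypothetical. Ed25519 keys always report 256 bits, so they are not checked.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ssh-keygen -l` lines such as
#   4096 SHA256:... root@host (RSA)
# on stdin and report any RSA/ECDSA key whose size differs from what we generated.
check_host_key_sizes() {
	awk '
		BEGIN { bad = 0 }
		$NF == "(RSA)" {
			if ($1 != 4096) { print "RSA key is " $1 " bits"; bad = 1 }
		}
		$NF == "(ECDSA)" {
			if ($1 != 521) { print "ECDSA key is " $1 " bits"; bad = 1 }
		}
		END { exit bad }
	'
}

# usage:
#   for k in /etc/ssh/ssh_host_*_key.pub; do ssh-keygen -l -f "$k"; done | check_host_key_sizes
```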
&lt;br /&gt;
Unfortunately, wazuh couldn&#039;t be fully setup because email wasn&#039;t setup. So the next step was to use ansible to install postfix, stubby, unbound, and update the firewall with roles: mikegleasonjr.firewall, maltfield.dns, maltfield.postfix, maltfield.wazuh&lt;br /&gt;
&lt;br /&gt;
At this point, I also updated the hostname, updated the DNS SPF records in cloudflare, and set the RDNS in hetzner.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Mon_Sep_16.2C_2024&lt;br /&gt;
&lt;br /&gt;
To finish setting-up wazuh, I manually created &amp;lt;code&amp;gt;/var/sent_encrypted_alarm.settings&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;/var/ossec/.gnupg/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After wazuh email alerts were working, I used ansible to set up backups on Hetzner3 with the role: maltfield.backups.&lt;br /&gt;
&lt;br /&gt;
After ansible installed most of the files, I manually copied &amp;lt;code&amp;gt;/root/backups/backup.settings&amp;lt;/code&amp;gt; from the old server. I also added both the old and a new keyfile, located at &amp;lt;code&amp;gt;/root/backups/ose-backups-cron.key&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;/root/backups/ose-backups-cron.2.key&amp;lt;/code&amp;gt;. Both keys were pregenerated and stored in our shared ose keepass (I also made sure these keys were stored in Marcin&#039;s veracrypt USB drive when I visited FeF).&lt;br /&gt;
&lt;br /&gt;
I also [[Backblaze#Create_Key|created a new set of Backblaze B2 API keys]] and [[Backblaze#Configure_rclone|configured rclone]] to use them.&lt;br /&gt;
&lt;br /&gt;
Before continuing, I made sure that the backup script was working, and I did a full restore test by downloading a backup file from the Backblaze B2 WUI, decrypting it, extracting it, and doing a spot-check to make sure I could actually read one file from every archive as-expected.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Sun_Sep_22.2C_2024&lt;br /&gt;
&lt;br /&gt;
After I confirmed that backups were fully working, I moved-on to the web server stack.&lt;br /&gt;
&lt;br /&gt;
First I used ansible to push the &#039;maltfield.certbot&#039; role. And then I force-renewed the certs on hetzner2 and securely copied the entire contents of /etc/letsencrypt/ from hetzner2 to hetzner3.&lt;br /&gt;
&lt;br /&gt;
Then I used ansible to push the rest of the web stack roles: maltfield.nginx, maltfield.varnish, maltfield.php, maltfield.mariadb, maltfield.apache, maltfield.munin, maltfield.awstats, maltfield.cron, and maltfield.logrotate.&lt;br /&gt;
&lt;br /&gt;
Once those roles pushed without issue, I uncommented all the roles and made sure the ansible playbook could do a complete provisioning of all our roles without any errors.&lt;br /&gt;
&lt;br /&gt;
 * https://wiki.opensourceecology.org/wiki/Maltfield_Log/2024_Q3#Wed_Sep_25.2C_2024&lt;br /&gt;
&lt;br /&gt;
==Restore State (snapshot &amp;amp; test)==&lt;br /&gt;
&lt;br /&gt;
Next, I restored a snapshot of hetzner2&#039;s state onto hetzner3. First, I downloaded the latest hetzner2 backup onto hetzner3.&lt;br /&gt;
&lt;br /&gt;
I manually hardened mysql on hetzner3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mysql_secure_installation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then I restored all the mysql DBs from the hetzner2 snapshot.&lt;br /&gt;
&lt;br /&gt;
To let munin collect data from mysql, I created the munin user (note that you should *not* GRANT it access to anything)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
 echo &amp;quot;CREATE USER munin@localhost identified by &#039;CHANGEME&#039;;&amp;quot; | mysql -u${mysqlUser} -p${mysqlPass}&lt;br /&gt;
 echo &amp;quot;flush privileges;&amp;quot; | mysql -u${mysqlUser} -p${mysqlPass}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And I created the munin config file (including the mysql user&#039;s password as set above). Note that this file is *not* in ansible because it contains a password. Future migrations should create this file by copying it from the old server (or from backups, if needed).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
vim /etc/munin/plugin-conf.d/zzz-myconf&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I created &#039;/var/www/html/.htpasswd&#039; (copied from the old server), and I tested that munin and awstats were functioning.&lt;br /&gt;
&lt;br /&gt;
One-by-one, I copied each vhost docroot from the hetzner2 backups into hetzner3&#039;s vhost docroots. I set the &amp;lt;code&amp;gt;/etc/hosts&amp;lt;/code&amp;gt; file on my laptop to override DNS and point each vhost domain to the hetzner3 server. To confirm I was loading the right server&#039;s vhost in my browser, I created an &#039;is_hetzner3&#039; file in each docroot (served at &#039;/is_hetzner3&#039;) with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for docroot in $(sudo find /var/www/html/* -maxdepth 1 -regextype awk -regex &amp;quot;.*(htdocs|public_html)&amp;quot; -type d); do echo &amp;quot;true&amp;quot; | sudo tee &amp;quot;$docroot/is_hetzner3&amp;quot;; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
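&lt;br /&gt;
The /etc/hosts overrides on the laptop follow a simple &amp;quot;IP name&amp;quot; pattern, one line per vhost domain. A minimal sketch for generating them: 203.0.113.10 is a placeholder documentation address, not hetzner3&#039;s real IP, and the function name is hypothetical. With the overrides in place, each site&#039;s /is_hetzner3 URL should return &amp;quot;true&amp;quot;; on hetzner2 that file does not exist.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts lines mapping each DOMAIN to IP.
hosts_overrides() {
	local ip="$1" domain
	shift
	for domain in "$@"; do
		printf '%s\t%s\n' "$ip" "$domain"
	done
}

# usage (on the laptop; 203.0.113.10 is a placeholder for the new server's IP):
#   hosts_overrides 203.0.113.10 www.opensourceecology.org wiki.opensourceecology.org | sudo tee -a /etc/hosts
#   curl -s https://wiki.opensourceecology.org/is_hetzner3
```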
&lt;br /&gt;
And after restoring each vhost docroot, I created an unprivileged &#039;not-apache&#039; user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
adduser not-apache --disabled-password --gecos &#039;&#039; --home /dev/null --shell /usr/sbin/nologin&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then I fixed the permissions with this (since CentOS and Debian have different users &amp;amp; groups). But you, dear future reader, would probably be smarter to copy this file from our backups.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;gt; /usr/local/bin/fix_web_permissions.sh &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#set -x&lt;br /&gt;
################################################################################&lt;br /&gt;
# File:    fix_web_permissions.sh&lt;br /&gt;
# Version: 0.2&lt;br /&gt;
# Purpose: Idempotent script that will set the minimum permissions required for&lt;br /&gt;
#          all of the files in /var/www/html. Run this script after you make&lt;br /&gt;
#          changes to any files (eg update wordpress, mediawiki, phplist, etc)&lt;br /&gt;
# Authors: Michael Altfield &amp;lt;michael@michaelaltfield.net&amp;gt;&lt;br /&gt;
# Created: 2025-02-06&lt;br /&gt;
# Updated: 2025-02-13&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  SETTINGS                                    #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  FUNCTIONS                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
################################################################################&lt;br /&gt;
#                                  MAIN BODY                                   #&lt;br /&gt;
################################################################################&lt;br /&gt;
&lt;br /&gt;
# first pass, whole site&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;/var/www/html&amp;quot;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;/var/www/html&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# WORDPRESS #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
# quote the pattern so the shell cannot glob-expand it before find sees it&lt;br /&gt;
wordpress_sites=&amp;quot;$(find /var/www/html -type d -wholename &#039;*htdocs/wp-content&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for wordpress_site in $wordpress_sites; do&lt;br /&gt;
&lt;br /&gt;
	wp_docroot=&amp;quot;$(dirname &amp;quot;${wordpress_site}&amp;quot;)&amp;quot;&lt;br /&gt;
	vhost_dir=&amp;quot;$(dirname &amp;quot;${wp_docroot}&amp;quot;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
	find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
	chown not-apache:apache-admins &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
	chmod 0040 &amp;quot;${vhost_dir}/wp-config.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/uploads&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
	[ -d &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; ] || mkdir &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	chown -R not-apache:www-data &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
	find &amp;quot;${wp_docroot}/wp-content/tmp&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
###########&lt;br /&gt;
# phpList #&lt;br /&gt;
###########&lt;br /&gt;
&lt;br /&gt;
phplist_sites=&amp;quot;$(find /var/www/html -maxdepth 1 -type d -iname &#039;*phplist*&#039;)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for vhost_dir in $phplist_sites; do&lt;br /&gt;
&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type d -exec chmod 0050 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}&amp;quot; -type f -exec chmod 0040 {} \;; done&lt;br /&gt;
&lt;br /&gt;
	for dir in ${vhost_dir}; do [ -d &amp;quot;${dir}/public_html/uploadimages&amp;quot; ] || mkdir &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do chown -R not-apache:www-data &amp;quot;${dir}/public_html/uploadimages&amp;quot;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type f -exec chmod 0660 {} \;; done&lt;br /&gt;
	for dir in ${vhost_dir}; do find &amp;quot;${dir}/public_html/uploadimages&amp;quot; -type d -exec chmod 0770 {} \;; done&lt;br /&gt;
&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
#############&lt;br /&gt;
# MediaWiki #&lt;br /&gt;
#############&lt;br /&gt;
&lt;br /&gt;
vhost_dir=&amp;quot;/var/www/html/wiki.opensourceecology.org&amp;quot;&lt;br /&gt;
mw_docroot=&amp;quot;${vhost_dir}/htdocs&amp;quot;&lt;br /&gt;
&lt;br /&gt;
chown -R not-apache:www-data &amp;quot;${vhost_dir}&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type d -exec chmod 0050 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}&amp;quot; -type f -exec chmod 0040 {} \;&lt;br /&gt;
&lt;br /&gt;
chown not-apache:apache-admins &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
chmod 0040 &amp;quot;${vhost_dir}/LocalSettings.php&amp;quot;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${mw_docroot}/images&amp;quot; ] || mkdir &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${mw_docroot}/images&amp;quot;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${mw_docroot}/images/captcha&amp;quot; ] || mkdir &amp;quot;${mw_docroot}/images/captcha&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${mw_docroot}/images/captcha&amp;quot;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images/captcha&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${mw_docroot}/images/captcha&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
[ -d &amp;quot;${vhost_dir}/cache&amp;quot; ] || mkdir &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
chown -R www-data:www-data &amp;quot;${vhost_dir}/cache&amp;quot;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type f -exec chmod 0660 {} \;&lt;br /&gt;
find &amp;quot;${vhost_dir}/cache&amp;quot; -type d -exec chmod 0770 {} \;&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
chown root:root /usr/local/bin/fix_web_permissions.sh&lt;br /&gt;
chmod 0755 /usr/local/bin/fix_web_permissions.sh&lt;br /&gt;
&lt;br /&gt;
time /usr/local/bin/fix_web_permissions.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
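After running the script, a quick way to spot anything it missed is to list every path whose mode differs from the intended one. Below is a minimal sketch that assumes GNU findutils. The function name audit_modes is hypothetical and is not part of the original fix_web_permissions.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path under DIR whose permission bits
# are not exactly the expected directory/file modes (GNU find syntax).
audit_modes() {
  local dir="$1" dmode="$2" fmode="$3"
  # "! -perm MODE" (no / or - prefix) matches paths whose mode is not exactly MODE
  find "$dir" -type d ! -perm "$dmode" -print
  find "$dir" -type f ! -perm "$fmode" -print
}

# Example (as root): check a vhost against the 0050 (dirs) / 0040 (files) policy
#   audit_modes /var/www/html/wiki.opensourceecology.org 0050 0040
```

Expect the deliberate exceptions (images, cache, LocalSettings.php) to show up in the wiki's output; anything else is worth a closer look.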
&lt;br /&gt;
I installed wp-cli per [[Wordpress#WP-CLI]]. wp-cli is intentionally *not* given Internet access, but it is still useful for getting info and for activating/deactivating plugins &amp;amp; themes.&lt;br /&gt;
&lt;br /&gt;
I configured mdadm to send emails to our ops list in the event that one of the disks in our RAID1 array fails (note this is not configured in ansible because we don&#039;t want our email addresses on GitHub).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cd /etc/mdadm/&lt;br /&gt;
root@hetzner3 /etc/mdadm # cp mdadm.conf mdadm.conf.20240929.orig&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # vim mdadm.conf&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/mdadm # diff mdadm.conf.20240929.orig mdadm.conf&lt;br /&gt;
18c18,19&lt;br /&gt;
&amp;lt; MAILADDR root&lt;br /&gt;
---&lt;br /&gt;
&amp;gt; MAILFROM REDACTED@hetzner3.opensourceecology.org&lt;br /&gt;
&amp;gt; MAILADDR REDACTED@opensourceecology.org&lt;br /&gt;
root@hetzner3 /etc/mdadm # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
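To confirm that alerts actually arrive, mdadm can send one test alert per array with mdadm --monitor --scan --oneshot --test. Below is a minimal sketch that first confirms the config contains a MAILADDR line. The function name mdadm_mailaddr is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MAILADDR value(s) configured in an mdadm.conf.
# Returns non-zero if no MAILADDR line is present.
mdadm_mailaddr() {
  local conf="${1:-/etc/mdadm/mdadm.conf}"
  awk '$1 == "MAILADDR" { print $2; found = 1 } END { exit !found }' "$conf"
}

# Then, as root, ask mdadm to send one TestMessage alert per array and exit:
#   mdadm --monitor --scan --oneshot --test
```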
&lt;br /&gt;
I configured smartd (from smartmontools) to send emails to our ops list in the event that one of the disks starts to show signs of failure (note this is not configured in ansible because we don&#039;t want our email addresses on GitHub).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@hetzner3 ~ # cd /etc/&lt;br /&gt;
root@hetzner3 /etc/ # mv /etc/smartd.conf /etc/smartd.conf.$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;).orig&lt;br /&gt;
root@hetzner3 /etc/ # &lt;br /&gt;
&lt;br /&gt;
root@hetzner3 /etc/ # echo &amp;quot;DEVICESCAN -d removable -n standby -m REDACTED@opensourceecology.org -M exec /usr/share/smartmontools/smartd-runner&amp;quot; &amp;gt; /etc/smartd.conf&lt;br /&gt;
root@hetzner3 /etc/ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
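smartd's "-M test" directive sends a single test email when smartd starts, which is a cheap way to verify delivery. Below is a minimal sketch that builds the same DEVICESCAN line as above and can optionally append "-M test". The function name and the example address are hypothetical. The restart command assumes the Debian unit name smartmontools, which may differ on other distros.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the smartd.conf DEVICESCAN line used above.
# With a second argument of "test", append "-M test" so smartd sends one
# test email when it starts.
smartd_devicescan_line() {
  local addr="$1" mode="${2:-}"
  local line="DEVICESCAN -d removable -n standby -m ${addr} -M exec /usr/share/smartmontools/smartd-runner"
  if [ "$mode" = "test" ]; then
    line="${line} -M test"
  fi
  printf '%s\n' "$line"
}

# Usage (as root): temporarily enable the test mail, restart, then revert
#   smartd_devicescan_line ops@example.org test > /etc/smartd.conf
#   systemctl restart smartmontools   # unit name assumed (Debian)
```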
&lt;br /&gt;
I created accounts for Marcin, Catarina, and Tom Griffing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
USERNAME=&#039;marcin&#039;&lt;br /&gt;
PUBKEY=&#039;ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDDMiIUel3xYyxuiXAj82PzoJDwRczrEpDgUoRI4W9ceL5FqVcY38Go9q3SF2Nx0FEj+IdCUXc08lyy6ZPUbPcKvscFxWeue4aMM62ikzNxmhGBdjqgT3q3wpJgyjTXmt9AJcglcAm9mcQffSUi3RD9KDlCyc/T923eZdaLAkW/BMhjuOZqY90tjGqs/r/kxN0gf4vI24NMFL/41ct7OMKVnNNsjIpQtceX9fCOCumAx53OdtJEcp46TvzevZk2987Zn0VsONznvVCJ0kmm8B0RJxwIfmiLM73f+reo0pv+sSc2rU7SrpzLfPWLFcM7pkJQc3HtLnktl5form3flp+EkI7fr7348r8A7W+QIifjXk66ohJReDni9H/S4JSX2L1lf8LfJKSHtAqrFRWSPp22MKre5hiH0IybED6XZfz59HT0cgMK2iNcPRj/J+hEbBM0f4zZu62PUad7rr1JI4Vv078/ROaD47fykicxYhauI4R71J1YucSj/vekXf17x3xlO+u8ucSeUhdpMuIAa3Yk16bXsrwo4nIdcApC6rwfNiQDK8Ecx6+M6pV6z+dII4OMHvEYWw92wWJZfIyk7emvAoataqp3DfI0DQagPNBo2ieEZYLvNYny+X9hf6faZ6trsGnR4GfN83PEt3ZfmoEoyTVB2POiBdM8a1GNTlEasQ== marcin@Precision-M6500&#039;&lt;br /&gt;
&lt;br /&gt;
# create user if it doesn&#039;t yet exist&lt;br /&gt;
adduser ${USERNAME} --disabled-password --gecos &#039;&#039;&lt;br /&gt;
&lt;br /&gt;
# add ssh pubkey&lt;br /&gt;
mkdir -p /home/${USERNAME}/.ssh&lt;br /&gt;
echo &amp;quot;${PUBKEY}&amp;quot; &amp;gt; /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
chown -R ${USERNAME}:${USERNAME} /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0700 /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0600 /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
&lt;br /&gt;
USERNAME=&#039;cmota&#039;&lt;br /&gt;
PUBKEY=&#039;ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDjEu4tbVMJxAX6VuCIrhLYDh/PBlyFHfgGU5ovuPPZLOWYAYI2xYgl5SCweKgB8g2hNTOoyLKgxj0UF7MH22xYQV/EwEwVxX/isSwNvGGikOaKOfb9rBt6nlW2K6ehJlPpHA2nPiqDcuU/1wT3T1FTpYL+uTtQxr4gF8Ijt4aNLwpRdvzgza5vZ1I1R0yLuC2VI1m0NNqT45yGRyWZpM7thT7YGS5Jr0DQa0kLxEZvhGcgdAL6kYfJLW1IhglOZTcoff5TGY6Q8X/gjNrVv6ZUeNF2QiXz4Gm6I6I1YtUDdEEfndu0bHATkMX9aeNG6qAfcYcUcm8pnK+c/RehE0LAcNSDCg9VozsDGg65ywgYw+k0mTl2sW8V95Igfi8oxf/ulGuzxgyriQlhFA4JckDA6Vz2BCjcYabcRhc0ugG34SBRPOUCxVzdb40FSGftVcxb1FeDxsnHxQkl23W9dCcwMMU1m2ssY6F09TTiqhbIp816MkepfWNkB5QDPbmu6EWgT4jp3zWqjMUNcYz9NmRsb6VZ9G357LPOZgMM36XOQXIePcWo5bCQYSusPDSXXjqeSeEVnrfrJJEpBr2AxFCt1R3Dw/fs/rG+YFGNdFadsgiSHxHs2zJglV+Pj8buI6z/EOuHXylZN/2jfOAT17oRU5QXz0HlT0ToeehwFb1+Gw== catarina@Computer&#039;&lt;br /&gt;
&lt;br /&gt;
# create user if it doesn&#039;t yet exist&lt;br /&gt;
adduser ${USERNAME} --disabled-password --gecos &#039;&#039;&lt;br /&gt;
&lt;br /&gt;
# add ssh pubkey&lt;br /&gt;
mkdir -p /home/${USERNAME}/.ssh&lt;br /&gt;
echo &amp;quot;${PUBKEY}&amp;quot; &amp;gt; /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
chown -R ${USERNAME}:${USERNAME} /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0700 /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0600 /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
&lt;br /&gt;
USERNAME=&#039;tgriffing&#039;&lt;br /&gt;
PUBKEY=&#039;ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDHS8EYmP85HqJwsP4kJ3D2dIBFBgVY8A8YUFubm+bjOFKHr9mV4nnJoY2TweQsKjsT8Kvg8uRPeThls5/7QK/3gDz/objdWp/2W5kvwhDxlZwEWyf+5a6F23OYLc5oeixwR/TyU13OokXSeZeTxPX3m/It1VBKEz0QUCjwTHEkPrjjhbeVlQ7vFeCAwGlrA8puDF1l8SUIO23hpiU9E+IM/+wTasEP8YblSk9445mLow4BexlvmfrRsXXdg/vrdObchzeo9rhZxMTWPE2nbyVUp86iaNp/PVbeTNKWx0hZF0zr7TjIbsmYmGXlPMZcKaStpcfMlVJ+hJ9NxwTHrqhC0lsfNz9pvPdLkZM3O5Ychevu4xlFb3XddMiO1QHodqf56vZhicMA+9cLfZpFTcwtVGseD+JpURPuG2DBtEDkozGk1szx2SoX5B6ccprZYvfj4HiTW6+qv7XN2uMbRMHw0VMyAPjwSKYC/YzTZ885VAFj8Oo5t5Q6F9VW1oRUF3gWrcLBcvL2XUDQCCUpF3bDlHxQQqJZ3EifW6rDZVHlyLkzq6/FKTUPHuHdX4K5DPdJxEcfdm5zyjiGEtGQ2uzHx3WAJMaykjFsJElsE7avhHagKzneS/b4shReEEseNErhW5d0AyAoPkEoVkCyauS2vOvNAZ29OXc2Yf6DEIdU9Q== tom@thomasgfingsmbp&lt;br /&gt;
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGwcx6l+W/6CTU+bm9gOZM53uEPvSsTk31PqQE/svb9qNzrM+Ny8xBfofbhlbTXYFAldlGC3S3DgBO6yOHCQgnHtf7zqBD+sNsVqMGKSpDhkAn09CmJy9P90p4ovZqHfGpvSPXfyPyBF/ebLgJeS8roxcU9OyTO+iRMXv8rOgK7zLLbdMy+/tXr6muGyaIzHJljYpaebd4kjM4INaycGYY7gEVBmBzC6wHj+PDLcPSeYXTVG6R7RrfGQuvtM61hNY90+pw2di0GR57wqF/0tLvfJ5+QyWJoh4ns4gBhRf8/2QVfcy+DD9ofQ8ILRVVf77IxZRTY8j+zgUBD4YjvBmtx/UB2nJJRwyDjPEB55grC+LjQ8ehwgc2LpE2nVvEWCUZjdw5kFZjD4fHVWRhbcVmusSIAyw47xPpywRtry0+rdbL90i2JTitFMRzqTZLETAOgEfRp50WiPulxh2Gj1bVCHFvx1p/hdxbEWZx2k2s62SOYvZj+yBazK9gBFLwPZWBx5bzeu091Yxvingt+EZ4qGF807trP5e46oJCLmAU1DXD4enWmTfGQxvsallREYj6xbdWjMq+Az35nWmlg7omlvZPVMDZ7S+++dTO9ypxJeeVEfBav/gkghqcY5lGIU51eCiBEric476NQRG7aJp9rakgF2wKj8qWIoOzRysWYw== tom@imac0&#039;&lt;br /&gt;
&lt;br /&gt;
# create user if it doesn&#039;t yet exist&lt;br /&gt;
adduser ${USERNAME} --disabled-password --gecos &#039;&#039;&lt;br /&gt;
&lt;br /&gt;
# add ssh pubkeys (quoted so each key stays on its own line)&lt;br /&gt;
mkdir -p /home/${USERNAME}/.ssh&lt;br /&gt;
echo &amp;quot;${PUBKEY}&amp;quot; &amp;gt; /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
chown -R ${USERNAME}:${USERNAME} /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0700 /home/${USERNAME}/.ssh&lt;br /&gt;
chmod 0600 /home/${USERNAME}/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
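The same steps are repeated for each user, so they can be folded into a function. Below is a minimal sketch. The function name install_pubkeys is hypothetical. Quoting the keys matters here: an unquoted echo $PUBKEY collapses Tom's two keys onto a single line, which leaves only the first key usable. adduser and chown still need to run as root, as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one or more newline-separated public keys to
# HOME_DIR/.ssh/authorized_keys with the permissions sshd expects.
install_pubkeys() {
  local home_dir="$1" pubkeys="$2"
  mkdir -p "${home_dir}/.ssh"
  # Quoting preserves the newlines between keys (one key per line)
  printf '%s\n' "${pubkeys}" > "${home_dir}/.ssh/authorized_keys"
  chmod 0700 "${home_dir}/.ssh"
  chmod 0600 "${home_dir}/.ssh/authorized_keys"
}

# As root, per user:
#   adduser "${USERNAME}" --disabled-password --gecos ''
#   install_pubkeys "/home/${USERNAME}" "${PUBKEY}"
#   chown -R "${USERNAME}:${USERNAME}" "/home/${USERNAME}/.ssh"
```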
&lt;br /&gt;
I updated the ansible playbook to permit these new human user accounts to access the Internet, and I re-ran the ansible playbook to update the firewall.&lt;br /&gt;
&lt;br /&gt;
 * https://github.com/OpenSourceEcology/ansible/commit/f294f3384f5d4bc47ae55a1631c264950e0e1355&lt;br /&gt;
&lt;br /&gt;
I created a keepass group for users who need access to the shared OSE KeePass file on the server, and I added the new users to the keepass, sshaccess, www-data, and apache-admins groups as needed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
groupadd keepass&lt;br /&gt;
&lt;br /&gt;
gpasswd -a maltfield keepass&lt;br /&gt;
gpasswd -a marcin keepass&lt;br /&gt;
gpasswd -a cmota keepass&lt;br /&gt;
gpasswd -a tgriffing keepass&lt;br /&gt;
&lt;br /&gt;
gpasswd -a marcin sshaccess&lt;br /&gt;
gpasswd -a cmota sshaccess&lt;br /&gt;
gpasswd -a tgriffing sshaccess&lt;br /&gt;
&lt;br /&gt;
gpasswd -a marcin www-data&lt;br /&gt;
gpasswd -a cmota www-data&lt;br /&gt;
gpasswd -a tgriffing www-data&lt;br /&gt;
&lt;br /&gt;
gpasswd -a tgriffing apache-admins&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
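The same memberships can also be written as a loop. Below is a minimal sketch. The run wrapper and its DRY_RUN switch are hypothetical. DRY_RUN defaults to 1, so the commands are only printed unless DRY_RUN=0 is set when the function is called as root.

```shell
#!/usr/bin/env bash
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same group changes as above, expressed as a loop.
add_memberships() {
  local user group
  run groupadd keepass
  run gpasswd -a maltfield keepass
  for user in marcin cmota tgriffing; do
    for group in keepass sshaccess www-data; do
      run gpasswd -a "$user" "$group"
    done
  done
  run gpasswd -a tgriffing apache-admins
}
```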
&lt;br /&gt;
# TODO: actually migrate keepass&lt;br /&gt;
&lt;br /&gt;
To see the actual commands used to migrate, update, and fix our websites from hetzner2 -&amp;gt; hetzner3, see:&lt;br /&gt;
&lt;br /&gt;
* TODO: forums&lt;br /&gt;
* [[CHG-2025-XX-XX migrate store to hetzner3]]&lt;br /&gt;
* [[CHG-2025-XX-XX migrate microfactory to hetzner3]]&lt;br /&gt;
* [[CHG-2025-XX-XX deprecate fef]]&lt;br /&gt;
* [[CHG-2025-XX-XX deprecate oswh ]]&lt;br /&gt;
* [[CHG-2025-XX-XX_migrate_obi_to_hetzner3]]&lt;br /&gt;
* [[CHG-2025-XX-XX_migrate_osemain_to_hetzner3]]&lt;br /&gt;
* [[CHG-2025-XX-XX_migrate_phplist_to_hetzner3]]&lt;br /&gt;
* [[CHG-2025-XX-XX migrate wiki to hetzner3]]&lt;br /&gt;
&lt;br /&gt;
Note that we did *not* migrate seedhome.opensourceecology.org. That site was never set up, and we decided to deprecate it.&lt;br /&gt;
&lt;br /&gt;
=Purchase=&lt;br /&gt;
[[Image:Hetzner3_auction1.png|right|OSE server specs on Hetzner3 as of July 2024.]]&lt;br /&gt;
&lt;br /&gt;
We purchased Hetzner3 from a Dedicated Server Auction on 2024-07-30 for 37.72 EUR/mo.&lt;br /&gt;
&lt;br /&gt;
Before becoming a discount auction server, Hetzner3 was sold as dedicated server model &amp;lt;code&amp;gt;EX42-NVMe&amp;lt;/code&amp;gt;. For comparison, [[Hetzner2]] was a &amp;lt;code&amp;gt;EX41S-SSD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Hardware==&lt;br /&gt;
&lt;br /&gt;
[[Image:Hetzner3_hardware1.png|right|500px|OSE server specs on Hetzner3 as of July 2024.]]&lt;br /&gt;
&lt;br /&gt;
Hetzner3 came with the following hardware:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
* Intel Core i7-6700&lt;br /&gt;
* 2x SSD M.2 NVMe 512 GB&lt;br /&gt;
* 4x RAM 16384 MB DDR4&lt;br /&gt;
* NIC 1 Gbit Intel I219-LM&lt;br /&gt;
* Location: Germany&lt;br /&gt;
* Rescue system (English)&lt;br /&gt;
* 1 x Primary IPv4 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===CPU===&lt;br /&gt;
Hetzner3 has an Intel Core i7-6700. It&#039;s a 4-core (8-thread) 3.4 GHz processor from 2015 with 8 MB of cache. This cannot be upgraded.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@mail ~ # cat /proc/cpuinfo &lt;br /&gt;
...&lt;br /&gt;
processor	: 7&lt;br /&gt;
vendor_id	: GenuineIntel&lt;br /&gt;
cpu family	: 6&lt;br /&gt;
model		: 94&lt;br /&gt;
model name	: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz&lt;br /&gt;
stepping	: 3&lt;br /&gt;
microcode	: 0xf0&lt;br /&gt;
cpu MHz		: 905.921&lt;br /&gt;
cache size	: 8192 KB&lt;br /&gt;
physical id	: 0&lt;br /&gt;
siblings	: 8&lt;br /&gt;
core id		: 3&lt;br /&gt;
cpu cores	: 4&lt;br /&gt;
apicid		: 7&lt;br /&gt;
initial apicid	: 7&lt;br /&gt;
fpu		: yes&lt;br /&gt;
fpu_exception	: yes&lt;br /&gt;
cpuid level	: 22&lt;br /&gt;
wp		: yes&lt;br /&gt;
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities&lt;br /&gt;
vmx flags	: vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple shadow_vmcs pml&lt;br /&gt;
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds mmio_stale_data retbleed gds&lt;br /&gt;
bogomips	: 6799.81&lt;br /&gt;
clflush size	: 64&lt;br /&gt;
cache_alignment	: 64&lt;br /&gt;
address sizes	: 39 bits physical, 48 bits virtual&lt;br /&gt;
power management:&lt;br /&gt;
&lt;br /&gt;
root@mail ~ # &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
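The thread and core counts quoted above can be read straight from /proc/cpuinfo. Each logical CPU has its own "processor" stanza, and "cpu cores" gives the number of physical cores per socket. Below is a minimal sketch that assumes a single-socket machine like Hetzner3. The function name is hypothetical, and lscpu reports the same information.

```shell
#!/usr/bin/env bash
# Summarize logical CPUs (threads) and physical cores from a cpuinfo file.
# Assumes a single physical package, as on Hetzner3.
cpu_summary() {
  local file="${1:-/proc/cpuinfo}"
  awk -F': *' '
    /^processor/ { threads++ }
    /^cpu cores/ { cores = $2 }
    END { printf "threads=%d cores=%d\n", threads, cores }
  ' "$file"
}
```

On the output shown above (processors 0 through 7, "cpu cores : 4"), this would report 8 threads and 4 cores.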
&lt;br /&gt;
For comparison, this is the same processor that we&#039;ve been using in [[Hetzner2]], and it&#039;s way over-provisioned for our needs.&lt;br /&gt;
&lt;br /&gt;
===Disk===&lt;br /&gt;
&lt;br /&gt;
2x 512 GB NVMe disks should suit us fine.&lt;br /&gt;
&lt;br /&gt;
We also have one empty NVMe slot and two empty SATA slots. As of today, we can upgrade each SATA slot with a max 3.84 TB SSD or max 22 TB HDD.&lt;br /&gt;
&lt;br /&gt;
For comparison, we had 2x 250 GB SSD disks in [[Hetzner2]], so this should be approximately double the capacity and somewhat better disk I/O performance.&lt;br /&gt;
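In a RAID-1 mirror, the usable capacity is that of the smallest member. Disk vendors also quote decimal GB, while df reports binary GiB. Below is a quick sketch of that conversion. The function name is hypothetical, and the result ignores filesystem overhead as well as any swap or /boot partitions, all of which reduce the final number further.

```shell
#!/usr/bin/env bash
# Usable size of a RAID-1 array in GiB, given member sizes in vendor (decimal) GB.
raid1_usable_gib() {
  local smallest
  # A mirror can only hold as much as its smallest member
  smallest=$(printf '%s\n' "$@" | sort -n | head -n 1)
  awk -v gb="$smallest" 'BEGIN { printf "%.1f\n", gb * 1e9 / 1024^3 }'
}
```

For example, raid1_usable_gib 512 512 prints 476.8 (Hetzner3), and raid1_usable_gib 250 250 prints 232.8 (Hetzner2).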
&lt;br /&gt;
===Memory===&lt;br /&gt;
&lt;br /&gt;
We have 64 GB of DDR4 RAM. This cannot be upgraded; this is the maximum memory that this system can take.&lt;br /&gt;
&lt;br /&gt;
For comparison, this is the same memory as we&#039;ve been using in [[Hetzner2]]. We could get by with less, but varnish is happy to use it.&lt;br /&gt;
&lt;br /&gt;
=Initial Specifications Research=&lt;br /&gt;
Because hetzner2 ran on CentOS7 (which was EOL&#039;d 2024-06-30), [[User:Marcin|Marcin]] asked [[User:Maltfield|Michael]] in July 2024 to begin provisioning a &amp;quot;hetzner3&amp;quot; with Debian to replace &amp;quot;hetzner2&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note: The charts in this section come from Hetzner2, not Hetzner3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==Munin==&lt;br /&gt;
&lt;br /&gt;
I ([[User:Maltfield|Michael Altfield]]) collected some charts from Hetzner2&#039;s munin to confirm my understanding of the Hetzner2 server&#039;s resource needs before purchasing a new Hetzner3 dedicated server from Hetzner.&lt;br /&gt;
&lt;br /&gt;
===CPU===&lt;br /&gt;
&lt;br /&gt;
In 2018&amp;lt;ref name=&amp;quot;server-req-2018&amp;quot;&amp;gt;https://wiki.opensourceecology.org/index.php?title=OSE_Server&amp;amp;oldid=298909#OSE_Server_and_Server_Requirements&amp;gt;&amp;lt;/ref&amp;gt;, I said we&#039;d want min 2-4 cores.&lt;br /&gt;
&lt;br /&gt;
After reviewing the CPU &amp;amp; load charts for the past year, I found that load rarely touches 3. Most of the time it hovers between 0.2 and 1. So I agree that 4 cores is fine for us now.&lt;br /&gt;
&lt;br /&gt;
Most of these auctions have an Intel Core i7-4770, which is a 4-core, 8-thread processor. That should be fine.&lt;br /&gt;
&lt;br /&gt;
[[File:Munin_cpu-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_cpu-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
[[File:Munin_load-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_load-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
===Disk===&lt;br /&gt;
&lt;br /&gt;
Honestly, I expect that the lowest offerings of a dedicated server in 2024 are probably going to suffice for us, but what I&#039;m mostly concerned about is the disk. Even [[CHG-2024-07-26_yum_update|last week when I did the yum updates]], I nearly filled the disk just by extracting a copy of our backups. Currently we have two 250G disks in a software RAID-1 (mirror) array. That gives us a usable 197G.&lt;br /&gt;
&lt;br /&gt;
It&#039;s important to me that we at least double this, but I&#039;ll see if there are any deals on 1TB disks or larger.&lt;br /&gt;
&lt;br /&gt;
Also, what we currently have is a 6 Gb/s SSD, so I don&#039;t want to downgrade that by going to a spinning-disk HDD. NVMe might be a welcome upgrade. I/O wait is probably a bottleneck, but not currently one that&#039;s causing us agony.&lt;br /&gt;
&lt;br /&gt;
[[File:Munin_df-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
To be clear: the usage line of &#039;/&#039; in this chart is the middle-green line, which is ~50% full.&lt;br /&gt;
&lt;br /&gt;
[[File:Munin_swap-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_swap-year_20240730.gif]]&lt;br /&gt;
[[File:Munin_diskstats_throughput-day_20240731.gif]]&lt;br /&gt;
[[File:Munin_diskstats_throughput-year_20240731.gif]]&lt;br /&gt;
[[File:Munin_diskstats-page-day_20240731.gif]]&lt;br /&gt;
[[File:Munin_diskstats-page-year_20240731.gif]]&lt;br /&gt;
&lt;br /&gt;
===Memory===&lt;br /&gt;
&lt;br /&gt;
In 2018&amp;lt;ref name=&amp;quot;server-req-2018&amp;quot; /&amp;gt;, I said we&#039;d want 8-16G RAM minimum. While that&#039;s technically true, we currently have 64G RAM. Most of these base cheap-as-they-come dedicated servers on the Hetzner auction page have 64G RAM.&lt;br /&gt;
&lt;br /&gt;
We use 40G of RAM just for varnish, which [a] greatly reduces load on the server and [b] gives our read-only visitors a much, much faster page load time. While we don&#039;t strictly *need* that much RAM, I&#039;m going to make sure hetzner3 has at least as much RAM as hetzner2.&lt;br /&gt;
&lt;br /&gt;
[[File:Munin_memory-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_memory-year_20240730.gif]]&lt;br /&gt;
[[File:Munin_multips_memory-week_20240730.gif]]&lt;br /&gt;
[[File:Munin_multips_memory-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Nginx===&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_request-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_request-month_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_request-week_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_request-year_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_status-day_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_status-month_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_status-week_20240730.gif]]&lt;br /&gt;
[[File:Munin_nginx_wiki_opensourceecology_org_status-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
===Varnish===&lt;br /&gt;
[[File:Munin_varnish_hit_rate-week_20240730.gif]]&lt;br /&gt;
[[File:Munin_varnish_hit_rate-year_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Full===&lt;br /&gt;
[[File:Munin_munin_screenshot_20240730.gif]]&lt;br /&gt;
&lt;br /&gt;
==See Also==&lt;br /&gt;
* [[OSE Server]]&lt;br /&gt;
* [[OSE Development Server]]&lt;br /&gt;
* [[OSE Staging Server]]&lt;br /&gt;
* [[Website]]&lt;br /&gt;
* [[Web server configuration]]&lt;br /&gt;
* [[Wordpress]]&lt;br /&gt;
* [[Vanilla Forums]]&lt;br /&gt;
* [[Mediawiki]]&lt;br /&gt;
* [[Munin]]&lt;br /&gt;
* [[Awstats]]&lt;br /&gt;
* [[Ossec]]&lt;br /&gt;
* [[Google Workspace]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* Hetzner Login to manage Hetzner3 - https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
{{reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category: IT Infrastructure]]&lt;br /&gt;
[[Category: Software]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305932</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305932"/>
		<updated>2025-04-24T16:36:16Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* 2025-04-24 16:18 UTC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 16:18 UTC==&lt;br /&gt;
&lt;br /&gt;
The new disk finished syncing with the old (failing) disk sometime between 15:15 and 15:20 UTC:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
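Timestamped snapshots like the ones above can be produced by polling /proc/mdstat. Below is a minimal sketch that reports whether any array is still degraded, meaning its member map shows an underscore (e.g. [U_]) instead of [UU]. The function name md_all_synced is hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when no md array in the given mdstat file is degraded.
# A degraded mirror shows a member map like [U_] instead of [UU].
md_all_synced() {
  local mdstat="${1:-/proc/mdstat}"
  ! grep -q '\[U*_[U_]*]' "$mdstat"
}

# Poll every 5 minutes until the rebuild finishes, printing snapshots:
#   until md_all_synced; do date; cat /proc/mdstat; sleep 300; done
```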
&lt;br /&gt;
SMART now reports /dev/sdb as PASSED, while /dev/sda is still FAILED:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: PASSED&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
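To check several disks at once, the overall verdict can be pulled out of the smartctl -H output. Below is a minimal sketch that reads that output on stdin and prints only the verdict (e.g. PASSED or FAILED!). The function name smart_verdict is hypothetical.

```shell
#!/usr/bin/env bash
# Print the overall-health verdict from "smartctl -H" output read on stdin.
smart_verdict() {
  awk -F': ' '/overall-health self-assessment test result/ { print $2 }'
}

# As root, for each disk:
#   for d in /dev/sda /dev/sdb; do printf '%s %s\n' "$d" "$(smartctl -H "$d" | smart_verdict)"; done
```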
&lt;br /&gt;
Full SMART attribute output:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78516&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3445&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       47&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   060   046   000    Old_age   Always       -       40 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       407132499909&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12839097351&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26313144762&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       3&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       3&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52083&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   004   004   000    Old_age   Always       -       1449&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       20&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   061   049   000    Old_age   Always       -       39 (Min/Max 22/51)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   004   004   001    Old_age   Offline      -       96&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       600236629947&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       18860233219&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       11828985935&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2470&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       12&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
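Roughly speaking, smartctl flags an attribute with a non-zero THRESH as failing once its normalized VALUE drops to or below that THRESH. That is what happened to attribute 202 (Percent_Lifetime_Remain) on /dev/sda, where VALUE is 000 against a THRESH of 001. On /dev/sdb, the same attribute sits at 004. Below is a minimal sketch that extracts VALUE and THRESH for one attribute ID from smartctl -A output on stdin. The function name smart_attr is hypothetical.

```shell
#!/usr/bin/env bash
# Print "VALUE THRESH" for one attribute ID from "smartctl -A" output on stdin.
# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_attr() {
  awk -v id="$1" '$1 == id { print $4, $6 }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_attr 202
```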
&lt;br /&gt;
I&#039;m marking this change as completed successfully. Next up, we replace the other failing disk; see [[CHG-2025-XX-XX_replace_hetzner2_sda]].&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner messed up and disconnected *both* disks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
 Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The result was that /dev/sda was listed as /dev/sdc, the new drive was /dev/sdb, and dmesg was being spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the Hetzner web UI (WUI) to trigger a reboot, and, thankfully, the server came up again. I immediately took down all the web services while I investigated the damage, and I triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add its partitions to the RAID arrays. At the time of writing, both swap and boot are synced (and grub is installed on the new disk), and the root partition on the new disk is still syncing (currently at 35% and writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
When the backup finished uploading, I put the web services back online and typed this status message.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
It says we should expect the new disk to be inserted in 2-4 hours. One part of the form said this would happen without downtime. But the (required) checkbox at the bottom said that I understand that downtime is required. So that&#039;s ambiguous.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
Because the RAID array wasn&#039;t degraded, I first had to mark the /dev/sdb partitions as failed before I could remove them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID is healthy and that our daily backup finished uploading a few hours ago:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250424T100000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because its data was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output said that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and installing grub on it should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror onto the new disk might take a day or more. During this time we&#039;ll be vulnerable because only one disk will hold a complete copy of the data (no redundancy). The risk is compounded because both disks currently report that they&#039;re going to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, get the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
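&lt;br /&gt;
As a minimal sketch, the same check can be scripted so that it fails unless every array reports &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt;. The helper name &amp;lt;code&amp;gt;raid_all_synced&amp;lt;/code&amp;gt; is hypothetical, and it assumes the two-disk RAID1 layout shown on this page:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper, not part of mdadm: succeed only if every md array in an&lt;br /&gt;
# mdstat-format file (default /proc/mdstat) shows both RAID1 members up&lt;br /&gt;
# (assumes the two-disk RAID1 layout used on hetzner2)&lt;br /&gt;
raid_all_synced() {&lt;br /&gt;
  local mdstat=${1:-/proc/mdstat}&lt;br /&gt;
  local total healthy&lt;br /&gt;
  [ -r $mdstat ] || return 1&lt;br /&gt;
  # each array has one status line such as: 523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
  total=$(grep -c blocks $mdstat)&lt;br /&gt;
  healthy=$(grep blocks $mdstat | grep -c -F \[UU\])&lt;br /&gt;
  [ $total -gt 0 ] || return 1&lt;br /&gt;
  [ $total -eq $healthy ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: raid_all_synced || echo RAID is degraded, do not proceed&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;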
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
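&lt;br /&gt;
The &amp;lt;code&amp;gt;grep&amp;lt;/code&amp;gt; above only shows whether a file with today&#039;s date exists. A slightly stricter sketch also rejects a suspiciously small archive. The helper name &amp;lt;code&amp;gt;backup_present&amp;lt;/code&amp;gt; is hypothetical, and the 10 GB minimum is an assumption based on today&#039;s archive being about 20 GB:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: read rclone ls output (size, then name) on stdin and&lt;br /&gt;
# succeed only if a file whose name contains the given date stamp is at&lt;br /&gt;
# least min_bytes in size&lt;br /&gt;
backup_present() {&lt;br /&gt;
  local stamp=$1 min_bytes=$2 found=1 size name&lt;br /&gt;
  while read -r size name; do&lt;br /&gt;
    case $name in&lt;br /&gt;
      *${stamp}*)&lt;br /&gt;
        if [ $size -ge $min_bytes ]; then found=0; fi&lt;br /&gt;
        ;;&lt;br /&gt;
    esac&lt;br /&gt;
  done&lt;br /&gt;
  return $found&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (the 10 GB threshold is an assumption, not a documented limit):&lt;br /&gt;
# source /root/backups/backup.settings&lt;br /&gt;
# ${RCLONE} ls b2:${B2_BUCKET_NAME} | backup_present $(date +%Y%m%d) 10000000000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;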
&lt;br /&gt;
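During the morning in Germany (and shortly after our daily backups complete), execute these commands to remove the drive from our RAID array. Because the array is still healthy, mdadm refuses to hot-remove an active member, so each partition must first be marked as failed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# mark all sdb partitions as failed (mdadm refuses to hot-remove an active member)&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# remove all sdb partitions from our software RAID&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;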
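&lt;br /&gt;
The status log above shows that mdadm refuses to hot-remove an active member, so each partition has to be failed before it is removed. On hetzner2, partition N of each disk belongs to /dev/md(N-1) (sdX1 to md0, sdX2 to md1, sdX3 to md2), so that sequence can be generated for either disk. This hypothetical dry-run helper, &amp;lt;code&amp;gt;raid_detach_cmds&amp;lt;/code&amp;gt;, only prints the mdadm commands so that they can be reviewed before anything is run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical dry-run helper: print (but do not run) the mdadm commands that&lt;br /&gt;
# mark each partition of the given disk as failed and then remove it,&lt;br /&gt;
# assuming partition N of the disk is a member of /dev/md(N-1) as on hetzner2&lt;br /&gt;
raid_detach_cmds() {&lt;br /&gt;
  local disk=$1 part md&lt;br /&gt;
  for part in 1 2 3; do&lt;br /&gt;
    md=/dev/md$((part - 1))&lt;br /&gt;
    echo mdadm --manage $md --fail /dev/${disk}${part}&lt;br /&gt;
    echo mdadm --manage $md --remove /dev/${disk}${part}&lt;br /&gt;
  done&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: review the output first, then pipe it to a shell to execute it&lt;br /&gt;
# raid_detach_cmds sdb&lt;br /&gt;
# raid_detach_cmds sdb | sh -x&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;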
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. In the meantime, watch the kernel log for the new disk to appear:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
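&lt;br /&gt;
The serial-number comparison can also be made explicit. In this minimal sketch, &amp;lt;code&amp;gt;serial_changed&amp;lt;/code&amp;gt; is a hypothetical helper, and it assumes the ID_SERIAL property has the same model_serial form that we entered in the Hetzner form:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: read udevadm property output on stdin and succeed only&lt;br /&gt;
# if it contains a non-empty ID_SERIAL that differs from the old serial&lt;br /&gt;
# (assumes ID_SERIAL uses the same model_serial form we gave Hetzner)&lt;br /&gt;
serial_changed() {&lt;br /&gt;
  local old=$1 new&lt;br /&gt;
  new=$(sed -n s/^ID_SERIAL=//p)&lt;br /&gt;
  [ ${#new} -gt 0 ] || return 1&lt;br /&gt;
  [ $new != $old ]&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: sdb must be a different disk now, while sda must be unchanged&lt;br /&gt;
# udevadm info --query=property --name sdb | serial_changed Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
# udevadm info --query=property --name sda | grep -x ID_SERIAL=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;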
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our surviving disk (sda) to our new disk (sdb):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
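&lt;br /&gt;
Before adding the new partitions to the RAID, it may be worth confirming that both disks now have identical layouts. This is a minimal sketch using a hypothetical helper named &amp;lt;code&amp;gt;parttable_sig&amp;lt;/code&amp;gt;. It assumes that the partition lines of &amp;lt;code&amp;gt;sfdisk -d&amp;lt;/code&amp;gt; output begin with the device path:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: reduce an sfdisk -d dump (on stdin) to a checksum of its&lt;br /&gt;
# partition entries, dropping the device names so two disks can be compared&lt;br /&gt;
# (assumes partition lines start with the device path, e.g. /dev/sda1 : ...)&lt;br /&gt;
parttable_sig() {&lt;br /&gt;
  grep ^/dev/ | cut -d: -f2- | md5sum | cut -c1-32&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: the two signatures should be identical after copying the table&lt;br /&gt;
# a=$(sfdisk -d /dev/sda | parttable_sig)&lt;br /&gt;
# b=$(sfdisk -d /dev/sdb | parttable_sig)&lt;br /&gt;
# [ $a = $b ] || echo partition tables differ&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;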
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
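Install grub onto the new disk so that the server can still boot from it if sda fails. On CentOS 7 (which hetzner2 runs), the GRUB 2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# write grub&#039;s boot code to the new disk (called grub-install on other distros)&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;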
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
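&lt;br /&gt;
The loop above runs until it is interrupted. A variant that returns on its own once no array is rebuilding or degraded might look like the following. The name &amp;lt;code&amp;gt;wait_for_raid_sync&amp;lt;/code&amp;gt; is hypothetical, and it assumes a rebuild appears as a &amp;lt;code&amp;gt;recovery&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;resync&amp;lt;/code&amp;gt; line in /proc/mdstat:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: poll an mdstat-format file until no array is rebuilding&lt;br /&gt;
# (assumed to show as a recovery or resync line) and no array status line&lt;br /&gt;
# shows a missing member (an underscore, as in [U_])&lt;br /&gt;
# note: it loops forever if an array is degraded but not rebuilding&lt;br /&gt;
wait_for_raid_sync() {&lt;br /&gt;
  local mdstat=${1:-/proc/mdstat} interval=${2:-300}&lt;br /&gt;
  [ -r $mdstat ] || return 1&lt;br /&gt;
  while grep -q -e recovery -e resync $mdstat || grep blocks $mdstat | grep -q _; do&lt;br /&gt;
    date&lt;br /&gt;
    grep -e recovery -e resync $mdstat&lt;br /&gt;
    sleep $interval&lt;br /&gt;
  done&lt;br /&gt;
  echo RAID sync complete&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: wait_for_raid_sync /proc/mdstat 300&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;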
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that grub still works as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It&#039;s not clear whether this is even possible, but we would have to contact Hetzner and ask them to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305931</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305931"/>
		<updated>2025-04-24T16:28:14Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: We had 5 hours of downtime on the replacement of sdb, due to hetzner unplugging both disks. So I&amp;#039;m bumping the expected downtime.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because its data was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output said that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 5 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and installing grub on it should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror onto the new disk might take a day or more. During this time we&#039;ll be vulnerable because only one disk will hold a complete copy of the data (no redundancy). The risk is compounded because both disks currently report that they&#039;re going to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, get the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
During the morning in Germany (and shortly after our daily backups complete), execute these commands to remove the drive from our RAID array:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# mark all sda partitions as failed (mdadm refuses to hot-remove an active member), then remove them from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
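&lt;br /&gt;
Before you add the new disk to the RAID, you may want to confirm that both disks now have the same layout. The following is a minimal sketch. It assumes bash, since it uses process substitution. It also assumes the two dumps differ only in the disk name and in the per-disk &amp;lt;code&amp;gt;label-id:&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;device:&amp;lt;/code&amp;gt; header lines (newer &amp;lt;code&amp;gt;sfdisk&amp;lt;/code&amp;gt; versions print these lines; older ones do not). The helper name &amp;lt;code&amp;gt;normalize_dump&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: read an sfdisk dump on stdin, drop the per-disk&lt;br /&gt;
# header lines, and replace the disk name (e.g. sda) with DISK&lt;br /&gt;
normalize_dump() {&lt;br /&gt;
  sed -e &#039;/^label-id:/d&#039; -e &#039;/^device:/d&#039; -e &amp;quot;s#/dev/$1#/dev/DISK#g&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# compare the surviving disk (sdb) with the new disk (sda)&lt;br /&gt;
if diff &amp;lt;(sfdisk -d /dev/sdb | normalize_dump sdb) \&lt;br /&gt;
        &amp;lt;(sfdisk -d /dev/sda | normalize_dump sda) &amp;gt; /dev/null; then&lt;br /&gt;
  echo &amp;quot;OK: sda now has the same partition layout as sdb&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;STOP: partition layouts differ&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;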
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy our grub configuration and files onto the new disk using &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
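&lt;br /&gt;
As a rough sanity check, you can search the first sector of the new disk for GRUB&#039;s boot code. This is a heuristic, not an official verification method. It assumes GRUB 2&#039;s MBR boot image, which contains the literal string &amp;quot;GRUB&amp;quot; in its messages. It checks only the MBR and does not prove that /boot is usable. The helper name &amp;lt;code&amp;gt;mbr_has_grub&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: succeed if the first 512 bytes of DEV contain &amp;quot;GRUB&amp;quot;&lt;br /&gt;
mbr_has_grub() {&lt;br /&gt;
  dd if=&amp;quot;$1&amp;quot; bs=512 count=1 2&amp;gt;/dev/null | grep -aq GRUB&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
if mbr_has_grub /dev/sda; then&lt;br /&gt;
  echo &amp;quot;OK: GRUB boot code found in the MBR of /dev/sda&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;WARNING: no GRUB boot code found in the MBR of /dev/sda&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;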
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that grub still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305930</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305930"/>
		<updated>2025-04-24T16:26:49Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: completed successfully&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 16:18 UTC==&lt;br /&gt;
&lt;br /&gt;
The new disk finished syncing with the old (failing) disk sometime between 15:15 and 15:20 UTC; all three arrays now show &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Thu Apr 24 15:15:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      [===================&amp;gt;.]  recovery = 96.5% (202794752/209984640) finish=2.5min speed=46324K/sec&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thu Apr 24 15:20:59 UTC 2025&lt;br /&gt;
Personalities : [raid1]&lt;br /&gt;
md2 : active raid1 sdb3[2] sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 1/2 pages [4KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
md0 : active raid1 sdb1[2] sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
md1 : active raid1 sdb2[2] sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I&#039;m marking this change as completed successfully. Next, we&#039;ll replace the other failing disk. See [[CHG-2025-XX-XX_replace_hetzner2_sda]]&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner fucked up and pulled &#039;&#039;both&#039;&#039; disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a result, the old disk (previously /dev/sda) reappeared as /dev/sdc, the new drive appeared as /dev/sdb, and dmesg was spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the Hetzner WUI to trigger a reboot and, thank god, the server came up again. I immediately took down all the web services while I investigated the damage and triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add it to the RAID. At the time of writing, both swap and boot are synced (and grub is installed on the new disk), and the root partition is still syncing onto the new disk (currently at 35%, writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
When the backup finished uploading, I put the web services back online and typed this status message.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
Hetzner says to expect the new disk to be inserted within 2-4 hours. One part of the form said this would happen without downtime, but the (required) checkbox at the bottom said that I understand that downtime is required, so it&#039;s ambiguous.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
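Because the RAID wasn&#039;t degraded, mdadm refused to hot-remove the disk (&amp;quot;Device or resource busy&amp;quot;), so I first had to mark each sdb partition as failed&lt;br /&gt;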
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250424T100000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating grub should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror might take a day or more. During that time we&#039;ll be vulnerable because we&#039;ll have only one disk (no redundancy). This is especially risky because SMART reports that both current disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
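&lt;br /&gt;
Reading &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt; by eye works, but the same check can be scripted. The following is a minimal sketch. It assumes the mdstat layout shown in the logs on this page, where each array&#039;s second line ends in a member status field such as &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt; (healthy) or &amp;lt;code&amp;gt;[U_]&amp;lt;/code&amp;gt; (one member missing). The helper name &amp;lt;code&amp;gt;mdstat_degraded&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print the name of every md array whose member&lt;br /&gt;
# status field (e.g. [UU] or [U_]) contains an &amp;quot;_&amp;quot; (a missing member)&lt;br /&gt;
mdstat_degraded() {&lt;br /&gt;
  awk &#039;/^md[0-9]+ :/ { name = $1 }&lt;br /&gt;
       $NF ~ /^\[[U_]+\]$/ &amp;amp;&amp;amp; $NF ~ /_/ { print name }&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: before this change it should print nothing (all arrays [UU])&lt;br /&gt;
# mdstat_degraded&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;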
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
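&lt;br /&gt;
The &amp;lt;code&amp;gt;grep&amp;lt;/code&amp;gt; above shows whether a file with today&#039;s date exists, but it leaves the operator to judge whether its size is sane. The following minimal sketch checks both. It assumes &amp;lt;code&amp;gt;rclone ls&amp;lt;/code&amp;gt; prints one &amp;quot;SIZE NAME&amp;quot; pair per line, as in the log above. The helper name &amp;lt;code&amp;gt;backup_ok&amp;lt;/code&amp;gt; and the 10 GB threshold are hypothetical (the 2025-04-24 backup was about 20 GB).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: read &amp;quot;SIZE NAME&amp;quot; lines on stdin; succeed only if some&lt;br /&gt;
# file name contains DATE ($1) and its size is at least MIN_BYTES ($2)&lt;br /&gt;
backup_ok() {&lt;br /&gt;
  awk -v d=&amp;quot;$1&amp;quot; -v min=&amp;quot;$2&amp;quot; &#039;&lt;br /&gt;
    index($2, d) &amp;amp;&amp;amp; $1 + 0 &amp;gt;= min + 0 { found = 1 }&lt;br /&gt;
    END { exit (found ? 0 : 1) }&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage (the 10 GB threshold is an assumption)&lt;br /&gt;
# source /root/backups/backup.settings&lt;br /&gt;
# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | backup_ok &amp;quot;$(date &amp;quot;+%Y%m%d&amp;quot;)&amp;quot; 10000000000 &amp;amp;&amp;amp; echo &amp;quot;backup OK&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;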
&lt;br /&gt;
During Germany&#039;s morning (and shortly after our daily backups complete), execute these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
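# mark all sdb partitions as failed, then remove them from our software RAID&lt;br /&gt;
# (hot-removing an active member fails with &amp;quot;Device or resource busy&amp;quot;)&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;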
&amp;lt;/pre&amp;gt;&lt;br /&gt;
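&lt;br /&gt;
Before you submit the request to Hetzner, confirm that no sdb partition is still listed in any array. A member that has been failed but not yet removed still appears, with an &amp;lt;code&amp;gt;(F)&amp;lt;/code&amp;gt; suffix, as in the 10:22 UTC log above. The following is a minimal sketch. The helper name &amp;lt;code&amp;gt;md_has_member&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: succeed if any partition of DISK ($1, e.g. sdb) is&lt;br /&gt;
# still listed as an md member in mdstat (FILE $2, default /proc/mdstat)&lt;br /&gt;
md_has_member() {&lt;br /&gt;
  grep -Eq &amp;quot;(^| )${1}[0-9]+\[&amp;quot; &amp;quot;${2:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
if md_has_member sdb; then&lt;br /&gt;
  echo &amp;quot;STOP: sdb is still part of a RAID array&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;OK: no sdb partitions remain in the RAID&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;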
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
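&lt;br /&gt;
During this change the kernel renamed a disk mid-hotswap: the old sda reappeared as sdc. For that reason, check serial numbers before writing any partition table. The following is a minimal sketch. It assumes &amp;lt;code&amp;gt;udevadm&amp;lt;/code&amp;gt; reports &amp;lt;code&amp;gt;ID_SERIAL&amp;lt;/code&amp;gt; in the same model_serial form used in the Hetzner form on this page. The helper name &amp;lt;code&amp;gt;disk_serial&amp;lt;/code&amp;gt; is hypothetical, and the sda serial is the one recorded for sda in the CHG to replace hetzner2&#039;s sda.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print the ID_SERIAL udev property of a disk (e.g. sda)&lt;br /&gt;
disk_serial() {&lt;br /&gt;
  udevadm info --query=property --name &amp;quot;$1&amp;quot; | sed -n &#039;s/^ID_SERIAL=//p&#039;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# known serials; the new disk&#039;s serial is not known in advance&lt;br /&gt;
keep_sda=Crucial_CT250MX200SSD1_154410FA336C&lt;br /&gt;
old_sdb=Crucial_CT250MX200SSD1_154410FA4520&lt;br /&gt;
&lt;br /&gt;
if [ &amp;quot;$(disk_serial sda)&amp;quot; != &amp;quot;$keep_sda&amp;quot; ]; then&lt;br /&gt;
  echo &amp;quot;STOP: sda is not the disk we expect to keep&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
elif [ &amp;quot;$(disk_serial sdb)&amp;quot; = &amp;quot;$old_sdb&amp;quot; ]; then&lt;br /&gt;
  echo &amp;quot;STOP: sdb is still the old (failing) disk&amp;quot; &amp;gt;&amp;amp;2&lt;br /&gt;
else&lt;br /&gt;
  echo &amp;quot;OK: sda is the surviving disk and sdb is the new one&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;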
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy our grub configuration and files onto the new disk using &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
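&lt;br /&gt;
To see just the rebuild progress without scrolling through the full &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt; output, you can extract the &amp;lt;code&amp;gt;recovery = NN.N%&amp;lt;/code&amp;gt; field. The following is a minimal sketch. It assumes the mdstat layout shown in the 2025-04-24 16:18 UTC log above. The helper name &amp;lt;code&amp;gt;mdstat_progress&amp;lt;/code&amp;gt; is hypothetical. As an alternative, mdadm&#039;s own &amp;lt;code&amp;gt;--wait&amp;lt;/code&amp;gt; option blocks until any resync or recovery on the listed arrays has finished.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper: print &amp;quot;ARRAY PERCENT&amp;quot; (e.g. &amp;quot;md2 96.5%&amp;quot;) for every&lt;br /&gt;
# array that is currently recovering or resyncing&lt;br /&gt;
mdstat_progress() {&lt;br /&gt;
  awk &#039;/^md[0-9]+ :/ { name = $1 }&lt;br /&gt;
       /(recovery|resync) =/ { for (i = 1; i &amp;lt;= NF; i++) if ($i ~ /%$/) print name, $i }&#039; &amp;quot;${1:-/proc/mdstat}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: print progress every 5 minutes until nothing is rebuilding&lt;br /&gt;
# while out=$(mdstat_progress) &amp;amp;&amp;amp; [ -n &amp;quot;$out&amp;quot; ]; do date; echo &amp;quot;$out&amp;quot;; sleep 300; done&lt;br /&gt;
&lt;br /&gt;
# or block until recovery on all three arrays has finished&lt;br /&gt;
# mdadm --wait /dev/md0 /dev/md1 /dev/md2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;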
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that grub still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305929</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305929"/>
		<updated>2025-04-24T14:30:43Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* 2025-04-24 14:23 UTC */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner fucked up and pulled &#039;&#039;both&#039;&#039; disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a result, the old disk (previously /dev/sda) reappeared as /dev/sdc, the new drive appeared as /dev/sdb, and dmesg was spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the Hetzner WUI to trigger a reboot and, thank god, the server came up again. I immediately took down all the web services while I investigated the damage and triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add it to the RAID. At the time of writing, both swap and boot are synced (and grub is installed on the new disk), and the root partition is still syncing onto the new disk (currently at 35%, writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
When the backup finished uploading, I put the web services back online and typed this status message.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
Hetzner says to expect the new disk to be inserted within 2-4 hours. One part of the form said this would happen without downtime, but the (required) checkbox at the bottom said that I understand that downtime is required, so it&#039;s ambiguous.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
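Because the RAID wasn&#039;t degraded, mdadm refused to hot-remove the disk (&amp;quot;Device or resource busy&amp;quot;), so I first had to mark each sdb partition as failed&lt;br /&gt;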
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy and that our daily backup finished a few hours ago:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250424T100000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) in [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering. Because hetzner2 runs an end-of-life version of CentOS, we can&#039;t update MariaDB. I don&#039;t think the corruption was caused by disk failure. However, the SMART output says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating grub should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, because we&#039;ll have only one disk (no redundancy). The risk is greater because both disks currently report that they&#039;re going to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
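&lt;br /&gt;
A healthy mirror shows [UU] on each array&#039;s blocks line, and a missing or failed member shows up as an underscore, as in [U_]. The sketch below turns that eyeball check into a pass/fail result. It is a minimal sketch that assumes the standard /proc/mdstat layout shown in this ticket. degraded_arrays and raid_is_healthy are hypothetical helper names.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# List md arrays whose member-status field (e.g. [UU]) contains "_", i.e. at
# least one mirror member is missing or failed. Reads /proc/mdstat by default;
# another file in the same format can be passed (handy for testing).
degraded_arrays() {
	local mdstat=${1:-/proc/mdstat}
	awk '
		/^md[0-9]+ : / { dev = $1 }
		/ blocks / {
			# the last field of the blocks line is the [UU]-style status
			if ($NF ~ /^\[[U_]+\]$/) {
				if ($NF ~ /_/) print dev
			}
		}
	' "$mdstat"
}

# Print a one-line verdict; return 1 if any array is degraded.
raid_is_healthy() {
	local bad
	bad=$(degraded_arrays "$@")
	if [ -n "$bad" ]; then
		echo "DEGRADED:" $bad
		return 1
	fi
	echo "all md arrays healthy"
}

# e.g.: raid_is_healthy || echo "stop here and investigate"
```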
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
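&lt;br /&gt;
&amp;quot;A sane size&amp;quot; can be made explicit by comparing today&#039;s object against a minimum. This is a sketch only. The 10 GB default threshold is a hypothetical value, picked because the 2025-04-24 backup was about 20 GB. check_todays_backup and MIN_BACKUP_BYTES are illustrative names and are not defined in backup.settings.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Check "rclone ls" output (one "SIZE PATH" pair per line) for an object whose
# name contains _STAMP_ (e.g. _20250424_) and whose size is at least
# MIN_BACKUP_BYTES. The 10 GB default is a hypothetical threshold.
MIN_BACKUP_BYTES=${MIN_BACKUP_BYTES:-10000000000}

check_todays_backup() {
	local stamp=$1 size name found=0
	while read -r size name; do
		case $name in
			*"_${stamp}_"*) ;;
			*) continue ;;
		esac
		found=1
		if [ "$size" -ge "$MIN_BACKUP_BYTES" ]; then
			echo "OK: $name ($size bytes)"
		else
			echo "TOO SMALL: $name ($size bytes)"
			return 1
		fi
	done
	if [ "$found" -eq 0 ]; then
		echo "MISSING: no backup named *_${stamp}_*"
		return 1
	fi
}

# usage, after sourcing /root/backups/backup.settings:
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | check_todays_backup "$(date "+%Y%m%d")"
```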
&lt;br /&gt;
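During Germany&#039;s morning (and very shortly after our daily backups complete), execute these commands to remove the drive from our RAID arrays. mdadm refuses to hot-remove a healthy member (&amp;quot;Device or resource busy&amp;quot;), so each partition must first be marked as failed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# mark all sdb partitions as failed, then remove them from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;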
&amp;lt;/pre&amp;gt;&lt;br /&gt;
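&lt;br /&gt;
The md0/sdb1, md1/sdb2, md2/sdb3 pairing is typed by hand above. As a hedged alternative, it can be read from /proc/mdstat, so that a typo can&#039;t fail the wrong partition. members_on_disk is a hypothetical helper that prints one &amp;quot;ARRAY PARTITION&amp;quot; pair per line.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print "ARRAY PARTITION" pairs for every md member that lives on the given disk
# (e.g. sdb), parsed from /proc/mdstat or a file in the same format.
# members_on_disk is an illustrative helper, not part of mdadm.
members_on_disk() {
	local disk=$1 mdstat=${2:-/proc/mdstat}
	awk -v disk="$disk" '
		/^md[0-9]+ : / {
			for (i = NF; i > 2; i--) {
				# members look like sdb2[1] or sdb2[1](F)
				if ($i ~ ("^" disk "[0-9]+\\[")) {
					part = $i
					sub(/\[.*/, "", part)
					print $1, part
				}
			}
		}
	' "$mdstat"
}

# e.g. fail each sdb member (the matching -r step works the same way):
#   members_on_disk sdb | while read -r md part; do
#       mdadm --manage "/dev/$md" --fail "/dev/$part"
#   done
```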
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
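&lt;br /&gt;
The raw kernel log can be noisy. A simple filter keeps the hot-swap and any I/O or RAID errors visible. This is a sketch, and the pattern list is an assumption tailored to this server&#039;s sdX disks and md0..md2 arrays. disk_events is a hypothetical name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Keep only disk- and RAID-related kernel messages so that the hot-swap (and any
# I/O errors) stand out while following the kernel log. The pattern list is an
# assumption tailored to this server's sdX disks and md0..md2 arrays.
disk_events() {
	grep --line-buffered -E 'sd[a-z]|md[0-9]|ata[0-9]|raid1|I/O error'
}

# e.g.: dmesg -w | disk_events
```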
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
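&lt;br /&gt;
The &amp;quot;sda is the same and sdb has changed&amp;quot; check can be scripted so that it fails loudly. This matters because device names can shift after a hot-swap: during this change, the old sda reappeared as sdc. This is a minimal sketch. The sda serial recorded before the change is a hypothetical input, because it is not listed in this ticket. disk_serial and check_swap are illustrative names.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print the ID_SERIAL udev property of a block device (e.g. sda).
disk_serial() {
	udevadm info --query=property --name "$1" | sed -n 's/^ID_SERIAL=//p'
}

# Confirm the swap went as planned: sda must still be the surviving disk and
# sdb must no longer be the old one. Args: sda serial recorded before the
# change (hypothetical; it is not listed in this ticket), the old sdb serial,
# then the current sda and sdb serials. Returns 1 on any surprise.
check_swap() {
	local want_sda=$1 old_sdb=$2 cur_sda=$3 cur_sdb=$4 rc=0
	if [ "$cur_sda" != "$want_sda" ]; then
		echo "sda is not the disk we expected (got '$cur_sda'); names may have shifted"
		rc=1
	fi
	if [ -z "$cur_sdb" ]; then
		echo "no sdb found"
		rc=1
	elif [ "$cur_sdb" = "$old_sdb" ]; then
		echo "sdb is still the old disk ($cur_sdb)"
		rc=1
	fi
	if [ "$rc" -eq 0 ]; then
		echo "swap looks right: sda=$cur_sda sdb=$cur_sdb"
	fi
	return $rc
}

# e.g.:
#   check_swap "$SDA_SERIAL_BEFORE" Crucial_CT250MX200SSD1_154410FA4520 \
#       "$(disk_serial sda)" "$(disk_serial sdb)"
```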
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
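&lt;br /&gt;
The du -sh step above relies on eyeballing. The sketch below turns it into a pass/fail check. It assumes that only the sda dump must be non-empty, because sda holds the layout we are about to copy. A brand-new sdb may have no partition table, so its dump may legitimately be empty. verify_parttable_backups is a hypothetical helper.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Pass/fail version of the "du -sh" check. Only the sda dump is required to be
# non-empty; an empty sdb dump is reported but tolerated, since a blank new
# disk may have no partition table to dump.
verify_parttable_backups() {
	local dir=$1
	if [ -s "$dir/sdb_parttable_mbr.bak" ]; then
		echo "sdb dump present"
	else
		echo "sdb dump empty (may be expected if the new disk is blank)"
	fi
	if [ -s "$dir/sda_parttable_mbr.bak" ]; then
		echo "sda dump present"
	else
		echo "sda dump MISSING or empty; do not continue"
		return 1
	fi
}

# e.g.: verify_parttable_backups "$chg_dir"
```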
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
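&lt;br /&gt;
After the copy, the two layouts can be compared by dumping both disks again and diffing the dumps with the device names normalised. This is a sketch under one assumption: the label-id line is dropped, because the disk identifier may differ between sfdisk versions and is not part of the partition layout. same_layout and the *_after.dump file names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Compare two "sfdisk -d" dumps after renaming the source disk to the target
# disk, to confirm the copied layout matches. Args: source dump file, target
# dump file, source disk name, target disk name. label-id lines are ignored.
same_layout() {
	local src=$1 dst=$2 from=$3 to=$4 a b
	a=$(sed -e "s#/dev/${from}#/dev/${to}#g" -e '/^label-id:/d' "$src")
	b=$(sed -e '/^label-id:/d' "$dst")
	if [ "$a" = "$b" ]; then
		echo "partition layouts match"
	else
		echo "partition layouts DIFFER"
		return 1
	fi
}

# e.g.:
#   sfdisk -d /dev/sda > "${chg_dir}/sda_after.dump"
#   sfdisk -d /dev/sdb > "${chg_dir}/sdb_after.dump"
#   same_layout "${chg_dir}/sda_after.dump" "${chg_dir}/sdb_after.dump" sda sdb
```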
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
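&lt;br /&gt;
Before running mdadm -a, confirm that the kernel actually sees the new partitions. One way is to look them up in /proc/partitions, whose fourth column is the device name. This is a minimal sketch. have_partitions and the PARTITIONS_FILE override (used only for testing) are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Confirm the kernel knows about each named partition by looking it up in
# /proc/partitions (columns: major minor #blocks name). PARTITIONS_FILE can
# point at another file in the same format, e.g. for testing.
have_partitions() {
	local table=${PARTITIONS_FILE:-/proc/partitions} p rc=0
	for p in "$@"; do
		if awk -v p="$p" '$4 == p { found = 1 } END { exit !found }' "$table"; then
			echo "present: $p"
		else
			echo "MISSING: $p"
			rc=1
		fi
	done
	return $rc
}

# e.g.: have_partitions sdb1 sdb2 sdb3
```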
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
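Install the GRUB boot loader onto the new disk, so that the server can still boot from sdb if sda dies. On CentOS 7 (GRUB 2), the &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt; command is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;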
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
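&lt;br /&gt;
The loop above runs until it is interrupted. A variant can stop by itself once /proc/mdstat no longer mentions a resync or recovery. This is a sketch only. wait_for_md_sync is a hypothetical helper. An array that is degraded but not rebuilding also ends the wait, so check for [UU] afterwards.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Block until no md array reports a resync or recovery in progress, printing a
# timestamp and the relevant mdstat lines every INTERVAL seconds (default 300).
# A degraded array that is NOT rebuilding also ends the wait, so re-check [UU].
wait_for_md_sync() {
	local mdstat=${1:-/proc/mdstat} interval=${2:-300}
	while grep -Eq 'resync|recovery' "$mdstat"; do
		date
		grep -E '^md[0-9]+ :|resync|recovery' "$mdstat"
		echo
		sleep "$interval"
	done
	echo "no resync/recovery in progress"
}

# e.g.: wait_for_md_sync; cat /proc/mdstat
```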
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that grub is still functioning as-expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
Reverting may not even be possible. We would have to contact Hetzner and ask them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305928</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305928"/>
		<updated>2025-04-24T14:30:31Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner messed up and disconnected &#039;&#039;&#039;both&#039;&#039;&#039; disks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
 Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The result was that /dev/sda was listed as /dev/sdc, the new drive was /dev/sdb, and dmesg was being spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the Hetzner WUI to trigger a reboot, and, thank god, the server came up again. I immediately took down all the web services while I investigated the damage and triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add it to the RAID arrays. At the time of writing, both swap and boot are synced (and grub is installed on the new disk), and the root partition is still syncing onto the new disk (currently at 35% and writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When the backup finished uploading, I put the web services back online and typed this status message.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
Hetzner says we should expect the new disk to be inserted within 2-4 hours. One part of the form said the swap would happen without downtime. However, the (required) checkbox at the bottom asked me to confirm that I understand downtime is required. So it&#039;s ambiguous whether we&#039;ll have downtime.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
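Because the RAID members weren&#039;t marked as defective, mdadm refused to hot-remove them (&amp;quot;Device or resource busy&amp;quot;). So I first had to mark each sdb partition as faulty with &amp;lt;code&amp;gt;--fail&amp;lt;/code&amp;gt; and then remove it:&lt;br /&gt;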
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy and that our daily backup finished a few hours ago:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20250424T100000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) in [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering. Because hetzner2 runs an end-of-life version of CentOS, we can&#039;t update MariaDB. I don&#039;t think the corruption was caused by disk failure. However, the SMART output says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable because only one disk holds a complete copy of the data (no redundancy). This is especially risky because SMART reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
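&lt;br /&gt;
The RAID check above can also be scripted. What follows is a minimal sketch and not part of the original procedure. It assumes the usual /proc/mdstat format, where a healthy two-disk mirror shows [2/2] [UU] and a degraded one shows [2/1] [U_]. The function name mdstat_all_healthy is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: succeed only if every md array in an mdstat file (default
# /proc/mdstat) reports all members up. A member status such as [U_]
# (underscore = missing mirror half) means the array is degraded.
mdstat_all_healthy() {
  local mdstat_file="${1:-/proc/mdstat}"
  if [ ! -r "$mdstat_file" ]; then
    echo "cannot read $mdstat_file"
    return 2
  fi
  # Extract only the member-status fields, e.g. [UU] or [U_]
  if grep -oE '\[[U_]+]' "$mdstat_file" | grep -q '_'; then
    echo "degraded md array found in $mdstat_file"
    return 1
  fi
  echo "all md arrays report all members up"
}

# Usage:
#   mdstat_all_healthy || echo "STOP: fix the RAID before continuing"
```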
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
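&lt;br /&gt;
The phrase &amp;quot;a sane size&amp;quot; is left to the operator to judge by eye. Below is a minimal sketch of how that check could be automated. It assumes that rclone ls prints one &amp;quot;size name&amp;quot; pair per line. The function name backup_looks_sane and the 10 GB default threshold are hypothetical and should be tuned to the real size of our daily backups.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: read an "rclone ls"-style listing (size, then file name) on
# stdin and succeed only if a file whose name contains the given date
# stamp exists and is at least min_bytes large. The 10 GB default is a
# hypothetical threshold, not a documented property of our backups.
backup_looks_sane() {
  local stamp="$1"
  local min_bytes="${2:-10000000000}"
  awk -v stamp="$stamp" -v min="$min_bytes" '
    index($2, stamp) { if ($1 + 0 >= min + 0) ok = 1 }
    END { if (ok) exit 0; exit 1 }
  '
}

# Usage, after sourcing /root/backups/backup.settings as above:
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_looks_sane "$(date '+%Y%m%d')" || echo "STOP: backup missing or too small"
```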
&lt;br /&gt;
During Germany&#039;s morning (and very shortly after our daily backups complete), execute these commands to remove the drive from our RAID arrays&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
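# mark all sdb partitions as failed first; mdadm refuses to hot-remove active members (&amp;quot;Device or resource busy&amp;quot;)&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# remove all sdb partitions from our software RAID&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;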
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
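&lt;br /&gt;
Device names such as sda and sdb are not guaranteed to stay stable across a hot-swap, so it is safer to compare serial numbers than to trust the names. Below is a minimal sketch that reads ID_SERIAL from udevadm&#039;s property output. The function names are hypothetical. The sketch also assumes that ID_SERIAL matches the full model_serial strings recorded in this ticket (Crucial_CT250MX200SSD1_154410FA336C for the kept disk, Crucial_CT250MX200SSD1_154410FA4520 for the replaced one).&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: extract ID_SERIAL from "udevadm info --query=property" output on stdin.
serial_from_properties() {
  sed -n 's/^ID_SERIAL=//p'
}

# Sketch: succeed if block device $1 (e.g. sda) reports serial $2.
disk_has_serial() {
  local actual
  actual="$(udevadm info --query=property --name "$1" | serial_from_properties)"
  echo "$1: ${actual:-unknown}"
  [ "$actual" = "$2" ]
}

# Example for this CHG (serials assumed to match ID_SERIAL):
#   disk_has_serial sda Crucial_CT250MX200SSD1_154410FA336C || echo "WARNING: sda is not the disk we kept"
#   if disk_has_serial sdb Crucial_CT250MX200SSD1_154410FA4520; then echo "WARNING: sdb was not replaced"; fi
```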
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
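&lt;br /&gt;
To confirm that the copy worked, the two disks&#039; dumps can be compared after masking the device names. Below is a minimal sketch. It drops label-id lines on the assumption that only starts, sizes, and types matter here. The function names and dump file names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: compare two "sfdisk --dump" files after masking device names
# (/dev/sda1 vs /dev/sdb1) and dropping label-id lines, so that only the
# partition layout (starts, sizes, types) is compared.
normalize_dump() {
  sed -E -e '/^label-id:/d' -e 's#/dev/sd[a-z]+#/dev/sdX#g' "$1"
}

same_layout() {
  [ "$(normalize_dump "$1")" = "$(normalize_dump "$2")" ]
}

# Usage (file names are hypothetical; chg_dir is the directory created above):
#   sfdisk --dump /dev/sda > "${chg_dir}/sda_after.dump"
#   sfdisk --dump /dev/sdb > "${chg_dir}/sdb_after.dump"
#   same_layout "${chg_dir}/sda_after.dump" "${chg_dir}/sdb_after.dump" || echo "STOP: layouts differ"
```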
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy our GRUB configuration and files onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (CentOS 7 names the GRUB 2 installer grub2-install)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
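&lt;br /&gt;
After installing GRUB onto the new disk, a quick heuristic check can be run before rebooting. GRUB 2&#039;s MBR boot code contains the ASCII text &amp;quot;GRUB&amp;quot;, so it should appear in the first sector of both disks. Below is a minimal sketch (the function name is hypothetical). It does not prove that the disk will boot; the reboot test below is still the real check.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: look for the ASCII string "GRUB" in the first 512-byte sector
# of a disk or image file. Heuristic only, not a full boot check.
has_grub_mbr() {
  dd if="$1" bs=512 count=1 2>/dev/null | grep -aq GRUB
}

# Usage:
#   for d in /dev/sda /dev/sdb; do
#     if has_grub_mbr "$d"; then echo "$d: GRUB found"; else echo "$d: no GRUB in MBR"; fi
#   done
```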
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
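&lt;br /&gt;
The loop above runs forever and has to be stopped by hand. Below is a minimal sketch of a variant that stops on its own once /proc/mdstat no longer reports a resync or recovery. It assumes the kernel&#039;s usual &amp;quot;recovery = 35.0%&amp;quot; and &amp;quot;resync=DELAYED&amp;quot; wording. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: succeed while an mdstat file (default /proc/mdstat) still shows
# a resync or recovery, including a pending "resync=DELAYED".
md_sync_in_progress() {
  grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Block until no resync/recovery is reported, printing progress every
# interval seconds (default 300, like the loop above).
wait_for_md_sync() {
  local interval="${1:-300}"
  while md_sync_in_progress; do
    date
    grep -E '(resync|recovery) *=' /proc/mdstat
    sleep "$interval"
  done
  echo "no resync or recovery in progress"
}
```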
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It may not be possible to revert this change. If it is, we would have to contact Hetzner and ask them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305927</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305927"/>
		<updated>2025-04-24T14:29:02Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 14:23 UTC==&lt;br /&gt;
&lt;br /&gt;
The wiki is back!&lt;br /&gt;
&lt;br /&gt;
Unfortunately, Hetzner messed up and briefly disconnected &#039;&#039;&#039;both&#039;&#039;&#039; disks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Dear Client,&lt;br /&gt;
&lt;br /&gt;
we&#039;ve replaced the drive via hotswap as wished.&lt;br /&gt;
&lt;br /&gt;
The second drive was unfortunately also briefly disconnected as there was a wrong physical label on it.&lt;br /&gt;
&lt;br /&gt;
If you have any further questions or problems, feel free to contact us again.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kind regards&lt;br /&gt;
&lt;br /&gt;
 Nils Weißer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The result was that /dev/sda was listed as /dev/sdc, the new drive was /dev/sdb, and dmesg was being spammed with I/O and RAID errors. The wiki was down. The disks were read-only, so I couldn&#039;t even take backups. I tried to reboot, but even &amp;lt;code&amp;gt;reboot&amp;lt;/code&amp;gt; failed due to I/O errors.&lt;br /&gt;
&lt;br /&gt;
I used the WUI to trigger a reboot, and--thank god--the server came up again. I immediately took down all the web services while I investigated the damage, and I triggered a new backup.&lt;br /&gt;
&lt;br /&gt;
I was able to partition the new disk and add it to the RAID. At the time of writing, both swap and boot are synced (and GRUB is installed on the new disk), and the root partition on the new disk is still syncing (currently at 35% and writing at 58 MB/s).&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
It says we should expect the new disk to be inserted in 2-4 hours. One part of the form said the swap would happen without downtime, but the (required) checkbox at the bottom asked me to acknowledge that downtime is required, so it&#039;s ambiguous.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
Because the RAID wasn&#039;t in a failed state, mdadm refused to hot-remove the sdb partitions, so I first had to mark them as failed&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy, and our daily backups finished a few hours ago&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL release of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, SMART reports that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
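&lt;br /&gt;
The -H summary only says FAILED. To see which attribute triggered it, the -A output can be filtered down to rows whose WHEN_FAILED column is set. Below is a minimal sketch. It assumes smartctl 7.0&#039;s column layout as shown above, where WHEN_FAILED is the 9th field. The function name is hypothetical. On the output above, it should report attribute 202 (Percent_Lifetime_Remain) as FAILING_NOW on both disks.&lt;br /&gt;
&lt;br /&gt;
```shell
# Sketch: read "smartctl -A" output on stdin and print ID, name and the
# WHEN_FAILED value of every attribute whose WHEN_FAILED column is not "-".
failing_smart_attrs() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $1, $2, $9 }'
}

# Usage:
#   for d in /dev/sda /dev/sdb; do echo "$d:"; smartctl -A "$d" | failing_smart_attrs; done
```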
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable because only one disk holds a complete copy of the data (no redundancy). This is especially risky because SMART reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
First, before we do anything, get the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
During Germany&#039;s morning (and very shortly after our daily backups complete), execute these commands to remove the drive from our RAID arrays&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
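# mark all sdb partitions as failed first; mdadm refuses to hot-remove active members (&amp;quot;Device or resource busy&amp;quot;)&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
&lt;br /&gt;
# remove all sdb partitions from our software RAID&lt;br /&gt;
mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;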
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sda is the same and sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy our GRUB configuration and files onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (CentOS 7 names the GRUB 2 installer grub2-install)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that GRUB still works as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
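&lt;br /&gt;
After the reboot, it may be worth confirming that all arrays were assembled with both members. A healthy two-disk RAID1 shows &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt; in &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;mdadm --detail&amp;lt;/code&amp;gt; typically reports a state of &amp;lt;code&amp;gt;clean&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;active&amp;lt;/code&amp;gt; while being written to) and lists both the sda and sdb partitions. A minimal sketch, assuming the same three arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# both members should be up, e.g. [2/2] [UU]&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# print the state line and member devices of each array&lt;br /&gt;
for md in /dev/md0 /dev/md1 /dev/md2; do&lt;br /&gt;
    mdadm --detail $md | grep -E &#039;State :|/dev/sd&#039;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;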
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It&#039;s not clear whether this is even possible, but reverting would require contacting Hetzner and asking them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305926</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305926"/>
		<updated>2025-04-24T10:46:49Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* Change Steps */ added fail steps (needed since RAID isn&amp;#039;t in failed state)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) in [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll have only one disk in the array (no redundancy). This is especially risky because the surviving disk also reports (via SMART) that it is expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sometime during Germany&#039;s morning (and very shortly after our daily backups complete), run these commands to mark the drive as failed and remove it from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove all sda partitions from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
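&lt;br /&gt;
Before filing the request, it may be worth confirming that the arrays are now running degraded on sdb alone (an underscore in the status, such as &amp;lt;code&amp;gt;[_U]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[U_]&amp;lt;/code&amp;gt;, marks the missing member) and that the serial of the disk just removed matches the one entered in the Hetzner request below. This reuses the same &amp;lt;code&amp;gt;udevadm&amp;lt;/code&amp;gt; query used later in this change.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# sda should no longer appear as a member of md0, md1, or md2&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# the ID_SERIAL line should contain the serial of the disk being replaced (Crucial_CT250MX200SSD1_154410FA336C)&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SERIAL&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;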
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and pipe it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
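&lt;br /&gt;
As an optional sanity check, you could compare the two disks&#039; partition layouts with the device names normalized, so that only the partition starts, sizes, and types are compared. No output from &amp;lt;code&amp;gt;diff&amp;lt;/code&amp;gt; suggests the copy succeeded. This is a sketch that assumes a bash shell (for process substitution) and MBR partition tables, as the backup file names suggest.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# compare both partition tables, replacing the sda/sdb device names with a placeholder&lt;br /&gt;
diff &amp;lt;(sfdisk -d /dev/sda | sed &#039;s/sd[ab]/sdX/g&#039;) &amp;lt;(sfdisk -d /dev/sdb | sed &#039;s/sd[ab]/sdX/g&#039;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;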
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
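Install the GRUB boot loader onto the new disk using &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (on CentOS 7, the GRUB2 tools are named &amp;lt;code&amp;gt;grub2-*&amp;lt;/code&amp;gt; rather than &amp;lt;code&amp;gt;grub-*&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;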
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
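&lt;br /&gt;
Alternatively, rather than polling, &amp;lt;code&amp;gt;mdadm --wait&amp;lt;/code&amp;gt; blocks until any resync or recovery on the listed arrays has finished. The rebuild itself runs in the kernel, so interrupting this command (or losing the SSH session) does not stop it. A minimal sketch, assuming the same three arrays as above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# block until recovery has finished on all three arrays, then print the time and final status&lt;br /&gt;
mdadm --wait /dev/md0 /dev/md1 /dev/md2&lt;br /&gt;
date&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;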
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is finally complete, test a reboot to make sure that GRUB is still functioning as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
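&lt;br /&gt;
After the reboot, it may be worth confirming that all arrays were assembled with both members. A healthy two-disk RAID1 shows &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt; in &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;mdadm --detail&amp;lt;/code&amp;gt; typically reports a state of &amp;lt;code&amp;gt;clean&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;active&amp;lt;/code&amp;gt; while being written to) and lists both the sda and sdb partitions. A minimal sketch, assuming the same three arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# both members should be up, e.g. [2/2] [UU]&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# print the state line and member devices of each array&lt;br /&gt;
for md in /dev/md0 /dev/md1 /dev/md2; do&lt;br /&gt;
    mdadm --detail $md | grep -E &#039;State :|/dev/sd&#039;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;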
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It&#039;s not clear whether this is even possible, but reverting would require contacting Hetzner and asking them to physically remove the new drive and re-install the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305925</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305925"/>
		<updated>2025-04-24T10:44:12Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: s/sdb/sda/ in comments&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of the two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) in [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it couldn&#039;t recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL version of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours and should be replaced immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll have only one disk in the array (no redundancy). This is especially risky because the surviving disk also reports (via SMART) that it is expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]], and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sometime during Germany&#039;s morning (and very shortly after our daily backups complete), run these commands to remove the drive from our RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
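# mark all sda partitions as failed, then remove them from our software RAID&lt;br /&gt;
mdadm --manage /dev/md0 --fail /dev/sda1&lt;br /&gt;
mdadm --manage /dev/md1 --fail /dev/sda2&lt;br /&gt;
mdadm --manage /dev/md2 --fail /dev/sda3&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;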
&amp;lt;/pre&amp;gt;&lt;br /&gt;
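&lt;br /&gt;
Before filing the request, it may be worth confirming that the arrays are now running degraded on sdb alone (an underscore in the status, such as &amp;lt;code&amp;gt;[_U]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[U_]&amp;lt;/code&amp;gt;, marks the missing member) and that the serial of the disk just removed matches the one entered in the Hetzner request below. This reuses the same &amp;lt;code&amp;gt;udevadm&amp;lt;/code&amp;gt; query used later in this change.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# sda should no longer appear as a member of md0, md1, or md2&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&lt;br /&gt;
# the ID_SERIAL line should contain the serial of the disk being replaced (Crucial_CT250MX200SSD1_154410FA336C)&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SERIAL&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;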
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and pipe it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
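&lt;br /&gt;
As an optional sanity check, you could compare the two disks&#039; partition layouts with the device names normalized, so that only the partition starts, sizes, and types are compared. No output from &amp;lt;code&amp;gt;diff&amp;lt;/code&amp;gt; suggests the copy succeeded. This is a sketch that assumes a bash shell (for process substitution) and MBR partition tables, as the backup file names suggest.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# compare both partition tables, replacing the sda/sdb device names with a placeholder&lt;br /&gt;
diff &amp;lt;(sfdisk -d /dev/sda | sed &#039;s/sd[ab]/sdX/g&#039;) &amp;lt;(sfdisk -d /dev/sdb | sed &#039;s/sd[ab]/sdX/g&#039;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;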
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Install GRUB onto the new disk with &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (the name of the GRUB 2 installer on CentOS 7)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Run this loop to print the RAID rebuild status every 5 minutes&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still boots the system as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether a revert is even possible. If it is, we would have to ask Hetzner to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305924</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305924"/>
		<updated>2025-04-24T10:43:10Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: update purpose with serial ID&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda = Crucial_CT250MX200SSD1_154410FA336C) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, a database corruption event took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL release of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
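&lt;br /&gt;
Both &amp;lt;code&amp;gt;smartctl -A&amp;lt;/code&amp;gt; listings above mark &amp;lt;code&amp;gt;Percent_Lifetime_Remain&amp;lt;/code&amp;gt; as &amp;lt;code&amp;gt;FAILING_NOW&amp;lt;/code&amp;gt;. The sketch below pulls the failing attribute names out of saved &amp;lt;code&amp;gt;smartctl -A&amp;lt;/code&amp;gt; output. The function name is hypothetical and is not part of smartmontools. It assumes the column layout shown above, where WHEN_FAILED is the 9th whitespace-separated field.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of smartmontools): print the ATTRIBUTE_NAME of
# every attribute whose WHEN_FAILED column (9th field) reads FAILING_NOW.
# $1 is the full text printed by "smartctl -A /dev/sdX".
smart_failing_attrs() {
  printf '%s\n' "$1" | awk '$9 == "FAILING_NOW" { print $2 }'
}

# Example: smart_failing_attrs "$(smartctl -A /dev/sda)"
# For the output captured above this prints Percent_Lifetime_Remain.
```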
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll have no redundancy, because all of our data will be on only one complete disk. The risk is higher than usual because both disks currently report that they are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
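&lt;br /&gt;
The sketch below makes this check less error-prone to read by eye. It reads &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt; text and succeeds only when every array shows all of its members up, e.g. [UU]. A missing member appears as an underscore, e.g. [U_]. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every array in the /proc/mdstat-style
# text passed as $1 has all of its members up, e.g. [UU]. A missing member is
# shown as an underscore, e.g. [U_] or [_U]. Also fails if no status is found.
md_all_healthy() {
  local status
  status=$(printf '%s\n' "$1" | grep -o '\[[U_][U_]*]')
  [ -n "$status" ] || return 1
  ! printf '%s\n' "$status" | grep -q '_'
}

# Example: md_all_healthy "$(cat /proc/mdstat)" || echo "RAID is degraded"
```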
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
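&lt;br /&gt;
The &amp;lt;code&amp;gt;grep&amp;lt;/code&amp;gt; above only shows whether an object with today&#039;s date exists; judging its size is left to the operator. The sketch below also enforces a minimum size. It assumes &amp;lt;code&amp;gt;rclone ls&amp;lt;/code&amp;gt; prints one &amp;quot;SIZE PATH&amp;quot; line per object. The function name, the example file names and the threshold are hypothetical placeholders, not values from our backup scripts.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if the "rclone ls" listing in $1 contains
# an object whose path includes the date stamp $2 (e.g. 20250424) and whose
# size is at least $3 bytes (default 1). The threshold is a placeholder.
backup_present() {
  local stamp="$2" min_bytes="${3:-1}" size
  # awk prints the size (first column) of each object whose path has the stamp
  for size in $(printf '%s\n' "$1" |
      awk -v s="$stamp" '{ size = $1; sub(/^ *[0-9]+ /, ""); if (index($0, s)) print size }'); do
    if [ "$size" -ge "$min_bytes" ]; then
      return 0
    fi
  done
  return 1
}

# Example, after sourcing /root/backups/backup.settings as above:
# backup_present "$(${RCLONE} ls "b2:${B2_BUCKET_NAME}")" "$(date "+%Y%m%d")" 1048576
```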
&lt;br /&gt;
Some time during the German morning (and shortly after our daily backups complete), run these commands to remove sda from our RAID arrays&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
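# mark each sda partition as faulty, then remove it from our software RAID&lt;br /&gt;
# (mdadm refuses to hot-remove a member that is still active)&lt;br /&gt;
mdadm /dev/md0 -f /dev/sda1&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -f /dev/sda2&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -f /dev/sda3&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;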
&amp;lt;/pre&amp;gt;&lt;br /&gt;
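&lt;br /&gt;
mdadm will not hot-remove a member that is still active; it has to be marked faulty (&amp;lt;code&amp;gt;--fail&amp;lt;/code&amp;gt;) before it can be removed (&amp;lt;code&amp;gt;--remove&amp;lt;/code&amp;gt;). The dry-run helper below prints the fail-then-remove commands for each partition of a disk so they can be reviewed before running them. The function name is hypothetical. It assumes the layout used on hetzner2, where partition N of each disk belongs to md(N-1).&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical dry-run helper: print (without running) the mdadm commands that
# first mark each partition of disk $1 as faulty and then remove it from its
# array. Assumes the hetzner2 layout where partition N belongs to /dev/md(N-1).
md_remove_cmds() {
  local disk="$1" n
  for n in 1 2 3; do
    echo "mdadm /dev/md$((n - 1)) --fail /dev/${disk}${n}"
    echo "mdadm /dev/md$((n - 1)) --remove /dev/${disk}${n}"
  done
}

# Example: review the output of "md_remove_cmds sda", then run each line.
```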
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. In the meantime, watch the kernel log for disk events&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
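&lt;br /&gt;
When a disk is hot-plugged, the kernel usually logs a line such as &amp;quot;[sda] Attached SCSI disk&amp;quot; once it has detected the disk. The sketch below picks those device names out of saved &amp;lt;code&amp;gt;dmesg&amp;lt;/code&amp;gt; text. The function name is hypothetical. The exact message wording is an assumption based on the usual output of the Linux SCSI disk driver.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the name of every disk that the dmesg-style text
# in $1 reports as attached, assuming the usual "[sdX] Attached SCSI disk" line.
attached_disks() {
  printf '%s\n' "$1" | sed -nE 's/.*\[(sd[a-z]+)\] Attached SCSI disk.*/\1/p'
}

# Example: attached_disks "$(dmesg)"
```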
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
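&lt;br /&gt;
The sketch below automates the serial-number comparison above. It assumes that &amp;lt;code&amp;gt;udevadm info --query=property&amp;lt;/code&amp;gt; prints an &amp;lt;code&amp;gt;ID_SERIAL=...&amp;lt;/code&amp;gt; line, and that sdb&#039;s serial was recorded before the swap. Both function names and the placeholder serial are hypothetical. The old sda serial is the one quoted in the Purpose section.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the ID_SERIAL value from the output of
# "udevadm info --query=property --name sdX", passed in as $1.
disk_serial() {
  printf '%s\n' "$1" | sed -n 's/^ID_SERIAL=//p'
}

# Serial of the disk we asked Hetzner to replace (from the Purpose section).
OLD_SDA_SERIAL="Crucial_CT250MX200SSD1_154410FA336C"

# Hypothetical check: swap_confirmed SDA_SERIAL SDB_SERIAL SDB_SERIAL_BEFORE
# succeeds only if sda reports a new, non-empty serial and sdb still reports
# the serial it had before the swap.
swap_confirmed() {
  [ -n "$1" ] || return 1
  [ "$1" != "$OLD_SDA_SERIAL" ] || return 1
  [ "$2" = "$3" ]
}

# Example:
# swap_confirmed \
#   "$(disk_serial "$(udevadm info --query=property --name sda)")" \
#   "$(disk_serial "$(udevadm info --query=property --name sdb)")" \
#   "SDB_SERIAL_RECORDED_BEFORE_THE_SWAP"
```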
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
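&lt;br /&gt;
&amp;lt;code&amp;gt;du&amp;lt;/code&amp;gt; shows the sizes, but a zero-byte dump is easy to overlook. The sketch below fails if any given file is missing or empty. The function name is hypothetical. The dump of the freshly inserted sda may legitimately be empty if the new disk arrives without a partition table, so the example only checks sdb&#039;s dump.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeed only if every file named in the arguments
# exists and is non-empty; report each one that is not.
check_backup_files() {
  local f rc=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_backup_files "${chg_dir}/sdb_parttable_mbr.bak"
```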
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
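&lt;br /&gt;
One way to confirm that the copy worked is to compare the two disks&#039; dumps with the device names stripped out. The sketch below does this. The function name is hypothetical. It assumes the partition lines of &amp;lt;code&amp;gt;sfdisk -d&amp;lt;/code&amp;gt; start with the partition&#039;s device path, e.g. &amp;quot;/dev/sdb1 : start=...&amp;quot;, and that disk names are of the sdX form.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: keep only the partition lines of an "sfdisk -d" dump
# (passed as $1) and strip the disk name from each, so that "/dev/sdb1 : ..."
# becomes "1 : ...". Dumps of two disks with identical layouts then compare equal.
sfdisk_geometry() {
  printf '%s\n' "$1" | sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) |\1 |p'
}

# Example:
# [ "$(sfdisk_geometry "$(sfdisk -d /dev/sda)")" = "$(sfdisk_geometry "$(sfdisk -d /dev/sdb)")" ] \
#   || echo "partition tables differ"
```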
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Install GRUB onto the new disk with &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; (the name of the GRUB 2 installer on CentOS 7)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Run this loop to print the RAID rebuild status every 5 minutes&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
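&lt;br /&gt;
The loop above prints all of /proc/mdstat every 300 seconds. The sketch below extracts just the progress percentage of any running recovery or resync, and prints nothing once all arrays are in sync. The function name is hypothetical. It assumes the usual &amp;quot;recovery = 12.6%&amp;quot; wording in /proc/mdstat.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the progress percentage (one number per line) of
# every recovery or resync in progress in the /proc/mdstat-style text in $1.
# Prints nothing when no array is rebuilding.
md_recovery_pct() {
  printf '%s\n' "$1" | sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*/\2/p'
}

# Example: while true; do date; md_recovery_pct "$(cat /proc/mdstat)"; sleep 300; done
```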
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still boots the system as expected&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether a revert is even possible. If it is, we would have to ask Hetzner to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305923</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305923"/>
		<updated>2025-04-24T10:41:58Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: copy/paste from sdb CHG to this change steps&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda) on [[hetzner2]]&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, a database corruption event took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and it could not recover from the corruption due to a bug in MariaDB. Because hetzner2 runs an EOL release of CentOS, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll have no redundancy, because all of our data will be on only one complete disk. The risk is higher than usual because both disks currently report that they are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]] and every service/website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before removing the second redundant disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some time during the German morning (and shortly after our daily backups complete), run these commands to remove sda from our RAID arrays&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
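# mark each sda partition as faulty, then remove it from our software RAID&lt;br /&gt;
# (mdadm refuses to hot-remove a member that is still active)&lt;br /&gt;
mdadm /dev/md0 -f /dev/sda1&lt;br /&gt;
mdadm /dev/md0 -r /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -f /dev/sda2&lt;br /&gt;
mdadm /dev/md1 -r /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -f /dev/sda3&lt;br /&gt;
mdadm /dev/md2 -r /dev/sda3&lt;br /&gt;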
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA336C&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. In the meantime, watch the kernel log for disk events&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get serial numbers of both disks; confirm sdb is the same and sda has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
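&lt;br /&gt;
The du output above has to be read by eye. The guard below (hypothetical helper name require_nonempty) fails loudly instead. Its usage line checks only the surviving disk&#039;s backup. This assumes a brand-new replacement disk may have no partition table for sfdisk to dump, so an empty sda backup is not necessarily an error.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical guard: return 1 (and say why) if any given backup file is
# missing or empty; return 0 if all of them have content.
require_nonempty() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty backup: $f"
      return 1
    fi
  done
  return 0
}

# Usage on hetzner2 (sdb is the surviving disk whose table will be copied):
#   require_nonempty "${chg_dir}/sdb_parttable_mbr.bak"
```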
&lt;br /&gt;
Copy the partition table from the surviving disk (sdb) to the new disk (sda):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the surviving disk (sdb) and write it to the new disk (sda)&lt;br /&gt;
sfdisk -d /dev/sdb | sfdisk /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
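&lt;br /&gt;
After the copy, you can compare the two partition tables. Each sfdisk -d dump names its own disk (in the device: header and the partition device names), so a plain diff would never be empty. The sketch below strips those per-disk details first. The helper name normalize_dump is hypothetical and is not an sfdisk feature, and the sketch assumes the text dump format that sfdisk -d prints.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: normalise "sfdisk -d" output so that dumps of two
# disks can be compared. Drops the "device:" and "label-id:" header lines
# and rewrites /dev/sdX partition names to a common placeholder.
normalize_dump() {
  sed -e '/^device:/d' -e '/^label-id:/d' -e 's#/dev/sd[a-z]#/dev/sdX#g'
}

# Usage on hetzner2 (no output from diff means the layouts match):
#   sfdisk -d /dev/sdb | normalize_dump > "${chg_dir}/sdb.norm"
#   sfdisk -d /dev/sda | normalize_dump > "${chg_dir}/sda.norm"
#   diff "${chg_dir}/sdb.norm" "${chg_dir}/sda.norm"
```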
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sda&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
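&lt;br /&gt;
To confirm that the re-read took effect, you can check the kernel&#039;s partition list for the three partitions that the mdadm step needs (sda1, sda2, sda3). This sketch uses a hypothetical function name. It assumes the usual /proc/partitions layout, in which the fourth column is the partition name.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical check: succeed only if /proc/partitions-style input (read on
# stdin; the 4th column is the name) lists DISK1, DISK2 and DISK3.
kernel_sees_parts() {
  disk=$1
  names=$(awk '{print $4}')
  for n in 1 2 3; do
    printf '%s\n' "$names" | grep -qx "${disk}${n}" || return 1
  done
  return 0
}

# Usage on hetzner2:
#   cat /proc/partitions | kernel_sees_parts sda; echo "exit=$?"
```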
&lt;br /&gt;
Now add the new drive&#039;s partitions to the three RAID1 arrays (md0, md1, md2):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add each of the new disk&#039;s partitions to its software RAID array&lt;br /&gt;
mdadm /dev/md0 -a /dev/sda1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sda2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sda3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
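&lt;br /&gt;
The same three additions can be generated from the array-to-partition mapping visible in /proc/mdstat (md0 uses partition 1, md1 uses partition 2, md2 uses partition 3). The hypothetical helper below only prints the commands, so you can review them before piping them to a shell. Note that --add is the long form of -a.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical dry-run helper: print one "mdadm --add" command per array,
# using the mapping md0->1, md1->2, md2->3 seen in /proc/mdstat.
print_add_cmds() {
  new_disk=$1
  for pair in md0:1 md1:2 md2:3; do
    md=${pair%%:*}      # text before the colon, e.g. md0
    part=${pair##*:}    # text after the colon, e.g. 1
    echo "mdadm /dev/${md} --add /dev/${new_disk}${part}"
  done
}

# Usage on hetzner2: review the output first, then run it.
#   print_add_cmds sda
#   print_add_cmds sda | sh
```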
&lt;br /&gt;
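Install the GRUB bootloader onto the new disk so that the server can still boot if the other disk fails. On CentOS 7 (which hetzner2 runs), the GRUB 2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt; rather than &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sda&lt;br /&gt;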
&amp;lt;/pre&amp;gt;&lt;br /&gt;
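&lt;br /&gt;
A rough way to confirm that boot code was written to the new disk is to look for the GRUB signature in its first sector. This is a heuristic sketch with a hypothetical function name. It assumes the GRUB 2 boot sector contains the ASCII text GRUB, so confirm that against the known-good disk (sdb) first.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical heuristic: exit 0 if the bytes on stdin (a disk's first
# 512-byte sector) contain the ASCII string GRUB. -a makes grep treat the
# binary data as text.
mbr_has_grub() {
  grep -aq 'GRUB'
}

# Usage on hetzner2; compare the new disk against the known-good one:
#   dd if=/dev/sdb bs=512 count=1 status=none | mbr_has_grub; echo "sdb exit=$?"
#   dd if=/dev/sda bs=512 count=1 status=none | mbr_has_grub; echo "sda exit=$?"
```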
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
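&lt;br /&gt;
Rather than watching the loop output, a script can wait until the rebuild has finished. This sketch uses a hypothetical function name and assumes the /proc/mdstat format shown on this page. A degraded array shows an underscore in its member-status brackets (for example [U_]), and an array that is rebuilding shows a recovery or resync line.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: exit 0 only if no array in the /proc/mdstat-style
# input on stdin is degraded ([U_], [_U], ...) or rebuilding.
raid_in_sync() {
  if grep -Eq '\[[U_]*_[U_]*\]|recovery|resync'; then
    return 1
  fi
  return 0
}

# Usage on hetzner2: poll every 5 minutes until the mirror is complete.
#   until cat /proc/mdstat | raid_in_sync; do sleep 300; done; date; echo "RAID in sync"
```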
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, reboot to confirm that GRUB still boots the server as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether this is even possible, but reverting would require asking Hetzner to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305922</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305922"/>
		<updated>2025-04-24T10:38:19Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: /* See Also */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an end-of-life (EOL) CentOS release, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
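&lt;br /&gt;
In both attribute tables above, the only row whose WHEN_FAILED column is not a dash is 202 Percent_Lifetime_Remain (FAILING_NOW). A short filter can pull such rows out of smartctl -A output. This sketch uses a hypothetical function name and assumes the column layout shown above, where WHEN_FAILED is the ninth field.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical filter: from "smartctl -A" output on stdin, print the ID,
# name and WHEN_FAILED value of every attribute row whose WHEN_FAILED
# column (9th field) is not "-".
failing_attributes() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $1, $2, $9 }'
}

# Usage on hetzner2:
#   smartctl -A /dev/sda | failing_attributes
#   smartctl -A /dev/sdb | failing_attributes
```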
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll only have one working disk (no redundancy). The risk is higher than usual because SMART reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
TODO&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether this is even possible, but reverting would require asking Hetzner to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]]&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305921</id>
		<title>CHG-2025-04-30 replace hetzner2 sda</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-30_replace_hetzner2_sda&amp;diff=305921"/>
		<updated>2025-04-24T10:37:54Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: add link to the similar CHG on the other/mirrored disk&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-??-?? ??:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-??-?? ??:00 Kansas City, US&lt;br /&gt;
* = 2025-??-?? ??:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sda) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an end-of-life (EOL) CentOS release, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
START OF READ SMART DATA SECTION&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, as we&#039;ll only have one working disk (no redundancy). The risk is higher than usual because SMART reports that both disks are expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
TODO&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
It is unclear whether this is even possible, but reverting would require asking Hetzner to physically remove the new drive and reinstall the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Last (possible) update to hetzner2&lt;br /&gt;
# [[CHG-2025-04-24_replace_hetzner2_sdb]]&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
	<entry>
		<id>https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305920</id>
		<title>CHG-2025-04-24 replace hetzner2 sdb</title>
		<link rel="alternate" type="text/html" href="https://wiki.opensourceecology.org/index.php?title=CHG-2025-04-24_replace_hetzner2_sdb&amp;diff=305920"/>
		<updated>2025-04-24T10:33:42Z</updated>

		<summary type="html">&lt;p&gt;Maltfield: status update 10:32 UTC&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Status=&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:32 UTC==&lt;br /&gt;
&lt;br /&gt;
I finished submitting the request to Hetzner to replace the disk for free.&lt;br /&gt;
&lt;br /&gt;
Hetzner says to expect the new disk to be inserted within 2-4 hours. One part of the form said the swap would happen without downtime, but the (required) checkbox at the bottom asked me to confirm that I understand downtime is required, so it&#039;s unclear whether there will be downtime.&lt;br /&gt;
&lt;br /&gt;
==2025-04-24 10:22 UTC==&lt;br /&gt;
&lt;br /&gt;
Because the RAID wasn&#039;t degraded, mdadm refused to hot-remove the sdb partitions, so I first had to mark each one as failed and then remove it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot remove failed for /dev/sdb1: Device or resource busy&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md0 --fail /dev/sdb1&lt;br /&gt;
mdadm: set /dev/sdb1 faulty in /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md1 --fail /dev/sdb2&lt;br /&gt;
mdadm: set /dev/sdb2 faulty in /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm --manage /dev/md2 --fail /dev/sdb3&lt;br /&gt;
mdadm: set /dev/sdb3 faulty in /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1](F)&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1](F)&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1](F)&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md0 -r /dev/sdb1&lt;br /&gt;
mdadm: hot removed /dev/sdb1 from /dev/md0&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md1 -r /dev/sdb2&lt;br /&gt;
mdadm: hot removed /dev/sdb2 from /dev/md1&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# mdadm /dev/md2 -r /dev/sdb3&lt;br /&gt;
mdadm: hot removed /dev/sdb3 from /dev/md2&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0]&lt;br /&gt;
      523712 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0]&lt;br /&gt;
      33521664 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0]&lt;br /&gt;
      209984640 blocks super 1.2 [2/1] [U_]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
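&lt;br /&gt;
The fail-then-remove sequence above was needed because mdadm refused to hot-remove an active member (Device or resource busy). A dry-run helper like the one below can generate the same commands for all three arrays. This sketch uses a hypothetical function name and assumes the md0/md1/md2 to partition 1/2/3 mapping shown in /proc/mdstat.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical dry-run helper: for each array, print the command that marks
# the disk's partition as failed, then the one that removes it.
print_fail_remove_cmds() {
  disk=$1
  for pair in md0:1 md1:2 md2:3; do
    md=${pair%%:*}
    part=${pair##*:}
    echo "mdadm --manage /dev/${md} --fail /dev/${disk}${part}"
    echo "mdadm --manage /dev/${md} --remove /dev/${disk}${part}"
  done
}

# Usage on hetzner2: review, run, then check cat /proc/mdstat.
#   print_fail_remove_cmds sdb
#   print_fail_remove_cmds sdb | sh
```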
&lt;br /&gt;
==2025-04-24 10:07 UTC==&lt;br /&gt;
&lt;br /&gt;
I confirmed that the RAID looks healthy and that our daily backup finished a few hours ago:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# cat /proc/mdstat&lt;br /&gt;
Personalities : [raid1] &lt;br /&gt;
md1 : active raid1 sda2[0] sdb2[1]&lt;br /&gt;
      523712 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md0 : active raid1 sda1[0] sdb1[1]&lt;br /&gt;
      33521664 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      &lt;br /&gt;
md2 : active raid1 sda3[0] sdb3[1]&lt;br /&gt;
      209984640 blocks super 1.2 [2/2] [UU]&lt;br /&gt;
      bitmap: 2/2 pages [8KB], 65536KB chunk&lt;br /&gt;
&lt;br /&gt;
unused devices: &amp;lt;none&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# source /root/backups/backup.settings&lt;br /&gt;
[root@opensourceecology ~]# ${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
20144027578 daily_hetzner3_20250424_074924.tar.gpg&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# date -u&lt;br /&gt;
Thu Apr 24 10:06:52 UTC 2025&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
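&lt;br /&gt;
The sketch below is a stricter version of the backup check that also rejects a zero-byte archive. It uses a hypothetical function name. It assumes the rclone ls output format shown above (size, then file name) and the *_YYYYMMDD_HHMMSS.tar.gpg naming seen in this log.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical check: exit 0 if "rclone ls" output on stdin lists a
# non-empty *_YYYYMMDD_HHMMSS.tar.gpg archive for the given date.
backup_for_date() {
  awk -v d="$1" '
    $1 > 0 { if ($2 ~ ("_" d "_[0-9]+[.]tar[.]gpg$")) found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Usage on hetzner2:
#   source /root/backups/backup.settings
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_for_date "$(date -u '+%Y%m%d')"; echo "exit=$?"
```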
&lt;br /&gt;
==2025-04-24 10:04 UTC==&lt;br /&gt;
&lt;br /&gt;
Starting CHG&lt;br /&gt;
&lt;br /&gt;
==2025-04-19 11:49 UTC==&lt;br /&gt;
&lt;br /&gt;
Marcin approved this CHG for 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
==2025-04-18 22:15 UTC==&lt;br /&gt;
&lt;br /&gt;
Initial Ticket draft created on wiki (WIP)&lt;br /&gt;
&lt;br /&gt;
=Change Info=&lt;br /&gt;
&lt;br /&gt;
==Scheduled Time==&lt;br /&gt;
&lt;br /&gt;
This change will take place on 2025-04-24 10:00 UTC&lt;br /&gt;
&lt;br /&gt;
* = 2025-04-24 05:00 Kansas City, US&lt;br /&gt;
* = 2025-04-24 05:00 Guayaquil, EC&lt;br /&gt;
&lt;br /&gt;
https://www.timeanddate.com/worldclock/converter.html?iso=20240727T160000&amp;amp;p1=405&amp;amp;p2=1440&amp;amp;p3=93&lt;br /&gt;
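&lt;br /&gt;
You can double-check the local times above with GNU date. This sketch uses a hypothetical function name, to_local. The usage lines use the IANA zone names for Kansas City (America/Chicago) and Guayaquil (America/Guayaquil), which require the system time zone database.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: convert a UTC time ("YYYY-MM-DD HH:MM") to the local
# time of the given TZ value, using GNU date.
to_local() {
  TZ="$1" date -d "$2 UTC" '+%Y-%m-%d %H:%M'
}

# Usage:
#   to_local America/Chicago   '2025-04-24 10:00'   # prints 2025-04-24 05:00
#   to_local America/Guayaquil '2025-04-24 10:00'   # prints 2025-04-24 05:00
```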
&lt;br /&gt;
==Purpose==&lt;br /&gt;
&lt;br /&gt;
This change will physically replace one of our two SSDs (/dev/sdb = Crucial_CT250MX200SSD1_154410FA4520) on [[hetzner2]].&lt;br /&gt;
&lt;br /&gt;
On 2025-04-17, we had a database corruption event that took down all of the websites on hetzner2. The database wouldn&#039;t start because it was corrupt, and a bug in MariaDB prevented it from recovering from the corruption. Because hetzner2 runs an end-of-life (EOL) CentOS release, we can&#039;t update MariaDB. While I don&#039;t think the corruption was caused by disk failure, the SMART output says that both of our redundant disks are expected to fail within 24 hours, so we should replace them immediately:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sda&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78355&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3433&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2599&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   064   046   000    Old_age   Always       -       36 (Min/Max 24/54)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       405734134966&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12794981941&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       26207531685&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# smartctl -A /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART Attributes Data Structure revision number: 16&lt;br /&gt;
Vendor Specific SMART Attributes with Thresholds:&lt;br /&gt;
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE&lt;br /&gt;
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0&lt;br /&gt;
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0&lt;br /&gt;
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78354&lt;br /&gt;
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43&lt;br /&gt;
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       3742&lt;br /&gt;
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       40&lt;br /&gt;
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2585&lt;br /&gt;
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
194 Temperature_Celsius     0x0022   065   044   000    Old_age   Always       -       35 (Min/Max 24/56)&lt;br /&gt;
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0&lt;br /&gt;
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
202 Percent_Lifetime_Remain 0x0030   000   000   001    Old_age   Offline  FAILING_NOW 100&lt;br /&gt;
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0&lt;br /&gt;
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0&lt;br /&gt;
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       406209116828&lt;br /&gt;
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12809824998&lt;br /&gt;
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       42504271864&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
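&lt;br /&gt;
In both tables, attribute 202 (Percent_Lifetime_Remain) has a normalized VALUE of 000, below its THRESH of 001, and is flagged FAILING_NOW. This is most likely what triggers the overall FAILED verdict. The sketch below prints every attribute whose WHEN_FAILED column is not &amp;quot;-&amp;quot;, which makes such rows easy to spot in a long &amp;lt;code&amp;gt;smartctl -A&amp;lt;/code&amp;gt; table. It assumes the standard (non-brief) attribute layout shown above, where WHEN_FAILED is the 9th column. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print "ID NAME WHEN_FAILED" for every SMART attribute row (read from
# `smartctl -A` output on stdin) whose WHEN_FAILED column is not "-".
# Assumes the standard attribute table, where WHEN_FAILED is column 9.
failing_smart_attrs() {
  awk '$1 ~ /^[0-9]+$/ { if ($9 != "-") print $1, $2, $9 }'
}

# Usage (as root on the server):
#   smartctl -A /dev/sda | failing_smart_attrs
#   smartctl -A /dev/sdb | failing_smart_attrs
```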
&lt;br /&gt;
==Points of Contact==&lt;br /&gt;
&lt;br /&gt;
Change being performed by: [[User:Maltfield|Michael Altfield]]&lt;br /&gt;
&lt;br /&gt;
Service owners: [[User:Catarina|Catarina Mota]] &amp;amp; [[User:Marcin|Marcin Jakubowski]]&lt;br /&gt;
&lt;br /&gt;
==Time Length==&lt;br /&gt;
&lt;br /&gt;
We expect at most 4 hours of downtime.&lt;br /&gt;
&lt;br /&gt;
Re-partitioning the new disk, adding it to the RAID, and updating GRUB should take less than 2 hours.&lt;br /&gt;
&lt;br /&gt;
Rebuilding the RAID1 mirror of the two disks might take a day or more. During this time we&#039;ll be vulnerable, because we&#039;ll have only one working disk (no redundancy). The risk is higher than usual because the remaining disk (sda) also reports that it is expected to fail within 24 hours.&lt;br /&gt;
&lt;br /&gt;
==Systems Impacted==&lt;br /&gt;
&lt;br /&gt;
This change impacts [[hetzner2]]; every service and website that runs on it will go down.&lt;br /&gt;
&lt;br /&gt;
==Staging Test==&lt;br /&gt;
&lt;br /&gt;
n/a&lt;br /&gt;
&lt;br /&gt;
=Change Steps=&lt;br /&gt;
&lt;br /&gt;
Before doing anything else, check the status of the RAID:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
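&lt;br /&gt;
A healthy two-disk RAID1 array shows &amp;lt;code&amp;gt;[UU]&amp;lt;/code&amp;gt; in &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt;. An underscore (for example &amp;lt;code&amp;gt;[U_]&amp;lt;/code&amp;gt;) marks a missing or failed member. The sketch below prints only the degraded arrays. The function name is hypothetical. Its output should be empty before we start, list md0, md1 and md2 after sdb is removed, and be empty again once the rebuild finishes.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print "mdN [status]" for every md array whose member-status string
# (e.g. [UU] for a healthy two-disk mirror) contains "_", i.e. has a missing
# or failed member. Reads /proc/mdstat, or the file given as $1.
degraded_md_arrays() {
  awk '/^md[0-9]+ :/ { dev = $1 }
       /\[[U_]+\]/ { if ($NF ~ /_/) print dev, $NF }' "${1:-/proc/mdstat}"
}

# Usage (on the server):
#   degraded_md_arrays
```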
&lt;br /&gt;
Before removing the second (redundant) disk from the RAID, confirm that today&#039;s backup was successfully uploaded to Backblaze:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# verify today&#039;s backup is present and a sane size&lt;br /&gt;
source /root/backups/backup.settings&lt;br /&gt;
${RCLONE} ls &amp;quot;b2:${B2_BUCKET_NAME}&amp;quot; | grep $(date &amp;quot;+%Y%m%d&amp;quot;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
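&lt;br /&gt;
The &amp;lt;code&amp;gt;grep&amp;lt;/code&amp;gt; above shows whether a file with today&#039;s date exists, but it leaves judging whether the size is sane to the eye. The sketch below is a stricter check. It assumes that the backup file names contain the &amp;lt;code&amp;gt;YYYYMMDD&amp;lt;/code&amp;gt; date stamp (as the grep implies). It also assumes that &amp;lt;code&amp;gt;rclone ls&amp;lt;/code&amp;gt; prints each object as its size in bytes followed by its path. The function name and the minimum size are hypothetical; choose the minimum from recent backup sizes.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Succeed only if an `rclone ls` listing (read on stdin) contains an object
# whose path includes the given date stamp and whose size in bytes is at
# least min_bytes. min_bytes is a hypothetical threshold, not an OSE value.
backup_present() {
  local day="$1" min_bytes="$2"
  awk -v day="$day" -v min="$min_bytes" '
    index($0, day) { if ($1 + 0 >= min + 0) found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage (after sourcing /root/backups/backup.settings):
#   ${RCLONE} ls "b2:${B2_BUCKET_NAME}" | backup_present "$(date "+%Y%m%d")" 1000000000 || echo "no sane backup for today"
```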
&lt;br /&gt;
In the German morning (shortly after our daily backups complete), execute these commands to remove the drive from our RAID arrays:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
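# mark each sdb partition as failed, then remove it from our software RAID&lt;br /&gt;
# (mdadm refuses to remove an active member that has not been marked faulty)&lt;br /&gt;
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3&lt;br /&gt;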
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Log into the Hetzner WUI https://robot.your-server.de/&lt;br /&gt;
&lt;br /&gt;
Go to the servers page https://robot.hetzner.com/server&lt;br /&gt;
&lt;br /&gt;
# Click the &amp;quot;Support&amp;quot; tab under hetzner2&lt;br /&gt;
# Click &amp;quot;Technical&amp;quot;&lt;br /&gt;
# Select &amp;quot;Server - Disk Failure&amp;quot;&lt;br /&gt;
# Select &amp;quot;Specification of the defective HDD/SSD&amp;quot; and enter &amp;quot;Crucial_CT250MX200SSD1_154410FA4520&amp;quot;&lt;br /&gt;
# Select &amp;quot;Free&amp;quot;&lt;br /&gt;
# Select &amp;quot;Swap while the system is running&amp;quot;&lt;br /&gt;
# Select &amp;quot;As soon as possible&amp;quot;&lt;br /&gt;
# In the &amp;quot;Entire SMART log&amp;quot; textarea, enter this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[root@opensourceecology ~]# smartctl -H /dev/sdb&lt;br /&gt;
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.119.1.el7.x86_64] (local build)&lt;br /&gt;
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org&lt;br /&gt;
&lt;br /&gt;
=== START OF READ SMART DATA SECTION ===&lt;br /&gt;
SMART overall-health self-assessment test result: FAILED!&lt;br /&gt;
Drive failure expected in less than 24 hours. SAVE ALL DATA.&lt;br /&gt;
No failed Attributes found.&lt;br /&gt;
&lt;br /&gt;
[root@opensourceecology ~]# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
# Click &amp;quot;Send request&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Wait until Hetzner confirms that the replacement drive has been inserted. Meanwhile, you can watch the kernel log for the new disk being detected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# monitor for I/O events in kernel logs&lt;br /&gt;
dmesg -w&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After the replacement drive has been inserted, get some info about it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# get disks and partition info&lt;br /&gt;
lsblk&lt;br /&gt;
&lt;br /&gt;
# get the serial numbers of both disks; confirm that sda is unchanged and that sdb has changed&lt;br /&gt;
udevadm info --query=property --name sda | grep ID_SER&lt;br /&gt;
udevadm info --query=property --name sdb | grep ID_SER&lt;br /&gt;
&lt;br /&gt;
# verify RAID status&lt;br /&gt;
cat /proc/mdstat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
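&lt;br /&gt;
The sketch below makes the serial-number comparison explicit rather than visual. It extracts &amp;lt;code&amp;gt;ID_SERIAL&amp;lt;/code&amp;gt; from the &amp;lt;code&amp;gt;udevadm&amp;lt;/code&amp;gt; output and checks that sdb no longer reports the serial of the disk we asked Hetzner to replace. The function names are hypothetical. The old serial is the one given in the Purpose section.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print the ID_SERIAL value from `udevadm info --query=property` output (stdin).
disk_serial() {
  awk -F= '$1 == "ID_SERIAL" { print $2 }'
}

# Succeed only if a serial was found and it differs from the old one.
serial_changed() {
  local old="$1" current="$2"
  [ -n "$current" ] || return 1
  [ "$current" != "$old" ]
}

# Usage (on the server):
#   new=$(udevadm info --query=property --name sdb | disk_serial)
#   serial_changed "Crucial_CT250MX200SSD1_154410FA4520" "$new" || echo "sdb was not replaced"
```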
&lt;br /&gt;
Before we modify the partition tables of any of our drives, let&#039;s make backups&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# create a temp dir for this change&lt;br /&gt;
stamp=$(date &amp;quot;+%Y%m%d_%H%M%S&amp;quot;)&lt;br /&gt;
chg_dir=/var/tmp/chg.$stamp&lt;br /&gt;
mkdir $chg_dir&lt;br /&gt;
chown root:root $chg_dir&lt;br /&gt;
chmod 0700 $chg_dir&lt;br /&gt;
pushd $chg_dir&lt;br /&gt;
&lt;br /&gt;
# make backups of both disks&#039; partition tables&lt;br /&gt;
sfdisk --dump /dev/sda &amp;gt; ${chg_dir}/sda_parttable_mbr.bak&lt;br /&gt;
sfdisk --dump /dev/sdb &amp;gt; ${chg_dir}/sdb_parttable_mbr.bak&lt;br /&gt;
&lt;br /&gt;
# verify&lt;br /&gt;
du -sh ${chg_dir}/*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Copy the partition table from our old disk to our new disk&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# dump the partition table of the first disk and pipe it to the second disk&lt;br /&gt;
sfdisk -d /dev/sda | sfdisk /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
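&lt;br /&gt;
If the copy worked, the two &amp;lt;code&amp;gt;sfdisk -d&amp;lt;/code&amp;gt; dumps should be identical apart from the device names. The sketch below compares them after normalizing the device names and, to be safe, ignoring the &amp;lt;code&amp;gt;label-id&amp;lt;/code&amp;gt; line. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Normalize an `sfdisk -d` dump (stdin): drop the label-id line and replace
# /dev/sdX device names with a fixed placeholder.
normalize_dump() {
  sed -e '/^label-id:/d' -e 's|/dev/sd[a-z]|/dev/sdX|g'
}

# Succeed only if two dumps (passed as strings) describe the same layout.
same_layout() {
  [ "$(printf '%s\n' "$1" | normalize_dump)" = "$(printf '%s\n' "$2" | normalize_dump)" ]
}

# Usage (on the server):
#   same_layout "$(sfdisk -d /dev/sda)" "$(sfdisk -d /dev/sdb)" || echo "partition layouts differ"
```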
&lt;br /&gt;
Tell the kernel to re-read the partition table&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# kernel reload of the new partition table&lt;br /&gt;
blockdev --rereadpt /dev/sdb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now add the new drive to the RAID array&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# add all of the new disk&#039;s partitions to the software RAID&lt;br /&gt;
mdadm /dev/md0 -a /dev/sdb1&lt;br /&gt;
mdadm /dev/md1 -a /dev/sdb2&lt;br /&gt;
mdadm /dev/md2 -a /dev/sdb3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
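Install the GRUB boot loader onto the new disk so that the server can still boot from sdb if sda fails. Note that on CentOS 7 (which hetzner2 runs), the GRUB2 installer is named &amp;lt;code&amp;gt;grub2-install&amp;lt;/code&amp;gt;, not &amp;lt;code&amp;gt;grub-install&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
grub2-install /dev/sdb&lt;br /&gt;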
&amp;lt;/pre&amp;gt;&lt;br /&gt;
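&lt;br /&gt;
Before rebooting, it can be reassuring to check that boot code actually landed in the MBR of both disks. GRUB2&#039;s boot sector image embeds the ASCII string &amp;quot;GRUB&amp;quot;, so the sketch below looks for that string in the first 512 bytes. This is a heuristic, not proof that the disk is bootable. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Heuristic: succeed if the first 512 bytes (the MBR) of a disk or image
# file contain the ASCII string "GRUB", which GRUB2's boot sector embeds.
mbr_has_grub() {
  head -c 512 "$1" | grep -aq "GRUB"
}

# Usage (as root on the server):
#   for d in /dev/sda /dev/sdb; do mbr_has_grub "$d" || echo "no GRUB in MBR of $d"; done
```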
&lt;br /&gt;
Execute this command to monitor the status of the RAID replication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
while true; do date; cat /proc/mdstat; echo; sleep 300; done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
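&lt;br /&gt;
If the full &amp;lt;code&amp;gt;/proc/mdstat&amp;lt;/code&amp;gt; dump is too noisy, the sketch below prints only the arrays that are rebuilding. For each one it shows the percentage done and the &amp;lt;code&amp;gt;finish=&amp;lt;/code&amp;gt; estimate that the kernel reports on the recovery line. Arrays that the kernel lists as &amp;lt;code&amp;gt;resync=DELAYED&amp;lt;/code&amp;gt; are shown as pending. The function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Print "ARRAY PERCENT TIME_LEFT" for each md array in /proc/mdstat (or the
# file given as $1) that is recovering or resyncing; delayed ones show
# "pending ?".
md_recovery_progress() {
  awk '/^md[0-9]+ :/ { dev = $1 }
       /(recovery|resync) *=/ {
         pct = "pending"; fin = "?"
         if (match($0, /[0-9.]+%/)) pct = substr($0, RSTART, RLENGTH)
         if (match($0, /finish=[^ ]+/)) fin = substr($0, RSTART + 7, RLENGTH - 7)
         print dev, pct, fin
       }' "${1:-/proc/mdstat}"
}

# Usage (on the server):
#   while true; do date; md_recovery_progress; sleep 300; done
```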
&lt;br /&gt;
You may need to &#039;&#039;&#039;wait several hours&#039;&#039;&#039; (hopefully less than 1 day) before proceeding.&lt;br /&gt;
&lt;br /&gt;
Once the sync is complete, test a reboot to make sure that GRUB still works as expected:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo reboot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Revert Steps==&lt;br /&gt;
&lt;br /&gt;
This may not even be possible, but to revert we would have to contact Hetzner and ask them to physically remove the new drive and re-insert the old one that they just removed.&lt;br /&gt;
&lt;br /&gt;
=See Also=&lt;br /&gt;
&lt;br /&gt;
# [[Maltfield_Log/2025_Q2]] Investigation into failed disks (after db corruption event in April)&lt;br /&gt;
# [[:Category: CHGs|List of other CHG &amp;quot;tickets&amp;quot;]]&lt;br /&gt;
&lt;br /&gt;
=External Links=&lt;br /&gt;
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid/&lt;br /&gt;
&lt;br /&gt;
[[Category: CHGs]]&lt;/div&gt;</summary>
		<author><name>Maltfield</name></author>
	</entry>
</feed>