Maltfield Log/2019 Q4

From Open Source Ecology

My work log from the year 2019 Quarter 4. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.

See Also

  1. Maltfield_Log
  2. User:Maltfield
  3. Special:Contributions/Maltfield

Wed Dec 04, 2019

  1. Looks like Marcin is delegating adding software to OSE Linux, but I'm afraid we hit a space limit last time Chris tried to do this. I always felt that it made more sense for OSE Linux to be a proper Debian/Ubuntu variant rather than a live ISO. That solves the live ISO byte limits and the persistence issues.
I wasn't asked and I'm not an authority, but I'm going to add my $0.02:

Chris hit a size issue with the last build where we reached a byte limit
of a live iso when trying to add additional software to OSE Linux. And
there's also issues with persistence on a live distro, in general.

I've always wondered if it makes more sense to run an Ubuntu (or
debian?) downstream distro/flavour/variant instead of a live distro

 * https://ubuntu.com/download/flavours
 * https://en.wikipedia.org/wiki/Category:Debian-based_distributions

And/or our own OSE apt repository such that we pin the versions and
update them on some standardized release cycle.

 * https://wiki.debian.org/DebianRepository#Set_up_and_maintain_a_repository

Honestly, the live OSE distro always felt like a hack, and creating our
own repo always seemed more--eh--apt. This would solve any iso disk
space limitations, persistence issues, and probably be easier to
maintain & iterate-on in the future.

But, again, I'm no authority here. I've never maintained a
publicly-accessible debian repository or built a live distro. My
experience with OSE Linux is extremely limited. Also, January 25th is a
tight deadline, so please don't let this email derail your effort. Only
consider it if you hit the space limitation as Chris did. This is a
longer-term discussion.


Cheers,

Michael Altfield
Senior System Administrator
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org

Wed Dec 04, 2019

  1. Backblaze got back to me stating that they currently don't support append-only, but they'd forward my request for this feature to their development team

Hello Michael,

Unfortunately, we do not have an append-only feature implemented into our application keys but that is definitely something we can explore and potentially implement.  I will forward that suggestion to our developers however you are correct.

If a malicious actor were to gain access to both the keyID/application key they could overwrite file versions by listing the contents of a bucket and then writing junk files with the same file names found within the bucket.

Unfortunately, at this time we do not a timeline on when a possible append-only permission will be available for application keys.

Regards,

--Brad

The Backblaze Team
  1. Marcin confirmed that he hasn't yet made a copy of one of our backups stored off-site at FeF :(
  2. ...
  3. Marcin responded to my VPN Training email, including that his laptop won't boot so he's using Catarina's, but that he doesn't have sudo access on it. I sent him links on how to boot to recovery and/or single-user mode to reset the root password on the machine
    1. https://askubuntu.com/questions/24006/how-do-i-reset-a-lost-administrative-password
    2. https://www.debuntu.org/how-to-recover-root-password-under-linux-with-single-user-mode/
  4. I also made it clear that, in addition to having sudo access to his workstation and generating a CSR before our meeting, he'd also need access to his personal keepass, his 2FA app on his phone, and his ssh private key.

Tue Dec 03, 2019

  1. Well, my "Append-only" article on wikipedia was rejected for not citing sources (even though I cited man pages, business blog posts describing their use of append-only data structures, and a couple academic papers describing append-only solutions to ransomware). It was suggested that I add this content to existing articles instead, such as the "File Attribute" article https://en.wikipedia.org/wiki/Draft:Append-only
  2. ...
  3. Backblaze got back to me and said that B2's file versioning system is essentially immature, as is the entire industry's
Hello,

Thanks for contacting Backblaze support.

With the lifecycle rules set as you have them, "daysFromUploadingToHiding 364" and "daysFromHidingToDeleting 1" the expected behavior of when a file version is uploaded, is to hide the file after 364 days and to delete file versions after 24 hours.

Unfortunately, there is no way to avoid creating new file versions other than renaming the file upon upload to b2 with an edited file name. No application key permission changes will address this if the key has write permission.

If a file is uploaded to b2 with the filename example.file any further example.file('s) will be treated as a file version, the previous version will be hidden and lifecycle rules will be applied.  If the file has a server name appended to it such as "exampleservername.file" then it will be treated differently than example.file and retained as a separate file. Additionally adding a date stamp to the file can also increase the amount of retained versions, i.e.  "exampleservername12022019.file" or if multiple versions per day will be added then a time and date stamp can be used, "exampleservername14162212022019.file.

A simpler method may be to increase the dayFromHidingToDeleting to an amount greater than 1 day to allow for a file review to ensure no version deletions are made without review.

The issue at hand is that B2 does not have a more robust file version control method built into the storage.  Even Amazon S3 does not have this and either a manual/scripted solution with appended file names must be used or a 3rd party Integration with more featured file versioning must be implemented.

Regards,

--Brad

The Backblaze Team
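Brad's renaming suggestion boils down to giving every upload a unique object name. A minimal sketch of that naming scheme follows, assuming the prefix/host/timestamp layout that our own backups already use (the upload log elsewhere in this entry shows `daily_hetzner2_20191203_072001.tar.gpg`). The function name and argument order are hypothetical. Note that this only keeps a well-behaved uploader from creating new versions; it does nothing against someone holding a stolen key.

```shell
# Build a unique object name so that a new upload never shares a name with an
# existing backup. Arguments: prefix, host, optional timestamp (defaults to the
# current UTC time). The function name is hypothetical.
backup_object_name() {
  local prefix="$1" host="$2"
  local stamp="${3:-$(date -u +%Y%m%d_%H%M%S)}"
  printf '%s_%s_%s.tar.gpg\n' "$prefix" "$host" "$stamp"
}

# Example with a fixed timestamp:
backup_object_name daily hetzner2 20191203_072001
# prints: daily_hetzner2_20191203_072001.tar.gpg
```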
  1. I responded, reiterating the need for "append-only" in many orgs' defenses against ransomware, and asking if anything was being done about it. I CC'd Marcin.
Hi Brad, thanks for your response!

> adding a date stamp to the file can also increase the amount of
> retained versions, i.e.  "exampleservername12022019.file" or if multiple
> versions per day will be added then a time and date stamp can be used,
> "exampleservername14162212022019.file.

To be clear, the use-case here is an "append-only" ACL such that an
attacker that's compromised my machine where the keys live (ie:
ransomware) does not have permission to overwrite existing data in our
bucket. Or, in the case specific to Backblaze B2, the key would also
need to not have permission to make existing files hidden.

Under this threat model, I cannot prevent the malicious actor who stole
our application key from just being nice and appending some string to our
existing files to prevent the old ones from being hidden!

> The issue at hand is that B2 does not have a more robust file version
> 	control method built into the storage.

Are there any improvements to B2 in the works that would fix this issue
or otherwise provide "append-only" ACL permissions to application keys
in conjunction with lifecycle rules? If so, is there an ETA on when I
can expect to be able to set "append-only" permissions to a given
application key in conjunction with lifecycle rules?

With the number of ransomware attacks that occurred in 2019 such that
the attacker literally encrypted all the victims' servers *and* deleted
the victims' backups, append-only permissions have become a critical
component to many organization's backup solutions. If Backblaze fully
supported append-only permissions to application keys with lifecycle
rules, it would certainly attract many of these customers who were
victim to ransomware.

...Or lose customers to a solution that *does* offer append-only
permissions, such as BorgBase or Wasabi.

 * https://www.borgbase.com/
 * https://wasabi.com/blog/use-immutable-storage/

Please let me know when I can expect to be able to set "append-only"
permissions to an application key in conjunction with lifecycle rules.

Michael Altfield
Senior System Administrator
PGP Fingerprint: 8A4B 0AF8 162F 3B6A 79B7  70D2 AA3E DF71 60E2 D97B

Open Source Ecology
www.opensourceecology.org
  1. meanwhile, our monthly backup report email came into my inbox; the backup from the first of this month was OK, but last night's was not. Did I break backups? The log shows that the upload command was about to be executed, but it then failed with an access control error
[root@opensourceecology ~]# head /var/log/backups/backup.log-20191202 
================================================================================
INFO: Beginning Backup Run on 20191202_072001
INFO: Cleaning up old backup files
INFO: Beginning to backup mysql databases

real    1m27.779s
user    1m42.149s
sys     0m1.578s
INFO: Beginning to backup server's files
		INFO: /etc
[root@opensourceecology ~]# tail /var/log/backups/backup.log-20191202 
user    8m25.247s
sys     0m13.917s
		INFO: Deleting unencrypted backup archive
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
ERROR: Application key is restricted to bucket: ose-server-backups

real    0m0.191s
user    0m0.164s
sys     0m0.025s
[root@opensourceecology ~]# 
  1. But it looks like our most recent backup succeeded, so there's no issue here
[root@opensourceecology ~]# head /var/log/backups/backup.log
================================================================================
INFO: Beginning Backup Run on 20191203_072001
INFO: Cleaning up old backup files
INFO: Beginning to backup mysql databases

real    1m25.295s
user    1m40.124s
sys     0m1.501s
INFO: Beginning to backup server's files
		INFO: /etc
[root@opensourceecology ~]# tail /var/log/backups/backup.log
  "action": "upload", 
  "fileId": "4_z5605817c251dadb96e4d0118_f205e5e6c6b206f16_d20191203_m074425_c001_v0001113_t0059", 
  "fileName": "daily_hetzner2_20191203_072001.tar.gpg", 
  "size": 18670808915, 
  "uploadTimestamp": 1575359065000
}

real    175m47.708s
user    5m14.135s
sys     0m56.405s
[root@opensourceecology ~]# 
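The head/tail check above could be scripted so that a failed run is obvious at a glance. This is only a sketch: it assumes failures always show up as lines starting with `ERROR:` (as in the excerpt from the 20191202 log), and the function name is hypothetical.

```shell
# Print any ERROR lines found in a backup log.
# Returns 1 if at least one ERROR line exists, 0 otherwise.
backup_log_errors() {
  local log="$1"
  if grep -E '^[[:space:]]*ERROR:' "$log"; then
    return 1
  fi
  return 0
}

# Example:
#   backup_log_errors /var/log/backups/backup.log || echo "backup run reported errors"
```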
  1. I sent an email to Marcin telling him not to worry about the missing backup in the emailed backup report, and I also asked if he was ever able to download a copy of our backups to store safely on a disk offline at FeF.
  2. ...
  3. OK, back to OpenVPN 2FA. Unfortunately, it looks like I can't change the name of the prompt "Enter Auth Password: " to something like "Enter OTP Token: " https://openvpn.net/community-resources/reference-manual-for-openvpn-2-0/
  4. I began documenting this process. I figured I'd have the OSE devs use `easy-rsa` to generate their certificate and certificate signing request, but I quickly found that easy-rsa isn't easy. At least not in a way that's easy & robust for me to document for users. I started with this:
sudo apt-get install openvpn openresolv easy-rsa

cd $HOME
mkdir openvpn
cd /usr/share/easy-rsa
source vars
KEY_DIR=$HOME/openvpn
KEY_CONFIG=/etc/ssl/openssl.cnf

# inputs
echo -n "Enter your two-digit country code: "; read KEY_COUNTRY
echo -n "Enter your state/province: "; read KEY_PROVINCE
echo -n "Enter your city: "; read KEY_CITY
echo -n "Enter the name of your organization: "; read KEY_ORG
echo -n "Enter your email address: "; read KEY_EMAIL

# generate certificate request
./build-req `whoami`
  1. but then I got an error that the KEY_CONFIG isn't easy-rsa specific. Well, this is a problem, as the openssl.cnf files provided by easy-rsa don't include one for OpenSSL 1.1.0l, the version of openssl currently installed by Debian. And why can't easy-rsa make this easy by automatically figuring out which one to use, anyway?
user@ose:/usr/share/easy-rsa$ sudo find / | grep -i openssl | grep -i cnf
/usr/lib/ssl/openssl.cnf
/usr/share/easy-rsa/openssl-0.9.6.cnf
/usr/share/easy-rsa/whichopensslcnf
/usr/share/easy-rsa/openssl-0.9.8.cnf
/usr/share/easy-rsa/openssl-1.0.0.cnf
/usr/share/doc/openvpn/examples/sample-keys/openssl.cnf
/etc/ssl/openssl.cnf
user@ose:/usr/share/easy-rsa$ 
user@ose:/usr/share/easy-rsa$ openssl version
OpenSSL 1.1.0l  10 Sep 2019
user@ose:/usr/share/easy-rsa$ 
  1. It might actually be easier to write this documentation for the user to use `openssl` instead of `easy-rsa`
  2. I finished the documentation for both
    1. The OSE Developer requesting VPN access https://wiki.opensourceecology.org/wiki/VPN#Developers:_How_to_request_access_to_the_dev_VPN
    2. The OSE SysAdmin granting VPN access https://wiki.opensourceecology.org/wiki/VPN#Sysadmin:_How_to_grant_access_to_the_dev_VPN
  3. I sent an email to Marcin asking if he'd be free sometime this week for a meeting to set up & train him on connecting to the VPN so he can access our staging sites (including our Discourse POC site)
  4. I sent an email to Marcin as a status update on the Discourse POC
  5. I updated my TODO list on the OSE Server article https://wiki.opensourceecology.org/wiki/OSE_Server#TODO
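
For reference, here is a rough sketch of the plain-`openssl` route mentioned above as an alternative to easy-rsa. The function names, the 4096-bit RSA key size, the `$HOME/openvpn` output directory, and the example subject values are all assumptions for illustration; the VPN wiki article remains the authoritative procedure.

```shell
# Build an X.509 subject string from country, state, city, organization, and
# common name. (Values containing "/" would need escaping; not handled here.)
csr_subject() {
  printf '/C=%s/ST=%s/L=%s/O=%s/CN=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Generate <dir>/<name>.key and <dir>/<name>.csr with openssl.
# -nodes leaves the private key unencrypted; drop it to be prompted for a
# passphrase instead. umask 077 keeps the new files private to the user.
generate_csr() {
  local name="$1" subject="$2" dir="${3:-$HOME/openvpn}"
  mkdir -p "$dir"
  (
    umask 077
    openssl req -new -newkey rsa:4096 -nodes \
      -keyout "$dir/$name.key" -out "$dir/$name.csr" -subj "$subject"
  )
}

# Example (placeholder values):
#   generate_csr "$(whoami)" \
#     "$(csr_subject US ExampleState ExampleCity 'Example Org' "$(whoami)")"
```

Passing the subject with `-subj` avoids the interactive prompts, and it sidesteps the per-version openssl.cnf files that tripped up easy-rsa.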

Mon Dec 02, 2019

  1. I created a new key in Backblaze B2 with name 'prod-append-only'. This will be an append-only key such that our production server can put new data in our backblaze b2 bucket, but it cannot overwrite or delete existing backups. This is to prevent our box from having the capacity to delete our golden backups in the event that it gets hacked by, for example, ransomware.
  2. unfortunately, while the wui lists a ton of key permissions, you can't actually granularly control them when creating a key in the wui. I could only set "read-only", "write-only", and "read-write". If I set "write-only", I get this
deleteFiles, listBuckets, writeFiles
  1. of course, that's not what we want. "append-only" is distinct from "write-only" in that we want to be damn sure that it can add *new* files (or, in the filesystem sense, appending to existing files is OK), but not be able to delete existing files or overwrite existing files' data.
  2. this backblaze b2 documentation gives more info on application keys and their permissions. Let's see if we can remove "deleteFiles" via the cli's `b2_create_key` and then test to make sure it can't overwrite existing files https://www.backblaze.com/b2/docs/application_keys.html
  3. ok, so the commands are a bit different for the `b2` python cli tool than the api documentation; here's a list of the existing keys
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 --help
This program provides command-line access to the B2 service.

Usages:

	b2 authorize-account [<accountId>] [<applicationKey>]
	b2 cancel-all-unfinished-large-files <bucketName>
	b2 cancel-large-file <fileId>
	b2 clear-account
	b2 create-bucket [--bucketInfo <json>] [--corsRules <json>] [--lifecycleRules <json>] <bucketName> [allPublic | allPrivate]
	b2 create-key [--duration <validDurationSeconds>] [--bucket <bucketName>] [--namePrefix <namePrefix>] <keyName> <capabilities>
	b2 delete-bucket <bucketName>
	b2 delete-file-version [<fileName>] <fileId>
	b2 delete-key <applicationKeyId>
	b2 download-file-by-id [--noProgress] <fileId> <localFileName>
	b2 download-file-by-name [--noProgress] <bucketName> <fileName> <localFileName>
	b2 get-account-info
	b2 get-bucket [--showSize] <bucketName>
	b2 get-download-auth [--prefix <fileNamePrefix>] [--duration <durationInSeconds>] <bucketName>
	b2 get-download-url-with-auth [--duration <durationInSeconds>] <bucketName> <fileName>
	b2 get-file-info <fileId>
	b2 help [commandName]
	b2 hide-file <bucketName> <fileName>
	b2 list-buckets
	b2 list-file-names <bucketName> [<startFileName>] [<maxToShow>]
	b2 list-file-versions <bucketName> [<startFileName>] [<startFileId>] [<maxToShow>]
	b2 list-keys
	b2 list-parts <largeFileId>
	b2 list-unfinished-large-files <bucketName>
	b2 ls [--long] [--versions] [--recursive] <bucketName> [<folderName>]
	b2 make-url <fileId>
	b2 sync [--delete] [--keepDays N] [--skipNewer] [--replaceNewer] \
		[--compareVersions <option>] [--compareThreshold N] \
		[--threads N] [--noProgress] [--dryRun ] [--allowEmptySource ] \
		[--excludeRegex <regex> [--includeRegex <regex>]] \
		[--excludeDirRegex <regex>] \
		<source> <destination>
	b2 update-bucket [--bucketInfo <json>] [--corsRules <json>] [--lifecycleRules <json>] <bucketName> [allPublic | allPrivate]
	b2 upload-file [--sha1 <sha1sum>] [--contentType <contentType>] \
		[--info <key>=<value>]* [--minPartSize N] \
		[--noProgress] [--threads N] <bucketName> <localFilePath> <b2FileName>
	b2 version

The environment variable B2_ACCOUNT_INFO specifies the sqlite
file to use for caching authentication information.
The default file to use is: ~/.b2_account_info

For more details on one command: b2 help <command>

When authorizing with application keys, this tool requires that the key
have the 'listBuckets' capability so that it can take the bucket names
you provide on the command line and translate them into bucket IDs for the
B2 Storage service.  Each different command may required additional
capabilities.  You can find the details for each command in the help for
that command.

[b2user@opensourceecology ~]$
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 list-keys
OBFUCATED1   dev                 
OBFUCATED2   prod-append-only    
[b2user@opensourceecology ~]$ 
  1. I was successfully able to delete the key I just made in the wui and create a new one with "writeFiles" only
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 delete-key OBFUCATED2
OBFUCATED2
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 create-key --bucket 'ose-server-backups' 'prod-append-only' 'writeFiles'
OBFUSCATED3 OBFUSCATEDSECRETKEY3
[b2user@opensourceecology ~]$ 
  1. There doesn't appear to be a way to query a key and get its permissions on the CLI, but a quick refresh on the 'secure.backblaze.com/app_keys.htm' page reflects that the old key is gone and the new one only has the 'writeFiles' permission. Nice!
  2. now for the test: first I made a backup of the existing creds, then cleared them and re-added the creds for the new application key
[b2user@opensourceecology ~]$ cp .b2_account_info .b2_account_info.master
[b2user@opensourceecology ~]$
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 clear-account --help

b2 clear-account

	Erases everything in ~/.b2_account_info.  Location
	of file can be overridden by setting B2_ACCOUNT_INFO.

[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 clear-account
[b2user@opensourceecology ~]$
  1. ugh, I got yelled at that listBuckets is required. I guess that's not *too* bad
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 authorize-account 'OBFUSCATED3' 'OBFUSCATEDSECRETKEY3'
Using https://api.backblazeb2.com
ERROR: application key has no listBuckets capability, which is required for the b2 command-line tool
[b2user@opensourceecology ~]$ 
  1. ok, I deleted the old one (not shown) and added a new one with listBuckets and writeFiles
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 authorize-account 'OBFUSCATED4' 'OBFUSCATEDSECRETKEY4'
Using https://api.backblazeb2.com
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 get-account-info
{
	"accountAuthToken": "OBFFUSCATED", 
	"accountId": "OBFFUSCATED", 
	"allowed": {
		"bucketId": "OBFFUSCATED", 
		"bucketName": "ose-server-backups", 
		"capabilities": [
			"listBuckets", 
			"writeFiles"
		], 
		"namePrefix": null
	}, 
	"apiUrl": "https://api001.backblazeb2.com", 
	"applicationKey": "OBFFUSCATED", 
	"downloadUrl": "https://f001.backblazeb2.com"
}
[b2user@opensourceecology ~]$ 
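The by-eye check of the capabilities list could be scripted along these lines. This is a crude text match against the `get-account-info` output shown above, not a real JSON parser, and the function name is hypothetical.

```shell
# Read `b2 get-account-info` JSON on stdin and succeed if the given capability
# name appears as a quoted string anywhere in it (plain fixed-string match).
has_capability() {
  grep -F -q "\"$1\""
}

# Example:
#   ~/virtualenv/bin/b2 get-account-info | has_capability deleteFiles \
#     && echo "WARNING: this key can delete files"
```

The absence of a capability name only tells you what the key was granted on paper; what the key can actually do still has to be tested by hand.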
  1. I tried an ls, but it stupidly gave me an error suggesting that I was trying to list a bucket I didn't have access to. The bucket is right, but I don't have list permissions on it. Anyway..
[b2user@opensourceecology ~]$ ~/virtualenv/bin/b2 ls 'ose-server-backups'
ERROR: Application key is restricted to bucket: ose-server-backups
[b2user@opensourceecology ~]$ 
  1. son of a bitch, I can't upload either. Note that the bucket *is* correct. what gives?
[b2user@opensourceecology tmp]$ ~/virtualenv/bin/b2 upload-file 'ose-server-backups' test.txt test.txt
ERROR: Application key is restricted to bucket: ose-server-backups
[b2user@opensourceecology tmp]$ 
  1. this could maybe be related to this bug, which says it was fixed in b2 version 1.3.4. I'm using 1.3.3. https://github.com/Backblaze/B2_Command_Line_Tool/issues/485
[b2user@opensourceecology tmp]$ ~/virtualenv/bin/b2 version
b2 command line tool, version 1.3.3
[b2user@opensourceecology tmp]$ 
  1. when I first installed b2 I broke our site because it broke our `certbot` tool; hopefully it's better now that it's in a virtualenv for the b2 user and not OS-level. Anyway, I was able to update to 1.4.3 within the virtualenv https://wiki.opensourceecology.org/wiki/Backblaze#Install_CLI
[b2user@opensourceecology ~]$ source ~/virtualenv/bin/activate
(virtualenv) [b2user@opensourceecology ~]$ cd ~/sandbox/B2_Command_Line_Tool/
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ git pull
...
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ python setup.py install
...
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ b2 version
b2 command line tool, version 1.4.3
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ 
  1. I'm still getting the same stupid error, though. I'm literally typing the bucket name that it says I'm restricted to. wtf?
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ b2 ls ose-server-backups
ERROR: unauthorized for application key with capabilities 'listBuckets,writeFiles', restricted to bucket 'ose-server-backups' (unauthorized)
(virtualenv) [b2user@opensourceecology B2_Command_Line_Tool]$ 
  1. oh, duh, this error is different from before in that it says the capabilities don't match. I can't do an `ls` because I don't have 'listFiles'. what about an upload?
  2. sweet, that worked! I confirmed the file's existence on the wui too.
[b2user@opensourceecology ~]$ mkdir tmp
[b2user@opensourceecology ~]$ cd tmp
[b2user@opensourceecology tmp]$ echo 'test0' > test.txt
[b2user@opensourceecology tmp]$ ~/virtualenv/bin/b2 upload-file 'ose-server-backups' test.txt test.txt
test.txt: 100%|| 6.00/6.00 [00:01<00:00, 4.43B/s]
URL by file name: https://f001.backblazeb2.com/file/ose-server-backups/test.txt
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z5605817c251dadb96e4d0118_f118d984fa5fe76bd_d20191202_m080016_c001_v0001131_t0058
{
  "action": "upload", 
  "fileId": "4_z5605817c251dadb96e4d0118_f118d984fa5fe76bd_d20191202_m080016_c001_v0001131_t0058", 
  "fileName": "test.txt", 
  "size": 6, 
  "uploadTimestamp": 1575273616000
}
[b2user@opensourceecology tmp]$ 
  1. now let's download the file. cool, it fails because I can't read. that's fine.
[b2user@opensourceecology tmp]$ mkdir restore
[b2user@opensourceecology tmp]$ cd restore/
[b2user@opensourceecology restore]$ ls
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 download-file-by-name 'ose-server-backups' test.txt test.txt
ERROR: unauthorized for application key with capabilities 'listBuckets,writeFiles', restricted to bucket 'ose-server-backups' (unauthorized)
[b2user@opensourceecology restore]$ 
  1. and just to be sure: this key can't delete, right? Nope, good. Note that there's no 'delete-file-by-name'; we have to use the 'delete-file-version' https://www.backblaze.com/b2/docs/b2_delete_file_version.html
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 delete-file-version 'test.txt' '4_z5605817c251dadb96e4d0118_f118d984fa5fe76bd_d20191202_m080016_c001_v0001131_t0058'
ERROR: unauthorized for application key with capabilities 'listBuckets,writeFiles', restricted to bucket 'ose-server-backups' (unauthorized)
[b2user@opensourceecology restore]$ 
  1. the file's contents should be 'test0'. Let's see if our 'append-only' key has the ability to overwrite data by re-uploading the file with different contents: 'test1'
[b2user@opensourceecology restore]$ ls
[b2user@opensourceecology restore]$ cd ..
[b2user@opensourceecology tmp]$ ls
restore  test.txt
[b2user@opensourceecology tmp]$ cat test.txt 
test0
[b2user@opensourceecology tmp]$ echo "test1" > test.txt
[b2user@opensourceecology tmp]$ cat test.txt 
test1
[b2user@opensourceecology tmp]$ ~/virtualenv/bin/b2 upload-file 'ose-server-backups' test.txt test.txt
test.txt: 100%|| 6.00/6.00 [00:01<00:00, 4.87B/s]
URL by file name: https://f001.backblazeb2.com/file/ose-server-backups/test.txt
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z5605817c251dadb96e4d0118_f1157fdf57c59dad1_d20191202_m080938_c001_v0001039_t0057
{
  "action": "upload", 
  "fileId": "4_z5605817c251dadb96e4d0118_f1157fdf57c59dad1_d20191202_m080938_c001_v0001039_t0057", 
  "fileName": "test.txt", 
  "size": 6, 
  "uploadTimestamp": 1575274178000
}
[b2user@opensourceecology tmp]$ 
  1. now to validate, I hop back to another application key with more permissions. And, damn, it looks like the file got overwritten.
[b2user@opensourceecology tmp]$ cd restore/
[b2user@opensourceecology restore]$ ls
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 download-file-by-name 'ose-server-backups' test.txt test.txt
test.txt: 100%|| 6.00/6.00 [00:00<00:00, 4.72kB/s]
File name:    test.txt
File id:      4_z5605817c251dadb96e4d0118_f1157fdf57c59dad1_d20191202_m080938_c001_v0001039_t0057
File size:    6
Content type: text/plain
Content sha1: dba7673010f19a94af4345453005933fd511bea9
INFO src_last_modified_millis: 1575274166571
checksum matches
[b2user@opensourceecology restore]$ cat test.txt 
test1
[b2user@opensourceecology restore]$ 
  1. But what about this "version" stuff? Is the old file there too? There's no great way to get all the versions of a given file. Note that 'startFileId' will just output all files in the bucket starting from the given start point. And it looks like the api defines a 'prefix', but the b2 cli tool doesn't implement it (yet?) https://www.backblaze.com/b2/docs/b2_list_file_versions.html
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 list-file-versions --help

b2 list-file-versions <bucketName> [<startFileName>] [<startFileId>] [<maxToShow>]

	Lists the names of the files in a bucket, starting at the
	given point.  This is a low-level operation that reports the
	raw JSON returned from the service.  'b2 ls' provides a higher-
	level view.

	Requires capability: listFiles

[b2user@opensourceecology restore]$ 
  1. Anyway, we can force the `ls` command to list multiple versions of each file with '--versions', and we can get the file-id of each version with '--long', and we can just grep for our filename. Here we see both versions of the files. Nice!
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 ls --versions --long 'ose-server-backups' | grep -i test.txt
4_z5605817c251dadb96e4d0118_f1157fdf57c59dad1_d20191202_m080938_c001_v0001039_t0057  upload  2019-12-02  08:09:38          6  test.txt
4_z5605817c251dadb96e4d0118_f118d984fa5fe76bd_d20191202_m080016_c001_v0001131_t0058  upload  2019-12-02  08:00:16          6  test.txt
[b2user@opensourceecology restore]$ 
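Rather than grepping for one file name at a time, the same listing can be summarized to flag every name that has more than one version. This is a sketch that assumes the six-column layout shown above (fileId, action, date, time, size, name) and file names without spaces; the function name is hypothetical.

```shell
# Read `b2 ls --versions --long` output on stdin and print "<count> <name>"
# for each file name that appears more than once (i.e. has multiple versions).
multi_version_files() {
  awk '{ count[$6]++ }
       END { for (name in count) if (count[name] > 1) print count[name], name }' \
    | sort -k2
}

# Example:
#   ~/virtualenv/bin/b2 ls --versions --long 'ose-server-backups' | multi_version_files
```

Since our backup names embed a timestamp, any name reported here would mean something was uploaded (or hidden) over an existing backup.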
  1. It does appear that old versions are not automatically deleted by default https://www.backblaze.com/b2/docs/lifecycle_rules.html
Keep all versions of the file (default) removes all lifecycle rules from the bucket, and keeps all versions of all files until you explicitly delete them
  1. But this can be achieved by creating a lifecycle rule "daysFromHidingToDeleting". Oh, damn it, we *do* set this so that our deleted files get deleted as soon as possible. Apparently when a file is overwritten (as we did above), the old version becomes "hidden". So, effectively, our append-only key can currently overwrite all our backups and after 1 day all our data would be lost!
The most commonly used setting is daysFromHidingToDeleting, which says how long to keep file versions that are not the current version. A file version counts as hidden when explicitly hidden with b2_hide_file, or when a newer file with the same name is uploaded. When a rule with this setting applies, the file will be deleted the given number of days after it is hidden. 
  1. So I originally set up these rules on Jul 28th, 2018 without much of an understanding of the distinction between daysFromUploadingToHiding and daysFromHidingToDeleting, or of the versioning of files in B2. https://wiki.opensourceecology.org/wiki/Maltfield_Log/2018_Q3#Sat_Jul_28.2C_2018
  2. I think I can create new lifecycle rules the same as before, but set "daysFromHidingToDeleting" to null, and that should achieve what I want.
  3. Ugh, no, I can't do that. There is no "daysFromUploadingToDeleting". It appears you have to go from Uploading -> Hiding -> Deleting. That sucks!
  4. So one hackish solution exists: I could just hide all our backups 1 day after uploading, and then delete them after some long interval. For example, our retention policy for monthly backups is 1 year. Currently we set 'daysFromUploadingToHiding' to '364' and 'daysFromHidingToDeleting' to '1'. We could reverse those, so that the monthly backups are hidden after 1 day but then retained for 364 days. This would achieve what we want, but it would give the illusion to anyone at OSE other than me that we only maintain 1 day's worth of backups, because they'd all be hidden except the most recent version. That's pretty hackish, but I guess it works if needed.
  5. Because I couldn't find any article on "append-only" keys in Backblaze's knowledgebase/faq, I opened a support ticket with Backblaze asking them how it's possible to create an append-only application key in conjunction with lifecycle rules that implement our backup retention policy without effectively giving the would-be "append-only" key the ability to delete all our existing backups (after a 24 hour delay). The support request is #517135. http://help.backblaze.com/hc/requests/517135
How can I create a Backblaze B2 Application Key to have append-only permissions along with lifecycle rules that delete old backups?

I've been pretty happy since we switched our offsite backups to Backblaze B2 over a year ago, but we have a new requirement that our servers are only granted append-only permissions to the endpoint where our backups reside.

Append-only is a common access control that permits appending new data to the destination, but it does not permit deleting or overwriting existing content. Note that this is very distinct from 'read-write', and it's especially important in protecting backup data in the event that the server that's writing backups to B2 is hacked by, for example, ransomware.

I was able to create an append-only application key by granting it only the 'writeFiles' and 'listBuckets' capabilities (the latter being required by the `b2` python cli tool), but I found that--while this new application key could upload new files without being able to delete existing files (good!)--this application key *can* overwrite existing files (bad!).

I'm aware that old versions of files are, by default, not actually deleted on B2. But (!) it appears that using lifecycle rules on the B2-side to establish a retention policy (as is necessary if I want to make it so my client keys don't have permission to delete or hide existing files in our b2 buckets storing our backup data) necessarily requires setting the 'daysFromHidingToDeleting' rule to something non-null. 

(!!) ************ (!!)
Setting the 'daysFromHidingToDeleting' rule to something non-null, in effect, gives the would-be "append-only" key the ability to delete all our backups (after a 24-hour delay).
(!!) ************ (!!)

Specifically, for example, we use the following LifeCycle rules for our monthly backups:

   fileNamePrefix: monthly_
   daysFromUploadingToHiding: 364
   daysFromHidingToDeleting: 1

The above ^ rules mean that our monthly backups are deleted after 365 days (1 year retention for monthly backups), but it also necessarily means that if our server was ever infected with ransomware, then the attacker could overwrite all of our existing monthly backups (with, say, 1 byte of data), and 24 hours later it would all be deleted from our bucket!

I imagine there's a number of solutions to this, but one that comes to mind is: how do I create lifecycle rules that delete files X days after they're uploaded *while* keeping 'daysFromHidingToDeleting' = null?

If we could set a lifecycle rule to go straight from upload to delete such that we could leave 'daysFromHidingToDeleting' null, then we wouldn't effectively allow a would-be "append-only" key to be able to delete all our data.

Or, alternatively, if there were a 'writeVersions' capability for keys, and if a key that lacked this permission attempted to (over)write a file that already existed in a bucket, it would trigger an error rather than permitting it to upload a new version (causing the existing file to be hidden and potentially deleted by the lifecycle rules after 24 hours).

Or if there were some way to make it so that when a new file uploaded "over" an existing file of the same name didn't make that old version "hidden", then it would also solve this issue.

Please let me know how I can create an "append-only" application key in conjunction with lifecycle rules that implement our data retention policy--without letting the "append-only" key effectively delete all existing data.
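The retention arithmetic in the email above can be sketched as follows. This is a minimal model of the behaviour as described in the email, not Backblaze's actual scheduler. The function name `days_until_deleted` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Days after upload at which a file version gets deleted, given a lifecycle rule.
# Args: daysFromUploadingToHiding daysFromHidingToDeleting [overwritten_on_day]
# If the third argument is given, the file was overwritten (or hidden) that many
# days after upload, so the hide-to-delete countdown starts from that day instead.
days_until_deleted() {
  local upload_to_hide="$1" hide_to_delete="$2" overwritten_on="${3:-}"
  if [ -n "$overwritten_on" ] && [ "$overwritten_on" -lt "$upload_to_hide" ]; then
    echo $(( overwritten_on + hide_to_delete ))
  else
    echo $(( upload_to_hide + hide_to_delete ))
  fi
}

days_until_deleted 364 1      # normal monthly_ retention: prints 365
days_until_deleted 364 1 10   # overwritten 10 days after upload: prints 11
```

The second call is the ransomware case: the overwrite shortens the backup's remaining life from a year to one day past the overwrite.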
  1. It's also important to verify that our append-only key doesn't have permission to mark an old version as hidden. Ugh, it does. IMHO, Backblaze should really create a distinct permission/capability for writing new files vs writing new versions of existing files (including this hidden-file command)
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 hide-file 'ose-server-backups' test.txt
{
  "action": "hide", 
  "fileId": "4_z5605817c251dadb96e4d0118_f118addadb89fa4a0_d20191202_m102714_c001_v0001130_t0058", 
  "fileName": "test.txt", 
  "size": 0, 
  "uploadTimestamp": 1575282434000
}
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 ls --versions --long 'ose-server-backups' | grep -i test.txt
4_z5605817c251dadb96e4d0118_f118addadb89fa4a0_d20191202_m102714_c001_v0001130_t0058    hide  2019-12-02  10:27:14          0  test.txt
4_z5605817c251dadb96e4d0118_f1157fdf57c59dad1_d20191202_m080938_c001_v0001039_t0057  upload  2019-12-02  08:09:38          6  test.txt
4_z5605817c251dadb96e4d0118_f118d984fa5fe76bd_d20191202_m080016_c001_v0001131_t0058  upload  2019-12-02  08:00:16          6  test.txt
[b2user@opensourceecology restore]$ 
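Since an "append-only" key can still hide or overwrite files, one way to notice it is to scan the version listing for names with multiple versions or hide markers. The sketch below is only an illustration: it assumes the six-column layout of the `b2 ls --versions --long` output above and file names without spaces. The function name `flag_rewritten_files` and the shortened file IDs in the demo are made up.

```shell
#!/usr/bin/env bash
# Reads `b2 ls --versions --long <bucket>` output on stdin and prints one line
# per file name that has more than one version or at least one "hide" marker.
# Columns assumed: fileId action date time size fileName
flag_rewritten_files() {
  awk '
    NF >= 6 { versions[$6]++ }
    NF >= 6 && $2 == "hide" { hidden[$6]++ }
    END {
      for (name in versions)
        if (versions[name] > 1 || hidden[name] > 0)
          printf "%s versions=%d hides=%d\n", name, versions[name], hidden[name]
    }
  ' | sort
}

# Demo on abbreviated sample lines (IDs shortened for readability)
flag_rewritten_files <<'EOF'
id_t0058    hide  2019-12-02  10:27:14          0  test.txt
id_t0057  upload  2019-12-02  08:09:38          6  test.txt
id_t0056  upload  2019-12-02  08:00:16          6  test.txt
EOF
```

Against the real bucket this would be run as `~/virtualenv/bin/b2 ls --versions --long 'ose-server-backups' | flag_rewritten_files`. For backups that are only ever uploaded once, any output is worth a look.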
  1. And we also validate that the append-only key cannot change the lifecycle rules. Let's test on the dev backups bucket to be safe. First we get the existing rules
[b2user@opensourceecology restore]$ ~/virtualenv/bin/b2 get-bucket 'ose-dev-server-backups'
{
	"accountId": "OBFUSCATED", 
	"bucketId": "OBFUSCATED", 
	"bucketInfo": {}, 
	"bucketName": "ose-dev-server-backups", 
	"bucketType": "allPrivate", 
	"corsRules": [], 
	"lifecycleRules": [
		{
			"daysFromHidingToDeleting": 1, 
			"daysFromUploadingToHiding": 2, 
			"fileNamePrefix": "daily_"
		}, 
		{
			"daysFromHidingToDeleting": 1, 
			"daysFromUploadingToHiding": 364, 
			"fileNamePrefix": "monthly_"
		}, 
		{
			"daysFromHidingToDeleting": 1, 
			"daysFromUploadingToHiding": 30, 
			"fileNamePrefix": "weekly_"
		}
	], 
	"options": [], 
	"revision": 4
}
[b2user@opensourceecology restore]$ 
  1. unfortunately my attempts to update the lifecycle rules with the b2 cli always failed. it just printed out the help page for the command with no further information. Not sure what's the issue here..
  2. there's not a whole lot of great references out there on append-only as a solution to ransomware, and it seems that very few cloud providers have actually implemented it. Especially in the open source space: it appears that it can't be natively setup by OpenStack's swift, Nextcloud, Owncloud, etc. Wikipedia has a great article comparing file hosting providers, but it doesn't have a column for append-only or lifecycle policies, so I spent some time adding these two columns and adding to the rows: Backblaze B2, Wasabi, and Borgbase https://en.wikipedia.org/wiki/Comparison_of_file_hosting_services
  3. Even though append-only has long been an attribute for filesystem permissions, data structures, databases, and now APIs to cloud storage providers, there's not an article defining "append-only", so I created one as a draft for wikipedia https://en.wikipedia.org/wiki/Draft:Append-only
  4. ok, that's all I can do on append-only backups for now. I don't want to do my hack-ish swap on hide & delete lifecycle rules just yet (especially if someone isn't aware of it and swaps them back, then causing all the now-hidden backups to be deleted!) I'll wait to hear back from Backblaze..
  5. ...
  6. I also need to update our dev server's openvpn configuration to support 2FA before I train Marcin on it
  7. this guide uses the 'openvpn-auth-pam.so' plugin https://www.mikejonesey.co.uk/security/2fa/openvpn-with-2fa
    1. which then delegates to the google authenticator pam module https://github.com/google/google-authenticator/
  8. I went to check the dev server's repos, and I found a pam_2fa module already in the yum repos
[maltfield@osedev1 ~]$ yum search 2fa
Loaded plugins: fastestmirror
Could not set cachedir: [Errno 28] No space left on device: '/var/tmp/yum-maltfield-cVl2V7'
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast

N/S matched: 2fa
================
pam_2fa.x86_64 : Second factor authentication for PAM

  Name and summary matches only, use "search all" for everything.
[maltfield@osedev1 ~]$ yum search mfa
Loaded plugins: fastestmirror
Could not set cachedir: [Errno 28] No space left on device: '/var/tmp/yum-maltfield-E_Ggwe'
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Warning: No matches found for: mfa
No matches found
[maltfield@osedev1 ~]$ 
  1. unrelated, the above command also informed me that we're out of disk space on the dev node :(
[maltfield@osedev1 ~]$ df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M  100M  796M  12% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G  120G     0 100% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[maltfield@osedev1 ~]$ 
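The full volume above only surfaced because yum happened to complain. A small check over `df` output would catch it sooner. This is a sketch under assumptions: the function name `full_filesystems` and the 90% default are made up, and mount points are assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Reads `df -P` (or `df -h`) output on stdin and prints the mount point and
# Use% of every filesystem at or above the given threshold (default 90).
full_filesystems() {
  local threshold="${1:-90}"
  awk -v limit="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}

# Demo on two lines taken from the df output above
full_filesystems 90 <<'EOF'
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
/dev/mapper/ose_dev_volume_1  125G  120G     0 100% /mnt/ose_dev_volume_1
EOF
```

On a live system it would be fed from `df -P | full_filesystems 90`, for example from cron.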
  1. so that's our "ebs" network volume that filled up; it holds the storage for our lxc staging node's container rootfs in /mnt/ose_dev_volume_1/var/cache/lxc/centos/x86_64/7/rootfs/
[root@osedev1 ~]# du -sh /mnt/ose_dev_volume_1/var/cache/lxc/centos/x86_64/7/rootfs
433M	/mnt/ose_dev_volume_1/var/cache/lxc/centos/x86_64/7/rootfs
[root@osedev1 ~]# 
  1. wtf? it shows only 433M usage? That sounds like some file got deleted but it's still in use by some process, so it's stuck in purgatory
  2. no, it doesn't look like that on either dev or staging
[root@osedev1 ~]# lsof 2>&1 | grep '(deleted)$' | sort -rnk 7 | head -20
[root@osedev1 ~]# 
[maltfield@osestaging1 ~]$ lsof 2>&1 | grep '(deleted)$' | sort -rnk 7 | head -20
[maltfield@osestaging1 ~]$ 
  1. also, our prod server is >100G and the staging server should be too, so that 433M just doesn't make sense at all..
  2. I can't blame docker entirely, but it definitely is partially to blame: its /var/lib/docker dir (which holds overlay2) is taking up 13G of space
[root@osestaging1 ~]# du -sh /var/lib/docker
13G	/var/lib/docker
[root@osestaging1 ~]# 
  1. But I'm mostly to blame; looks like I've been tailing verbose lxc logs to a file that grew to 47G
[root@osedev1 osestaging1]# pwd
/var/lib/lxc/osestaging1
[root@osedev1 osestaging1]# du -sh *
4.0K    config
8.0K    dev
47G     lxc-start.log
8.0K    osestaging1
71G     rootfs
0       rootfs.dev
4.0K    ts
[root@osedev1 osestaging1]# 
[root@osedev1 ~]# ps -ef | grep -i lxc
root      3644  1798  1 Oct22 pts/1    12:22:17 /usr/bin/lua /bin/lxc-top
root     20002  1760  0 Nov07 pts/2    03:48:55 lxc-start -n osestaging1 -f config -l trace -o lxc-start.log
root     27165  2125 26 15:21 pts/8    00:00:04 du -sh config dev lxc-start.log osestaging1 rootfs rootfs.dev ts
root     27203 27185  0 15:21 pts/17   00:00:00 grep --color=auto -i lxc
[root@osedev1 ~]# 
  1. ok, I truncated the log file and confirmed we now have 45G available
[root@osedev1 osestaging1]# du -sh *
4.0K    config
8.0K    dev
44K     lxc-start.log
8.0K    osestaging1
df -h
71G     rootfs
0       rootfs.dev
4.0K    ts
[root@osedev1 osestaging1]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M  100M  796M  12% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G   74G   45G  63% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[root@osedev1 osestaging1]# 
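The log does not record the exact truncation command, so the following is only one way to do it. The point of truncating in place rather than using `rm` is that lxc-start still holds the file open, and a removed-but-open file keeps its disk space until the process exits. The function name `truncate_log` is hypothetical.

```shell
#!/usr/bin/env bash
# Empties a log file in place (same inode), so a process that still has the
# file open keeps writing to it. Refuses anything that is not a regular file.
truncate_log() {
  local logfile="$1"
  [ -f "$logfile" ] || { echo "not a regular file: $logfile" >&2; return 1; }
  : > "$logfile"
}

# Demo on a throwaway file
demo_log="$(mktemp)"
echo "some trace output" > "$demo_log"
truncate_log "$demo_log"
wc -c < "$demo_log"
rm -f "$demo_log"
```

Here that would be `truncate_log /var/lib/lxc/osestaging1/lxc-start.log`, followed by `df -h` to confirm the space came back.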
  1. I went ahead and stopped the lxc staging container and restarted it *without* dumping trace-level logging to an ever-expanding file
[root@osedev1 osestaging1]# lxc-stop --name osestaging1
[root@osedev1 osestaging1]# lxc-start -n osestaging1
...
[  OK  ] Started containerd container runtime.
		 Starting Docker Application Container Engine...

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

osestaging1 login: 
  1. ok, now that's sane, continuing with openvpn 2fa..
  2. this site lists at least 3x pam 2fa modules https://cern-cert.github.io/pam_2fa/
    1. yubico's https://github.com/Yubico/yubico-pam
    2. duo's https://github.com/duosecurity/duo_unix
    3. cern's https://github.com/CERN-CERT/pam_2fa
  3. it looks like there's also a 'google-authenticator' package in the yum repos
[root@osedev1 etc]# yum search google-auth
Loaded plugins: fastestmirror
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Determining fastest mirrors
 * base: mirror.checkdomain.de
 * epel: ftp.plusline.net
 * extras: linux.darkpenguin.net
 * updates: mirror.ratiokontakt.de

N/S matched: google-auth
========================
google-authenticator.x86_64 : One-time pass-code support using open standards
python2-google-auth.noarch : Google Auth Python Library

  Name and summary matches only, use "search all" for everything.
You have new mail in /var/spool/mail/root
[root@osedev1 etc]# 
  1. according to pkgs.org, the yum pam_2fa package is a package from epel that came from CERN above https://centos.pkgs.org/7/epel-x86_64/pam_2fa-1.0-1.el7.x86_64.rpm.html
  2. And the 'google-authenticator' package comes from this one https://centos.pkgs.org/7/epel-x86_64/google-authenticator-1.04-1.el7.x86_64.rpm.html
    1. https://github.com/google/google-authenticator-libpam/
  3. let's compare
  4. https://github.com/CERN-CERT/pam_2fa
  5. - 1 contributor on github
  6. * Apache-2.0 license
  7. + 211 commits, most recently 1 month ago
  8. * first commit 2019-04
  9. + this repo is owned & maintained by the CERN CERT https://github.com/CERN-CERT
  10. https://github.com/google/google-authenticator-libpam/
  11. + 29 contributors on github
  12. * gnu license (?)
  13. * 114 commits, most recently 8 months ago
  14. + first commit on 2014-01
  15. + this repo is owned & maintained by Google, Inc (jesus, TIL there's a .google TLD) https://github.com/google
  16. all that considered, sorry CERN, I'm going to have to go with google's open source implementation.
  17. I went ahead and installed it
[root@osedev1 etc]# yum install google-authenticator
...
Installed:
  google-authenticator.x86_64 0:1.04-1.el7                                                                                                                                 

Complete!
[root@osedev1 etc]# rpm -ql google-authenticator
/usr/bin/google-authenticator
/usr/lib64/security/pam_google_authenticator.la
/usr/lib64/security/pam_google_authenticator.so
/usr/share/doc/google-authenticator-1.04
/usr/share/doc/google-authenticator-1.04/CONTRIBUTING.md
/usr/share/doc/google-authenticator/FILEFORMAT
/usr/share/doc/google-authenticator/README.md
/usr/share/doc/google-authenticator/totp.html
/usr/share/licenses/google-authenticator-1.04
/usr/share/licenses/google-authenticator-1.04/LICENSE
/usr/share/man/man1/google-authenticator.1.gz
/usr/share/man/man8/pam_google_authenticator.8.gz
[root@osedev1 etc]# 
  1. Per the documentation on the github page, I just ran the /usr/bin/google-authenticator binary to generate a new 2fa secret key for myself.
    1. a few notes: I set a window size of 8, which will make 8 codes in both the past & future acceptable; that'll permit up to a 4 minute time drift.
    2. I've set the rate limit to 2 every 30 seconds. If someone tries to login with an OTP code 3 or more times in a given 30 second window, it will be denied
    3. I actually wanted to set the emergency codes to 0, but it wouldn't let me :(
    4. as for the issuer & label, on my phone andOTP displays this as "vpn.opensourceecology.org - maltfield@osedev1", which I think is the best way to explain to the user what it is: first: this is a code for the VPN on OSE's network. Second, it was specifically generated on the osedev1 server for the user 'maltfield'.
google-authenticator --time-based --disallow-reuse --issuer "vpn.opensourceecology.org" --label "`whoami`@osedev1" --emergency-codes=1 --window-size=8 --rate-limit=2 --rate-time=30
  1. I created a new pam.d config file for openvpn
[root@osedev1 etc]# cat /etc/pam.d/openvpn 
# google auth
auth        required    /usr/lib64/security/pam_google_authenticator.so

account        required    pam_nologin.so
account        include        system-auth
password    include        system-auth
session        include        system-auth
[root@osedev1 etc]# 
  1. And I updated the openvpn server config file to use the above file. And I restarted the openvpn server.
[root@osedev1 server]# tail /etc/openvpn/server/server.conf
# Notify the client that when the server restarts so it
# can automatically reconnect.
explicit-exit-notify 1

# additional hardening --maltfield
tls-version-min 1.2
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384

# google-authenticator 2fa
plugin /usr/lib64/openvpn/plugins/openvpn-plugin-auth-pam.so openvpn
[root@osedev1 server]# 
[root@osedev1 server]# systemctl restart openvpn@server
[root@osedev1 server]# 
  1. I disconnected from the VPN on my laptop and attempted to reconnect, but I never got in. This is what I got on the client
Mon Dec  2 21:53:13 2019 ++ Certificate has key usage  00a0, expects 00a0
Mon Dec  2 21:53:13 2019 VERIFY KU OK
Mon Dec  2 21:53:13 2019 Validating certificate extended key usage
Mon Dec  2 21:53:13 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Mon Dec  2 21:53:13 2019 VERIFY EKU OK
Mon Dec  2 21:53:13 2019 VERIFY OK: depth=0, CN=server
  1. And this was the openvpn server's logs
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 TLS: Initial packet from [AF_INET]182.74.197.50:58218, sid=8f76e1f9 1d391d64
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 VERIFY OK: depth=1, CN=osedev1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 VERIFY OK: depth=0, CN=maltfield
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_VER=2.4.0
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_PLAT=linux
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_PROTO=2
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_NCP=2
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_LZ4=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_LZ4v2=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_LZO=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_COMP_STUB=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_COMP_STUBv2=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 peer info: IV_TCPNL=1
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 TLS Error: Auth Username/Password was not provided by peer
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 TLS Error: TLS handshake failed
Dec 02 17:08:13 osedev1 openvpn[24978]: Mon Dec  2 17:08:13 2019 182.74.197.50:58218 SIGUSR1[soft,tls-error] received, client-instance restarting
Dec 02 17:08:37 osedev1 kernel: docker0: port 1(vethdeb72fd) entered blocking state
  1. Turns out I have to add the 'auth-user-pass' field to my client config https://securityskittles.wordpress.com/2012/03/14/two-factor-authentication-for-openvpn-on-centos-using-google-authenticator/
user@ose:~$ tail openvpn/client.conf
# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384

# dns for staging
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf

# 2fa
auth-user-pass
user@ose:~$ 
  1. And now it works!
user@ose:~$ sudo openvpn openvpn/client.conf
Mon Dec  2 21:55:37 2019 OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
Mon Dec  2 21:55:37 2019 library versions: OpenSSL 1.0.2t  10 Sep 2019, LZO 2.08
Enter Auth Username: maltfield
Enter Auth Password: ******
Mon Dec  2 21:55:57 2019 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Enter Private Key Password: *
Mon Dec  2 21:56:00 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Mon Dec  2 21:56:00 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Mon Dec  2 21:56:00 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Mon Dec  2 21:56:00 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Mon Dec  2 21:56:00 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Mon Dec  2 21:56:00 2019 UDP link local: (not bound)
Mon Dec  2 21:56:00 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Mon Dec  2 21:56:00 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=959cd30b 98bdba8b
Mon Dec  2 21:56:01 2019 VERIFY OK: depth=1, CN=osedev1
Mon Dec  2 21:56:01 2019 Validating certificate key usage
Mon Dec  2 21:56:01 2019 ++ Certificate has key usage  00a0, expects 00a0
Mon Dec  2 21:56:01 2019 VERIFY KU OK
Mon Dec  2 21:56:01 2019 Validating certificate extended key usage
Mon Dec  2 21:56:01 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Mon Dec  2 21:56:01 2019 VERIFY EKU OK
Mon Dec  2 21:56:01 2019 VERIFY OK: depth=0, CN=server
Mon Dec  2 21:56:01 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Mon Dec  2 21:56:01 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Mon Dec  2 21:56:02 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Mon Dec  2 21:56:02 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 1,cipher AES-256-GCM'
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: timers and/or timeouts modified
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: --ifconfig/up options modified
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: route options modified
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: peer-id set
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Mon Dec  2 21:56:02 2019 OPTIONS IMPORT: data channel crypto options modified
Mon Dec  2 21:56:02 2019 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Dec  2 21:56:02 2019 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Mon Dec  2 21:56:02 2019 ROUTE_GATEWAY 10.137.0.6
Mon Dec  2 21:56:02 2019 TUN/TAP device tun0 opened
Mon Dec  2 21:56:02 2019 TUN/TAP TX queue length set to 100
Mon Dec  2 21:56:02 2019 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Mon Dec  2 21:56:02 2019 /sbin/ip link set dev tun0 up mtu 1500
Mon Dec  2 21:56:02 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Mon Dec  2 21:56:02 2019 /etc/openvpn/update-resolv-conf tun0 1500 1552 10.241.189.10 10.241.189.9 init
dhcp-option DNS 10.241.189.1
Mon Dec  2 21:56:02 2019 /sbin/ip route add 10.241.189.0/24 via 10.241.189.9
Mon Dec  2 21:56:02 2019 Initialization Sequence Completed
  1. I found that I could also eliminate the need to type my username every time by passing 'auth-user-pass' the path to a file that contains only 1 line with the username on it (omitting the second line, which is ordinarily the password, makes the openvpn client prompt the user for the password) https://openvpn.net/community-resources/reference-manual-for-openvpn-2-4/
user@ose:~$ tail openvpn/client.conf
# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384

# dns for staging
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf

# 2fa
auth-user-pass /home/user/openvpn/username.txt
user@ose:~$ cat openvpn/username.txt 
maltfield
user@ose:~$ sudo openvpn openvpn/client.conf
Mon Dec  2 22:04:53 2019 WARNING: file '/home/user/openvpn/username.txt' is group or others accessible
Mon Dec  2 22:04:53 2019 OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
Mon Dec  2 22:04:53 2019 library versions: OpenSSL 1.0.2t  10 Sep 2019, LZO 2.08
Enter Auth Password: 
  1. But that's not a great UX and it would certainly confuse Marcin. How do I replace "Enter Auth Password" with "Enter 2FA Token"?
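The client output above also warns that username.txt "is group or others accessible". A minimal sketch for creating the one-line username file with owner-only permissions, which should quiet that warning, follows. The function name `write_vpn_username_file` is hypothetical.

```shell
#!/usr/bin/env bash
# Writes a one-line username file for OpenVPN's auth-user-pass option and
# makes it readable and writable by its owner only (mode 600).
write_vpn_username_file() {
  local username="$1" path="$2"
  # the subshell keeps the restrictive umask from leaking into the caller
  ( umask 077 && printf '%s\n' "$username" > "$path" )
  # also tighten the mode in case the file already existed with wider permissions
  chmod 600 "$path"
}

# Example (path taken from the client.conf above):
#   write_vpn_username_file maltfield /home/user/openvpn/username.txt
```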

Tue Nov 26, 2019

  1. continuing from yesterday, why are my iptables rules failing inside the docker container? I literally whitelisted every user on the system, but it's still failing to resolve DNS
root@osestaging1-discourse-ose:~/backups/iptables/20191125# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
systemd-timesync:x:101:102:systemd Time Synchronization,,,:/run/systemd:/usr/sbin/nologin
systemd-network:x:102:103:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
systemd-resolve:x:103:104:systemd Resolver,,,:/run/systemd:/usr/sbin/nologin
messagebus:x:104:105::/nonexistent:/usr/sbin/nologin
Debian-exim:x:105:108::/var/spool/exim4:/usr/sbin/nologin
postgres:x:106:110:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
sshd:x:107:65534::/run/sshd:/usr/sbin/nologin
runit-log:x:999:999::/nonexistent:/usr/sbin/nologin
redis:x:108:111::/var/lib/redis:/usr/sbin/nologin
discourse:x:1000:1000::/home/discourse:/bin/bash
root@osestaging1-discourse-ose:~/backups/iptables/20191125# 
root@osestaging1-discourse-ose:~/backups/iptables/20191125# 
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -F
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 0 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 1 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 2 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 3 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 4 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 5 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 6 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 7 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 8 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 9 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 10 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 13 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 33 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 34 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 38 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 39 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 41 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 100 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 101 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 102 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 103 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -p tcp -m owner --uid-owner 65534 -j ACCEPT
root@osestaging1-discourse-ose:~/backups/iptables/20191125# iptables -A OUTPUT -j DROP
root@osestaging1-discourse-ose:~/backups/iptables/20191125# 
root@osestaging1-discourse-ose:/#
root@osestaging1-discourse-ose:/# apt-get install iptables-persistent
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  netfilter-persistent
The following NEW packages will be installed:
  iptables-persistent netfilter-persistent
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 21.8 kB of archives.
After this operation, 80.9 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Err:1 http://deb.debian.org/debian buster/main amd64 netfilter-persistent all 1.0.11
  Temporary failure resolving 'deb.debian.org'
Err:2 http://deb.debian.org/debian buster/main amd64 iptables-persistent all 1.0.11
  Temporary failure resolving 'deb.debian.org'
E: Failed to fetch http://deb.debian.org/debian/pool/main/i/iptables-persistent/netfilter-persistent_1.0.11_all.deb  Temporary failure resolving 'deb.debian.org'
E: Failed to fetch http://deb.debian.org/debian/pool/main/i/iptables-persistent/iptables-persistent_1.0.11_all.deb  Temporary failure resolving 'deb.debian.org'
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
root@osestaging1-discourse-ose:/#
  1. oh, duh, those are all explicitly tcp rules x_x
  2. yeah, removing the tcp arg (to permit udp, which DNS needs) worked
root@osestaging1-discourse-ose:/# iptables -F
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -j DROP
root@osestaging1-discourse-ose:/# 
root@osestaging1-discourse-ose:/# iptables-save
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:28:51 2019
*filter
:OUTPUT ACCEPT [66:3802]
:FORWARD ACCEPT [0:0]
:INPUT ACCEPT [82:547247]
-A OUTPUT -m owner --uid-owner 0 -j ACCEPT
-A OUTPUT -m owner --uid-owner 100 -j ACCEPT
-A OUTPUT -j DROP
COMMIT
# Completed on Tue Nov 26 09:28:51 2019
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
root@osestaging1-discourse-ose:/# apt-get install iptables-persistent
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  netfilter-persistent
The following NEW packages will be installed:
  iptables-persistent netfilter-persistent
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 21.8 kB of archives.
After this operation, 80.9 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian buster/main amd64 netfilter-persistent all 1.0.11 [10.1 kB]
Get:2 http://deb.debian.org/debian buster/main amd64 iptables-persistent all 1.0.11 [11.7 kB]
Fetched 21.8 kB in 0s (681 kB/s)
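The earlier "whitelist every user" debugging step was typed out one UID at a time. It can be generated from a passwd-format file instead. This sketch only prints the commands and does not run them, and it leaves out `-p tcp` so that UDP traffic such as DNS is matched too. The function name `print_owner_rules` is hypothetical.

```shell
#!/usr/bin/env bash
# Prints (does not execute) one OUTPUT ACCEPT rule per distinct numeric UID in
# a passwd-format file (default /etc/passwd), followed by a final DROP rule.
print_owner_rules() {
  local passwd_file="${1:-/etc/passwd}"
  awk -F: '$3 ~ /^[0-9]+$/ && !seen[$3]++ {
    print "iptables -A OUTPUT -m owner --uid-owner " $3 " -j ACCEPT"
  }' "$passwd_file"
  echo "iptables -A OUTPUT -j DROP"
}

# Example: review the generated rules before applying any of them
#   print_owner_rules /etc/passwd
```

Printing rather than executing keeps it safe to run anywhere. Once the rules look right, the output could be piped to `sh` as root.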
  1. now that iptables-persistent is installed, there are files at /etc/iptables/rules.v*
root@osestaging1-discourse-ose:/# cat /etc/iptables/rules.v*
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:30:15 2019
*filter
:OUTPUT ACCEPT [66:3802]
:FORWARD ACCEPT [0:0]
:INPUT ACCEPT [101:571562]
-A OUTPUT -m owner --uid-owner 0 -j ACCEPT
-A OUTPUT -m owner --uid-owner 100 -j ACCEPT
-A OUTPUT -j DROP
COMMIT
# Completed on Tue Nov 26 09:30:15 2019
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:30:15 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Completed on Tue Nov 26 09:30:15 2019
root@osestaging1-discourse-ose:/# 
  1. let's also add ipv6 rules
root@osestaging1-discourse-ose:/# ip6tables-save
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:31:56 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Completed on Tue Nov 26 09:31:56 2019
# Warning: ip6tables-legacy tables present, use ip6tables-legacy-save to see them
root@osestaging1-discourse-ose:/# ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
root@osestaging1-discourse-ose:/# ip6tables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
root@osestaging1-discourse-ose:/# ip6tables -A OUTPUT -j DROP
root@osestaging1-discourse-ose:/# 
root@osestaging1-discourse-ose:/# ip6tables-save
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:32:03 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A OUTPUT -m owner --uid-owner 0 -j ACCEPT
-A OUTPUT -m owner --uid-owner 100 -j ACCEPT
-A OUTPUT -j DROP
COMMIT
# Completed on Tue Nov 26 09:32:03 2019
# Warning: ip6tables-legacy tables present, use ip6tables-legacy-save to see them
root@osestaging1-discourse-ose:/# 
  1. unfortunately there's no init.d service or sv runit script for saving or restarting iptables!
  2. I tried to stop & start iptables, but it lost the config; even though the config was stored to /etc/iptables/rules.v4 :\
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/# iptables-save
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
root@osestaging1-discourse-ose:/# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/# cat /etc/iptables/rules.v*
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:30:15 2019
*filter
:OUTPUT ACCEPT [66:3802]
:FORWARD ACCEPT [0:0]
:INPUT ACCEPT [101:571562]
-A OUTPUT -m owner --uid-owner 0 -j ACCEPT
-A OUTPUT -m owner --uid-owner 100 -j ACCEPT
-A OUTPUT -j DROP
COMMIT
# Completed on Tue Nov 26 09:30:15 2019
# Generated by xtables-save v1.8.2 on Tue Nov 26 09:30:15 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
# Completed on Tue Nov 26 09:30:15 2019
root@osestaging1-discourse-ose:/#
  1. I've been debating baking this into the image at build time, but if it can't persist then it'll *have* to live in a template. I really don't like having the box come up with networking and an open firewall, even for a microsecond, but I guess this is probably the best option
  2. I added a 'templates/iptables.template.yml' that just runs the iptables & ip6tables commands and attempted a bootstrap. it failed saying that there was an issue with my iptables syntax?
[root@osestaging1 discourse]# cat templates/iptables.template.yml 
run:
  - exec:
      cmd:
        # run these every time since the container can't persist iptables rules
        - sudo apt-get install -y iptables
        - iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        - iptables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        - iptables -A OUTPUT -j DROP
        - ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        - ip6tables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        - ip6tables -A OUTPUT -j DROP
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# docker stop discourse_ose
discourse_ose
[root@osestaging1 discourse]# ./launcher bootstrap discourse_ose
...
I, [2019-11-26T09:51:45.398930 #1]  INFO -- : > iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
iptables v1.8.2 (nf_tables): Couldn't load match `owner':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
I, [2019-11-26T09:51:45.409106 #1]  INFO -- : 


FAILED
--------------------
Pups::ExecError: iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT failed with return #<Process::Status: pid 190 exit 2>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cmd"=>["sudo apt-get install -y iptables", "iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT", "iptables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT", "iptables -A OUTPUT -j DROP", "ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT", "ip6tables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT", "ip6tables -A OUTPUT -j DROP"]}
57dee8aa8bba2e5e61f0978c14def84b5981de1be525dc36c3ad5fad191ddca3
 FAILED TO BOOTSTRAP  please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
[root@osestaging1 discourse]# 
  1. I tried changing it from 'iptables' to 'iptables-legacy'. Now I'm getting permission issues again
[root@osestaging1 discourse]# cat templates/iptables.template.yml 
run:
  - exec:
      cmd:
        # run these every time since the container can't persist iptables rules
        - sudo apt-get install -y iptables
        - iptables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        - iptables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        - iptables-legacy -A OUTPUT -j DROP
        - ip6tables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        - ip6tables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        - ip6tables-legacy -A OUTPUT -j DROP
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# ./launcher bootstrap discourse_ose
...
I, [2019-11-26T09:54:59.014399 #1]  INFO -- : > iptables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
getsockopt failed strangely: Operation not permitted
I, [2019-11-26T09:54:59.023038 #1]  INFO -- : 


FAILED
--------------------
Pups::ExecError: iptables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT failed with return #<Process::Status: pid 190 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cmd"=>["sudo apt-get install -y iptables", "iptables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT", "iptables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT", "iptables-legacy -A OUTPUT -j DROP", "ip6tables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT", "ip6tables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT", "ip6tables-legacy -A OUTPUT -j DROP"]}
beed5935a7fdbbc447610d57e295f93a133c0bf9dff6f82bc7afaea531d91526
 FAILED TO BOOTSTRAP  please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
[root@osestaging1 discourse]#
  1. I confirmed that the NET_ADMIN capability was still present
[root@osestaging1 discourse]# docker inspect discourse_ose | grep -iC3 CapAdd
			"AutoRemove": false,
			"VolumeDriver": "",
			"VolumesFrom": null,
			"CapAdd": [
				"NET_ADMIN"
			],
			"CapDrop": null,
[root@osestaging1 discourse]# 
  1. adding sudo didn't help either. someone else had this issue, but there's no solution https://stackoverflow.com/questions/50419819/adding-a-new-user-to-docker-and-limiting-its-permissions
  2. I moved this to a runit script instead
[root@osestaging1 discourse]# cat templates/iptables.template.yml 
run:
  - exec:
      cmd:
        - sudo apt-get install -y iptables

  - file:
      path: /etc/runit/1.d/000-iptables
      contents: |
        #!/bin/bash
        sudo apt-get install -y iptables
        sudo iptables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        sudo iptables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        sudo iptables-legacy -A OUTPUT -j DROP
        sudo ip6tables-legacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        sudo ip6tables-legacy -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        sudo ip6tables-legacy -A OUTPUT -j DROP

[root@osestaging1 discourse]# 
  1. I also had issues with launcher killing my changes to the container's hostconfig json file. the problem was solved by doing a docker restart, which is necessary to make the config stick

id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker

  1. now the box that comes up from `launcher start discourse_ose` has iptables permission! But there's no config :\
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/var/www/discourse#
  1. I'm having issues with the `launcher bootstrap` process killing my CapAdd settings; this is the process that works
/var/discourse/launcher stop discourse_ose
/var/discourse/launcher bootstrap discourse_ose
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker
/var/discourse/launcher start discourse_ose
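  1. the hostconfig.json edit in the steps above can be wrapped in a small function. Note that the `docker inspect` output earlier in this log shows CapAdd as a JSON list (`"CapAdd": [ "NET_ADMIN" ]`), while the sed above writes a bare string. This sketch writes the list form; that this is the form Docker expects on disk is an assumption, not something verified here. The name `add_net_admin` is hypothetical, and GNU sed is assumed for `-i`

```shell
#!/bin/bash
# add_net_admin is a hypothetical helper: it rewrites "CapAdd":null in the
# hostconfig.json file given as $1 to the list form ["NET_ADMIN"].
# Assumes GNU sed (in-place editing with a bare -i).
add_net_admin() {
  local file="$1"
  sed -i 's/"CapAdd":null/"CapAdd":["NET_ADMIN"]/' "$file"
}

# demo against a throw-away file rather than the real one under /var/lib/docker
tmp="$(mktemp)"
echo '{"VolumesFrom":null,"CapAdd":null,"CapDrop":null}' > "$tmp"
add_net_admin "$tmp"
cat "$tmp"
```

Against the real container the argument would be /var/lib/docker/containers/$id/hostconfig.json, still followed by the `systemctl restart docker` from the steps above.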
  1. I had an issue where the docker container got stuck in a boot loop. `docker ps` just kept saying its state was "restarting". To figure out what the problem was, I had to tail the docker logs https://stackoverflow.com/questions/37471929/docker-container-keeps-on-restarting-again-on-again
2019-11-26T13:20:16.076166067Z run-parts: executing /etc/runit/1.d/00-ensure-links
2019-11-26T13:20:16.115041657Z run-parts: executing /etc/runit/1.d/00-fix-var-logs
2019-11-26T13:20:16.190732272Z run-parts: executing /etc/runit/1.d/000-iptables
2019-11-26T13:20:16.190786173Z run-parts: failed to exec /etc/runit/1.d/000-iptables: Exec format error
2019-11-26T13:20:16.190800674Z run-parts: /etc/runit/1.d/000-iptables exited with return code 1
[root@osestaging1 discourse]# 
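  1. when a container is restarting in a loop, the interesting lines are the run-parts failures, which are easy to lose among the normal boot output. A minimal sketch of a filter for them; `failed_boot_steps` is a hypothetical name, and the sample input is copied from the log lines above

```shell
#!/bin/bash
# failed_boot_steps is a hypothetical helper: it reads container log lines on
# stdin and keeps only the run-parts lines that report a failure.
failed_boot_steps() {
  grep -E 'run-parts: (failed to exec|.* exited with return code [1-9])'
}

# sample lines copied from the docker log output above
sample='2019-11-26T13:20:16.115041657Z run-parts: executing /etc/runit/1.d/00-fix-var-logs
2019-11-26T13:20:16.190786173Z run-parts: failed to exec /etc/runit/1.d/000-iptables: Exec format error
2019-11-26T13:20:16.190800674Z run-parts: /etc/runit/1.d/000-iptables exited with return code 1'
printf '%s\n' "$sample" | failed_boot_steps
```

On the host this would be fed from something like `docker logs --timestamps --tail 50 discourse_ose 2>&1 | failed_boot_steps`.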
  1. this is really fucking frustrating, because the iteration time of trying to get this damn script correct is like 10-20 minutes on the discourse 'bootstrap' command!
  2. this is where I'm at now. Could it be that apt-get can't run yet in the early stage of runlevel 1?
[root@osestaging1 discourse]# cat templates/iptables.template.yml 
run:
  - exec:
      cmd:
        - sudo apt-get install -y iptables

  - file:
      path: /etc/runit/1.d/000-iptables
      chmod: "+x"
      contents: |
        ################################################################################
        # File:    /etc/runit/1.d/000-iptables
        # Version: 0.1
        # Purpose: installs & locks-down iptables
        # Author:  Michael Altfield <michael@opensourceecology.org>
        # Created: 2019-11-26
        # Updated: 2019-11-26
        ################################################################################
        #!/bin/bash
        sudo apt-get install -y iptables
        sudo iptableslegacy -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        sudo iptables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        sudo iptables -A OUTPUT -j DROP
        sudo ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
        sudo ip6tables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
        sudo ip6tables -A OUTPUT -j DROP

[root@osestaging1 discourse]# 
  1. ugh, there's a typo on the second command. Let's bootstrap again and wait another 10 minutes to see if that worked..
  2. I tried moving this to runlevel 2 instead of 1, but then I discovered that it *still* got stuck on the old 1.d runlevel file. Looks like it doesn't get deleted between bootstrap runs unless I do a destroy; so now these are my iteration commands
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker
/var/discourse/launcher start discourse_ose
  1. actually, the bootstrap starts the docker container too early, after it's gotten rid of our changes to the container's capabilities (NET_ADMIN), so it comes up without the ability for root to run `iptables`. A stop immediately after the bootstrap, followed by the change, followed by a start works
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose
/var/discourse/launcher stop discourse_ose
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker
/var/discourse/launcher start discourse_ose
  1. actually, that *still* doesn't work. The hostconfig.json file does not exist after the `launcher destroy`, and it isn't created until after a `launcher start`. So the first start necessarily can't have the NET_ADMIN capability
  2. I did verify that the script runs without issues if I manually trigger it from runlevel = 2
root@osestaging1-discourse-ose:/# /etc/runit/2.d/000-iptables 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
iptables is already the newest version (1.8.2-4).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@osestaging1-discourse-ose:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             owner UID match root
ACCEPT     all  --  anywhere             anywhere             owner UID match _apt
DROP       all  --  anywhere             anywhere            
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/# cat /etc/runit/2.d/000-iptables 
################################################################################
# File:    /etc/runit/1.d/000-iptables
# Version: 0.1
# Purpose: installs & locks-down iptables
# Author:  Michael Altfield <michael@opensourceecology.org>
# Created: 2019-11-26
# Updated: 2019-11-26
################################################################################
#!/bin/bash
sudo apt-get install -y iptables 
sudo iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
sudo iptables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
sudo iptables -A OUTPUT -j DROP
sudo ip6tables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
sudo ip6tables -A OUTPUT -m owner --uid-owner 100 -j ACCEPT
sudo ip6tables -A OUTPUT -j DROProot@osestaging1-discourse-ose:/# 
  1. ah, I think I understand at least why my script isn't being called when I put it in runlevel 2
root@osestaging1-discourse-ose:/etc/runit# cat 1
#!/bin/bash

/bin/run-parts --verbose --exit-on-error /etc/runit/1.d || exit 100
root@osestaging1-discourse-ose:/etc/runit# cat 2
#!/bin/bash
exec /usr/bin/runsvdir -P /etc/service
root@osestaging1-discourse-ose:/etc/runit# cat 3
#!/bin/bash

/bin/run-parts --verbose /etc/runit/3.d
root@osestaging1-discourse-ose:/etc/runit# 
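  1. a possible explanation for the earlier "Exec format error": run-parts (called from /etc/runit/1 above) executes each file directly, and the kernel only honours `#!` when it is the first two bytes of the file. In the 000-iptables file shown above, `#!/bin/bash` comes after the comment header. An interactive bash falls back to interpreting such a file itself, which would fit with the manual run working. A minimal sketch of a check; `shebang_is_first` is a hypothetical name

```shell
#!/bin/bash
# shebang_is_first is a hypothetical helper: it succeeds only when the first
# two bytes of the file given as $1 are "#!".
shebang_is_first() {
  [ "$(head -c 2 "$1")" = '#!' ]
}

# demo with two throw-away files: header-before-shebang vs shebang-first
bad="$(mktemp)"; good="$(mktemp)"
printf '# File: 000-iptables\n#!/bin/bash\necho hi\n' > "$bad"
printf '#!/bin/bash\n# File: 000-iptables\necho hi\n' > "$good"
for f in "$bad" "$good"; do
  if shebang_is_first "$f"; then
    echo "ok"
  else
    echo "shebang not on line 1"
  fi
done
```

If this is the cause, moving `#!/bin/bash` above the header block in the template's `contents:` should be enough; this was not confirmed in this log.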
  1. the actual command in `launcher` that finally creates the hostconfig.json file is this `docker run` one
+ id=sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63
+ grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e UNICORN_WORKERS=2 -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
acb146563bc67ad946d0ae3ec40ebb01c08d51f2b457676aba9a52a67bcb4896
++ docker inspect '--format={{.Id}}' discourse_ose
+ id=acb146563bc67ad946d0ae3ec40ebb01c08d51f2b457676aba9a52a67bcb4896
+ grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/acb146563bc67ad946d0ae3ec40ebb01c08d51f2b457676aba9a52a67bcb4896/hostconfig.json
{"Binds":["/var/discourse/shared/standalone:/shared","/var/discourse/shared/standalone/log/var-log:/var/log"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"always","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":536870912,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
  1. I was finally able to trigger the creation of the hostconfig.json file with this
/bin/docker run --rm -i -a stdin -a stdout --name discourse_ose local_discourse/discourse_ose /sbin/boot
  1. for example I run this in one terminal
[root@osestaging1 discourse]# /var/discourse/launcher destroy discourse_ose
+ /bin/docker stop -t 10 discourse_ose
Error response from daemon: No such container: discourse_ose
discourse_ose was not found
[root@osestaging1 discourse]# time nice /var/discourse/launcher bootstrap discourse_ose

INFO: checking hostconfig capacities before 5
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
INFO: checking hostconfig capacities before 6
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
INFO: checking hostconfig capacities before 7
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2019-11-26T15:00:50.276279 #1]  INFO -- : Loading --stdin
I, [2019-11-26T15:00:50.278427 #1]  INFO -- : Skipped missing after_code hook
I, [2019-11-26T15:00:50.308117 #1]  INFO -- : File > /etc/runit/2.d/000-iptables  chmod: +x  chown:
I, [2019-11-26T15:00:50.308310 #1]  INFO -- : > echo "Beginning of custom commands"
I, [2019-11-26T15:00:50.313145 #1]  INFO -- : Beginning of custom commands

I, [2019-11-26T15:00:50.313367 #1]  INFO -- : > echo "End of custom commands"
I, [2019-11-26T15:00:50.318130 #1]  INFO -- : End of custom commands

sha256:ed80d37d26774cf0d3a51cde9e08808814281f4ebb07cc7a8e44a7c1b333e3da
5e1526cce436c9c343cfc6c58a4b0421548836ac49d5b7f9a525d28a82dc1ed3
Successfully bootstrapped, to startup use ./launcher start discourse_ose

real    0m44.255s
user    0m1.832s
sys     0m1.420s
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]# /bin/docker run --rm -i -a stdin -a stdout --name discourse_ose local_discourse/discourse_ose /sbin/boot
Cleaning stale PID files
Started runsvdir, PID is 25
chgrp: invalid group: ‘syslog’
rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
rsyslogd: activation of module imklog failed [v8.1901.0 try https://www.rsyslog.com/e/2145 ]
  1. which gets stuck, then in another terminal: boom!
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":true,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# 
  1. this simpler command works too!
docker run --name discourse_ose local_discourse/discourse_ose whoami
  1. for example
[root@osestaging1 discourse]# /var/discourse/launcher destroy discourse_ose
+ /bin/docker stop -t 10 discourse_ose
Error response from daemon: No such container: discourse_ose
discourse_ose was not found
[root@osestaging1 discourse]# time nice /var/discourse/launcher bootstrap discourse_ose
...
sha256:2a321b9e0983a5134e9ca9459d4fc31b83eedb209ecf2a95697de6ed78e87542
8f0c04f55129ddc5bf7e9a85a5d724ce5b696f8c31dc525ea5625f1554887315
Successfully bootstrapped, to startup use ./launcher start discourse_ose

real    0m26.934s
user    0m1.899s
sys     0m1.511s
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]# docker run --name discourse_ose local_discourse/discourse_ose whoami
root
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]#
  1. so here's probably a better bootstrap method that hopefully gets the first boot with NET_ADMIN capabilities..
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose

# create hostconfig.json and grant the container NET_ADMIN permissions (for iptables)
# this hack is necessary because `docker start` doesn't take --cap-add
docker run --name discourse_ose local_discourse/discourse_ose whoami
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker

/var/discourse/launcher start discourse_ose
  1. well fuck, now the last line `launcher start discourse_ose` doesn't actually start the docker instance!
  2. even manually running it doesn't work. da fuck?
[root@osestaging1 discourse]# docker start discourse_ose
discourse_ose
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@osestaging1 discourse]# 
  1. ok, so I think what's happening is that I created a fork of the container "discourse_ose" called "local_discourse/discourse_ose" but such that when it's started, it runs only `whoami` and exits. This is a damn catch-22, because I need to make it run something that's *not* a boot so I can give it the NET_ADMIN capability before booting!!
  2. I tried to "overwrite" the "whoami" with "/sbin/boot" after setting the CapAdd, but it said I had to delete the old one first. Of course, when I delete the old one, then I lose the config
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":["NET_ADMIN"],"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# docker ps -a | grep -i local_discourse
419cfe8ed8bc        local_discourse/discourse_ose   "whoami"                 11 minutes ago      Exited (0) 6 minutes ago                           discourse_ose
[root@osestaging1 discourse]# docker rm 419cfe8ed8bc
419cfe8ed8bc
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]# 
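  1. the catch-22 above comes from `docker run` creating and starting the container in one step. `docker create` accepts `--cap-add` like `docker run` does, but leaves the new container stopped, so the capability could be set before the first boot without editing hostconfig.json. A minimal sketch that only prints the command; `create_cmd` is a hypothetical name, and the -e / -v / --mac-address / --restart options that launcher passes (see the `docker run` line logged above) are left out here and would need to be added. That `launcher start` then reuses the existing container is an assumption, and none of this was tested in this log

```shell
#!/bin/bash
# create_cmd is a hypothetical helper: it prints (does not run) a
# `docker create` command line for the given container name and image.
create_cmd() {
  local name="$1" image="$2"
  echo "docker create --cap-add NET_ADMIN --name ${name} ${image} /sbin/boot"
}

create_cmd discourse_ose local_discourse/discourse_ose
```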
  1. ok, what if I create the first docker instance just for a second, then terminate it, then update the configs, then start it again? Something like this?
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose

# create hostconfig.json and grant the container NET_ADMIN permissions (for iptables)
# this hack is necessary because `docker start` doesn't take --cap-add
docker run -d --name discourse_ose local_discourse/discourse_ose /sbin/boot
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker

/var/discourse/launcher start discourse_ose
  1. that didn't quite work. after the docker restart, the NET_ADMIN capability disappeared; maybe we have to stop the container before updating its config and restarting docker?
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]# docker run -d --name discourse_ose local_discourse/discourse_ose /sbin/boot
3337ce1d1833172a4543a105396179a102fec4969cf2b3d9208c441009dd0443
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":"NET_ADMIN","CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# systemctl restart docker
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]#
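  1. hostconfig.json is a single line, so the grep above dumps the whole file. A minimal sketch for printing only the three capability keys, assuming GNU grep's `-o`; `show_caps` is a name made up for this log, not a docker or launcher command. Usage would be `show_caps /var/lib/docker/containers/$id/hostconfig.json`

```shell
# show_caps FILE: print only the capability-related keys of a hostconfig.json
# (hypothetical helper; assumes the compact one-line JSON that dockerd writes)
show_caps() {
  grep -oE '"(CapAdd|CapDrop|Capabilities)":(null|\[[^]]*\]|"[^"]*")' "$1"
}
```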
  1. let's try this
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose

# create hostconfig.json and grant the container NET_ADMIN permissions (for iptables)
# this hack is necessary because `docker start` doesn't take --cap-add
docker run -d --name discourse_ose local_discourse/discourse_ose /sbin/boot
/var/discourse/launcher stop discourse_ose
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker

/var/discourse/launcher start discourse_ose
  1. it stuck after the restart this time!
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
[root@osestaging1 discourse]#
[root@osestaging1 discourse]#
[root@osestaging1 discourse]#
[root@osestaging1 discourse]# docker run -d --name discourse_ose local_discourse/discourse_ose /sbin/boot
91549063df643166ed7ac82c8fb79a4db42562dc2a568cee2462e488a2ef5e4f
[root@osestaging1 discourse]# /var/discourse/launcher stop discourse_ose
...
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":"NET_ADMIN","CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# systemctl restart docker
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
{"Binds":null,"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":["NET_ADMIN"],"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# 
  1. and now the first boot lets root use iptables, yay!
root@91549063df64:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@91549063df64:/# 
  1. problem is, I think we lost a lot of important stuff (like persistent volumes) from the original run
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e UNICORN_WORKERS=2 -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
  1. let's just use launcher commands then; this works
/var/discourse/launcher destroy discourse_ose
time nice /var/discourse/launcher bootstrap discourse_ose

# create hostconfig.json and grant the container NET_ADMIN permissions (for iptables)
# this hack is necessary because `docker start` doesn't take --cap-add
/var/discourse/launcher start discourse_ose
sleep 1
/var/discourse/launcher stop discourse_ose
id=`docker inspect --format="{{.Id}}" discourse_ose`
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json
grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/$id/hostconfig.json
systemctl restart docker

/var/discourse/launcher start discourse_ose
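  1. the sed above writes "NET_ADMIN" as a bare string, and in the earlier run docker rewrote it as the array ["NET_ADMIN"] after the daemon restart. A minimal sketch that writes the array form directly and refuses to touch a file that doesn't have the expected null value; `add_cap` is a hypothetical helper name, it assumes GNU `sed -i`, and it only handles the "CapAdd":null case seen in this log. As above, the container would still need to be stopped and docker restarted around it. Usage would be `add_cap /var/lib/docker/containers/$id/hostconfig.json NET_ADMIN`

```shell
# add_cap FILE CAP: set CapAdd to a one-element JSON array in hostconfig.json
# (hypothetical helper; returns non-zero if "CapAdd":null is not present)
add_cap() {
  # bail out rather than silently doing nothing
  grep -q '"CapAdd":null' "$1" || return 1
  sed -i "s/\"CapAdd\":null/\"CapAdd\":[\"$2\"]/" "$1"
}
```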
  1. but when I put my iptables script in runlevel 1, I still get stuck in a restart loop. Because of this, I can't actually use the launcher to enter the container and test it out
[root@osestaging1 discourse]# ./launcher enter discourse_ose
Error response from daemon: Container bac54f66d4e39052bcede900d734380ef82725fe835f5e3b18da6c0a704e7e5b is restarting, wait until the container is running
[root@osestaging1 discourse]# 
  1. but I was able to run a fresh container from the 'local_discourse/discourse_ose' image (a plain `docker run`, so without any of the launcher's arguments) and confirm that it doesn't have the NET_ADMIN capability
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE                           COMMAND             CREATED             STATUS                                    PORTS               NAMES
bac54f66d4e3        local_discourse/discourse_ose   "/sbin/boot"        7 minutes ago       Restarting (100) Less than a second ago                       discourse_ose
[root@osestaging1 discourse]# docker run -it bac54f66d4e3 /bin/bash
Unable to find image 'bac54f66d4e3:latest' locally
docker: Error response from daemon: pull access denied for bac54f66d4e3, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
[root@osestaging1 discourse]# docker run -it local_discourse/discourse_ose /bin/bash
root@24a1f9f4c038:/# iptables -L
bash: iptables: command not found
root@24a1f9f4c038:/# apt-get install iptables
...
root@24a1f9f4c038:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@24a1f9f4c038:/# 
  1. ok, so it looks like `docker run` does take a '--cap-add' option https://docs.docker.com/engine/reference/commandline/run/ and since `docker start` appears to just reuse the container that's created when the `launcher` script runs that long `docker run` command, perhaps we add it there. This critical line occurs at the very end of the run_start() function in `launcher`
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e UNICORN_WORKERS=2 -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
  1. that appears to have worked, but my iptables runit script is still causing the container to restart loop
  2. let me remove that iptables template and connect to the machine on first boot to see if it really does have the NET_ADMIN capability; it does!
[root@osestaging1 discourse]# time nice /var/discourse/launcher bootstrap discourse_ose
...
Successfully bootstrapped, to startup use ./launcher start discourse_ose

real    0m24.815s
user    0m1.690s
sys     0m1.386s
[root@osestaging1 discourse]# /var/discourse/launcher start discourse_ose
...
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
+ echo 'INFO: checking hostconfig capacities before 16'
INFO: checking hostconfig capacities before 16
++ docker inspect '--format={{.Id}}' discourse_ose
+ id=sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63
+ grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json
grep: /var/lib/docker/containers/sha256:940c0024cbd795d6f66c64ebbb1ab96b52a3051393d7d8dd85451f94b5f89c63/hostconfig.json: No such file or directory
+ echo 'run_image: local_discourse/discourse_ose'
run_image: local_discourse/discourse_ose
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e UNICORN_WORKERS=2 -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose --cap-add NET_ADMIN -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
ed44b77cea78b62b9a2d7114d30eb3d4c9b4b2123296a0b019ecff132f743bd9
++ docker inspect '--format={{.Id}}' discourse_ose
+ id=ed44b77cea78b62b9a2d7114d30eb3d4c9b4b2123296a0b019ecff132f743bd9
+ grep -E 'CapAdd|CapDrop|Capabilities' /var/lib/docker/containers/ed44b77cea78b62b9a2d7114d30eb3d4c9b4b2123296a0b019ecff132f743bd9/hostconfig.json
{"Binds":["/var/discourse/shared/standalone:/shared","/var/discourse/shared/standalone/log/var-log:/var/log"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"always","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":["NET_ADMIN"],"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":536870912,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/# sudo apt-get install iptables
root@osestaging1-discourse-ose:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/# 
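  1. instead of grepping hostconfig.json on disk, the same field can be read through the docker CLI from the host. A minimal sketch, assuming the container's CapAdd list is exposed as `.HostConfig.CapAdd` in `docker inspect`; `container_has_cap` is a made-up helper name. Usage would be `container_has_cap discourse_ose NET_ADMIN && echo yes`

```shell
# container_has_cap NAME CAP: succeed if CAP is listed in the container's CapAdd
# (hypothetical helper; prints nothing, only sets the exit status)
container_has_cap() {
  docker inspect --format '{{json .HostConfig.CapAdd}}' "$1" | grep -q "\"$2\""
}
```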
  1. I manually added the runit script, and it executed fine!
...
root@osestaging1-discourse-ose:/# chmod +x /etc/runit/1.d/000-iptables 
root@osestaging1-discourse-ose:/# /etc/runit/1.d/000-iptables 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
iptables is already the newest version (1.8.2-4).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@osestaging1-discourse-ose:/# echo $?
0
root@osestaging1-discourse-ose:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             owner UID match root
ACCEPT     all  --  anywhere             anywhere             owner UID match _apt
DROP       all  --  anywhere             anywhere            
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/# 
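  1. the contents of 000-iptables are elided above, so this is only a reconstruction from the `iptables -L` listing: allow OUTPUT for root and _apt, drop everything else. `apply_output_rules` and the `IPTABLES` override are made-up names so the rules can be printed with `IPTABLES=echo apply_output_rules` before running them for real

```shell
# apply_output_rules: append the three OUTPUT rules seen in the listing above
# (a reconstruction from the listing, not the actual 000-iptables script)
# IPTABLES can be overridden, e.g. IPTABLES=echo for a dry run
apply_output_rules() {
  ipt=${IPTABLES:-iptables}
  "$ipt" -A OUTPUT -m owner --uid-owner root -j ACCEPT
  "$ipt" -A OUTPUT -m owner --uid-owner _apt -j ACCEPT
  "$ipt" -A OUTPUT -j DROP
}
```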
  1. well, now that I've fixed all possible issues with not having NET_ADMIN permissions on the first boot by adding the '--cap-add=NET_ADMIN' argument to the final `docker run` line of the `run_start()` function of the `launcher` script (which applies the NET_ADMIN capability to all subsequent `docker start` calls wrapped by `launcher start discourse_ose`), I should be able to isolate the restart loop to a single line of the runit script.
  2. tomorrow I'll try to just make my script a single line = `exit 0`. That should work. Then I can add the `apt-get` line. Then a single `iptables` line

Mon Nov 25, 2019

  1. Our post on the offline wiki archiving and video on how to view the OSE wiki offline on the phone using kiwix is live https://www.opensourceecology.org/wp-admin/post.php?post=11468&action=edit
  2. ...
  3. Because it was unclear to me when the Dockerfile was actually run (I thought it was run when doing a `docker pull`) and I didn't find an easy answer to this question on the Internet, I posted a question & answer to serverfault to help future docker n00bs on this https://serverfault.com/questions/993177/what-is-responsible-for-calling-and-running-a-dockerfile/993178#993178
  4. ...
  5. I was about to post to the discourse forums describing my solution to use the docker host via SMTP and no auth, but I found this https://meta.discourse.org/t/how-to-set-smtp-config-to-use-localhost/131464
  6. I'm still not sure how durable it is to explicitly set the docker host IP as 172.17.0.1, but this other user had the same IP, so I guess it's pretty common at least
  7. I ended up documenting this on the main topic for troubleshooting email on a new Discourse install here https://meta.discourse.org/t/troubleshooting-email-on-a-new-discourse-install/16326/375
  8. ...
  9. there's a new update available for both discourse and the 'docker_manager' plugin. I've wanted to test the update procedure when the docker container has no internet access (only the docker host does), so let's figure that out
    1. Jesus, I wanted to simply note "we're on version X" and "the latest version is Y", but apparently Discourse is too fucking complex for that.
    2. In fact, we *are* running the latest version = v2.4.0.beta7, which is visible from the '/admin' page on the Discourse page https://github.com/discourse/discourse/releases
    3. But apparently the updates that we're behind-on are 88 new commits to the 'discourse' repo https://github.com/discourse/discourse/compare/84107c61a7...22eb1828f6
    4. does that mean that if I update Discourse it would get all of these commits, rather than them being put into a properly tested release? Jesus, this is sketchy. Note that the Discourse team doesn't actually roll fixes that break the previous stable release, so it's recommended that you stick with the beta version. God this sucks https://meta.discourse.org/t/please-dont-pressure-self-installers-to-be-on-beta-branch/32237/4
    5. And 1 commit to the 'docker_manager' plugin. https://github.com/discourse/docker_manager/compare/e4c82d3...bc4318f
      1. this one makes sense, but wouldn't the 'discourse' repo just be part of the main release version?
  10. first up: let me reverse my changes that gave the docker containers internet access (which I did for my poc setting up ModSecurity support in nginx from within the container so I could, for example, download ModSecurity from within the container).
[root@osestaging1 discourse]# grep -ir ExecStart /usr/lib/systemd/system/docker.service
#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
[root@osestaging1 discourse]# systemctl restart docker.service
Warning: docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
^C
[root@osestaging1 discourse]# ^C
[root@osestaging1 discourse]# systemctl daemon-reload
[root@osestaging1 discourse]# systemctl restart docker.service
[root@osestaging1 discourse]# iptables-save
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:38:32 2019
*mangle
:PREROUTING ACCEPT [1469:151806]
:INPUT ACCEPT [1469:151806]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1126:194886]
:POSTROUTING ACCEPT [1120:194526]
COMMIT
# Completed on Mon Nov 25 11:38:32 2019
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:38:32 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [4:304]
:POSTROUTING ACCEPT [4:304]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Nov 25 11:38:32 2019
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:38:32 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [4:304]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4444 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables IN denied: " --log-level 7
-A INPUT -j DROP
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -s 5.9.144.234/32 -j DROP
-A FORWARD -s 173.234.159.250/32 -j DROP
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
-A OUTPUT -d 213.133.98.98/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.99.99/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.100.100/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables OUT denied: " --log-level 7
-A OUTPUT -p tcp -m owner --uid-owner 48 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 27 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 995 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 994 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 993 -j DROP
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Nov 25 11:38:32 2019
[root@osestaging1 discourse]#
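  1. note that the transcript above edits /usr/lib/systemd/system/docker.service in place, which a package update may overwrite. A minimal sketch of the same ExecStart change as a systemd drop-in instead; this is not what was done here, `write_docker_dropin` is a made-up helper name, and the directory /etc/systemd/system/docker.service.d is the assumed target. Usage would be `write_docker_dropin /etc/systemd/system/docker.service.d && systemctl daemon-reload && systemctl restart docker`

```shell
# write_docker_dropin DIR: write an override.conf that re-declares ExecStart
# (hypothetical helper; DIR would be /etc/systemd/system/docker.service.d)
write_docker_dropin() {
  mkdir -p "$1" || return 1
  cat > "$1/override.conf" <<'EOF'
[Service]
# an empty ExecStart= clears the packaged value before the new one is set
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
EOF
}
```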
  1. hmm, that didn't work?
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# ping 1.1.1.1
bash: ping: command not found
root@osestaging1-discourse-ose:/var/www/discourse# curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. stopping & starting the app didn't work either :\
[root@osestaging1 discourse]# ./launcher stop discourse_ose
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
[root@osestaging1 discourse]# ./launcher start discourse_ose

starting up existing container
+ /bin/docker start discourse_ose
discourse_ose
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. destroy didn't do it either
[root@osestaging1 discourse]# ./launcher destroy discourse_ose
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
+ /bin/docker rm discourse_ose
discourse_ose
[root@osestaging1 discourse]# ./launcher start discourse_ose

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
d201a1a4c67111e84d095e2f712bc56fdaaab35375cfa09cf0865156c3b9b8f0
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. I got it working; I think the key is to stop the docker containers first, then fix iptables, then start docker
[root@osestaging1 discourse]# systemctl start docker
[root@osestaging1 discourse]# iptables-save
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:47:14 2019
*mangle
:PREROUTING ACCEPT [835:70141]
:INPUT ACCEPT [831:69901]
:FORWARD ACCEPT [4:240]
:OUTPUT ACCEPT [487:67065]
:POSTROUTING ACCEPT [490:67245]
COMMIT
# Completed on Mon Nov 25 11:47:14 2019
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:47:14 2019
*nat
:PREROUTING ACCEPT [1:60]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [10:728]
:POSTROUTING ACCEPT [10:728]
COMMIT
# Completed on Mon Nov 25 11:47:14 2019
# Generated by iptables-save v1.4.21 on Mon Nov 25 11:47:14 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2:152]
:DOCKER-USER - [0:0]
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4444 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables IN denied: " --log-level 7
-A INPUT -j DROP
-A FORWARD -j DOCKER-USER
-A FORWARD -s 5.9.144.234/32 -j DROP
-A FORWARD -s 173.234.159.250/32 -j DROP
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
-A OUTPUT -d 213.133.98.98/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.99.99/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.100.100/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables OUT denied: " --log-level 7
-A OUTPUT -p tcp -m owner --uid-owner 48 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 27 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 995 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 994 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 993 -j DROP
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Nov 25 11:47:14 2019
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# curl 1.1.1.1
  1. damn, now browsing to the same '/upgrade' page tells me that we *are* up-to-date. Why does it lie rather than just saying it can't access the updates page? https://discourse.opensourceecology.org/admin/upgrade
    1. for example, the plugin 'docker_manager' says we're on commit 'e4c82d3', but clearly there's a newer commit 'bc4318f' after that https://github.com/discourse/docker_manager/commits/master
  2. so the update process seems pretty straight-forward. First do a `git pull` in /var/discourse, then I added my step to rebuild the docker image locally, then rebuild the app with the launcher script. None of that should require the docker container itself to have internet access, but apparently there is something that did: pups?
[root@osestaging1 discourse]# ${vhostDir}/launcher rebuild discourse_ose
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
cd /pups && git pull && /pups/bin/pups --stdin
fatal: unable to access 'https://github.com/discourse/pups.git/': Could not resolve host: github.com
12fca265a98f2b6eae828341d2d25f104a82282e1e8150c367751ff2240270a2
 FAILED TO BOOTSTRAP  please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
[root@osestaging1 discourse]# 
  1. oddly, the only place where I saw this being done is in the Dockerfile, but I know from the past couple weeks fighting with Discourse that this does *not* get run unless a `docker build` is called, which I've done on the *host* above
[root@osestaging1 ~]# cd /var/discourse/
[root@osestaging1 discourse]# grep -ir 'pups.git' *
image/base/Dockerfile:    cd / && git clone https://github.com/discourse/pups.git
[root@osestaging1 discourse]# grep -irC4 'pups.git' *
image/base/Dockerfile-RUN gem update --system
image/base/Dockerfile-
image/base/Dockerfile-RUN gem install bundler --force &&\
image/base/Dockerfile-    rm -rf /usr/local/share/ri/2.6.5/system &&\
image/base/Dockerfile:    cd / && git clone https://github.com/discourse/pups.git
image/base/Dockerfile-
image/base/Dockerfile-ADD install-redis /tmp/install-redis
image/base/Dockerfile-RUN /tmp/install-redis
image/base/Dockerfile-
[root@osestaging1 discourse]# 
  1. honestly, this shouldn't even be necessary since I just rebuilt the image above. But, a catch (!): docker caches each build step and only re-runs a step if its line in the Dockerfile (or an earlier one) changed. That's an issue if one of the steps is a git clone or `git pull`, since the command itself won't change, but the actual source code at the endpoint on github.com did change!
  2. it looks like there's an option `--no-cache` for this https://stackoverflow.com/questions/35594987/how-to-force-docker-for-a-clean-build-of-an-image
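  3. Unfortunately, the build itself then failed: the build container couldn't resolve any hostnames, so `apt` couldn't install anything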
[root@osestaging1 base]# docker build --tag 'discourse_ose' /var/discourse/image/base/
...
Step 6/61 : RUN apt update && apt install -y gnupg sudo curl
 ---> Running in 56360e585353

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Err:1 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
Err:2 http://deb.debian.org/debian buster InRelease
  Temporary failure resolving 'deb.debian.org'
Err:3 http://deb.debian.org/debian buster-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Reading package lists...
Building dependency tree...
Reading state information...
All packages are up to date.
W: Failed to fetch http://deb.debian.org/debian/dists/buster/InRelease  Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/buster/updates/InRelease  Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/buster-updates/InRelease  Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
Package gnupg is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'gnupg' has no installation candidate
E: Unable to locate package sudo
E: Unable to locate package curl
The command '/bin/sh -c apt update && apt install -y gnupg sudo curl' returned a non-zero code: 100
[root@osestaging1 base]# 
  1. ugh, so we want the container to have internet access at build time, but not at run time.
  2. probably what we'll want to do long-term is do this process on staging only, then the procedure for updating prod will be to copy the docker image from staging to production and then just run the final `launcher rebuild discourse_ose` command on production; this will need to be tested!
  3. I checked the config, and found that we have 3 networks that docker created https://docs.docker.com/network/
[root@osestaging1 discourse]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
70656b6b4a55        bridge              bridge              local
330da33d9bfc        host                host                local
ae36b68b1cf6        none                null                local
[root@osestaging1 discourse]# 
  1. I got it to work by setting the docker network for the container at build time to "host"
[root@osestaging1 discourse]# docker build --no-cache --network=host --tag 'discourse_ose' /var/discourse/image/base/
...
Post-install message from discourse_image_optim:
Rails image assets optimization is extracted into image_optim_rails gem
You can safely remove `config.assets.image_optim = false` if you are not going to use that gem
Downloading MaxMindDb's GeoLite2-City...
Downloading MaxMindDb's GeoLite2-ASN...
Removing intermediate container 4ac7f6e70f9b
 ---> 940c0024cbd7
Successfully built 940c0024cbd7
Successfully tagged discourse_ose:latest
[root@osestaging1 base]#
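A toy model of why `--no-cache` matters here, written as a plain shell sketch. It is not how Docker implements its build cache; `build_step`, `cache_dir`, and the `no-cache` argument are made-up names for illustration, and the instruction string is a shortened form of the Dockerfile line quoted earlier. The point it shows: if a step is keyed only on its instruction text, a `git clone` or `git pull` step gets reused from cache even when the repository on github.com has moved on.

```shell
# Toy model only (NOT Docker's real cache): a build step is skipped whenever
# an identical instruction string has been seen before.
cache_dir=$(mktemp -d)

build_step() {
    # $1 = instruction text; $2 = "no-cache" to force a re-run (hypothetical switch)
    key=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    if [ "${2:-}" != "no-cache" ] && [ -e "$cache_dir/$key" ]; then
        echo "Using cache"
    else
        : > "$cache_dir/$key"   # remember that this instruction has run
        echo "Running"
    fi
}

build_step 'RUN cd / && git clone https://github.com/discourse/pups.git'            # Running
build_step 'RUN cd / && git clone https://github.com/discourse/pups.git'            # Using cache
build_step 'RUN cd / && git clone https://github.com/discourse/pups.git' no-cache   # Running
```

The second call is the problem case: nothing in the instruction text changed, so the stale result is reused no matter what happened upstream.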
  1. unfortunately the next rebuild step still failed!
[root@osestaging1 base]# popd
/var/discourse
[root@osestaging1 discourse]# ${vhostDir}/launcher rebuild discourse_ose
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
cd /pups && git pull && /pups/bin/pups --stdin
fatal: unable to access 'https://github.com/discourse/pups.git/': Could not resolve host: github.com
44d1a10e6e41e9ea314e911638ce8282c1ef43cf5d18079a4012cbce9e6445c9
 FAILED TO BOOTSTRAP  please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
[root@osestaging1 discourse]# 
  1. tbqh, this pups update step is entirely unnecessary since we just baked those changes into the damn image. how do we strip it out? I couldn't find anything on the container itself (like a cron job or init script) that is triggering this attempt to update pups
[root@osestaging1 discourse]# docker start discourse_ose
discourse_ose
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
d201a1a4c671        bf23c0a7cb70        "/sbin/boot"        2 hours ago         Up About a minute                       discourse_ose
[root@osestaging1 discourse]# docker exec -it d201a1a4c671 /bin/bash
root@osestaging1-discourse-ose:/# 
  1. ah, it looks like it's being done by the launcher script itself
[root@osestaging1 discourse]# grep pups launcher
  update_pups=`cat $config_file | $docker_path run $user_args --rm -i -a stdin -a stdout $image ruby -e \
	"require 'yaml'; puts YAML.load(STDIN.readlines.join)['update_pups']"`
  run_command="cd /pups &&"
  if [[ ! "false" =  $update_pups ]]; then
  run_command="$run_command /pups/bin/pups --stdin"
[root@osestaging1 discourse]# 
  1. commenting-out the git pull line worked
[root@osestaging1 discourse]# grep -C1 'git pull' launcher
  if [[ ! "false" =  $update_pups ]]; then
	#run_command="$run_command git pull &&"
	run_command="echo 'skipping pups git pull'"
  fi
--
		  echo "Updating Launcher"
		  git pull || (echo 'failed to update' && exit 1)

[root@osestaging1 discourse]# 
  1. that worked!
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
echo 'skipping pups git pull' /pups/bin/pups --stdin
skipping pups git pull /pups/bin/pups --stdin
sha256:3f97048def9ae8c80c689f8278db1f302cc6dedf611dbaa6a42d2ef600cf0407
23dd166f10cf6ed8f727ca6e8737bc7136bf33d0369c7acbd96d2ffff8fc555e
Removing old container
+ /bin/docker rm discourse_ose
discourse_ose

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
b7ac0e6856f621b632059dd8caec15582bf1a4a4092924aee2baa5c569d069cc

real    1m22.172s
user    0m2.603s
sys     0m2.188s
[root@osestaging1 discourse]# 
  1. but, damn, the site didn't come up. Further digging shows that nginx isn't started. Not only that, but there's now no runit service for nginx!
root@osestaging1-discourse-ose:/# ps -ef | grep -i nginx
root        73    35  0 14:02 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
root@osestaging1-discourse-ose:/# sv start nginx
fail: nginx: unable to change to service directory: file does not exist
root@osestaging1-discourse-ose:/# ls -lah /etc/runit/*
-rwxr-xr-x. 1 root root   81 Oct 28 12:07 /etc/runit/1
-rwxr-xr-x. 1 root root   51 Oct 28 12:07 /etc/runit/2
-rwxr-xr-x. 1 root root   53 Oct 28 12:07 /etc/runit/3

/etc/runit/1.d:
total 24K
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 .
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 ..
-rwxr-xr-x. 1 root root  321 Oct 28 12:07 00-fix-var-logs
-rwxr-xr-x. 1 root root   33 Oct 28 12:07 anacron
-rwxr-xr-x. 1 root root   75 Oct 28 12:07 cleanup-pids

/etc/runit/3.d:
total 16K
drwxr-xr-x. 2 root root 4.0K Nov 25 13:22 .
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 ..

/etc/runit/runsvdir:
total 24K
drwxr-xr-x. 1 root root 4.0K Nov 25 12:49 .
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 ..
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 default
root@osestaging1-discourse-ose:/# ls -lah /etc/runit/runsvdir/default/
total 32K
drwxr-xr-x. 1 root root 4.0K Nov 25 13:22 .
drwxr-xr-x. 1 root root 4.0K Nov 25 12:49 ..
drwxr-xr-x. 1 root root 4.0K Nov 25 13:56 cron
drwxr-xr-x. 1 root root 4.0K Nov 25 13:56 rsyslog
root@osestaging1-discourse-ose:/# 	
  1. so my guess is that, for whatever reason, the templates defined in the discourse_ose.yaml file didn't get called
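  2. I guess my solution of changing the run_command to an echo broke the bootstrap: judging by the 'skipping pups git pull /pups/bin/pups --stdin' output above, the echo swallowed the pups invocation as its arguments, so pups itself never ran and never applied the templates. I found that it actually continued with the bootstrap if I instead just set update_pups to false before the if statement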
  run_command="cd /pups &&"
  update_pups="false"
  if [[ ! "false" =  $update_pups ]]; then
	run_command="$run_command git pull &&"
  fi
  run_command="$run_command /pups/bin/pups --stdin"
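A minimal sketch of the string assembly above, to show why forcing update_pups to false behaves differently from the earlier echo replacement. The two function names are hypothetical and the test uses a plain `[ ]` rather than the launcher's own conditional; the strings themselves are copied from the launcher lines and rebuild output quoted in this log.

```shell
# Assemble run_command the way the launcher lines above do.
build_run_command() {
    # $1 = value of update_pups; "false" skips the git pull
    run_command="cd /pups &&"
    if [ "false" != "$1" ]; then
        run_command="$run_command git pull &&"
    fi
    run_command="$run_command /pups/bin/pups --stdin"
    printf '%s\n' "$run_command"
}

# The earlier attempt: the if-branch overwrote run_command with an echo.
build_run_command_echo_hack() {
    run_command="cd /pups &&"
    run_command="echo 'skipping pups git pull'"          # replaces, does not append
    run_command="$run_command /pups/bin/pups --stdin"
    printf '%s\n' "$run_command"
}

build_run_command true        # cd /pups && git pull && /pups/bin/pups --stdin
build_run_command false       # cd /pups && /pups/bin/pups --stdin
build_run_command_echo_hack   # echo 'skipping pups git pull' /pups/bin/pups --stdin
```

In the echo variant the pups invocation ends up as extra arguments to echo, which matches the 'skipping pups git pull /pups/bin/pups --stdin' line in the earlier rebuild output.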
  1. but now there's a step in the templates/web.templates.yml file that's failing; it's trying to do a `git pull`. again, we just did all that when we built the image, and we don't want the container to have internet access.
  2. I'm beginning to rethink our solution to preventing the container from having internet access. While it's not a VM, it does have its own set of packages. Therefore, if the debian base on which this Discourse docker container is built has an outdated package, then it wouldn't be auto-updated (the Discourse docs make a point to enable debian's unattended-upgrades package so that the container [or maybe they meant the host?] gets security updates). Disabling the entire container's internet access was just a hack to prevent docker from fucking up our iptables rules. Usually what I do is block the web server's user at the firewall level. I should probably just do this in the container as well *sigh*
  1. ...
  2. ok, now let's see if I can get docker behind varnish
  3. I skipped this while waiting for feedback from the Discourse community on the topic I posted about putting Docker behind a cache like varnish, but I've since discovered that the Discourse developer community is pretty toxic and just wants to tell me essentially "don't do that" and "pay me." So now we proceed on our own https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917
  4. The only person who appears to have done this is Lee at Ars Technica. They finally responded saying that they only cached static assets https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917/20
  5. I think I'll try to push it further: let's cache everything (for non-logged-in users, of course) for 1 minute (which is the default for the built-in Discourse ANON_CACHE_DURATION) and see what happens..
  6. well damn, first off, it appears that varnish doesn't support connecting to unix domain sockets until VCL 4.1, which was released in Varnish 6.0. We're only on Varnish 4.0.5-1, which is the latest in the CentOS repos https://varnish-cache.org/docs/6.0/whats-new/changes-6.0.html
[root@osestaging1 sites-enabled]# rpm -qa | grep -i varnish
varnish-libs-4.0.5-1.el7.x86_64
varnish-libs-devel-4.0.5-1.el7.x86_64
varnish-4.0.5-1.el7.x86_64
[root@osestaging1 sites-enabled]# yum install varnish
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: mirror.checkdomain.de
 * epel: mirror.23media.com
 * extras: mirror.checkdomain.de
 * updates: mirror.fra10.de.leaseweb.net
 * webtatic: uk.repo.webtatic.com
Package varnish-4.0.5-1.el7.x86_64 already installed and latest version
Nothing to do
[root@osestaging1 sites-enabled]# 
  1. so it looks like we'll have to have nginx listen on 127.0.0.1:8020. note that currently our prod server has apache listening on port 8000 for name-based-vhosts (defaulting to www.opensourceecology.org), and 127.0.0.1:8010 for certbot to validate on our private vhosts (munin, awstats, etc) which is exposed over port 443 while admins actually access it on port 4443. We'll just follow that standard and have discourse listen on 127.0.0.1:8020 and expose that to the docker host, hopefully only accessible to 127.0.0.1 on the docker host
  2. first I re-enabled docker to set iptables rules, reloaded the systemd config, and restarted the docker service
[root@osestaging1 templates]# systemctl stop docker
[root@osestaging1 templates]# grep ExecStart /usr/lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
[root@osestaging1 templates]# 
[root@osestaging1 templates]# systemctl daemon-reload
[root@osestaging1 templates]# systemctl start docker
[root@osestaging1 templates]# 
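The toggle above was done by hand-editing the unit file. A sketch of the same edit as a repeatable helper, assuming the ExecStart line looks like the one shown in the grep output; `set_docker_iptables` is a hypothetical name, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# set_docker_iptables FILE on|off   (hypothetical helper)
#   off: append --iptables=false to the active ExecStart line (once)
#   on : strip --iptables=false again so docker manages its own rules
set_docker_iptables() {
    file=$1
    case $2 in
        off)
            # skip lines that already carry the flag; commented-out lines never match ^ExecStart=
            sed -i.bak -e '/^ExecStart=.*--iptables=false/!s/^\(ExecStart=.*dockerd.*\)$/\1 --iptables=false/' "$file"
            ;;
        on)
            sed -i.bak -e 's/^\(ExecStart=.*\) --iptables=false/\1/' "$file"
            ;;
        *)
            echo "usage: set_docker_iptables FILE on|off" >&2
            return 1
            ;;
    esac
}

# Usage on the real host, following the order used in the transcript above:
#   systemctl stop docker
#   set_docker_iptables /usr/lib/systemd/system/docker.service on
#   systemctl daemon-reload
#   systemctl start docker
```

Each run leaves the previous version of the file next to it with a `.bak` suffix.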
  1. I did this manually first on the container; I'll roll this into a template file soon. note that the base debian image for the docker container is so basic that it didn't come with ufw; no need to remove it before installing iptables
root@osestaging1-discourse-ose:/# apt-get install iptables
  1. oddly, I couldn't list the iptables rules as root. It said I wasn't root. hmm
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# sudo iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# 
  1. ok, so it looks like I need to add the NET_ADMIN capability to the docker container https://stackoverflow.com/questions/41706983/installing-iptables-in-docker-container-based-on-alpinelinux
  2. that link doesn't make clear exactly how the hell to set the container to use that capability (what command do you add --cap-add=NET_ADMIN *to*?)
  3. this solution says you can edit the json file directly; fuck that https://stackoverflow.com/questions/38758627/how-can-we-add-capabilities-to-a-running-docker-container
  4. it looks like I can edit the image directly with `docker run --cap-add=NET_ADMIN discourse_ose`
[root@osestaging1 discourse]# docker run --cap-add=NET_ADMIN discourse_ose
  1. honestly, I don't think that edits containers launched from the image; I think the flag only applies to the one new container that `docker run` creates
[root@osestaging1 discourse]# ./launcher stop discourse_ose
+ /bin/docker stop -t 10 discourse_ose
discourse_ose
[root@osestaging1 discourse]# docker run --cap-add=NET_ADMIN discourse_ose
[root@osestaging1 discourse]# ./launcher start discourse_ose

starting up existing container
+ /bin/docker start discourse_ose
discourse_ose
[root@osestaging1 discourse]# ./l
launcher                  launcher.20191118_122249  launcher.20191118.orig    launcher.new              libbrotli/                
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# 
  1. indeed, the CapAdd in the json is still null
[root@osestaging1 discourse]# docker inspect discourse_ose | grep CapAdd
			"CapAdd": null,
[root@osestaging1 discourse]# 
  1. yeah, this did work
[root@osestaging1 discourse]# docker run --cap-add=NET_ADMIN discourse_ose /usr/bin/apt-get install -y iptables && iptables -L
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libip6tc0 libiptc0 libmnl0 libnetfilter-conntrack3 libnfnetlink0
  libnftables0 libnftnl11 libxtables12 nftables
Suggested packages:
  kmod
The following NEW packages will be installed:
  iptables libip6tc0 libiptc0 libmnl0 libnetfilter-conntrack3 libnfnetlink0
  libnftables0 libnftnl11 libxtables12 nftables
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  static.234.144.9.5.clients.your-server.de  anywhere
DROP       all  --  173.234.159.250      anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:http
ACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:https
ACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:pharos
ACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:krb524
ACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:32415
LOG        all  --  anywhere             anywhere             limit: avg 5/min burst 5 LOG level debug prefix "iptables IN denied: "
DROP       all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DROP       all  --  static.234.144.9.5.clients.your-server.de  anywhere
DROP       all  --  173.234.159.250      anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     all  --  localhost.localdomain  localhost.localdomain
ACCEPT     udp  --  anywhere             ns1-coloc.hetzner.de  udp dpt:domain
ACCEPT     udp  --  anywhere             ns2-coloc.hetzner.net  udp dpt:domain
ACCEPT     udp  --  anywhere             ns3-coloc.hetzner.com  udp dpt:domain
LOG        all  --  anywhere             anywhere             limit: avg 5/min burst 5 LOG level debug prefix "iptables OUT denied: "
DROP       tcp  --  anywhere             anywhere             owner UID match apache
DROP       tcp  --  anywhere             anywhere             owner UID match mysql
DROP       tcp  --  anywhere             anywhere             owner UID match varnish
DROP       tcp  --  anywhere             anywhere             owner UID match hitch
DROP       tcp  --  anywhere             anywhere             owner UID match nginx

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere
[root@osestaging1 discourse]#

  1. actually, no, that's running the iptables command on my docker host; it's not taking the "&&" inside the `docker run` command
  2. I had issues stringing commands together (putting quotes around didn't work) with the docker run command, and found it much easier to give myself a shell. This worked!
[root@osestaging1 discourse]# docker run --cap-add=NET_ADMIN -it discourse_ose /bin/bash
root@ef4f90be07e6:/# apt-get install -y iptables
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libip6tc0 libiptc0 libmnl0 libnetfilter-conntrack3 libnfnetlink0 libnftables0 libnftnl11 libxtables12 nftables
Suggested packages:
  kmod
The following NEW packages will be installed:
  iptables libip6tc0 libiptc0 libmnl0 libnetfilter-conntrack3 libnfnetlink0 libnftables0 libnftnl11 libxtables12 nftables
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 982 kB of archives.
...
root@ef4f90be07e6:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@ef4f90be07e6:/# 
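A small sketch of the `&&` behaviour noted above, runnable without docker. `show_argv` is a made-up stand-in for `docker run IMAGE ...` that just prints the arguments it receives; the third form assumes the image has /bin/bash, which the interactive shell above suggests it does.

```shell
# Stand-in for `docker run IMAGE ...`: print each received argument in brackets.
show_argv() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    printf '\n'
}

# 1. Unquoted: the host shell splits at &&, so the second command runs on the host.
show_argv apt-get install -y iptables && echo "second command ran on the host"

# 2. Whole string quoted: it arrives as ONE argument, i.e. one long program name.
show_argv "apt-get install -y iptables && iptables -L"

# 3. Handing the string to a shell keeps the && inside the container.
show_argv /bin/bash -c 'apt-get install -y iptables && iptables -L'
```

Form 2 is likely why putting quotes around the commands didn't work: docker is asked to execute a single program whose name is the whole string. Form 3 would be the non-interactive equivalent of opening a shell and typing both commands.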
  1. I tried again without the capability added, and it failed as expected (note that iptables isn't installed on the next command; must be a new container)
[root@osestaging1 discourse]# docker run -it discourse_ose /bin/bash
root@4a3b6e123460:/# iptables -L
bash: iptables: command not found
root@4a3b6e123460:/# apt-get install -y iptables
...
root@4a3b6e123460:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@4a3b6e123460:/# 
  1. ok, but I can connect to the *existing* running container with `docker exec` (as opposed to `docker run`, which just starts a new container)
[root@osestaging1 discourse]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# 
  1. unfortunately there's no '--cap-add' flag for `docker exec`
[root@osestaging1 discourse]# docker exec -it --cap-add=NET_ADMIN discourse_ose /bin/bash
unknown flag: --cap-add
See 'docker exec --help'.
[root@osestaging1 discourse]# 
  1. and it's the same for the `docker start` command that ran it initially
[root@osestaging1 discourse]# docker start --cap-add=NET_ADMIN discourse_ose
unknown flag: --cap-add
See 'docker start --help'.
[root@osestaging1 discourse]# 
  1. fucking hell, I really think I just have to edit this damn json file directly per https://stackoverflow.com/questions/38758627/how-can-we-add-capabilities-to-a-running-docker-container
  2. first we get the id of the container
[root@osestaging1 discourse]# id=`docker inspect --format="{{.Id}}" discourse_ose`
[root@osestaging1 discourse]# echo $id
d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e
[root@osestaging1 discourse]# 
  1. and now we sed to change it
[root@osestaging1 discourse]# cd /var/lib/docker/containers/d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# cp hostconfig.json hostconfig.20191125.bak.orig
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# #sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json 
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# cat hostconfig.json 
{"Binds":["/var/discourse/shared/standalone:/shared","/var/discourse/shared/standalone/log/var-log:/var/log"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"always","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":536870912,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# sed -i 's/"CapAdd":null/"CapAdd":"NET_ADMIN"/' /var/lib/docker/containers/$id/hostconfig.json 
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# cat hostconfig.json
{"Binds":["/var/discourse/shared/standalone:/shared","/var/discourse/shared/standalone/log/var-log:/var/log"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"always","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":"NET_ADMIN","CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":536870912,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# 
  1. now we connect to it, but there's still no NET_ADMIN capability :(
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker start discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/#

  1. and, ffs, it got rid of my changes to hostconfig.json
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# 
  1. I had not put the capability in an array; I tried again with an array, but it still failed
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker stop discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# sed -i 's/"CapAdd":null/"CapAdd":["NET_ADMIN"]/' /var/lib/docker/containers/$id/hostconfig.json 
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# cat hostconfig.json
{"Binds":["/var/discourse/shared/standalone:/shared","/var/discourse/shared/standalone/log/var-log:/var/log"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{}},"NetworkMode":"default","PortBindings":{},"RestartPolicy":{"Name":"always","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":["NET_ADMIN"],"CapDrop":null,"Capabilities":null,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":null,"UTSMode":"","UsernsMode":"","ShmSize":536870912,"Runtime":"runc","ConsoleSize":[0,0],"Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":[],"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":["/proc/asound","/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"ReadonlyPaths":["/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker start discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# 
  1. ah, got it! I had to restart the docker service
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker stop discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# sed -i 's/"CapAdd":null/"CapAdd":["NET_ADMIN"]/' /var/lib/docker/containers/$id/hostconfig.json 
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# service docker restart
Redirecting to /bin/systemctl restart docker.service
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker start discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
root@osestaging1-discourse-ose:/# 
  1. note that a hard restart *is* necessary; I confirmed that a reload won't work
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker stop discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# sed -i 's/"CapAdd":null/"CapAdd":["NET_ADMIN"]/' /var/lib/docker/containers/$id/hostconfig.json 
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# service docker reload
Redirecting to /bin/systemctl reload docker.service
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker start discourse_ose
discourse_ose
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# docker exec -it discourse_ose /bin/bash
root@osestaging1-discourse-ose:/# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
iptables: Permission denied (you must be root).
root@osestaging1-discourse-ose:/# exit
[root@osestaging1 d5b6a3af666e868e3d574532fc81792c41ee7f13fbd547a82f367845009c490e]# 
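The sequence that worked above (stop the container, edit hostconfig.json, restart the docker daemon, start the container) could be wrapped up roughly as follows. This is only a sketch: the function names are hypothetical, the container ID is looked up with `docker inspect` instead of the `$id` variable used above, and it assumes the file still contains `"CapAdd":null` as it did here.

```shell
#!/bin/bash
# Sketch only: these function names are hypothetical, not part of docker.

# sed expression taken from the log: turn a null CapAdd into a one-item array.
CAPADD_SED='s/"CapAdd":null/"CapAdd":["NET_ADMIN"]/'

# Filter form, useful for previewing the change on a copy of hostconfig.json.
add_net_admin() {
  sed "$CAPADD_SED"
}

# Apply the change to a named container. The daemon is restarted (not
# reloaded) because a reload did not pick up the edited file in the test above.
grant_net_admin() {
  local name="$1" id conf
  id="$(docker inspect --format '{{.Id}}' "$name")" || return 1
  conf="/var/lib/docker/containers/${id}/hostconfig.json"

  docker stop "$name" || return 1
  sed -i "$CAPADD_SED" "$conf" || return 1
  systemctl restart docker.service || return 1
  docker start "$name"
}
```

For example, `add_net_admin < hostconfig.json | grep -o '"CapAdd":[^,]*'` previews the result without touching the file, and `grant_net_admin discourse_ose` would perform the whole sequence.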
  1. cool, now I can actually iterate! Let's try a rule that permits root and the apt user and blocks all other users from being able to access the internet
root@osestaging1-discourse-ose:/# curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -p tcp -m owner --uid-owner 0 -j ACCEPT
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -j DROP
root@osestaging1-discourse-ose:/# curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
root@osestaging1-discourse-ose:/# 
  1. it works!
root@osestaging1-discourse-ose:/# su - discourse
discourse@osestaging1-discourse-ose:~$ curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare-lb</center>
</body>
</html>
discourse@osestaging1-discourse-ose:~$ logout
root@osestaging1-discourse-ose:/# grep -ir apt /etc/passwd
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -p tcp -m owner --uid-owner 0 -j ACCEPT
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -p tcp -m owner --uid-owner 100 -j ACCEPT
root@osestaging1-discourse-ose:/# iptables -A OUTPUT -j DROP
root@osestaging1-discourse-ose:/# su - discourse
discourse@osestaging1-discourse-ose:~$ curl 1.1.1.1
  1. I then tried to persist this by installing iptables-persistent, but it failed. I whitelisted apt, why did it fail?
root@osestaging1-discourse-ose:/# apt-get install -y iptables-persistent
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  netfilter-persistent
The following NEW packages will be installed:
  iptables-persistent netfilter-persistent
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 21.8 kB of archives.
After this operation, 80.9 kB of additional disk space will be used.
Err:1 http://deb.debian.org/debian buster/main amd64 netfilter-persistent all 1.0.11
  Temporary failure resolving 'deb.debian.org'
Err:2 http://deb.debian.org/debian buster/main amd64 iptables-persistent all 1.0.11
  Temporary failure resolving 'deb.debian.org'
E: Failed to fetch http://deb.debian.org/debian/pool/main/i/iptables-persistent/netfilter-persistent_1.0.11_all.deb  Temporary failure resolving 'deb.debian.org'
E: Failed to fetch http://deb.debian.org/debian/pool/main/i/iptables-persistent/iptables-persistent_1.0.11_all.deb  Temporary failure resolving 'deb.debian.org'
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
root@osestaging1-discourse-ose:/# 
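A possible explanation, offered as an assumption and not something verified in this log: the ACCEPT rules above only match `-p tcp`, while the errors are name-resolution failures, and DNS lookups normally go out over UDP port 53, so they would fall through to the final DROP. The sketch below prints (it does not apply) the same owner-based rule set without the `-p tcp` restriction. The function name is hypothetical.

```shell
# Print an OUTPUT rule set that lets the given UIDs reach the network over
# any protocol and drops everything else. Leaving out `-p tcp` means UDP
# traffic (for example DNS on port 53) from those UIDs is accepted too.
# Hypothetical helper; review the output before feeding it to a shell.
print_owner_egress_rules() {
  local uid
  for uid in "$@"; do
    echo "iptables -A OUTPUT -m owner --uid-owner ${uid} -j ACCEPT"
  done
  echo "iptables -A OUTPUT -j DROP"
}
```

Running `print_owner_egress_rules 0 100` would print one ACCEPT rule for root, one for _apt (uid 100 in /etc/passwd above), and the trailing DROP.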

Mon Nov 18, 2019

  1. I'm still trying to trace the ./launcher process to the point where it downloads the discourse_docker repo, so that I can modify the image/base/install-nginx script during a `./launcher rebuild discourse_ose` https://github.com/discourse/discourse_docker
  2. ./launcher has a function run_bootstrap() that runs `$docker_path pull $image` https://github.com/discourse/discourse_docker/blob/87fd7172af8f2848d5118fdebada646c5996821b/launcher#L660
+ /bin/docker pull discourse/base:2.0.20191013-2320
2.0.20191013-2320: Pulling from discourse/base
Digest: sha256:77e010342aa5111c8c3b81d80de7d4bdb229793d595bbe373992cdb8f86ef41f
Status: Image is up to date for discourse/base:2.0.20191013-2320
docker.io/discourse/base:2.0.20191013-2320
    1. so this appears to be the part responsible for pulling the discourse/base image from docker hub https://hub.docker.com/r/discourse/base/
  1. another important command appears to be auto_build.rb, which actually executes the `docker build` command to create docker images. Unfortunately, I don't think this script is actually called by any other scripts, and it's actually just a helper script for human developers when doing some testing https://github.com/discourse/discourse_docker/blob/master/image/auto_build.rb
  2. I'm beginning to think that `docker pull` doesn't actually use the Dockerfile to build the image at download. Or, perhaps more importantly, the Discourse scripts download a fresh copy of the discourse_docker repo from github when doing a rebuild such that I can't make modifications to the relevant Dockerfile (ie: updating install-nginx) to affect the image anyway.
    1. however, there *is* an argument passed to `launcher` named `--run-image`. So, perhaps, I can write a wrapper for the Discourse `launcher` script that first builds my own docker image, and then just tell `./launcher` to use my local image rather than the image from docker hub https://github.com/discourse/discourse_docker/blob/87fd7172af8f2848d5118fdebada646c5996821b/launcher#L22
  3. I did a test trying to build my own local docker image. To make *sure* that it used my local files, I updated the Dockerfile's FROM to use alpine linux. A good sign: it showed alpine and it failed when trying to apt
[root@osestaging1 base]# docker build --tag 'alpine_discourse' .
Sending build context to Docker daemon   38.4kB
Step 1/61 : FROM alpine
latest: Pulling from library/alpine
89d9c30c1d48: Pull complete
Digest: sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a
Status: Downloaded newer image for alpine:latest
 ---> 965ea09ff2eb
Step 2/61 : ENV PG_MAJOR 10
 ---> Running in e53144a7c7ee
Removing intermediate container e53144a7c7ee
 ---> 895d2ca9cfe9
Step 3/61 : ENV RUBY_ALLOCATOR /usr/lib/libjemalloc.so.1
 ---> Running in a3bcc3695df3
Removing intermediate container a3bcc3695df3
 ---> 06e4d7476da0
Step 4/61 : ENV RAILS_ENV production
 ---> Running in 0ac24d9fb112
Removing intermediate container 0ac24d9fb112
 ---> 4159869cf970
Step 5/61 : RUN echo 2.0.`date +%Y%m%d` > /VERSION
 ---> Running in 2e2c3136622a
Removing intermediate container 2e2c3136622a
 ---> 9b33df0cef8e
Step 6/61 : RUN apt update && apt install -y gnupg sudo curl
 ---> Running in 53bbee438a5e
/bin/sh: apt: not found
The command '/bin/sh -c apt update && apt install -y gnupg sudo curl' returned a non-zero code: 127
[root@osestaging1 base]# 
  1. now let's check the list of docker images. it looks like it pulled-in the alpine image, but my new build above isn't present--probably because it failed to build
[root@osestaging1 base]# docker image ls
REPOSITORY                      TAG                 IMAGE ID            CREATED              SIZE
<none>                          <none>              9b33df0cef8e        About a minute ago   5.55MB
local_discourse/discourse_ose   latest              de5dc3e2af42        15 minutes ago       2.76GB
alpine                          latest              965ea09ff2eb        3 weeks ago          5.55MB
discourse/base                  2.0.20191013-2320   09725007dc9e        5 weeks ago          2.3GB
hello-world                     latest              fce289e99eb9        10 months ago        1.84kB
[root@osestaging1 base]# 
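The `<none>` row in the listing above has the same ID (9b33df0cef8e) as the result of step 5 in the failed build, so it appears to be the last layer that built before step 6 failed. A small sketch for picking such untagged rows out of `docker image ls` output follows; the function name is hypothetical.

```shell
# Print the IMAGE ID of every untagged row in `docker image ls` output read
# from stdin (rows whose REPOSITORY and TAG columns are both "<none>").
# The header row is skipped. Hypothetical helper.
untagged_image_ids() {
  awk 'NR > 1 && $1 == "<none>" && $2 == "<none>" { print $3 }'
}
```

Usage would be `docker image ls | untagged_image_ids`. Docker's own `docker image ls --filter dangling=true -q` should give a similar list without any parsing.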

  1. I changed the FROM line back to debian, and I tried the build again. it worked!
[root@osestaging1 base]# docker build --tag 'discourse_maltfield' .
...
Please check your Rails app for 'config.i18n.fallbacks = true'.
If you're using I18n (>= 1.1.0) and Rails (< 5.2.2), this should be
'config.i18n.fallbacks = [I18n.default_locale]'.
If not, fallbacks will be broken in your app by I18n 1.1.x.

For more info see:
https://github.com/svenfuchs/i18n/releases/tag/v1.1.0

Post-install message from discourse_image_optim:
Rails image assets optimization is extracted into image_optim_rails gem
You can safely remove `config.assets.image_optim = false` if you are not going to use that gem
Downloading MaxMindDb's GeoLite2-City...
Downloading MaxMindDb's GeoLite2-ASN...
Removing intermediate container 4ea4f5d6aa85
 ---> 1a24fc6acacd
Successfully built 1a24fc6acacd
Successfully tagged discourse_maltfield:latest
[root@osestaging1 base]# 
  1. on a related note wrt understanding how docker works (and trying to differentiate the components of Discourse that are specific to Discourse vs docker itself), most of the documentation for docker is helpful to understand how to use the commands, but it's not especially useful for understanding the bigger picture. For example, when does the Dockerfile get executed? Is it during the pull? Or the push? Or the build? https://docs.docker.com/get-started/
    1. this documentation was very helpful in understanding the bigger picture of how docker images are built, pushed, pulled, and stored on the machine http://blog.thoward37.me/articles/where-are-docker-images-stored/
    2. this also provides a ton of info on the Dockerfile https://linuxhint.com/understand_dockerfile
  2. after the build above, there's now a new image
[root@osestaging1 base]# docker image ls
REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
discourse_maltfield             latest              1a24fc6acacd        About an hour ago   2.35GB
<none>                          <none>              9b33df0cef8e        2 hours ago         5.55MB
local_discourse/discourse_ose   latest              de5dc3e2af42        2 hours ago         2.76GB
alpine                          latest              965ea09ff2eb        3 weeks ago         5.55MB
debian                          buster-slim         105ec214185d        4 weeks ago         69.2MB
discourse/base                  2.0.20191013-2320   09725007dc9e        5 weeks ago         2.3GB
hello-world                     latest              fce289e99eb9        10 months ago       1.84kB
[root@osestaging1 base]# 
  1. I ran the launcher script, passing it the new image that I built as the '--run-image'. It finished without error, and the new discourse_ose container is shown using the new 'discourse_maltfield' image
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose --run-image discourse_maltfield &> output.log
real    10m49.785s
user    0m2.751s
sys     0m2.394s
[root@osestaging1 discourse]# docker container ls
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS               NAMES
0438d3a342f4        discourse_maltfield   "/sbin/boot"        20 seconds ago      Up 19 seconds                           discourse_ose
[root@osestaging1 discourse]# 
  1. And I confirmed that the new container has nginx with the ModSecurity module working
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/# ls -lah /etc/modsecurity/
total 84K
drwxr-xr-x. 3 root root 4.0K Nov 18 08:28 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:44 ..
drwxr-xr-x. 2 root root 4.0K Nov 18 08:28 crs
-rw-r--r--. 1 root root 8.3K Dec 10  2018 modsecurity.conf-recommended
-rw-r--r--. 1 root root  52K Dec 10  2018 unicode.mapping
root@osestaging1-discourse-ose:/# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6) 
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-module=/tmp/ModSecurity-nginx
root@osestaging1-discourse-ose:/# ls -lah /tmp
total 16K
drwxrwxrwt. 1 root root 4.0K Nov 18 10:45 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:44 ..
root@osestaging1-discourse-ose:/# 
  1. unfortunately the new container doesn't appear to have the nginx configs in-place, which should have been added by the templates
root@osestaging1-discourse-ose:/var/www/discourse# cd /etc/nginx
root@osestaging1-discourse-ose:/etc/nginx# ls -lah
total 108K
drwxr-xr-x. 1 root root 4.0K Nov 18 10:50 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:44 ..
drwxr-xr-x. 2 root root 4.0K Aug 13 18:10 conf.d
-rw-r--r--. 1 root root 1.1K Aug 13 18:10 fastcgi.conf
-rw-r--r--. 1 root root 1.1K Nov 18 08:30 fastcgi.conf.default
-rw-r--r--. 1 root root 1007 Aug 13 18:10 fastcgi_params
-rw-r--r--. 1 root root 1007 Nov 18 08:30 fastcgi_params.default
-rw-r--r--. 1 root root 2.8K Nov 18 08:30 koi-utf
-rw-r--r--. 1 root root 2.2K Nov 18 08:30 koi-win
-rw-r--r--. 1 root root 3.9K Aug 13 18:10 mime.types
-rw-r--r--. 1 root root 5.2K Nov 18 08:30 mime.types.default
drwxr-xr-x. 2 root root 4.0K Aug 13 18:10 modules-available
drwxr-xr-x. 2 root root 4.0K Nov 18 08:30 modules-enabled
-rw-r--r--. 1 root root 1.5K Aug 13 18:10 nginx.conf
-rw-r--r--. 1 root root 2.6K Nov 18 08:30 nginx.conf.default
-rw-r--r--. 1 root root  180 Aug 13 18:10 proxy_params
-rw-r--r--. 1 root root  636 Aug 13 18:10 scgi_params
-rw-r--r--. 1 root root  636 Nov 18 08:30 scgi_params.default
drwxr-xr-x. 2 root root 4.0K Nov 18 08:28 sites-available
drwxr-xr-x. 2 root root 4.0K Nov 18 08:28 sites-enabled
drwxr-xr-x. 2 root root 4.0K Nov 18 08:28 snippets
-rw-r--r--. 1 root root  664 Aug 13 18:10 uwsgi_params
-rw-r--r--. 1 root root  664 Nov 18 08:30 uwsgi_params.default
-rw-r--r--. 1 root root 3.6K Nov 18 08:30 win-utf
root@osestaging1-discourse-ose:/etc/nginx# ls -lah conf.d
total 12K
drwxr-xr-x. 2 root root 4.0K Aug 13 18:10 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:50 ..
root@osestaging1-discourse-ose:/etc/nginx# ls -lah modules-available/
total 12K
drwxr-xr-x. 2 root root 4.0K Aug 13 18:10 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:50 ..
root@osestaging1-discourse-ose:/etc/nginx# ls -lah modules-enabled/
total 12K
drwxr-xr-x. 2 root root 4.0K Nov 18 08:30 .
drwxr-xr-x. 1 root root 4.0K Nov 18 10:50 ..
root@osestaging1-discourse-ose:/etc/nginx# find . | grep -i discourse
root@osestaging1-discourse-ose:/etc/nginx# grep -irl 'discourse' *
root@osestaging1-discourse-ose:/etc/nginx# 
  1. the templates are still in-place in my container yaml
[root@osestaging1 discourse]# head containers/discourse_ose.yml -n 20
## this is the all-in-one, standalone Discourse Docker container template
##
## After making changes to this file, you MUST rebuild
## /var/discourse/launcher rebuild app
##
## BE *VERY* CAREFUL WHEN EDITING!
## YAML FILES ARE SUPER SUPER SENSITIVE TO MISTAKES IN WHITESPACE OR ALIGNMENT!
## visit http://www.yamllint.com/ to validate this file as needed

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.socketed.template.yml"
#  - "templates/web.modsecurity.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  #- "templates/web.ssl.template.yml"
  #- "templates/web.letsencrypt.ssl.template.yml"

[root@osestaging1 discourse]# 
  1. for example, web.template should create the /etc/nginx/conf.d/discourse.conf file
[root@osestaging1 discourse]# grep -ir '/etc/nginx/conf.d/discourse.conf' templates/web.template.yml 
		- "cp $home/config/nginx.sample.conf /etc/nginx/conf.d/discourse.conf"
	  filename: "/etc/nginx/conf.d/discourse.conf"
	  filename: "/etc/nginx/conf.d/discourse.conf"
	  filename: "/etc/nginx/conf.d/discourse.conf"
[root@osestaging1 discourse]#
  1. I checked the verbose output of the `./launcher rebuild...` execution above, and I *still* saw some `docker run` commands being executed against the 'discourse/base...' image rather than the 'discourse_maltfield' image that I explicitly gave it
[root@osestaging1 discourse]# grep -i 'docker run' output.log 
++ /bin/docker run -i --rm -a stdout -a stderr discourse/base:2.0.20191013-2320 echo working
++ /bin/docker run --rm -i -a stdout -a stdin discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
+++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\templates'\]'
++++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\templates'\]'
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdout -a stdin discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\docker_args'\]'
+++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\templates'\]'
++++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\templates'\]'
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdin -a stdout discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\
++ /bin/docker run --rm -i -a stdout -a stdin discourse/base:2.0.20191013-2320 ruby -e 'require '\yaml'\; puts YAML.load(STDIN.readlines.join)['\docker_args'\]'
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d discourse_maltfield /sbin/boot
[root@osestaging1 discourse]# 
  1. so it looks like all the yaml `docker run` executions use the discourse base image, and only the final run uses my discourse_maltfield image
  2. it looks like the `launcher` script only takes '--run-image' and stores it to run_image and user_run_image
user_run_image=""
...
  --run-image)
	user_run_image="$2"
	shift
	;;
  esac
...
   set_run_image
...
	 $docker_path run --shm-size=512m $links $attach_on_run $restart_policy "${env[@]}" "${labels[@]}" -h "$hostname" \
		-e DOCKER_HOST_IP="$docker_ip" --name $config -t "${ports[@]}" $volumes $mac_address $user_args \
		$run_image $boot_command
...
  set_run_image

  unset ERR
  (exec $docker_path run --rm --shm-size=512m $user_args $links "${env[@]}" -e DOCKER_HOST_IP="$docker_ip" -i -a stdin -a stdout -a stderr $volumes $run_image \
	/bin/bash -c "$run_command") || ERR=$?
  1. also note that it looks like we can define the run_image in the container's yaml config file, rather than having to set it as a command line argument passed to `launcher`
set_run_image() {
  run_image=`cat $config_file | $docker_path run $user_args --rm -i -a stdin -a stdout $image ruby -e \
	"require 'yaml'; puts YAML.load(STDIN.readlines.join)['run_image']"`

  if [ -n "$user_run_image" ]; then
	run_image=$user_run_image
  elif [ -z "$run_image" ]; then
	run_image="$local_discourse/$config"
  fi
}
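To make the precedence in set_run_image() easier to see, here is a restatement with the YAML lookup replaced by a plain argument. This is a sketch, not the launcher's code: the function name and argument order are hypothetical.

```shell
# Restates the precedence in set_run_image() without calling docker.
#   $1 = value passed via --run-image (may be empty)
#   $2 = run_image read from the container YAML (may be empty)
#   $3 = local image prefix, e.g. local_discourse
#   $4 = container config name, e.g. discourse_ose
resolve_run_image() {
  local user_run_image="$1" yaml_run_image="$2" local_discourse="$3" config="$4"
  if [ -n "$user_run_image" ]; then
    # The command-line argument wins over everything else.
    echo "$user_run_image"
  elif [ -n "$yaml_run_image" ]; then
    # Otherwise a run_image key in the YAML is used as-is.
    echo "$yaml_run_image"
  else
    # With neither set, fall back to the locally built image name.
    echo "${local_discourse}/${config}"
  fi
}
```

With neither the argument nor the YAML key set, the result is `local_discourse/discourse_ose`, which matches the image name in the `docker image ls` output earlier in this entry.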
  1. but the `launcher` script uses a distinct variable `image`, which appears to be hard-coded
image="discourse/base:2.0.20191013-2320"
...
set_volumes() {
  volumes=`cat $config_file | $docker_path run $user_args --rm -i -a stdout -a stdin $image ruby -e \
		"require 'yaml'; puts YAML.load(STDIN.readlines.join)['volumes'].map{|v| '-v ' << v['volume']['host'] << ':' << v['volume']['guest'] << ' '}.join"`
}

set_links() {
	links=`cat $config_file | $docker_path run $user_args --rm -i -a stdout -a stdin $image ruby -e \
		"require 'yaml'; puts YAML.load(STDIN.readlines.join)['links'].map{|l| '--link ' << l['link']['name'] << ':' << l['link']['alias'] << ' '}.join"`
}
  1. I replaced the hard-coded image with my own and re-ran the `launcher rebuild...` command
[root@osestaging1 discourse]# grep "image=" launcher 
user_run_image=""
	user_run_image="$2"
#image="discourse/base:2.0.20191013-2320"
image="discourse_maltfield"
  run_image=`cat $config_file | $docker_path run $user_args --rm -i -a stdin -a stdout $image ruby -e \
	run_image=$user_run_image
	run_image="$local_discourse/$config"
  base_image=`cat $config_file | $docker_path run $user_args --rm -i -a stdin -a stdout $image ruby -e \
	image=$base_image
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose --run-image discourse_maltfield &> output.log

real    8m26.645s
user    0m2.800s
sys     0m2.559s
[root@osestaging1 discourse]# 
  1. shit, that didn't work; there's still no nginx configs on the build. I'm thinking that specifying the build image is breaking it and overriding the changes made by the template. what if I just override 'image' and don't specify the 'run-image'?
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose &> output.log

real    8m10.725s
user    0m2.691s
sys     0m2.406s
[root@osestaging1 discourse]# 
  1. that worked!
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# ls -lah /etc/nginx/conf.d
total 24K
drwxr-xr-x. 1 root root 4.0K Nov 18 11:41 .
drwxr-xr-x. 1 root root 4.0K Nov 18 08:30 ..
-rw-r--r--. 1 root root 8.4K Nov 18 11:41 discourse.conf
-rw-r--r--. 1 root root  661 Nov 18 11:41 modsecurity.include
root@osestaging1-discourse-ose:/var/www/discourse# 
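The edit to `launcher` that ended up working (swapping the hard-coded `image=` line and not passing `--run-image`) can be sketched as a sed filter. The function name is hypothetical, and the sketch assumes the assignment sits at the start of a line as in the grep output above.

```shell
# Rewrite the top-level `image="..."` assignment in a launcher script read
# from stdin. Only lines that begin with `image=` are changed, so
# `run_image=`, `user_run_image=` and indented assignments are left alone.
# Hypothetical helper; the image name must not contain `|` or `&`.
override_launcher_image() {
  local new_image="$1"
  sed "s|^image=.*|image=\"${new_image}\"|"
}
```

Something like `override_launcher_image discourse_maltfield < launcher > launcher.new` would produce the modified copy for review, followed by `./launcher rebuild discourse_ose` without `--run-image`. Since `launcher` is a tracked file in the discourse_docker checkout, a later update of that checkout may conflict with or revert the change.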
  1. I updated our documentation to use this solution https://wiki.opensourceecology.org/wiki/Discourse#Nginx_mod_security
  2. And I updated my topic on meta.discourse.org about using a WAF with Discourse for the benefit of the community https://meta.discourse.org/t/discourse-web-application-firewall-waf-mod-security/133612/7

Sun Nov 17, 2019

  1. my docker for-linux github issue requesting they upload their gpg key to a keyserver to assist their clients in validating the key's authenticity out-of-band from docker.com was closed as a duplicate of #602 https://github.com/docker/for-linux/issues/849
  2. but #602 is *not* a duplicate; it's a distinct request to add the fingerprint of the key to the get.docker.com install script--which I support, but it doesn't solve the out-of-band authenticity validation issue as both the install script *and* the key are downloaded from the docker.com domain. I requested that the above ticket be re-opened (it should only take a few minutes for them to action, anyway) https://github.com/docker/for-linux/issues/849
  3. ...
  4. I created a new topic specifically asking what WAF is recommended for Discourse https://meta.discourse.org/t/discourse-web-application-firewall-waf-mod-security/133612
  5. I pointed out that, even if the code is perfect (which is impossible anyway), a web application can be vulnerable to an exploit if one of its dependencies is vulnerable. I pointed to the example of CVE-2019-11043, which made servers with vulnerable php-fpm & nginx subject to critical remote code execution. But this is entirely mitigated by mod_security's CRS, which blocks the malicious requests because it blocks queries with \n or \r in them. https://www.nginx.com/blog/php-fpm-cve-2019-11043-vulnerability-nginx/
  6. I did a quick test of the site with mod_security enabled. first I do a regular request with nothing malicious
user@ose:/tmp$ curl -sik "https://discourse.opensourceecology.org/" | head -n1
HTTP/1.1 200 OK
  1. then I do an example of malicious query including a newline in the request
user@ose:/tmp$ curl -sik -X POST --data-binary $'line break test\n' "https://discourse.opensourceecology.org/" | head -n1
HTTP/1.1 403 Forbidden
  1. then I disabled modsecurity in nginx on the discourse docker container, and re-executed the same query. note that this time the query was not stopped by modsecurity, and it made it to the backend app. in this example, if we were running php-fpm and nginx we could be vulnerable. But with modsecurity enabled, we are already protected prior to and after the 0day
user@ose:/tmp$ curl -sik -X POST --data-binary $'line break test\n' "https://discourse.opensourceecology.org/" | head -n1
HTTP/1.1 404 Not Found
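The three requests above can be turned into a repeatable smoke test. This is a sketch built from the same curl invocation; the function names are hypothetical, and treating "403 means blocked" as the pass condition is an assumption based on the responses seen here.

```shell
# Extract the numeric status code from an HTTP status line such as
# "HTTP/1.1 403 Forbidden". Only the first line of stdin is examined.
http_status_code() {
  head -n 1 | awk '{ print $2 }'
}

# POST a body containing a newline and succeed only if the server answers
# with 403. Hypothetical helper; -k skips certificate checks as in the log.
waf_blocks_newline() {
  local url="$1" code
  code="$(curl -sik -X POST --data-binary $'line break test\n' "$url" | http_status_code)"
  [ "$code" = "403" ]
}
```

For example, `waf_blocks_newline "https://discourse.opensourceecology.org/" && echo blocked` would be expected to print `blocked` with modsecurity enabled and nothing with it disabled.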

Tue Nov 12, 2019

  1. I created an issue in the docker-ce github asking them to please upload their gpg key to a keyserver so that it can actually be validated in a more sane way

https://github.com/docker/for-linux/issues/849

    1. I also asked them to integrate it into the web of trust. currently it has no non-self signatures *facepalm*
    2. I also recommended that they create a keybase account and use their official twitter account to identify themselves on keybase, which indicates their gpg key fingerprint
  1. ...
  2. I asked for more info about the Discourse built-in "anonymous" cache. For example, how do I see how many queries are cache hits vs misses? And is there a system in-place for cache invalidation? What if I set the cache to 24 hours (looks like it defaults to 60 seconds) would that mean it will serve stale content for a max of 24 hours? None of this is documented. https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917/18
  3. ...
  4. regarding mod_security, there *are* risks in breaking future installs by me modifying the nginx install & config used by Discourse inside the Docker container. But the alternative of adding apache as a proxy before Discourse (nginx -> varnish -> apache -> nginx) not only is a ridiculous architecture that will make troubleshooting difficult for the future sysadmin, but it also looks like an apache proxy performs poorly for long polling, which is used by the Discourse message bus https://meta.discourse.org/t/how-to-run-discourse-in-apache-vhost-not-nginx/133112/13
    1. https://meta.discourse.org/t/howto-setup-discourse-with-lets-encrypt-and-apache-ssl/46139
    2. https://stackoverflow.com/questions/14157515/will-apache-2s-mod-proxy-wait-and-occupy-a-worker-when-long-polling
    3. https://github.com/SamSaffron/message_bus
  5. so I think the best option is still to update the nginx config inside the docker container to use mod_security
  6. this is probably the best guide for compiling nginx with mod_security; it's from nginx.com https://www.nginx.com/blog/compiling-and-installing-modsecurity-for-open-source-nginx/
  7. first, let's get the existing nginx config so we can validate after our changes that it includes mod_security. note that the configure line here shows 'ngx_brotli' which was added by the Discourse install-nginx script https://github.com/discourse/discourse_docker/blob/416467f6ead98f82342e8a926dc6e06f36dfbd56/image/base/install-nginx
root@osestaging1-discourse-ose:/var/www/discourse# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6) 
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
root@osestaging1-discourse-ose:/var/www/discourse# 
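Checking the configure line by eye is error-prone given its length, so a small filter can do the comparison. This is a sketch with a hypothetical function name; it assumes modules are added with `--add-module=<path>` as `ngx_brotli` is above.

```shell
# Succeeds if the text on stdin (the output of `nginx -V`) contains an
# --add-module entry whose path ends in the given directory name.
# Hypothetical helper.
nginx_has_module() {
  local module="$1"
  grep -Eq -- "--add-module=[^ ]*/${module}( |\$)"
}
```

Because `nginx -V` writes to stderr, usage would look like `nginx -V 2>&1 | nginx_has_module ModSecurity-nginx && echo present`.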
  1. so the install guide above first downloads, compiles, & installs the SpiderLabs ModSecurity package *then* it downloads the 'ModSecurity-nginx.git' nginx module, which is added to the './configure' line as a dynamic module
  2. but the first one we may be able to get from the apt repos (the container is Debian-based)
root@osestaging1-discourse-ose:/var/www/discourse# apt-cache search security
...
libmodsecurity-dev - ModSecurity v3 library component (development files)
libmodsecurity3 - ModSecurity v3 library component
libapache2-mod-security2 - Tighten web applications security for Apache
modsecurity-crs - OWASP ModSecurity Core Rule Set
  1. before i update 'install-nginx', let's see if I can get it working manually. First I installed the 'modsecurity-crs' package above
root@osestaging1-discourse-ose:/var/www/discourse# apt-get install modsecurity-crs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2
Suggested packages:
  apache2-doc apache2-suexec-pristine | apache2-suexec-custom www-browser lua geoip-database-contrib ruby
The following NEW packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2 modsecurity-crs
0 upgraded, 12 newly installed, 0 to remove and 3 not upgraded.
Need to get 2,544 kB of archives.
After this operation, 11.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Err:1 http://deb.debian.org/debian buster/main amd64 libapr1 amd64 1.6.5-1+b1
  Temporary failure resolving 'deb.debian.org'
  1. ugh, that failed since I blocked the docker container from having internet access. for the purposes of testing, I'll undo that for now
[root@osestaging1 base]# vim /usr/lib/systemd/system/docker.service
...
[root@osestaging1 base]# systemctl daemon-reload
[root@osestaging1 base]# systemctl restart docker
...
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# apt-get install modsecurity-crs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2
Suggested packages:
  apache2-doc apache2-suexec-pristine | apache2-suexec-custom www-browser lua geoip-database-contrib ruby
The following NEW packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2 modsecurity-crs
0 upgraded, 12 newly installed, 0 to remove and 3 not upgraded.
Need to get 2,544 kB of archives.
After this operation, 11.1 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://security.debian.org/debian-security buster/updates/main amd64 apache2-bin amd64 2.4.38-3+deb10u3 [1,307 kB]
...
Unpacking modsecurity-crs (3.1.0-1) ...
Setting up libbrotli1:amd64 (1.0.7-2) ...
Setting up libyajl2:amd64 (2.1.0-3) ...
Setting up libapr1:amd64 (1.6.5-1+b1) ...
Setting up modsecurity-crs (3.1.0-1) ...
Setting up libjansson4:amd64 (2.12-1) ...
Setting up liblua5.2-0:amd64 (5.2.4-1.1+b2) ...
Setting up liblua5.1-0:amd64 (5.1.5-8.1+b2) ...
Setting up libaprutil1:amd64 (1.6.1-4) ...
Setting up libaprutil1-ldap:amd64 (1.6.1-4) ...
Setting up libaprutil1-dbd-sqlite3:amd64 (1.6.1-4) ...
Setting up apache2-bin (2.4.38-3+deb10u3) ...
Setting up libapache2-mod-security2 (2.9.3-1) ...
Processing triggers for libc-bin (2.28-10) ...
root@osestaging1-discourse-ose:/var/www/discourse#
  1. unfortunately that installed apache, but fortunately it looks like apache doesn't get started so it's no biggie
  2. but it looks like that CRS package + the 'ModSecurity-nginx' connector module wasn't sufficient to compile nginx with the mod_security module. The './configure' step still wants the ModSecurity library
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-dynamic-module=/tmp/ModSecurity-nginx
  1. it looks like adding the 'libmodsecurity3' package was also not sufficient
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# apt-get install libmodsecurity3
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libfuzzy2 liblua5.3-0 libmaxminddb0 
Suggested packages:
  mmdb-bin
The following NEW packages will be installed:
  libfuzzy2 liblua5.3-0 libmaxminddb0 libmodsecurity3
0 upgraded, 4 newly installed, 0 to remove and 3 not upgraded.
Need to get 682 kB of archives.
After this operation, 3,010 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian buster/main amd64 libfuzzy2 amd64 2.14.1+git20180629.57fcfff-1 [19.3 kB]
Get:2 http://deb.debian.org/debian buster/main amd64 liblua5.3-0 amd64 5.3.3-1.1 [120 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 libmaxminddb0 amd64 1.3.2-1 [30.7 kB]
Get:4 http://deb.debian.org/debian buster/main amd64 libmodsecurity3 amd64 3.0.3-1 [513 kB]
Fetched 682 kB in 0s (1,596 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libfuzzy2:amd64.
(Reading database ... 45112 files and directories currently installed.)
Preparing to unpack .../libfuzzy2_2.14.1+git20180629.57fcfff-1_amd64.deb ...
Unpacking libfuzzy2:amd64 (2.14.1+git20180629.57fcfff-1) ...
Selecting previously unselected package liblua5.3-0:amd64.
Preparing to unpack .../liblua5.3-0_5.3.3-1.1_amd64.deb ...
Unpacking liblua5.3-0:amd64 (5.3.3-1.1) ...
Selecting previously unselected package libmaxminddb0:amd64.
Preparing to unpack .../libmaxminddb0_1.3.2-1_amd64.deb ...
Unpacking libmaxminddb0:amd64 (1.3.2-1) ...
Selecting previously unselected package libmodsecurity3:amd64.
Preparing to unpack .../libmodsecurity3_3.0.3-1_amd64.deb ...
Unpacking libmodsecurity3:amd64 (3.0.3-1) ...
Setting up libfuzzy2:amd64 (2.14.1+git20180629.57fcfff-1) ...
Setting up libmaxminddb0:amd64 (1.3.2-1) ...
Setting up liblua5.3-0:amd64 (5.3.3-1.1) ...
Setting up libmodsecurity3:amd64 (3.0.3-1) ...
Processing triggers for libc-bin (2.28-10) ...
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-dynamic-module=/tmp/ModSecurity-nginx
...
configuring additional dynamic modules
adding module in /tmp/ModSecurity-nginx
checking for ModSecurity library ... not found
checking for ModSecurity library in /usr/local/modsecurity ... not found
 ./configure: error: ngx_http_modsecurity_module requires the ModSecurity library.
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# 
  1. but the 'libmodsecurity-dev' package worked!
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# apt-get install libmodsecurity-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  libmodsecurity-dev
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
Need to get 614 kB of archives.
After this operation, 5,840 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 libmodsecurity-dev amd64 3.0.3-1 [614 kB]
Fetched 614 kB in 0s (1,242 kB/s)
ydebconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libmodsecurity-dev:amd64.
(Reading database ... 45143 files and directories currently installed.)
Preparing to unpack .../libmodsecurity-dev_3.0.3-1_amd64.deb ...
Unpacking libmodsecurity-dev:amd64 (3.0.3-1) ...
Setting up libmodsecurity-dev:amd64 (3.0.3-1) ...
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-dynamic-module=/tmp/ModSecurity-nginx
...
 + ngx_brotli was configured
configuring additional dynamic modules
adding module in /tmp/ModSecurity-nginx
checking for ModSecurity library ... found
 + ngx_http_modsecurity_module was configured
checking for PCRE library ... found
checking for PCRE JIT support ... found
checking for OpenSSL library ... found
checking for zlib library ... found
creating objs/Makefile

Configuration summary
  + using threads
  + using system PCRE library
  + using system OpenSSL library
  + using system zlib library

  nginx path prefix: "/usr/share/nginx"
  nginx binary file: "/usr/share/nginx/sbin/nginx"
  nginx modules path: "/usr/share/nginx/modules"
  nginx configuration prefix: "/etc/nginx"
  nginx configuration file: "/etc/nginx/nginx.conf"
  nginx pid file: "/run/nginx.pid"
  nginx error log file: "/var/log/nginx/error.log"
  nginx http access log file: "/var/log/nginx/access.log"
  nginx http client request body temporary files: "/var/lib/nginx/body"
  nginx http proxy temporary files: "/var/lib/nginx/proxy"
  nginx http fastcgi temporary files: "/var/lib/nginx/fastcgi"
  nginx http uwsgi temporary files: "/var/lib/nginx/uwsgi"
  nginx http scgi temporary files: "/var/lib/nginx/scgi"

./configure: warning: the "--with-ipv6" option is deprecated
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4#
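  1. note: the likely reason 'libmodsecurity-dev' worked where 'libmodsecurity3' did not is that only the -dev package ships the headers and the unversioned library link that a compile-time check needs. A minimal sketch of a pre-flight check follows; it was not run as part of this log, the function name is hypothetical, and it assumes the connector's configure step looks for modsecurity/modsecurity.h under the system include directory

```shell
# Hypothetical helper (not from the log): report whether the libmodsecurity
# development header is present under a given include root.
# Assumption: the ModSecurity-nginx configure check needs
# modsecurity/modsecurity.h, which the -dev package is expected to install.
has_modsec_headers() {
    include_root="${1:-/usr/include}"
    [ -f "$include_root/modsecurity/modsecurity.h" ]
}

# Usage inside the container:
#   has_modsec_headers && echo "headers found" || echo "install libmodsecurity-dev"
```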
  1. but then the make install failed :(
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# make install
...
objs/addon/src/ngx_http_modsecurity_rewrite.o \
objs/ngx_http_modsecurity_module_modules.o \
-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now -lmodsecurity \
-shared
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_module.o: relocation R_X86_64_PC32 against symbol `stderr@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_pre_access.o: relocation R_X86_64_PC32 against symbol `ngx_http_modsecurity_module' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_header_filter.o: relocation R_X86_64_PC32 against undefined symbol `ngx_http_core_module' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_body_filter.o: relocation R_X86_64_PC32 against symbol `ngx_http_modsecurity_module' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_log.o: relocation R_X86_64_PC32 against symbol `ngx_http_modsecurity_module' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: objs/addon/src/ngx_http_modsecurity_rewrite.o: relocation R_X86_64_PC32 against symbol `ngx_http_modsecurity_module' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
collect2: error: ld returned 1 exit status
make[1]: *** [objs/Makefile:1952: objs/ngx_http_modsecurity_module.so] Error 1
make[1]: Leaving directory '/tmp/nginx-1.17.4'
make: *** [Makefile:11: install] Error 2
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# 
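  1. note: the linker is complaining that the module's objects are not position-independent in the way a shared object ('-shared') needs. A plausible culprit is the '-fPIE' in '--with-cc-opt' (and '-fPIE -pie' in '--with-ld-opt'), since dynamic modules are built as .so files. One workaround that was NOT tested in this log would be to swap those flags for '-fPIC' and keep '--add-dynamic-module'. The sketch below only rewrites an argument string; the function name is hypothetical

```shell
# Hypothetical helper: rewrite a configure-argument string so that the
# PIE-only flags become -fPIC. This is an untested assumption about how the
# dynamic-module build might be fixed, not what was done in this log.
pie_to_pic() {
    printf '%s\n' "$1" | sed -e 's/-fPIE -pie/-fPIC/g' -e 's/-fPIE/-fPIC/g'
}

# Example with a shortened argument string:
pie_to_pic "--with-cc-opt='-g -O2 -fPIE' --with-ld-opt='-fPIE -pie -Wl,-z,relro'"
```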
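  1. using '--add-module' instead of '--add-dynamic-module' fixes this issue *shrug*. Note that 'make install' put the new binary in /usr/share/nginx/sbin/nginx, so `nginx -V` kept showing the old build until I moved it to /usr/sbin. Now `nginx -V` shows the 'ModSecurity-nginx' module. sweet!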
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# make
...
objs/ngx_modules.o \
-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now -ldl -lpthread -lpthread -lcrypt -lm -lmodsecurity -lpcre -lssl -lcrypto -ldl -lpthread -lz \
-Wl,-E
sed -e "s|%%PREFIX%%|/usr/share/nginx|" \
		-e "s|%%PID_PATH%%|/run/nginx.pid|" \
		-e "s|%%CONF_PATH%%|/etc/nginx/nginx.conf|" \
		-e "s|%%ERROR_LOG_PATH%%|/var/log/nginx/error.log|" \
		< man/nginx.8 > objs/nginx.8
make[1]: Leaving directory '/tmp/nginx-1.17.4'
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# make install
cp conf/nginx.conf '/etc/nginx/nginx.conf.default'
test -d '/run' \
		|| mkdir -p '/run'
test -d '/var/log/nginx' \
		|| mkdir -p '/var/log/nginx'
test -d '/usr/share/nginx/html' \
		|| cp -R html '/usr/share/nginx'
test -d '/var/log/nginx' \
		|| mkdir -p '/var/log/nginx'
make[1]: Leaving directory '/tmp/nginx-1.17.4'
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# mv /usr/share/nginx/sbin/nginx /usr/sbin
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-module=/tmp/ModSecurity-nginx
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4#
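  1. note: eyeballing that long 'configure arguments' line is error-prone, so here is a minimal sketch of a check that could be scripted instead. The function name is hypothetical and it only inspects text passed on stdin

```shell
# Hypothetical helper: succeed if the text on stdin (the output of
# `nginx -V`) lists the ModSecurity-nginx connector as a static or dynamic
# module in the configure arguments.
nginx_has_modsecurity() {
    grep -Eq -e '--add(-dynamic)?-module=[^ ]*ModSecurity-nginx'
}

# Usage inside the container (nginx -V prints to stderr, hence 2>&1):
#   command -v nginx    # shows which binary is actually on the PATH
#   nginx -V 2>&1 | nginx_has_modsecurity && echo "mod_security compiled in"
```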
  1. meanwhile, it looks like the crs package I installed before dropped all my rules into /usr/share/modsecurity-crs
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ls -lah /etc/modsecurity/
total 84K
drwxr-xr-x. 3 root root 4.0K Nov 12 08:22 .
drwxr-xr-x. 1 root root 4.0K Nov 12 08:38 ..
drwxr-xr-x. 2 root root 4.0K Nov 12 08:22 crs
-rw-r--r--. 1 root root 8.3K Dec 10  2018 modsecurity.conf-recommended
-rw-r--r--. 1 root root  52K Dec 10  2018 unicode.mapping
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ls -lah /usr/share/modsecurity-crs/
total 28K
drwxr-xr-x.  4 root root 4.0K Nov 12 08:22 .
drwxr-xr-x.  1 root root 4.0K Nov 12 08:22 ..
-rw-r--r--.  1 root root  373 Nov 27  2018 owasp-crs.load
drwxr-xr-x.  2 root root 4.0K Nov 12 08:22 rules
drwxr-xr-x. 13 root root 4.0K Nov 12 08:22 util
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# ls -lah /usr/share/modsecurity-crs/rules/
total 648K
drwxr-xr-x. 2 root root 4.0K Nov 12 08:22 .
drwxr-xr-x. 4 root root 4.0K Nov 12 08:22 ..
-rw-r--r--. 1 root root  659 Nov 27  2018 crawlers-user-agents.data
-rw-r--r--. 1 root root  551 Nov 27  2018 iis-errors.data
-rw-r--r--. 1 root root  833 Nov 27  2018 java-classes.data
-rw-r--r--. 1 root root  264 Nov 27  2018 java-code-leakages.data
-rw-r--r--. 1 root root  240 Nov 27  2018 java-errors.data
-rw-r--r--. 1 root root  31K Nov 27  2018 lfi-os-files.data
-rw-r--r--. 1 root root 5.3K Nov 27  2018 php-config-directives.data
-rw-r--r--. 1 root root 9.1K Nov 27  2018 php-errors.data
-rw-r--r--. 1 root root  589 Nov 27  2018 php-function-names-933150.data
-rw-r--r--. 1 root root  21K Nov 27  2018 php-function-names-933151.data
-rw-r--r--. 1 root root  224 Nov 27  2018 php-variables.data
-rw-r--r--. 1 root root  13K Nov 27  2018 REQUEST-901-INITIALIZATION.conf
-rw-r--r--. 1 root root  12K Nov 27  2018 REQUEST-903.9001-DRUPAL-EXCLUSION-RULES.conf
-rw-r--r--. 1 root root  22K Nov 27  2018 REQUEST-903.9002-WORDPRESS-EXCLUSION-RULES.conf
-rw-r--r--. 1 root root 8.9K Nov 27  2018 REQUEST-903.9003-NEXTCLOUD-EXCLUSION-RULES.conf
-rw-r--r--. 1 root root 7.3K Nov 27  2018 REQUEST-903.9004-DOKUWIKI-EXCLUSION-RULES.conf
-rw-r--r--. 1 root root 1.8K Nov 27  2018 REQUEST-903.9005-CPANEL-EXCLUSION-RULES.conf
-rw-r--r--. 1 root root 1.5K Nov 27  2018 REQUEST-905-COMMON-EXCEPTIONS.conf
-rw-r--r--. 1 root root  11K Nov 27  2018 REQUEST-910-IP-REPUTATION.conf
-rw-r--r--. 1 root root 2.8K Nov 27  2018 REQUEST-911-METHOD-ENFORCEMENT.conf
-rw-r--r--. 1 root root 9.7K Nov 27  2018 REQUEST-912-DOS-PROTECTION.conf
-rw-r--r--. 1 root root 7.7K Nov 27  2018 REQUEST-913-SCANNER-DETECTION.conf
-rw-r--r--. 1 root root  52K Nov 27  2018 REQUEST-920-PROTOCOL-ENFORCEMENT.conf
-rw-r--r--. 1 root root  11K Nov 27  2018 REQUEST-921-PROTOCOL-ATTACK.conf
-rw-r--r--. 1 root root 6.3K Nov 27  2018 REQUEST-930-APPLICATION-ATTACK-LFI.conf
-rw-r--r--. 1 root root 5.9K Nov 27  2018 REQUEST-931-APPLICATION-ATTACK-RFI.conf
-rw-r--r--. 1 root root  54K Nov 27  2018 REQUEST-932-APPLICATION-ATTACK-RCE.conf
-rw-r--r--. 1 root root  31K Nov 27  2018 REQUEST-933-APPLICATION-ATTACK-PHP.conf
-rw-r--r--. 1 root root  41K Nov 27  2018 REQUEST-941-APPLICATION-ATTACK-XSS.conf
-rw-r--r--. 1 root root  70K Nov 27  2018 REQUEST-942-APPLICATION-ATTACK-SQLI.conf
-rw-r--r--. 1 root root 5.6K Nov 27  2018 REQUEST-943-APPLICATION-ATTACK-SESSION-FIXATION.conf
-rw-r--r--. 1 root root  15K Nov 27  2018 REQUEST-944-APPLICATION-ATTACK-JAVA.conf
-rw-r--r--. 1 root root 4.1K Nov 27  2018 REQUEST-949-BLOCKING-EVALUATION.conf
-rw-r--r--. 1 root root 3.9K Nov 27  2018 RESPONSE-950-DATA-LEAKAGES.conf
-rw-r--r--. 1 root root  19K Nov 27  2018 RESPONSE-951-DATA-LEAKAGES-SQL.conf
-rw-r--r--. 1 root root 3.7K Nov 27  2018 RESPONSE-952-DATA-LEAKAGES-JAVA.conf
-rw-r--r--. 1 root root 5.2K Nov 27  2018 RESPONSE-953-DATA-LEAKAGES-PHP.conf
-rw-r--r--. 1 root root 6.0K Nov 27  2018 RESPONSE-954-DATA-LEAKAGES-IIS.conf
-rw-r--r--. 1 root root 3.8K Nov 27  2018 RESPONSE-959-BLOCKING-EVALUATION.conf
-rw-r--r--. 1 root root 6.6K Nov 27  2018 RESPONSE-980-CORRELATION.conf
-rw-r--r--. 1 root root 1.7K Nov 27  2018 restricted-files.data
-rw-r--r--. 1 root root  390 Nov 27  2018 restricted-upload.data
-rw-r--r--. 1 root root  216 Nov 27  2018 scanners-headers.data
-rw-r--r--. 1 root root  418 Nov 27  2018 scanners-urls.data
-rw-r--r--. 1 root root 4.2K Nov 27  2018 scanners-user-agents.data
-rw-r--r--. 1 root root  717 Nov 27  2018 scripting-user-agents.data
-rw-r--r--. 1 root root 1.9K Nov 27  2018 sql-errors.data
-rw-r--r--. 1 root root 2.0K Nov 27  2018 sql-function-names.data
-rw-r--r--. 1 root root  973 Nov 27  2018 unix-shell.data
-rw-r--r--. 1 root root 3.9K Nov 27  2018 windows-powershell-commands.data
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4#
  1. I updated the 'install-nginx' script with the changes tested above; here's a diff (TODO: test this and write a sed line that makes these changes, to add to our install guide on the wiki)
[root@osestaging1 base]# diff install-nginx.20191112.orig install-nginx
7a8,12
> # mod_security --maltfield
> apt-get install -y libmodsecurity-dev modsecurity-crs
> cd /tmp
> git clone --depth 1 https://github.com/SpiderLabs/ModSecurity-nginx.git
> 
34c39
< ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
---
> ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-module=/tmp/ModSecurity-nginx
[root@osestaging1 base]# 
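  1. note: a first, untested sketch of the sed line from the TODO above. It assumes GNU sed, assumes the new lines belong after line 7 of a pristine install-nginx (as in the '7a8,12' hunk), and leaves out the trailing blank line from the diff. The function name is hypothetical. The './configure' edit is guarded so it is not applied twice, but the inserted lines are not, so this should only be run once against an unmodified copy

```shell
# Hypothetical helper: apply the two changes from the diff to a copy of
# discourse_docker's image/base/install-nginx.
# Assumptions: GNU sed; the file is pristine; line 7 is the right place for
# the new lines (taken from the "7a8,12" hunk in the diff).
patch_install_nginx() {
    script="$1"
    sed -i \
        -e '7a # mod_security --maltfield' \
        -e '7a apt-get install -y libmodsecurity-dev modsecurity-crs' \
        -e '7a cd /tmp' \
        -e '7a git clone --depth 1 https://github.com/SpiderLabs/ModSecurity-nginx.git' \
        -e '/ModSecurity-nginx/! s%^\(\./configure .*\)$%\1 --add-module=/tmp/ModSecurity-nginx%' \
        "$script"
}

# Usage on the docker host:
#   cp install-nginx install-nginx.orig && patch_install_nginx install-nginx
#   diff install-nginx.orig install-nginx
```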
  1. the next step of the nginx mod_security guide linked above (skipping over the bits for loading the dynamic module which I no longer compiled-in as dynamic) is to put modsecurity.conf-recommended in /etc/nginx/modsec/modsecurity.conf and change "SecRuleEngine DetectionOnly" to "SecRuleEngine On". Well, it looks like we don't have to get that from github; our crs package already created modsecurity.conf-recommended in /etc/modsecurity/
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# cp /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# sed -i 's/SecRuleEngine DetectionOnly/SecRuleEngine On/' /etc/modsecurity/modsecurity.conf
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# diff /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
7c7
< SecRuleEngine DetectionOnly
---
> SecRuleEngine On
root@osestaging1-discourse-ose:/tmp/nginx-1.17.4# 
  1. now we create our nginx modsecurity config file, which can then be included in our nginx discourse vhost config file
cat << EOF > /etc/nginx/conf.d/modsecurity.include
################################################################################
# File:    modsecurity.include
# Version: 0.1
# Purpose: Defines mod_security rules for the discourse vhost
#          This should be included in the server{} blocks nginx vhosts.
# Author:  Michael Altfield <michael@opensourceecology.org>
# Created: 2019-11-12
# Updated: 2019-11-12
################################################################################
Include "/etc/modsecurity/modsecurity.conf"

# Basic test rule
SecRule ARGS:testparam "@contains test" "id:1234,deny,status:403"
EOF
  1. testing the nginx config failed
root@osestaging1-discourse-ose:/etc/nginx/conf.d# nginx -t
nginx: [emerg] "modsecurity_rules_file" directive Rules error. File: /etc/modsecurity/modsecurity.conf. Line: 39. Column: 33. As of ModSecurity version 3.0, SecRequestBodyInMemoryLimit is no longer supported. Instead, you can use your web server configurations to control those values. ModSecurity will follow the web server decision.  in /etc/nginx/conf.d/discourse.conf:39
nginx: configuration file /etc/nginx/nginx.conf test failed
root@osestaging1-discourse-ose:/etc/nginx/conf.d#
  1. commenting-out the offending line for SecRequestBodyInMemoryLimit fixed it, but now there's a distinct issue with the log path
root@osestaging1-discourse-ose:/etc/nginx/conf.d# sed --in-place=.`date "+%Y%m%d_%H%M%S"` 's%^\(\s*\)[^#]*SecRequestBodyInMemoryLimit\(.*\)%\1#SecRequestBodyInMemoryLimit\2%' /etc/modsecurity/modsecurity.conf
root@osestaging1-discourse-ose:/etc/nginx/conf.d# nginx -t
nginx: [emerg] "modsecurity_rules_file" directive Failed to open file: /var/log/apache2/modsec_audit.log in /etc/nginx/conf.d/discourse.conf:39
nginx: configuration file /etc/nginx/nginx.conf test failed
root@osestaging1-discourse-ose:/etc/nginx/conf.d# 
  1. I changed the log path from apache to nginx; that fixed the nginx config issues
root@osestaging1-discourse-ose:/etc/modsecurity# sed --in-place=.`date "+%Y%m%d_%H%M%S"` '/nginx/! s%^\(\s*\)[^#]*SecAuditLog \(.*\)%#\1SecAuditLog \2\n\1SecAuditLog /var/log/nginx/modsec_audit.log%' /etc/modsecurity/modsecurity.conf
root@osestaging1-discourse-ose:/etc/modsecurity# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
root@osestaging1-discourse-ose:/etc/modsecurity# 
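  1. note: 'nginx -t' only proves the config parses. Once nginx is actually running with this config, the basic test rule (id 1234) should answer any request carrying 'testparam=test' with a 403. Below is a minimal sketch of that check, not something run in this log. The function name is hypothetical, the socket path is an assumption based on the unix socket that nginx binds in this container, and 'localhost' is a placeholder host

```shell
# Hypothetical helper: succeed if a request carrying testparam=test is
# rejected with HTTP 403 by the test rule.
# Assumptions: nginx listens on a unix socket (default path below) and curl
# with --unix-socket support is available.
waf_blocks_test_param() {
    sock="${1:-/shared/nginx.http.sock}"
    status="$(curl -s -o /dev/null -w '%{http_code}' \
        --unix-socket "$sock" 'http://localhost/?testparam=test')"
    [ "$status" = "403" ]
}

# Usage inside the container:
#   waf_blocks_test_param && echo "test rule is blocking" || echo "not blocked"
```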
  1. as a recap, here are the changes I made going from modsecurity.conf-recommended to the new modsecurity.conf file
root@osestaging1-discourse-ose:/etc/modsecurity# diff /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
7c7
< SecRuleEngine DetectionOnly
---
> SecRuleEngine On
45c45
< SecRequestBodyInMemoryLimit 131072
---
> #SecRequestBodyInMemoryLimit 131072
193c193,194
< SecAuditLog /var/log/apache2/modsec_audit.log
---
> #SecAuditLog /var/log/apache2/modsec_audit.log
> SecAuditLog /var/log/nginx/modsec_audit.log
root@osestaging1-discourse-ose:/etc/modsecurity# 
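  1. note: since these manual edits will have to be repeated whenever the container is rebuilt, here is a minimal sketch that produces the same three changes in one pass. It assumes GNU sed (for \s and \n), the function name is hypothetical, and it has only been reasoned through against the three lines in the diff above, not against the full packaged file

```shell
# Hypothetical helper: derive modsecurity.conf from the packaged
# modsecurity.conf-recommended with the three changes from the recap diff.
# Assumes GNU sed (for \s and \n), as shipped in the Debian-based container.
prepare_modsec_conf() {
    src="$1"
    dst="$2"
    sed \
        -e 's/SecRuleEngine DetectionOnly/SecRuleEngine On/' \
        -e 's%^\(\s*\)SecRequestBodyInMemoryLimit\(.*\)%\1#SecRequestBodyInMemoryLimit\2%' \
        -e '/nginx/! s%^\(\s*\)SecAuditLog \(.*\)%#\1SecAuditLog \2\n\1SecAuditLog /var/log/nginx/modsec_audit.log%' \
        "$src" > "$dst"
}

# Usage inside the container:
#   prepare_modsec_conf /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
```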
  1. I tried to restart nginx, but it failed with a message that it couldn't bind to an address already in use. I think it's because nginx is executed by 'runsv' instead of as a typical service inside the docker container
root@osestaging1-discourse-ose:/etc/modsecurity# ps -ef | grep -i nginx
root        41    34  0 08:21 ?        00:00:00 runsv nginx
root     23605    41  4 10:16 ?        00:00:00 /usr/sbin/nginx
root     23608  4977  0 10:16 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/etc/modsecurity# 
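  1. I destroyed & restarted the 'discourse_ose' docker container because I don't know the proper way to restart nginx inside the docker container, but when it came back up all the configs I created above were lost, since changes made inside a running container don't survive it being destroyed and recreated from the image. Even nginx is back to the stock build without the 'ModSecurity-nginx' module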
root@osestaging1-discourse-ose:/etc/nginx/conf.d# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6) 
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
root@osestaging1-discourse-ose:/etc/nginx/conf.d# ls -lah /etc/modsecurity
ls: cannot access '/etc/modsecurity': No such file or directory
root@osestaging1-discourse-ose:/etc/nginx/conf.d# ls -lah /etc/nginx/conf.d
total 28K
drwxr-xr-x. 1 root root 4.0K Nov 12 10:23 .
drwxr-xr-x. 1 root root 4.0K Oct 13 22:41 ..
-rw-r--r--. 1 root root 8.3K Nov 11 10:49 discourse.conf
root@osestaging1-discourse-ose:/etc/nginx/conf.d# dpkg -l | grep -i modsecurity
root@osestaging1-discourse-ose:/etc/nginx/conf.d# 
  1. I should probably figure out how to restart nginx inside this container. Here's what I get from systemctl
root@osestaging1-discourse-ose:/etc/nginx/conf.d# systemctl restart nginx
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
root@osestaging1-discourse-ose:/etc/nginx/conf.d# 
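The systemctl error above is what you'd expect when PID 1 isn't systemd. A minimal sketch for checking that before reaching for systemctl, assuming the same test that sd_booted(3) documents (the /run/systemd/system directory exists). The function name and the optional root-prefix argument are hypothetical, added only for illustration:

```shell
# Succeeds if the system rooted at $1 (default: the real root) was booted
# with systemd. This mirrors sd_booted(3): systemd creates the directory
# /run/systemd/system at boot.
# The root-prefix argument is a hypothetical hook so the check can be pointed
# at a scratch directory.
booted_with_systemd() {
    root="${1:-}"
    [ -d "$root/run/systemd/system" ]
}
```

Usage would be something like `booted_with_systemd || echo "no systemd here; look for another supervisor"`.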
  1. ok, so there's an init.d script. but, yeah, when I attempt to start nginx it hit the same issue with binding to an existing nginx
root@osestaging1-discourse-ose:/etc/nginx/conf.d# /etc/init.d/nginx restart
[FAIL] Restarting nginx: nginx failed!
root@osestaging1-discourse-ose:/etc/nginx/conf.d# tail /var/log/nginx/error.log
2019/11/12 10:32:45 [emerg] 1123#1123: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:45 [emerg] 1123#1123: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:45 [emerg] 1123#1123: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:45 [emerg] 1123#1123: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:45 [emerg] 1123#1123: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:45 [emerg] 1123#1123: still could not bind()
2019/11/12 10:32:47 [emerg] 1127#1127: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:47 [emerg] 1127#1127: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:47 [emerg] 1127#1127: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
2019/11/12 10:32:47 [emerg] 1127#1127: bind() to unix:/shared/nginx.http.sock failed (98: Address already in use)
root@osestaging1-discourse-ose:/etc/nginx/conf.d# ps -ef | grep -i nginx
root        46    40  0 10:19 ?        00:00:00 runsv nginx
root      1174    46  1 10:33 ?        00:00:00 /usr/sbin/nginx
root      1177   108  0 10:33 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/etc/nginx/conf.d# 
  1. so it looks like 'runsv' is part of runit https://en.wikipedia.org/wiki/Runit
    1. indeed, this is something that's been implemented into Ruby on Rails
  2. I find that if I kill the 'runsv nginx' process, then it automatically re-runs itself after a few seconds. Perhaps this is how I restart it?
root@osestaging1-discourse-ose:/# ps -ef | grep -i nginx
root      2798    40  0 10:51 ?        00:00:00 runsv nginx
root      2852  2798  2 10:52 ?        00:00:00 /usr/sbin/nginx
root      2855   108  0 10:52 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/# kill 2798
root@osestaging1-discourse-ose:/# ps -ef | grep -i nginx
root      2871   108  0 10:52 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/# ps -ef | grep -i nginx
root      2873    40  0 10:52 ?        00:00:00 runsv nginx
root      2874  2873  2 10:52 ?        00:00:00 /usr/sbin/nginx
root      2877   108  0 10:52 pts/1    00:00:00 grep -i nginx
root@osestaging1-discourse-ose:/# 
  1. but, actually, my discourse site is inaccessible after that kill; I now get a 502 Bad Gateway
  2. ah, I found the answer in the templates/web.template.yml file https://github.com/discourse/discourse_docker/blob/master/templates/web.template.yml
  3. it's the `sv` command
root@osestaging1-discourse-ose:/var/www/discourse# sv stop nginx
ok: down: nginx: 1s, normally up
root@osestaging1-discourse-ose:/var/www/discourse# sv start nginx
ok: run: nginx: (pid 269) 0s
root@osestaging1-discourse-ose:/var/www/discourse# sv --help
usage: sv [-v] [-w sec] command service ...

root@osestaging1-discourse-ose:/var/www/discourse# 
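A minimal sketch of wrapping the sv commands above into a safer restart, assuming you want to refuse the restart when `nginx -t` reports a broken config. The function name and the SV_BIN/NGINX_BIN override variables are hypothetical (they are not part of runit or nginx); `sv restart` and `nginx -t` are the only real commands used:

```shell
# Restart the runit-supervised nginx service, but only if its config parses.
# SV_BIN and NGINX_BIN are hypothetical override hooks so the function can be
# exercised on a machine that has neither tool installed.
restart_nginx_under_runit() {
    sv_bin="${SV_BIN:-sv}"
    nginx_bin="${NGINX_BIN:-nginx}"

    # 'nginx -t' parses the configuration and exits non-zero on errors
    if ! "$nginx_bin" -t; then
        echo "nginx config test failed; not restarting" >&2
        return 1
    fi

    # 'sv restart' asks the runsv supervisor to restart the service
    "$sv_bin" restart nginx
}
```

Run inside the container as `restart_nginx_under_runit`. Going through sv keeps runsv in charge of the process, unlike killing the supervisor by hand.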
  1. there's no man page on my docker container; here's the sv man page http://smarden.org/runit/sv.8.html
  2. since it's not documented anywhere, I made a quick topic on the discourse meta forums with this question & solution https://meta.discourse.org/t/how-to-restart-discorse-nginx-in-docker-container/133187
    1. awkwardly, it appears that I can't mark my own reply to a topic as a solution? That's not very stack-exchange-like..
  3. also, I added a line to the 'install-nginx' script to delete the ModSecurity-nginx repo from /tmp/ along with all the other cleanup lines
[root@osestaging1 base]# diff install-nginx.20191112.orig install-nginx
7a8,12
> # mod_security --maltfield
> apt-get install -y libmodsecurity-dev modsecurity-crs
> cd /tmp
> git clone --depth 1 https://github.com/SpiderLabs/ModSecurity-nginx.git
> 
34c39
< ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
---
> ./configure --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli --add-module=/tmp/ModSecurity-nginx
44a50
> rm -fr /tmp/ModSecurity-nginx
[root@osestaging1 base]# 
  1. I also discovered that the existing script has a minor logic error: it doesn't actually clean up the nginx source code. I commented about this on the discourse meta forums https://meta.discourse.org/t/how-to-run-discourse-in-apache-vhost-not-nginx/133112/14
  2. I re-did all the install-nginx commands in the docker container again
  3. finally, I added the following lines inside the server{} block of /etc/nginx/conf.d/discourse.conf
		modsecurity on;
		modsecurity_rules_file /etc/nginx/conf.d/modsecurity.include;
  1. and I stopped & started nginx
root@osestaging1-discourse-ose:/var/www/discourse# sv stop nginx
ok: down: nginx: 0s, normally up
root@osestaging1-discourse-ose:/var/www/discourse# sv start nginx
ok: run: nginx: (pid 11237) 0s
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. I confirmed that I can query the page and get a 200 with curl
user@ose:~$ curl -Iks https://discourse.opensourceecology.org/
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 11:53:34 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
X-Discourse-Route: list/latest
Cache-Control: no-cache, no-store
Content-Security-Policy: base-uri 'none'; object-src 'none'; script-src 'unsafe-eval' 'report-sample' http://discourse.opensourceecology.org/logs/ http://discourse.opensourceecology.org/sidekiq/ http://discourse.opensourceecology.org/mini-profiler-resources/ http://discourse.opensourceecology.org/assets/ http://discourse.opensourceecology.org/brotli_asset/ http://discourse.opensourceecology.org/extra-locales/ http://discourse.opensourceecology.org/highlight-js/ http://discourse.opensourceecology.org/javascripts/ http://discourse.opensourceecology.org/plugins/ http://discourse.opensourceecology.org/theme-javascripts/ http://discourse.opensourceecology.org/svg-sprite/; worker-src 'self' blob:
X-Request-Id: 26257fd3-a4f7-4926-95e7-01cc7373a5fa
X-Runtime: 0.146294
Strict-Transport-Security: max-age=15552001
Public-Key-Pins: pin-sha256="UbSbHFsFhuCrSv9GNsqnGv4CbaVh5UV5/zzgjLgHh9c="; pin-sha256="YLh1dUR9y6Kja30RrAn7JKnbQG/uEtLMkBgFF2Fuihg="; pin-sha256="C5+lpZ7tcVwmwQIMcRtPbsQtWLABXhQzejna0wHFr8M="; pin-sha256="Vjs8r4z+80wjNcr1YKepWQboSIRi63WsWXhIMN+eWys="; pin-sha256="lCppFqbkrlJ3EcVFAkeip0+44VaoJUymbnOaEUk7tEU="; pin-sha256="K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q="; pin-sha256="Y9mvm0exBk1JoQ57f9Vm28jKo5lFm/woKcVxrYxu80o="; pin-sha256="EGn6R6CqT4z3ERscrqNl7q7RCzJmDe9uBhS/rnCHU="; pin-sha256="NIdnza073SiyuN1TUa7DDGjOxc1p0nbfOCfbxPWAZGQ="; pin-sha256="fNZ8JI9p2D/C+bsB3LH3rWejY9BGBDeW0JhMOiMfa7A="; pin-sha256="oyD01TTXvpfBro3QSZc1vIlcMjrdLTiL/M9mLCPX+Zo="; pin-sha256="0cRTd+vc1hjNFlHcLgLCHXUeWqn80bNDH/bs9qMTSPo="; pin-sha256="MDhNnV1cmaPdDDONbiVionUHH2QIf2aHJwq/lshMWfA="; pin-sha256="OIZP7FgTBf7hUpWHIA7OaPVO2WrsGzTl9vdOHLPZmJU="; max-age=3600; includeSubDomains; report-uri="http:opensourceecology.org/hpkp-report"

user@ose:~$
  1. And, finally, I successfully confirmed that modsecurity returns a 403 when the testparam from 'modsecurity.include' is set
user@ose:~$ curl -Iks https://discourse.opensourceecology.org/?testparam=test
HTTP/1.1 403 Forbidden
Server: nginx
Date: Tue, 12 Nov 2019 11:53:48 GMT
Content-Type: text/html
Content-Length: 146
Connection: keep-alive

user@ose:~$
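The two curl checks above (200 for a clean request, 403 once the test parameter is set) can be scripted. A minimal sketch, assuming you only care about the numeric status on the first header line; the function name is hypothetical:

```shell
# Print the numeric status code from the first line of HTTP response headers
# read on stdin (for example, the output of 'curl -Iks URL').
http_status() {
    # The status line looks like "HTTP/1.1 403 Forbidden" or "HTTP/2 400 ";
    # strip the trailing carriage return that HTTP headers carry, then print
    # the second whitespace-separated field.
    awk 'NR == 1 { sub(/\r$/, ""); print $2; exit }'
}
```

For example, `curl -Iks 'https://discourse.opensourceecology.org/?testparam=test' | http_status` should print the code that the WAF test is expected to produce.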
  1. And the modsecurity log reflects this too!
root@osestaging1-discourse-ose:/var/www/discourse# tail -f /var/log/nginx/modsec_audit.log 
...



---ZPTABsP3---A--
[12/Nov/2019:11:56:29 +0000] 157355978960.000481 10.241.189.10 0 10.241.189.10 12147
---ZPTABsP3---B--
HEAD /?testparam=test HTTP/1.1
Host: discourse.opensourceecology.org
X-Forwarded-For: 10.241.189.10
X-Forwarded-Proto: https
Connection: close
X-Real-IP: 10.241.189.10
User-Agent: curl/7.52.1
Accept: */*

---ZPTABsP3---D--

---ZPTABsP3---F--
HTTP/1.1 403
Server: nginx
Date: Tue, 12 Nov 2019 11:56:29 GMT
Content-Length: 146
Content-Type: text/html
Connection: close

---ZPTABsP3---H--

---ZPTABsP3---I--

---ZPTABsP3---J--

---ZPTABsP3---Z--

^C
root@osestaging1-discourse-ose:/var/www/discourse# 

  1. To actually enable the CRS, I updated the nginx modsecurity.include file
root@osestaging1-discourse-ose:/var/www/discourse# cat /etc/nginx/conf.d/modsecurity.include 
################################################################################
# File:    modsecurity.include
# Version: 0.1
# Purpose: Defines mod_security rules for the discourse vhost
#          This should be included in the server{} blocks nginx vhosts.
# Author:  Michael Altfield <michael@opensourceecology.org>
# Created: 2019-11-12
# Updated: 2019-11-12
################################################################################
Include "/etc/modsecurity/modsecurity.conf"

# OWASP Core Rule Set, installed from the 'modsecurity-crs' package in debian
Include /etc/modsecurity/crs/crs-setup.conf
Include /usr/share/modsecurity-crs/rules/*.conf
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. and now it's blocking this sqli attack
user@ose:~$ curl -Iks 'https://discourse.opensourceecology.org/?sqli_attack="; DROP TABLE please;'
HTTP/1.1 403 Forbidden
Server: nginx
Date: Tue, 12 Nov 2019 12:10:32 GMT
Content-Type: text/html
Content-Length: 146
Connection: keep-alive

user@ose:~$ 
  1. compare this to meta.discourse.org, which--interestingly--gives me a 400 error?
user@ose:~$ curl -Iks 'https://meta.discourse.org/?sqli_attack="; DROP TABLE please;'
HTTP/2 400 
date: Tue, 12 Nov 2019 12:11:13 GMT
server: nginx

user@ose:~$ 
  1. so it looks like meta.discourse.org does have some sort of WAF in-place. It works until I use the word "DROP"
user@ose:~$ curl -Iks 'https://meta.discourse.org/?sqli_attack=";'
HTTP/2 200 
date: Tue, 12 Nov 2019 12:13:10 GMT
content-type: text/html; charset=utf-8
server: nginx
vary: Accept-Encoding
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-download-options: noopen
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
x-discourse-route: list/latest
cache-control: no-cache, no-store
content-security-policy: base-uri 'none'; object-src 'none'; script-src 'unsafe-eval' 'report-sample' https://meta.discourse.org/logs/ https://meta.discourse.org/sidekiq/ https://meta.discourse.org/mini-profiler-resources/ https://d11a6trkgmumsb.cloudfront.net/assets/ https://d11a6trkgmumsb.cloudfront.net/brotli_asset/ https://meta.discourse.org/extra-locales/ https://d3bpeqsaub0i6y.cloudfront.net/highlight-js/ https://d3bpeqsaub0i6y.cloudfront.net/javascripts/ https://d3bpeqsaub0i6y.cloudfront.net/plugins/ https://d3bpeqsaub0i6y.cloudfront.net/theme-javascripts/ https://d3bpeqsaub0i6y.cloudfront.net/svg-sprite/ https://www.google-analytics.com/analytics.js; worker-src 'self' blob:
x-request-id: ad7e08c2-85b9-4be5-ae91-68187d8814ad
x-runtime: 0.063429
strict-transport-security: max-age=31536000

user@ose:~$ curl -Iks 'https://meta.discourse.org/?sqli_attack="; DROP'
HTTP/2 400 
date: Tue, 12 Nov 2019 12:13:14 GMT
server: nginx

user@ose:~$ 
  1. Interestingly, because of the way Discourse is designed, I can still browse my discourse site when querying something like "https://discourse.opensourceecology.org/t/test-topic-that-is-15-characters-or-more/14?sqli_attempt=%22DROP" in my web browser. It's just that a bunch of the JS queries return 403. Maybe it's because of the browser's cache?
    1. even when I clear the cache, I can still view the topic; hmm..
    2. Discourse of course makes a ton of requests for a single page load. All of them succeed except one POST it makes to the message-bus. That one fails not because of the URI (it strips away my sqli_attempt GET var), but because there's still a "Referer" header in the request that includes my malicious variable. So the server returns 403 on all those requests...but the page itself appears to be unaffected in the UX
  2. I updated our install guide with some commands to update the `install-nginx` script https://wiki.opensourceecology.org/wiki/Discourse#Nginx_mod_security
  3. I tried to add a new template for adding & updating the relevant nginx & modsecurity configuration files on the container during the bootstrap, but I kept getting an error on the (super long execution) `./launcher rebuild discourse_ose` step
[root@osestaging1 discourse]# ./launcher rebuild discourse_ose
...
2019-11-12 13:51:09.437 UTC [49] LOG:  database system is shut down


FAILED
--------------------
Pups::ExecError: cp /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf failed with return #<Process::Status: pid 11074 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cmd"=>["cp /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf"]}
e744f7701026c015b115924bf758cec781552e2c65693ae198db8742416eb069
 FAILED TO BOOTSTRAP  please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
[root@osestaging1 discourse]# 

  1. so my best guess as to the only reason that would fail is if the '/etc/modsecurity/modsecurity.conf-recommended' file doesn't exist yet. that file is put in-place by installing the 'modsecurity-crs' package, which is supposed to happen in the 'install-nginx' script. So I guess that script doesn't get executed before the template is called? That's a problem.
  2. fuck it; I just added a line to install that package to the template's exec lines as well. Note, interestingly, that it picked-up that I was running an apt command, and it delayed my template to later *shrug*
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose
...
I, [2019-11-12T14:29:26.856387 #1]  INFO -- : > sudo apt-get install -y modsecurity-crs
debconf: delaying package configuration, since apt-utils is not installed
I, [2019-11-12T14:29:45.766725 #1]  INFO -- : Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1
  libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2
Suggested packages:
  apache2-doc apache2-suexec-pristine | apache2-suexec-custom www-browser lua
  geoip-database-contrib ruby
The following NEW packages will be installed:
  apache2-bin libapache2-mod-security2 libapr1 libaprutil1
  libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libjansson4 liblua5.1-0
  liblua5.2-0 libyajl2 modsecurity-crs
0 upgraded, 12 newly installed, 0 to remove and 0 not upgraded.
Need to get 2,544 kB of archives.
After this operation, 11.1 MB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 libapr1 amd64 1.6.5-1+b1 [102 kB]
Get:2 http://deb.debian.org/debian buster/main amd64 libaprutil1 amd64 1.6.1-4 [91.8 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 libaprutil1-dbd-sqlite3 amd64 1.6.1-4 [18.7 kB]
Get:4 http://deb.debian.org/debian buster/main amd64 libaprutil1-ldap amd64 1.6.1-4 [16.8 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 libbrotli1 amd64 1.0.7-2 [270 kB]
Get:6 http://deb.debian.org/debian buster/main amd64 libjansson4 amd64 2.12-1 [38.0 kB]
Get:7 http://deb.debian.org/debian buster/main amd64 liblua5.2-0 amd64 5.2.4-1.1+b2 [110 kB]
Get:8 http://deb.debian.org/debian buster/main amd64 apache2-bin amd64 2.4.38-3+deb10u1 [1,307 kB]
Get:9 http://deb.debian.org/debian buster/main amd64 liblua5.1-0 amd64 5.1.5-8.1+b2 [111 kB]
Get:10 http://deb.debian.org/debian buster/main amd64 libyajl2 amd64 2.1.0-3 [23.8 kB]
Get:11 http://deb.debian.org/debian buster/main amd64 libapache2-mod-security2 amd64 2.9.3-1 [257 kB]
Get:12 http://deb.debian.org/debian buster/main amd64 modsecurity-crs all 3.1.0-1 [198 kB]
Fetched 2,544 kB in 0s (20.6 MB/s)
Selecting previously unselected package libapr1:amd64.
(Reading database ... 44564 files and directories currently installed.)
Preparing to unpack .../00-libapr1_1.6.5-1+b1_amd64.deb ...
Unpacking libapr1:amd64 (1.6.5-1+b1) ...
Selecting previously unselected package libaprutil1:amd64.
Preparing to unpack .../01-libaprutil1_1.6.1-4_amd64.deb ...
Unpacking libaprutil1:amd64 (1.6.1-4) ...
Selecting previously unselected package libaprutil1-dbd-sqlite3:amd64.
Preparing to unpack .../02-libaprutil1-dbd-sqlite3_1.6.1-4_amd64.deb ...
Unpacking libaprutil1-dbd-sqlite3:amd64 (1.6.1-4) ...
Selecting previously unselected package libaprutil1-ldap:amd64.
Preparing to unpack .../03-libaprutil1-ldap_1.6.1-4_amd64.deb ...  
Unpacking libaprutil1-ldap:amd64 (1.6.1-4) ...
Selecting previously unselected package libbrotli1:amd64.
Preparing to unpack .../04-libbrotli1_1.0.7-2_amd64.deb ...
Unpacking libbrotli1:amd64 (1.0.7-2) ...
Selecting previously unselected package libjansson4:amd64.
Preparing to unpack .../05-libjansson4_2.12-1_amd64.deb ...
Unpacking libjansson4:amd64 (2.12-1) ...
Selecting previously unselected package liblua5.2-0:amd64.
Preparing to unpack .../06-liblua5.2-0_5.2.4-1.1+b2_amd64.deb ...  
Unpacking liblua5.2-0:amd64 (5.2.4-1.1+b2) ...
Selecting previously unselected package apache2-bin.
Preparing to unpack .../07-apache2-bin_2.4.38-3+deb10u1_amd64.deb ...
Unpacking apache2-bin (2.4.38-3+deb10u1) ...
Selecting previously unselected package liblua5.1-0:amd64.
Preparing to unpack .../08-liblua5.1-0_5.1.5-8.1+b2_amd64.deb ...  
Unpacking liblua5.1-0:amd64 (5.1.5-8.1+b2) ...
Selecting previously unselected package libyajl2:amd64.
Preparing to unpack .../09-libyajl2_2.1.0-3_amd64.deb ...
Unpacking libyajl2:amd64 (2.1.0-3) ...
Selecting previously unselected package libapache2-mod-security2.  
Preparing to unpack .../10-libapache2-mod-security2_2.9.3-1_amd64.deb ...
Unpacking libapache2-mod-security2 (2.9.3-1) ...
Selecting previously unselected package modsecurity-crs.
Preparing to unpack .../11-modsecurity-crs_3.1.0-1_all.deb ...
Unpacking modsecurity-crs (3.1.0-1) ...
Setting up libbrotli1:amd64 (1.0.7-2) ...
Setting up libyajl2:amd64 (2.1.0-3) ...
Setting up libapr1:amd64 (1.6.5-1+b1) ...
Setting up modsecurity-crs (3.1.0-1) ...
Setting up libjansson4:amd64 (2.12-1) ...
Setting up liblua5.2-0:amd64 (5.2.4-1.1+b2) ...
Setting up liblua5.1-0:amd64 (5.1.5-8.1+b2) ...
Setting up libaprutil1:amd64 (1.6.1-4) ...
Setting up libaprutil1-ldap:amd64 (1.6.1-4) ...
Setting up libaprutil1-dbd-sqlite3:amd64 (1.6.1-4) ...
Setting up apache2-bin (2.4.38-3+deb10u1) ...
Setting up libapache2-mod-security2 (2.9.3-1) ...
Processing triggers for libc-bin (2.28-10) ...

I, [2019-11-12T14:29:45.767132 #1]  INFO -- : > cp /etc/modsecurity/modsecurity.conf-recommended /etc/modsecurity/modsecurity.conf
I, [2019-11-12T14:29:45.779094 #1]  INFO -- :
I, [2019-11-12T14:29:45.787697 #1]  INFO -- : File > /etc/nginx/conf.d/modsecurity.include  chmod:   chown:
I, [2019-11-12T14:29:45.788674 #1]  INFO -- : Replacing (?-mix:server.+{) with server {
  modsecurity on;
  modsecurity_rules_file /etc/nginx/conf.d/modsecurity.include; in /etc/nginx/conf.d/discourse.conf
I, [2019-11-12T14:29:45.791643 #1]  INFO -- : > echo "Beginning of custom commands"
I, [2019-11-12T14:29:45.799587 #1]  INFO -- : Beginning of custom commands

I, [2019-11-12T14:29:45.800032 #1]  INFO -- : > echo "End of custom commands"
I, [2019-11-12T14:29:45.806371 #1]  INFO -- : End of custom commands

I, [2019-11-12T14:29:45.806650 #1]  INFO -- : Terminating async processes
I, [2019-11-12T14:29:45.806739 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 49
I, [2019-11-12T14:29:45.806965 #1]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 166
2019-11-12 14:29:45.811 UTC [49] LOG:  received fast shutdown request
166:signal-handler (1573568985) Received SIGTERM scheduling shutdown...
166:M 12 Nov 2019 14:29:45.829 # User requested shutdown...
166:M 12 Nov 2019 14:29:45.829 * Saving the final RDB snapshot before exiting.
2019-11-12 14:29:46.082 UTC [49] LOG:  aborting any active transactions
166:M 12 Nov 2019 14:29:46.100 * DB saved on disk
166:M 12 Nov 2019 14:29:46.100 # Redis is now ready to exit, bye bye...
2019-11-12 14:29:46.107 UTC [49] LOG:  worker process: logical replication launcher (PID 58) exited with exit code 1
2019-11-12 14:29:46.109 UTC [53] LOG:  shutting down
2019-11-12 14:29:46.234 UTC [49] LOG:  database system is shut down

sha256:76988657b060bb18646decb9b91710d4e265141cfaeebea727752ededa162796
c00e0459426ce7ba3b2539c1d411909702488c2ec99cd56aac4c30db4b436d91

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
14df01c72c4682248df91a31a5dc4d462167293e9809362076267975cc759465

real    10m12.144s
user    0m2.632s
sys     0m2.333s
[root@osestaging1 discourse]#
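The earlier `cp` failure gave no hint about why it failed. A minimal sketch of a guard that could make that step fail with a clearer message, assuming the cause really is a missing 'modsecurity.conf-recommended' file. The function name and the directory argument are hypothetical:

```shell
# Copy the packaged ModSecurity sample config into place, failing with a
# readable message if the sample is not there yet.
# $1 is the config directory (default /etc/modsecurity); the argument is a
# hypothetical hook for trying this against a scratch directory.
ensure_modsecurity_conf() {
    conf_dir="${1:-/etc/modsecurity}"
    src="$conf_dir/modsecurity.conf-recommended"
    dst="$conf_dir/modsecurity.conf"

    if [ ! -f "$src" ]; then
        echo "missing $src; is the package that ships it installed?" >&2
        return 1
    fi

    cp "$src" "$dst"
}
```

This only improves the error message; the ordering problem between the template and the 'install-nginx' script still has to be solved separately.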

[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose

  1. I added all the commands (before it was just a shorter list of commands for testing to find the error). The bootstrap finishes, but nginx now fails to come up.
I, [2019-11-12T14:43:44.756483 #1]  INFO -- : > sed -i '/nginx/! s%^\(\s*\)[^#]*SecAuditLog \(.*\)%#\1SecAuditLog \2\n\1SecAuditLog /var/log/nginx/modsec_audit.log%' /etc/modsecurity/modsecurity.conf
I, [2019-11-12T14:43:44.765192 #1]  INFO -- :
I, [2019-11-12T14:43:44.773608 #1]  INFO -- : File > /etc/nginx/conf.d/modsecurity.include  chmod:   chown:
I, [2019-11-12T14:43:44.774845 #1]  INFO -- : Replacing (?-mix:server.+{) with server {
  modsecurity on;
  modsecurity_rules_file /etc/nginx/conf.d/modsecurity.include; in /etc/nginx/conf.d/discourse.conf
I, [2019-11-12T14:43:44.776540 #1]  INFO -- : > echo "Beginning of custom commands"
I, [2019-11-12T14:43:44.780945 #1]  INFO -- : Beginning of custom commands

I, [2019-11-12T14:43:44.781118 #1]  INFO -- : > echo "End of custom commands"
I, [2019-11-12T14:43:44.784763 #1]  INFO -- : End of custom commands

I, [2019-11-12T14:43:44.785048 #1]  INFO -- : Terminating async processes
I, [2019-11-12T14:43:44.785132 #1]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 49
I, [2019-11-12T14:43:44.785268 #1]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 166
2019-11-12 14:43:44.790 UTC [49] LOG:  received fast shutdown request
166:signal-handler (1573569824) Received SIGTERM scheduling shutdown...
2019-11-12 14:43:44.803 UTC [49] LOG:  aborting any active transactions
2019-11-12 14:43:44.826 UTC [49] LOG:  worker process: logical replication launcher (PID 58) exited with exit code 1
2019-11-12 14:43:44.826 UTC [53] LOG:  shutting down
166:M 12 Nov 2019 14:43:44.845 # User requested shutdown...
166:M 12 Nov 2019 14:43:44.845 * Saving the final RDB snapshot before exiting.
166:M 12 Nov 2019 14:43:44.932 * DB saved on disk
166:M 12 Nov 2019 14:43:44.932 # Redis is now ready to exit, bye bye...
2019-11-12 14:43:44.982 UTC [49] LOG:  database system is shut down
sha256:f43a4c400a22a43bc380dd6da2b1dfd6c4b19abc441357830aac0b972a12420f
b28b80fd026569e7ec05fad9fb6094b84bde6fc5e24ac7aa387f40b26acc14ee
Removing old container
+ /bin/docker rm discourse_ose
discourse_ose

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
09ffc71da6507d7100a5a7329d27893ca6128b31b2f8c9c650f49b9a201a975d

real    10m40.690s
user    0m2.805s
sys     0m2.466s
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# sv status nginx
down: nginx: 1s, normally up, want up
root@osestaging1-discourse-ose:/var/www/discourse# nginx -t
nginx: [emerg] unknown directive "modsecurity" in /etc/nginx/conf.d/discourse.conf:37
nginx: configuration file /etc/nginx/nginx.conf test failed
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. damn, nginx doesn't have ModSecurity. Did it even run 'install-nginx' at all?
root@osestaging1-discourse-ose:/var/www/discourse# nginx -V
nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6) 
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wno-deprecated-declarations' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_v2_module --with-http_sub_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads --add-module=/tmp/ngx_brotli
root@osestaging1-discourse-ose:/var/www/discourse#
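Eyeballing that long configure line is error-prone. A minimal sketch for checking it mechanically, assuming the module shows up as an `--add-module=` path ending in the module's directory name (as it does for ngx_brotli above). The function name is hypothetical:

```shell
# Succeeds if the given 'nginx -V' text lists an --add-module path whose last
# component is the named module directory.
#   $1: module directory name, e.g. ModSecurity-nginx
#   $2: the build info text to search
has_nginx_module() {
    module="$1"
    build_info="$2"
    # -e keeps grep from treating the leading "--" of the pattern as an option
    printf '%s\n' "$build_info" |
        grep -Eq -e "--add-module=[^ ]*/${module}( |\$)"
}
```

Since `nginx -V` writes to stderr, a call would look like `has_nginx_module ModSecurity-nginx "$(nginx -V 2>&1)"`.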
  1. it looks like 'install-nginx' is run by the Dockerfile, but it's still not clear why it didn't take my changes
[root@osestaging1 discourse]# grep -irl 'install-nginx' *
image/base/Dockerfile
[root@osestaging1 discourse]# grep 'install-nginx' image/base/Dockerfile 
ADD install-nginx /tmp/install-nginx
RUN /tmp/install-nginx
[root@osestaging1 discourse]# 
  1. I asked the community why the `launcher rebuild app` command might not be using my updated `install-nginx` script https://meta.discourse.org/t/how-to-run-discourse-in-apache-vhost-not-nginx/133112/15

Mon Nov 11, 2019

  1. the discourse devs responded again https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917/9
  2. one linked me to the source code for the "anonymous" caching in Discourse (I suppose they think that users that are not logged-in are anonymous). So I guess there is no documentation. The source code itself has almost 0 comments.
    1. the dev also admitted that I could cut down the load time from a few ms to fractions of a ms. It's not substantiated, but that's a worthwhile gain for a thundering herd making queries to a single server. In any case, I don't think it's necessary to back up the claim that a RAM cache from something tried & true sitting in front of an app like rails is obviously going to have immense gains.
  3. I was instructed to check out the howto articles on the discourse meta forums for writing plugins, but I wasn't given a link
  4. I was recommended to check out their github's plugins section, specifically the github plugin, which has logic in-place to respond to a new post to a topic.
  5. The devs re-iterated that they don't care about this since they already host huge customers. Of course they do. Of course you can host at almost an infinite scale when running elastically in the cloud. But you could run more efficiently on fewer nodes with a proper pre-backend cache..
  6. ...
  7. anyway, returning to my list of TODO items for this Discourse POC: backups are more important than varnish
  8. I was successfully able to trigger backups from within the staging node (host machine). It takes 20-30 seconds to run the backup now.
[root@osestaging1 discourse]# time docker exec app discourse backup
...
[SUCCESS]
Backup done.
Output file is in: /var/www/discourse/public/backups/default/discourse-2019-11-11-101834-v20191108000414.tar.gz


real    0m20.699s
user    0m0.074s
sys     0m0.084s
[root@osestaging1 discourse]# 
  1. and that creates a compressed tarball in /var/discourse/shared/standalone/backups/default/
[root@osestaging1 discourse]# ls -lah shared/standalone/backups/default/
total 46M
drwxr-xr-x. 2 tgriffing        33 4.0K Nov 11 10:18 .
drwxr-xr-x. 3 tgriffing        33 4.0K Nov  8 00:00 ..
-rw-r--r--. 1 tgriffing        33 6.8M Nov  8 03:31 discourse-2019-11-08-033129-v20191101113230.tar.gz
-rw-r--r--. 1 tgriffing tgriffing 9.6M Nov  8 12:22 discourse-2019-11-08-122241-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing tgriffing 9.8M Nov 11 10:15 discourse-2019-11-11-101518-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing tgriffing 9.8M Nov 11 10:16 discourse-2019-11-11-101616-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing tgriffing 9.8M Nov 11 10:18 discourse-2019-11-11-101834-v20191108000414.tar.gz
[root@osestaging1 discourse]# 
  1. I validated the contents of the discourse backup; it's just an sql file and an image uploads dir
[root@osestaging1 backup-test.20191111]# ls
discourse-2019-11-11-101834-v20191108000414.tar.gz
[root@osestaging1 backup-test.20191111]# tar -xzvf discourse-2019-11-11-101834-v20191108000414.tar.gz
dump.sql.gz
uploads/default/
uploads/default/original/
uploads/default/original/1X/
uploads/default/original/1X/e952cfd4c1bc58e77024e4c2b518531356319780.png
[root@osestaging1 backup-test.20191111]# cd ^C
[root@osestaging1 backup-test.20191111]# ls -lah
total 20M
drwxr-xr-x.  3 root      root      4.0K Nov 11 10:23 .
drwxrwxrwt. 42 root      root      4.0K Nov 11 10:23 ..
-rw-r--r--.  1 root      root      9.8M Nov 11 10:23 discourse-2019-11-11-101834-v20191108000414.tar.gz
-rw-r--r--.  1 tgriffing tgriffing 9.8M Nov 11 10:18 dump.sql.gz
drwxr-xr-x.  3 root      root      4.0K Nov 11 10:23 uploads
[root@osestaging1 backup-test.20191111]# gunzip dump.sql.gz 
[root@osestaging1 backup-test.20191111]# ls -lah
total 55M
drwxr-xr-x.  3 root      root      4.0K Nov 11 10:24 .
drwxrwxrwt. 42 root      root      4.0K Nov 11 10:23 ..
-rw-r--r--.  1 root      root      9.8M Nov 11 10:23 discourse-2019-11-11-101834-v20191108000414.tar.gz
-rw-r--r--.  1 tgriffing tgriffing  45M Nov 11 10:18 dump.sql
drwxr-xr-x.  3 root      root      4.0K Nov 11 10:23 uploads
[root@osestaging1 backup-test.20191111]# ls -lah uploads/
total 12K
drwxr-xr-x. 3 root      root 4.0K Nov 11 10:23 .
drwxr-xr-x. 3 root      root 4.0K Nov 11 10:24 ..
drwxr-xr-x. 3 tgriffing   33 4.0K Nov  7 11:56 default
[root@osestaging1 backup-test.20191111]# ls -lah uploads/default/
total 12K
drwxr-xr-x. 3 tgriffing   33 4.0K Nov  7 11:56 .
drwxr-xr-x. 3 root      root 4.0K Nov 11 10:23 ..
drwxr-xr-x. 3 tgriffing   33 4.0K Nov  7 11:56 original
[root@osestaging1 backup-test.20191111]# ls -lah uploads/default/original/
total 12K
drwxr-xr-x. 3 tgriffing 33 4.0K Nov  7 11:56 .
drwxr-xr-x. 3 tgriffing 33 4.0K Nov  7 11:56 ..
drwxr-xr-x. 2 tgriffing 33 4.0K Nov 10 14:17 1X
[root@osestaging1 backup-test.20191111]# ls -lah uploads/default/original/1X/
total 20K
drwxr-xr-x. 2 tgriffing 33 4.0K Nov 10 14:17 .
drwxr-xr-x. 3 tgriffing 33 4.0K Nov  7 11:56 ..
-rw-r--r--. 1 tgriffing 33  11K Nov  7 11:56 e952cfd4c1bc58e77024e4c2b518531356319780.png
[root@osestaging1 backup-test.20191111]# 
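The manual check above could be scripted without extracting anything. This is a minimal sketch, not something that was run on this server: the function name `verify_discourse_backup` is hypothetical, and treating a missing uploads/ dir as only a warning is my assumption.

```shell
#!/bin/bash
# Sketch: check that a Discourse backup tarball contains the two pieces seen
# in the listing above (dump.sql.gz and an uploads/ tree).
# The function name is hypothetical.
verify_discourse_backup() {
  local tarball="$1"
  local listing

  # list the archive without extracting it; fail if tar can't read it
  listing="$(tar -tzf "${tarball}")" || return 1

  # the database dump must be present at the top level of the archive
  printf '%s\n' "${listing}" | grep -qx 'dump.sql.gz' || return 2

  # ASSUMPTION: a backup with no uploads/ dir is suspicious but not fatal
  if ! printf '%s\n' "${listing}" | grep -q '^uploads/'; then
    echo "WARNING: no uploads/ dir in ${tarball}" >&2
  fi
  return 0
}
```

Usage would be something like `verify_discourse_backup /var/discourse/shared/standalone/backups/default/<file>.tar.gz`; exit code 1 means tar couldn't read the file, and 2 means the sql dump is missing.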
  1. so this is script-able, but currently the container name is set to 'app'. Let's rename 'app' to 'discourse_ose' first. This meta discourse topic says to do a destroy & mv the yml file & rebuild https://meta.discourse.org/t/how-to-rename-app-yml-to-mycompanyname-yml-after-deploy/42625/2
[root@osestaging1 discourse]# pwd
/var/discourse
[root@osestaging1 discourse]# ls containers/
app.yml                        app.yml.2019-10-28-122901.bak  app.yml.2019-10-28-124114.bak  app.yml.2019-11-07-112654.bak
app.yml.2019-10-28-121933.bak  app.yml.2019-10-28-123029.bak  app.yml.2019-10-28-130503.bak  app.yml.2019-11-07-114148.bak
app.yml.2019-10-28-122342.bak  app.yml.2019-10-28-123104.bak  app.yml.2019-10-28-131830.bak  app.yml.2019-11-07-124719.bak
app.yml.2019-10-28-122534.bak  app.yml.2019-10-28-123312.bak  app.yml.2019-11-07-094441.bak  app.yml.2019-11-07-124745.bak
app.yml.2019-10-28-122816.bak  app.yml.2019-10-28-123827.bak  app.yml.2019-11-07-104530.bak  app.yml.2019-11-07-130735.bak
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS               NAMES
8d66b8c8ccce        local_discourse/app   "/sbin/boot"        11 seconds ago      Up 9 seconds                            app
[root@osestaging1 discourse]# ./launcher destroy app
+ /bin/docker stop -t 10 app
app
+ /bin/docker rm app
app
[root@osestaging1 discourse]# mv containers/app.yml containers/discourse_ose.yml
[root@osestaging1 discourse]# time ./launcher rebuild discourse_ose
...
+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-discourse-ose -e DOCKER_HOST_IP=172.17.0.1 --name discourse_ose -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:fc:97:b8:b4:0d local_discourse/discourse_ose /sbin/boot
f01f52d2dcbaae157b75f2a43732b7b1e2d4125cd71103b841d1e36d768658b8

real    10m14.414s
user    0m2.779s
sys     0m2.386s
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE                           COMMAND             CREATED             STATUS              PORTS               NAMES
f01f52d2dcba        local_discourse/discourse_ose   "/sbin/boot"        51 seconds ago      Up 50 seconds                           discourse_ose
[root@osestaging1 discourse]# 
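The three rename steps above (destroy, mv, rebuild) could be wrapped up roughly as follows. This is a sketch under assumptions: `LAUNCHER_DIR`, `DRY_RUN`, `run`, and `rename_discourse_container` are hypothetical names, and only the three commands themselves come from the session above.

```shell
#!/bin/bash
# Sketch of the rename steps above. Set DRY_RUN=1 to only print the commands.
# LAUNCHER_DIR, DRY_RUN, and both function names are hypothetical.

# print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rename_discourse_container() {
  local old="$1" new="$2"
  (
    # the log runs ./launcher from inside the checkout, so do the same
    if [ "${DRY_RUN:-0}" != "1" ]; then
      cd "${LAUNCHER_DIR:-/var/discourse}" || exit 1
    fi

    # refuse to overwrite an existing container definition
    if [ -e "containers/${new}.yml" ]; then
      echo "ERROR: containers/${new}.yml already exists" >&2
      exit 1
    fi

    run ./launcher destroy "${old}" || exit 1
    run mv "containers/${old}.yml" "containers/${new}.yml" || exit 1
    run ./launcher rebuild "${new}"
  )
}
```

Running `DRY_RUN=1 rename_discourse_container app discourse_ose` first prints the three commands without touching the container. Note that the real rebuild took about 10 minutes above, and the forum is down for that whole window.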
  1. backups are still present; good
[root@osestaging1 discourse]# ls -lah shared/standalone/backups/default/
total 46M
drwxr-xr-x. 2 tgriffing 33 4.0K Nov 11 10:18 .
drwxr-xr-x. 3 tgriffing 33 4.0K Nov  8 00:00 ..
-rw-r--r--. 1 tgriffing 33 6.8M Nov  8 03:31 discourse-2019-11-08-033129-v20191101113230.tar.gz
-rw-r--r--. 1 tgriffing 33 9.6M Nov  8 12:22 discourse-2019-11-08-122241-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing 33 9.8M Nov 11 10:15 discourse-2019-11-11-101518-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing 33 9.8M Nov 11 10:16 discourse-2019-11-11-101616-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing 33 9.8M Nov 11 10:18 discourse-2019-11-11-101834-v20191108000414.tar.gz
[root@osestaging1 discourse]# 
  1. and I can still kick-off backups using the new name 'discourse_ose' instead of 'app'
[root@osestaging1 discourse]# time docker exec discourse_ose discourse backup
...
[SUCCESS]
Backup done.
Output file is in: /var/www/discourse/public/backups/default/discourse-2019-11-11-110452-v20191108000414.tar.gz


real    0m37.206s
user    0m0.065s
sys     0m0.051s
[root@osestaging1 discourse]# ls -lah shared/standalone/backups/default/
total 50M
drwxr-xr-x. 2 tgriffing        33 4.0K Nov 11 11:04 .
drwxr-xr-x. 3 tgriffing        33 4.0K Nov  8 00:00 ..
-rw-r--r--. 1 tgriffing        33 9.6M Nov  8 12:22 discourse-2019-11-08-122241-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing        33 9.8M Nov 11 10:15 discourse-2019-11-11-101518-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing        33 9.8M Nov 11 10:16 discourse-2019-11-11-101616-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing        33 9.8M Nov 11 10:18 discourse-2019-11-11-101834-v20191108000414.tar.gz
-rw-r--r--. 1 tgriffing tgriffing  12M Nov 11 11:04 discourse-2019-11-11-110452-v20191108000414.tar.gz
[root@osestaging1 discourse]# 
  1. unfortunately the staging node doesn't actually have /root/backups copied from prod (this was intentional; all of /root on prod wasn't copied). Anyway, here's basically what I *should* add to /root/backups/backup.sh (untested)
#############
# DISCOURSE #
#############

# cleanup old backups
$NICE $RM -rf /var/discourse/shared/standalone/backups/default/*.tar.gz
time $NICE $DOCKER exec discourse_ose discourse backup
$MKDIR "${backupDirPath}/${archiveDirName}/discourse_ose"
$NICE $MV /var/discourse/shared/standalone/backups/default/*.tar.gz "${backupDirPath}/${archiveDirName}/discourse_ose/"

  1. ...
  2. ok, so now I want to harden Discourse. The first logical step is to prevent Discourse from being able to initiate connections to the Internet. As a web server, it should only be able to *respond* to queries--not initiate them like some piece of malware. We do this with our other apps using iptables to permit RELATED,ESTABLISHED connections through the OUTPUT chain, but then blocking all other traffic from specific uids--such as the nginx & apache users.
  3. iptables on the host machine seems like the best solution for docker as well, but I'm not sure if docker containers are run as a specific uid. It looks like the daemon is run as root. The 'docker' and 'dockerd' names in the `ss` output below are process names, not users (the `users:` field lists the processes holding each socket), so it's no surprise that they don't exist in /etc/passwd. The only docker-related account there is 'dockerroot'.
[root@osestaging1 backup-test.20191111]# ps -ef | grep -i docker
root      2739     1  0 12:10 ?        00:00:00 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
root      2758  1689  0 12:10 ?        00:00:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/f01f52d2dcbaae157b75f2a43732b7b1e2d4125cd71103b841d1e36d768658b8 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
root      3414 24689  0 12:13 pts/5    00:00:00 /bin/docker exec -it discourse_ose /bin/bash --login
root      4678 23960  0 12:16 pts/8    00:00:00 grep --color=auto -i docker
[root@osestaging1 backup-test.20191111]# ss -plan | grep -i docker
u_str  LISTEN     0      128    /var/run/docker/libnetwork/8d9f7d64fe798ffae545b7cf474d68140ccb5d56dda86273c3dd44cb10d6b2ff.sock 469029893             * 0                   users:(("dockerd",pid=2739,fd=14))
u_str  LISTEN     0      128    /var/run/docker.sock 469029709             * 0                   users:(("dockerd",pid=2739,fd=6),("systemd",pid=1,fd=24))
u_str  LISTEN     0      128    /var/run/docker/metrics.sock 469029865             * 0                   users:(("dockerd",pid=2739,fd=3))
u_str  ESTAB      0      0         * 469029851             * 469029852           users:(("dockerd",pid=2739,fd=2),("dockerd",pid=2739,fd=1))
u_str  ESTAB      0      0         * 469115654             * 469115655           users:(("docker",pid=3414,fd=3))
u_str  ESTAB      0      0         * 469115657             * 469115658           users:(("docker",pid=3414,fd=5))
u_dgr  UNCONN     0      0         * 469029861             * 370237218           users:(("dockerd",pid=2739,fd=4))
u_str  ESTAB      0      0      /var/run/docker.sock 469115655             * 469115654           users:(("dockerd",pid=2739,fd=25))
u_str  ESTAB      0      0      /var/run/docker.sock 469115658             * 469115657           users:(("dockerd",pid=2739,fd=26))
u_str  ESTAB      0      0         * 469029868             * 469029875           users:(("dockerd",pid=2739,fd=7))
u_str  ESTAB      0      0         * 469029877             * 469029878           users:(("dockerd",pid=2739,fd=8))
[root@osestaging1 backup-test.20191111]# grep -ir 'docker' /etc/passwd
dockerroot:x:989:985:Docker User:/var/lib/docker:/sbin/nologin
[root@osestaging1 backup-test.20191111]# 
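For reference, the per-uid approach we use for the other apps boils down to one ACCEPT for established traffic plus one DROP per uid. The sketch below only *prints* rules in iptables-save syntax (the same form as the rules in the dumps in this log) and applies nothing; the function name is hypothetical. The owner match works on packets generated by local processes on the host, so whether it can do anything for container traffic is exactly the open question in the item above.

```shell
#!/bin/bash
# Sketch: print (not apply) the per-uid OUTPUT rules described above, in
# iptables-save syntax. The function name is hypothetical.
print_owner_drop_rules() {
  local uid

  # replies on already-established connections stay allowed
  echo "-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT"

  for uid in "$@"; do
    # only accept numeric uids; skip anything else with a warning
    case "${uid}" in
      ''|*[!0-9]*)
        echo "skipping non-numeric uid: ${uid}" >&2
        continue
        ;;
    esac
    # new outbound tcp connections from this uid are dropped
    echo "-A OUTPUT -p tcp -m owner --uid-owner ${uid} -j DROP"
  done
}
```

For example, `print_owner_drop_rules 48 27` prints the ACCEPT line followed by a DROP line for each of uids 48 and 27.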
  1. it's also worth pointing out that docker already fucked-up all our iptables rules by injecting some chains and rules of its own: not ideal. There also appears to be no way to tell docker to clean that shit up. I would *expect* stopping docker to do that, but it doesn't. Google says that people just had to manually delete the rules to get them to go away. So I did that. https://stackoverflow.com/questions/50084582/cant-delete-docker-containers-default-iptables-rule
[root@osestaging1 lib]# iptables-save
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:09:41 2019
*mangle
:PREROUTING ACCEPT [10080:1184525]
:INPUT ACCEPT [10080:1184525]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [7512:1518444]
:POSTROUTING ACCEPT [7506:1518084]
COMMIT
# Completed on Mon Nov 11 12:09:41 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:09:41 2019
*nat
:PREROUTING ACCEPT [1:52]
:INPUT ACCEPT [1:52]
:OUTPUT ACCEPT [140:9693]
:POSTROUTING ACCEPT [134:9333]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Nov 11 12:09:41 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:09:41 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A INPUT -p tcp -m state --state NEW -m tcp --dport 25 -j ACCEPT
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4444 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 8020 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables IN denied: " --log-level 7
-A INPUT -j DROP
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
-A OUTPUT -d 213.133.98.98/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.99.99/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.100.100/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables OUT denied: " --log-level 7
-A OUTPUT -p tcp -m owner --uid-owner 48 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 27 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 995 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 994 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 993 -j DROP
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Nov 11 12:09:41 2019
[root@osestaging1 lib]# cd /root/backups/iptables/20191111
[root@osestaging1 20191111]# iptables-save > iptablesa
[root@osestaging1 20191111]# cp iptablesa iptablesb
[root@osestaging1 20191111]# vim iptablesb
[root@osestaging1 20191111]# iptables-restore < iptablesb
[root@osestaging1 20191111]# iptables-save
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:10:17 2019
*mangle
:PREROUTING ACCEPT [129:14009]
:INPUT ACCEPT [129:14009]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [107:12837]
:POSTROUTING ACCEPT [107:12837]
COMMIT
# Completed on Mon Nov 11 12:10:17 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:10:17 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Mon Nov 11 12:10:17 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:10:17 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m state --state NEW -m tcp --dport 25 -j ACCEPT
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4444 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 8020 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables IN denied: " --log-level 7
-A INPUT -j DROP
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
-A OUTPUT -d 213.133.98.98/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.99.99/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.100.100/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables OUT denied: " --log-level 7
-A OUTPUT -p tcp -m owner --uid-owner 48 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 27 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 995 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 994 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 993 -j DROP
COMMIT
# Completed on Mon Nov 11 12:10:17 2019
[root@osestaging1 20191111]# 
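The vim edit above removed every line that mentioned a DOCKER chain or the docker0 bridge. A filter like the one below should produce the same result for this particular dump. It's a sketch: the function name is hypothetical, and it assumes no unrelated rule contains either string, so the output should still be diffed against the original before restoring.

```shell
#!/bin/bash
# Sketch: drop every line of an iptables-save dump that mentions docker's
# chains or the docker0 bridge. Reads stdin, writes stdout.
# The function name is hypothetical.
strip_docker_rules() {
  # -v inverts the match, so only the non-docker lines are kept
  grep -v -E 'DOCKER|docker0'
}
```

Usage would be along the lines of `strip_docker_rules < iptablesa > iptablesb`, then `diff iptablesa iptablesb` to review, then `iptables-restore < iptablesb`.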
  1. I read that I can prevent docker from injecting rules into iptables by using the 'iptables=false' argument, but it's unclear where to put this value https://docs.docker.com/network/iptables/
    1. the above doc says to put it in /etc/docker/daemon.json, but that file does not exist
    2. other places say to define DOCKER_OPTS. I could find no place to put this in /var/discourse
    3. other places said /etc/default/docker; that doesn't exist either
    4. other places said /etc/sysconfig/docker; that doesn't exist either
  2. finally, I found /usr/lib/systemd/system/docker.service that defines 'ExecStart'. I added the '--iptables=false' arg to that line
[root@osestaging1 system]# cat docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
[root@osestaging1 system]# 
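One caveat with editing /usr/lib/systemd/system/docker.service directly is that a package update can replace that file. A systemd drop-in is the usual way to override ExecStart without touching the packaged unit. The sketch below was not tested on this server; the function name and the `iptables.conf` file name are hypothetical, and the ExecStart line is copied from the unit file above.

```shell
#!/bin/bash
# Sketch: write a systemd drop-in that overrides docker's ExecStart instead of
# editing the packaged unit file. The function name and the drop-in file name
# are hypothetical; the ExecStart line is copied from the unit file above.
write_docker_dropin() {
  local dropin_dir="${1:-/etc/systemd/system/docker.service.d}"

  mkdir -p "${dropin_dir}" || return 1

  # quoted heredoc delimiter: nothing inside is expanded by the shell
  cat > "${dropin_dir}/iptables.conf" <<'EOF'
[Service]
# an empty ExecStart= clears the packaged value before setting the new one
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --iptables=false
EOF
}
```

After `write_docker_dropin`, the change would still need `systemctl daemon-reload` and a docker restart to take effect. If this route were taken, the packaged unit file should go back to its original ExecStart line so the flag is defined in only one place.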
  1. note that I had to reload the systemd daemon to apply this change. when I restarted docker, it mangled our iptables *less*, but it still added 3x lines
[root@osestaging1 system]# systemctl daemon-reload
[root@osestaging1 system]# systemctl start docker
[root@osestaging1 system]# docker ps
CONTAINER ID        IMAGE                           COMMAND             CREATED             STATUS              PORTS               NAMES
f01f52d2dcba        local_discourse/discourse_ose   "/sbin/boot"        2 hours ago         Up 4 seconds                            discourse_ose
[root@osestaging1 system]# iptables-save
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:28:47 2019
*mangle
:PREROUTING ACCEPT [7821:1020843]
:INPUT ACCEPT [7810:1020199]
:FORWARD ACCEPT [11:644]   
:OUTPUT ACCEPT [6531:1681597]
:POSTROUTING ACCEPT [6530:1681521]
COMMIT
# Completed on Mon Nov 11 12:28:47 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:28:47 2019
*nat
:PREROUTING ACCEPT [11:584]
:INPUT ACCEPT [9:468]
:OUTPUT ACCEPT [153:10197] 
:POSTROUTING ACCEPT [143:9593]
COMMIT
# Completed on Mon Nov 11 12:28:47 2019
# Generated by iptables-save v1.4.21 on Mon Nov 11 12:28:47 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1:76]
:DOCKER-USER - [0:0]
-A INPUT -p tcp -m state --state NEW -m tcp --dport 25 -j ACCEPT
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT   
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT 
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 4444 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 8020 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables IN denied: " --log-level 7
-A INPUT -j DROP
-A FORWARD -j DOCKER-USER  
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
-A OUTPUT -d 213.133.98.98/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.99.99/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -d 213.133.100.100/32 -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "iptables OUT denied: " --log-level 7
-A OUTPUT -p tcp -m owner --uid-owner 48 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 27 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 995 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 994 -j DROP
-A OUTPUT -p tcp -m owner --uid-owner 993 -j DROP
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Nov 11 12:28:47 2019
[root@osestaging1 system]#
  1. now when I enter the docker container, it cannot curl google.com (or the ip address for google.com). success!
[root@osestaging1 discourse]# ./launcher enter discourse_ose
root@osestaging1-discourse-ose:/var/www/discourse# curl google.com
curl: (6) Could not resolve host: google.com
root@osestaging1-discourse-ose:/var/www/discourse# curl 216.58.207.78
curl: (7) Failed to connect to 216.58.207.78 port 80: Connection timed out
root@osestaging1-discourse-ose:/var/www/discourse# 
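The curl test above could be repeated from the host after every rebuild. This is a rough sketch: the function name is hypothetical, and `DOCKER` is just an overridable path to the docker binary (same idea as the `$DOCKER` variable in the backup.sh snippet). One limitation is that a failed `docker exec` (e.g. the container isn't running) looks the same as a blocked connection.

```shell
#!/bin/bash
# Sketch: confirm from the host that a container cannot initiate outbound
# connections. The function name is hypothetical. DOCKER may be set to
# override the docker binary.
check_egress_blocked() {
  local container="$1" target="$2"

  # curl exits non-zero when it can't resolve or connect within the timeout.
  # NOTE: a failing `docker exec` also lands in the "OK" branch.
  if "${DOCKER:-docker}" exec "${container}" \
       curl --silent --output /dev/null --max-time 10 "${target}"; then
    echo "FAIL: ${container} reached ${target}"
    return 1
  fi
  echo "OK: ${container} could not reach ${target}"
}
```

For example, `check_egress_blocked discourse_ose http://216.58.207.78` should report OK with the firewall state shown above.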
  1. meanwhile, I confirmed that I can still browse around on the discourse wui without issues; this is probably because the host server has an nginx config mapped to a socket file--preventing the need for me to poke a hole in the firewall for RELATED,ESTABLISHED connections. svveet.
  2. the next step for hardening is getting a WAF. We use mod_security and the OWASP CRS for all our other sites, but all our other sites' backends are running apache. Unfortunately, getting mod_security setup in Nginx (which is what Discourse runs as in docker) requires compiling Nginx from source D:
  3. A search for 'apache' in the meta.discourse.org forums shows a lot of info on how to run discourse on a server with apache already running. I already followed these guides to get Discourse to listen on a socket instead of a port to avoid port binding conflicts. Other topics on the forums are guides to run apache on the host that proxy back to Nginx. I couldn't find anyone who actually got Discourse to run as Apache.
  4. I created a new topic asking if anyone has actually gotten the docker backend running Discourse to run in an Apache vhost. Hopefully someone has already translated the Nginx config into Apache so I can just copy their config https://meta.discourse.org/t/how-to-run-discourse-in-apache-vhost-not-nginx/133112
  5. good lord, it looks like Discourse may already be compiling nginx from source?? https://github.com/discourse/discourse_docker/blob/416467f6ead98f82342e8a926dc6e06f36dfbd56/image/base/install-nginx
    1. if this is confirmed, then I should ask the community which is least likely to break my Discourse config in the future: updating the above install-nginx script to include mod_security or updating the container's config to run Discourse behind Apache instead of Nginx
  6. ok, so it looks like $home is /var/www/discourse, which is itself a git clone of the main 'discourse/discourse' repo https://github.com/discourse/discourse/
root@osestaging1-discourse-ose:/var/www/discourse# cat .git/config
[core]
		repositoryformatversion = 0
		filemode = true
		bare = false
		logallrefupdates = true
[remote "origin"]
		url = https://github.com/discourse/discourse.git
		fetch = +refs/heads/*:refs/remotes/origin/*
		fetch = +refs/heads/tests-passed:refs/remotes/origin/tests-passed
		fetch = +refs/heads/master:refs/remotes/origin/master
[branch "master"]
		remote = origin
		merge = refs/heads/master
[branch "tests-passed"]
		remote = origin
		merge = refs/heads/tests-passed
root@osestaging1-discourse-ose:/var/www/discourse# 
  1. that repo appears to be created (git cloned) at the end of this Dockerfile execution from the 'discourse_docker' repo that lives on the host at /var/discourse https://github.com/discourse/discourse_docker/blob/ceffc4433e1bd6fcbd101f2427e17232fc99ab14/image/base/Dockerfile#L132
[root@osestaging1 discourse]# cd /var/discourse/
[root@osestaging1 discourse]# cat .git/config
[core]
		repositoryformatversion = 0
		filemode = true
		bare = false
		logallrefupdates = true
[remote "origin"]
		url = https://github.com/discourse/discourse_docker.git
		fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
		remote = origin
		merge = refs/heads/master
[root@osestaging1 discourse]# ls -lah image/base/Dockerfile 
-rw-r--r--. 1 root root 5.6K Oct 28 12:07 image/base/Dockerfile
[root@osestaging1 discourse]# 
  1. so then $home (in the template yamls) appears to be equal to the '/var/www/discourse/' dir in the container, and then "$home/config/nginx.sample.conf" is '/var/www/discourse/config/nginx.sample.conf', which is then the file in the 'discourse' repo here https://github.com/discourse/discourse/blob/1d1dd2a4d436944a7b088f2d4a471c62b8fa4de2/config/nginx.sample.conf
  2. well, I wanted to avoid compiling nginx from source, but if Discourse is *already* doing that, then it's probably easier to just update that relatively simple 'install-nginx' script to add-in the mod_security module to the container's build of nginx rather than translate all the nginx vhost configs to apache
  3. so then the next steps for getting a WAF in-front of Discourse would be to
    1. update the install-nginx script so that it compiles nginx with mod_security (and probably downloads the OWASP CRS as well) https://github.com/discourse/discourse_docker/blob/416467f6ead98f82342e8a926dc6e06f36dfbd56/image/base/install-nginx
    2. add a new templates/web.modsecurity.yml file that updates the /etc/nginx/conf.d/discourse.conf file to enable mod_security (and add some blacklisted rules as-needed), similar to the existing web.socketed.template.yml file https://github.com/discourse/discourse_docker/blob/416467f6ead98f82342e8a926dc6e06f36dfbd56/templates/web.socketed.template.yml

Sun Nov 10, 2019

  1. I'm still getting a lot of friction from the Discourse team; they don't seem to understand front-end caching at all. But at least they are actively responding https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917/9
  2. I was informed that there is a caching layer built-into Discourse for users that are not logged-in. I asked for a link to the documentation, but I'm afraid that it may not exist.
  3. I was chastised by one of their staff testers for not understanding how Discourse works...but I still can't find any documentation that describes how it works *shrug*
    1. in any case, they said I was wrong when I stated that Discourse produces HTML, CSS, & JS and sends that to a browser. What? That's literally what Discourse does, and all of it can be cached by varnish.
      1. this person seems to think I want to cache logged-in users. I'm not sure why they would assume that...I clarified that people usually don't cache logged-in users. I don't think the Discourse team has ever worked with caching web apps before...
  4. so I'm hoping I get 2x responses:
    1. links to documentation on what caching Discourse does built-in
    2. links to documentation on how I can write a Discourse plugin to call a function that I write when a new post is added to a topic

Sat Nov 09, 2019

  1. oh boy, a founder of the Discourse project responded to my question about how to have the Discourse app send PURGE requests to our varnish caching layer on content changes. He erroneously suggested that caching doesn't make sense for today's JS apps--as if Discourse's function isn't to produce HTML, CSS, and JS (all of which can be cached!). This is *not* good. https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917
  2. I did some research into scaling Discourse. I got a ton of info in this thread. Important to note is that they recommend running redis & postgresql *outside* of docker; then just elastically scaling the ruby docker containers as needed based on some calculations https://meta.discourse.org/t/performance-scaling-and-ha-requirements/60098/8
    1. it makes a point to note that these calculations are based on read operations, which again suggests that we could significantly reduce our hardware requirements by putting a fucking varnish cache before the app. That's not surprising...
  3. Unfortunately, it looks like I'd have to write a damn plugin for Discourse to get it to invalidate the cache. And, worse, I'd have no support from their development team that doesn't see the point in adding a cache before their app. Is it worth it?

Fri Nov 08, 2019

  1. I got a response from the discourse community on my query about having discourse send email through the unauthenticated smtp server running on the same host as discourse https://meta.discourse.org/t/troubleshooting-email-on-a-new-discourse-install/16326
  2. I was errornously told that my config is rare (it's literally the default postfix config on rhel/centos) and that it isn't supported using the 'discourse-setup' install script *facepalm*
  3. I really, really want to abort this Discourse POC. Their install tools are shit. Their community and documentation is shit & non-existant. Their developers are necessarily ruby shit. It's unnecessarily complex to the point that the most basic smtp config breaks its default install (localhost:25 without auth. could it get any simpler?!?). This isn't a problem for most open source web projects; they just connect to localhost:25 and send email without any configuration. They just work! But not Discourse...Alas, everyone is using Discourse now. And it still, from a user perspective, seems to be the best tool. It'll just be a huge pain to install & manage *sign*
  4. anyway, continuing from yesterday, the only port that's visible on the docker host from within the discourse docker container is 10000
root@osestaging1-app:/# nmap 172.17.0.1
Starting Nmap 7.70 ( https://nmap.org ) at 2019-11-07 15:00 UTC
Nmap scan report for 172.17.0.1
Host is up (0.000019s latency).
Not shown: 999 closed ports
PORT      STATE SERVICE
10000/tcp open  snet-sensor-mgmt
MAC Address: 02:42:80:35:65:A1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 1.85 seconds
root@osestaging1-app:/#
  1. I checked the host, and there is something listening on port 10000
[root@osestaging1 20191107]# ss -plan | grep -i 1000
udp    UNCONN     0      0         *:10000                 *:*                   users:(("miniserv.pl",pid=620,fd=6))
tcp    LISTEN     0      128       *:10000                 *:*                   users:(("miniserv.pl",pid=620,fd=5))
[root@osestaging1 20191107]# 
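When nmap isn't installed inside the container, the same reachability check can be sketched with a plain TCP connect. This is only an illustration: `tcp_port_open` is a hypothetical helper, and it tests one port at a time rather than scanning a range like nmap does.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False
```

From inside the container, `tcp_port_open("172.17.0.1", 10000)` and `tcp_port_open("172.17.0.1", 25)` would correspond to the two ports of interest here.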
  1. smtp, however, is only listening on 127.0.0.1. This is ideal; all servers on prod were setup to bind to a specific IP; not all interfaces. I guess we'll have to add the docker interface for postfix, though
  2. I updated the staging server's postfix config to use inet_interfaces "127.0.0.1, 172.17.0.1" instead of just "localhost"
#inet_interfaces = localhost
inet_interfaces = 127.0.0.1, 172.17.0.1
  1. I restarted the postfix service & confirmed the changes
[root@osestaging1 postfix]# service postfix restart
Redirecting to /bin/systemctl restart postfix.service
[root@osestaging1 postfix]#
[root@osestaging1 postfix]# ss -plan | grep -i ':25' | grep -i LISTEN
tcp    LISTEN     0      100    172.17.0.1:25                    *:*                   users:(("master",pid=27738,fd=14))
tcp    LISTEN     0      100    127.0.0.1:25                    *:*                   users:(("master",pid=27738,fd=13))
[root@osestaging1 postfix]#
  1. cool, that worked. now my Discourse docker instance can see an open port 25 on the staging server
root@osestaging1-app:/# nmap 172.17.0.1
Starting Nmap 7.70 ( https://nmap.org ) at 2019-11-08 07:57 UTC
Nmap scan report for 172.17.0.1
Host is up (0.000035s latency).
Not shown: 998 closed ports
PORT      STATE SERVICE
25/tcp    open  smtp
10000/tcp open  snet-sensor-mgmt
MAC Address: 02:42:80:35:65:A1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 1.56 seconds
root@osestaging1-app:/# 
  1. I tested this access via telnet, but it was rejected by the mail server
root@osestaging1-app:/# telnet 172.17.0.1 25
Trying 172.17.0.1...
Connected to 172.17.0.1.
Escape character is '^]'.
220 mailer.opensourceecology.org ESMTP Postfix
HELO from osestaging1-app.opensourceecology.org
250 mailer.opensourceecology.org
mail from: discourse@opensourceecology.org
250 2.1.0 Ok
rcpt to: michael@opensourceecology.org
454 4.7.1 <michael@opensourceecology.org>: Relay access denied
  1. the postfix log at /var/log/maillog on the staging server shows that it's rejected. That's probably because the docker IP is not in the 'mynetworks'
[root@osestaging1 nginx]# tail -f /var/log/maillog
...
Nov  8 08:02:15 osestaging1 postfix/smtpd[28964]: connect from unknown[172.17.0.2]
Nov  8 08:02:52 osestaging1 postfix/smtpd[28964]: NOQUEUE: reject: RCPT from unknown[172.17.0.2]: 454 4.7.1 <michael@opensourceecology.org>: Relay access denied; from=<discourse@opensourceecology.org> to=<michael@opensourceecology.org> proto=SMTP helo=<from?osestaging1-app.opensourceecology.org>
  1. I updated the mynetworks_style && mynetworks variables in postfix's /etc/postfix/main.cf config to include the docker0 subnet
[root@osestaging1 postfix]# ip address show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
	link/ether 02:42:80:35:65:a1 brd ff:ff:ff:ff:ff:ff
	inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
	   valid_lft forever preferred_lft forever
	inet6 fe80::42:80ff:fe35:65a1/64 scope link 
	   valid_lft forever preferred_lft forever
[root@osestaging1 postfix]# cat /etc/postfix/main.cf
...
#mynetworks_style = host
...
mynetworks = 127.0.0.0/8, 172.17.0.0/16
...
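The "Relay access denied" above comes down to whether the client address falls inside one of the mynetworks ranges. A small sketch of that membership test, assuming only plain CIDR entries separated by commas or whitespace (real Postfix also accepts file paths, lookup tables and `!` exclusions, which this ignores); `in_mynetworks` is a hypothetical name.

```python
import ipaddress
import re


def in_mynetworks(client_ip, mynetworks):
    """Return True if client_ip is covered by any CIDR block in mynetworks."""
    addr = ipaddress.ip_address(client_ip)
    # Split the main.cf value on commas and/or whitespace.
    blocks = [b for b in re.split(r"[,\s]+", mynetworks.strip()) if b]
    return any(addr in ipaddress.ip_network(b) for b in blocks)


# The container connects from 172.17.0.2 (see the maillog entries above).
print(in_mynetworks("172.17.0.2", "127.0.0.0/8"))
print(in_mynetworks("172.17.0.2", "127.0.0.0/8, 172.17.0.0/16"))
```

The first call models the config before the change (loopback only), the second the new value that includes the docker0 subnet.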
  1. this appears to work now
root@osestaging1-app:/# telnet 172.17.0.1 25
Trying 172.17.0.1...
Connected to 172.17.0.1.
Escape character is '^]'.
220 mailer.opensourceecology.org ESMTP Postfix
HELO from osestaging1-app.opensourceecology.org
250 mailer.opensourceecology.org
mail from: discourse@opensourceecology.org
250 2.1.0 Ok
rcpt to: vt6t5up@mail.ru
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
subject: test
Hi, this is a test
can you see it?
.
250 2.0.0 Ok: queued as 434F25E2279
QUIT
221 2.0.0 Bye
Connection closed by foreign host.
root@osestaging1-app:/# 
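The manual telnet session can be approximated with Python's smtplib, which makes the test repeatable. This is a sketch under assumptions: the relay address 172.17.0.1:25 comes from the log, the recipient in the usage note is a placeholder, and unlike the telnet test this version adds proper From/To/Subject headers.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small test mail with the same body as the telnet session."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "test"
    msg.set_content("Hi, this is a test\ncan you see it?\n")
    return msg


def send_via_relay(msg, host="172.17.0.1", port=25, timeout=10):
    """Hand the message to the relay without auth or TLS.

    smtplib raises SMTPRecipientsRefused when every recipient is rejected,
    e.g. on "454 4.7.1 Relay access denied".
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

Usage from inside the container would be along the lines of `send_via_relay(build_test_message("discourse@opensourceecology.org", "someone@example.org"))`.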
  1. postfix accepted it, but the logs show mail.ru rejected my mail as spam
Nov  8 08:14:55 osestaging1 postfix/cleanup[31776]: 434F25E2279: message-id=<>
Nov  8 08:14:55 osestaging1 postfix/qmgr[31738]: 434F25E2279: from=<discourse@opensourceecology.org>, size=268, nrcpt=1 (queue active)
Nov  8 08:14:56 osestaging1 postfix/smtp[31842]: 434F25E2279: to=<vt6t5up@mail.ru>, relay=mxs.mail.ru[94.100.180.31]:25, delay=67, delays=66/0.03/0.16/1, dsn=5.0.0, status=bounced (host mxs.mail.ru[94.100.180.31] said: 550 spam message rejected. Please visit http://help.mail.ru/notspam-support/id?c=clN2o_Yz744aT6kkUmr_G_iDdemyJDJvS9gQkItq7hI-s9yFa787o8FmNc74dWmdKwAAAJhXAADjzkkU or  report details to abuse@corp.mail.ru. Error code: A37653728EEF33F624A94F1A1BFF6A52E97583F86F3224B29010D84B12EE6A8B85DCB33EA33BBF6BCE3566C19D6975F8. ID: 0000002B000057981449CEE3. (in reply to end of DATA command))
Nov  8 08:14:56 osestaging1 postfix/cleanup[31776]: D2DAC5E227D: message-id=<20191108081456.D2DAC5E227D@mailer.opensourceecology.org>
Nov  8 08:14:56 osestaging1 postfix/qmgr[31738]: D2DAC5E227D: from=<>, size=2976, nrcpt=1 (queue active)
Nov  8 08:14:56 osestaging1 postfix/bounce[31844]: 434F25E2279: sender non-delivery notification: D2DAC5E227D
Nov  8 08:14:56 osestaging1 postfix/qmgr[31738]: 434F25E2279: removed
Nov  8 08:14:57 osestaging1 postfix/smtp[31842]: D2DAC5E227D: to=<discourse@opensourceecology.org>, relay=aspmx.l.google.com[74.125.140.27]:25, delay=0.63, delays=0.01/0/0.13/0.5, dsn=2.0.0, status=sent (250 2.0.0 OK  1573200896 l8si4236714wmg.78 - gsmtp)
Nov  8 08:14:57 osestaging1 postfix/qmgr[31738]: D2DAC5E227D: removed  
  1. I tried again, sending to my personal domain. And it worked!
Nov  8 08:21:22 osestaging1 postfix/smtpd[2027]: connect from unknown[172.17.0.2]
Nov  8 08:22:10 osestaging1 postfix/smtpd[2027]: 8168E5E2279: client=unknown[172.17.0.2]
Nov  8 08:22:25 osestaging1 postfix/cleanup[31776]: 8168E5E2279: message-id=<>
Nov  8 08:22:25 osestaging1 postfix/qmgr[31738]: 8168E5E2279: from=<discourse@opensourceecology.org>, size=304, nrcpt=1 (queue active)
Nov  8 08:22:26 osestaging1 postfix/smtp[2095]: 8168E5E2279: to=<osediscourse_2019@michaelaltfield.net>, relay=mail.michaelaltfield.net[176.56.237.113]:25, delay=23, delays=22/0.02/0.48/0.43, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 605471016)
Nov  8 08:22:26 osestaging1 postfix/qmgr[31738]: 8168E5E2279: removed
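Reading these maillog entries by eye gets tedious, so here is a rough sketch that pulls the delivery result out of postfix/smtp lines. The regex is written against the line shapes shown above only and is an assumption about the format, not a general postfix log parser; the sample lines are abbreviated copies of the entries above.

```python
import re

# Matches postfix/smtp delivery lines such as the "status=sent" and
# "status=bounced" entries above.
DELIVERY_RE = re.compile(
    r"postfix/smtp\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"to=<(?P<recipient>[^>]*)>.*?"
    r"dsn=(?P<dsn>[\d.]+), status=(?P<status>\w+)"
)


def parse_delivery(line):
    """Return queue_id, recipient, dsn and status as a dict, or None."""
    match = DELIVERY_RE.search(line)
    return match.groupdict() if match else None


SENT = (
    "Nov  8 08:22:26 osestaging1 postfix/smtp[2095]: 8168E5E2279: "
    "to=<osediscourse_2019@michaelaltfield.net>, "
    "relay=mail.michaelaltfield.net[176.56.237.113]:25, delay=23, "
    "delays=22/0.02/0.48/0.43, dsn=2.0.0, status=sent (250 2.0.0 Ok)"
)
BOUNCED = (
    "Nov  8 08:14:56 osestaging1 postfix/smtp[31842]: 434F25E2279: "
    "to=<vt6t5up@mail.ru>, relay=mxs.mail.ru[94.100.180.31]:25, delay=67, "
    "delays=66/0.03/0.16/1, dsn=5.0.0, status=bounced (host mxs.mail.ru said: 550)"
)

for sample in (SENT, BOUNCED):
    print(parse_delivery(sample))
```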
  1. now, I changed the SMTP address from 'localhost' to '172.17.0.1' in the discourse app.yml config file & rebuilt the app
[root@osestaging1 discourse]# grep SMTP_ADDRESS containers/app.yml
  DISCOURSE_SMTP_ADDRESS: 172.17.0.1
[root@osestaging1 discourse]# ./launcher destroy app && ./launcher rebuild app
  1. finally discourse came back up. Now hitting the "Resend Activation Email" button produces postfix logs in the staging server
Nov  8 08:43:35 osestaging1 postfix/smtpd[21052]: connect from unknown[172.17.0.2]
Nov  8 08:43:35 osestaging1 postfix/smtpd[21052]: D330D5E227D: client=unknown[172.17.0.2]
Nov  8 08:43:35 osestaging1 postfix/cleanup[21056]: D330D5E227D: message-id=<968ccef2-0b84-4807-940d-bf075e21f260@discourse.opensourceecology.org>
Nov  8 08:43:35 osestaging1 postfix/qmgr[31738]: D330D5E227D: from=<noreply@discourse.opensourceecology.org>, size=2790, nrcpt=1 (queue active)
Nov  8 08:43:35 osestaging1 postfix/smtpd[21052]: disconnect from unknown[172.17.0.2]
Nov  8 08:43:36 osestaging1 postfix/smtp[21057]: D330D5E227D: host mail.michaelaltfield.net[176.56.237.113] said: 450 4.1.8 <noreply@discourse.opensourceecology.org>: Sender address rejected: Domain not found (in reply to RCPT TO command)
Nov  8 08:43:38 osestaging1 postfix/smtp[21057]: connect to mail.michaelaltfield.net[2a00:d880:5:82b::329a]:25: Network is unreachable
Nov  8 08:43:38 osestaging1 postfix/smtp[21057]: D330D5E227D: to=<osediscorse_2019@michaelaltfield.net>, relay=none, delay=2.4, delays=0.06/0/2.3/0, dsn=4.4.1, status=deferred (connect to mail.michaelaltfield.net[2a00:d880:5:82b::329a]:25: Network is unreachable)
  1. I checked the logs on my personal mail server. yeah, I rejected them since 'discourse.opensourceecology.org' is not defined
Nov  8 08:45:03 mail postfix/smtpd[14653]: connect from static.113.233.201.195.clients.your-server.de[195.201.233.113]
Nov  8 08:45:03 mail postfix/smtpd[14653]: NOQUEUE: reject: RCPT from static.113.233.201.195.clients.your-server.de[195.201.233.113]: 450 4.1.8 <noreply@discourse.opensourceecology.org>: Sender address rejected: Domain not found; from=<noreply@discourse.opensourceecology.org> to=<osediscorse_2019@michaelaltfield.net> proto=ESMTP helo=<mailer.opensourceecology.org>
Nov  8 08:45:07 mail postfix/smtpd[14653]: disconnect from static.113.233.201.195.clients.your-server.de[195.201.233.113]
  1. in my testing, I apparently triggered an active response from ossec, temp banning the ose staging server from accessing my personal michaelaltfield.net server; here's the command to fix it
[root@mail etc]# /var/ossec/active-response/bin/firewall-drop.sh delete - 195.201.233.113/32
  1. ugh, I had a typo of <osediscorse_2019@michaelaltfield.net> != <osediscourse_2019@michaelaltfield.net>
  2. I fixed it & rebuilt the app. god this takes so long. I timed it: a simple change to a single variable followed by a restart of Discourse takes 10 minutes and 40 seconds *facepalm*
[root@osestaging1 discourse]# time ( ./launcher destroy app && ./launcher rebuild app )
2019-11-08 09:57:15.837 UTC [49] LOG:  database system is shut down
sha256:e54ffb1ec9b28fda8a13807a4147fcc2bc06f1558e5212c324a8952071602967
2f988a76dde4cc8f151979a0cb321116abe94fcf5159d9630450178b3fef72b3

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscourse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=172.17.0.1 -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:53:2a:01:9b:c2 local_discourse/app /sbin/boot
dc9a16388e615187594c0fa5919a8d691f0e0a4bb7a55d63d51c5404bca15de7

real    10m40.754s
user    0m3.380s
sys     0m3.095s
[root@osestaging1 discourse]# 
  1. that fixed it! I got an email with an activation link, and I was able to begin the install in the discourse wui!
  2. I set the Community Name = Open Source Ecology
  3. I set the "Describe your community in one short sentence" = "We’re developing open source industrial machines that can be made for a fraction of commercial costs, and sharing our designs online for free. The goal of Open Source Ecology is to create an open source economy - an efficient economy which increases innovation by open collaboration."
  4. I set the "Describe your community in few words" = "Open Source Blueprints for Civilization. Build Yourself."
  5. I set the "Welcome Topic" paragraph to
We’re developing open source industrial machines that can be made for a fraction of commercial costs, and sharing our designs online for free. The goal of Open Source Ecology is to create an open source economy - an efficient economy which increases innovation by open collaboration.

For more information, see our Founder's TED talk https://www.ted.com/talks/marcin_jakubowski

Our website is at https://www.opensourceecology.org
And our wiki is at https://wiki.opensourceecology.org
  1. I set the site to "Public"
  2. I set the new user signup to "Users can sign up on their own, but must be approved by staff."
  3. I set the (contact) "Web Page" to https://www.opensourceecology.org/contact/
  4. I set the city & state regarding laws to "Maysville, MO"
  5. I selected the top-left-most (first?) theme
  6. the next step asked for two logos
    1. one that's 120 pixels high and 3x (or more) wider than that
    2. one that's square, larger than 512x512
  7. I used this one for both *shrug* Yellowlogo.png
  8. It then asked for another two icons: a favicon and, I guess, the apple touch icon? Again, I used the Yellowlogo.png from above for both
  9. I left the default to show Latest Topics on the main page
  10. Emoji preferences? I left it at the default of Twitter. Which one is the most open-source? *shrug*
  11. I added Marcin, but he won't be able to access the site until he's setup with the VPN anyway..
  12. ok, now I can click around the Discourse wui
  13. first I noticed that our Discourse version is 2.4.0.beta7. Da fuk? How did we end up on some beta version?
    1. this article discusses it. they say that the stable branch is actually neglected and not maintained, so the Discourse model is to use beta by default on production installs *facepalm* https://meta.discourse.org/t/please-dont-pressure-self-installers-to-be-on-beta-branch/32237/4
  14. well, fuck, everything I setup in the wizard (as logged above) is totally not set! why?
  15. I set all the Admin -> Required fields manually; they saved.
  16. But then the "Branding" section refused to take my icons. It uploaded them, but then when I clicked the green check-box, it just dissolved my logo and replaced it with Discourse's with no error in the wui or in the logs (they *do* show 200 success logs). the fuck?
  17. I had to go to the "Login" tab to re-enable the "Staff must approve all new user accounts before they are allowed to access the site." checkbox
  18. the fuck? there's a default list of domains banned by Discourse, including mailinator.com. Why are we blocking mailinator? I removed it.
  19. there's also a ban of all cryptographic signatures? This is just perplexing; I created a topic about it, tagging the dev that made this the default back in 2016-08-03 https://meta.discourse.org/t/why-does-discourse-block-cryptographic-signatures-by-default/132912
  20. I removed the S/MIME & PGP signature ban
  21. I changed the max image size from 4096 to 1024 KB (this matches our MediaWiki settings)
  22. I also created a topic on meta.discourse.org asking how to setup Discourse to send PURGE requests to our defined cache server when its content changes, so that we can put Discourse behind varnish https://meta.discourse.org/t/discourse-purge-cache-method-on-content-changes/132917
  23. there's a security tab in the admin section. The default cookie policy is "lax"; I changed it to "strict"
  24. I found a docs subdomain on discourse.org, but it appears to be just for documenting the API? Also, it freezes my damn browser it's so slow.. https://docs.discourse.org/
  25. I'm not the only one that is puzzled by the lack of documentation https://meta.discourse.org/t/basic-product-documentation/35719/6
    1. someone responded suggesting threads in meta.discourse.org tagged with #howto https://meta.discourse.org/c/10-howto
    2. and the faq https://meta.discourse.org/c/howto/faq/4
  26. this new user guide seems like an obvious place to start https://meta.discourse.org/t/discourse-new-user-guide/96331
  27. ok, since there's no fucking documentation I'm left googling the Discourse forums trying to figure out how the fuck I should cobble together a backup solution. First, it looks like discourse-triggered backups are stored to /var/discourse/shared/standalone/backups/default https://meta.discourse.org/t/where-do-the-local-backups-go-when-s3-backups-arent-enabled/26591
[root@osestaging1 discourse]# ls -lah /var/discourse/shared/standalone/backups/default/
total 6.9M
drwxr-xr-x. 2 tgriffing 33 4.0K Nov  8 03:31 .
drwxr-xr-x. 3 tgriffing 33 4.0K Nov  8 00:00 ..
-rw-r--r--. 1 tgriffing 33 6.8M Nov  8 03:31 discourse-2019-11-08-033129-v20191101113230.tar.gz
[root@osestaging1 discourse]# 
  1. so that's 6.8M. And it's owned by tgriffing? The permissions are wrong, ugh. The permissions start to get fucked under the 'standalone' dir it seems. Maybe that's just some internal docker UIDs or sth..
[root@osestaging1 discourse]# ls -lah /var/discourse/shared/standalone/
total 44K
drwxr-xr-x. 11 root      root 4.0K Nov  8 09:59 .
drwxr-xr-x.  3 root      root 4.0K Nov  7 11:27 ..
drwxr-xr-x.  3 tgriffing   33 4.0K Nov  8 00:00 backups
drwxr-xr-x.  4 root      root 4.0K Nov  7 11:28 log
srw-rw-rw-.  1 root      root    0 Nov  8 09:59 nginx.http.sock
drwxr-xr-x.  2       106  109 4.0K Nov  7 11:28 postgres_backup
drwx------. 19       106  109 4.0K Nov  8 09:59 postgres_data
drwxrwxr-x.  3       106  109 4.0K Nov  8 09:59 postgres_run
drwxr-xr-x.  2       108  111 4.0K Nov  8 12:14 redis_data
drwxr-xr-x.  4 root      root 4.0K Nov  7 11:54 state
drwxr-xr-x.  4 tgriffing   33 4.0K Nov  8 09:59 tmp
drwxr-xr-x.  3 tgriffing   33 4.0K Nov  7 11:30 uploads
[root@osestaging1 discourse]# 
  1. fwiw that standalone dir (which is where all our state is stored) is 102M. Most of it is postgres data
[root@osestaging1 discourse]# du -sh /var/discourse/shared/standalone/
102M    /var/discourse/shared/standalone/
[root@osestaging1 discourse]# du -sh /var/discourse/shared/standalone/*
6.9M    /var/discourse/shared/standalone/backups
1.1M    /var/discourse/shared/standalone/log
0       /var/discourse/shared/standalone/nginx.http.sock
4.0K    /var/discourse/shared/standalone/postgres_backup
93M     /var/discourse/shared/standalone/postgres_data
216K    /var/discourse/shared/standalone/postgres_run
404K    /var/discourse/shared/standalone/redis_data
28K     /var/discourse/shared/standalone/state
12K     /var/discourse/shared/standalone/tmp
216K    /var/discourse/shared/standalone/uploads
[root@osestaging1 discourse]# 
  1. this post suggests a method to kick-off a backup, but it requires shutting down the discourse server to do so; not an option. https://meta.discourse.org/t/backup-discourse-from-the-command-line/64364/7
  2. another user suggested to run a two-liner to trigger a backup via the cli which (I think) doesn't require taking down the app
[root@osestaging1 discourse]# ./launcher enter app
root@osestaging1-app:/var/www/discourse# discourse backup
[SUCCESS]
Backup done.
Output file is in: /var/www/discourse/public/backups/default/discourse-2019-11-08-122241-v20191108000414.tar.gz

root@osestaging1-app:/var/www/discourse# 
  1. but can it be done from a script?
  2. related: Discourse is a Javascript-heavy beast that's basically crashing my browser. Just typing into a textarea when updating the wiki in one tab experiences significant delays when I have only 8 tabs open on meta.discourse.org. I need a way to better throttle backgrounded tabs, but I could never make sense of how to do this in firefox. So I posted a question about this to SuperUser. I want backgrounded tabs to have exactly 0.00% CPU usage for at least 30 minutes once they're no longer in the foreground. If I'm not looking at it, it shouldn't be slowing down my computer. https://superuser.com/questions/1500289/how-to-aggressively-throttle-background-tabs-in-firefox-using-dom-min-background
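On the open question above of whether the backup can be triggered from a script: a possible sketch, under assumptions that were not tested here. It assumes `docker exec app discourse backup` behaves like the interactive `./launcher enter app` session (i.e. the `discourse` command is on the PATH of a non-login exec), and that the in-container backup directory maps to /var/discourse/shared/standalone/backups/default on the host. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Host-side location of discourse-triggered backups, per the listing above.
BACKUP_DIR = Path("/var/discourse/shared/standalone/backups/default")


def backup_command(container="app"):
    """Command list for running 'discourse backup' non-interactively.

    Assumption: 'discourse' resolves on the PATH inside the container.
    """
    return ["docker", "exec", container, "discourse", "backup"]


def newest_backup(backup_dir=BACKUP_DIR):
    """Return the most recently modified .tar.gz in backup_dir, or None."""
    tarballs = sorted(
        Path(backup_dir).glob("*.tar.gz"), key=lambda p: p.stat().st_mtime
    )
    return tarballs[-1] if tarballs else None


def run_backup(container="app"):
    """Trigger a backup and return the path of the newest tarball."""
    subprocess.run(backup_command(container), check=True)
    return newest_backup()
```

A cron job could call `run_backup()` and then copy the returned file into the existing backup rotation; `check=True` makes the script fail loudly if the backup command exits non-zero.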

Thu Nov 07, 2019

  1. Chris made a video tutorial for how to download our Aug 2019 wiki .zim archive from archive.org, put it on an sd card, and view it from Kiwix on android.
  2. I wrote a draft of an article and asked Chris to publish his video on youtube so we can embed it and publish the article on www.opensourceecology.org
  3. ...
  4. continuing from oct 28, let's do our best attempt to validate the damn unsigned docker install script
  5. I downloaded it again from a vpn connection; the sha384 checksum (an unsigned integrity hash, not a cryptographic signature) matches my last download
root@disp3084:~# curl  --tlsv1.2 --proto =https --location https://get.docker.com/ > get-docker.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 13216  100 13216    0     0   6487      0  0:00:02  0:00:02 --:--:--  6487
root@disp3084:~# sha384sum get-docker.sh 
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  get-docker.sh
root@disp3084:~# 
  1. I downloaded it again from the staging server through the clearnet; it matches again
[maltfield@osestaging1 tmp]$ curl  --tlsv1.2 --proto =https --location https://get.docker.com/ > get-docker.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 13216  100 13216    0     0  50878      0 --:--:-- --:--:-- --:--:-- 50830
[maltfield@osestaging1 tmp]$ sha384sum get-docker.sh 
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  get-docker.sh
[maltfield@osestaging1 tmp]$ 
  1. and, finally, I did another download from the tor network; it matches too
user@host:~$ curl  --tlsv1.2 --proto =https --location https://get.docker.com/ > get-docker.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 13216  100 13216    0     0   4673      0  0:00:02  0:00:02 --:--:--  4673
user@host:~$ sha384sum get-docker.sh 
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  get-docker.sh
user@host:~$ 
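The three downloads above are compared by eye. A small sketch that automates the comparison with hashlib, assuming the copies fetched over the VPN, clearnet and tor have been collected into local files (the file names in the usage note are placeholders). Matching digests only show that all vantage points were served the same bytes; they say nothing about who produced them.

```python
import hashlib


def sha384_file(path, chunk_size=65536):
    """Return the hex SHA-384 digest of a file, read in chunks."""
    digest = hashlib.sha384()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_match(paths):
    """True when every file in paths has the same SHA-384 digest."""
    return len({sha384_file(p) for p in paths}) == 1
```

For example, `all_match(["get-docker.vpn.sh", "get-docker.clearnet.sh", "get-docker.tor.sh"])` would return True for the downloads logged here.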
  1. ok, I'm satisfied that I got the correct file that's being served by get.docker.com, though I cannot have any greater than 0% confidence that it's actually produced by the docker team, since it has no cryptographic signature. Next step is to read the file's contents and see what it's doing.
  2. ugh, the install script escalates its privileges to root. not great, but reasonable for an install script.
  3. ok, so the installer sideloads a gpg key into the apt/yum keyring then attempts to install its packages. For centos, the repo file (which in turn points to the gpg key) comes from here
$DOWNLOAD_URL/linux/$lsb_dist/$REPO_FILE
  1. which should translate at runtime to
https://download.docker.com/linux/centos/docker-ce.repo
  1. I did the same thing with this file as above; here's from the clearnet on the staging box directly
[root@osestaging1 discourse]# curl --tlsv1.2 --proto =https --location --remote-name https://download.docker.com/linux/centos/docker-ce.repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  2424  100  2424    0     0  11862      0 --:--:-- --:--:-- --:--:-- 11940
[root@osestaging1 discourse]# sha384sum docker-ce.repo 
483187126d28ca55ff4c6554ab8847c8dcdf3f06b211ea6f800cc7b216088c785373830897ce0d7b202ad1f33edc1dc1  docker-ce.repo
[root@osestaging1 discourse]# 
  1. here's from the VPN; it matches
user@disp8990:~$ curl --tlsv1.2 --proto =https --location --remote-name https://download.docker.com/linux/centos/docker-ce.repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  2424  100  2424    0     0    602      0  0:00:04  0:00:04 --:--:--   602
user@disp8990:~$ sha384sum docker-ce.repo 
483187126d28ca55ff4c6554ab8847c8dcdf3f06b211ea6f800cc7b216088c785373830897ce0d7b202ad1f33edc1dc1  docker-ce.repo
user@disp8990:~$ 
  1. and from TOR; it matches too
user@host:~$ curl --tlsv1.2 --proto =https --location --remote-name https://download.docker.com/linux/centos/docker-ce.repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  2424  100  2424    0     0    536      0  0:00:04  0:00:04 --:--:--   536
user@host:~$ sha384sum docker-ce.repo 
483187126d28ca55ff4c6554ab8847c8dcdf3f06b211ea6f800cc7b216088c785373830897ce0d7b202ad1f33edc1dc1  docker-ce.repo
user@host:~$ 
  1. the file itself defines a ton of repos, but only the first one is enabled
[root@osestaging1 discourse]# grep enabled docker-ce.repo
enabled=1
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
enabled=0
[root@osestaging1 discourse]# head docker-ce.repo 
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/7/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/7/debug-$basearch/stable
[root@osestaging1 discourse]# 
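The grep/head check above can also be done by parsing the .repo file, since yum repo files are INI-style. A minimal sketch using configparser; the embedded sample reproduces the first section from the head output, while the `enabled=0` line in the second section is inferred from the grep output rather than shown in full above. `enabled_repos` is a hypothetical helper name.

```python
import configparser

SAMPLE = """\
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/7/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/7/debug-$basearch/stable
enabled=0
"""


def enabled_repos(repo_text):
    """Map each enabled repo id to its gpgkey setting (None if unset)."""
    # interpolation=None keeps yum variables like $basearch untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        name: parser[name].get("gpgkey")
        for name in parser.sections()
        if parser[name].get("enabled") == "1"
    }


print(enabled_repos(SAMPLE))
```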
  1. and again a test for the gpg key
[root@osestaging1 discourse]# curl  --tlsv1.2 --proto =https --location https://download.docker.com/linux/centos/gpg > docker.gpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  1627  100  1627    0     0   8413      0 --:--:-- --:--:-- --:--:--  8430
[root@osestaging1 discourse]# sha384sum docker.gpg
e3837773edabb1aef62d8b89bbfe3a3c80e008fa312c1d7791606cd303e35d9c17208598fad4eb47fa0374ce027e4c17  docker.gpg
[root@osestaging1 discourse]# gpg --keyid-format 0xlong docker.gpg 
pub  4096R/0xC52FEB6B621E9F35 2017-02-22 Docker Release (CE rpm) <docker@docker.com>
[root@osestaging1 discourse]# cat docker.gpg | gpg --keyid-format 0xlong --list-packets
:public key packet:
		version 4, algo 1, created 1487791233, expires 0
		pkey[0]: [4096 bits]
		pkey[1]: [17 bits]
		keyid: C52FEB6B621E9F35
:user ID packet: "Docker Release (CE rpm) <docker@docker.com>"
:signature packet: algo 1, keyid C52FEB6B621E9F35
		version 4, created 1487792760, md5len 0, sigclass 0x13
		digest algo 10, begin of digest e8 2d
		hashed subpkt 2 len 4 (sig created 2017-02-22)
		hashed subpkt 27 len 1 (key flags: 2F)
		hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 3)
		hashed subpkt 21 len 4 (pref-hash-algos: 10 9 8 11)
		hashed subpkt 22 len 4 (pref-zip-algos: 2 3 1 0)
		hashed subpkt 30 len 1 (features: 01)
		hashed subpkt 23 len 1 (key server preferences: 80)
		subpkt 16 len 8 (issuer key ID C52FEB6B621E9F35)
		data: [4094 bits]
[root@osestaging1 discourse]# 
  1. and from the vpn; it matches
user@disp8990:~$ curl  --tlsv1.2 --proto =https --location https://download.docker.com/linux/centos/gpg > docker.gpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  1627  100  1627    0     0    545      0  0:00:02  0:00:02 --:--:--   545
user@disp8990:~$ sha384sum docker.gpg
e3837773edabb1aef62d8b89bbfe3a3c80e008fa312c1d7791606cd303e35d9c17208598fad4eb47fa0374ce027e4c17  docker.gpg
user@disp8990:~$ gpg --keyid-format 0xlong docker.gpg 
gpg: keybox '/home/user/.gnupg/pubring.kbx' created
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/0xC52FEB6B621E9F35 2017-02-22 [SCEA]
	  060A61C51B558A7F742B77AAC52FEB6B621E9F35
uid                             Docker Release (CE rpm) <docker@docker.com>
user@disp8990:~$ cat docker.gpg | gpg --keyid-format 0xlong --list-packets
# off=0 ctb=99 tag=6 hlen=3 plen=525
:public key packet:
	version 4, algo 1, created 1487791233, expires 0
	pkey[0]: [4096 bits]
	pkey[1]: [17 bits]
	keyid: C52FEB6B621E9F35
# off=528 ctb=b4 tag=13 hlen=2 plen=43
:user ID packet: "Docker Release (CE rpm) <docker@docker.com>"
# off=573 ctb=89 tag=2 hlen=3 plen=567
:signature packet: algo 1, keyid C52FEB6B621E9F35
	version 4, created 1487792760, md5len 0, sigclass 0x13
	digest algo 10, begin of digest e8 2d
	hashed subpkt 2 len 4 (sig created 2017-02-22)
	hashed subpkt 27 len 1 (key flags: 2F)
	hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 3)
	hashed subpkt 21 len 4 (pref-hash-algos: 10 9 8 11)
	hashed subpkt 22 len 4 (pref-zip-algos: 2 3 1 0)
	hashed subpkt 30 len 1 (features: 01)
	hashed subpkt 23 len 1 (keyserver preferences: 80)
	subpkt 16 len 8 (issuer key ID C52FEB6B621E9F35)
	data: [4094 bits]
user@disp8990:~$ 
  1. and from tor; it matches too
user@host:~$ curl  --tlsv1.2 --proto =https --location https://download.docker.com/linux/centos/gpg > docker.gpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  1627  100  1627    0     0    819      0  0:00:01  0:00:01 --:--:--   819
user@host:~$ sha384sum docker.gpg
e3837773edabb1aef62d8b89bbfe3a3c80e008fa312c1d7791606cd303e35d9c17208598fad4eb47fa0374ce027e4c17  docker.gpg
user@host:~$ gpg --keyid-format 0xlong docker.gpg 
gpg: keybox '/home/user/.gnupg/pubring.kbx' created
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/0xC52FEB6B621E9F35 2017-02-22 [SCEA]
	  060A61C51B558A7F742B77AAC52FEB6B621E9F35
uid                             Docker Release (CE rpm) <docker@docker.com>
user@host:~$ cat docker.gpg | gpg --keyid-format 0xlong --list-packets
# off=0 ctb=99 tag=6 hlen=3 plen=525
:public key packet:
	version 4, algo 1, created 1487791233, expires 0
	pkey[0]: [4096 bits]
	pkey[1]: [17 bits]
	keyid: C52FEB6B621E9F35
# off=528 ctb=b4 tag=13 hlen=2 plen=43
:user ID packet: "Docker Release (CE rpm) <docker@docker.com>"
# off=573 ctb=89 tag=2 hlen=3 plen=567
:signature packet: algo 1, keyid C52FEB6B621E9F35
	version 4, created 1487792760, md5len 0, sigclass 0x13
	digest algo 10, begin of digest e8 2d
	hashed subpkt 2 len 4 (sig created 2017-02-22)
	hashed subpkt 27 len 1 (key flags: 2F)
	hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 3)
	hashed subpkt 21 len 4 (pref-hash-algos: 10 9 8 11)
	hashed subpkt 22 len 4 (pref-zip-algos: 2 3 1 0)
	hashed subpkt 30 len 1 (features: 01)
	hashed subpkt 23 len 1 (keyserver preferences: 80)
	subpkt 16 len 8 (issuer key ID C52FEB6B621E9F35)
	data: [4094 bits]
user@host:~$ 
  1. I imported this key into my personal keyring for future safe-keeping. Here's the full fingerprint
user@personal:~$ gpg --list-keys docker
pub   rsa4096/0xC52FEB6B621E9F35 2017-02-22 [SCEA]
	  Key fingerprint = 060A 61C5 1B55 8A7F 742B  77AA C52F EB6B 621E 9F35
uid                   [ unknown] Docker Release (CE rpm) <docker@docker.com>

user@personal:~$ 
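The listings above show the same key three ways: as a long key ID, as a full fingerprint, and with a raw creation timestamp. For OpenPGP v4 keys the long key ID is the last 16 hex digits of the fingerprint, so the values can be cross-checked with a few lines. The helper names are hypothetical; the constants are copied from the output above.

```python
from datetime import datetime, timezone

FINGERPRINT = "060A 61C5 1B55 8A7F 742B  77AA C52F EB6B 621E 9F35"
LONG_KEY_ID = "C52FEB6B621E9F35"
CREATED_EPOCH = 1487791233  # "created" field from gpg --list-packets


def normalize(fingerprint):
    """Strip the spacing gpg adds for readability and upper-case the hex."""
    return fingerprint.replace(" ", "").upper()


def long_key_id(fingerprint):
    """Long key ID of a v4 key: the low 64 bits of its fingerprint."""
    return normalize(fingerprint)[-16:]


created = datetime.fromtimestamp(CREATED_EPOCH, tz=timezone.utc)
print(long_key_id(FINGERPRINT) == LONG_KEY_ID, created.date().isoformat())
```

Pinning the full 40-digit fingerprint is the stronger check, since key IDs are short enough to be collided on purpose.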
  1. there were no entries for the uid 'docker@docker.com' on the new keyserver https://keys.openpgp.org/search?q=docker%40docker.com
  2. while the original sks key server's entry for the uid is fucking huge https://sks-keyservers.net/pks/lookup?op=get&search=docker@docker.com
  3. I'm not a huge fan of specifying the location of a gpg key as a URL; our other repo files specify gpg key files that are located at /etc/pki/rpm-gpg/ on disk
[root@osestaging1 yum.repos.d]# grep 'gpgkey=' * | sort -u
CentOS-Base.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CentOS-CR.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CentOS-Debuginfo.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Debug-7
CentOS-fasttrack.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CentOS-Media.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CentOS-Sources.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CentOS-Vault.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
epel.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
epel-testing.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
webmin.repo:gpgkey=http://www.webmin.com/jcameron-key.asc
webtatic-archive.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-webtatic-el7
webtatic.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-webtatic-el7
webtatic-testing.repo:gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-webtatic-el7
[root@osestaging1 yum.repos.d]# ls -lah /etc/pki/rpm-gpg
total 28K
drwxr-xr-x.  2 root root 4.0K Oct 27  2017 .
drwxr-xr-x. 11 root root 4.0K Sep 22  2017 ..
-rw-r--r--.  1 root root 1.7K Aug 30  2017 RPM-GPG-KEY-CentOS-7
-rw-r--r--.  1 root root 1004 Aug 30  2017 RPM-GPG-KEY-CentOS-Debug-7
-rw-r--r--.  1 root root 1.7K Aug 30  2017 RPM-GPG-KEY-CentOS-Testing-7
-rw-r--r--.  1 root root 1.7K Oct  2  2017 RPM-GPG-KEY-EPEL-7
-rw-r--r--.  1 root root 1.6K Oct  8  2014 RPM-GPG-KEY-webtatic-el7
[root@osestaging1 yum.repos.d]# 
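  1. note for the future sysadmin: the grep above can be narrowed to show only the repo files whose gpgkey is fetched from a URL instead of a local file (webmin.repo in the listing above). A minimal sketch, assuming the repo files end in '.repo'; 'list_remote_gpgkeys' is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# Print every 'gpgkey=' line that does NOT point at a local file:// key.
# $1: directory holding the *.repo files (defaults to /etc/yum.repos.d)
list_remote_gpgkeys() {
    local dir="${1:-/etc/yum.repos.d}"
    # -H forces the "filename:" prefix even if only one file matches the glob
    grep -H '^gpgkey=' "$dir"/*.repo 2>/dev/null \
        | grep -v 'gpgkey=file://' \
        | sort -u
}

# Usage (prints nothing if every repo uses a local key file):
#   list_remote_gpgkeys /etc/yum.repos.d
```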
  1. I found an issue about this recommending to add the full fingerprint to the install script; the issue was closed, but my install script has no fingerprint var in it.. https://github.com/moby/moby/issues/17436
  2. the sks query finally finished downloading; it's 192M!
user@disp8990:~$ curl  --tlsv1.2 --proto =https --location "https://sks-keyservers.net/pks/lookup?op=get&search=docker@docker.com" > docker-sks.gpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100  191M  100  191M    0     0   308k      0  0:10:34  0:10:34 --:--:--  319k
user@disp8990:~$ 
user@disp8990:~$ du -sh docker-sks.gpg 
192M	docker-sks.gpg
user@disp8990:~$ 
  1. it doesn't look like a spammed packet, and it doesn't even include the key from above; I got 4x keys with a ton of signatures it seems
user@disp8990:~$ gpg --import docker-sks.gpg 
gpg: /home/user/.gnupg/trustdb.gpg: trustdb created
gpg: key D7B2C1254AE90ACE: public key "vicky <vicky.kwan@docker.com>" imported
gpg: key 48B1173B7CDE3ACB: public key "Elias Uriegas <eliasuriegas@gmail.com>" imported
gpg: key 8E66A2E3A1C1CAD6: public key "ekino-gradle-docker-plugin <opensource@ekino.com>" imported
gpg: key C755EC7A05D64F1E: public key "Oliver Faßbender <olli@intrepid.de>" imported
gpg: packet(13) too large
gpg: read_block: read error: Invalid packet
gpg: no valid OpenPGP data found.
gpg: import from 'docker-sks.gpg' failed: Invalid keyring
gpg: Total number processed: 4
gpg:               imported: 4
user@disp8990:~$ gpg --list-keys
/home/user/.gnupg/pubring.kbx
-----------------------------
pub   rsa4096 2019-08-12 [SC] [expires: 2035-08-08]
	  47545A72C300CAB7A05A4E92D7B2C1254AE90ACE
uid           [ unknown] vicky <vicky.kwan@docker.com>
sub   rsa4096 2019-08-12 [E] [expires: 2035-08-08]

pub   rsa4096 2019-07-11 [SC]
	  DEC839EFD2644EC6CE93CB8948B1173B7CDE3ACB
uid           [ unknown] Elias Uriegas <eliasuriegas@gmail.com>
uid           [ unknown] Elias Uriegas <eli.uriegas@docker.com>
sub   rsa2048 2019-07-11 [SA] [expires: 2027-07-09]
sub   rsa2048 2019-07-11 [E] [expires: 2027-07-09]

pub   rsa2048 2019-06-13 [SC] [expires: 2021-06-12]
	  FF3F45A5AF2FC3FE927BF3338E66A2E3A1C1CAD6
uid           [ unknown] ekino-gradle-docker-plugin <opensource@ekino.com>
sub   rsa2048 2019-06-13 [E] [expires: 2021-06-12]

pub   rsa4096 2019-04-12 [SC] [expires: 2024-04-12]
	  594DD4D9F6E9CA5738617BF6C755EC7A05D64F1E
uid           [ unknown] Oliver Faßbender <olli@intrepid.de>
uid           [ unknown] Oliver Faßbender <docker@intrepid.de>
uid           [ unknown] Oliver Faßbender <github@intrepid.de>
uid           [ unknown] Oliver Faßbender <foxromeo75@gmail.com>
sub   rsa4096 2019-04-12 [E]

user@disp8990:~$ 
  1. ugh, TOFU sucks. There is no path to validation here; I'll just copy the one docker.com gave me and put it in /etc/pki/rpm-gpg :(
[root@osestaging1 discourse]# sha384sum docker.gpg 
e3837773edabb1aef62d8b89bbfe3a3c80e008fa312c1d7791606cd303e35d9c17208598fad4eb47fa0374ce027e4c17  docker.gpg
[root@osestaging1 discourse]# cat docker.gpg
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQINBFit5IEBEADDt86QpYKz5flnCsOyZ/fk3WwBKxfDjwHf/GIflo+4GWAXS7wJ
1PSzPsvSDATV10J44i5WQzh99q+lZvFCVRFiNhRmlmcXG+rk1QmDh3fsCCj9Q/yP
w8jn3Hx0zDtz8PIB/18ReftYJzUo34COLiHn8WiY20uGCF2pjdPgfxE+K454c4G7
gKFqVUFYgPug2CS0quaBB5b0rpFUdzTeI5RCStd27nHCpuSDCvRYAfdv+4Y1yiVh
KKdoe3Smj+RnXeVMgDxtH9FJibZ3DK7WnMN2yeob6VqXox+FvKYJCCLkbQgQmE50
uVK0uN71A1mQDcTRKQ2q3fFGlMTqJbbzr3LwnCBE6hV0a36t+DABtZTmz5O69xdJ
WGdBeePCnWVqtDb/BdEYz7hPKskcZBarygCCe2Xi7sZieoFZuq6ltPoCsdfEdfbO
+VBVKJnExqNZCcFUTEnbH4CldWROOzMS8BGUlkGpa59Sl1t0QcmWlw1EbkeMQNrN
spdR8lobcdNS9bpAJQqSHRZh3cAM9mA3Yq/bssUS/P2quRXLjJ9mIv3dky9C3udM
+q2unvnbNpPtIUly76FJ3s8g8sHeOnmYcKqNGqHq2Q3kMdA2eIbI0MqfOIo2+Xk0
rNt3ctq3g+cQiorcN3rdHPsTRSAcp+NCz1QF9TwXYtH1XV24A6QMO0+CZwARAQAB
tCtEb2NrZXIgUmVsZWFzZSAoQ0UgcnBtKSA8ZG9ja2VyQGRvY2tlci5jb20+iQI3
BBMBCgAhBQJYrep4AhsvBQsJCAcDBRUKCQgLBRYCAwEAAh4BAheAAAoJEMUv62ti
Hp816C0P/iP+1uhSa6Qq3TIc5sIFE5JHxOO6y0R97cUdAmCbEqBiJHUPNQDQaaRG
VYBm0K013Q1gcJeUJvS32gthmIvhkstw7KTodwOM8Kl11CCqZ07NPFef1b2SaJ7l
TYpyUsT9+e343ph+O4C1oUQw6flaAJe+8ATCmI/4KxfhIjD2a/Q1voR5tUIxfexC
/LZTx05gyf2mAgEWlRm/cGTStNfqDN1uoKMlV+WFuB1j2oTUuO1/dr8mL+FgZAM3
ntWFo9gQCllNV9ahYOON2gkoZoNuPUnHsf4Bj6BQJnIXbAhMk9H2sZzwUi9bgObZ
XO8+OrP4D4B9kCAKqqaQqA+O46LzO2vhN74lm/Fy6PumHuviqDBdN+HgtRPMUuao
xnuVJSvBu9sPdgT/pR1N9u/KnfAnnLtR6g+fx4mWz+ts/riB/KRHzXd+44jGKZra
IhTMfniguMJNsyEOO0AN8Tqcl0eRBxcOArcri7xu8HFvvl+e+ILymu4buusbYEVL
GBkYP5YMmScfKn+jnDVN4mWoN1Bq2yMhMGx6PA3hOvzPNsUoYy2BwDxNZyflzuAi
g59mgJm2NXtzNbSRJbMamKpQ69mzLWGdFNsRd4aH7PT7uPAURaf7B5BVp3UyjERW
5alSGnBqsZmvlRnVH5BDUhYsWZMPRQS9rRr4iGW0l+TH+O2VJ8aQ
=0Zqq
-----END PGP PUBLIC KEY BLOCK-----
[root@osestaging1 discourse]# cp docker.gpg /etc/pki/rpm-gpg/
[root@osestaging1 discourse]# chown root:root /etc/pki/rpm-gpg/docker.gpg
[root@osestaging1 discourse]# chmod 0644 /etc/pki/rpm-gpg/docker.gpg
[root@osestaging1 discourse]# 
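  1. note for the future sysadmin: the sha384sum above was compared against the earlier download by eye. A minimal sketch of doing the same comparison in a script follows; 'verify_sha384' is a hypothetical helper name, and it was not run as part of this session.

```shell
#!/bin/bash
# verify_sha384 FILE EXPECTED_HEX
# Return 0 if FILE's sha384 digest equals EXPECTED_HEX, non-zero otherwise
# (a missing or unreadable file also counts as a mismatch).
verify_sha384() {
    local file="$1" expected="$2" actual
    actual=$(sha384sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage, with the digest recorded in this log:
#   verify_sha384 docker.gpg e3837773edabb1aef62d8b89bbfe3a3c80e008fa312c1d7791606cd303e35d9c17208598fad4eb47fa0374ce027e4c17 \
#       && cp docker.gpg /etc/pki/rpm-gpg/
```

  1. as with the TOFU complaint above, a matching digest only shows that the file is the same one that was fetched the first time; it does not authenticate the key.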
  1. and I replaced the repo file to use this gpg key
[root@osestaging1 discourse]# sed 's^gpgkey=\(.*\)^gpgkey=file:///etc/pki/rpm-gpg/docker.gpg^' docker-ce.repo > /etc/yum.repos.d/docker-ce.repo
[root@osestaging1 discourse]# head /etc/yum.repos.d/docker-ce.repo 
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/7/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/docker.gpg

[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/7/debug-$basearch/stable
[root@osestaging1 discourse]# 
  1. following the script, I installed the 'yum-utils' package; it also looks like I set up the docker-ce repo files correctly
[root@osestaging1 yum.repos.d]# yum install yum-utils
Loaded plugins: fastestmirror, replace
docker-ce-stable                                                                                      | 3.5 kB  00:00:00     
(1/2): docker-ce-stable/x86_64/primary_db                                                             |  37 kB  00:00:00     
(2/2): docker-ce-stable/x86_64/updateinfo                                                             |   55 B  00:00:00     
Loading mirror speeds from cached hostfile
 * base: linux.darkpenguin.net
 * epel: mirror.23media.com
 * extras: mirror.softaculous.com
 * updates: mirror.alpix.eu
 * webtatic: uk.repo.webtatic.com
Resolving Dependencies
--> Running transaction check
---> Package yum-utils.noarch 0:1.1.31-52.el7 will be installed
--> Processing Dependency: python-kitchen for package: yum-utils-1.1.31-52.el7.noarch
--> Processing Dependency: libxml2-python for package: yum-utils-1.1.31-52.el7.noarch
--> Running transaction check
---> Package libxml2-python.x86_64 0:2.9.1-6.el7_2.3 will be installed
---> Package python-kitchen.noarch 0:1.1.1-5.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================
 Package                          Arch                     Version                              Repository              Size
=============================================================================================================================
Installing:
 yum-utils                        noarch                   1.1.31-52.el7                        base                   121 k
Installing for dependencies:
 libxml2-python                   x86_64                   2.9.1-6.el7_2.3                      base                   247 k
 python-kitchen                   noarch                   1.1.1-5.el7                          base                   267 k

Transaction Summary
=============================================================================================================================
Install  1 Package (+2 Dependent packages)

Total download size: 635 k
Installed size: 3.2 M
Is this ok [y/d/N]: y
...
Installed:
  yum-utils.noarch 0:1.1.31-52.el7                                                                                           

Dependency Installed:
  libxml2-python.x86_64 0:2.9.1-6.el7_2.3                         python-kitchen.noarch 0:1.1.1-5.el7                        

Complete!
[root@osestaging1 yum.repos.d]# 
  1. and I installed the docker-ce package
[root@osestaging1 yum.repos.d]# yum install docker-ce
...
Install  1 Package (+2 Dependent packages)

Total download size: 87 M
Installed size: 362 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/docker-ce-stable/packages/docker-ce-19.03.4-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 621e9f35: NOKEY
Public key for docker-ce-19.03.4-3.el7.x86_64.rpm is not installed
(1/3): docker-ce-19.03.4-3.el7.x86_64.rpm                                                             |  24 MB  00:00:01     
(2/3): containerd.io-1.2.10-3.2.el7.x86_64.rpm                                                        |  23 MB  00:00:01     
(3/3): docker-ce-cli-19.03.4-3.el7.x86_64.rpm                                                         |  39 MB  00:00:00     
-----------------------------------------------------------------------------------------------------------------------------
Total                                                                                         40 MB/s |  87 MB  00:00:02     
Retrieving key from file:///etc/pki/rpm-gpg/docker.gpg
Importing GPG key 0x621E9F35:
 Userid     : "Docker Release (CE rpm) <docker@docker.com>"
 Fingerprint: 060a 61c5 1b55 8a7f 742b 77aa c52f eb6b 621e 9f35
 From       : /etc/pki/rpm-gpg/docker.gpg
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : containerd.io-1.2.10-3.2.el7.x86_64                                                                       1/3 
  Installing : 1:docker-ce-cli-19.03.4-3.el7.x86_64                                                                      2/3 
  Installing : 3:docker-ce-19.03.4-3.el7.x86_64                                                                          3/3 
  Verifying  : 1:docker-ce-cli-19.03.4-3.el7.x86_64                                                                      1/3 
  Verifying  : 3:docker-ce-19.03.4-3.el7.x86_64                                                                          2/3 
  Verifying  : containerd.io-1.2.10-3.2.el7.x86_64                                                                       3/3 

Installed:
  docker-ce.x86_64 3:19.03.4-3.el7                                                                                           

Dependency Installed:
  containerd.io.x86_64 0:1.2.10-3.2.el7                         docker-ce-cli.x86_64 1:19.03.4-3.el7                        

Complete!
[root@osestaging1 yum.repos.d]# 
  1. this time I have docker v19.03.4,
[root@osestaging1 yum.repos.d]# docker -v
Docker version 19.03.4, build 9013bf583a
[root@osestaging1 yum.repos.d]# 
  1. and I started the docker daemon & attempted to run the discourse setup again; it failed
[root@osestaging1 discourse]# service docker start
Redirecting to /bin/systemctl start docker.service
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# ./discourse-setup 
...
Hostname for your Discourse? [discourse.opensourceecology.org]: 
Email address for admin account(s)? [michael@opensourceecology.org]: 
SMTP server address? [localhost]: 
SMTP port? [25]: 
SMTP user name? [discourse@opensouceecology.org]: 
SMTP password? [none]: 
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]: 

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 25
SMTP username : discourse@opensouceecology.org
SMTP password : none

ENTER to continue, 'n' to try again, Ctrl+C to exit: 

Configuration file at  updated successfully!

Updates successful. Rebuilding in 5 seconds.
Building app
/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "apply caps: operation not permitted": unknown.
Your Docker installation is not working correctly

See: https://meta.discourse.org/t/docker-error-on-bootstrap/13657/18?u=sam
[root@osestaging1 discourse]# 
  1. google suggests that this is because we're running a container inside a container. Namely docker in an lxc container. god damn it.
  2. the link from the output suggests a test of hello world in docker; that fails too
[root@osestaging1 discourse]# docker run -it --rm hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
Digest: sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "apply caps: operation not permitted": unknown.
[root@osestaging1 discourse]# 
  1. I added a line to the osestaging1 lxc config file (/var/lib/lxc/osestaging1/config) to keep the mknod capability https://serverfault.com/questions/946854/docker-inside-lxc-starting-container-process-caused-apply-caps-operation-not-p
lxc.cap.keep = mknod
  1. well, that failed; it wants either keep or drop. I tried again with an empty 'drop' setting to clear all drops https://linuxcontainers.org/fr/lxc/manpages/man5/lxc.container.conf.5.html
  2. I started the osestaging1 lxc container again, and that appears to have worked
[root@osestaging1 ~]# service docker start
Redirecting to /bin/systemctl start docker.service
[root@osestaging1 ~]# docker run -it --rm hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
	(amd64)
 3. The Docker daemon created a new container from that image which runs the
	executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
	to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

[root@osestaging1 ~]# 
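  1. note for the future sysadmin: the exact line that replaced 'lxc.cap.keep = mknod' is not recorded above. Per lxc.container.conf(5), 'lxc.cap.drop' with no value clears any drop capabilities specified up to that point, so the sketch below assumes the line was 'lxc.cap.drop =' and checks a config file for it. 'has_empty_cap_drop' is a hypothetical helper name.

```shell
#!/bin/bash
# Return 0 if the given lxc config file contains an empty 'lxc.cap.drop ='
# line (ASSUMPTION: this is the form of the "empty drop setting" used above).
has_empty_cap_drop() {
    grep -Eq '^[[:space:]]*lxc\.cap\.drop[[:space:]]*=[[:space:]]*$' "$1"
}

# Usage on the lxc host:
#   has_empty_cap_drop /var/lib/lxc/osestaging1/config && echo "all cap drops cleared"
```

  1. clearing every dropped capability gives the container more privileges than the lxc defaults; that is probably fine for staging, but it should be reviewed before doing the same on prod.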
  1. I ran the discourse setup script again; it ran for a *while* but ultimately died because it tried to bind on port 443, where nginx is listening
[root@osestaging1 discourse]# ./discourse-setup 
...
sha256:45984f2db03ab095892062799571bef5ec7b89a66e05fe9389677e135884cd32
Error response from daemon: container a23752a126c179518a4ad5bdeeb431082167f2d4102875d07651e12fabf046da: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/af51149685f75bb3e62c402c45a6683a49d5e254d620a919fa3497843d9b6aec/merged: device or resource busy

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=michael@opensourceecology.org -e DISCOURSE_SMTP_ADDRESS=localhost -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_USER_NAME=discourse@opensouceecology.org -e DISCOURSE_SMTP_PASSWORD=none -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 80:80 -p 443:443 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:53:2a:01:9b:c2 local_discourse/app /sbin/boot
1ba1c1a440db8884093bd001d60955df85cd8b3e655e00b4a0c8c8659f56b9e0
/bin/docker: Error response from daemon: driver failed programming external connectivity on endpoint app (28487c717e14e86e50b3f3caa3e2d015d5a56248249587f6e96d00302f5becb9): Error starting userland proxy: listen tcp 0.0.0.0:443: bind: address already in use.
[root@osestaging1 discourse]# 
  1. I manually edited the containers/app.yml file and replaced the "expose" ports of 80 & 443 with 8020, similar to all our other apache backends (which are 8000 & 8010 so far)
  2. I re-ran the discourse-setup script. this time no error
[root@osestaging1 discourse]# ./discourse-setup 
...
166:M 07 Nov 2019 11:51:25.124 # Redis is now ready to exit, bye bye...
2019-11-07 11:51:25.255 UTC [49] LOG:  database system is shut down
sha256:201efc3c86c4597373ac995d85d9470e0765ef3a1efcf65720724adeda96e6ce
d9e23c6be259dcda6aa66756c207c12cb45c4cdcb886619ce7d6a8ccd114ebb5
Removing old container
+ /bin/docker rm app
app

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=michael@opensourceecology.org -e DISCOURSE_SMTP_ADDRESS=localhost -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_USER_NAME=discourse@opensouceecology.org -e DISCOURSE_SMTP_PASSWORD=none -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 8020:8020 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:53:2a:01:9b:c2 local_discourse/app /sbin/boot
9363d1d1e1944b8100eebb033446c261fde7c08850877c30d755e7d9faf7c633
[root@osestaging1 discourse]# 
  1. it's running, but--ugh--it's bound to all interfaces
[root@osestaging1 discourse]# ss -plan | grep -i 8020
tcp    LISTEN     0      128      :::8020                 :::*                   users:(("docker-proxy",pid=18381,fd=4))
[root@osestaging1 discourse]# 
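  1. note for the future sysadmin: a minimal sketch of checking `ss` output for a listener that is bound to all interfaces on a given port, as ':::8020' is above. 'wildcard_listeners' is a hypothetical helper name; it only parses text on stdin, so it can be tried against saved output.

```shell
#!/bin/bash
# wildcard_listeners PORT
# Read `ss -ln`-style output on stdin and print the LISTEN lines whose local
# address is a wildcard (*, 0.0.0.0, :: or [::]) on the given port.
wildcard_listeners() {
    local port="$1"
    awk -v port="$port" '
        BEGIN { re = "^([*]|0[.]0[.]0[.]0|::|\\[::\\]):" port "$" }
        /LISTEN/ {
            # the column holding the local address varies with the ss flags,
            # so test every field
            for (i = 1; i <= NF; i++)
                if ($i ~ re) { print; next }
        }'
}

# Usage:
#   ss -plan | wildcard_listeners 8020
```

  1. docker's -p flag accepts an optional bind address (for example `-p 127.0.0.1:8020:8020`); whether discourse's launcher passes such a value through unchanged from the "expose" section of app.yml is an assumption that was not verified here.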
  1. can I visit the site? I created a new dns entry in the osedev1:/etc/hosts file
[root@osedev1 etc]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

# staging
10.241.189.11 www.opensourceecology.org opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org www.openbuildinginstitute.org discourse.opensourceecology.org
[root@osedev1 etc]# service dnsmasq restart
Redirecting to /bin/systemctl restart dnsmasq.service
[root@osedev1 etc]# dig @127.0.0.1 discourse.opensourceecology.org

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> @127.0.0.1 discourse.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29089
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;discourse.opensourceecology.org. IN    A

;; ANSWER SECTION:
discourse.opensourceecology.org. 0 IN   A       10.241.189.11

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Nov 07 12:57:18 CET 2019
;; MSG SIZE  rcvd: 76

[root@osedev1 etc]# 
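  1. note for the future sysadmin: since dnsmasq serves whatever is in /etc/hosts, a quick way to confirm that a name made it into the file (before restarting dnsmasq and running dig) is to look it up in the file directly. A minimal sketch; 'hosts_ip_for' is a hypothetical helper name.

```shell
#!/bin/bash
# hosts_ip_for NAME [HOSTS_FILE]
# Print the IP address of every line in a hosts file that lists NAME.
hosts_ip_for() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$file"
}

# Usage:
#   hosts_ip_for discourse.opensourceecology.org /etc/hosts
```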
  1. according to the install guide, I should now just be able to access the site. well, it doesn't work https://github.com/discourse/discourse/blob/master/docs/INSTALL-cloud.md
  2. I added the hostname to /etc/hosts on osestaging1 too, and even *there* it fails
[root@osestaging1 20191107]# curl -I discourse.opensourceecology.org:8020/
curl: (7) Failed connect to discourse.opensourceecology.org:8020; Connection refused
[root@osestaging1 20191107]# 
  1. so I guess the 'expose' section only pokes holes in the discourse docker firewall; it doesn't actually change the config? Anyway, there's a guide on how to set this up https://meta.discourse.org/t/running-other-websites-on-the-same-machine-as-discourse/17247
  2. I added the 'templates/web.socketed.template.yml' template to the app.yml template list per the link above
- "templates/web.socketed.template.yml"
  1. then I can set up my nginx proxy to forward to a socket file (unix:/var/discourse/shared/standalone/nginx.http.sock) as opposed to a port; that works. I created a new nginx config file for the vhost at /etc/nginx/conf.d/discourse.opensourceecology.org from the fef config file and made some changes: I commented-out the varnish bits (I'll try to get that working later after this basic POC is operational) and replaced them with the recommended proxy lines from the above link
[root@osestaging1 conf.d]# cat discourse.opensourceecology.org
################################################################################
# File:    discourse.opensourceecology.org.conf
# Version: 0.1
# Purpose: Internet-listening web server for truncating https, basic DOS
#          protection, and passing to varnish cache (varnish then passes to
#          apache)
# Author:  Michael Altfield <michael@opensourceecology.org>
# Created: 2019-11-07
# Updated: 2019-11-07
################################################################################

# this whole site is a subdomain, so the below block for redirecting a naked
# domain does not apply here
#server {
#       # redirect the naked domain to 'www'
#       #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
#   #                   '$status $body_bytes_sent "$http_referer" '
#   #                   '"$http_user_agent" "$http_x_forwarded_for"';
#       #access_log /var/log/nginx/www.opensourceecology.org/access.log main;
#       #error_log /var/log/nginx/www.opensourceecology.org/error.log main;
#   include conf.d/secure.include;
#   include conf.d/ssl.opensourceecology.org.include;
#   listen 10.241.189.11:443;
#       server_name opensourceecology.org;
#       return 301 https://www.opensourceecology.org$uri;
#
#}

server {

		access_log /var/log/nginx/discourse.opensourceecology.org/access.log main;
		error_log /var/log/nginx/discourse.opensourceecology.org/error.log;

   include conf.d/secure.include;
   include conf.d/ssl.opensourceecology.org.include;
   #include conf.d/ssl.openbuildinginstitute.org.include;

   listen 10.241.189.11:443;
   #listen [2a01:4f8:172:209e::2]:443;

   server_name discourse.opensourceecology.org;

		#############
		# SITE_DOWN #
		#############
		# uncomment this block && restart nginx prior to apache work to display the
		# "SITE DOWN" webpage for our clients

#       root /var/www/html/SITE_DOWN/htdocs/;
#   index index.html index.htm; 
#
#       # force all requests to load exactly this page
#       location / {
#               try_files $uri /index.html;
#       }

		###################
		# SEND TO VARNISH #
		###################

#   location / {
#      proxy_pass http://127.0.0.1:6081;
#      proxy_set_header X-Real-IP $remote_addr;
#      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
#      proxy_set_header X-Forwarded-Proto https;
#      proxy_set_header X-Forwarded-Port 443;
#      proxy_set_header Host $host;
#   }

		##################
		# SEND TO DOCKER #
		##################

	location / {
		proxy_pass http://unix:/var/discourse/shared/standalone/nginx.http.sock:;
		proxy_set_header Host $http_host;
		proxy_http_version 1.1; 
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto https;
		proxy_set_header X-Real-IP $remote_addr;
	}

}

[root@osestaging1 conf.d]#
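  1. note for the future sysadmin: the proxy_pass line above only works if the socket file that discourse creates actually exists at that path. Below is a minimal sketch that pulls the unix socket path out of an nginx vhost file so it can be tested with `[ -S ... ]`. 'proxy_socket_path' is a hypothetical helper name, and the sed pattern assumes the 'proxy_pass http://unix:/path:;' form used above.

```shell
#!/bin/bash
# proxy_socket_path NGINX_CONF
# Print the socket path of every uncommented 'proxy_pass http://unix:...' line.
proxy_socket_path() {
    sed -n 's#^[[:space:]]*proxy_pass[[:space:]]\{1,\}http://unix:\(/[^:;]*\):\{0,1\};.*$#\1#p' "$1"
}

# Usage:
#   sock=$(proxy_socket_path /etc/nginx/conf.d/discourse.opensourceecology.org.conf)
#   [ -S "$sock" ] && echo "socket exists" || echo "socket missing: $sock"
```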
  1. and per the docs I reloaded the nginx config & rebuilt the docker discourse app
[root@osestaging1 discourse]# service nginx reload
Redirecting to /bin/systemctl reload nginx.service
[root@osestaging1 discourse]# ./launcher rebuild app
Ensuring launcher is up to date
...
  1. meanwhile, I created the necessary nginx log dirs
[root@osestaging1 conf.d]# mkdir /var/log/nginx/discourse.opensourceecology.org
[root@osestaging1 conf.d]# chown nginx:nginx /var/log/nginx/discourse.opensourceecology.org/
[root@osestaging1 conf.d]# chmod 0755 /var/log/nginx/discourse.opensourceecology.org/
[root@osestaging1 conf.d]# 
  1. note: discourse iteration with docker takes forever! I thought docker was supposed to make iterating times faster?!? Just this change from a port to a socket && this necessary "rebuild app" thing causes it to do a *ton* of opaque shit..
  2. ah, I had an issue with my nginx config file: its name must end in '.conf'. I moved it to 'discourse.opensourceecology.org.conf' && restarted nginx
  3. now I can access the discourse site in my browser! hooray!! Note that I had to start a private firefox window to create an exception to the hsts rules because 'discourse.opensourceecology.org' is not yet a valid subject alt name in our let's encrypt cert
  4. well, fuck, I can't login to the site because it's trying to send an email to michael@opensourceecology.org that never arrives. this is likely because google is blocking the email on their end, correctly noticing that it's coming from the wrong server. I don't want to fuck with our SPF records, etc and break production, so I think I'll just rebuild the discourse app so my email address is not hosted on google..
  5. this time it didn't come up because "container is marked for removal and cannot be started" ??

2019-11-07 12:54:31.695 UTC [49] LOG:  database system is shut down
sha256:aa9684393a90a88c1bad7a780d3350a7abd9345d36066649f03110858f9abdef
f7e67420146dd837bdcc5110c2849029fac5c7f7b795356143fdab0e7b0bddd4
Removing old container
+ /bin/docker rm app
Error response from daemon: container 1ccf7cb96a6b4f099dbe5292041007f9639b128f5130270986ff44977e3d95fb: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/ce4f659013a7d25723e7e38f905e458b1b103a3009cd0fc4cd8d21e053c5e437/merged: device or resource busy

starting up existing container
+ /bin/docker start app
Error response from daemon: container is marked for removal and cannot be started
Error: failed to start containers: app
[root@osestaging1 discourse]# 

  1. `docker info` shows 3 docker containers are in the 'stopped' state
[root@osestaging1 discourse]# docker info
Client:
 Debug Mode: false

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 6
 Server Version: 19.03.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-957.21.3.el7.x86_64
 Operating System: CentOS Linux 7 (Core) (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 1.748GiB
 Name: osestaging1
 ID: 7RXD:GHAW:C4IE:IOYN:BOPN:4OTK:UO2R:VNNC:KGST:B72A:J5ML:EFFF
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@osestaging1 discourse]#
  1. it looks like if I pass -a to `docker ps` then it will list the stopped containers too
[root@osestaging1 discourse]# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                PORTS               NAMES
1ccf7cb96a6b        f241e5ea3321                       "/sbin/boot"             27 minutes ago      Removal In Progress                       app
a23752a126c1        discourse/base:2.0.20191013-2320   "/bin/bash -c 'cd /p…"   2 hours ago         Removal In Progress                       jovial_mirzakhani
3c77792ab6b5        hello-world                        "/hello"                 10 days ago         Created                                   hardcore_goodall
[root@osestaging1 discourse]# 
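  1. note for the future sysadmin: a minimal sketch for pulling out just the container IDs that are stuck in "Removal In Progress" from `docker ps -a` output like the listing above. 'stuck_containers' is a hypothetical helper name; it only parses text on stdin.

```shell
#!/bin/bash
# Read `docker ps -a` output on stdin and print the ID (first column) of
# every container whose status is "Removal In Progress".
stuck_containers() {
    awk '/Removal In Progress/ { print $1 }'
}

# Usage:
#   docker ps -a | stuck_containers
```

  1. `docker ps -a --filter status=removing` should give a similar list without the text parsing, but that form was not tried in this session.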
  1. is this docker shit production ready? this docker issue says to just delete the dir. god I wouldn't want this to happen on prod.. https://github.com/moby/moby/issues/22312
  2. first I safely got rid of the one that wasn't stuck in "Removal In Progress"; it worked
[root@osestaging1 discourse]# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                PORTS               NAMES
1ccf7cb96a6b        f241e5ea3321                       "/sbin/boot"             32 minutes ago      Removal In Progress                       app
a23752a126c1        discourse/base:2.0.20191013-2320   "/bin/bash -c 'cd /p…"   2 hours ago         Removal In Progress                       jovial_mirzakhani
3c77792ab6b5        hello-world                        "/hello"                 10 days ago         Created                                   hardcore_goodall
[root@osestaging1 discourse]# docker rm 3c77792ab6b5
3c77792ab6b5
[root@osestaging1 discourse]# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                PORTS               NAMES
1ccf7cb96a6b        f241e5ea3321                       "/sbin/boot"             32 minutes ago      Removal In Progress                       app
a23752a126c1        discourse/base:2.0.20191013-2320   "/bin/bash -c 'cd /p…"   2 hours ago         Removal In Progress                       jovial_mirzakhani
[root@osestaging1 discourse]# 
  1. ok, stopping docker & force `rm`ing the container dirs worked
[root@osestaging1 discourse]# service docker stop
Redirecting to /bin/systemctl stop docker.service
[root@osestaging1 discourse]# rm f /var/lib/docker/containers/
1ccf7cb96a6b4f099dbe5292041007f9639b128f5130270986ff44977e3d95fb/ a23752a126c179518a4ad5bdeeb431082167f2d4102875d07651e12fabf046da/
[root@osestaging1 discourse]# rm -rf /var/lib/docker/containers/*
[root@osestaging1 discourse]# service docker start
Redirecting to /bin/systemctl start docker.service
[root@osestaging1 discourse]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@osestaging1 discourse]# 
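A minimal sketch of how the stuck containers could be picked out of `docker ps -a` before resorting to the `rm -rf` above. The function name `stuck_container_ids` is hypothetical. It only parses text on stdin, and it assumes the column layout matches the output shown in this log.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of containers whose STATUS column reads
# "Removal In Progress". Reads `docker ps -a` style output on stdin, so it
# can be exercised without a running docker daemon.
stuck_container_ids() {
  # NR > 1 skips the header row; $1 is the CONTAINER ID column
  awk 'NR > 1 && /Removal In Progress/ { print $1 }'
}

# Usage (assumption: run as root on the docker host):
#   docker ps -a | stuck_container_ids
```

The short IDs it prints are prefixes of the directory names under /var/lib/docker/containers/, so they can be used to remove only the stuck containers' dirs instead of everything.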
  1. ok, now it's working
[root@osestaging1 discourse]# ./launcher start app

+ /bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GLOBAL_METHOD_CACHE_SIZE=131072 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e DISCOURSE_HOSTNAME=discourse.opensourceecology.org -e DISCOURSE_DEVELOPER_EMAILS=osediscorse_2019@michaelaltfield.net -e DISCOURSE_SMTP_ADDRESS=localhost -e DISCOURSE_SMTP_PORT=25 -e DISCOURSE_SMTP_USER_NAME=discourse@opensouceecology.org -e DISCOURSE_SMTP_PASSWORD=none -e DISCOURSE_SMTP_AUTHENTICATION=none -e DISCOURSE_SMTP_OPENSSL_VERIFY_MODE=none -e DISCOURSE_SMTP_ENABLE_START_TLS=false -h osestaging1-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 8020:8020 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:53:2a:01:9b:c2 local_discourse/app /sbin/boot
d90b039776439ea5caf969b5bbc202cb1d90fc657b8e6e1949b0365b5ff6f8cb
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                    NAMES
d90b03977643        local_discourse/app   "/sbin/boot"        13 seconds ago      Up 12 seconds       0.0.0.0:8020->8020/tcp   app
[root@osestaging1 discourse]# 
  1. I tried again with my new email address, but it still didn't come through. Note also that it said my username 'maltfield' wasn't unique. How do I do a fresh install? Whatever. It appears that the email issue may be between discourse and postfix. postfix logs don't even show anything when I click the "Resend Activation Email" button. Here's the relevant docs https://meta.discourse.org/t/troubleshooting-email-on-a-new-discourse-install/16326/2
  2. as noted in the above thread, the discourse log files are here: /var/discourse/shared/standalone/log/rails/production.log
  3. when I click on the "Resend Activation Email" button, this pops up in the above log file
Started PUT "/finish-installation/resend-email" for 127.0.0.1 at 2019-11-07 13:15:31 +0000
Processing by FinishInstallationController#resend_email as HTML
  Parameters: {"authenticity_token"=>"SzQCvRWiqdXsBKzOjIB0X7KkvXro7Od6SdP8Qa8vvrskPeNYZNos5ORHJfyDUrHiKShZR/txM6NHuqHHCQCR1w=="}
  Rendering finish_installation/resend_email.html.erb within layouts/finish_installation
  Rendered finish_installation/resend_email.html.erb within layouts/finish_installation (Duration: 0.7ms | Allocations: 103)
  Rendered layouts/_head.html.erb (Duration: 0.5ms | Allocations: 103)
Completed 200 OK in 98ms (Views: 3.0ms | ActiveRecord: 0.0ms | Allocations: 4763)
  Rendering layouts/email_template.html.erb
  Rendered layouts/email_template.html.erb (Duration: 0.5ms | Allocations: 141)
Delivered mail c4ca58ca-345e-46c4-81bc-6d0eac7afa04@discourse.opensourceecology.org (11.3ms)
Job exception: wrong authentication type none
  1. aw ffs, back to this smtp auth shit again. We *don't* have auth on our smtp server; it's not exposed to the Internet, and it runs on localhost only; auth is not necessary. I set it to "none" to *not* use smtp auth. Apparently it doesn't like that *facepalm*
  2. I removed the username & password fields and rebuilt the app (the best way I've found is `./launcher destroy app && ./launcher rebuild app` which still takes for fucking ever to run); now it gets a bit further, but complains that localhost is refusing the connection.
  Rendering layouts/email_template.html.erb
  Rendered layouts/email_template.html.erb (Duration: 0.6ms | Allocations: 139)
Delivered mail ca01baae-880e-4448-81fd-bacfc71cfab3@discourse.opensourceecology.org (3.5ms)
Job exception: Connection refused - connect(2) for "localhost" port 25
  1. so I think there's a few possible issues here:
    1. iptables is blocking traffic from the container to the host
      1. I attempted to fix this by adding an iptables rule permitting traffic from the 'docker0' interface into INPUT. Note that these rules were modified by docker, it seems, already.
[root@osestaging1 discourse]# iptables-save | head -n 40
# Generated by iptables-save v1.4.21 on Thu Nov  7 14:22:31 2019
*mangle
:PREROUTING ACCEPT [1584:199946]
:INPUT ACCEPT [1576:198394]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1410:267329]
:POSTROUTING ACCEPT [1404:266969]
COMMIT
# Completed on Thu Nov  7 14:22:31 2019
# Generated by iptables-save v1.4.21 on Thu Nov  7 14:22:31 2019
*nat
:PREROUTING ACCEPT [10:1656]
:INPUT ACCEPT [2:104]
:OUTPUT ACCEPT [44:3026]
:POSTROUTING ACCEPT [38:2666]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 8020 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8020 -j DNAT --to-destination 172.17.0.2:8020
COMMIT
# Completed on Thu Nov  7 14:22:31 2019
# Generated by iptables-save v1.4.21 on Thu Nov  7 14:22:31 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [28:2240]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A INPUT -p tcp -m state --state NEW -m tcp --dport 25 -j ACCEPT
-A INPUT -i docker0 -j ACCEPT
-A INPUT -s 5.9.144.234/32 -j DROP
-A INPUT -s 173.234.159.250/32 -j DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
[root@osestaging1 discourse]# 
    1. 'localhost' inside the docker container means the docker container itself; not the docker host running my smtp server
      1. according to this Stack Overflow question, I can use the special DNS name 'host.docker.internal' to resolve to the host's ip address as addressable from the docker container https://stackoverflow.com/questions/31324981/how-to-access-host-port-from-docker-container
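For the first hypothesis above, here is a minimal sketch that confirms the docker0 rule is actually present in the running ruleset. `input_accepts_iface` is a hypothetical name. It only checks that the rule exists in an `iptables-save` dump; it does not evaluate rule order or any other chain.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an iptables-save dump on stdin contains an
# INPUT rule that accepts everything arriving on the given interface.
input_accepts_iface() {
  local iface="$1"
  grep -qE -- "^-A INPUT -i ${iface} -j ACCEPT\$"
}

# Usage (assumption: run as root on the docker host):
#   iptables-save | input_accepts_iface docker0 && echo "docker0 is allowed"
```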
  1. it would also be great if I could debug from within the container itself; that's a bit tricky, but I found I can get a shell in a container (with a stupid simple subset of commands) like so
[root@osestaging1 discourse]# docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS               NAMES
7bf55da425bc        local_discourse/app   "/sbin/boot"        35 minutes ago      Up 35 minutes                           app
[root@osestaging1 discourse]# docker exec -it app /bin/bash
root@osestaging1-app:/# ping host.docker.internal
bash: ping: command not found
root@osestaging1-app:/# dig
bash: dig: command not found
root@osestaging1-app:/# nslookup host.docker.internal
bash: nslookup: command not found
  1. I installed 'adns-tools' (which provides `adnshost`), but--fuck--the dns entry doesn't work; looks like linux support for it is still pending *facepalm* https://github.com/docker/for-linux/issues/264
root@osestaging1-app:/# apt-get install adns-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libadns1
The following NEW packages will be installed:
  adns-tools libadns1
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 107 kB of archives.
After this operation, 276 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian buster/main amd64 libadns1 amd64 1.5.0~rc1-1.1 [66.2 kB]
Get:2 http://deb.debian.org/debian buster/main amd64 adns-tools amd64 1.5.0~rc1-1.1 [40.3 kB]
Fetched 107 kB in 0s (450 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libadns1.
(Reading database ... 44574 files and directories currently installed.)
Preparing to unpack .../libadns1_1.5.0~rc1-1.1_amd64.deb ...
Unpacking libadns1 (1.5.0~rc1-1.1) ...
Selecting previously unselected package adns-tools.
Preparing to unpack .../adns-tools_1.5.0~rc1-1.1_amd64.deb ...
Unpacking adns-tools (1.5.0~rc1-1.1) ...
Setting up libadns1 (1.5.0~rc1-1.1) ...
Setting up adns-tools (1.5.0~rc1-1.1) ...
Processing triggers for libc-bin (2.28-10) ...
root@osestaging1-app:/# ad
add-apt-repository  addgroup            addr2line           adduser             adnshost            adnsresfilter       advmng              advzip
addgnupghome        addpart             add-shell           adnsheloex          adnslogres          advdef              advpng
root@osestaging1-app:/# adns
adnsheloex     adnshost       adnslogres     adnsresfilter
root@osestaging1-app:/# adnshost
adnshost usage error: no domains given, and -f/--pipe not used; try --help
root@osestaging1-app:/# adnshost google.com
google.com A INET 172.217.22.78
google.com A INET6 2a00:1450:4001:800::200e
root@osestaging1-app:/# adnshost host.docker.internal
host.docker.internal does not exist
root@osestaging1-app:/# 
  1. dns is the robust option, but can we at least prove connectivity from within the container to the host over IP for testing? I installed telnet, but the connection to port 25 was refused on both the container's own address (172.17.0.2) and the host's (172.17.0.1)..
root@osestaging1-app:/# cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2      osestaging1-app
root@osestaging1-app:/# telnet 172.17.0.2 25
Trying 172.17.0.2...
telnet: Unable to connect to remote host: Connection refused
root@osestaging1-app:/# telnet 172.17.0.1 25
Trying 172.17.0.1...
telnet: Unable to connect to remote host: Connection refused
root@osestaging1-app:/# 
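Since the container image ships without ping, dig, nslookup, or telnet, a port check can also be done with nothing but bash. This is a minimal sketch, assuming bash was built with `/dev/tcp` support and that coreutils `timeout` is present; `port_open` is a hypothetical name.

```shell
#!/bin/bash
# port_open HOST PORT: return 0 if a TCP connection succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra packages are needed.
port_open() {
  local host="$1" port="$2"
  # the connection is attempted in a child bash so a failed redirect cannot
  # kill the calling shell; stderr is silenced to hide "Connection refused"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (inside the container):
#   port_open 172.17.0.1 25 && echo open || echo closed
```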
  1. here's the damn package that provides the `ip` command; it's 'iproute2' https://stackoverflow.com/questions/51834978/ip-command-is-missing-from-ubuntu-docker-image
root@osestaging1-app:/# apt-get install iproute2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libatm1 libmnl0 libxtables12
Suggested packages:
  iproute2-doc
The following NEW packages will be installed:
  iproute2 libatm1 libmnl0 libxtables12
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 990 kB of archives.
After this operation, 2,954 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian buster/main amd64 libmnl0 amd64 1.0.4-2 [12.2 kB]
Get:2 http://deb.debian.org/debian buster/main amd64 libxtables12 amd64 1.8.2-4 [80.0 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 iproute2 amd64 4.20.0-2 [827 kB]
Get:4 http://deb.debian.org/debian buster/main amd64 libatm1 amd64 1:2.5.1-2 [71.0 kB]
Fetched 990 kB in 0s (13.2 MB/s)    
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libmnl0:amd64.
(Reading database ... 45481 files and directories currently installed.)
Preparing to unpack .../libmnl0_1.0.4-2_amd64.deb ...
Unpacking libmnl0:amd64 (1.0.4-2) ...
Selecting previously unselected package libxtables12:amd64.
Preparing to unpack .../libxtables12_1.8.2-4_amd64.deb ...
Unpacking libxtables12:amd64 (1.8.2-4) ...
Selecting previously unselected package iproute2.
Preparing to unpack .../iproute2_4.20.0-2_amd64.deb ...
Unpacking iproute2 (4.20.0-2) ...
Selecting previously unselected package libatm1:amd64.
Preparing to unpack .../libatm1_1%3a2.5.1-2_amd64.deb ...
Unpacking libatm1:amd64 (1:2.5.1-2) ...
Setting up libatm1:amd64 (1:2.5.1-2) ...
Setting up libmnl0:amd64 (1.0.4-2) ...
Setting up libxtables12:amd64 (1.8.2-4) ...
Setting up iproute2 (4.20.0-2) ...
Processing triggers for libc-bin (2.28-10) ...
root@osestaging1-app:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
299: eth0@if300: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
	link/ether 02:53:2a:01:9b:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
	inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
	   valid_lft forever preferred_lft forever
	inet6 fe80::53:2aff:fe01:9bc2/64 scope link 
	   valid_lft forever preferred_lft forever
root@osestaging1-app:/# 
  1. so it looks like the ip to the docker host from the docker container is '172.17.0.1'. the only open port there is 10000. 25 is not visible :\
root@osestaging1-app:/# nmap 172.17.0.1
Starting Nmap 7.70 ( https://nmap.org ) at 2019-11-07 15:00 UTC
Nmap scan report for 172.17.0.1
Host is up (0.000019s latency).
Not shown: 999 closed ports
PORT      STATE SERVICE
10000/tcp open  snet-sensor-mgmt
MAC Address: 02:42:80:35:65:A1 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 1.85 seconds
root@osestaging1-app:/# ip r
default via 172.17.0.1 dev eth0 
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2 
root@osestaging1-app:/# 
  1. I added a question to the thread on how this fucking simple config is supposed to be achieved https://meta.discourse.org/t/troubleshooting-email-on-a-new-discourse-install/16326/372
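One possible explanation for the refused connections above, offered as an assumption rather than a confirmed diagnosis: the `ss` output in the Oct 28 entry shows postfix's `master` process bound to 127.0.0.1:25 only, so nothing would be listening on the docker0 address 172.17.0.1. Below is a minimal sketch for checking which addresses a port is bound to. `listen_addrs` is a hypothetical name and it only parses `ss` output on stdin.

```shell
#!/bin/bash
# Hypothetical helper: given `ss -ltn` (or `ss -plan`) output on stdin, print
# the local address of every LISTEN socket bound to the given port.
listen_addrs() {
  local port="$1"
  awk -v port="$port" '
    /LISTEN/ {
      # the first field ending in ":PORT" is the local address column
      for (i = 1; i <= NF; i++) {
        if ($i ~ (":" port "$")) {
          sub(":" port "$", "", $i)
          print $i
          break
        }
      }
    }'
}

# Usage (assumption: run on the docker host):
#   ss -ltn | listen_addrs 25
```

If the only address printed is 127.0.0.1, the container cannot reach the service via the docker0 address, regardless of the iptables rules.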

Mon Oct 28, 2019

  1. I stumbled on buttplug.io and sent an email to Marcin about the market size of 3d-printable sex toys, a potential addition to our HeroX challenge. He responded that it's a > 23-billion-dollar industry https://www.businesswire.com/news/home/20181002005775/en/Global-Adult-Toys-Market-Worth-23.7Bn-2017
    1. https://buttplug.io/
    2. Buttplug
    3. Market Size
  2. I expect that it would be a fairly trivial design, and the marketing potential is compelling
  3. and potential modular interoperability with the hammer drill?!? no, nevermind. back to sysadmin work..
  4. ...
  5. time to install discourse on staging and see if it breaks our existing sites
  6. the easy install guide provided by discourse is in their git repo's docs dir, named 'INSTALL-cloud' https://github.com/discourse/discourse/blob/master/docs/INSTALL-cloud.md
  7. this guide says to first checkout the discourse docker repo to /var/discourse https://github.com/discourse/discourse_docker
  8. the next step is to execute a `curl https://get.docker.com/ | sh`. God help us.
  9. fortunately, the insanely insecure step above is wrapped in an if-condition that only executes if docker is not already found and the user presses enter to proceed (giving the user the option to safely ctrl+c out before the above command runs)
  10. I found docker exists in the yum repos, so I installed it from there first
[root@osestaging1 discourse]# yum install docker
...
Installed:
  docker.x86_64 2:1.13.1-103.git7f2769b.el7.centos                                                                                                

Dependency Installed:
  atomic-registries.x86_64 1:1.22.1-29.gitb507039.el7                            container-selinux.noarch 2:2.107-3.el7                           
  container-storage-setup.noarch 0:0.11.0-2.git5eaf76c.el7                       containers-common.x86_64 1:0.1.37-3.el7.centos                   
  docker-client.x86_64 2:1.13.1-103.git7f2769b.el7.centos                        docker-common.x86_64 2:1.13.1-103.git7f2769b.el7.centos          
  oci-register-machine.x86_64 1:0-6.git2b44233.el7                               oci-systemd-hook.x86_64 1:0.2.0-1.git05e6923.el7_6               
  oci-umount.x86_64 2:2.5-3.el7                                                  python-pytoml.noarch 0:0.1.14-1.git7dea353.el7                   
  subscription-manager-rhsm-certificates.x86_64 0:1.24.13-3.el7.centos           yajl.x86_64 0:2.0.4-4.el7                                        

Dependency Updated:
  libselinux.x86_64 0:2.5-14.1.el7                     libselinux-python.x86_64 0:2.5-14.1.el7       libselinux-utils.x86_64 0:2.5-14.1.el7     
  libsemanage.x86_64 0:2.5-14.el7                      libsemanage-python.x86_64 0:2.5-14.el7        libsepol.x86_64 0:2.5-10.el7               
  policycoreutils.x86_64 0:2.5-33.el7                  policycoreutils-python.x86_64 0:2.5-33.el7    selinux-policy.noarch 0:3.13.1-252.el7.1   
  selinux-policy-targeted.noarch 0:3.13.1-252.el7.1    setools-libs.x86_64 0:3.3.8-4.el7            

Complete!
[root@osestaging1 discourse]# 
  1. that appeared to be the only curl-piped-to-shell line in the discourse-setup script, so I proceeded with the install. the first thing I noticed was that it yelled at me for having <2G of swap space. If this becomes an issue, I'll just create a 2-4G swap file somewhere on '/' (not in the ebs volume)
[root@osestaging1 discourse]# ./discourse-setup
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
WARNING: Discourse requires at least 2GB of swap when running with 2GB of RAM
or less. This system does not appear to have sufficient swap space.

Without sufficient swap space, your site may not work properly, and future
upgrades of Discourse may not complete successfully.

Ctrl+C to exit or wait 5 seconds to have a 2GB swapfile created.
Setting up swapspace version 1, size = 2097148 KiB
no label, UUID=7e132ae9-7b1b-429c-8c11-d55310818030
/swapfile       swap    swap    auto      0       0
sysctl: setting key "vm.swappiness": Read-only file system
./discourse-setup: line 277: netstat: command not found
./discourse-setup: line 277: netstat: command not found
Ports 80 and 443 are free for use
‘samples/standalone.yml’ -> ‘containers/app.yml’
Found 1GB of memory and 1 physical CPU cores
setting db_shared_buffers = 128MB
setting UNICORN_WORKERS = 2
containers/app.yml memory parameters updated.

Hostname for your Discourse? [discourse.example.com]: discourse.opensourceecology.org
Email address for admin account(s)? [me@example.com,you@example.com]: michael@opensourceecology.org
SMTP server address? [smtp.example.com]: localhost
SMTP port? [587]:
SMTP user name? [user@example.com]:
SMTP password? [pa$$word]:
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]:

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 587
SMTP username : user@example.com
SMTP password : pa$$word

ENTER to continue, 'n' to try again, Ctrl+C to exit: 
  1. continuing failed! I entered no username & password for the smtp server as it should be unnecessary coming from localhost. apparently that doesn't override the default, though?
Configuration file at  updated successfully!

DISCOURSE_SMTP_USER_NAME left at incorrect default of user@example.com
DISCOURSE_SMTP_PASSWORD left at incorrect default of pa$$word

Sorry, these containers/app.yml settings aren't valid -- can't continue!
If you have unusual requirements, edit containers/app.yml and then: 
./launcher bootstrap app
[root@osestaging1 discourse]# 
  1. on second thought, maybe that's because I set the port to the default of 587; it should be 25
[root@osestaging1 discourse]# nmap localhost 

Starting Nmap 6.40 ( http://nmap.org ) at 2019-10-28 12:18 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000010s latency).
rDNS record for 127.0.0.1: localhost.localdomain
Not shown: 996 closed ports
PORT      STATE SERVICE
25/tcp    open  smtp
8000/tcp  open  http-alt
8010/tcp  open  xmpp
10000/tcp open  snet-sensor-mgmt

Nmap done: 1 IP address (1 host up) scanned in 0.11 seconds
[root@osestaging1 discourse]# ss -plan | grep -i ':25'
tcp    LISTEN     0      100    127.0.0.1:25                    *:*                   users:(("master",pid=782,fd=13))
[root@osestaging1 discourse]# 
  1. I tried again (the installer was a bit automagic at remembering previous args, which is nice), but changing to 25 still asked for creds
[root@osestaging1 discourse]# ./discourse-setup 
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
The configuration file containers/app.yml already exists!

. . . reconfiguring . . .


Saving old file as app.yml.2019-10-28-121933.bak
Stopping existing container in 5 seconds or Control-C to cancel.
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access

Found 1GB of memory and 1 physical CPU cores
setting db_shared_buffers = 128MB
setting UNICORN_WORKERS = 2
containers/app.yml memory parameters updated.

Hostname for your Discourse? [discourse.opensourceecology.org]: 
Email address for admin account(s)? [michael@opensourceecology.org]: 
SMTP server address? [localhost]: 
SMTP port? [587]: 25
SMTP user name? [user@example.com]: 
SMTP password? [pa$$word]: 
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]: 

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 25
SMTP username : user@example.com
SMTP password : pa$$word

ENTER to continue, 'n' to try again, Ctrl+C to exit: 

Configuration file at  updated successfully!

DISCOURSE_SMTP_USER_NAME left at incorrect default of user@example.com
DISCOURSE_SMTP_PASSWORD left at incorrect default of pa$$word

Sorry, these containers/app.yml settings aren't valid -- can't continue!
If you have unusual requirements, edit containers/app.yml and then: 
./launcher bootstrap app
[root@osestaging1 discourse]# 
  1. I manually edited the config file, blanking-out these default values
[root@osestaging1 discourse]# vim containers/app.yml
...
[root@osestaging1 discourse]# grep SMTP_PORT containers/app.yml | head -n1
  DISCOURSE_SMTP_PORT: 25
[root@osestaging1 discourse]# grep SMTP_USER containers/app.yml | head -n1
  DISCOURSE_SMTP_USER_NAME: ""
[root@osestaging1 discourse]# grep SMTP_PASSWORD containers/app.yml | head -n1
  DISCOURSE_SMTP_PASSWORD: ""
[root@osestaging1 discourse]# 
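The three greps above can be collapsed into one pass that prints every active (uncommented) SMTP setting. This is a minimal sketch; `smtp_settings` is a hypothetical name, and it assumes the settings sit on their own lines in the form `DISCOURSE_SMTP_<NAME>: value`, as they do in the `containers/app.yml` excerpts in this log.

```shell
#!/bin/bash
# Hypothetical helper: print the uncommented DISCOURSE_SMTP_* settings from a
# discourse_docker app.yml, with leading indentation stripped.
smtp_settings() {
  sed -nE 's/^[[:space:]]*(DISCOURSE_SMTP_[A-Z_]+:.*)$/\1/p' "$1"
}

# Usage:
#   smtp_settings containers/app.yml
```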
  1. It was still pretty unhappy with me
[root@osestaging1 discourse]# ./discourse-setup 
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
The configuration file containers/app.yml already exists!

. . . reconfiguring . . .


Saving old file as app.yml.2019-10-28-122342.bak
Stopping existing container in 5 seconds or Control-C to cancel.
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access

Found 1GB of memory and 1 physical CPU cores
setting db_shared_buffers = 128MB
setting UNICORN_WORKERS = 2
containers/app.yml memory parameters updated.

Hostname for your Discourse? [discourse.opensourceecology.org]: 
Email address for admin account(s)? [michael@opensourceecology.org]: 
SMTP server address? [localhost]: 
SMTP port? [25]: 
SMTP password? []: 
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]: 

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 25
SMTP username : 
SMTP password : 

ENTER to continue, 'n' to try again, Ctrl+C to exit: 

Configuration file at  updated successfully!

DISCOURSE_SMTP_USER_NAME not present
DISCOURSE_SMTP_PASSWORD not present

Sorry, these containers/app.yml settings aren't valid -- can't continue!
If you have unusual requirements, edit containers/app.yml and then: 
./launcher bootstrap app
[root@osestaging1 discourse]# 
  1. so much for docker not requiring sysadmins; this is not exactly a trivial install. There's nothing in the comments of that config file that states how to set it up if you have auth-less smtp, but I followed all the recommendations in a few relevant threads to no avail
    1. https://meta.discourse.org/t/cant-use-none-discourse-smtp-authentication/55833/13
    2. https://meta.discourse.org/t/how-to-set-no-user-pass-for-smtp-in-discourse-setup/58900/10
[root@osestaging1 discourse]# ./discourse-setup 
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
The configuration file containers/app.yml already exists!

. . . reconfiguring . . .


Saving old file as app.yml.2019-10-28-123312.bak
Stopping existing container in 5 seconds or Control-C to cancel.
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access

Found 1GB of memory and 1 physical CPU cores
setting db_shared_buffers = 128MB
setting UNICORN_WORKERS = 2
containers/app.yml memory parameters updated.

Hostname for your Discourse? [discourse.opensourceecology.org]: 
Email address for admin account(s)? [michael@opensourceecology.org]: 
SMTP server address? [localhost]: 
SMTP port? [25]: 
SMTP password? []: 
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]: 

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 25
SMTP username : 
SMTP password : 

ENTER to continue, 'n' to try again, Ctrl+C to exit: 

Configuration file at  updated successfully!

DISCOURSE_SMTP_USER_NAME not present
DISCOURSE_SMTP_PASSWORD not present

Sorry, these containers/app.yml settings aren't valid -- can't continue!
If you have unusual requirements, edit containers/app.yml and then: 
./launcher bootstrap app
  1. per the last line, I tried running `launcher bootstrap app`, and that failed. Docker isn't running?
[root@osestaging1 discourse]# ./launcher bootstrap app
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access
[root@osestaging1 discourse]# ps -ef | grep -i docker
root      7839   692  0 12:35 pts/9    00:00:00 grep --color=auto -i docker
[root@osestaging1 discourse]# 
  1. I got a bit further by adding some options that *should* prevent auth to smtp, but I still had to set a value for the smtp password, else it yells at me and exits. this time it says it had connection issues to docker. do I have to manually start it?
[root@osestaging1 discourse]# ./discourse-setup 
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
which: no docker.io in (/sbin:/bin:/usr/sbin:/usr/bin)
The configuration file containers/app.yml already exists!

. . . reconfiguring . . .


Saving old file as app.yml.2019-10-28-130503.bak
Stopping existing container in 5 seconds or Control-C to cancel.
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access

Found 1GB of memory and 1 physical CPU cores
setting db_shared_buffers = 128MB
setting UNICORN_WORKERS = 2
containers/app.yml memory parameters updated.

Hostname for your Discourse? [discourse.opensourceecology.org]: 
Email address for admin account(s)? [michael@opensourceecology.org]: 
SMTP server address? [localhost]: 
SMTP port? [25]: 
SMTP user name? [discourse@opensouceecology.org]: 
SMTP password? [none]: 
Optional email address for setting up Let's Encrypt? (ENTER to skip) [me@example.com]: 

Does this look right?

Hostname      : discourse.opensourceecology.org
Email         : michael@opensourceecology.org
SMTP address  : localhost
SMTP port     : 25
SMTP username : discourse@opensouceecology.org
SMTP password : none

ENTER to continue, 'n' to try again, Ctrl+C to exit: 

Configuration file at  updated successfully!

Updates successful. Rebuilding in 5 seconds.
Building app
Device "docker0" does not exist.
Cannot connect to the docker daemon - verify it is running and you have access
[root@osestaging1 discourse]# 
[root@osestaging1 discourse]# grep SMTP containers/app.yml
  ## TODO: The SMTP mail server used to validate new accounts and send notifications
  # SMTP ADDRESS, username, and password are required
  # WARNING the char '#' in SMTP password can cause problems!
  DISCOURSE_SMTP_ADDRESS: localhost
  DISCOURSE_SMTP_PORT: 25
  DISCOURSE_SMTP_USER_NAME: discourse@opensouceecology.org
  DISCOURSE_SMTP_PASSWORD: "none"
  DISCOURSE_SMTP_AUTHENTICATION: none
  DISCOURSE_SMTP_OPENSSL_VERIFY_MODE: none
  DISCOURSE_SMTP_ENABLE_START_TLS: false
  #DISCOURSE_SMTP_ENABLE_START_TLS: true           # (optional, default true)
[root@osestaging1 discourse]# 
  1. it does appear that the docker services are disabled. There are 4 units (3 services and a timer). Which do I use?
[root@osestaging1 discourse]# systemctl list-units | grep -i docker
[root@osestaging1 discourse]# systemctl list-unit-files | grep -i docker
docker-cleanup.service                        disabled
docker-storage-setup.service                  disabled
docker.service                                disabled
docker-cleanup.timer                          disabled
[root@osestaging1 discourse]# 
  1. a few tests fail https://meta.discourse.org/t/cant-run-the-launcher-to-install-discourse-on-centos-7/23095/21
[root@osestaging1 discourse]# docker run hello-world
/usr/bin/docker-current: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See '/usr/bin/docker-current run --help'.
[root@osestaging1 discourse]# ps -ef | grep -i docker
root     14744   692  0 13:15 pts/9    00:00:00 grep --color=auto -i docker
  1. I started just the simplest one: 'docker.service'
[root@osestaging1 discourse]# systemctl start docker.service
[root@osestaging1 discourse]# systemctl list-unit-files | grep -i docker
docker-cleanup.service                        disabled
docker-storage-setup.service                  disabled
docker.service                                disabled
docker-cleanup.timer                          disabled
[root@osestaging1 discourse]# systemctl status | grep -i docker
		   │ ├─docker.service
		   │ │ ├─15302 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver overlay2
		   │ │ └─15307 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --runtime-args --systemd-cgroup=true
			   │ ├─15386 grep --color=auto -i docker
[root@osestaging1 discourse]# systemctl | grep -i docker
  var-lib-docker-containers.mount                               loaded active mounted   /var/lib/docker/containers
  var-lib-docker-overlay2.mount                                 loaded active mounted   /var/lib/docker/overlay2
  docker.service                                                loaded active running   Docker Application Container Engine
  docker-cleanup.timer                                          loaded active waiting   Run docker-cleanup every hour
[root@osestaging1 discourse]# 
  1. now it's running, but the hello-world test failed
[root@osestaging1 discourse]# ps -ef | grep -i docker
root     15302     1  0 13:15 ?        00:00:00 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver overlay2
root     15307 15302  0 13:15 ?        00:00:00 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --runtime-args --systemd-cgroup=true
root     15509   692  0 13:17 pts/9    00:00:00 grep --color=auto -i docker
[root@osestaging1 discourse]# docker run hello-world
Unable to find image 'hello-world:latest' locally
Trying to pull repository docker.io/library/hello-world ... 
latest: Pulling from docker.io/library/hello-world
1b930d010525: Pull complete 
Digest: sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f
Status: Downloaded newer image for docker.io/hello-world:latest
container_linux.go:235: starting container process caused "process_linux.go:327: setting cgroup config for procHooks process caused \"failed to write c 5:1 rwm to devices.allow: write /sys/fs/cgroup/devices/lxc/osestaging1/system.slice/docker-3c77792ab6b5f23d727daf392b5b8d33a8713849da4e7f30e8cfcd2197c7ec0c.scope/devices.allow: operation not permitted\""
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "process_linux.go:327: setting cgroup config for procHooks process caused \"failed to write c 5:1 rwm to devices.allow: write /sys/fs/cgroup/devices/lxc/osestaging1/system.slice/docker-3c77792ab6b5f23d727daf392b5b8d33a8713849da4e7f30e8cfcd2197c7ec0c.scope/devices.allow: operation not permitted\"".
[root@osestaging1 discourse]# 
  1. attempting the install failed again, stating that my version of docker as installed from the cent repos is too old :(
[root@osestaging1 discourse]# ./discourse-setup 
...
Configuration file at  updated successfully!

Updates successful. Rebuilding in 5 seconds.
Building app
ERROR: Docker version 1.13.1 not supported, please upgrade to at least 17.03.1, or recommended 17.06.2
[root@osestaging1 discourse]# 
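For future reference, a minimal sketch of the kind of dotted-version comparison that trips here. It assumes GNU coreutils' 'sort -V' is available; the function name version_ge is made up for illustration and this is not necessarily how the discourse launcher does its own check.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes GNU coreutils 'sort -V' (version sort) is available.
version_ge () {
    # the smaller of the two versions sorts first; if that is $2, then $1 >= $2
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# the version from the CentOS repos vs. the minimum named in the error above
if version_ge "1.13.1" "17.03.1"; then
    echo "docker is new enough"
else
    echo "docker is too old"
fi
```

A plain string or numeric comparison would get this wrong, since the fields have to be compared one by one.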
  1. I removed the docker installed from yum
[root@osestaging1 discourse]# yum remove docker
Loaded plugins: fastestmirror, replace
Resolving Dependencies
--> Running transaction check
---> Package docker.x86_64 2:1.13.1-103.git7f2769b.el7.centos will be erased
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================
 Package                    Arch                       Version                                                  Repository                   Size
==================================================================================================================================================
Removing:
 docker                     x86_64                     2:1.13.1-103.git7f2769b.el7.centos                       @extras                      65 M

Transaction Summary
==================================================================================================================================================
Remove  1 Package

Installed size: 65 M
Is this ok [y/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : 2:docker-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                               1/1 
warning: /etc/sysconfig/docker-storage saved as /etc/sysconfig/docker-storage.rpmsave
  Verifying  : 2:docker-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                               1/1 

Removed:
  docker.x86_64 2:1.13.1-103.git7f2769b.el7.centos                                                                                                

Complete!
[root@osestaging1 discourse]# 
  1. but, umm, docker is still installed?
[root@osestaging1 discourse]# docker -v
Docker version 1.13.1, build 7f2769b/1.13.1
[root@osestaging1 discourse]# yum remove docker
Loaded plugins: fastestmirror, replace
No Match for argument: docker
No Packages marked for removal
[root@osestaging1 discourse]# 
  1. I removed docker-client & docker-common too
[root@osestaging1 discourse]# rpm -qa | grep -i docker
docker-client-1.13.1-103.git7f2769b.el7.centos.x86_64
docker-common-1.13.1-103.git7f2769b.el7.centos.x86_64
[root@osestaging1 discourse]# yum remove docker-client docker-common
Loaded plugins: fastestmirror, replace
Existing lock /var/run/yum.pid: another copy is running as pid 16965.
Another app is currently holding the yum lock; waiting for it to exit...
  The other application is: yum
	Memory : 224 M RSS (986 MB VSZ)
	Started: Mon Oct 28 13:21:58 2019 - 00:03 ago
	State  : Running, pid: 16965
Resolving Dependencies
--> Running transaction check
---> Package docker-client.x86_64 2:1.13.1-103.git7f2769b.el7.centos will be erased
---> Package docker-common.x86_64 2:1.13.1-103.git7f2769b.el7.centos will be erased
--> Finished Dependency Resolution

Dependencies Resolved

==================================================================================================================================================
 Package                         Arch                     Version                                                 Repository                 Size
==================================================================================================================================================
Removing:
 docker-client                   x86_64                   2:1.13.1-103.git7f2769b.el7.centos                      @extras                    13 M
 docker-common                   x86_64                   2:1.13.1-103.git7f2769b.el7.centos                      @extras                   4.4 k

Transaction Summary
==================================================================================================================================================
Remove  2 Packages

Installed size: 13 M
Is this ok [y/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : 2:docker-client-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                        1/2 
  Erasing    : 2:docker-common-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                        2/2 
  Verifying  : 2:docker-common-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                        1/2 
  Verifying  : 2:docker-client-1.13.1-103.git7f2769b.el7.centos.x86_64                                                                        2/2 

Removed:
  docker-client.x86_64 2:1.13.1-103.git7f2769b.el7.centos                 docker-common.x86_64 2:1.13.1-103.git7f2769b.el7.centos                

Complete!
[root@osestaging1 discourse]# 
  1. Aaaand now it's gone
[root@osestaging1 discourse]# docker -v
bash: /bin/docker: No such file or directory
[root@osestaging1 discourse]# 
  1. alright, let's get that damn unsigned https install script and see what it does
[root@osestaging1 discourse]# wget https://get.docker.com/ -O installDocker.sh
--2019-10-28 13:26:37--  https://get.docker.com/
Resolving get.docker.com (get.docker.com)... 143.204.101.29, 143.204.101.37, 143.204.101.126, ...
Connecting to get.docker.com (get.docker.com)|143.204.101.29|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13216 (13K) [text/plain]
Saving to: ‘installDocker.sh’

100%[========================================================================================================>] 13,216      --.-K/s   in 0s      

2019-10-28 13:26:37 (32.0 MB/s) - ‘installDocker.sh’ saved [13216/13216]

[root@osestaging1 discourse]# 
  1. good christ, it's 476 lines long. The sources (and some related documentation) are found on their github here. Note that even they say not to use this script on production systems :( https://github.com/docker/docker-install
  2. I see nothing on there about verifying a cryptographic signature of the file. ffs it's not hard to implement. I remember hitting this wall when I was researching discourse & docker in the past, and I was stunned to discover that even the whonix project used discourse. I asked the founder (Patrick Schleizer) about his thoughts on the security of Discourse back in 2018-09, and the best suggestion he offered for authenticity of the install script was some args for curl https://forums.whonix.org/t/change-whonix-forum-software-to-discourse/1181/15
curl --remote-name --tlsv1.2 --proto =https --location --remote-name https://get.docker.com/
  1. his command was untested and had some issues; I fixed it, having to bruteforce the name of the script. It's actually 'install.sh' (not get-docker.sh as the comments suggest), and attempts to grab anything else return a 403 from cloudflare. I did this one in whonix (meta) through tor
user@host:~$ curl -i --tlsv1.2 --proto =https --location --remote-name https://get.docker.com/install.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 76963  100 76963    0     0  13326      0  0:00:05  0:00:05 --:--:-- 17836
user@host:~$ sha384sum install.sh 
da1bb77df1cc6aea926b893cb67780c492ca8fcaf52edd5328819732ce914c894f2fed8c210aec92a9df1c03de51107b  install.sh
user@host:~$ 
  1. compare that to what I downloaded through the internet from our staging server, and we have a --- a mismatch?
[root@osestaging1 discourse]# sha384sum installDocker.sh 
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  installDocker.sh
[root@osestaging1 discourse]# 
  1. so somehow install.sh obtained through tor is actually a binary file? Did I really just get MITM'd?
user@host:~$ less install.sh 
"install.sh" may be a binary file.  See it anyway? 
user@host:~$    
  1. well, when I tried this command from our server, I got the same binary
[root@osestaging1 discourse]# curl  --tlsv1.2 --proto=https --location --remote-name https://get.docker.com/install.sh
curl: option --proto=https: is unknown
curl: try 'curl --help' or 'curl --manual' for more information
[root@osestaging1 discourse]# curl --tlsv1.2 --proto =https --location --remote-name https://get.docker.com/install.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 76963  100 76963    0     0   230k      0 --:--:-- --:--:-- --:--:--  231k
[root@osestaging1 discourse]# sha384sum install.sh 
da1bb77df1cc6aea926b893cb67780c492ca8fcaf52edd5328819732ce914c894f2fed8c210aec92a9df1c03de51107b  install.sh
[root@osestaging1 discourse]# less install.sh
"install.sh" may be a binary file.  See it anyway? 
[root@osestaging1 discourse]# 
  1. this is hard because there appears to be no endpoint file name in the URI, which curl wants (I guess for security reasons it would be good to ensure there's no redirects), but I can't find a file name that's correct. It's just spat out on a query for '/', and otherwise I get a 403 Access Denied. If I change to stdout instead of --remote-name, then it works
user@host:~$ curl  --tlsv1.2 --proto =https --location https://get.docker.com/ > get-docker.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 13216  100 13216    0     0   3649      0  0:00:03  0:00:03 --:--:--  3648
user@host:~$ sha384sum get-docker.sh 
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  get-docker.sh
user@host:~$ 
  1. I downloaded it again through a distinct path on a vpn
user@disp5412:~$ curl  --tlsv1.2 --proto https --location https://get.docker.com/ > get-docker.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 13216  100 13216    0     0   1606      0  0:00:08  0:00:08 --:--:--  3561
user@disp5412:~$ sha384sum get-docker.sh
68041f4b75f5485834c53c549d1682f1d36af864ac2fde5eba1d7bf401fd44db3a6c79ba32d7f85c6778aea5897182c4  get-docker.sh
user@disp5412:~$ 
user@disp5412:~$ curl ifconfig.co/json
{"ip":"5.254.96.242","ip_decimal":100557042,"country":"Romania","country_eu":true,"country_iso":"RO","city":"Bucharest","latitude":44.4354,"longitude":26.1033,"asn":"AS3223","asn_org":"Voxility LLP","user_agent":{"product":"curl","version":"7.52.1","raw_value":"curl/7.52.1"}}user@disp5412:~$ 
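The manual sha384sum comparisons above can be wrapped into a small check. This is only a sketch: same_sha384 is a hypothetical name, and it assumes coreutils' sha384sum and awk are installed. Note that matching digests across network paths only suggest that no single path altered the file; they say nothing about whether the upstream copy itself is authentic.

```shell
# Hypothetical helper: succeeds only if every file given has the same SHA-384.
same_sha384 () {
    # print one digest per file, collapse duplicates, expect exactly one left
    [ "$(sha384sum "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}

# usage, with the copies fetched from staging, tor, and the vpn:
# same_sha384 installDocker.sh get-docker.sh && echo "digests match"
```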

Wed Oct 25, 2019

  1. now that I've finished a script to automate the sync from prod to staging, I can finally proceed with a POC of Discourse or AskBot
  2. I emailed Marcin asking which was higher priority, which I'll begin next week
  3. Marcin said he's getting 414 request-uri too large issues from wordpress when attempting spam moderation. I checked our nginx config, which uses a 10M limit on 'client_max_body_size' which is 10x the default of 1M.
  4. I responded to our old email chain with Christian from almost 2 months ago asking if he heard back from kiwix regarding our offline zim wiki archive, and I asked if he could write an article about this archive as a howto for users to use it on android to publish on osemain
  5. Marcin confirmed: I should work on the Discourse POC next week
  6. I updated my TODO list https://wiki.opensourceecology.org/wiki/OSE_Server#TODO
    1. namely, in addition to this Discourse POC, I also need to add 2FA support to our VPN and put together guides for OSE devs to gain access to the VPN and also guides for the OSE sysadmin to grant them access

Tue Oct 24, 2019

  1. Marcin mentioned yesterday that the ajax signup form for osemail on our phplist post on osemain is broken https://www.opensourceecology.org/moving-to-open-source-email-list-software/
  2. looks like it's wordpress wrapping our javascript in paragraph tags again; I fixed this back in January by using the wpautop-control plugin https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q1#Sat_Jan_26.2C_2019
  3. Marcin said he couldn't fix it today by doing a restore (fucking wordpress doesn't do an actual restore; it *still* tries to parse the old content & add paragraph tags??), so he just made an image of the form and linked to the signup page on phplist.opensourceecology.org. That sucks.
  4. I logged into osemain's wordpress wui. Oh, no, the 'wpautop-control' plugin isn't activated anymore. I'm assuming that marcin disabled it when doing some cleanup to debug slowdown after we added the social media and seo plugins https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
  5. I activated the plugin, and I restored to the most recent revision that was made by me. And it worked! https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
  6. ...
  7. continuing from yesterday, I need to create a new non-root user (which will have to exist on both staging & production) that I'll both [a] give NOPASSWD sudo access on the staging server only and [b] grant ssh key-authorized access to on the staging server only
  8. I named this user stagingsync. On staging, I added an authorized_keys file with the root-owned public key for the 4096-bit passwordless rsa ssh key that I generated on prod yesterday
[root@osestaging1 ~]# adduser stagingsync
[root@osestaging1 ~]# ls -lah /home/stagingsync
total 20K
drwx------.  2 stagingsync stagingsync 4.0K Oct 24 12:12 .
drwxr-xr-x. 14 root        root        4.0K Oct 24 12:12 ..
-rw-r--r--.  1 stagingsync stagingsync   18 Sep  6  2017 .bash_logout
-rw-r--r--.  1 stagingsync stagingsync  193 Sep  6  2017 .bash_profile
-rw-r--r--.  1 stagingsync stagingsync  231 Sep  6  2017 .bashrc
[root@osestaging1 ~]# su - stagingsync
[stagingsync@osestaging1 ~]$ mkdir .ssh
[stagingsync@osestaging1 ~]$ chmod 0700 .ssh
[stagingsync@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC4FqRKRYw8qgLqbgfH1Yze+EWQ9wJudNU4+jrHHsatKag3yl90zE557NukZGfIcNP6sFp6+f8VeK0W9g6yhMiAq9wrsS6VrgZw1frjsFaflBaDPwPQb8s5uvj5O6P+9R0jg05t5kiHtkSrgXD7uFXkYbXeUm7xaeQRgOk+0Lt1tnVcT8g+EJDnQ7XlChLd+AXGUCiyRv+kLYCO9014Yd0Q4zlLfpRvHwXgE2gPjJDUqjiVM4SDtCqP1wSSp6JvW+bGAnFKEof/n1MyuYWajicJBijLkooCamI6VY20Qed1mv0V4E/9q2E3eQa/itd/Ai3SiEHxZURl3sVL3MPpKWqX9SG7ygZYIcnfnRah/JRjEkS84drIhdPgvF+W+X8r9i3/jRduP4H5nY9giqQBkchgZ+zixduVsjJk69oaxW3bMsJDH/UfX96gKl4HZaboJecBbKm3ZZi1YKsmAWBl6FdfsLT2FERHxWpb3PUsrfUGza187N9UHnPQESqyhpI0SRd+xMF/nZypDQEv1dSHnl4W/d6iaotZ4/RSMUF+nNHzbL/hjtusnd0f9llaEkc+v0IzRMtL6DB5XMmp9wWVkfE0Mg9qWIaqWgJKu1/wp4GABjpt2T5D2OkksgePWUQgHzXVC7By0I3XoEswFfFV/FTpp4r16lZc36s4dkDGsXT/6Q== root@opensourceecology.org" > .ssh/authorized_keys
[stagingsync@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys 
[stagingsync@osestaging1 ~]$ 
  1. then, per requirement, I added the stagingsync user to the sshaccess group
[root@osestaging1 ~]# gpasswd -a stagingsync sshaccess
Adding user stagingsync to group sshaccess
[root@osestaging1 ~]# 
  1. I don't want stagingsync to have ssh access to prod (which, without an authorized_keys file on prod, it wouldn't be able to ssh in anyway--but it would be wise anyway to leave it out of the sshaccess group on prod), so I'll *not* do this on prod. Because I do want to sync the /etc/group file from prod to staging, I'll add a step in the sync script that appends ',stagingsync' to the 'sshaccess' line in /etc/group
  2. cool, it works
[root@opensourceecology ~]# ssh -i /root/.ssh/id_rsa.201910 -p 32415 stagingsync@10.241.189.11 hostname
osestaging1
[root@opensourceecology ~]# 
  1. now I added the 'stagingsync' user to have NOPASSWD rights on staging only; note that this will not get overwritten as our rsync command explicitly excludes the sudo config
[root@osestaging1 ~]# tail /etc/sudoers
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom

## Allows members of the users group to shutdown this system
# %users  localhost=/sbin/shutdown -h now

## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d

maltfield       ALL=(ALL)       NOPASSWD: ALL
stagingsync       ALL=(ALL)       NOPASSWD: ALL
[root@osestaging1 ~]# 
  1. I'm having issues with connections to staging suddenly failing from other vpn clients (my laptop and the prod server) after some time, even though my connection appears to remain successful. closing & reconnecting re-enables me to access staging.
  2. I initiated a new rsync using my new script. here's what it looks like now
############
# SETTINGS #
############

STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync

#########
# RSYNC #
#########

# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user just saves space & time
# exclude /home/stagingsync because 'stagingsync' should be able to access
#                           staging but not production
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root

time nice rsync \
		-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
		--bwlimit=3000 \
		--numeric-ids \
		--rsync-path="sudo rsync" \
		--exclude=/root \
		--exclude=/run \
		--exclude=/home/b2user/sync* \
		--exclude=/home/stagingsync* \
		--exclude=/etc/sudo* \
		--exclude=/etc/openvpn \
		--exclude=/usr/share/easy-rsa \
		--exclude=/dev \
		--exclude=/sys \
		--exclude=/proc \
		--exclude=/boot/ \
		--exclude=/etc/sysconfig/network* \
		--exclude=/tmp \
		--exclude=/var/tmp \
		--exclude=/etc/fstab \
		--exclude=/etc/mtab \
		--exclude=/etc/mdadm.conf \
		--exclude=/etc/hostname \
		-av \
		--progress \
		/ ${SYNC_USERNAME}@${STAGING_HOST}:/
  1. it works!
[root@opensourceecology bin]# ./syncToStaging.sh 
...
var/www/html/www.opensourceecology.org/htdocs/wp-content/uploads/2019/10/workshop9sm.jpg
	   97794 100%  113.29kB/s    0:00:00 (xfer#4820, to-check=898/518063)

sent 810748552 bytes  received 2196940 bytes  1910565.20 bytes/sec
total size is 41443449279  speedup is 50.98
+ exit 0
[root@opensourceecology bin]# 
  1. A double-tap fails, probably because the sync updated /etc/group, removing 'stagingsync' from the 'sshaccess' group
[root@opensourceecology bin]# ./syncToStaging.sh 
+ STAGING_HOST=10.241.189.11
+ STAGING_SSH_PORT=32415
+ SYNC_USERNAME=stagingsync
+ nice rsync -e 'ssh -p 32415 -i /root/.ssh/id_rsa.201910' --bwlimit=3000 --numeric-ids '--rsync-path=sudo rsync' --exclude=/root --exclude=/run '--exclude=/home/b2user/sync*' '--exclude=/home/stagingsync*' '--exclude=/etc/sudo*' --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ '--exclude=/etc/sysconfig/network*' --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf --exclude=/etc/hostname -av --progress / stagingsync@10.241.189.11:/
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]

real    0m0.147s
user    0m0.031s
sys     0m0.007s
+ exit 0
[root@opensourceecology bin]# 
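Note the '+ exit 0' in the trace above: the script still reports success even though rsync died with code 255. A minimal sketch of a wrapper that surfaces a failed step so the caller can abort; run_step is a hypothetical name and is not part of syncToStaging.sh.

```shell
# Hypothetical wrapper: run a command; if it fails, say so on stderr and
# hand its exit status back to the caller instead of swallowing it.
run_step () {
    "$@"
    step_status=$?
    if [ "$step_status" -ne 0 ]; then
        echo "FAILED (exit ${step_status}): $*" >&2
    fi
    return "$step_status"
}

# usage (commented out here; the rsync arguments are elided):
# run_step rsync ... || exit 1
```

With this in place, the later sed steps would not run against a staging box that never received the sync.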
  1. I ran this sed command, which I'll add to the script
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp
[root@osestaging1 ~]# sed -i 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp,stagingsync
[root@osestaging1 ~]# 
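One caveat with the sed above: it appends unconditionally, so running it in place a second time would list stagingsync twice. A sketch of an idempotent variant as a stdin-to-stdout filter follows; add_group_member is a hypothetical name, and it assumes the usual four colon-separated fields of a group(5) line.

```shell
# Hypothetical filter: add user $2 to group $1 in group(5)-formatted input,
# but only if the user is not already a member. Other lines pass through.
add_group_member () {
    awk -F: -v OFS=: -v grp="$1" -v usr="$2" '
        $1 == grp {
            n = split($4, members, ",")
            found = 0
            for (i = 1; i <= n; i++) if (members[i] == usr) found = 1
            # an empty member list gets just the user, with no leading comma
            if (!found) $4 = ($4 == "" ? usr : $4 "," usr)
        }
        { print }
    '
}

# usage: add_group_member sshaccess stagingsync < /etc/group
```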
  1. now the second tap works; it wasn't quite as fast as I'd like; it spent a lot of time on mysql, logs, ossec, munin, etc files that changed from just a few minutes ago
[root@opensourceecology bin]# ./syncToStaging.sh 
...
var/www/html/munin/static/zoom.js
		4760 100%  422.59kB/s    0:00:00 (xfer#773, to-check=1002/322356)

sent 61086821 bytes  received 1431809 bytes  454680.95 bytes/sec
total size is 41445400614  speedup is 662.93

real    2m17.019s
user    0m22.964s
sys     0m8.157s
+ exit 0
[root@opensourceecology bin]# 
  1. I went to add the sed command to be executed after the rsync but--well--that's a new line with a new connection necessarily. And I can't connect after rsync copied-over the /etc/group file. I'm in a catch-22.
  2. my solution: exclude the rsync of /etc/group, and do it manually with sed piping to the file over ssh
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p32415 -i /root/.ssh/id_rsa.201910 stagingsync@10.241.189.11 'sudo tee /etc/group'
  1. I was also able to fix all the nginx configs by adding this to the script
############
# SETTINGS #
############
...
PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'
...
#############
# FUNCTIONS #
#############

runOnStaging () {

		ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1

}
...
##################
# NGINX BINDINGS #
##################

# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses

# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"

# since the main config file has both listens (for redirecting port 80 to
# port 443), we just do it once & comment-out the second one to avoid err
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"

# just remove all of ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"

# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"
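To try the listen rewrites without touching a real nginx config, here is a rough local sketch of the same idea as a stdin-to-stdout filter. rewrite_listens is a hypothetical name; it uses '|' as the sed delimiter and [[:space:]] instead of \s, which are my substitutions rather than what the script uses.

```shell
# Hypothetical filter: $1 = prod IPv4, $2 = staging IP, $3 = prod IPv6.
# Swaps the IPv4 address and comments out any active IPv6 listen line.
# As in the script, the dots in $1 are unescaped and match any character.
rewrite_listens () {
    sed -e "s/$1/$2/g" \
        -e "s|^\([[:space:]]*\)listen \[$3\(.*\)|\1#listen [$3\2|"
}

# demo on two sample lines (the port and 'ssl' are illustrative)
printf '    listen 138.201.84.243:443 ssl;\n    listen [2a01:4f8:172:209e::2]:443 ssl;\n' \
    | rewrite_listens 138.201.84.243 10.241.189.11 '2a01:4f8:172:209e::2'
```

Because the second expression is anchored and only matches an uncommented 'listen', running the filter over its own output changes nothing.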
  1. because the websites necessarily look exactly the same, I decided to add a quick one-liner to add an 'is_staging' file into the docroot of the vhosts with the contents 'true' on the staging box after the sync. On prod, a GET for '/is_staging' should return a 404.
for docroot in $(find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo 'true' > "$docroot/is_staging"; done
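A sketch of how the mark could be checked from a client. env_from_status is a hypothetical helper that only interprets the status code; the curl call in the usage comment assumes curl is installed and that the vhost is reachable from wherever it is run.

```shell
# Hypothetical helper: map the HTTP status of a GET for /is_staging to a label.
env_from_status () {
    case "$1" in
        200) echo "staging" ;;
        404) echo "production" ;;
        *)   echo "unknown" ;;
    esac
}

# usage:
# env_from_status "$(curl -s -o /dev/null -w '%{http_code}' https://www.opensourceecology.org/is_staging)"
```

Anything other than 200 or 404 (a redirect, a 403, a timeout) is reported as unknown rather than guessed at.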
  1. ok, I finished the sync script! I haven't added it to a cron yet (which I would also have to comment-out on the staging box; super meta), but here's what I got so far
[root@opensourceecology bin]# cat syncToStaging.sh
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-24
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################

############
# SETTINGS #
############

STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync

PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'

#############
# FUNCTIONS #
#############

runOnStaging () {

		ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1

}

#########
# RSYNC #
#########

# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user/sync* just saves space & time
# exclude /home/stagingsync* because 'stagingsync' should be able to access
#                            staging but not production
# exclude /etc/group so 'stagingsync' is in the 'sshaccess' group on staging
#                    but not on prod
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root

time nice rsync \
		-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
		--bwlimit=3000 \
		--numeric-ids \
		--rsync-path="sudo rsync" \
		--exclude=/root \
		--exclude=/run \
		--exclude=/home/b2user/sync* \
		--exclude=/home/stagingsync* \
		--exclude=/etc/sudo* \
		--exclude=/etc/group \
		--exclude=/etc/openvpn \
		--exclude=/usr/share/easy-rsa \
		--exclude=/dev \
		--exclude=/sys \
		--exclude=/proc \
		--exclude=/boot/ \
		--exclude=/etc/sysconfig/network* \
		--exclude=/tmp \
		--exclude=/var/tmp \
		--exclude=/etc/fstab \
		--exclude=/etc/mtab \
		--exclude=/etc/mdadm.conf \
		--exclude=/etc/hostname \
		-av \
		--progress \
		/ ${SYNC_USERNAME}@${STAGING_HOST}:/

##################
# NGINX BINDINGS #
##################

# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses

# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"

# since the main config file has both listens (for redirecting port 80 to
# port 443), we just do it once & comment-out the second one to avoid err
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"

# just remove all of ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"

# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"

# finally, restart nginx
runOnStaging "sudo systemctl restart nginx.service"

#########################
# MAKE THE STAGING MARK #
#########################

# we leave a mark so we can test to see if we're looking at staging by doing a
# GET request against '/is_staging'. It should 404 on prod but return 200 on
# staging

runOnStaging 'for docroot in $(sudo find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo 'true' | sudo tee "$docroot/is_staging"; done'

###################
# OSSEC SILENCING #
###################

# we don't need ossec email alerts from our staging server
runOnStaging "sudo sed -i 's^<email_notification>yes</email_notification>^<email_notification>no</email_notification>^' /var/ossec/etc/ossec.conf"

##################
# CRON DISABLING #
##################

# disable certbot cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/letsencrypt"

# disable backups cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/backup_to_backblaze"

###############
# USER/GROUPS #
###############

# append ',stagingsync' to the 'sshaccess' line in /etc/group to permit this
# user to be able to ssh into staging (we don't do this on prod so they can't
# ssh into prod)
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910 ${SYNC_USERNAME}@${STAGING_HOST} 'sudo tee /etc/group'

########
# EXIT #
########

# clean exit
exit 0
[root@opensourceecology bin]# 
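For the cron-disabling step, a sketch of a comment-out filter that leaves blank lines and already-commented lines alone, so it is safe to apply more than once. comment_out_active is a hypothetical name and the regex is my own variant, not the one in the script above.

```shell
# Hypothetical filter: prefix '#' to every active line on stdin.
# Lines that are blank or whose first non-space character is '#' are untouched.
comment_out_active () {
    sed 's/^\([[:space:]]*\)\([^#[:space:]]\)/\1#\2/'
}

# usage: comment_out_active < /etc/cron.d/letsencrypt
```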

Mon Oct 23, 2019

  1. I updated the wiki documentation on the development server, added an article on the staging server, and added some bits about the /var network block mount and the vpn config
  2. ...
  3. it does not appear that I can simply add items to the client's /etc/hosts file or otherwise on a per-ip or per-dns basis. It appears that I can only add a "dhcp-option DNS" item to the server (or client) configs to override the dns server used on the client https://openvpn.net/community-resources/pushing-dhcp-options-to-clients/
  4. so then I can run a dns server on osedev1 which has a few entries for each of our websites, points them to the VPN IP of osestaging1 (10.241.189.11), and defers the rest onto 1.1.1.1 or something.
  5. this question suggests using dnsmasq https://askubuntu.com/questions/885497/openvpn-and-dns
  6. cool, dnsmasq-2.76-9 is already installed on our cent7 osedev1 box. Let's take that low-hanging fruit
[root@osedev1 3]# rpm -qa | grep -i dns
dnsmasq-2.76-9.el7.x86_64
[root@osedev1 3]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
[root@osedev1 3]# 
  1. It also appears to already be running
[root@osedev1 3]# ps -ef | grep dnsmasq
nobody    1346     1  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1347  1346  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root     18856 14405  0 13:55 pts/10   00:00:00 grep --color=auto dnsmasq
[root@osedev1 3]# 
  1. oh, shit, it appears to be only listening on 192.168.122.1:53 which is our lxc network
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN     0      0         * 18757                 * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN     0      0      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN     0      0      *%virbr0:67                    *:*                   users:(("dnsmasq",pid=1346,fd=3))
tcp    LISTEN     0      5      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=6))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN     0      0      192.168.122.1:53                       *:*                   users:(("dnsmasq",pid=1346,fd=5))
UNCONN     0      0      *%virbr0:67                       *:*                   users:(("dnsmasq",pid=1346,fd=3))
[root@osedev1 etc]# 
  1. I could find no entries in the dnsmasq.conf file for the bind address
[root@osedev1 etc]# ls -lah /etc/dnsmasq.*
-rw-r--r--. 1 root root  27K Aug  9 01:12 /etc/dnsmasq.conf

/etc/dnsmasq.d:
total 8.0K
drwxr-xr-x.  2 root root 4.0K Aug  9 01:12 .
drwxr-xr-x. 86 root root 4.0K Oct 23 14:20 ..
[root@osedev1 etc]# grep '192.168.122' /etc/dnsmasq.conf 
[root@osedev1 etc]# 
  1. I found two unrelated files that specify this network--unless dnsmasq is somehow configured by libvirt?
[root@osedev1 etc]# grep -irl '192.168.122' /etc
/etc/libvirt/qemu/networks/default.xml
/etc/openvpn/openvpn-status.log
[root@osedev1 etc]# cat /etc/libvirt/qemu/networks/default.xml 
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh net-edit default
or other application using the libvirt API.
-->

<network>
  <name>default</name>
  <uuid>a11767e5-cc15-4acd-9443-bbffc220fa4d</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:7d:01:71'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
	<dhcp>
	  <range start='192.168.122.2' end='192.168.122.254'/>
	</dhcp>
  </ip>
</network>
[root@osedev1 etc]# cat /etc/openvpn/openvpn-status.log 
OpenVPN CLIENT LIST
Updated,Wed Oct 23 14:28:10 2019
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
hetzner2,138.201.84.223:34914,8122227,496427,Tue Oct 22 17:48:40 2019
osestaging1,192.168.122.201:51674,2340646,949941,Tue Oct 22 18:07:40 2019
maltfield,27.7.149.58:51080,44891,39735,Wed Oct 23 13:28:30 2019
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.241.189.10,maltfield,27.7.149.58:51080,Wed Oct 23 14:28:09 2019
10.241.189.11,osestaging1,192.168.122.201:51674,Wed Oct 23 13:48:08 2019
GLOBAL STATS
Max bcast/mcast queue length,1
END
[root@osedev1 etc]# 
  1. I think it is libvirt; this libvirt guide describes how to avoid conflicts when trying to use a distinct "global" dnsmasq config https://wiki.libvirt.org/page/Libvirtd_and_dnsmasq
  2. I made a backup of the existing /etc/dnsmasq.conf file and added the lines to bind dnsmasq only on tun0 to the config
[root@osedev1 etc]# cp dnsmasq.conf dnsmasq.20191023.orig.conf
[root@osedev1 etc]# vim dnsmasq.conf 
...
[root@osedev1 etc]# tail /etc/dnsmasq.conf 
#conf-dir=/etc/dnsmasq.d,.bak

# Include all files in a directory which end in .conf
#conf-dir=/etc/dnsmasq.d/,*.conf

# Include all files in /etc/dnsmasq.d except RPM backup files
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig

interface=tun0
bind-interfaces
[root@osedev1 etc]# 
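Since the conf-dir=/etc/dnsmasq.d line is active in the tail output above, the same two settings could alternatively live in their own drop-in file, which would leave the packaged dnsmasq.conf untouched. This is only a sketch of that alternative, not what was done here; the file name vpn.conf and the function name write_vpn_dropin are hypothetical.

```shell
# Sketch: write the VPN-specific dnsmasq settings to a drop-in file.
write_vpn_dropin() {
    # $1: target directory (on osedev1 this would be /etc/dnsmasq.d)
    # $2: interface to serve DNS on (tun0 in this log)
    dir="$1"
    iface="$2"
    {
        echo "# serve DNS on the VPN interface only, bound explicitly so this"
        echo "# instance does not collide with libvirt's dnsmasq on virbr0"
        echo "interface=$iface"
        echo "bind-interfaces"
    } > "$dir/vpn.conf"
}

# Example (not run here):
#   write_vpn_dropin /etc/dnsmasq.d tun0
```

Either way, dnsmasq also serves the loopback interface when interface= is used, which matches the 127.0.0.1:53 sockets that show up once the service is started.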
  1. And then I verified that the dnsmasq.service is disabled
[root@osedev1 etc]# systemctl list-units | grep -i dns
  unbound-anchor.timer                                                                        loaded active waiting   daily update of the root trust anchor for DNSSEC
[root@osedev1 etc]# systemctl list-unit-files | grep -i dns
chrony-dnssrv@.service                        static  
dnsmasq.service                               disabled
chrony-dnssrv@.timer                          disabled
[root@osedev1 etc]# 
  1. I started it
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@osedev1 etc]# systemctl start dnsmasq.service
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-10-23 14:40:58 CEST; 2s ago
 Main PID: 29666 (dnsmasq)
	Tasks: 1
   CGroup: /system.slice/dnsmasq.service
		   └─29666 /usr/sbin/dnsmasq -k

Oct 23 14:40:58 osedev1 systemd[1]: Started DNS caching server..
Oct 23 14:40:58 osedev1 dnsmasq[29666]: started, version 2.76 cachesize 150
Oct 23 14:40:58 osedev1 dnsmasq[29666]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-...inotify
Oct 23 14:40:58 osedev1 dnsmasq[29666]: reading /etc/resolv.conf
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.100.100#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.99.99#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.98.98#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: read /etc/hosts - 6 addresses
Hint: Some lines were ellipsized, use -l to show in full.
[root@osedev1 etc]# 
  1. cool, now it looks like it's running on both the 192.168.122 virbr0 lxc network and the 10.241.189 tun0 vpn network
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN     0      0         * 20573881              * 8150                users:(("dnsmasq",pid=29666,fd=15))
u_dgr  UNCONN     0      0         * 18757                 * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN     0      0      127.0.0.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=6))
udp    UNCONN     0      0      10.241.189.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=4))
udp    UNCONN     0      0      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN     0      0      *%virbr0:67                    *:*                   users:(("dnsmasq",pid=1346,fd=3))
udp    UNCONN     0      0       ::1:53                   :::*                   users:(("dnsmasq",pid=29666,fd=10))
udp    UNCONN     0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                   :::*                   users:(("dnsmasq",pid=29666,fd=8))
tcp    LISTEN     0      5      127.0.0.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=7))
tcp    LISTEN     0      5      10.241.189.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=5))
tcp    LISTEN     0      5      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=6))
tcp    LISTEN     0      5       ::1:53                   :::*                   users:(("dnsmasq",pid=29666,fd=11))
tcp    LISTEN     0      5      fe80::fd4a:7df9:169:e7e2%tun0:53                   :::*                   users:(("dnsmasq",pid=29666,fd=9))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN     0      0      127.0.0.1:53                       *:*                   users:(("dnsmasq",pid=29666,fd=6))
UNCONN     0      0      10.241.189.1:53                       *:*                   users:(("dnsmasq",pid=29666,fd=4))
UNCONN     0      0      192.168.122.1:53                       *:*                   users:(("dnsmasq",pid=1346,fd=5))
UNCONN     0      0      *%virbr0:67                       *:*                   users:(("dnsmasq",pid=1346,fd=3))
UNCONN     0      0          ::1:53                      :::*                   users:(("dnsmasq",pid=29666,fd=10))
UNCONN     0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                      :::*                   users:(("dnsmasq",pid=29666,fd=8))
[root@osedev1 etc]# 
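The ss output above is noisy. A small filter like the following sketch reduces it to the protocol and local address of each port-53 socket owned by dnsmasq. It assumes the column layout shown above (local address in the fifth column), which may differ between ss versions; the function name dnsmasq_dns_listeners is hypothetical.

```shell
# Sketch: read `ss -plan` output on stdin and print "<proto> <local addr>"
# for every udp/tcp socket on port 53 that belongs to a dnsmasq process.
dnsmasq_dns_listeners() {
    awk '($1 == "udp" || $1 == "tcp") && /"dnsmasq"/ && $5 ~ /:53$/ { print $1, $5 }'
}

# Example (not run here):
#   ss -plan | dnsmasq_dns_listeners
```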
  1. cool, from my laptop the 53 udp port appears to be open. or, uh, filtered? (note: this scan and the dig below went to 10.137.0.1, which is not osedev1's vpn address; that is 10.241.189.1)
user@ose:~/openvpn$ sudo nmap -Pn -sU -p53 10.137.0.1

Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:31 +0545
Nmap scan report for 10.137.0.1 (10.137.0.1)
Host is up.
PORT   STATE         SERVICE
53/udp open|filtered domain

Nmap done: 1 IP address (1 host up) scanned in 2.12 seconds
user@ose:~/openvpn$ 
  1. nope, fail.
user@ose:~/openvpn$ dig @10.137.0.1 google.com

; <<>> DiG 9.10.3-P4-Debian <<>> @10.137.0.1 google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
user@ose:~/openvpn$ 
  1. Indeed, all ports are reported as "filtered"
user@ose:~/openvpn$ nmap -Pn 10.241.189.1

Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:29 +0545
Nmap scan report for 10.241.189.1 (10.241.189.1)
Host is up.
All 1000 scanned ports on 10.241.189.1 (10.241.189.1) are filtered

Nmap done: 1 IP address (1 host up) scanned in 201.47 seconds
user@ose:~/openvpn$ 
  1. I bet this is an iptables issue. And, christ, the iptables ruleset looks more complex than anything I built; I guess this is libvirt's doing?
[root@osedev1 etc]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2929  205K ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
	0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53
   54 17712 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
	0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:67
  101 15196 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
 107K   15M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
   17   706 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           
	4   628 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:32415
	9   804 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW udp dpt:1194
11218  621K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 320 packets, 26880 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 7135   30M ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24     ctstate RELATED,ESTABLISHED
 7756  935K ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0           
	0     0 ACCEPT     all  --  virbr0 virbr0  0.0.0.0/0            0.0.0.0/0           
	0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
	0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT 38266 packets, 6557K bytes)
 pkts bytes target     prot opt in     out     source               destination         
   54 18295 ACCEPT     udp  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            udp dpt:68
[root@osedev1 etc]# 
[root@osedev1 etc]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*mangle
:PREROUTING ACCEPT [136754:47136918]
:INPUT ACCEPT [121507:15710076]
:FORWARD ACCEPT [15247:31426842]
:OUTPUT ACCEPT [38360:6581630]
:POSTROUTING ACCEPT [53607:38008472]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*nat
:PREROUTING ACCEPT [13853:810289]
:INPUT ACCEPT [1821:140336]
:OUTPUT ACCEPT [2275:162484]
:POSTROUTING ACCEPT [2276:162568]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [320:26880]
:OUTPUT ACCEPT [38306:6563335]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
[root@osedev1 etc]# 
  1. I just added a single line before the drop to permit udp packets to 53 from tun0
[root@osedev1 20191023]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
[root@osedev1 20191023]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*mangle
:PREROUTING ACCEPT [279:24925]
:INPUT ACCEPT [279:24925]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
:POSTROUTING ACCEPT [106:13445]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*nat
:PREROUTING ACCEPT [30:1478]
:INPUT ACCEPT [3:218]
:OUTPUT ACCEPT [4:304]
:POSTROUTING ACCEPT [4:304]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -i tun0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
[root@osedev1 20191023]# 
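The exact command used to add the tun0 rule was not captured above. One way to reproduce it would be to look up the position of the catch-all DROP rule and insert the new rule at that position. This is a sketch under that assumption; the function name drop_rule_position is hypothetical.

```shell
# Sketch: read `iptables -S INPUT` output on stdin and print the 1-based
# rule number of the first "-A INPUT -j DROP" rule (nothing if none exists).
drop_rule_position() {
    awk '/^-A INPUT / { n++; if ($0 == "-A INPUT -j DROP") { print n; exit } }'
}

# Usage on the host as root (not run here):
#   pos="$(iptables -S INPUT | drop_rule_position)"
#   iptables -I INPUT "$pos" -i tun0 -p udp -m udp --dport 53 -j ACCEPT
#   service iptables save
```

Note that only udp was opened. If tcp lookups over the VPN are ever needed, a matching `-p tcp -m tcp --dport 53` rule would presumably be required as well.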
  1. And now it works!
user@ose:~/openvpn$ dig @10.241.189.1 michaelaltfield.net

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 michaelaltfield.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34648
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;michaelaltfield.net.		IN	A

;; ANSWER SECTION:
michaelaltfield.net.	3554	IN	A	176.56.237.113

;; Query time: 148 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:07:28 +0545 2019
;; MSG SIZE  rcvd: 64

user@ose:~/openvpn$ 
  1. now let's see if I can hardcode www.opensourceecology.org. By default, it returns the internet ip address of our prod server per our public dns records
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40391
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 120	IN	A	138.201.84.243

;; Query time: 214 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:09:07 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
  1. The dnsmasq.conf config says that it reads from /etc/hosts, so I just added a line to osedev1:/etc/hosts
[root@osedev1 20191023]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

# staging
10.241.189.11 www.opensourceecology.org
[root@osedev1 20191023]# 
  1. I tried the query again, but I still got the 138.201.84.243 address
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62221
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 87	IN	A	138.201.84.243

;; Query time: 158 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:46 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
  1. I gave dnsmasq a restart (maybe caching issue?)
[root@osedev1 20191023]# service dnsmasq restart
Redirecting to /bin/systemctl restart dnsmasq.service
[root@osedev1 20191023]#
  1. And I tried again; it worked this time!
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34890
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 0	IN	A	10.241.189.11

;; Query time: 146 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:56 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
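The restart was most likely needed because dnsmasq loads /etc/hosts at startup rather than on every query; per its man page it also re-reads the file on SIGHUP. A check like the following sketch could confirm an override after each change to osedev1:/etc/hosts. The function name resolves_to and the DIG override variable are hypothetical; the dig flags (+short, +time, +tries) are standard.

```shell
# Sketch: succeed only if the given DNS server answers with the expected
# A record. DIG may be set to a stand-in command; it defaults to dig.
resolves_to() {
    # $1: DNS server, $2: name to look up, $3: expected address
    answer="$("${DIG:-dig}" +short +time=2 +tries=1 "@$1" "$2" A | tail -n 1)"
    [ "$answer" = "$3" ]
}

# Example from the laptop (not run here):
#   resolves_to 10.241.189.1 www.opensourceecology.org 10.241.189.11 \
#     && echo "staging override is active"
```

On osedev1, something like `systemctl kill -s HUP dnsmasq.service` should reload the hosts file without a full restart, though that was not tried here.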
  1. cool, so now to push that option in the vpn; I added the "push dhcp-option" line to /etc/openvpn/server/server.conf
push "dhcp-option DNS 10.241.189.1"
  1. I reconnected to the vpn from my laptop, but there were no changes to my /etc/resolv.conf. I tried to restart the openvpn server on osedev1
[root@osedev1 server]# systemctl restart openvpn@server.service
[root@osedev1 server]# 
  1. I still have no changes on my resolv.conf, but I do see the option in the output of the client
Wed Oct 23 19:20:42 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
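To confirm from the client side which DNS servers the server actually pushed, the PUSH_REPLY line can be picked apart. A minimal sketch, assuming the log format shown above; the function name pushed_dns_servers is hypothetical.

```shell
# Sketch: read OpenVPN client log lines on stdin and print each DNS
# server address that arrived in a PUSH_REPLY, one per line.
pushed_dns_servers() {
    grep 'PUSH_REPLY' | tr -d "'" | tr ',' '\n' \
        | sed -n 's/^dhcp-option DNS \(.*\)$/\1/p'
}

# Example (not run here):
#   sudo openvpn client.conf | pushed_dns_servers
```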
  1. ah, fuck, apparently the linux client of openvpn receives the dhcp-option push but doesn't apply it to resolv.conf by itself https://unix.stackexchange.com/questions/201946/how-to-define-dns-server-in-openvpn
  2. this archlinux wiki has a solution for linux, but the pull-resolv-conf scripts live in a different location on centos https://wiki.archlinux.org/index.php/OpenVPN#DNS
[root@osedev1 server]# find / | grep -i pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.down
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.up
...
  1. these scripts actually have to live client-side, though. My client is debian-9. It doesn't have the 'pull-resolv-conf' scripts on it. But it does have 'update-resolv-conf' and 'systemd-resolved'. The latter isn't openvpn-specific, however. I think I should use '/etc/openvpn/update-resolv-conf'
root@ose:~# find / | grep -i pull-resolv-conf
root@ose:~# find / | grep -i resolv-conf
/etc/openvpn/update-resolv-conf
root@ose:~# find / | grep -i systemd-resolved
/usr/share/man/man8/systemd-resolved.service.8.gz
/usr/share/man/man8/systemd-resolved.8.gz
/lib/systemd/system/systemd-resolved.service.d
/lib/systemd/system/systemd-resolved.service.d/resolvconf.conf
/lib/systemd/system/systemd-resolved.service
/lib/systemd/systemd-resolved
root@ose:~# cat /etc/issue
Debian GNU/Linux 9 \n \l

root@ose:~# ls -lah /etc/openvpn/update-resolv-conf 
-rwxr-xr-x 1 root root 1.3K Oct 15  2018 /etc/openvpn/update-resolv-conf
root@ose:~# 
  1. I added the needful to my client.conf file, but it didn't do anything when I reconnected to the vpn
root@ose:/home/user/openvpn# tail client.conf
# Silence repeating messages
;mute 20

# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384

# dns for staging
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
root@ose:/home/user/openvpn# 
  1. well, the first non-commented line in that script checks for the existence of /sbin/resolvconf and does `exit 0` if it doesn't exist. Yeah, it doesn't exist.
root@ose:/home/user/openvpn# grep resolvconf /etc/openvpn/update-resolv-conf 
# Used snippets of resolvconf script by Thomas Hood and Chris Hanson.
[ -x /sbin/resolvconf ] || exit 0
	echo -n "$R" | /sbin/resolvconf -a "${dev}.openvpn"
	/sbin/resolvconf -d "${dev}.openvpn"
root@ose:/home/user/openvpn# ls -lah /sbin/resolvconf
ls: cannot access '/sbin/resolvconf': No such file or directory
root@ose:/home/user/openvpn# 
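Because update-resolv-conf exits 0 silently when /sbin/resolvconf is missing, the failure is easy to miss. A preflight check along these lines could be run before connecting. This is a sketch; the function name resolvconf_available is hypothetical and the default path is the one tested by the script above.

```shell
# Sketch: warn if the resolvconf binary that update-resolv-conf depends
# on is missing or not executable.
resolvconf_available() {
    # $1: path to the resolvconf binary (defaults to /sbin/resolvconf)
    bin="${1:-/sbin/resolvconf}"
    if [ -x "$bin" ]; then
        return 0
    fi
    echo "WARN: $bin is missing or not executable; update-resolv-conf will exit 0 without changing DNS" >&2
    return 1
}

# Example (not run here):
#   resolvconf_available && sudo openvpn client.conf
```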
  1. per the archlinux guide linked above, I installed the 'openresolv' package from apt-get. This time it worked!
user@ose:~/openvpn$ sudo openvpn client.conf
...
Wed Oct 23 19:43:00 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: timers and/or timeouts modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ifconfig/up options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: route options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: peer-id set
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: data channel crypto options modified
Wed Oct 23 19:43:00 2019 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 ROUTE_GATEWAY 10.137.0.6
Wed Oct 23 19:43:00 2019 TUN/TAP device tun0 opened
Wed Oct 23 19:43:00 2019 TUN/TAP TX queue length set to 100
Wed Oct 23 19:43:00 2019 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Wed Oct 23 19:43:00 2019 /sbin/ip link set dev tun0 up mtu 1500
Wed Oct 23 19:43:00 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Wed Oct 23 19:43:00 2019 /etc/openvpn/update-resolv-conf tun0 1500 1552 10.241.189.10 10.241.189.9 init
dhcp-option DNS 10.241.189.1
Too few arguments.
Wed Oct 23 19:43:00 2019 /sbin/ip route add 10.241.189.0/24 via 10.241.189.9
Wed Oct 23 19:43:00 2019 Initialization Sequence Completed
  1. And my laptop's new resolv.conf file
user@ose:~/openvpn$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 10.241.189.1
user@ose:~/openvpn$ 
  1. I refreshed the 'www.opensourceecology.org' page on my browser, and--boom--it's now showing staging! Success!!1one
  2. now, I finished adding the other hostnames to osedev1:/etc/hosts. Unfortunately, this will have to be updated as-needed in the future
[root@osedev1 pull-resolv-conf]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

# staging
10.241.189.11 www.opensourceecology.org opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org www.openbuildinginstitute.org
[root@osedev1 pull-resolv-conf]# 
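Since this hosts line has to be maintained by hand (and opensourceecology.org already appears twice in it), generating it from a list of hostnames might be less error-prone. A minimal sketch that drops duplicates while keeping the order; the function name staging_hosts_line is hypothetical.

```shell
# Sketch: print one /etc/hosts line mapping every given hostname to the
# staging address, with duplicate hostnames removed.
staging_hosts_line() {
    # $1: staging VPN address; remaining arguments: hostnames to override
    ip="$1"
    shift
    names="$(printf '%s\n' "$@" | awk '!seen[$0]++' | tr '\n' ' ')"
    printf '%s %s\n' "$ip" "${names% }"
}

# Example (not run here):
#   staging_hosts_line 10.241.189.11 www.opensourceecology.org wiki.opensourceecology.org
```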
  1. I restarted dnsmasq and attempted to test www.openbuildinginstitute.org. Well, it kinda worked. It pointed to the staging server--which has an expired certificate. This means that I need to do another sync & automate this nginx config sed process. But it also means that I need to somehow kill the certbot cron on staging
  2. ...
  3. meanwhile, I logged-into backblaze b2 to check the status of our backups of the dev node
  4. first of all, the prod 'ose-server-backups' bucket has 19 files totaling 300G. One file appears to be uploading at the moment. There are two from 2018-11 & 2018-12 at <20M, but the others vary in size from 17.5G - 18.4G.
  5. as for the new dev-specific 'ose-dev-server-backups' bucket, there's 0 fucking files
  6. I kicked-off a backup; it completed relatively fast. There were no obvious errors during the upload, but the file is not visible on the wui
[root@osedev1 backups]# ./backup.sh
...
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191023_144309.tar.gpg
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041
{
  "action": "upload", 
  "fileId": "4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041", 
  "fileName": "daily_osedev1_20191023_144309.tar.gpg", 
  "size": 18465051, 
  "uploadTimestamp": 1571841805000
}

real    0m27.979s
user    0m1.037s
sys     0m0.321s
[root@osedev1 backups]# 
  1. the last upload appears to be from 20 days ago
[root@osedev1 backups]# ls -lah /home/b2user/sync
total 18M
drwxr-xr-x. 2 root   root   4.0K Oct 23 16:43 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    18M Oct 23 16:43 daily_osedev1_20191023_144309.tar.gpg
[root@osedev1 backups]# ls -lah /home/b2user/sync.old
total 17M
drwxr-xr-x. 2 root   root   4.0K Oct  3 07:24 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    17M Oct  3 07:24 daily_osedev1_20191003_052448.tar.gpg
[root@osedev1 backups]# 
  1. the cron job looks good
[root@osedev1 backups]# cat /etc/cron.d/backup_to_backblaze
20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh
[root@osedev1 backups]# 
  1. but the logging dir doesn't exist; I created it
[root@osedev1 backups]# ls -lah /var/log/backups
ls: cannot access /var/log/backups: No such file or directory
[root@osedev1 backups]# mkdir /var/log/backups
[root@osedev1 backups]# 
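A missing log directory matters because the cron line appends to a file inside it; when the shell cannot open a redirect target it normally does not run the command at all, which may be why the daily backups stopped. The following sketch lists any such missing directories for a cron file. The function name missing_log_dirs is hypothetical, and it only looks at ">>" redirects.

```shell
# Sketch: print each directory that a cron file redirects output into
# (via ">>") but that does not exist yet.
missing_log_dirs() {
    # $1: cron file to inspect (e.g. /etc/cron.d/backup_to_backblaze)
    grep -o '>> *[^ ]*' "$1" | sed 's/^>> *//' | while read -r target; do
        dir="$(dirname "$target")"
        [ -d "$dir" ] || echo "$dir"
    done | sort -u
}

# Example (not run here):
#   missing_log_dirs /etc/cron.d/backup_to_backblaze
```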
  1. actually, after some time, the b2 wui now shows the files I just uploaded; totalling to 36.9M. Wasn't the dev server in a broken state recently? That's probably what happened..
  2. well, I'll follow-up in a few days. Hopefully it'll be stable for ~10 days through the monthly backup on 2019-11-01, which will have a 1-year retention time.
  3. ..
  4. ok, back to the sync. First, I fixed the hostname of the staging node so I don't do the sync the wrong way (!)
[root@opensourceecology ~]# vim /etc/hostname
[root@opensourceecology ~]# cat /etc/hostname
osestaging1
[root@opensourceecology ~]# 
[root@opensourceecology ~]# hostname osestaging1
[root@opensourceecology ~]# exit
logout
[maltfield@osestaging1 ~]$ 
  1. oh, shit, weird. I went to ssh into the prod server using `ssh opensourceecology.org`, but it ssh'd into staging because of the new dns changes. I fixed this by updating my .ssh/config file for the 'oseprod' Host line
user@ose:~$ head .ssh/config
# OSE
Host oseprod
	HostName 138.201.84.243
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osedev1
	HostName 195.201.233.113
user@ose:~$ 
user@ose:~$ ssh oseprod
Last login: Wed Oct 23 15:01:19 2019 from 116.75.124.97
[maltfield@opensourceecology ~]$ 
  1. so I think I should put this sync & sed process into a script that lives on prod. This was the last command I see executed in screen on prod
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. In order to automate this, I'll also need to give root on prod an ssh key that can ssh into the staging node as some sync user which has NOPASSWD sudo rights. Of course, I do *not* want *any* config that permits that kind of access to our prod node, but granting prod access to staging in this way seems fair enough. If someone gains this locked-down key file from the prod server, we have bigger problems..
  2. I created a new script for this & locked it down
[root@opensourceecology bin]# date
Wed Oct 23 15:10:06 UTC 2019
[root@opensourceecology bin]# pwd
/root/bin
[root@opensourceecology bin]# ls -lah syncToStaging.sh 
-rwx------ 1 root root 469 Oct 23 15:09 syncToStaging.sh
[root@opensourceecology bin]# cat syncToStaging.sh 
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-23
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################

############
# SETTINGS #
############

########
# EXIT #
########

# clean exit
exit 0
[root@opensourceecology bin]# 
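A minimal sketch of how the empty body of syncToStaging.sh might be filled in, reusing the hostname guard and the rsync options from the manual command above. The function names and the idea of passing the destination as an argument are assumptions for illustration, not the final script.

```bash
#!/bin/bash
# Sketch only: builds (and optionally runs) the prod -> staging rsync
# that was previously run by hand in this log.

# Paths that must never be copied from prod onto the staging node
# (same list as the manual command).
EXCLUDES=(
    /root /run '/home/b2user/sync*' '/etc/sudo*' /etc/openvpn
    /usr/share/easy-rsa /dev /sys /proc /boot/ '/etc/sysconfig/network*'
    /tmp /var/tmp /etc/fstab /etc/mtab /etc/mdadm.conf
)

# Print one rsync argument per line for destination "$1".
build_rsync_args() {
    local dest="$1" path
    printf '%s\n' -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids '--rsync-path=sudo rsync'
    for path in "${EXCLUDES[@]}"; do
        printf '%s\n' "--exclude=${path}"
    done
    printf '%s\n' -av --progress / "${dest}"
}

# Hypothetical wrapper: refuse to run anywhere but the prod node.
sync_to_staging() {
    local dest="$1" expected_host="opensourceecology.org"
    if [ "$(hostname)" != "${expected_host}" ]; then
        echo "ERROR: refusing to run on host '$(hostname)'" >&2
        return 1
    fi
    local args=()
    mapfile -t args < <(build_rsync_args "${dest}")
    nice rsync "${args[@]}"
}
```

Calling `build_rsync_args maltfield@10.241.189.11:/` on its own only prints the arguments, which makes it easy to review the exclude list before anything is copied.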
  1. There is an existing rsa key for the root user on our prod server, but it's only 2048-bits. I think this was used to auth to our dreamhost server for scp-ing backups back in the day. In any case, it's too small; I generated a new one. Note that this key should only be used for ssh-ing into the staging server as a non-root user (on the staging server). It should *not* be used to ssh into the prod server. And, of course, we should *never* allow root to ssh into any server anywhere. Oh, and, the staging server is also not exposed on the Internet; it's only accessible behind the VPN..
[root@opensourceecology bin]# ssh-keygen -lf /root/.ssh/id_rsa.pub 
2048 SHA256:/LpjdDSJFVAt0a4d2PM3fWu7ci3VVwqQT0UxobZel2s root@CentOS-72-64-minimal (RSA)
[root@opensourceecology bin]# ssh-keygen -t rsa -b 4096 -o -a 100
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): /root/.ssh/id_rsa.201910
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.201910.
Your public key has been saved in /root/.ssh/id_rsa.201910.pub.
...
[root@opensourceecology bin]# ls -lah /root/.ssh/id_rsa.201910*
-rw------- 1 root root 3.4K Oct 23 15:27 /root/.ssh/id_rsa.201910
-rw-r--r-- 1 root root  752 Oct 23 15:27 /root/.ssh/id_rsa.201910.pub
[root@opensourceecology bin]# 
  1. now I need a non-root user (which will have to exist on both staging & production) that I'll both [a] give NOPASSWD sudo access on the staging server only and [b] grant ssh key-based access on the staging server only
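A rough sketch of the two pieces of staging-side config this implies. Nothing here has been applied: the account name `stagingsync`, the `/usr/bin/rsync` path, and the VPN subnet passed to `from=` are all assumptions, and limiting sudo to rsync is one possible way to narrow the NOPASSWD grant, not a decision recorded in this log.

```bash
#!/bin/bash
# Hypothetical account name; the log has not chosen one yet.
SYNC_USER="stagingsync"

# Print a sudoers drop-in line that lets SYNC_USER run only rsync as root
# without a password. Intended for the staging node only.
sudoers_line() {
    printf '%s ALL=(root) NOPASSWD: /usr/bin/rsync\n' "${SYNC_USER}"
}

# Print an authorized_keys entry for the public key file "$1" that is only
# accepted from the source network "$2" and has forwarding and pty
# allocation disabled via the OpenSSH "restrict" option.
authorized_keys_line() {
    local pubkey_file="$1" from_net="$2"
    printf 'restrict,from="%s" %s\n' "${from_net}" "$(cat "${pubkey_file}")"
}
```

For example, `authorized_keys_line /root/.ssh/id_rsa.201910.pub 10.241.189.0/24` would print a line suitable for the sync user's authorized_keys on staging. If sudo on staging is configured with `requiretty`, the `restrict` option may need to be relaxed, since it disables pty allocation.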

Mon Oct 21, 2019

  1. earlier this month a critical vulnerability was fixed in sudo 1.8.28 https://www.sudo.ws/alerts/minus_1_uid.html
  2. I configured this server to auto-install security-related updates, but I didn't see any changes to `sudo` since I've been away. I *did* see updates to nginx, so why didn't sudo update? Indeed, it's stuck at 1.8.19p2-11
[root@opensourceecology ~]# rpm -qa | grep -i sudo
sudo-1.8.19p2-11.el7_4.x86_64
[root@opensourceecology ~]# 
  1. fortunately the issue is an edge-case that doesn't affect us: it only applies when the sudo config is set up to allow a defined user to run a defined command as any user except root https://access.redhat.com/security/cve/cve-2019-14287
  2. the fucking redhat solution is to fix your config; not to update sudo. A check-update run shows there *is* a newer version of sudo available to upgrade
[root@opensourceecology ~]# yum check-update sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: mirror.checkdomain.de
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.checkdomain.de
 * webtatic: uk.repo.webtatic.com

sudo.x86_64                          1.8.23-4.el7                           base
[root@opensourceecology ~]# 
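The edge case described above (a Runas spec that allows any user except root) can be looked for mechanically. This is a minimal sketch with a deliberately simple pattern; the function name is made up, and it will not follow Runas_Alias indirection, so a clean result is a hint rather than proof.

```bash
#!/bin/bash
# Print sudoers lines whose Runas spec is "ALL except root", the
# configuration affected by CVE-2019-14287.
# Exit status is grep's: 0 if any such line exists, 1 if none.
find_risky_runas() {
    grep -EHn '\([[:space:]]*ALL[[:space:]]*,[[:space:]]*![[:space:]]*(root|#0)[[:space:]]*\)' "$@"
}
```

Typical usage would be `find_risky_runas /etc/sudoers /etc/sudoers.d/*` as root.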
  1. it looks like the '--changelog' arg to `rpm` only shows changes for what's installed, not prospective updates. So I updated
[root@opensourceecology ~]# yum install sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: mirror.checkdomain.de
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.checkdomain.de
 * webtatic: uk.repo.webtatic.com
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.19p2-11.el7_4 will be updated
---> Package sudo.x86_64 0:1.8.23-4.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch             Version                   Repository      Size
================================================================================
Updating:
 sudo           x86_64           1.8.23-4.el7              base           841 k

Transaction Summary
================================================================================
Upgrade  1 Package

Total download size: 841 k
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
sudo-1.8.23-4.el7.x86_64.rpm                               | 841 kB   00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : sudo-1.8.23-4.el7.x86_64                                     1/2
warning: /etc/sudoers created as /etc/sudoers.rpmnew
  Cleanup    : sudo-1.8.19p2-11.el7_4.x86_64                                2/2
  Verifying  : sudo-1.8.23-4.el7.x86_64                                     1/2
  Verifying  : sudo-1.8.19p2-11.el7_4.x86_64                                2/2

Updated:
  sudo.x86_64 0:1.8.23-4.el7

Complete!
[root@opensourceecology ~]# 
  1. apparently the update doesn't patch this bug. ugh, I'm losing faith in cent/rhel over debian..
[root@opensourceecology ~]# rpm -q --changelog sudo | head
* Wed Feb 20 2019 Radovan Sroka <rsroka@redhat.com> 1.8.23-4
- RHEL-7.7 erratum
  Resolves: rhbz#1672876 - Backporting sudo bug with expired passwords
  Resolves: rhbz#1665285 - Problem with sudo-1.8.23 and 'who am i'

* Mon Sep 24 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-3
- RHEL-7.6 erratum
  Resolves: rhbz#1547974 - Rebase sudo to latest stable upstream version

* Fri Sep 21 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-2
[root@opensourceecology ~]# 
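Rather than eyeballing the first few changelog entries, the whole changelog can be searched for a CVE ID. A small sketch, with a hypothetical helper name; note that Red Hat changelogs often cite only rhbz numbers (as in the sudo output above), so a miss does not prove the fix is absent.

```bash
#!/bin/bash
# Succeeds if the changelog text on stdin mentions the ID given as "$1".
mentions_cve() {
    grep -qF -- "$1"
}

# Usage (not run here):
#   rpm -q --changelog sudo | mentions_cve CVE-2019-14287 && echo "mentioned"
```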
  1. well, that's all I can do for now on sudo
  2. regarding the package that *did* update, I got an email from ossec about changed packages two days ago on Oct 20th, along with alerts about the binaries' checksums changing
OSSEC HIDS Notification.
2019 Oct 20 04:39:44

Received From: opensourceecology->/var/log/messages
Rule: 2932 fired (level 7) -> "New Yum package installed."
Portion of the log(s):

Oct 20 04:39:42 opensourceecology yum[29637]: Installed: nginx.x86_64 1:1.16.1-1.el7
  1. the changelog shows a sec update from 2 months ago. why so delayed?
[root@opensourceecology ~]# rpm -q --changelog nginx | head
* Sun Sep 15 2019 Warren Togami <warren@blockstream.com>
- add conditionals for EPEL7, see rhbz#1750857

* Tue Aug 13 2019 Jamie Nguyen <jamielinux@fedoraproject.org> - 1:1.16.1-1
- Update to upstream release 1.16.1
- Fixes CVE-2019-9511, CVE-2019-9513, CVE-2019-9516

* Thu Jul 25 2019 Fedora Release Engineering <releng@fedoraproject.org> - 1:1.16.0-5
- Rebuilt for https://fedoraproject.org/wiki/Fedora_31_Mass_Rebuild

[root@opensourceecology ~]# 
  1. the yum-cron package is responsible for updating security packages; it's kicked-off daily
[root@opensourceecology log]# ls -lah /etc/cron.daily/0yum-daily.cron 
-rwxr-xr-x 1 root root 332 Aug  5  2017 /etc/cron.daily/0yum-daily.cron
[root@opensourceecology log]# cat /etc/cron.daily/0yum-daily.cron 
#!/bin/bash

# Only run if this flag is set. The flag is created by the yum-cron init
# script when the service is started -- this allows one to use chkconfig and
# the standard "service stop|start" commands to enable or disable yum-cron.
if [[ ! -f /var/lock/subsys/yum-cron ]]; then
  exit 0
fi

# Action!
exec /usr/sbin/yum-cron
[root@opensourceecology log]# 
  1. the logs show that it was only updated on Oct 20
[root@opensourceecology log]# grep -ir nginx yum.log
May 26 06:30:47 Updated: nginx-filesystem.noarch 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-perl.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-mail.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-stream.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-image-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-geoip.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-all-modules.noarch 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx.x86_64 1:1.12.2-3.el7
Oct 20 04:39:42 Updated: nginx-filesystem.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-mail.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-image-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-stream.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Installed: nginx.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-perl.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-all-modules.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Erased: nginx-mod-http-geoip
[root@opensourceecology log]# 
  1. and the yum-cron config looks sane
[root@opensourceecology log]# head /etc/yum/yum-cron.conf 
[commands]
#  What kind of update to use:
# default                            = yum upgrade
# security                           = yum --security upgrade
# security-severity:Critical         = yum --sec-severity=Critical upgrade
# minimal                            = yum --bugfix update-minimal
# minimal-security                   = yum --security update-minimal
# minimal-security-severity:Critical =  --sec-severity=Critical update-minimal
update_cmd = minimal-security

[root@opensourceecology log]# 
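Since `head` only shows the first few lines of the config, a small helper can pull out individual settings. This is a sketch; `conf_value` is a made-up name. Beyond `update_cmd`, it may be worth reading `apply_updates` and `download_updates` the same way. It may also be worth checking whether each enabled repo actually publishes security metadata, since a `minimal-security` policy can only act on updates that are tagged as security fixes; that is a guess, not something verified in this log.

```bash
#!/bin/bash
# Print the value of "key" from an INI-style file such as yum-cron.conf,
# ignoring commented-out lines.
conf_value() {
    local file="$1" key="$2"
    awk -v key="${key}" '
        /^[ \t]*#/ { next }
        {
            k = $0
            sub(/=.*/, "", k)                  # text before the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace
            if (k == key && index($0, "=") > 0) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
            }
        }
    ' "${file}"
}
```

For example, `conf_value /etc/yum/yum-cron.conf update_cmd` should print the active policy.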
  1. I still don't understand why it was delayed, but everything seems to be set up properly..
  2. ...
  3. anyway, returning to the dev/staging server setup; it looks like I can't VPN into our dev server anymore
user@ose:~/openvpn$ Tue Oct 22 21:24:32 2019 OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
Tue Oct 22 21:24:32 2019 library versions: OpenSSL 1.0.2t  10 Sep 2019, LZO 2.08
Enter Private Key Password: *
Tue Oct 22 21:24:35 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Tue Oct 22 21:24:35 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:24:35 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Tue Oct 22 21:24:35 2019 UDP link local: (not bound)
Tue Oct 22 21:24:35 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:25:35 2019 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Tue Oct 22 21:25:35 2019 TLS Error: TLS handshake failed
Tue Oct 22 21:25:35 2019 SIGUSR1[soft,tls-error] received, process restarting
Tue Oct 22 21:25:35 2019 Restart pause, 5 second(s)
  1. And I can't ping the server either
user@ose:~$ ping 195.201.233.113
PING 195.201.233.113 (195.201.233.113) 56(84) bytes of data.
^C
--- 195.201.233.113 ping statistics ---
104 packets transmitted, 0 received, 100% packet loss, time 105449ms

user@ose:~$
  1. and ssh fails
user@ose:~$ ssh -vvvv osedev1
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t  10 Sep 2019
debug1: Reading configuration data /home/user/.ssh/config
debug1: /home/user/.ssh/config line 8: Applying options for osedev1
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "195.201.233.113" port 32415
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 195.201.233.113 [195.201.233.113] port 32415.
debug1: connect to address 195.201.233.113 port 32415: Connection timed out
ssh: connect to host 195.201.233.113 port 32415: Connection timed out
user@ose:~$ 
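The ssh part of this check can be reduced to a quick TCP probe with a short timeout, which is faster than waiting for ssh itself to time out. A sketch using bash's `/dev/tcp` pseudo-device and coreutils `timeout`; the function name is an assumption. It only covers TCP services such as sshd on port 32415, not the OpenVPN endpoint, which the log above shows running over UDP.

```bash
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within "secs" seconds.
check_tcp() {
    local host="$1" port="$2" secs="${3:-5}"
    timeout "${secs}" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (not run here):
#   check_tcp 195.201.233.113 32415 && echo "sshd reachable" || echo "sshd unreachable"
```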
  1. logging into the hetzner cloud console shows that the box is online and sitting on the login screen. I tried to log in, but after typing the username it freezes. Now my dev node is acting like my damn staging node was.
  2. I gave the dev server a reboot
  3. after a few minutes, I could ssh-in.
  4. and I could VPN-in as well.
  5. now when I start the staging container, I still get timeout issues
opensourceecology login: maltfield
Password:
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. it's worth noting that systemd-journal is chewing up >90% of the CPU on the host osedev1 server
  2. I added these 2x lines to the lxc container's config file per https://serverfault.com/questions/658052/systemd-journal-in-debian-jessie-lxc-container-eats-100-cpu
lxc.autodev = 1
lxc.kmsg = 0
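If these lines end up being applied by a script (for example when rebuilding the container), appending them idempotently avoids duplicates. A sketch; the helper name and the config path in the usage comment are assumptions.

```bash
#!/bin/bash
# Append "line" to "file" unless an identical line is already present.
ensure_line() {
    local file="$1" line="$2"
    grep -qxF -- "${line}" "${file}" 2>/dev/null || printf '%s\n' "${line}" >> "${file}"
}

# Usage (not run here; the path is a guess at the container's config file):
#   ensure_line /var/lib/lxc/osestaging1/config 'lxc.autodev = 1'
#   ensure_line /var/lib/lxc/osestaging1/config 'lxc.kmsg = 0'
```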
  1. I stopped the container & started it again; this time systemd on the host was <10% CPU usage, and I was able to login without any delay!
  2. I became root too, and that worked great!
  3. I had issues with ssh-ing in from my laptop, but after I disconnected from the VPN and reconnected, I was able to ssh into osestaging1 from my laptop!
  4. this time I was also able to become root and poke around at our new, shiny production clone server, cool!
user@ose:~/openvpn$ ssh -vvvvv osestaging1
...
Last login: Tue Oct 22 16:08:03 2019
[maltfield@opensourceecology ~]$ sudo su -
Last login: Tue Oct 22 16:09:19 UTC 2019 on lxc/console
[root@opensourceecology ~]# ls -lah /var/www/html/ | head
total 100K
drwxr-xr-x. 25 root       root   4.0K Apr  9  2019 .
drwxr-xr-x.  5 root       root   4.0K Aug 23  2017 ..
d---r-x---.  3 not-apache apache 4.0K Aug  8  2018 3dp.opensourceecology.org
drwxr-xr-x.  3 root       root   4.0K Dec 24  2017 awstats.openbuildinginstitute.org
drwxr-xr-x.  3 root       root   4.0K Feb  9  2018 awstats.opensourceecology.org
drwxr-xr-x.  2 root       root   4.0K Mar  2  2018 cacti.opensourceecology.org.old
drwxr-xr-x.  3 apache     apache 4.0K Feb  9  2018 certbot
d---r-x---.  3 not-apache apache 4.0K Aug  7  2018 d3d.opensourceecology.org
d---r-x---.  3 not-apache apache 4.0K Apr  9  2019 fef.opensourceecology.org
[root@opensourceecology ~]# 
  1. ss shows that varnish & apache are listening
[root@opensourceecology ~]# ss -plan | grep -i LISTEN
u_str  LISTEN     0      100    private/proxymap 183064                * 0                   users:(("master",pid=782,fd=49))
u_str  LISTEN     0      100    public/pickup 183032                * 0                   users:(("pickup",pid=791,fd=6),("master",pid=782,fd=17))
u_str  LISTEN     0      100    public/cleanup 183036                * 0                   users:(("master",pid=782,fd=21))
u_str  LISTEN     0      100    public/qmgr 183039                * 0                   users:(("qmgr",pid=792,fd=6),("master",pid=782,fd=24))
u_str  LISTEN     0      100    private/tlsmgr 183043                * 0                   users:(("master",pid=782,fd=28))
u_str  LISTEN     0      100    private/rewrite 183046                * 0                   users:(("master",pid=782,fd=31))
u_str  LISTEN     0      100    private/bounce 183049                * 0                   users:(("master",pid=782,fd=34))
u_str  LISTEN     0      100    private/defer 183052                * 0                   users:(("master",pid=782,fd=37))
u_str  LISTEN     0      100    private/trace 183055                * 0                   users:(("master",pid=782,fd=40))
u_str  LISTEN     0      128    /run/systemd/private 174128                * 0                   users:(("systemd",pid=1,fd=12))
u_str  LISTEN     0      128    /run/lvm/lvmpolld.socket 174135                * 0                   users:(("systemd",pid=1,fd=20))
u_str  LISTEN     0      128    /run/lvm/lvmetad.socket 174138                * 0                   users:(("lvmetad",pid=24,fd=3),("systemd",pid=1,fd=21))
u_str  LISTEN     0      128    /run/systemd/journal/stdout 174140                * 0                   users:(("systemd-journal",pid=18,fd=3),("systemd",pid=1,fd=22))
u_str  LISTEN     0      100    private/verify 183058                * 0                   users:(("master",pid=782,fd=43))
u_str  LISTEN     0      128    /tmp/ssh-bd3GlfYKNm/agent.1751 223092                * 0                   users:(("sshd",pid=1751,fd=9))
u_str  LISTEN     0      100    private/retry 183082                * 0                   users:(("master",pid=782,fd=67))
u_str  LISTEN     0      50     /var/lib/mysql/mysql.sock 187559                * 0                   users:(("mysqld",pid=1011,fd=14))
u_str  LISTEN     0      100    private/discard 183085                * 0                   users:(("master",pid=782,fd=70))
u_str  LISTEN     0      100    public/flush 183061                * 0                   users:(("master",pid=782,fd=46))
u_str  LISTEN     0      100    private/local 183088                * 0                   users:(("master",pid=782,fd=73))
u_str  LISTEN     0      100    private/virtual 183091                * 0                   users:(("master",pid=782,fd=76))
u_str  LISTEN     0      100    private/lmtp 183094                * 0                   users:(("master",pid=782,fd=79))
u_str  LISTEN     0      100    private/anvil 183097                * 0                   users:(("master",pid=782,fd=82))
u_str  LISTEN     0      100    private/scache 183100                * 0                   users:(("master",pid=782,fd=85))
u_str  LISTEN     0      100    private/proxywrite 183067                * 0                   users:(("master",pid=782,fd=52))
u_str  LISTEN     0      100    private/smtp 183070                * 0                   users:(("master",pid=782,fd=55))
u_str  LISTEN     0      100    private/relay 183073                * 0                   users:(("master",pid=782,fd=58))
u_str  LISTEN     0      100    public/showq 183076                * 0                   users:(("master",pid=782,fd=61))
u_str  LISTEN     0      100    private/error 183079                * 0                   users:(("master",pid=782,fd=64))
u_str  LISTEN     0      10     /var/run/acpid.socket 176097                * 0                   users:(("acpid",pid=48,fd=5))
u_str  LISTEN     0      128    /var/run/dbus/system_bus_socket 175844                * 0                   users:(("dbus-daemon",pid=51,fd=3),("systemd",pid=1,fd=31))
tcp    LISTEN     0      128    127.0.0.1:8000                  *:*                   users:(("httpd",pid=520,fd=3),("httpd",pid=519,fd=3),("httpd",pid=518,fd=3),("httpd",pid=517,fd=3),("httpd",pid=516,fd=3),("httpd",pid=314,fd=3))
tcp    LISTEN     0      128    127.0.0.1:6081                  *:*                   users:(("varnishd",pid=1165,fd=6))
tcp    LISTEN     0      10     127.0.0.1:6082                  *:*                   users:(("varnishd",pid=1109,fd=5))
tcp    LISTEN     0      128    127.0.0.1:8010                  *:*                   users:(("httpd",pid=520,fd=4),("httpd",pid=519,fd=4),("httpd",pid=518,fd=4),("httpd",pid=517,fd=4),("httpd",pid=516,fd=4),("httpd",pid=314,fd=4))
tcp    LISTEN     0      128       *:10000                 *:*                   users:(("miniserv.pl",pid=533,fd=5))
tcp    LISTEN     0      100    127.0.0.1:25                    *:*                   users:(("master",pid=782,fd=13))
tcp    LISTEN     0      128       *:32415                 *:*                   users:(("sshd",pid=326,fd=3))
tcp    LISTEN     0      128      :::4949                 :::*                   users:(("munin-node",pid=379,fd=5))
tcp    LISTEN     0      128      :::32415                :::*                   users:(("sshd",pid=326,fd=4))
[root@opensourceecology ~]# 
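Most of the output above is unix sockets from postfix and systemd. Running `ss -ltnp` instead restricts the listing to listening TCP sockets; alternatively, the existing output can be filtered. A small sketch of the latter, with a made-up function name:

```bash
#!/bin/bash
# From `ss -plan`-style output on stdin, print "local-address process"
# for each listening TCP socket.
tcp_listeners() {
    awk '$1 == "tcp" && $2 == "LISTEN" {
        proc = "-"
        # The first quoted name after users:(( is the owning process
        if (match($0, /users:\(\("[^"]+"/)) {
            proc = substr($0, RSTART + 9, RLENGTH - 10)
        }
        print $5, proc
    }'
}

# Usage (not run here):
#   ss -plan | tcp_listeners
```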
  1. as expected, nginx is failing because it can't bind to the hardcoded external ip addresses that don't exist on this distinct server; we'll have to sed this later
[root@opensourceecology ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology ~]# 
  1. note that the hostname above is an exact match of the production server. This is confusing for my logs and creates a risk of running commands on the wrong server. If possible, I should try to sed this back to 'osestaging1' or exclude the relevant configs from the rsync as well
  2. so it looks like apache is listening to 127.0.0.1:8000 for name-based-vhosts, except certbot which listens on 127.0.0.1:8010
[root@opensourceecology conf.d]# grep VirtualHost *
000-www.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
000-www.opensourceecology.org.conf:</VirtualHost>
00-fef.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-fef.opensourceecology.org.conf:</VirtualHost>
00-forum.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-forum.opensourceecology.org.conf:</VirtualHost>
00-microfactory.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-microfactory.opensourceecology.org.conf:</VirtualHost>
00-oswh.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-oswh.opensourceecology.org.conf:</VirtualHost>
00-phplist.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-phplist.opensourceecology.org.conf:</VirtualHost>
00-seedhome.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-seedhome.openbuildinginstitute.org.conf:</VirtualHost>
00-store.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-store.opensourceecology.org.conf:</VirtualHost>
00-wiki.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-wiki.opensourceecology.org.conf:</VirtualHost>
00-www.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-www.openbuildinginstitute.org.conf:</VirtualHost>
awstats.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.openbuildinginstitute.org.conf:</VirtualHost>
awstats.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.opensourceecology.org.conf:</VirtualHost>
certbot.conf:<VirtualHost 127.0.0.1:8010>
certbot.conf:</VirtualHost>
munin.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
munin.opensourceecology.org.conf:</VirtualHost>
ssl.conf.disabled:<VirtualHost _default_:443>
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:</VirtualHost>
ssl.conf.orig:<VirtualHost _default_:443>
ssl.conf.orig:</VirtualHost>                                  
ssl.openbuildinginstitute.org:# Purpose: To be included inside the <VirtualHost> block for all
ssl.opensourceecology.org:# Purpose: To be included inside the <VirtualHost> block for all
staging.openbuildinginstitute.org.conf.bak:<VirtualHost staging.openbuildinginstitute.org:8000>
staging.openbuildinginstitute.org.conf.bak:</VirtualHost>
staging.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
staging.opensourceecology.org.conf:</VirtualHost>
varnishTest.conf.disabled:<VirtualHost 127.0.0.1:8000>
varnishTest.conf.disabled:</VirtualHost>
[root@opensourceecology conf.d]# 
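The grep output above also includes disabled and backup files. To see only what the enabled `*.conf` files bind to, the listing can be condensed into a count per address. A sketch; the function name is an assumption.

```bash
#!/bin/bash
# Count how many <VirtualHost> blocks in the *.conf files of "conf_dir"
# bind to each address:port, most common first.
vhost_binds() {
    local conf_dir="$1"
    grep -hoE '^[[:space:]]*<VirtualHost [^>]+>' "${conf_dir}"/*.conf \
        | sed -E 's/^[[:space:]]*<VirtualHost ([^>]+)>/\1/' \
        | sort | uniq -c | sort -rn
}

# Usage (not run here):
#   vhost_binds /etc/httpd/conf.d
```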
  1. unfortunately I get 403 forbiddens for both with curl
[root@opensourceecology conf.d]# curl 127.0.0.1:8000/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]# curl 127.0.0.1:8010/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]# 
  1. tailing the logs shows modsec blocking us from the fef vhost because the Host header of our request was a numeric IP address (we gave curl an IP in the URL). Well, ok.
==> fef.opensourceecology.org/error_log <==
[Tue Oct 22 16:20:34.573535 2019] [:error] [pid 518] [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Pattern match "^[\\\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"] [hostname "127.0.0.1"] [uri "/"] [unique_id "Xa8sUlvJZ8GVfznr1gxo6AAAAAI"]

==> modsec_audit.log <==
--cbc91b75-A--
[22/Oct/2019:16:20:34 +0000] Xa8sUlvJZ8GVfznr1gxo6AAAAAI 127.0.0.1 33594 127.0.0.1 8000
--cbc91b75-B--
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: 127.0.0.1:8000
Accept: */*

--cbc91b75-F--
HTTP/1.1 403 Forbidden
Content-Length: 202
Content-Type: text/html; charset=iso-8859-1

--cbc91b75-E--

--cbc91b75-H--
Message: Access denied with code 403 (phase 2). Pattern match "^[\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"]
Action: Intercepted (phase 2)
Stopwatch: 1571761234559472 14421 (- - -)
Stopwatch2: 1571761234559472 14421; combined=4661, p1=4516, p2=113, p3=0, p4=0, p5=32, sr=3976, sw=0, l=0, gc=0
Response-Body-Transformed: Dechunked
Producer: ModSecurity for Apache/2.7.3 (http://www.modsecurity.org/); OWASP_CRS/2.2.9.
Server: Apache
Engine-Mode: "ENABLED"

--cbc91b75-Z--


==> fef.opensourceecology.org/access_log <==
127.0.0.1 - - [22/Oct/2019:16:20:34 +0000] "GET / HTTP/1.1" 403 202 "-" "curl/7.29.0"
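The rule that fired (id 960017) matches any Host header made up only of digits, dots and colons. The sketch below mirrors that pattern in bash so a URL's host part can be checked before sending the request; the function name is made up, and the `-H` override in the usage comment is one way to reach a specific name-based vhost while still connecting to 127.0.0.1.

```bash
#!/bin/bash
# Mirror of the CRS rule 960017 pattern ^[\d.:]+$ : true when the Host
# header value consists only of digits, dots and colons.
is_numeric_host() {
    [[ "$1" =~ ^[0-9.:]+$ ]]
}

# Usage (not run here):
#   is_numeric_host "127.0.0.1:8000" && echo "modsec would block this Host header"
#   curl -i -H 'Host: www.opensourceecology.org' http://127.0.0.1:8000/
```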
  1. attempting 8000 with 'localhost' returns a redirect that strips the port from the URL; attempting 8010 works! The latter is just an empty docroot that gets populated by `certbot` for renewing certs on some complicated non-public vhost sites
[root@opensourceecology ~]# curl -i http://localhost:8000/
HTTP/1.1 301 Moved Permanently
Date: Tue, 22 Oct 2019 16:22:24 GMT
Server: Apache
X-VC-Enabled: true
X-VC-TTL: 86400
Location: http://localhost/
X-XSS-Protection: 1; mode=block
Content-Length: 0
Content-Type: text/html; charset=UTF-8

[root@opensourceecology ~]# curl -i http://localhost:8010/
HTTP/1.1 200 OK
Date: Tue, 22 Oct 2019 16:23:43 GMT
Server: Apache
Last-Modified: Fri, 09 Feb 2018 20:56:47 GMT
Accept-Ranges: bytes
Content-Length: 18
X-XSS-Protection: 1; mode=block
Content-Type: text/html; charset=UTF-8

can you see this?
[root@opensourceecology ~]# 
  1. this is going to be a pain; let's see if I can get nginx working; we have to fix '138.201.84.223'
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]# 
[root@opensourceecology nginx]# grep -irl '138.201.84.223' *
conf.d/www.openbuildinginstitute.org.conf
conf.d/wiki.opensourceecology.org.conf
conf.d/seedhome.openbuildinginstitute.org.conf
conf.d/www.opensourceecology.org.conf
conf.d/awstats.openbuildinginstitute.org.conf
nginx.conf
[root@opensourceecology nginx]# 
  1. I replaced the first IP for OBI with our VPN IP
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]# 
  1. And then I replaced the second IP for OSE with our VPN IP as well
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.243:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]#
  1. well now there is a duplicate line to listen on this same IP; I removed that from nginx.conf
  2. And now I'm having issues with a duplicate default_server line. Oh, right, now that OBI and OSE share the same IP I'll make OSE the default server and remove it from OBI
[root@opensourceecology conf.d]# nginx -t
nginx: [emerg] a duplicate default server for 10.241.189.11:443 in /etc/nginx/conf.d/www.opensourceecology.org.conf:58
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl 'default_server' *
www.openbuildinginstitute.org.conf
www.opensourceecology.org.conf
[root@opensourceecology conf.d]# vim www.openbuildinginstitute.org.conf 
  1. Aaand now it's failing on the same issue but for the IPv6 addresses. I'm just going to comment those out entirely for the staging server
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to [2a01:4f8:172:209e::2]:443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl '2a01:4f8:172:209e::2' *
awstats.opensourceecology.org.conf
fef.opensourceecology.org.conf
forum.opensourceecology.org.conf
microfactory.opensourceecology.org
munin.opensourceecology.org.conf
oswh.opensourceecology.org.conf
store.opensourceecology.org.conf
wiki.opensourceecology.org.conf
www.opensourceecology.org.conf
  1. This last sed fixed it!
[root@opensourceecology conf.d]# sed -i 's^\(\s*\)[^#]*listen \[2a01:4f8:172:209e::2\(.*\)^\1#listen \[2a01:4f8:172:209e::2\2^' *
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@opensourceecology conf.d]# 
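The sed steps above are candidates for the "staging-ify" half of the sync script. A minimal sketch under stated assumptions: the function name is made up, it uses `|` as the sed delimiter and extended regexes instead of the exact expressions logged above, and it does not handle the duplicate `listen` line or the duplicate `default_server` that were fixed by hand.

```bash
#!/bin/bash
# Rewrite nginx configs under "conf_dir" for the staging node:
#  - replace both prod IPv4 addresses with "staging_ip"
#  - comment out listen lines bound to the prod IPv6 address
stagingify_nginx() {
    local conf_dir="$1" staging_ip="$2" f
    find "${conf_dir}" -type f -print0 | while IFS= read -r -d '' f; do
        sed -i -E \
            -e "s/138\.201\.84\.(223|243)/${staging_ip}/g" \
            -e 's|^([[:space:]]*)(listen[[:space:]]+\[2a01:4f8:172:209e::2\])|\1#\2|' \
            "${f}"
    done
}

# Usage (not run here):
#   stagingify_nginx /etc/nginx 10.241.189.11 && nginx -t
```

Lines that are already commented out are left alone, so running it again after the next sync should be harmless.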
  1. I added these lines to /etc/hosts to make a new domain 'staging.www.opensourceecology.org' point to this IP address; it works!
[root@opensourceecology conf.d]# tail /etc/hosts
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
2a01:4f8:172:209e::2 hetzner2.opensourceecology.org hetzner2

# staging
127.0.0.1 staging.www.opensourceecology.org
[root@opensourceecology conf.d]# 
[root@opensourceecology conf.d]# curl -si https://staging.opensourceecology.org | tail
var mo_theme = {"name_required":"Please provide your name","name_format":"Your name must consist of at least 5 characters","email_required":"Please provide a valid email address","url_required":"Please provide a valid URL","phone_required":"Minimum 5 characters required","human_check_failed":"The input the correct value for the equation above","message_required":"Please input the message","message_format":"Your message must be at least 15 characters long","success_message":"Your message has been sent. Thanks!","blog_url":"https:\/\/staging.opensourceecology.org","loading_portfolio":"Loading the next set of posts...","finished_loading":"No more items to load..."};
/* ]]> */
</script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-content/themes/enigmatic/js/main.js?ver=1.6'></script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-includes/js/wp-embed.min.js?ver=4.9.4'></script>

</body>
</html>


[root@opensourceecology conf.d]# 
  1. I then added lines on my laptop pointing 'staging.www.opensourceecology.org' and 'www.opensourceecology.org' at my staging server's VPN IP address and fired up firefox; I was successfully able to access the staging site's nginx -> varnish -> http site!
10.241.189.11 www.opensourceecology.org
10.241.189.11 staging.www.opensourceecology.org
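Because these overrides will need to be switched on and off, a helper that sets exactly one mapping per hostname keeps the hosts file from accumulating stale lines. A sketch with a made-up function name; note that it drops any existing uncommented line that lists the hostname, including other aliases on that line.

```bash
#!/bin/bash
# Map "name" to "ip" in hosts file "file", replacing any existing
# uncommented line that already lists that exact hostname.
set_hosts_entry() {
    local file="$1" ip="$2" name="$3" tmp
    tmp="$(mktemp)"
    awk -v name="${name}" '
        /^[ \t]*#/ { print; next }
        { for (i = 2; i <= NF; i++) if ($i == name) next; print }
    ' "${file}" > "${tmp}"
    printf '%s %s\n' "${ip}" "${name}" >> "${tmp}"
    # Overwrite in place so the original file keeps its owner and mode
    cat "${tmp}" > "${file}"
    rm -f "${tmp}"
}

# Usage (not run here):
#   set_hosts_entry /etc/hosts 10.241.189.11 staging.www.opensourceecology.org
```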
  1. note that, of course, I get a cert error when attempting to access 'staging.www.opensourceecology.org', but it loads fine when hitting 'www.opensourceecology.org'. I'll have to think more about how I want to fix this. If one is on the VPN, should they be automatically forced to use the staging site? That seems like it could create confusion, but if the names are *not* the same, then I'm sure lots of errors will be encountered with links and such; so perhaps that *is* the most logical thing to do...
  2. oh fuck. now, somehow, I am getting emails from OSSEC on the staging server. I'll have to fix that too. For now I just stopped the ossec service on the staging server

Tue Oct 08, 2019

  1. continuing from yesterday, I checked-up on the rsync running from prod to staging, and it appears to have stalled
	75497472 100%    2.90MB/s    0:00:24 (xfer#4297, to-check=1538/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal   
	75497472 100%    2.80MB/s    0:00:25 (xfer#4298, to-check=1537/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal   
	23429120  31%    2.91MB/s    0:00:17
  1. it's probably not a good idea to sync the /run dir..
  2. attempting to ssh into the server fails
user@ose:~/openvpn$ ssh osestaging1
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
Permission denied (publickey).
user@ose:~/openvpn$ 
  1. I _can_ get into the staging server from the lxc-console on the dev server, but it doesn't look like anything is wrong with the setup of my user
[root@osestaging1 ~]# grep maltfield /etc/passwd
maltfield:x:1005:1005::/home/maltfield:/bin/bash
[root@osestaging1 ~]# grep maltfield /etc/shadow
maltfield:TRUNCATED
[root@osestaging1 ~]# grep maltfield /etc/group
wheel:x:10:maltfield,crupp,tgriffing,root
apache:x:48:cmota,crupp,maltfield,wp,apache,marcin
maltfield:x:1005:apache
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp
keepass:x:993:maltfield,marcin,cmota,crupp
apache-admins:x:1012:cmota,maltfield,marcin,crupp,tgriffing,wp,apache
[root@osestaging1 ~]# ls -lah /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 tgriffing maltfield 4.0K Jan 19  2018 .
drwx------. 10 tgriffing maltfield 4.0K Oct  3 07:06 ..
-rw-r--r--.  1 root      root       750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 tgriffing tgriffing 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# cat /home/maltfield/.ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== guttersnipe@guttersnipe

[root@osestaging1 ~]# 
  1. ssh appears to be running too
[root@osestaging1 ~]# systemctl list-units | grep -i ssh
sshd.service                      loaded active running   OpenSSH server daemon
[root@osestaging1 ~]# ss -plan | grep -i ssh
u_str  ESTAB      0      0         * 32621                 * 32622               users:(("sshd",pid=350,fd=5))
u_dgr  UNCONN     0      0         * 32618                 * 29344               users:(("sshd",pid=350,fd=4),("sshd",pid=348,fd=4))
u_str  ESTAB      0      0         * 31143                 * 0                   users:(("sshd",pid=274,fd=2),("sshd",pid=274,fd=1))
u_str  ESTAB      0      0         * 32622                 * 32621               users:(("sshd",pid=348,fd=7))
tcp    LISTEN     0      128       *:32415                 *:*                   users:(("sshd",pid=274,fd=3))
tcp    ESTAB      0      0      10.241.189.11:32415              10.241.189.10:41270               users:(("sshd",pid=350,fd=3),("sshd",pid=348,fd=3))
tcp    LISTEN     0      128    [::]:32415              [::]:*                   users:(("sshd",pid=274,fd=4))
[root@osestaging1 ~]# 
  1. the ssh server logs say that the client just disconnects
Oct  8 05:57:01 localhost sshd[3586]: Connection closed by 10.241.189.10 port 41334 [preauth]
  1. the ssh client says that the server rejected our public key
user@ose:~/openvpn$ ssh -vvv osestaging1
...
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/user/.ssh/id_rsa.ose
debug3: send_pubkey_test
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey).
user@ose:~/openvpn$ 
  1. I did notice that the ownership of my '/home/maltfield/.ssh' dir differs between the prod & staging servers (uid 1005 on prod vs uid 1000 on staging), though the authorized_keys file itself is owned by root on both
[maltfield@opensourceecology ~]$ ls -lahn /home/maltfield/.ssh
total 16K
drwxr-xr-x  2 1005 1005 4.0K Jan 19  2018 .
drwx------ 10 1005 1005 4.0K Oct  3 07:06 ..
-rw-r--r--  1    0    0  750 Jun 20  2017 authorized_keys
-rw-r--r--  1 1005 1005 1.1K Oct  3 13:44 known_hosts
[maltfield@opensourceecology ~]$ 
[root@osestaging1 ~]# ls -lahn /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 1000 1005 4.0K Jan 19  2018 .
drwx------. 10 1000 1005 4.0K Oct  3 07:06 ..
-rw-r--r--.  1    0    0  750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 1000 1000 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# 
  1. while the passwd, group, and shadow files all match
[root@opensourceecology ~]# md5sum /etc/passwd
cabf495ca12f7f32605eb764dd12c861  /etc/passwd
[root@opensourceecology ~]# md5sum /etc/group
04a70553d59a646406ecb89f2f7b17b5  /etc/group
[root@opensourceecology ~]# md5sum /etc/shadow
6f27deaf639ae2db1a1d94739a8bb834  /etc/shadow
[root@opensourceecology ~]# 
[root@osestaging1 ~]# md5sum /etc/passwd
cabf495ca12f7f32605eb764dd12c861  /etc/passwd
[root@osestaging1 ~]# md5sum /etc/group
04a70553d59a646406ecb89f2f7b17b5  /etc/group
[root@osestaging1 ~]# md5sum /etc/shadow
6f27deaf639ae2db1a1d94739a8bb834  /etc/shadow
[root@osestaging1 ~]# 
  1. for some reason my '/home/maltfield' dir was also owned by 'tgriffing'. I was able to ssh-in again after fixing this
[root@osestaging1 ~]# chown -R maltfield:maltfield /home/maltfield/
[root@osestaging1 ~]# ls -lah /home
total 52K
drwxr-xr-x. 13 root       root       4.0K Jul 28  2018 .
dr-xr-xr-x. 20 root       root       4.0K Oct  7 10:05 ..
drwx------.  7 b2user     b2user     4.0K Oct  7 07:46 b2user
drwx------.  5 cmota      cmota      4.0K Jul 14  2017 cmota
drwx------.  5 crupp      crupp      4.0K Aug 12  2017 crupp
drwx------.  2 Flipo      Flipo      4.0K Sep 20  2016 Flipo
drwx------.  2 hart       hart       4.0K Mar 30  2017 hart
drwx------.  3 lberezhny  lberezhny  4.0K Jul 20  2017 lberezhny
drwx------. 10 maltfield  maltfield  4.0K Oct  3 07:06 maltfield
drwx------.  4 marcin     marcin     4.0K Jul  6  2017 marcin
drwx------.  2 not-apache not-apache 4.0K Feb 12  2018 not-apache
drwx------.  5 tgriffing  tgriffing  4.0K Aug  1 09:19 tgriffing
drwx------.  5 wp         wp         4.0K Oct  7  2017 wp
[root@osestaging1 ~]# 
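A likely reason the chown fixed the login: with StrictModes enabled (the sshd default), sshd checks the ownership and modes of the user's home dir and files before accepting a login. The sketch below is a minimal way to check ownership by numeric uid before and after a sync. The helper names `owner_uid` and `check_owner` are made up for illustration, and it runs against a scratch dir rather than the real home dir.

```shell
#!/bin/sh
# Sketch: report whether a path is owned by the expected numeric uid.
# owner_uid and check_owner are hypothetical helper names, not part of any tool.
owner_uid() {
    # the third column of `ls -ldn` is the numeric owner
    ls -ldn "$1" | awk '{print $3}'
}

check_owner() {
    path=$1
    want=$2
    have=$(owner_uid "$path")
    if [ "$have" = "$want" ]; then
        echo "ok: $path owned by uid $want"
    else
        echo "MISMATCH: $path owned by uid $have, expected $want"
        return 1
    fi
}

# Example against a scratch dir; on the staging box this would be
# /home/maltfield and /home/maltfield/.ssh with uid 1005.
demo=$(mktemp -d)
check_owner "$demo" "$(id -u)"
```

Running `check_owner /home/maltfield 1005` on staging right after an rsync would flag the problem without needing a failed ssh attempt first.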
  1. I re-opened the screen for the rsync, and it had exited by now
	75497472 100%    2.90MB/s    0:00:24 (xfer#4297, to-check=1538/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal   
	75497472 100%    2.80MB/s    0:00:25 (xfer#4298, to-check=1537/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal   
	23429120  31%    2.91MB/s    0:00:17





packet_write_wait: Connection to 10.241.189.11 port 32415: Broken pipe

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (119371 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]

real    1059m42.282s
user    12m34.775s
sys     3m5.253s
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$ time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. I updated the rsync command to exclude /run, and I kicked-off the rsync again
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. ah, ffs! my internet connection here failed me, and I was silently disconnected from my ssh session with the prod node and dumped into a local shell. So I ended-up kicking off this rsync not from the prod node on which I was ssh'd, but my personal laptop (when I was dropped out of the prod server's ssh shell into my laptop's shell). By the time I realized it, the fucking staging server was broken!
  2. fucking hell, I had successfully copied 35G overnight; now I have to restore from snapshot and start over.
  3. I prepended a fucking hostname check to make sure this stupid shit doesn't happen again
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
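The `[ ... ] &&` guard above fails silently when the hostname doesn't match. A slightly more talkative variant is sketched below; `run_on_host` and `current_host` are hypothetical names, and the fallback to `uname -n` is only there in case `hostname` isn't installed.

```shell
#!/bin/sh
# Sketch: refuse to run a command unless we are on the expected host.
# run_on_host and current_host are hypothetical helper names.
current_host() {
    hostname 2>/dev/null || uname -n
}

run_on_host() {
    expected=$1
    shift
    actual=$(current_host)
    if [ "$actual" != "$expected" ]; then
        echo "refusing to run on '$actual' (expected '$expected')" >&2
        return 1
    fi
    "$@"
}

# Example: runs only when the names match, otherwise says why it refused
run_on_host "$(current_host)" echo "host check passed"
run_on_host "some-other-host.example" echo "never printed" || echo "blocked"
```

With this, the rsync would be invoked as `run_on_host opensourceecology.org sudo -E nice rsync ...`, and a dropped ssh session would produce an explicit refusal on the laptop.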
  1. I had a bunch of issues restoring from snapshot; eventually I just did an rsync of the '/var/lib/lxcsnaps/osestaging1/snap1' dir to '/var/lib/lxc/osestaging1', and I was finally successfully able to `lxc-start -n osestaging1`
  2. I did the `visudo` and install of rsync and re-initiated the rsync from prod to staging using the above-command. I noticed that I forgot to exclude the backups; here's what I should use next time
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
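The exclude list keeps growing, and it's easy to drop one when copy-pasting. rsync also accepts the patterns from a file via `--exclude-from`, so one option is to keep them in a file. The sketch below writes the same 16 patterns to a scratch file and only prints an abbreviated rsync line instead of running it; the file location is an assumption.

```shell
#!/bin/sh
# Sketch: keep the exclude patterns in a file and pass it via --exclude-from.
# The file is a scratch path here; the rsync line is only printed, not run,
# and the remaining options (-e, --bwlimit, --rsync-path) are left out.
excludes=$(mktemp)
cat > "$excludes" <<'EOF'
/root
/run
/home/b2user/sync
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
EOF

echo "patterns: $(wc -l < "$excludes" | tr -d ' ')"
echo "rsync -av --progress --exclude-from=$excludes / maltfield@10.241.189.11:/"
```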
  1. while that ran, I checked our munin graphs. I nice'd & bwlimit'd the above rsync, but it's still good to check.
    1. there's a spike in varnish requests, which is a bit odd
    2. there was a shift in memory usage, but no issues there
    3. load spiked to ~2, but our box has 8; no problems
    4. there was a spike in 'nice' to ~100% cpu usage; cool
    5. firewall throughput, eth0 traffic spiked to about the same level as our backups. excellent
    6. there's a huge spike in disk usage read, disk IO that's much higher than backups; hmm
  2. I also noted that the apache graphs that I added some time ago are blank; I probably have to setup an apache stats vhost for munin to scrape
  3. munin processing graphs are also blank; hmm
  4. all mysql graphs are also blank
  5. even nginx graphs are all blank
  6. I also added plugins for monitoring the 'mysqld' process and the memory of a bunch of processes
[root@opensourceecology plugins]# ls
apache_access       if_err_eth0        mysql_slowqueries   uptime                       varnish_memory_usage.bak
apache_processes    if_eth0            mysql_threads       users                        varnish_objects
apache_volume       interrupts         nginx_request       varnish4_                    varnish_objects.bak
cpu                 irqstats           nginx_status        varnish_backend_traffic      varnish_request_rate
df                  load               open_files          varnish_backend_traffic.bak  varnish_request_rate.bak
df_inode            memory             open_inodes         varnish_bad                  varnish_threads
diskstats           munin_stats        postfix_mailqueue   varnish_bad.bak              varnish_threads.bak
entropy             mysql_             postfix_mailvolume  varnish_expunge              varnish_transfer_rates
forks               mysql_bytes        processes           varnish_expunge.bak          varnish_transfer_rates.bak
fw_conntrack        mysql_innodb       proc_pri            varnish_hit_rate             varnish_uptime
fw_forwarded_local  mysql_isam_space_  swap                varnish_hit_rate.bak         varnish_uptime.bak
fw_packets          mysql_queries      threads             varnish_memory_usage         vmstat
[root@opensourceecology plugins]# ls -lah | head -n 5
total 36K
drwxr-xr-x 2 root root 4.0K Sep  7 07:37 .
drwxr-xr-x 8 root root 4.0K Jun 24 16:05 ..
lrwxrwxrwx 1 root root   38 Sep  7 07:36 apache_access -> /usr/share/munin/plugins/apache_access
lrwxrwxrwx 1 root root   41 Sep  7 07:36 apache_processes -> /usr/share/munin/plugins/apache_processes
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multip
multiping       multips         multips_memory  
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multips_memory
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/ps_ ps_mysqld
[root@opensourceecology plugins]# 
  1. for the munin mysql graphs, it looks like I need to grant access for the 'munin' user
[root@opensourceecology plugin-conf.d]# munin-run --debug mysql_queries
# Processing plugin configuration from /etc/munin/plugin-conf.d/amavis
# Processing plugin configuration from /etc/munin/plugin-conf.d/df
# Processing plugin configuration from /etc/munin/plugin-conf.d/fw_
# Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl
# Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node
# Processing plugin configuration from /etc/munin/plugin-conf.d/postfix
# Processing plugin configuration from /etc/munin/plugin-conf.d/postgres
# Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail
# Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose
# Setting /rgid/ruid/ to /99/99/
# Setting /egid/euid/ to /99 99/99/
# Setting up environment
# Environment mysqlopts = -u munin
# About to run '/etc/munin/plugins/mysql_queries'
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'munin'@'localhost' (using password: NO)'
[root@opensourceecology plugin-conf.d]# 
  1. woah, this guide suggests that there's a ton more graphs than just what is symlink-able https://blog.penumbra.be/2010/04/monitoring-mysql-munin-directadmin/
[root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Sep  7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_
lrwxrwxrwx 1 root root 36 Sep  7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes
lrwxrwxrwx 1 root root 37 Sep  7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# ls -lah /usr/share/munin/plugins/mysql_*
-rwxr-xr-x 1 root root  33K Mar  3  2017 /usr/share/munin/plugins/mysql_
-rwxr-xr-x 1 root root 1.8K Mar  3  2017 /usr/share/munin/plugins/mysql_bytes
-rwxr-xr-x 1 root root 5.4K Mar  3  2017 /usr/share/munin/plugins/mysql_innodb
-rwxr-xr-x 1 root root 5.7K Mar  3  2017 /usr/share/munin/plugins/mysql_isam_space_
-rwxr-xr-x 1 root root 2.5K Mar  3  2017 /usr/share/munin/plugins/mysql_queries
-rwxr-xr-x 1 root root 1.5K Mar  3  2017 /usr/share/munin/plugins/mysql_slowqueries
-rwxr-xr-x 1 root root 1.7K Mar  3  2017 /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# /usr/share/munin/plugins/mysql_ suggest
bin_relay_log
commands
connections
files_tables
innodb_bpool
innodb_bpool_act
innodb_insert_buf
innodb_io
innodb_io_pend
innodb_log
innodb_rows
innodb_semaphores
innodb_tnx
myisam_indexes
network_traffic
qcache
qcache_mem
replication
select_types
slow
sorts
table_locks
tmp_tables
[root@opensourceecology plugins]# 
  1. I added all the mysql things
[root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Sep  7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_
lrwxrwxrwx 1 root root 36 Sep  7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes
lrwxrwxrwx 1 root root 37 Sep  7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# rm -rf mysql_*
[root@opensourceecology plugins]# ln -sf /usr/share/munin/plugins/mysql_ mysql_
[root@opensourceecology plugins]# for i in `./mysql_ suggest`; \
> do ln -sf /usr/share/munin/plugins/mysql_ $i; done
[root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Oct  8 08:06 mysql_ -> /usr/share/munin/plugins/mysql_
[root@opensourceecology plugins]# ls -lah commands
lrwxrwxrwx 1 root root 31 Oct  8 08:06 commands -> /usr/share/munin/plugins/mysql_
[root@opensourceecology plugins]# 
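Note that the loop above names each symlink after the bare suggestion (e.g. 'commands'). As far as I know, munin wildcard plugins like 'mysql_' take the graph name from the part of the symlink name after the prefix, and plugin-conf.d stanzas such as '[mysql*]' match on the symlink name. If that holds here, the links probably want to be named 'mysql_<name>'. A sketch of that, using scratch dirs and a hard-coded subset of the suggestions so it can run anywhere:

```shell
#!/bin/sh
# Sketch: link each suggested graph as mysql_<name> instead of the bare name.
# Uses scratch dirs and a hard-coded subset of the `suggest` output; on the
# real box the dirs would be /usr/share/munin/plugins and /etc/munin/plugins,
# and the list would come from `./mysql_ suggest`.
share=$(mktemp -d)
plugins=$(mktemp -d)
touch "$share/mysql_"

for name in commands connections innodb_io slow; do
    ln -sf "$share/mysql_" "$plugins/mysql_$name"
done

ls "$plugins"
```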
  1. according to this guide, munin needs a user that doesn't need any GRANTs to any databases, and that's sufficient http://www.mbrando.com/2007/08/06/how-to-get-your-mysql-munin-graphs-working/
create user munin@localhost identified by 'CHANGEME';
flush privileges;
  1. and I added this stanza to /etc/munin/plugin-conf.d/zzz-ose
[mysql*]
user root
group wheel
env.mysqlopts -u munin_user -pOBFUSCATED
  1. test worked
[root@opensourceecology plugins]# munin-run --debug mysql_queries
# Processing plugin configuration from /etc/munin/plugin-conf.d/amavis
# Processing plugin configuration from /etc/munin/plugin-conf.d/df
# Processing plugin configuration from /etc/munin/plugin-conf.d/fw_
# Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl
# Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node
# Processing plugin configuration from /etc/munin/plugin-conf.d/postfix
# Processing plugin configuration from /etc/munin/plugin-conf.d/postgres
# Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail
# Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose
# Setting /rgid/ruid/ to /99/0/
# Setting /egid/euid/ to /99 99 10/0/
# Setting up environment
# Environment mysqlopts = -u munin_user -pOBFUSCATED
# About to run '/etc/munin/plugins/mysql_queries'
delete.value 837242
insert.value 896145
replace.value 1197242
select.value 148647861
update.value 1721521
cache_hits.value 0
[root@opensourceecology plugins]# 
  1. now for nginx, I confirmed that we do have the ability to spit out the status page
[root@opensourceecology plugins]# nginx -V 2>&1 | grep -o with-http_stub_status_module
with-http_stub_status_module
[root@opensourceecology plugins]# 
  1. I tried adding a block for '/nginx_status' only accessible to '127.0.0.1', but I still got 403'd when attempting to access it via curl on the local machine
  2. the access logs showed it being accessed from an ipv6 address
2a01:4f8:172:209e::2 - - [08/Oct/2019:08:37:49 +0000] "GET /nginx_status HTTP/1.1" 403 162 "-" "curl/7.29.0" "-"
  1. I guess it has to go out over eth0 because the server is necessarily bound to that ip (it's not bound to 127.0.0.1)
  2. I used the following block
		# stats for munin
		location /nginx_status {
				stub_status on;
				access_log off;
				allow 127.0.0.1/32;
				allow 138.201.84.223/32;
				allow 138.201.84.243/32;
				allow ::1/128;
				allow 2a01:4f8:172:209e::2/128;
				allow fe80::921b:eff:fe94:7c4/128;
				deny all;
		}
  1. and it worked!
[root@opensourceecology conf.d]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@opensourceecology conf.d]# service nginx reload
Redirecting to /bin/systemctl reload nginx.service
[root@opensourceecology conf.d]# curl https://www.opensourceecology.org/nginx_status
Active connections: 1 
server accepts handled requests
 16063989 16063989 27383851 
Reading: 0 Writing: 1 Waiting: 0 
[root@opensourceecology conf.d]# 
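For a quick sanity check outside of munin, the four connection counters can be pulled out of that output with awk. This is only a sketch: `parse_stub_status` is a made-up helper, and the sample text is copied from the curl output above so that it runs without the server.

```shell
#!/bin/sh
# Sketch: pull the connection counters out of stub_status output with awk.
# parse_stub_status is a hypothetical helper; the sample text is copied from
# the curl output above. On the server, curl would be piped into it instead.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active " $3 }
        /^Reading:/ { print "reading " $2; print "writing " $4; print "waiting " $6 }
    '
}

sample='Active connections: 1
server accepts handled requests
 16063989 16063989 27383851
Reading: 0 Writing: 1 Waiting: 0'

printf '%s\n' "$sample" | parse_stub_status
```

On the server, the equivalent would be `curl -s https://www.opensourceecology.org/nginx_status | parse_stub_status`.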
  1. I found that the munin nginx plugin wouldn't work unless I installed the 'perl-LWP-Protocol-https' package
[root@opensourceecology plugins]# yum install perl-LWP-Protocol-https
...
Installed:
  perl-LWP-Protocol-https.noarch 0:6.04-4.el7                                                                                    

Dependency Installed:
  perl-Mozilla-CA.noarch 0:20130114-5.el7                                                                                        

Complete!
[root@opensourceecology plugins]# 
  1. I added nginx configs for both the wiki & osemain. If all is well, I'll add the configs for our other vhosts
  2. I didn't bother with apache for now (also, the acl will be confusing since it sees all traffic coming from 127.0.0.1 via varnish)
  3. meanwhile, some of the mysql graphs are populating. good!
  4. and meanwhile, the rsync is still going; it's currently at "var/lib/mysql" copying our mysql databases' data. cool.
  5. ...
  6. after a few hours, I checked-up on rsync; it was stuck again
var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722193549!CEBPressJuneGroup.fcstd
	 4840012 100%    2.56MB/s    0:00:01 (xfer#344966, to-check=1043/396314)
var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722195024!CEBPressJuneGroup.fcstd
	  950272  19%  879.62kB/s    0:00:04
  1. the vpn client appears to have disconnected, and I can't ping the staging host at all from prod
[maltfield@opensourceecology ~]$ ping 10.241.189.11
PING 10.241.189.11 (10.241.189.11) 56(84) bytes of data.
^C
--- 10.241.189.11 ping statistics ---
59 packets transmitted, 0 received, 100% packet loss, time 57999ms

[maltfield@opensourceecology ~]$ 
  1. I manually exited-out of the openvpn connection & reinitiated it; pings now work. After about 60 seconds, the rsync started outputting again..
  2. when I went to check the size of the lxc container, I was told <1G, which can't be right
[root@osedev1 lxc]# du -sh /var/lib/lxc/osestaging1
604M    /var/lib/lxc/osestaging1
[root@osedev1 lxc]# 
  1. ncdu pointed me to the snap1 dir, which is currently 48G
[root@osedev1 lxc]# du -sh /var/lib/lxcsnaps/osestaging1/snap1
48G     /var/lib/lxcsnaps/osestaging1/snap1
[root@osedev1 lxc]# 
  1. apparently this is the consequence of restoring a snapshot just by doing a rsync; the snapshot's config file has a new line that identifies the rootfs path explicitly as the snapshot's rootfs
[root@osedev1 lxc]# tail /var/lib/lxc/osestaging1/config 
lxc.cap.drop = mac_admin
lxc.cap.drop = mac_override
lxc.cap.drop = setfcap
lxc.cap.drop = sys_module
lxc.cap.drop = sys_nice
lxc.cap.drop = sys_pacct
lxc.cap.drop = sys_rawio
lxc.cap.drop = sys_time
lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
lxc.rootfs = /var/lib/lxcsnaps/osestaging1/snap1/rootfs
[root@osedev1 lxc]# 
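Before trusting `du` on a container dir, it seems worth checking where the config says the rootfs actually lives. A minimal sketch follows; `rootfs_of` is a hypothetical helper, and the config here is a scratch file holding two of the lines from the tail output above.

```shell
#!/bin/sh
# Sketch: print where a container config says its rootfs lives.
# rootfs_of is a hypothetical helper; the config is a scratch copy with
# two of the lines from the tail output above.
rootfs_of() {
    # take the last "lxc.rootfs = ..." line and strip surrounding whitespace
    awk -F= '
        $1 ~ /^[ \t]*lxc\.rootfs[ \t]*$/ { v = $2 }
        END { gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
    ' "$1"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
lxc.rootfs = /var/lib/lxcsnaps/osestaging1/snap1/rootfs
EOF

rootfs_of "$conf"
```

On osedev1 this would be `rootfs_of /var/lib/lxc/osestaging1/config`, and the result could be fed straight into `du -sh`.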
  1. perhaps that means the actual dir is now my *real* snapshots data
  2. while rsync continued, I noted that my nginx graphs are appearing, but there's no label that differentiates the wiki from osemain's graphs
  3. I can see a list of variables defined by my plugin by default with the `munin-run <plugin> config` command https://munin.opensourceecology.org:4443/nginx-day.html
[root@opensourceecology plugins]# munin-run nginx_www.opensourceecology.org_status config
graph_title NGINX status
graph_args --base 1000
graph_category nginx
graph_vlabel Connections
total.label Active connections
total.info  Active connections
total.draw LINE2
reading.label Reading
reading.info  Reading
reading.draw LINE2
writing.label Writing
writing.info  Writing
writing.draw LINE2
waiting.label Waiting
waiting.info  Waiting
waiting.draw LINE2
[root@opensourceecology plugins]# 
  1. so it looks like I can set this as 'graph_title' or 'graph_info'
  2. I restarted munin-node and triggered the munin-cron to update the html pages
[root@opensourceecology plugins]# service munin-node restart
Redirecting to /bin/systemctl restart munin-node.service
[root@opensourceecology plugins]# 
[root@opensourceecology plugins]# sudo -u munin /usr/bin/munin-cron
  1. the new variables didn't affect anything, so I started grepping the logs
  2. unrelated, the logs complained about mysql auth failure for:
    1. network_traffic
    2. select_types
    3. innodb_tnx
    4. innodb_log
    5. sorts
    6. myisam_indexes
    7. qcache_mem
    8. innodb_io
    9. connections
    10. qcache
    11. innodb_insert_buf
    12. replication
    13. bin_relay_log
    14. mysql_queries
    15. innodb_rows
    16. innodb_bpool_act
    17. files_tables
    18. commands
    19. innodb_bpool
    20. tmp_tables
    21. innodb_semaphores
    22. innodb_io_pend
    23. table_locks
    24. slow
  3. but there was nothing related to nginx
  4. I tried overriding the graph_title in the plugins, but it didn't work
  5. I found the datafile for munin in /var/lib/munin/datafile. This is clearly where the graph title is defined before being generated into html files
[root@opensourceecology plugins]# grep nginx /var/lib/munin/datafile | grep -i graph_title
localhost;localhost:nginx_wiki_opensourceecology_org_request.graph_title Nginx requests
localhost;localhost:nginx_wiki_opensourceecology_org_status.graph_title NGINX status
localhost;localhost:nginx_www_opensourceecology_org_status.graph_title NGINX status
localhost;localhost:nginx_www_opensourceecology_org_request.graph_title Nginx requests
[root@opensourceecology plugins]# 
  1. I found that I *could* override the title in /etc/munin/munin.conf https://www.aroundmyroom.com/2015/01/10/munin-help-needed/
[localhost]                                                                                                                      
	address 127.0.0.1                                                                                                            
	use_node_name yes                                                                                                            
	nginx_www_opensourceecology_org_status.graph_title Nginx Status (www.opensourceecology.org)                                  
	nginx_wiki_opensourceecology_org_status.graph_title Nginx Status (wiki.opensourceecology.org)
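When the other vhosts get added, those override lines could be generated rather than typed. The sketch below assumes munin rewrites the dots in the plugin name to underscores, which is what the datafile grep above suggests; `title_line` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print one graph_title override line per vhost for munin.conf.
# title_line is a hypothetical helper. It assumes munin rewrites the dots in
# the plugin name to underscores, as the datafile grep above suggests.
title_line() {
    vhost=$1
    field=$(printf '%s' "$vhost" | tr '.' '_')
    printf '\tnginx_%s_status.graph_title Nginx Status (%s)\n' "$field" "$vhost"
}

for v in www.opensourceecology.org wiki.opensourceecology.org; do
    title_line "$v"
done
```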
  1. ...
  2. meanwhile, the rsync finished!
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
var/www/html/www.opensourceecology.org/htdocs/wp-includes/widgets/class-wp-widget-text.php
	   20735 100%   21.05kB/s    0:00:00 (xfer#450852, to-check=0/517755)
var/yp/

sent 59229738371 bytes  received 11198208 bytes  2959309.47 bytes/sec
total size is 77965794338  speedup is 1.32
rsync warning: some files vanished before they could be transferred (code 24) at main.c(1052) [sender=3.0.9]

real    333m37.655s
user    19m50.292s
sys     6m0.997s
[maltfield@opensourceecology ~]$
  1. but I still can't ssh into it; again, my home dir is owned by the wrong user
[root@osestaging1 ~]# ls -lah /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 tgriffing tgriffing 4.0K Jan 19  2018 .
drwx------. 10 tgriffing tgriffing 4.0K Oct  3 07:06 ..
-rw-r--r--.  1 root      root       750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 tgriffing tgriffing 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# 
  1. maybe I should add the '--numeric-ids' option if rsync is mapping the uids over?
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
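Background on that option: by default rsync maps user and group names on the receiving side, while '--numeric-ids' transfers the raw uid/gid numbers without mapping. One possible explanation for the wrong owner (an assumption, not confirmed in this log) is that the receiver's passwd file listed 'maltfield' under a different uid at the time the files were written. The sketch below compares a user's uid in two passwd-format files; `uid_of` is a hypothetical helper and the receiver-side contents are made up.

```shell
#!/bin/sh
# Sketch: look up a user's uid in two passwd-format files and compare them.
# uid_of is a hypothetical helper; the two files are scratch stand-ins for the
# sender's and receiver's /etc/passwd, with made-up receiver contents.
uid_of() {
    awk -F: -v u="$1" '$1 == u { print $3 }' "$2"
}

src=$(mktemp)
dst=$(mktemp)
printf 'maltfield:x:1005:1005::/home/maltfield:/bin/bash\n' > "$src"
printf 'maltfield:x:1000:1000::/home/maltfield:/bin/bash\n' > "$dst"

a=$(uid_of maltfield "$src")
b=$(uid_of maltfield "$dst")
if [ "$a" = "$b" ]; then
    echo "maltfield: uid $a on both sides"
else
    echo "maltfield: uid $a on sender, $b on receiver"
fi
```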
  1. I found that the 'sync.old' dir was still trying to sync, so I updated the command to add a wildcard after the exclude; it worked
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. this time the double-tap took only 3 minutes wall time
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
var/www/html/munin/static/zoom.js
		4760 100%    1.13MB/s    0:00:00 (xfer#2239, to-check=1002/321739)

sent 224884435 bytes  received 1668273 bytes  1352553.48 bytes/sec
total size is 41283867704  speedup is 182.23

real    2m46.967s
user    0m32.382s
sys     0m8.095s
[maltfield@opensourceecology ~]$ 
  1. this time the permissions of my home dir didn't break, and I was able to ssh-in.
  2. I'd like to take a snapshot of the staging server, but at this point we don't have space for it
[root@osedev1 lxc]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M   17M  879M   2% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G   94G   25G  80% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[root@osedev1 lxc]# 
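A rough sketch of the space check behind that conclusion, assuming the staging rootfs is about as large as the "total size" rsync reported above and reading the 25G "Avail" figure as GiB. The helper name `snapshot_fits` and the optional reserve percentage are hypothetical.

```shell
# Hypothetical helper: succeed only if a copy of `needed` bytes fits
# into `avail` bytes while leaving `reserve_pct` percent of `avail` free.
snapshot_fits() (
  needed=$1
  avail=$2
  reserve_pct=${3:-0}
  usable=$(( avail - avail * reserve_pct / 100 ))
  [ "$needed" -le "$usable" ]
)

needed=41283867704                       # "total size" reported by rsync above
avail=$(( 25 * 1024 * 1024 * 1024 ))     # 25G "Avail" from df, taken as GiB

if snapshot_fits "$needed" "$avail"; then
  echo "snapshot fits"
else
  echo "snapshot does not fit"
fi
```

With these numbers the copy needs roughly 41G against 25G free, so it prints "snapshot does not fit".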
  1. ok, now, drum roll: did we break the staging server? let's try to shut it down & start it again.
  2. aaaaand: IT CAME BACK UP! Now it said its hostname isn't 'osestaging1' but 'opensourceecology'. Coolz.
  3. I was successfully able to ssh into it, but then it froze. And my attempts to login to the lxc-console all end in timeouts
opensourceecology login: maltfield
Password: 
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. if I attempt to login as root, then it just times-out before it even asks me for a password
opensourceecology login: root
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. ssh auth succeeds, but it also fails before I get a shell
...
debug1: Authentication succeeded (publickey).
Authenticated to 10.241.189.11 ([10.241.189.11]:32415).
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: network
  1. I stopped the container again. This time when I tried to start it, I got an error
[root@osedev1 ~]# lxc-start -n -osestaging1
lxc-start: lxc_start.c: main: 290 Executing '/sbin/init' with no configuration file may crash the host
[root@osedev1 ~]# 
  1. I moved some dirs around so that I'm no longer using the 'rootfs' dir from the snaps dir, but now I get this damn message. duckducks are dead-ends
[root@osedev1 lxc]# lxc-start -n osestaging1
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 lxc]# lxc-start -P /var/lib/lxc/ -n osestaging1
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 lxc]# 
  1. I tried rebooting the dev server. after it came up, I still got the same error when attempting to `lxc-start`
  2. I found I could get debug logs by adding `-l debug -o <file>` https://github.com/lxc/lxc/issues/1555
[root@osedev1 ~]# lxc-start -n osestaging1 -l debug -o lxc-start.log
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 ~]# cat lxc-start.log
...
  1. all the god damn google results on this "sync wake failure" shit (which are already few) are regarding configs of multiple containers sharing a network. I'll destroy the whole network namespace if needed. but how? why does nobody else encounter this damn issue?
  2. well, I found the source code. could be an issue with an open file descriptor or something? https://fossies.org/linux/lxc/src/lxc/sync.c
  3. my best guess is that it's an issue with the 'rootfs.dev' symlink
[root@osedev1 lxc]# ls -lah osestaging1
total 28K
drwxrwx---.  5 root root 4.0K Oct  8 16:17 .
drwxr-xr-x.  6 root root 4.0K Oct  8 16:05 ..
-rw-r--r--.  1 root root 1.1K Oct  8 15:46 config
drwxr-xr-x.  3 root root 4.0K Oct  8 15:46 dev
drwxr-xr-x.  2 root root 4.0K Oct  8 15:52 osestaging1
dr-xr-xr-x. 20 root root 4.0K Oct  8 15:21 rootfs
lrwxrwxrwx.  1 root root   38 Oct  8 16:17 rootfs.dev -> /dev/.lxc/osestaging1.72930b02843095eb
-rw-r--r--.  1 root root   19 Oct  3 15:40 ts
[root@osedev1 lxc]# 
  1. I commented-out every fucking line in the config file that had the word 'dev' in it...and the system started! Except that, umm, I couldn't connect to its console?
[root@osedev1 lxc]# lxc-start -n osestaging1 -f osestaging1/config -l trace -o lxc-start.log
Failed to create unit file /run/systemd/generator.late/netconsole.service: File exists
Failed to create unit file /run/systemd/generator.late/network.service: File exists
Running in a container, ignoring fstab device entry for /dev/disk/by-uuid/1e457b76-5100-4b53-bcdc-667ca122b941.
Running in a container, ignoring fstab device entry for /dev/mapper/ose_dev_volume_1.
Failed to create unit file /run/systemd/generator/systemd-cryptsetup@ose_dev_volume_1.service: File exists

lxc-start: console.c: lxc_console_peer_proxy_alloc: 315 console not set up
  1. I found that if I commented-out the first line and added-back a rootfs line, I could get it to boot again, but I couldn't login from the console (same 60 second timeout) or ssh in (or ping it)
#lxc.mount.entry = /dev/net dev/net none bind,create=dir
...
lxc.rootfs = /var/lib/lxc/osestaging1/rootfs
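Since the container only booted once a rootfs line was back in the config, a pre-flight check before `lxc-start` might save some head-scratching. This is a sketch only: the helper name `config_defines_rootfs` is hypothetical. It accepts both the legacy `lxc.rootfs` key used in this config and the `lxc.rootfs.path` spelling that newer LXC releases use.

```shell
# Hypothetical pre-flight check: succeed if the LXC config at $1 has an
# uncommented rootfs line (`lxc.rootfs = ...` or `lxc.rootfs.path = ...`).
config_defines_rootfs() {
  grep -Eq '^[[:space:]]*lxc\.rootfs(\.path)?[[:space:]]*=' "$1"
}

# Example: guard the start command (not executed here).
# config_defines_rootfs /var/lib/lxc/osestaging1/config \
#   && lxc-start -n osestaging1 -f /var/lib/lxc/osestaging1/config
```

A commented-out `#lxc.rootfs = ...` line does not count, which matches the failure mode hit here.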
  1. I uncommented the first line, and it still started! looks like the issue was that I didn't explicitly define a rootfs..
  2. this time I could ping the server from my laptop over the vpn
  3. I was able to login as 'maltfield' from the console, but it locked-up when I tried to `sudo su -`
  4. on the next reboot, I tailed all the files in the staging container's /var/log from the osedev1 host (via the container's rootfs dir); I saw some interesting results
==> osestaging1/rootfs/var/log/messages <==
Oct  8 14:50:00 opensourceecology NET[248]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Oct  8 14:50:00 opensourceecology dhclient[201]: bound to 192.168.122.201 -- renewal in 1588 seconds.
Oct  8 14:50:00 opensourceecology network: Determining IP information for eth0... done.
Oct  8 14:50:00 opensourceecology network: [  OK  ]
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/kernel/yama/ptrace_scope': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '16' to '/proc/sys/kernel/sysrq': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/kernel/core_uses_pid': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/rp_filter': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/rp_filter': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/default/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/all/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/promote_secondaries': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/promote_secondaries': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_hardlinks': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_symlinks': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/autoconf': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_dad': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_defrtr': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_rtr_pref': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_pinfo': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_redirects': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/forwarding': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/autoconf': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_dad': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_defrtr': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_pinfo': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_redirects': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/forwarding': Read-only file system
Oct  8 14:50:01 opensourceecology systemd: Started LSB: Bring up/down networking.
  1. and issues with /run
Oct  8 14:50:05 opensourceecology systemd-logind: Failed to remove runtime directory /run/user/0: Device or resource busy
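The wall of systemd-sysctl failures above is easier to read as a list of affected paths. A small sketch, assuming the message format stays exactly as shown in the log; the helper name `readonly_sysctl_paths` is hypothetical.

```shell
# Hypothetical helper: read syslog lines on stdin and print the sysctl
# paths that systemd-sysctl could not write because the target was
# read-only.
readonly_sysctl_paths() {
  sed -n "s/.*Failed to write '[^']*' to '\([^']*\)': Read-only file system.*/\1/p"
}

# Example (run on osedev1 from /var/lib/lxc, path as used in this log):
# readonly_sysctl_paths < osestaging1/rootfs/var/log/messages | sort -u
```

Every path in the output above sits under /proc/sys, which suggests that tree is mounted read-only inside the container.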

Mon Oct 07, 2019

  1. I added a comment to our long-standing feature request with the Libre Office Online CODE project for the ability to draw lines & arrows in their online version of "present" https://bugs.documentfoundation.org/show_bug.cgi?id=113386#c4
  2. wiki updates & logging
  3. I tried to login to my hetzner cloud account, but I got "Account is disabled" fucking hell. so much for user-specific auditing. I logged-in with our shared account..
  4. I confirmed that our osedev1 node has a 20G disk + 10G volume.
  5. we currently are using 3.4/19G on osedev1; I never set up the 10G volume that appears to be at /mnt/HC_Volume_3110278. It has 9.3G avail
[maltfield@osedev1 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        19G  3.4G   15G  19% /
devtmpfs        873M     0  873M   0% /dev
tmpfs           896M     0  896M   0% /dev/shm
tmpfs           896M   25M  871M   3% /run
tmpfs           896M     0  896M   0% /sys/fs/cgroup
/dev/sdb        9.8G   37M  9.3G   1% /mnt/HC_Volume_3110278
tmpfs           180M     0  180M   0% /run/user/1000
[maltfield@osedev1 ~]$ ls -lah /mnt/HC_Volume_3110278/
total 24K
drwxr-xr-x. 3 root root 4.0K Aug 20 11:50 .
drwxr-xr-x. 3 root root 4.0K Aug 20 12:16 ..
drwx------. 2 root root  16K Aug 20 11:50 lost+found
[maltfield@osedev1 ~]$ 
  1. the RAID1'd disk on prod is 197G with 75G used
[maltfield@opensourceecology ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        197G   75G  113G  40% /
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G  8.0K   32G   1% /dev/shm
tmpfs            32G  2.6G   29G   9% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md1        488M  289M  174M  63% /boot
tmpfs           6.3G     0  6.3G   0% /run/user/0
tmpfs           6.3G     0  6.3G   0% /run/user/1005
[maltfield@opensourceecology ~]$ 
  1. a quick duckduck pulled up this guide for using luks to create an encrypted volume out of hetzner block volumes; this is a good idea https://angristan.xyz/how-to-use-encrypted-block-storage-volumes-hetzner-cloud/
  2. the guide shows a method for resizing the encrypted volume. I didn't think that would be trivial, but it appears that resize2fs can increase the size of a luks-encrypted volume without issue. this is good to know. if we run out of space (or maybe we create a second staging node or ad-hoc dev nodes), we should be able to shutdown all our lxc containers, unmount the block drive, resize it, and remount it. That said, I don't think we'll be making backups of these (dev/staging) containers, so if we fuck up it would be bad.
  3. our 10G hetzner cloud block volume has been costing 0.48 EUR/mo = 5.76 EUR/yr
  4. the min needed for our current prod server is 75G. The slider on the product page has weird increments, but the actual "resize volume" option in the cloud console wui permits resizing in 1G increments. A 75G volume would cost 3.00 EUR/mo = 36 EUR/yr
  5. A much more sane choice would be equal to the disk on prod = 197G = 7.88 EUR/mo = 94.56 EUR/yr
  6. fuck, I asked Marcin for $100/yr. Currently we're spending 2.49 EUR/mo on the osedev1 instance alone. That's 29.88 EUR/yr = 32.81 USD/yr. For a 100 USD/yr budget, that leaves 67.19 USD/yr for disk space = 61.19 EUR/yr. That's 5.09 EUR/mo, which will buy us a 127G volume at 5.08 EUR/mo.
  7. 127/197 = 0.64. Therefore, a 127G block volume will allow for an lxc staging node to replicate our prod node until our prod node grows beyond 64% capacity. 70% is a good general high-water-mark at which we'd need to look at migrating prod anyway. This (127G) seems like a reasonable low-budget solution that meets the 100 USD/yr line.
  8. I resized our 10G 'ose-dev-volume-1' volume to 127G in the hetzner WUI.
  9. I clicked the 'enable protection' option, which prevents it from being deleted until the protection is manually removed
  10. the 'show configuration' window in the wui tells us that the volume is '/dev/disk/by-id/scsi-0HC_Volume_3110278' on osedev1
  11. the box itself looks like it's really /dev/sdb
[maltfield@osedev1 ~]$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=893568k,nr_inodes=223392,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,clone_children)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11033)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
/dev/sdb on /mnt/HC_Volume_3110278 type ext4 (rw,relatime,seclabel,discard,data=ordered)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=183308k,mode=700,uid=1000,gid=1000)
[maltfield@osedev1 ~]$
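The volume-sizing arithmetic earlier in this entry can be sketched as a small calculation. The rates are assumptions inferred from the figures in the log: about 1.098 USD per EUR (from 29.88 EUR = 32.81 USD) and 0.04 EUR per GB per month (from 127G = 5.08 EUR/mo). The helper name `max_volume_gb` is hypothetical.

```shell
# Hypothetical helper: largest whole-GB volume that fits a yearly budget.
#   $1 yearly budget in USD
#   $2 instance cost in EUR per month
#   $3 exchange rate in USD per EUR (assumption inferred from the log)
#   $4 volume price in EUR per GB per month (assumption inferred from the log)
max_volume_gb() {
  awk -v budget="$1" -v inst="$2" -v rate="$3" -v price="$4" 'BEGIN {
    left_eur_per_month = (budget / rate - inst * 12) / 12
    print int(left_eur_per_month / price)
  }'
}

max_volume_gb 100 2.49 1.098 0.04

# Share of the 197G prod disk that a 127G volume can mirror.
awk 'BEGIN { printf "%.2f\n", 127 / 197 }'
```

With the log's numbers this prints 127 and 0.64, matching the sizing chosen above. Rerunning it with a different budget or price shows how quickly the headroom changes.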
  1. but the other name appears in fstab
[root@osedev1 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Sun Jul 14 04:14:25 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=1e457b76-5100-4b53-bcdc-667ca122b941 /                       ext4    defaults        1 1
/dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/HC_Volume_3110278 ext4 discard,nofail,defaults 0 0
[root@osedev1 ~]# 
  1. ah, indeed, the above disk is just a link back to /dev/sdb
[root@osedev1 ~]# ls -lah /dev/disk/by-id/scsi-0HC_Volume_3110278 
lrwxrwxrwx. 1 root root 9 Oct  7 10:31 /dev/disk/by-id/scsi-0HC_Volume_3110278 -> ../../sdb
[root@osedev1 ~]# 
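The same lookup can be done without eyeballing `ls` output. A sketch, with the hypothetical helper name `resolve_disk`, that resolves a stable /dev/disk/by-id/ name to the device node it currently points at:

```shell
# Hypothetical helper: print the kernel device node behind a stable
# /dev/disk/by-id/ name by resolving the symlink chain.
resolve_disk() {
  readlink -f "$1"
}

# Example (volume name from this log):
# resolve_disk /dev/disk/by-id/scsi-0HC_Volume_3110278
```

The by-id name is the safer one to put in fstab or crypttab, since the sdX letter is assigned at boot and may differ, for example when booting from a rescue image.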
  1. before I rebuild this volume, the cryptsetup command raises the question: where do I store the key?
    1. assuming I want the server to be able to restart by itself without user interaction, the key should probably be stored in a file somewhere in '/root' on 'osedev1'. But while my OS would lock-down the permissions on that file, the key file itself would likely be stored unencrypted on some hetzner drive somewhere. Is it worth encrypting the contents of the block volume when the encryption key itself might be stored unencrypted somewhere at hetzner's datacenter?
  2. as a test, I ran `testdisk` to see if I could find any deleted files in the 10G volume that hetzner gave us from previous customers; I couldn't.
  3. someone asked about this, but there wasn't much great discussion on how hetzner provisions their disks https://serverfault.com/questions/950790/cloud-server-vulnerability-analysis?noredirect=1
  4. so risk assessment: when working in a cloud, we have to accept the integrity of the cloud provider. If a rogue hetzner employee wants to steal all our data, they can. There's absolutely nothing we can do about that other than building the servers ourselves and physically locking them down. The decision to use hetzner predates me, but I agree with it. It does not make sense for OSE to buy a server rack and host our equipment at FeF. So, I accept the risk and trust that hetzner will not do something malicious that will put our data at risk
  5. the real concern here is that we resize our volume (or hetzner in the background shuffles some abstracted blocks around physical devices that's black-boxed to us), and a different customer suddenly gets, for example, our users' PII in their new volume. Or a malicious hetzner cloud user triggers some shuffling and is successfully able to exfiltrate our data from their cloud without breaking into our server. This is the risk that we're trying to prevent. In this case, I think it *is* worthwhile to encrypt our block volume. The chance that someone is able to get chunks of our data from an old 127G block volume that lacked encryption is significantly higher than the chance that they're able to get those *and* the key from our server *and* use the key to extract meaningful data from the likely non-contiguous bits that may be extracted from our recycled block volume data.
  6. hetzner does not have a clean record, but hardly anybody does. This was only customer data, though, not the contents of their customers' servers https://mybroadband.co.za/news/cloud-hosting/279181-hetzner-client-data-exposed-after-attack.html
  7. so, while recognizing that it has limitations, I also recognize that there are sufficient benefits to justify encrypting this block volume with a key stored unencrypted on our cloud instance
  8. meanwhile, I found a guide for how to migrate the contents of /var to a block volume. It suggested doing so from a rescue disk, then editing fstab for the next reboot https://serverfault.com/questions/947732/how-to-add-hetzner-cloud-disk-volume-to-extend-var-partition
  9. I created a new key file on my laptop, stored it in our shared keepass, and uploaded it to the server at /root/keys/ose-dev-volume-1.201910.key
  10. let's shutdown osedev1 and migrate its /var/ to a block volume. First I'll shutdown the osestaging1 staging lxc container then the host osedev1
[root@osedev1 ~]# lxc-stop -n osestaging1
[root@osedev1 ~]# shutdown -h now
Connection to 195.201.233.113 closed by remote host.
Connection to 195.201.233.113 closed.
user@ose:~$ 
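The log doesn't record how the key file was generated on the laptop. One common approach, sketched here purely as an assumption, is to read random bytes from /dev/urandom under a restrictive umask; the helper name `make_keyfile` and the 4096-byte default are hypothetical.

```shell
# Hypothetical helper: write $2 random bytes (default 4096) to a new key
# file at $1, readable and writable by the owner only.
make_keyfile() (
  umask 077
  head -c "${2:-4096}" /dev/urandom > "$1"
)

# Example (file name from this log, written to the current directory):
# make_keyfile ./ose-dev-volume-1.201910.key
```

The function body runs in a subshell, so the tightened umask does not leak into the calling shell.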
  1. I confirmed that the server was off in the hetzner cloud console wui
  2. I clicked on the server. I'm not clear if I should mount a rescue disk or click the "rescue" option. No idea what the latter is, so I navigated to "ISO IMAGES", found SystemRescueCD, and clicked the "MOUNT" button next to it. I went back to the "servers" page, opened a console for 'osedev1', and clicked "Power on"
  3. the console showed the boot options for the rescue cd. I chose the first menu item = "SystemRescueCd: default boot options"
  4. I can't copy & paste from the console, but I basically found 5x items in /dev/disk/by-id/
    1. the DVD for systemrescue
    2. my 127G block volume with the same name shown above (scsi-0HC_Volume_3110278 )
    3. scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0
    4. scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part1
    5. another DVD?
  5. so 3 & 4 must be our osedev1 disk. Both are 19.1G
  6. attempting to mount the one without '-part1' failed, but the one with '-part1' succeeded, and all my data was there. It was mounted to '/mnt/osedev1-part/'
  7. I formatted the new 127G ebs volume using cryptsetup
cryptsetup luksFormat /dev/disk/by-id/scsi-0HC_Volume_211278 /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key
  1. I opened the new encrypted luks volume and created an ext4 filesystem on it
cryptsetup luksOpen --key-file /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key /dev/disk/by-id/scsi-0HC_Volume_211278 ebs
mkfs.ext4 -j /dev/mapper/ebs
  1. I mounted the new FS & began a sync of osedev1's 'var' dir (now only 2.3G) to it
mkdir /mnt/ebs
mount /dev/mapper/ebs /mnt/ebs
rsync -av --progress /mnt/osedev1/var /mnt/ebs/
  1. I added entries for fstab & crypttab to auto-mount the volume to /mnt/ose_dev_volume_1/
  2. I moved the existing /var/ dir to /var.old and made a symlink from /var/ to /mnt/ose_dev_volume_1/var
  3. I safely unmounted & closed all the disks and shut down
  4. I removed the systemrescue iso from the server and started it up again
  5. I was able to ssh-in, and the new '/var/' dir *appeared* to be setup properly
[maltfield@osedev1 /]$ ls -lah /var
lrwxrwxrwx. 1 root root 25 Oct  7 13:47 /var -> /mnt/ose_dev_volume_1/var
[maltfield@osedev1 /]$ ls -lah /var/
total 80K
drwxr-xr-x. 19 root root 4.0K Jul 14 06:18 .
drwxr-xr-x.  4 root root 4.0K Oct  7 13:22 ..
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 adm
drwxr-xr-x.  7 root root 4.0K Oct  2 14:24 cache
drwxr-xr-x.  2 root root 4.0K Apr 24 16:03 crash
drwxr-xr-x.  3 root root 4.0K Jul 14 06:15 db
drwxr-xr-x.  3 root root 4.0K Jul 14 06:15 empty
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 games
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 gopher
drwxr-xr-x.  3 root root 4.0K Jul 14 06:14 kerberos
drwxr-xr-x. 34 root root 4.0K Oct  2 15:34 lib
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 local
lrwxrwxrwx.  1 root root   11 Jul 14 06:14 lock -> ../run/lock
drwxr-xr-x. 11 root root 4.0K Oct  7 13:49 log
lrwxrwxrwx.  1 root root   10 Jul 14 06:14 mail -> spool/mail
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 nis
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 opt
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 preserve
lrwxrwxrwx.  1 root root    6 Jul 14 06:14 run -> ../run
drwxr-xr-x.  8 root root 4.0K Oct  3 08:06 spool
drwxrwxrwt.  4 root root 4.0K Oct  7 13:49 tmp
-rw-r--r--.  1 root root  163 Jul 14 06:14 .updated
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 yp
[maltfield@osedev1 /]$ 
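The fstab & crypttab entries themselves were not captured in this log. As a rough sketch only, they might look something like the lines printed below, assuming the mapper name 'ose_dev_volume_1' and the key path used elsewhere in this entry. The volume id is a placeholder, and the 'luks' and 'defaults' options are assumptions rather than a record of what was actually written. The snippet only prints the candidate lines; it does not touch /etc.

```shell
#!/bin/sh
# Sketch only: prints candidate entries instead of editing /etc/crypttab or /etc/fstab.
# VOLUME_ID is a placeholder; substitute the real id listed in /dev/disk/by-id/.
VOLUME_ID='scsi-0HC_Volume_XXXXXXX'
MAPPER_NAME='ose_dev_volume_1'
KEY_FILE='/root/keys/ose-dev-volume-1.201910.key'
MOUNT_POINT='/mnt/ose_dev_volume_1'

# crypttab fields: <name> <device> <key file> <options>
crypttab_line="$MAPPER_NAME /dev/disk/by-id/$VOLUME_ID $KEY_FILE luks"

# fstab fields: <device> <mount point> <type> <options> <dump> <fsck pass>
fstab_line="/dev/mapper/$MAPPER_NAME $MOUNT_POINT ext4 defaults 0 2"

echo "$crypttab_line"
echo "$fstab_line"
```

Referring to the device by its /dev/disk/by-id/ path rather than /dev/sdX keeps the entry stable if device ordering changes between boots.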
  1. but I immediately noticed that, for example, screen wasn't working
[maltfield@osedev1 /]$ screen -S ebs
Cannot make directory '/var/run/screen': No such file or directory
[maltfield@osedev1 /]$ 
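  1. oh, damn, '/var/run' is a relative symlink to '../run', which won't work now that '/var' is itself a symlink: '../run' is resolved from /mnt/ose_dev_volume_1/var, so it points at /mnt/ose_dev_volume_1/run, which doesn't exist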
[maltfield@osedev1 /]$ ls -lah /var/run
lrwxrwxrwx. 1 root root 6 Jul 14 06:14 /var/run -> ../run
[maltfield@osedev1 /]$ 
  1. I made it an absolute symlink instead
[root@osedev1 var]# rm -rf lock
[root@osedev1 var]# rm -rf run
[root@osedev1 var]# ln -s /run 
[root@osedev1 var]# ln -s /run/lock
[root@osedev1 var]# ls -lah run
lrwxrwxrwx. 1 root root 4 Oct  7 13:54 run -> /run
[root@osedev1 var]# ls -lah lock
lrwxrwxrwx. 1 root root 9 Oct  7 13:54 lock -> /run/lock
[root@osedev1 var]# 
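To illustrate why the relative link dangles once '/var' is a symlink, here is a small sketch that reproduces the layout in a scratch directory (no root needed). The directory names are stand-ins: "vol" plays the role of /mnt/ose_dev_volume_1, and nothing outside the temp dir is touched.

```shell
#!/bin/sh
# Reproduce the dangling /var/run link in a scratch directory.
# "vol" stands in for /mnt/ose_dev_volume_1; all paths live under a temp dir.
root=$(mktemp -d)
mkdir -p "$root/run" "$root/vol/var"

# The copied var dir still carries the relative link: run -> ../run
ln -s ../run "$root/vol/var/run"
# /var itself is now a symlink onto the volume
ln -s "$root/vol/var" "$root/var"

# ".." is resolved from the link's real location ($root/vol/var), so the
# target becomes $root/vol/run, which does not exist.
if [ -e "$root/var/run" ]; then before=ok; else before=dangling; fi

# Replace it with an absolute link, mirroring the fix applied to /var/run
rm "$root/vol/var/run"
ln -s "$root/run" "$root/vol/var/run"
if [ -e "$root/var/run" ]; then after=ok; else after=dangling; fi

echo "relative link: $before, absolute link: $after"
rm -rf "$root"
```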
  1. it still fails, but everything looks ok; I gave the system a reboot
[root@osedev1 var]# rm -rf lock
[root@osedev1 var]# rm -rf run
[root@osedev1 var]# ln -s /run 
[root@osedev1 var]# ln -s /run/lock
[root@osedev1 var]# ls -lah run
lrwxrwxrwx. 1 root root 4 Oct  7 13:54 run -> /run
[root@osedev1 var]# ls -lah lock
lrwxrwxrwx. 1 root root 9 Oct  7 13:54 lock -> /run/lock
[root@osedev1 var]# 
  1. when the system came back up, `screen` had no issues, and everything looked good.
[maltfield@osedev1 ~]$ screen -ls
There is a screen on:
		4362.ebs        (Attached)
1 Socket in /var/run/screen/S-maltfield.

[maltfield@osedev1 ~]$ sudo su -
Last login: Mon Oct  7 13:54:28 CEST 2019 on pts/0
[root@osedev1 ~]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M   17M  879M   2% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G  2.5G  116G   3% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[root@osedev1 ~]# ls -lah /var
lrwxrwxrwx. 1 root root 25 Oct  7 13:47 /var -> /mnt/ose_dev_volume_1/var
[root@osedev1 ~]# ls -lah /mnt/ose_dev_volume_1/
total 28K
drwxr-xr-x.  4 root root 4.0K Oct  7 13:22 .
drwxr-xr-x.  4 root root 4.0K Oct  7 13:46 ..
drwx------.  2 root root  16K Oct  7 13:18 lost+found
drwxr-xr-x. 19 root root 4.0K Oct  7 13:54 var
[root@osedev1 ~]# 
  1. I started the staging server, connected to the vpn from my laptop, and was successfully able to ssh into it (though only after a long delay)
  2. I ssh'd into prod and kicked-off the rsync!
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. that also copied the old backups, which is probably unnecessary. I should also exclude
    1. home/b2user/sync
  2. this sync is going at a rate of about 1G every 5 minutes. I expect it'll be done in 5-10 hours. I'll check on it tomorrow.
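
For a future run, one way to make the exclude list easier to extend is to keep the patterns in a file and pass it with rsync's --exclude-from option. The sketch below is only an illustration: the temp file location is arbitrary, and anchoring the extra pattern as '/home/b2user/sync' (with a leading slash) is an assumption. It prints the command with --dry-run added rather than running anything.

```shell
#!/bin/sh
# Sketch: keep the exclude patterns in a file so the list is easier to extend.
# Patterns are the ones from the rsync command above plus /home/b2user/sync.
exclude_file=$(mktemp)
cat > "$exclude_file" <<'EOF'
/root
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
/home/b2user/sync
EOF

# Print the command instead of running it; --dry-run makes a first pass
# list what would be transferred without copying anything.
echo "rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path='sudo rsync'" \
     "--exclude-from=$exclude_file --dry-run -av --progress / maltfield@10.241.189.11:/"
```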

Sat Oct 05, 2019

  1. email

Fri Oct 04, 2019

  1. email

Thr Oct 03, 2019

  1. continuing from yesterday, I copied the dev-specific encryption key from our shared keepass for the backups to the dev node
[root@osedev1 backups]# mv /home/maltfield/ose-dev-backups-cron.201910.key /root/backups/
[root@osedev1 backups]# chown root:root ose-dev-backups-cron.201910.key 
[root@osedev1 backups]# chmod 0400 ose-dev-backups-cron.201910.key 
[root@osedev1 backups]# ls -lah 
total 32K
drwxr-xr-x. 4 root root 4.0K Oct  3 07:09 .
dr-xr-x---. 7 root root 4.0K Oct  3 07:03 ..
-rw-r--r--. 1 root root  747 Oct  2 15:57 backup.settings
-rwxr-xr-x. 1 root root 5.7K Oct  3 07:03 backup.sh
drwxr-xr-x. 3 root root 4.0K Sep  9 09:02 iptables
-r--------. 1 root root 4.0K Oct  3 07:05 ose-dev-backups-cron.201910.key
drwxr-xr-x. 2 root root 4.0K Oct  3 07:04 sync
[root@osedev1 backups]# 
  1. note that I also had to install `trickle` on the dev node
[root@osedev1 backups]# ./backup.sh
================================================================================
INFO: Beginning Backup Run on 20191003_051037
INFO: Cleaning up old backup files
...
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
sudo: /bin/trickle: command not found

real    0m0.030s
user    0m0.009s
sys     0m0.021s
[root@osedev1 backups]# yum install trickle
...
Installed:
  trickle.x86_64 0:1.07-19.el7                                                                                                 

Complete!
[root@osedev1 backups]# 
  1. note that something changed in the install process of the b2cli that required me to use the '--user' flag, which changed the path to the b2 binary. To keep the mods to the backup.sh script minimal, I just created a symlink
[root@osedev1 backups]# ./backup.sh
...
+ echo 'INFO: Beginning upload to backblaze b2'
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg
trickle: exec(): No such file or directory

real    0m0.040s
user    0m0.012s
sys     0m0.020s
+ exit 0
[root@osedev1 backups]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg
trickle: exec(): No such file or directory
[root@osedev1 b2user]# ln -s /home/b2user/.local/bin/b2 /home/b2user/virtualenv/bin/b2
[root@osedev1 b2user]# 
  1. the backup script still failed at the upload to b2
[root@osedev1 backups]# ./backup.sh
...
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg
ERROR: Missing account data: 'NoneType' object has no attribute 'getitem'  Use: b2 authorize-account

real    0m0.363s
user    0m0.281s
sys     0m0.076s
+ exit 0
[root@osedev1 b2user]# 
[root@osedev1 b2user]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg
ERROR: Missing account data: 'NoneType' object has no attribute 'getitem'  Use: b2 authorize-account
[root@osedev1 b2user]# 
  1. per the error, I used `b2 authorize-account` and added my creds for the user 'b2user'
[root@osedev1 b2user]# su - b2user
Last login: Wed Oct  2 16:15:28 CEST 2019 on pts/8
[b2user@osedev1 ~]$ .local/bin/b2 authorize-account
Using https://api.backblazeb2.com
Backblaze application key ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Backblaze application key: 
[b2user@osedev1 ~]$ 
  1. this time the backup succeeded!
[root@osedev1 b2user]# /root/backups/backup.sh
...
INFO: moving encrypted backup file to b2user's sync dir
+ /bin/mv /root/backups/sync/daily_osedev1_20191003_052448.tar.gpg /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg
+ /bin/chown b2user /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg
+ echo 'INFO: Beginning upload to backblaze b2'
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg daily_osedev1_20191003_052448.tar.gpg
URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191003_052448.tar.gpg
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038
{ 
  "action": "upload",
  "fileId": "4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038",
  "fileName": "daily_osedev1_20191003_052448.tar.gpg",
  "size": 17233113,
  "uploadTimestamp": 1570080299000
}

real    0m26.435s
user    0m0.706s
sys     0m0.251s
+ exit 0
[root@osedev1 b2user]#
  1. as an out-of-band restore validation, I downloaded the 17.2M backup file from the backblaze b2 wui onto my laptop
  2. again, I downloaded the encryption key from our shared keepass
user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar ose-dev-backups-cron.201910.key 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
gpg: no valid OpenPGP data found.
gpg: processing message failed: Unknown system error
user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar daily_osedev1_20191003_052448.tar.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
gpg: AES256 encrypted data
gpg: encrypted with 1 passphrase
user@disp5653:~/Downloads$ tar -xf daily_osedev1_20191003_052448.tar 
user@disp5653:~/Downloads$ ls
daily_osedev1_20191003_052448.tar      ose-dev-backups-cron.201910.key
daily_osedev1_20191003_052448.tar.gpg  root
user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f
root/backups/sync/daily_osedev1_20191003_052448/www/www.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/root/root.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/log/log.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/etc/etc.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/home/home.20191003_052448.tar.gz
user@disp5653:~/Downloads$ 
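gpg printed "WARNING: no command supplied" above because no operation such as --decrypt was named, so it had to guess. A minimal sketch of a restore helper that names the operation explicitly is below. The function name 'decrypt_backup' is hypothetical and not part of backup.sh; depending on the GnuPG version, '--pinentry-mode loopback' may also be needed alongside --passphrase-file.

```shell
#!/bin/sh
# decrypt_backup is a hypothetical helper, not part of backup.sh.
# usage: decrypt_backup <passphrase-file> <encrypted .tar.gpg>
decrypt_backup() {
    key_file=$1
    encrypted=$2
    # strip the trailing .gpg to derive the output file name
    output=${encrypted%.gpg}
    # --decrypt names the operation explicitly so gpg does not have to guess
    gpg --batch --passphrase-file "$key_file" --output "$output" --decrypt "$encrypted"
}

# Example invocation (commented out; file names taken from the session above):
# decrypt_backup ose-dev-backups-cron.201910.key daily_osedev1_20191003_052448.tar.gpg
```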
  1. it looks like it's working; here's the contents of the backup file (note there are some varnish config files in here from when I did my test rsync back on Sep 9th Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019)
user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f -exec tar -tvf '{}' \; | awk '{print $6}' | cut -d/ -f 1-2 | sort -u
etc/adjtime
etc/aliases
etc/alternatives
etc/anacrontab
etc/audisp
etc/audit
etc/bash_completion.d
etc/bashrc
etc/binfmt.d
etc/centos-release
etc/centos-release-upstream
etc/chkconfig.d
etc/chrony.conf
etc/chrony.keys
etc/cloud
etc/cron.d
etc/cron.daily
etc/cron.deny
etc/cron.hourly
etc/cron.monthly
etc/crontab
etc/cron.weekly
etc/crypttab
etc/csh.cshrc
etc/csh.login
etc/dbus-1
etc/default
etc/depmod.d
etc/dhcp
etc/DIR_COLORS
etc/DIR_COLORS.256color
etc/DIR_COLORS.lightbgcolor
etc/dnsmasq.conf
etc/dnsmasq.d
etc/dracut.conf
etc/dracut.conf.d
etc/e2fsck.conf
etc/environment
etc/ethertypes
etc/exports
etc/exports.d
etc/filesystems
etc/firewalld
etc/fstab
etc/gcrypt
etc/GeoIP.conf
etc/GeoIP.conf.default
etc/gnupg
etc/GREP_COLORS
etc/groff
etc/group
etc/group-
etc/grub2.cfg
etc/grub.d
etc/gshadow
etc/gshadow-
etc/gss
etc/gssproxy
etc/host.conf
etc/hostname
etc/hosts
etc/hosts.allow
etc/hosts.deny
etc/idmapd.conf
etc/init.d
etc/inittab
etc/inputrc
etc/iproute2
etc/iscsi
etc/issue
etc/issue.net
etc/kdump.conf
etc/kernel
etc/krb5.conf
etc/krb5.conf.d
etc/ld.so.cache
etc/ld.so.conf
etc/ld.so.conf.d
etc/libaudit.conf
etc/libnl
etc/libuser.conf
etc/libvirt
etc/locale.conf
etc/localtime
etc/login.defs
etc/logrotate.conf
etc/logrotate.d
etc/lvm
etc/lxc
etc/machine-id
etc/magic
etc/makedumpfile.conf.sample
etc/man_db.conf
etc/mke2fs.conf
etc/modprobe.d
etc/modules-load.d
etc/motd
etc/mtab
etc/netconfig
etc/NetworkManager
etc/networks
etc/nfs.conf
etc/nfsmount.conf
etc/nsswitch.conf
etc/nsswitch.conf.bak
etc/numad.conf
etc/openldap
etc/openvpn
etc/opt
etc/os-release
etc/pam.d
etc/passwd
etc/passwd-
etc/pkcs11
etc/pki
etc/pm
etc/polkit-1
etc/popt.d
etc/ppp
etc/prelink.conf.d
etc/printcap
etc/profile
etc/profile.d
etc/protocols
etc/python
etc/qemu-ga
etc/radvd.conf
etc/rc0.d
etc/rc1.d
etc/rc2.d
etc/rc3.d
etc/rc4.d
etc/rc5.d
etc/rc6.d
etc/rc.d
etc/rc.local
etc/redhat-release
etc/request-key.conf
etc/request-key.d
etc/resolv.conf
etc/rpc
etc/rpm
etc/rsyncd.conf
etc/rsyslog.conf
etc/rsyslog.d
etc/rwtab
etc/rwtab.d
etc/sasl2
etc/screenrc
etc/securetty
etc/security
etc/selinux
etc/services
etc/sestatus.conf
etc/shadow
etc/shadow-
etc/shells
etc/skel
etc/ssh
etc/ssl
etc/statetab
etc/statetab.d
etc/subgid
etc/subuid
etc/sudo.conf
etc/sudoers
etc/sudoers.d
etc/sudo-ldap.conf
etc/sysconfig
etc/sysctl.conf
etc/sysctl.d
etc/systemd
etc/system-release
etc/system-release-cpe
etc/tcsd.conf
etc/terminfo
etc/timezone
etc/tmpfiles.d
etc/trickled.conf
etc/tuned
etc/udev
etc/unbound
etc/varnish
etc/vconsole.conf
etc/vimrc
etc/virc
etc/wpa_supplicant
etc/X11
etc/xdg
etc/xinetd.d
etc/yum
etc/yum.conf
etc/yum.repos.d
home/b2user
home/maltfield
root/anaconda-ks.cfg
root/backups
root/Finished
root/original-ks.cfg
root/Package
root/pki
root/Running
var/log
user@disp5653:~/Downloads$ 
  1. and as a true end-to-end test, I restored the sshd_config file
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ pwd
/home/user/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ date
Thu Oct  3 11:37:49 +0545 2019
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ ls
etc.20191003_052448.tar.gz
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tar -xzf etc.20191003_052448.tar.gz 
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tail etc/ssh/sshd_config

# override default of no subsystems
Subsystem	sftp	/usr/libexec/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#	X11Forwarding no
#	AllowTcpForwarding no
#	PermitTTY no
#	ForceCommand cvs server
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ 
  1. I also copied the cron job and the backup report script to the dev node
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze 
20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh
[root@opensourceecology ~]# 
  1. I tried testing the backup report script, but it complained that the `mail` command was absent. otherwise it appears to be working without modifications
[root@osedev1 backups]# ./backupReport.sh 
./backupReport.sh: line 90: /usr/bin/mail: No such file or directory
INFO: email body below
ATTENTION: BACKUPS MISSING!


WARNING: First of this month's backup (20191001) is missing!
WARNING: First of last month's backup (20190901) is missing!
WARNING: Yesterday's backup (20191002) is missing!
WARNING: The day before yesterday's backup (20191001) is missing!

See below for the contents of the backblaze b2 bucket = ose-dev-server-backups

daily_osedev1_20191003_052448.tar.gpg
---
Note: This report was generated on 20191003_060036 UTC by script '/root/backups/backupReport.sh'
	  This script was triggered by '/etc/cron.d/backup_to_backblaze'

	  For more information about OSE backups, please see the relevant documentation pages on the wiki:
	   * https://wiki.opensourceecology.org/wiki/Backblaze
	   * https://wiki.opensourceecology.org/wiki/OSE_Server#Backups

[root@osedev1 backups]# 
  1. I installed mailx and re-ran the script
[root@osedev1 backups]# yum install mailx
...
Installed:
  mailx.x86_64 0:12.5-19.el7                                                                                                   

Complete!
[root@osedev1 backups]# 
  1. this time it failed because sendmail is not installed; I *could* install postfix, but I decided just to install sendmail
[root@osedev1 backups]# ./backupReport.sh 
...
 /usr/sbin/sendmail: No such file or directory
"/root/dead.letter" 30/1215
. . . message not sent.
[root@osedev1 backups]# rpm -qa | grep postfix
[root@osedev1 backups]# rpm -qa | grep exim
[root@osedev1 backups]# yum install sendmail
...
Installed:
  sendmail.x86_64 0:8.14.7-5.el7                                                                                               

Dependency Installed:
  hesiod.x86_64 0:3.2.1-3.el7                                 procmail.x86_64 0:3.22-36.el7_4.1                                

Complete!
[root@osedev1 backups]# 
  1. this time it ran without error, but I never got an email. this is probably because gmail is rejecting it; we don't have DNS set up properly for this server to send mail. Anyway, this is good enough for our dev node's backups for now.
  2. I also added the same lifecycle rules that we have for the 'ose-server-backups' bucket to the 'ose-dev-server-backups' bucket in the backblaze b2 wui
  3. let's proceed with getting openvpn clients configured for the prod node (and its clone the staging node, which will use the same client cert)
  4. as I did on Sep 9 to create my client cert for 'maltfield', I created a new cert for 'hetzner2' Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019
  5. again, the ca and cert files are located in /usr/share/easy-rsa/3/pki/
    1. I documented this dir on the wiki OpenVPN
  6. interestingly, I could only execute these commands from the dir above the pki dir
[root@osedev1 pki]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2

Easy-RSA error:

EASYRSA_PKI does not exist (perhaps you need to run init-pki)?
Expected to find the EASYRSA_PKI at: /usr/share/easy-rsa/3/pki/pki
Run easyrsa without commands for usage and command help.

[root@osedev1 pki]#
[root@osedev1 pki]# cd ..
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
.......................................................................+++
............................................+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.7F3A32KzES'
Enter PEM pass phrase:
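The error above ("Expected to find the EASYRSA_PKI at: /usr/share/easy-rsa/3/pki/pki") is consistent with easyrsa defaulting its PKI dir to "pki" under the current working directory. The sketch below just reproduces that path arithmetic in a scratch directory standing in for /usr/share/easy-rsa/3. The commented alternative of setting the EASYRSA_PKI variable explicitly is an assumption based on the variable named in the error message, and was not tested here.

```shell
#!/bin/sh
# Scratch directories standing in for /usr/share/easy-rsa/3 and its pki dir.
base=$(mktemp -d)
mkdir -p "$base/3/pki"

# Running from inside pki/: a default of "$PWD/pki" points one level too deep.
from_pki=$(cd "$base/3/pki" && echo "$PWD/pki")
# Running from the parent dir: the default lands on the real pki dir.
from_parent=$(cd "$base/3" && echo "$PWD/pki")

echo "from pki/:   $from_pki"
echo "from parent: $from_parent"

# Assumed alternative to cd (untested): pin the location explicitly, e.g.
#   EASYRSA_PKI=/usr/share/easy-rsa/3/pki /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2
rm -rf "$base"
```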
  1. note I appended the option 'nopass' so that the hetzner2 prod server could connect to the vpn automatically using only a private key file, without requiring a password (it may be a good idea to look into whether we can whitelist a specific IP for this user, since this hetzner2 client will only connect from the prod or staging server's static ip addresses)
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa help build-client-full

  build-client-full <filename_base> [ cmd-opts ]
  build-server-full <filename_base> [ cmd-opts ]
  build-serverClient-full <filename_base> [ cmd-opts ]
	  Generate a keypair and sign locally for a client and/or server

	  This mode uses the <filename_base> as the X509 CN.

	  cmd-opts is an optional set of command options from this list:

		nopass  - do not encrypt the private key (default is encrypted)
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2 nopass

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
..................................................................................................+++
.....+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.qQ1HGf7ovg'
-----
Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf
Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'hetzner2'
Certificate is to be certified until Sep 17 06:42:28 2022 GMT (1080 days)

Write out database with 1 new entries
Data Base Updated
[root@osedev1 3]# 
  1. I copied the necessary files to the prod server
[root@osedev1 3]# cp pki/private/hetzner2.key /home/maltfield/
[root@osedev1 3]# cp pki/issued/hetzner2.crt /home/maltfield/
[root@osedev1 3]# cp pki/private/ta.key /home/maltfield/
[root@osedev1 3]# cp pki/ca.crt /home/maltfield/
[root@osedev1 3]# chown maltfield /home/maltfield/*.cert
[root@osedev1 3]# chown maltfield /home/maltfield/*.key
[root@osedev1 3]# logout
[maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/hetzner2* opensourceecology.org:
hetzner2.crt                                                                                 100% 5675     2.8MB/s   00:00    
hetzner2.key                                                                                 100% 1708     1.0MB/s   00:00    
[maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/*.key opensourceecology.org:
hetzner2.key                                                                                 100% 1708     1.0MB/s   00:00    
ta.key                                                                                       100%  636   368.9KB/s   00:00    
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key
[maltfield@osedev1 ~]$ shred -u /home/maltfield/hetzner2.*
[maltfield@osedev1 ~]$ 
  1. and I moved them to '/root/openvpn' and locked-down the files on the prod hetzner2 server
[root@opensourceecology maltfield]# cd /root
[root@opensourceecology ~]# ls
backups  bin  iptables  output.json  rsyncTest  sandbox  staging.opensourceecology.org  tmp
[root@opensourceecology ~]# mkdir openvpn
[root@opensourceecology ~]# cd openvpn
[root@opensourceecology openvpn]# mv /home/maltfield/hetzner2* .
[root@opensourceecology openvpn]# mv /home/maltfield/*.key .
[root@opensourceecology openvpn]# mv /home/maltfield/ca.crt .
[root@opensourceecology openvpn]# ls -lah
total 28K
drwxr-xr-x   2 root      root      4.0K Oct  3 06:53 .
dr-xr-x---. 20 root      root      4.0K Oct  3 06:53 ..
-rw-------   1 maltfield maltfield 3.3K Oct  3 06:51 ca.crt
-rw-------   1 maltfield maltfield 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 maltfield maltfield 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 maltfield maltfield  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# chown root:root *
[root@opensourceecology openvpn]# ls -lah
total 28K
drwxr-xr-x   2 root root 4.0K Oct  3 06:53 .
dr-xr-x---. 20 root root 4.0K Oct  3 06:53 ..
-rw-------   1 root root 3.3K Oct  3 06:51 ca.crt
-rw-------   1 root root 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 root root 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 root root  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# chmod 0700 .
[root@opensourceecology openvpn]# ls -lah
total 28K
drwx------   2 root root 4.0K Oct  3 06:53 .
dr-xr-x---. 20 root root 4.0K Oct  3 06:53 ..
-rw-------   1 root root 3.3K Oct  3 06:51 ca.crt
-rw-------   1 root root 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 root root 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 root root  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# 
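  1. a minimal sketch of the lock-down step above (illustrative only; not run in this session). The helper names `lock_down_dir` and `show_modes` are hypothetical, GNU `stat` is assumed, and the `chown root:root` step is left out because it needs root

```shell
#!/usr/bin/env bash
# Sketch: restrict a directory of VPN credentials to its owner.
# lock_down_dir is a hypothetical helper, not part of the original session.
lock_down_dir() {
  local dir="$1"
  # directory: owner-only access (rwx------)
  chmod 0700 "$dir" || return 1
  # regular files directly inside it: owner read/write only (rw-------)
  find "$dir" -maxdepth 1 -type f -exec chmod 0600 {} +
}

# Print "<octal mode> <name>" for the directory and its files (GNU stat).
show_modes() {
  stat -c '%a %n' "$1" "$1"/*
}
```

Calling `lock_down_dir /root/openvpn` followed by `show_modes /root/openvpn` would be one way to confirm the modes in a single pass instead of reading `ls -lah` by eye.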
  1. then I created a client.conf file from my personal client.conf file & modified it to use the new cert & key files
[root@opensourceecology openvpn]# vim client.conf
[root@opensourceecology openvpn]# ls -lah client.conf 
-rw-r--r-- 1 root root 3.6K Oct  3 06:56 client.conf
[root@opensourceecology openvpn]# chmod 0600 client.conf 
[root@opensourceecology openvpn]# cat client.conf 
##############################################
# Sample client-side OpenVPN 2.0 config file #
# for connecting to multi-client server.     #
#                                            #
# This configuration can be used by multiple #
# clients, however each client should have   #
# its own cert and key files.                #
#                                            #
# On Windows, you might want to rename this  #
# file so it has a .ovpn extension           #
##############################################

# Specify that we are a client and that we
# will be pulling certain config file directives
# from the server.
client

# Use the same setting as you are using on
# the server.
# On most systems, the VPN will not function
# unless you partially or fully disable
# the firewall for the TUN/TAP interface.
;dev tap
dev tun

# Windows needs the TAP-Win32 adapter name
# from the Network Connections panel
# if you have more than one.  On XP SP2,
# you may need to disable the firewall
# for the TAP adapter.
;dev-node MyTap

# Are we connecting to a TCP or
# UDP server?  Use the same setting as
# on the server.
;proto tcp
proto udp

# The hostname/IP and port of the server.
# You can have multiple remote entries
# to load balance between the servers.
remote 195.201.233.113 1194
;remote my-server-2 1194

# Choose a random host from the remote
# list for load-balancing.  Otherwise
# try hosts in the order specified.
;remote-random

# Keep trying indefinitely to resolve the
# host name of the OpenVPN server.  Very useful
# on machines which are not permanently connected
# to the internet such as laptops.
resolv-retry infinite

# Most clients don't need to bind to
# a specific local port number.
nobind

# Downgrade privileges after initialization (non-Windows only)
;user nobody
;group nobody

# Try to preserve some state across restarts.
persist-key
persist-tun

# If you are connecting through an
# HTTP proxy to reach the actual OpenVPN
# server, put the proxy server/IP and
# port number here.  See the man page
# if your proxy server requires
# authentication.
;http-proxy-retry # retry on connection failures
;http-proxy [proxy server] [proxy port #]

# Wireless networks often produce a lot
# of duplicate packets.  Set this flag
# to silence duplicate packet warnings.
;mute-replay-warnings

# SSL/TLS parms.
# See the server config file for more
# description.  It's best to use
# a separate .crt/.key file pair
# for each client.  A single ca
# file can be used for all clients.
ca ca.crt
cert hetzner2.crt
key hetzner2.key

# Verify server certificate by checking that the
# certicate has the correct key usage set.
# This is an important precaution to protect against
# a potential attack discussed here:
#  http://openvpn.net/howto.html#mitm
#
# To use this feature, you will need to generate
# your server certificates with the keyUsage set to
#   digitalSignature, keyEncipherment
# and the extendedKeyUsage to
#   serverAuth
# EasyRSA can do this for you.
remote-cert-tls server

# If a tls-auth key is used on the server
# then every client must also have the key.
tls-auth ta.key 1

# Select a cryptographic cipher.
# If the cipher option is used on the server
# then you must also specify it here.
# Note that v2.4 client/server will automatically
# negotiate AES-256-GCM in TLS mode.
# See also the ncp-cipher option in the manpage
cipher AES-256-GCM

# Enable compression on the VPN link.
# Don't enable this unless it is also
# enabled in the server config file.
#comp-lzo

# Set log file verbosity.
verb 3

# Silence repeating messages
;mute 20

# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384
[root@opensourceecology openvpn]# 
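  1. a minimal sketch (illustrative only; not run in this session) for checking that every file named by the `ca`, `cert`, `key`, and `tls-auth` directives in a client.conf exists before starting openvpn. `check_conf_files` is a hypothetical helper; it assumes relative paths are resolved against the config's own directory and that the paths contain no whitespace

```shell
#!/usr/bin/env bash
# Sketch: report files referenced by an OpenVPN client config that are missing.
# check_conf_files is a hypothetical helper name.
check_conf_files() {
  local conf="$1" dir missing=0 path
  dir="$(dirname "$conf")"
  # Only active directives match: commented lines start with '#' or ';',
  # so their first field is never exactly ca/cert/key/tls-auth.
  for path in $(awk '$1 ~ /^(ca|cert|key|tls-auth)$/ { print $2 }' "$conf"); do
    case "$path" in
      /*) ;;                      # absolute path: use as-is
      *)  path="$dir/$path" ;;    # relative path: resolve next to the config
    esac
    if [ ! -r "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

For the config above, `check_conf_files /root/openvpn/client.conf` would look for ca.crt, hetzner2.crt, hetzner2.key, and ta.key in /root/openvpn.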
  1. I installed the 'openvpn' package on the production hetzner2 server
[root@opensourceecology openvpn]# yum install openvpn
...
Installed:
  openvpn.x86_64 0:2.4.7-1.el7                                                                           

Dependency Installed:
  lz4.x86_64 0:1.7.5-3.el7                       pkcs11-helper.x86_64 0:1.11-3.el7                      

Complete!
[root@opensourceecology openvpn]# 
  1. I was successfully able to connect to the vpn on the dev node from the prod node
[root@opensourceecology openvpn]# openvpn client.conf 
Thu Oct  3 07:06:45 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 07:06:45 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 07:06:45 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:06:45 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:06:45 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:45 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 07:06:45 2019 UDP link local: (not bound)
Thu Oct  3 07:06:45 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:45 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=865b6fa1 7dcf4731
Thu Oct  3 07:06:45 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 07:06:45 2019 VERIFY KU OK
Thu Oct  3 07:06:45 2019 Validating certificate extended key usage
Thu Oct  3 07:06:45 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 07:06:45 2019 VERIFY EKU OK
Thu Oct  3 07:06:45 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 07:06:45 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 07:06:45 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:46 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 07:06:46 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 07:06:46 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:06:46 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:06:46 2019 ROUTE_GATEWAY 138.201.84.193
Thu Oct  3 07:06:46 2019 TUN/TAP device tun0 opened
Thu Oct  3 07:06:46 2019 TUN/TAP TX queue length set to 100
Thu Oct  3 07:06:46 2019 /sbin/ip link set dev tun0 up mtu 1500
Thu Oct  3 07:06:46 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Thu Oct  3 07:06:46 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9
Thu Oct  3 07:06:46 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Thu Oct  3 07:06:46 2019 Initialization Sequence Completed
  1. the prod server now has a tun0 interface with an ip address of 10.241.189.10 on the VPN private network subnet
[root@opensourceecology ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
	link/ether 90:1b:0e:94:07:c4 brd ff:ff:ff:ff:ff:ff
	inet 138.201.84.223 peer 138.201.84.193/32 brd 138.201.84.223 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.223/32 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.243/16 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.243 peer 138.201.84.193/32 brd 138.201.255.255 scope global secondary eth0
	   valid_lft forever preferred_lft forever
	inet6 2a01:4f8:172:209e::2/64 scope global 
	   valid_lft forever preferred_lft forever
	inet6 fe80::921b:eff:fe94:7c4/64 scope link 
	   valid_lft forever preferred_lft forever
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology ~]# 
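  1. a minimal sketch (illustrative only; not run in this session) for pulling the local and peer VPN addresses out of `ip address show` output, which makes it easier to compare clients side by side. `tun_addrs` is a hypothetical helper that reads the output on stdin

```shell
#!/usr/bin/env bash
# Sketch: extract local and peer IPv4 addresses from `ip address show` output
# read on stdin. tun_addrs is a hypothetical helper name.
tun_addrs() {
  awk '$1 == "inet" && $3 == "peer" {
         sub(/\/.*/, "", $4)          # drop the /32 prefix length from the peer
         print "local=" $2 " peer=" $4
       }'
}

# Intended use on a live host: ip address show dev tun0 | tun_addrs
```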
  1. I confirmed that the website didn't break ☺
  2. now I created the same dir on the staging node (note this weird systemd journal corruption error that slowed things down quite a bit)
[root@osedev1 ~]# lxc-start -n osestaging1
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to CentOS Linux 7 (Core)!
...
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

osestaging1 login: maltfield
Password:
Last login: Wed Oct  2 13:01:56 on lxc/console
[maltfield@osestaging1 ~]$ sudo su -
[sudo] password for maltfield:

<44>systemd-journald[297]: File /run/log/journal/dd9978e8797e4112832634fa4d174c7b/system.journal corrupted or uncleanly shut down, renaming and replacing.
Last login: Wed Oct  2 13:15:46 UTC 2019 on lxc/console
Last failed login: Thu Oct  3 07:11:57 UTC 2019 on lxc/console
There was 1 failed login attempt since the last successful login.
[root@osestaging1 ~]# 
  1. on the dev node again
[root@osedev1 pki]# cp private/hetzner2.key /home/maltfield/
[root@osedev1 pki]# cp issued/hetzner2.crt /home/maltfield/
[root@osedev1 pki]# cp private/ta.key /home/maltfield/
[root@osedev1 pki]# chown maltfield /home/maltfield/*.key
[root@osedev1 pki]# chown maltfield /home/maltfield/*.crt
[root@osedev1 pki]# logout
[maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.key 192.168.122.201:
hetzner2.key                                                                       100% 1708     2.4MB/s   00:00    
ta.key                                                                             100%  636     1.2MB/s   00:00    
[maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.crt 192.168.122.201:
ca.crt                                                                             100% 1850     2.6MB/s   00:00    
hetzner2.crt                                                                       100% 5675     9.0MB/s   00:00    
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.crt
[maltfield@osedev1 ~]$ 
  1. and back on the staging container node
[root@osestaging1 ~]# cd /root/openvpn 
[root@osestaging1 openvpn]# ls 
[root@osestaging1 openvpn]# mv /home/maltfield/*.crt .
[root@osestaging1 openvpn]# mv /home/maltfield/*.key .
[root@osestaging1 openvpn]# ls -lah
total 28K
drwxr-xr-x. 2 root      root      4.0K Oct  3 07:23 .
dr-xr-x---. 3 root      root      4.0K Oct  3 07:18 ..
-rw-------. 1 maltfield maltfield 1.9K Oct  3 07:21 ca.crt
-rw-------. 1 maltfield maltfield 5.6K Oct  3 07:21 hetzner2.crt
-rw-------. 1 maltfield maltfield 1.7K Oct  3 07:21 hetzner2.key
-rw-------. 1 maltfield maltfield  636 Oct  3 07:21 ta.key
[root@osestaging1 openvpn]# chown root:root *
[root@osestaging1 openvpn]# chmod 0700 .
[root@osestaging1 openvpn]# ls -lah
total 28K
drwx------. 2 root root 4.0K Oct  3 07:23 .
dr-xr-x---. 3 root root 4.0K Oct  3 07:18 ..
-rw-------. 1 root root 1.9K Oct  3 07:21 ca.crt
-rw-------. 1 root root 5.6K Oct  3 07:21 hetzner2.crt
-rw-------. 1 root root 1.7K Oct  3 07:21 hetzner2.key
-rw-------. 1 root root  636 Oct  3 07:21 ta.key
[root@osestaging1 openvpn]# 
  1. I also installed vim, epel-release, and openvpn on the staging node
  2. I had an issue connecting to the vpn from within the staging node; this appears to be a known issue when trying to connect to a vpn from within a docker or lxc container https://serverfault.com/questions/429461/no-tun-device-in-lxc-guest-for-openvpn
[root@osestaging1 openvpn]# openvpn client.conf 
Thu Oct  3 07:29:17 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 07:29:17 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 07:29:17 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:29:17 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:29:17 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:17 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 07:29:17 2019 UDP link local: (not bound)
Thu Oct  3 07:29:17 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:17 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=f2e8fcad efdb9311
Thu Oct  3 07:29:17 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 07:29:17 2019 VERIFY KU OK
Thu Oct  3 07:29:17 2019 Validating certificate extended key usage
Thu Oct  3 07:29:17 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 07:29:17 2019 VERIFY EKU OK
Thu Oct  3 07:29:17 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 07:29:17 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 07:29:17 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:18 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 07:29:18 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 07:29:18 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:29:18 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:29:18 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 07:29:18 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Thu Oct  3 07:29:18 2019 Exiting due to fatal error
[root@osestaging1 openvpn]#
  1. the above link suggests following the arch linux guide to create an openvpn client systemd unit within the container
[root@osestaging1 openvpn]# ls /usr/lib/systemd/system/openvpn-client\@.service
/usr/lib/systemd/system/openvpn-client@.service
[root@osestaging1 openvpn]# ls /etc/systemd/system/
basic.target.wants  default.target.wants  local-fs.target.wants    sysinit.target.wants
default.target      getty.target.wants    multi-user.target.wants  system-update.target.wants
[root@osestaging1 openvpn]# cp /usr/lib/systemd/system/openvpn-client\@.service /etc/systemd/system/
[root@osestaging1 openvpn]# grep /etc/systemd/system/openvpn-client\@.service LimitNPROC
grep: LimitNPROC: No such file or directory
[root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service
LimitNPROC=10
[root@osestaging1 openvpn]# vim /etc/systemd/system/openvpn-client\@.service
[root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service
#LimitNPROC=10
[root@osestaging1 openvpn]# 
  1. that didn't work; it wants something after the '@'. I did that, and realized that I'll need to further modify it with the correct config file
[root@osestaging1 openvpn]# cd /etc/systemd/system
[root@osestaging1 system]# ls
basic.target.wants    getty.target.wants       openvpn-client@.service
default.target        local-fs.target.wants    sysinit.target.wants
default.target.wants  multi-user.target.wants  system-update.target.wants
[root@osestaging1 system]# mv openvpn-client\@.service openvpn-client\@dev.service 
[root@osestaging1 system]# systemctl status openvpn-client\@dev.service 
● openvpn-client@dev.service - OpenVPN tunnel for dev
   Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
[root@osestaging1 system]# systemctl start openvpn-client\@dev.service 
Job for openvpn-client@dev.service failed because the control process exited with error code. See "systemctl status openvpn-client@dev.service" and "journalctl -xe" for details.
[root@osestaging1 system]# systemctl status openvpn-client\@dev.service 
● openvpn-client@dev.service - OpenVPN tunnel for dev
   Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 07:44:09 UTC; 16s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 557 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf (code=exited, status=1/FAILURE)
 Main PID: 557 (code=exited, status=1/FAILURE)

Oct 03 07:44:08 osestaging1 systemd[1]: Starting OpenVPN tunnel for dev...
Oct 03 07:44:09 osestaging1 openvpn[557]: Options error: In [CMD-LINE]:1: Error opening configuration file: dev.conf
Oct 03 07:44:09 osestaging1 openvpn[557]: Use --help for more information.
Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service: main process exited, code=exited, status=...ILURE
Oct 03 07:44:09 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for dev.
Oct 03 07:44:09 osestaging1 systemd[1]: Unit openvpn-client@dev.service entered failed state.
Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@osestaging1 system]# vim openvpn-client\@dev.service 
  1. I updated the working dir and changed the service name to match the name of the config file in there
[root@osestaging1 system]# cat openvpn-client\@dev.service
[Unit]
Description=OpenVPN tunnel for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
WorkingDirectory=/etc/openvpn/client
ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE
#LimitNPROC=10
DeviceAllow=/dev/null rw   
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process

[Install]
WantedBy=multi-user.target 
[root@osestaging1 system]# vim openvpn-client\@dev.service
[root@osestaging1 system]# mv openvpn-client\@dev.service openvpn-client\@client.service 
[root@osestaging1 system]# cat openvpn-client\@client.service 
[Unit]
Description=OpenVPN tunnel for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
#WorkingDirectory=/etc/openvpn/client
WorkingDirectory=/root/openvpn
ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE
#LimitNPROC=10
DeviceAllow=/dev/null rw
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process

[Install]
WantedBy=multi-user.target
[root@osestaging1 system]# 
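  1. a minimal sketch (illustrative only; not run in this session) of the same change expressed as a systemd drop-in rather than an edited copy of the packaged unit. `write_dropin` and its root-prefix argument are hypothetical. The `ProtectHome=false` line rests on an assumption: per the systemd.exec documentation, `ProtectHome=true` makes /root inaccessible to the service, which could conflict with `WorkingDirectory=/root/openvpn`; whether that caused any failure here was not verified

```shell
#!/usr/bin/env bash
# Sketch: write a drop-in override for the openvpn-client@client instance
# instead of editing a copy of the packaged unit.
# write_dropin and its root-prefix argument are hypothetical, for illustration.
write_dropin() {
  local root="${1:-}"        # optional prefix so the sketch can target a scratch dir
  local dir="$root/etc/systemd/system/openvpn-client@client.service.d"
  mkdir -p "$dir" || return 1
  cat > "$dir/override.conf" <<'EOF'
[Service]
# config and keys live in /root/openvpn in this log
WorkingDirectory=/root/openvpn
# assumption: ProtectHome=true would hide /root from the service
ProtectHome=false
# lift the packaged LimitNPROC=10 limit
LimitNPROC=infinity
EOF
  echo "$dir/override.conf"
}

# On a real host, follow with: systemctl daemon-reload
```

Moving the config and keys under /etc/openvpn/client would be the other obvious option, since that is the working directory the packaged unit already expects.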
  1. this failed; I gave up and went with manually creating the tun interface per the guide, even though someone else commented that this would no longer work; it worked!
[root@osestaging1 openvpn]# openvpn client.conf
Thu Oct  3 08:02:50 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 08:02:50 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 08:02:50 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:02:50 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:02:50 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:50 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:02:50 2019 UDP link local: (not bound)
Thu Oct  3 08:02:50 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:50 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=10846fe0 74bf0345
Thu Oct  3 08:02:50 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:02:50 2019 VERIFY KU OK
Thu Oct  3 08:02:50 2019 Validating certificate extended key usage
Thu Oct  3 08:02:50 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:02:50 2019 VERIFY EKU OK
Thu Oct  3 08:02:50 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:02:50 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:02:50 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:51 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:02:51 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 08:02:51 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:02:51 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:02:51 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 08:02:51 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Thu Oct  3 08:02:51 2019 Exiting due to fatal error
[root@osestaging1 openvpn]# mkdir /dev/net
[root@osestaging1 openvpn]# mknod /dev/net/tun c 10 200
[root@osestaging1 openvpn]# chmod 666 /dev/net/tun
[root@osestaging1 openvpn]# openvpn client.conf
Thu Oct  3 08:03:42 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 08:03:42 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 08:03:42 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:03:42 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:03:42 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:42 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:03:42 2019 UDP link local: (not bound)
Thu Oct  3 08:03:42 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:42 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=dcadaef9 7ebea8f1
Thu Oct  3 08:03:42 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:03:42 2019 VERIFY KU OK
Thu Oct  3 08:03:42 2019 Validating certificate extended key usage
Thu Oct  3 08:03:42 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:03:42 2019 VERIFY EKU OK
Thu Oct  3 08:03:42 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:03:42 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:03:42 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:43 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:48 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:59 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:04 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:09 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:15 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:20 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:25 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:30 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:35 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:41 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:46 2019 No reply from server after sending 12 push requests
Thu Oct  3 08:04:46 2019 SIGUSR1[soft,no-push-reply] received, process restarting
Thu Oct  3 08:04:46 2019 Restart pause, 5 second(s)
Thu Oct  3 08:04:51 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:51 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:04:51 2019 UDP link local: (not bound)
Thu Oct  3 08:04:51 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:51 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=c3f6bcfa 04f701bb
Thu Oct  3 08:04:51 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:04:51 2019 VERIFY KU OK
Thu Oct  3 08:04:51 2019 Validating certificate extended key usage
Thu Oct  3 08:04:51 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:04:51 2019 VERIFY EKU OK
Thu Oct  3 08:04:51 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:04:51 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:04:51 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:53 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 1,cipher AES-256-GCM'
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 08:04:53 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:04:53 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:04:53 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 08:04:53 2019 TUN/TAP device tun0 opened
Thu Oct  3 08:04:53 2019 TUN/TAP TX queue length set to 100
Thu Oct  3 08:04:53 2019 /sbin/ip link set dev tun0 up mtu 1500
Thu Oct  3 08:04:53 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Thu Oct  3 08:04:53 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9
Thu Oct  3 08:04:53 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Thu Oct  3 08:04:53 2019 Initialization Sequence Completed
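  1. a minimal sketch (illustrative only; not run in this session) of the manual tun device fix as a repeatable check. `tun_status` and `ensure_tun` are hypothetical helpers; `ensure_tun` needs root and uses the same major/minor numbers as the `mknod` command above. A node created this way may not survive a container restart, so it may need re-running at boot

```shell
#!/usr/bin/env bash
# Sketch: report whether a path is a character device, as /dev/net/tun must be.
# tun_status is a hypothetical helper; the path argument exists only so the
# check can be exercised without touching /dev/net/tun.
tun_status() {
  local dev="${1:-/dev/net/tun}"
  if [ -c "$dev" ]; then
    echo "present"
  else
    echo "missing"
    return 1
  fi
}

# Create the node only when it is absent (needs root).
ensure_tun() {
  tun_status /dev/net/tun >/dev/null && return 0
  mkdir -p /dev/net &&
    mknod /dev/net/tun c 10 200 &&
    chmod 666 /dev/net/tun
}
```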
  1. I found that I'd become stuck in an lxc console since its escape keyboard sequence uses the same keystroke as screen (ctrl-a). The solution is to define an alternate escape sequence (ie: ctrl-e) using `-e'^e'` https://serverfault.com/questions/567696/byobu-how-to-disconnect-from-lxc-console
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1

Connected to tty 1
				  Type <Ctrl+e q> to exit the console, <Ctrl+e Ctrl+e> to enter Ctrl+e itself

[root@osedev1 ~]# 
  1. I also had to change the tty to 0 to actually get access
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1 -t 0
lxc_container: commands.c: lxc_cmd_console: 724 Console 0 invalid, busy or all consoles busy.
[root@osedev1 ~]# 
[root@osedev1 ~]# 
  1. I went ahead and connected to the vpn from 3x clients: my laptop, the staging container, and the prod server
  2. oddly, I noticed that the ip addresses given to the staging server and the prod server were the same (they do use the same client cert, but I expected them to have distinct ip addresses)
user@ose:~/openvpn$ ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.6 peer 10.241.189.5/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::2ab6:3617:63cc:c654/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
user@ose:~/openvpn$ 
[root@opensourceecology openvpn]# ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology openvpn]# 
[root@osestaging1 ~]# ip address show dev tun0
2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::5e8c:3af2:2e6:4aea/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 ~]# 
  1. I noticed a few relevant options to our openvpn server config
    1. by default, I have 'ifconfig-pool-persist ipp.txt' defined, which makes clients have the same ip address persistently across the server's reboots; we appear to be using '/etc/openvpn/ipp.txt' here. The one in the 'server' dir appears to be from earlier, probably when I started the server manually rather than through systemd. At first glance the entries don't match what the clients got: from above, we see that my 'maltfield' user has '.6' while the 'hetzner2' clients have '.10', but the file lists '.4' and '.8'. That may just be the net30 topology: if ipp.txt records the base address of each client's /30 block, then '.4' and '.8' are consistent with '.6' and '.10'. Hmm.
[root@osedev1 server]# grep -iB5 ipp server.conf
# Maintain a record of client <-> virtual IP address
# associations in this file.  If OpenVPN goes down or
# is restarted, reconnecting clients can be assigned
# the same virtual IP address from the pool that was
# previously assigned.
ifconfig-pool-persist ipp.txt
[root@osedev1 server]# find /etc/openvpn | grep -i ipp.txt
/etc/openvpn/server/ipp.txt
/etc/openvpn/ipp.txt
[root@osedev1 server]# cat /etc/openvpn/server/ipp.txt 
maltfield,10.241.189.4
[root@osedev1 server]# cat /etc/openvpn/ipp.txt 
maltfield,10.241.189.4
hetzner2,10.241.189.8
    1. there's also an option that I have commented-out whose comments say it should be uncommented if multiple clients will share the same cert
[root@osedev1 server]# grep -iB5 duplicate server.conf
#
# IF YOU HAVE NOT GENERATED INDIVIDUAL
# CERTIFICATE/KEY PAIRS FOR EACH CLIENT,
# EACH HAVING ITS OWN UNIQUE "COMMON NAME",
# UNCOMMENT THIS LINE OUT.
;duplicate-cn
[root@osedev1 server]# 
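One possible explanation for the ipp.txt mismatch, offered as an assumption rather than something this log confirms: in OpenVPN's net30 topology each client gets its own /30 block, ipp.txt records the base address of that block, and the client itself is assigned base+2. The observed values fit that pattern (.4 recorded, .6 assigned; .8 recorded, .10 assigned). A minimal sketch of that arithmetic, with a hypothetical function name:

```shell
# Given an ipp.txt line of the form "name,a.b.c.d", print "name <client address>".
# ASSUMPTION: net30 topology, where ipp.txt holds the base of the client's /30
# block and the client is assigned base+2.
ipp_client_ip() {
    printf '%s\n' "$1" |
        awk -F, '{ split($2, o, "."); printf "%s %s.%s.%s.%d\n", $1, o[1], o[2], o[3], o[4] + 2 }'
}

# Usage (not run here):
#   while IFS= read -r line; do ipp_client_ip "$line"; done < /etc/openvpn/ipp.txt
```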
  1. I uncommented the above 'duplicate-cn' line and restarted openvpn on the dev node
[root@osedev1 server]# vim server.conf
[root@osedev1 server]# systemctl restart openvpn@server.service
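Since directives like 'duplicate-cn' get toggled by hand in vim, it can help to confirm what is actually active before restarting. A minimal sketch, assuming the usual OpenVPN convention that lines starting with '#' or ';' are comments; 'directive_enabled' is a hypothetical helper name.

```shell
# Return 0 if DIRECTIVE appears as an active (uncommented) line in CONF.
# Commented lines start with '#' or ';', so they never match the anchored pattern.
directive_enabled() {
    conf=$1
    directive=$2
    grep -Eq "^[[:space:]]*${directive}([[:space:]]|\$)" "$conf"
}

# Usage (not run here):
#   directive_enabled /etc/openvpn/server.conf duplicate-cn && echo "duplicate-cn is on"
```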
  1. I reconnected to the vpn from the staging & prod servers; they got new IP addresses
[root@opensourceecology openvpn]# ip address show dev tun0
5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.14 peer 10.241.189.13/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::e5fb:f261:801b:1c3d/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology openvpn]# 
[root@osestaging1 openvpn]# ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.18 peer 10.241.189.17/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::27f3:9643:5530:bd0e/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 openvpn]# 
  1. I confirmed that each client could ping itself, but not the others, so I uncommented the 'client-to-client' line and restarted the openvpn server again
  2. after that, I confirmed that staging could ping prod, prod could ping staging, and my laptop could ping both staging & prod. Cool!
    1. for some reason the servers could still not ping my laptop; maybe that's some complication in my like quad-NAT'd QubesOS networking stack flowing through two nested VPN connections. Anyway, that shouldn't be required *shrug*
  3. and, holy shit, I was successfully able to ssh into the staging node from the production node through the private VPN IP
[maltfield@opensourceecology ~]$ ssh -p 32415 10.241.189.18
The authenticity of host '[10.241.189.18]:32415 ([10.241.189.18]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.18]:32415' (ECDSA) to the list of known hosts.
Last login: Thu Oct  3 08:56:23 2019 from gateway
[maltfield@osestaging1 ~]$ 
  1. but I was unable to ssh into our staging node from my laptop. oddly, it *is* able to establish a connection, but it gets stuck at some handshake step
user@ose:~/openvpn$ ssh -vvvvvvp 32415 maltfield@10.241.189.18
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t  10 Sep 2019
debug1: Reading configuration data /home/user/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "10.241.189.18" port 32415
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 10.241.189.18 [10.241.189.18] port 32415.
debug1: Connection established.
...
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Connection closed by 10.241.189.18 port 32415
user@ose:~/openvpn$ 
  1. ok, I fixed this issue by removing the second VPN (qubes was configured to use a vpn qube as its NetVM; changing this to 'sys-firewall' fixed this issue)
user@ose:~/openvpn$ ssh -p 32415 maltfield@10.241.189.18
Last login: Thu Oct  3 09:20:50 2019 from 10.241.189.6
[maltfield@osestaging1 ~]$ 
  1. on second thought, I really should have static ip addresses unique for both the prod & staging nodes. to achieve this, I can't share the same cert; I'll just make '/root/openvpn' one of those dirs (like networking config dirs) that is not changed by the rsync
  2. I commented-out the 'duplicate-cn' line again in the openvpn server config & restarted the openvpn server
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# grep -B5 duplicate-cn server.conf 
#
# IF YOU HAVE NOT GENERATED INDIVIDUAL
# CERTIFICATE/KEY PAIRS FOR EACH CLIENT,
# EACH HAVING ITS OWN UNIQUE "COMMON NAME",
# UNCOMMENT THIS LINE OUT.
;duplicate-cn
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
  1. and I created a distinct cert for 'osestaging1'
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full osestaging1 nopass

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
....+++
...........................+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/osestaging1.key.WsJhUsDCny'
-----
Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf
Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'osestaging1'
Certificate is to be certified until Sep 17 10:34:03 2022 GMT (1080 days)

Write out database with 1 new entries
Data Base Updated
[root@osedev1 3]# cp pki/private/osestaging1.key /home/maltfield/
[root@osedev1 3]# cp pki/private/ta.key /home/maltfield/
[root@osedev1 3]# cp pki/issued/osestaging1.crt /home/maltfield/
[root@osedev1 3]# cp pki/ca.crt /home/maltfield/
[root@osedev1 3]# chown maltfield /home/maltfield/*.key
[root@osedev1 3]# chown maltfield /home/maltfield/*.crt
[root@osedev1 3]# logout
  1. and on the staging server
[root@osestaging1 ~]# cd /root/openvpn/
[root@osestaging1 openvpn]# mv /home/maltfield/*.key .
mv: overwrite './ta.key'? y
[root@osestaging1 openvpn]# mv /home/maltfield/*.crt .
mv: overwrite './ca.crt'? y
[root@osestaging1 openvpn]# ls
ca.crt       hetzner2.crt  osestaging1.crt  ta.key
client.conf  hetzner2.key  osestaging1.key
[root@osestaging1 openvpn]# shred -u hetzner2.*
[root@osestaging1 openvpn]# ls -lah
total 32K
drwx------. 2 root      root      4.0K Oct  3 10:40 .
dr-xr-x---. 4 root      root      4.0K Oct  3 07:59 ..
-rw-------. 1 maltfield maltfield 1.9K Oct  3 10:36 ca.crt
-rw-r--r--. 1 root      root      3.6K Oct  3 07:27 client.conf
-rw-------. 1 maltfield maltfield 5.6K Oct  3 10:36 osestaging1.crt
-rw-------. 1 maltfield maltfield 1.7K Oct  3 10:36 osestaging1.key
-rw-------. 1 maltfield maltfield  636 Oct  3 10:36 ta.key
[root@osestaging1 openvpn]# chown root:root *.crt
[root@osestaging1 openvpn]# chown root:root *.key
[root@osestaging1 openvpn]# chmod 0600 client.conf 
[root@osestaging1 openvpn]# ls -lah
total 32K
drwx------. 2 root root 4.0K Oct  3 10:40 .
dr-xr-x---. 4 root root 4.0K Oct  3 07:59 ..
-rw-------. 1 root root 1.9K Oct  3 10:36 ca.crt
-rw-------. 1 root root 3.6K Oct  3 07:27 client.conf
-rw-------. 1 root root 5.6K Oct  3 10:36 osestaging1.crt
-rw-------. 1 root root 1.7K Oct  3 10:36 osestaging1.key
-rw-------. 1 root root  636 Oct  3 10:36 ta.key
[root@osestaging1 openvpn]# vim client.conf
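The ownership and mode fixes above were checked by reading 'ls -lah' output. A minimal sketch of an automated check that lists any regular file in a directory with group or other permission bits set; 'loose_perms' is a hypothetical name.

```shell
# Print every regular file under DIR that grants any permission to group or other.
# An empty result means all files are owner-only (for example 0600).
loose_perms() {
    find "$1" -type f \( -perm -040 -o -perm -020 -o -perm -010 \
        -o -perm -004 -o -perm -002 -o -perm -001 \) -print
}

# Usage (not run here):
#   loose_perms /root/openvpn
```

Against the first 'ls -lah' listing above this would have flagged only client.conf, which was still 0644 at that point.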
  1. I decided to make the following static IPs
    1. 10.241.189.10 hetzner2 (prod)
    2. 10.241.189.11 osestaging1
  2. I did this by uncommenting the line 'client-config-dir ccd', creating a client-specific config file in the '/etc/openvpn/ccd/' dir whose name matches the CN (Common Name) on the client cert, and restarting the openvpn server service
[root@osedev1 openvpn]# vim server.conf
[root@osedev1 openvpn]# grep -Ei '^client-config-dir ccd' server.conf
client-config-dir ccd
[root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.11 255.255.255.255" > ccd/osestaging1
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# 
  1. I did the same for prod
[root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.10 255.255.255.255" > ccd/hetzner2
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# 
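The two 'echo ... > ccd/<CN>' commands above follow the same pattern, so they can be wrapped in one helper. A minimal sketch; 'write_ccd' is a hypothetical name, and the second 'ifconfig-push' argument is copied from the log as-is rather than being a recommendation.

```shell
# Write a client-config-dir entry that pins the client with Common Name CN
# to a fixed VPN address. The file name must match the CN on the client cert.
write_ccd() {
    ccd_dir=$1
    cn=$2
    ip=$3
    mkdir -p "$ccd_dir" || return 1
    # Second argument (255.255.255.255) mirrors the commands in the log above.
    printf 'ifconfig-push %s 255.255.255.255\n' "$ip" > "$ccd_dir/$cn"
}

# Usage (not run here), followed by a restart of openvpn@server.service:
#   write_ccd /etc/openvpn/ccd osestaging1 10.241.189.11
#   write_ccd /etc/openvpn/ccd hetzner2    10.241.189.10
```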
  1. now that it's static, I can update my ssh config to make connecting to the staging node easy after connecting to the vpn from my laptop
user@ose:~/openvpn$ vim ~/.ssh/config
user@ose:~/openvpn$ head -n21 ~/.ssh/config
# OSE
Host openbuildinginstitute.org *.openbuildinginstitute.org opensourceecology.org *.opensourceecology.org
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osedev1
	HostName 195.201.233.113
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osestaging1
	HostName 10.241.189.11
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

user@ose:~/openvpn$ ssh osestaging1
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
Last login: Thu Oct  3 10:42:40 2019 from 10.241.189.10
[maltfield@osestaging1 ~]$ 
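The ssh config stanzas above differ only in name and address, so a new VPN host can be added from a small generator. A minimal sketch with a hypothetical function name; 'ForwardAgent yes' from the stanzas above is left out here and can be added by hand if it is wanted.

```shell
# Print an ssh_config Host stanza for a host reachable over the VPN.
# Arguments: alias, address, port, user, identity file.
ssh_host_stanza() {
    printf 'Host %s\n' "$1"
    printf '\tHostName %s\n' "$2"
    printf '\tPort %s\n' "$3"
    printf '\tUser %s\n' "$4"
    printf '\tIdentityFile %s\n' "$5"
}

# Usage (not run here):
#   ssh_host_stanza osestaging1 10.241.189.11 32415 maltfield ~/.ssh/id_rsa.ose >> ~/.ssh/config
```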
  1. another issue remains: we need the staging node to connect to the vpn on startup, but I can't get the fucking systemd unit to work
[root@osestaging1 system]# systemctl start openvpn-client\@client.service 
Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details.
[root@osestaging1 system]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 12:34:56 UTC; 8s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 1295 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR)
 Main PID: 1295 (code=exited, status=200/CHDIR)

Oct 03 12:34:56 osestaging1 systemd[1]: Starting OpenVPN tunnel for client...
Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct 03 12:34:56 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client.
Oct 03 12:34:56 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state.
Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service failed.
[root@osestaging1 system]# tail -n 7 /var/log/messages 
Oct  3 12:29:29 localhost systemd: openvpn-client@client.service failed.
Oct  3 12:34:56 localhost systemd: Starting OpenVPN tunnel for client...
Oct  3 12:34:56 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory
Oct  3 12:34:56 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct  3 12:34:56 localhost systemd: Failed to start OpenVPN tunnel for client.
Oct  3 12:34:56 localhost systemd: Unit openvpn-client@client.service entered failed state.
Oct  3 12:34:56 localhost systemd: openvpn-client@client.service failed.
[root@osestaging1 system]# 
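  1. the /usr/sbin/openvpn file definitely exists; I think the issue is with the tun0 not existing or something (though 'status=200/CHDIR' is systemd's code for failing to chdir into the unit's WorkingDirectory, so the missing path may be that directory rather than the binary or tun0)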
  2. I gave the osestaging1 container a reboot
  3. after a reboot, osestaging1 now says that the openvpn-client@client.service doesn't exist!
[maltfield@osestaging1 ~]$ systemctl start openvpn-client\@client.service
Failed to start openvpn-client@client.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
See system logs and 'systemctl status openvpn-client@client.service' for details.
[maltfield@osestaging1 ~]$ systemctl list-unit-files | grep -i vpn
openvpn-client@.service                disabled
openvpn-client@client.service          disabled
openvpn-server@.service                disabled
openvpn@.service                       disabled
[maltfield@osestaging1 ~]$ 
  1. attempting to enable it fails
[maltfield@osestaging1 ~]$ systemctl enable /etc/systemd/system/openvpn-client\@client.service 
Failed to execute operation: The name org.freedesktop.PolicyKit1 was not provided by any .service files
[maltfield@osestaging1 ~]$ 
  1. oh, duh, I wasn't root
[root@osestaging1 ~]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
[root@osestaging1 ~]# systemctl start openvpn-client\@client.service 
Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details.
[root@osestaging1 ~]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 12:52:39 UTC; 7s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 379 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR)
 Main PID: 379 (code=exited, status=200/CHDIR)

Oct 03 12:52:38 osestaging1 systemd[1]: Starting OpenVPN tunnel for client...
Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct 03 12:52:39 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client.
Oct 03 12:52:39 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state.
Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service failed.
[root@osestaging1 ~]# tail -n 7 /var/log/messages 
Oct  3 12:52:38 localhost systemd: Created slice system-openvpn\x2dclient.slice.
Oct  3 12:52:38 localhost systemd: Starting OpenVPN tunnel for client...
Oct  3 12:52:39 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory
Oct  3 12:52:39 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct  3 12:52:39 localhost systemd: Failed to start OpenVPN tunnel for client.
Oct  3 12:52:39 localhost systemd: Unit openvpn-client@client.service entered failed state.
Oct  3 12:52:39 localhost systemd: openvpn-client@client.service failed.
[root@osestaging1 ~]#
  1. after fighting with this shit for hours, I finally just copied all my files from /root/openvpn into /etc/openvpn/client/ and it worked!
[root@osestaging1 system]# cp /root/openvpn/* /etc/openvpn/client
[root@osestaging1 system]# vim openvpn-client\@client.service
...
[root@osestaging1 system]# systemctl daemon-reload
<30>systemd-fstab-generator[425]: Running in a container, ignoring fstab device entry for /dev/root.
[root@osestaging1 system]# systemctl restart openvpn-client\@client.service 
[root@osestaging1 system]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-03 13:33:32 UTC; 1s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
 Main PID: 432 (openvpn)
   Status: "Initialization Sequence Completed"
   CGroup: /user.slice/user-1000.slice/session-582.scope/system.slice/system-openvpn\x2dclient.slice/openvpn-client@client.service
		   └─432 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf

Oct 03 13:33:33 osestaging1 openvpn[432]: Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: Since you are using --dev tun with a point-to-point topology, the second arg...nowarn)
Oct 03 13:33:33 osestaging1 openvpn[432]: ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP device tun0 opened
Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP TX queue length set to 100
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip link set dev tun0 up mtu 1500
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip addr add dev tun0 local 10.241.189.11 peer 255.255.255.255
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip route add 10.241.189.0/24 via 255.255.255.255
Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: this configuration may cache passwords in memory -- use the auth-nocache opt...nt this
Oct 03 13:33:33 osestaging1 openvpn[432]: Initialization Sequence Completed
Hint: Some lines were ellipsized, use -l to show in full.
[root@osestaging1 system]# ip address show dev tun0
2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.11 peer 255.255.255.255/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::927:fae4:1356:9b90/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 system]# 
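The 'status=200/CHDIR' failures earlier are systemd reporting that it could not change into the unit's 'WorkingDirectory=' before running ExecStart. That the directory in question was '/etc/openvpn/client' is an inference from the fix above, since the unit file's contents are not shown in this log. A minimal sketch of a check that would surface this quickly; both function names are hypothetical, and newer systemd prefixes such as '-' or '~' on the value are not handled.

```shell
# Print the value of the last WorkingDirectory= setting in a unit file.
unit_workdir() {
    sed -n 's/^WorkingDirectory=//p' "$1" | tail -n 1
}

# Report whether the unit's working directory exists; return 1 if it is missing.
check_unit_workdir() {
    dir=$(unit_workdir "$1")
    if [ -z "$dir" ]; then
        echo "no WorkingDirectory set in $1"
    elif [ -d "$dir" ]; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir (would fail with status=200/CHDIR)"
        return 1
    fi
}

# Usage (not run here):
#   check_unit_workdir /etc/systemd/system/openvpn-client@client.service
```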
  1. I confirmed that I could ssh into the staging node from my laptop
  2. I rebooted the staging node
  3. I confirmed that I could ssh into the staging node again after the reboot!
  4. I'm not going to bother with setting this up on the prod node for now; I'm not in a place where I want to make & test that prod change by rebooting the server.
  5. this is a good stopping point; I created another snapshot of the staging node
[root@osedev1 ~]# lxc-stop -n osestaging1
[root@osedev1 ~]# lxc-snapshot --name osestaging1 --list
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ~]# lxc-snapshot --name osestaging1 afterVPN
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
[root@osedev1 ~]# lxc-snapshot --name osestaging1 --list
snap1 (/var/lib/lxcsnaps/osestaging1) 2019:10:03 15:40:16
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ~]# 
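When restoring, the name of the most recent snapshot has to be read off the '--list' output. A minimal sketch that picks it out by sorting on the date and time columns, assuming the line format shown above; 'newest_snapshot' is a hypothetical name.

```shell
# Read `lxc-snapshot --list` output on stdin and print the name of the newest
# snapshot, sorting on the date (field 3) and time (field 4) columns.
newest_snapshot() {
    sort -k3,3 -k4,4 | awk '{ name = $1 } END { print name }'
}

# Usage on the dev node (not run here):
#   snap=$(lxc-snapshot --name osestaging1 --list | newest_snapshot)
#   lxc-snapshot --name osestaging1 -r "$snap"
```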
  1. I started the staging container again, and I tested an rsync from prod to staging; first let's see the contents of /etc/varnish on staging
[root@osestaging1 ~]# ls -lah /etc | grep -i varnish
[root@osestaging1 ~]# 
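  1. and the rsync; it failed. right, I need passwordless sudo set up on the staging node (also note that this first attempt targeted 10.241.189.10, which is prod's own VPN address, hence the different host key fingerprint; the retries below use staging's 10.241.189.11)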
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.10:/etc/
[sudo] password for maltfield: 
The authenticity of host '[10.241.189.10]:32415 ([10.241.189.10]:32415)' can't be established.
ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk.
ECDSA key fingerprint is MD5:cd:87:b1:bb:c1:3e:d1:d1:d4:5d:16:c9:e8:30:6a:71.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.10]:32415' (ECDSA) to the list of known hosts.
sudo: no tty present and no askpass program specified
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
[maltfield@opensourceecology ~]$ 
  1. I added this line to the end of the sudoers file on the staging node with 'visudo'
maltfield       ALL=(ALL)       NOPASSWD: ALL
  1. doh, I gotta install rsync on the staging node. so many prereqs...
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
sudo: rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
[maltfield@opensourceecology ~]$ 
  1. this time the rsync worked!
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/
...
sent 192211 bytes  received 503 bytes  128476.00 bytes/sec
total size is 190106  speedup is 0.99
[maltfield@opensourceecology ~]$ 
  1. here's the dir on staging node's side
[root@osestaging1 ~]# ls -lah /etc/varnish 
total 44K
drwxr-xr-x.  5 root root 4.0K Aug 27 06:19 .
drwxr-xr-x. 63 root root 4.0K Oct  3 13:52 ..
-rw-r--r--.  1 root root 1.4K Apr  9 19:10 all-vhosts.vcl
-rw-r--r--.  1 root root  697 Nov 19  2017 catch-all.vcl
drwxr-xr-x.  2 root root 4.0K Aug 27 06:17 conf
-rw-rw-r--.  1 1011 1011  737 Nov 23  2017 default.vcl
drwxr-xr-x.  2 root root 4.0K Apr 12  2018 lib
-rw-------.  1 root root  129 Apr 12  2018 secret
-rw-------.  1 root root  129 Apr 12  2018 secret.20180412.bak
drwxr-xr-x.  2 root root 4.0K Aug 27 06:18 sites-enabled
-rw-r--r--.  1 root root 1.1K Oct 21  2017 varnish.params
[root@osestaging1 ~]# 
  1. again, here's the dirs we want to exclude; the openvpn configs are already preserved
	/root
	/etc/sudo*
	/etc/openvpn
	/usr/share/easy-rsa
	/dev
	/sys
	/proc
	/boot/
	/etc/sysconfig/network*
	/tmp
	/var/tmp
	/etc/fstab
	/etc/mtab
	/etc/mdadm.conf
  1. aaaand *fingers crossed* I kicked-off the rsync
[maltfield@opensourceecology ~]$ time sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
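The long chain of '--exclude=' options above is easy to mistype, and the unquoted '*' patterns rely on the shell not expanding them. A minimal sketch of an alternative that keeps the same list in a file for rsync's '--exclude-from' option; the function name and the file path in the usage comment are hypothetical.

```shell
# Write the exclusion list from the log to FILE, one pattern per line.
# rsync ignores blank lines and lines starting with '#' or ';' in such files.
write_exclude_list() {
    cat > "$1" <<'EOF'
# paths on prod that must not overwrite staging (list copied from the log)
/root
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
EOF
}

# Usage on prod (not run here):
#   write_exclude_list /path/to/excludes.txt
#   sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" \
#       --exclude-from=/path/to/excludes.txt -av --progress / maltfield@10.241.189.11:/
```

Adding '--dry-run' to the rsync command first would show what is about to be transferred without touching the staging node.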
  1. whoops, I got ahead of myself! I killed it & left the staging server in a broken state, so I restored from snapshot & re-did the visudo & install rsync steps. But before we actually kick-off this whole-system rsync, I need to attach a hetzner cloud volume and mount it to /var. Else, the dev node's little disk will fill-up!
[root@osedev1 ~]# lxc-snapshot --name osestaging1 -r snap1
[root@osedev1 ~]# lxc-start -n osestaging1

Wed Oct 02, 2019

  1. continuing on the dev node, I want to create an lxc container. First I installed 'lxc'
[root@osedev1 ~]# yum install lxc
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                                                                              |  27 kB  00:00:00
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
base                                                                                                              | 3.6 kB  00:00:00
epel                                                                                                              | 5.3 kB  00:00:00
extras                                                                                                            | 2.9 kB  00:00:00
updates                                                                                                           | 2.9 kB  00:00:00
(1/6): base/7/x86_64/group_gz                                                                                     | 165 kB  00:00:00
(2/6): base/7/x86_64/primary_db                                                                                   | 6.0 MB  00:00:00
(3/6): epel/x86_64/updateinfo                                                                                     | 1.0 MB  00:00:00
(4/6): updates/7/x86_64/primary_db                                                                                | 1.1 MB  00:00:00
(5/6): epel/x86_64/primary_db                                                                                     | 6.8 MB  00:00:00
(6/6): extras/7/x86_64/primary_db                                                                                 | 152 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package lxc.x86_64 0:1.0.11-2.el7 will be installed
--> Processing Dependency: lua-lxc(x86-64) = 1.0.11-2.el7 for package: lxc-1.0.11-2.el7.x86_64
--> Processing Dependency: lua-alt-getopt for package: lxc-1.0.11-2.el7.x86_64
--> Processing Dependency: liblxc.so.1()(64bit) for package: lxc-1.0.11-2.el7.x86_64
--> Running transaction check
---> Package lua-alt-getopt.noarch 0:0.7.0-4.el7 will be installed
---> Package lua-lxc.x86_64 0:1.0.11-2.el7 will be installed
--> Processing Dependency: lua-filesystem for package: lua-lxc-1.0.11-2.el7.x86_64
---> Package lxc-libs.x86_64 0:1.0.11-2.el7 will be installed
--> Running transaction check
---> Package lua-filesystem.x86_64 0:1.6.2-2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================================================
 Package                              Arch                         Version                              Repository                  Size
=========================================================================================================================================
Installing:
 lxc                                  x86_64                       1.0.11-2.el7                         epel                       140 k
Installing for dependencies:
 lua-alt-getopt                       noarch                       0.7.0-4.el7                          epel                       7.4 k
 lua-filesystem                       x86_64                       1.6.2-2.el7                          epel                        28 k
 lua-lxc                              x86_64                       1.0.11-2.el7                         epel                        17 k
 lxc-libs                             x86_64                       1.0.11-2.el7                         epel                       276 k

Transaction Summary
=========================================================================================================================================
Install  1 Package (+4 Dependent packages)

Total download size: 468 k
Installed size: 1.0 M
Is this ok [y/d/N]: y
Downloading packages:
(1/5): lua-alt-getopt-0.7.0-4.el7.noarch.rpm                                                                      | 7.4 kB  00:00:00
(2/5): lua-filesystem-1.6.2-2.el7.x86_64.rpm                                                                      |  28 kB  00:00:00
(3/5): lua-lxc-1.0.11-2.el7.x86_64.rpm                                                                            |  17 kB  00:00:00
(4/5): lxc-1.0.11-2.el7.x86_64.rpm                                                                                | 140 kB  00:00:00
(5/5): lxc-libs-1.0.11-2.el7.x86_64.rpm                                                                           | 276 kB  00:00:00
-----------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                    717 kB/s | 468 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : lxc-libs-1.0.11-2.el7.x86_64                                                                                          1/5
  Installing : lua-filesystem-1.6.2-2.el7.x86_64                                                                                     2/5
  Installing : lua-lxc-1.0.11-2.el7.x86_64                                                                                           3/5
  Installing : lua-alt-getopt-0.7.0-4.el7.noarch                                                                                     4/5
  Installing : lxc-1.0.11-2.el7.x86_64                                                                                               5/5
  Verifying  : lua-lxc-1.0.11-2.el7.x86_64                                                                                           1/5
  Verifying  : lua-alt-getopt-0.7.0-4.el7.noarch                                                                                     2/5
  Verifying  : lxc-1.0.11-2.el7.x86_64                                                                                               3/5
  Verifying  : lua-filesystem-1.6.2-2.el7.x86_64                                                                                     4/5
  Verifying  : lxc-libs-1.0.11-2.el7.x86_64                                                                                          5/5

Installed:
  lxc.x86_64 0:1.0.11-2.el7

Dependency Installed:
  lua-alt-getopt.noarch 0:0.7.0-4.el7 lua-filesystem.x86_64 0:1.6.2-2.el7 lua-lxc.x86_64 0:1.0.11-2.el7 lxc-libs.x86_64 0:1.0.11-2.el7

Complete!
[root@osedev1 ~]#
  1. by default, it appears that we have no lxc templates
[root@osedev1 ~]# ls -lah /usr/share/lxc/templates/
total 8.0K
drwxr-xr-x. 2 root root 4.0K Mar  7  2019 .
drwxr-xr-x. 6 root root 4.0K Oct  2 12:16 ..
[root@osedev1 ~]# 
  1. I installed the 'lxc-templates' package (also from epel), and it gave me templates for many distros, including centos
[root@osedev1 ~]# yum -y install lxc-templates
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
Resolving Dependencies
--> Running transaction check
---> Package lxc-templates.x86_64 0:1.0.11-2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================================================
 Package                             Arch                         Version                               Repository                  Size
=========================================================================================================================================
Installing:
 lxc-templates                       x86_64                       1.0.11-2.el7                          epel                        81 k

Transaction Summary
=========================================================================================================================================
Install  1 Package

Total download size: 81 k
Installed size: 333 k
Downloading packages:
lxc-templates-1.0.11-2.el7.x86_64.rpm                                                                             |  81 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : lxc-templates-1.0.11-2.el7.x86_64                                                                                     1/1
  Verifying  : lxc-templates-1.0.11-2.el7.x86_64                                                                                     1/1

Installed:
  lxc-templates.x86_64 0:1.0.11-2.el7

Complete!
[root@osedev1 ~]# ls -lah /usr/share/lxc/templates/
total 348K
drwxr-xr-x. 2 root root 4.0K Oct  2 12:29 .
drwxr-xr-x. 6 root root 4.0K Oct  2 12:16 ..
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-alpine
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-altlinux
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-archlinux
-rwxr-xr-x. 1 root root 9.5K Mar  7  2019 lxc-busybox
-rwxr-xr-x. 1 root root  30K Mar  7  2019 lxc-centos
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-cirros
-rwxr-xr-x. 1 root root  18K Mar  7  2019 lxc-debian
-rwxr-xr-x. 1 root root  18K Mar  7  2019 lxc-download
-rwxr-xr-x. 1 root root  49K Mar  7  2019 lxc-fedora
-rwxr-xr-x. 1 root root  28K Mar  7  2019 lxc-gentoo
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-openmandriva
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-opensuse
-rwxr-xr-x. 1 root root  35K Mar  7  2019 lxc-oracle
-rwxr-xr-x. 1 root root  12K Mar  7  2019 lxc-plamo
-rwxr-xr-x. 1 root root 6.7K Mar  7  2019 lxc-sshd
-rwxr-xr-x. 1 root root  24K Mar  7  2019 lxc-ubuntu
-rwxr-xr-x. 1 root root  12K Mar  7  2019 lxc-ubuntu-cloud
[root@osedev1 ~]# 
  1. now I was successfully able to create an lxc container for our staging node named 'osestaging1' from the template 'centos'. I didn't specify the version, but it does appear to be centos7
[root@osedev1 ~]# lxc-create -n osestaging1 -t centos
Host CPE ID from /etc/os-release: cpe:/o:centos:centos:7
Checking cache download in /var/cache/lxc/centos/x86_64/7/rootfs ...
Downloading CentOS minimal ...
...
Download complete.
Copy /var/cache/lxc/centos/x86_64/7/rootfs to /var/lib/lxc/osestaging1/rootfs ... 
Copying rootfs to /var/lib/lxc/osestaging1/rootfs ...
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/init/tty.conf: No such file or directory
Storing root password in '/var/lib/lxc/osestaging1/tmp_root_pass'
Expiring password for user root.
passwd: Success
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.sysinit: No such file or directory
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.d/rc.sysinit: No such file or directory

Container rootfs and config have been created.
Edit the config file to check/enable networking setup.

The temporary root password is stored in:

		'/var/lib/lxc/osestaging1/tmp_root_pass'


The root password is set up as expired and will require it to be changed
at first login, which you should do as soon as possible.  If you lose the
root password or wish to change it without starting the container, you
can change it from the host by running the following command (which will
also reset the expired flag):

		chroot /var/lib/lxc/osestaging1/rootfs passwd

[root@osedev1 ~]# 
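The release was left implicit in the create command above. A minimal sketch of pinning it and then confirming it from the host follows. The `--release` template option is an assumption from my recollection of the lxc-centos template (check `lxc-create -t centos -h` before relying on it), and `container_version_id` is a hypothetical helper that only reads the os-release file inside the container's rootfs.

```shell
# Hypothetical helper: print VERSION_ID from a container rootfs' os-release.
# The file is sourced in a subshell so its variables do not leak into the
# calling shell. Only do this for a rootfs you trust, since sourcing runs it.
container_version_id() {
    rootfs=$1
    ( . "$rootfs/etc/os-release" && printf '%s\n' "$VERSION_ID" )
}

# Usage on the host (needs root and lxc, so shown as comments only):
#   lxc-create -n osestaging1 -t centos -- --release 7   # assumed template option
#   container_version_id /var/lib/lxc/osestaging1/rootfs
```

Reading the file from the host avoids having to start the container just to learn which release the template installed.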
  1. the sync from prod to staging is going to overwrite the staging root password, so I won't bother creating & setting a distinct root password for this staging container
  2. `lxc-top` shows that we have 0 containers running
[root@osedev1 ~]# lxc-top

Container            CPU      CPU      CPU      BlkIO        Mem
Name                Used      Sys     User      Total       Used
TOTAL (0 )          0.00     0.00     0.00    0.00       0.00   
  1. I tried to start the staging container, but I got a networking error
[root@osedev1 ~]# lxc-start -n osestaging1
lxc-start: conf.c: instantiate_veth: 3115 failed to attach 'vethWX1L1G' to the bridge 'virbr0': No such device
lxc-start: conf.c: lxc_create_network: 3407 failed to create netdev
lxc-start: start.c: lxc_spawn: 875 failed to create the network
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
lxc-start: lxc_start.c: main: 336 The container failed to start.
lxc-start: lxc_start.c: main: 340 Additional information can be obtained by setting the --logfile and --logpriority options.
[root@osedev1 ~]# 
  1. it looks like there is no 'virbr0' device; we only have the loopback, ethernet, and tun device for openvpn
[root@osedev1 ~]# ip -all address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 56775sec preferred_lft 56775sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global 
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link 
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osedev1 ~]# 
  1. Ideally, the container would not be given an internet-facing ip address, anyway. It would be better to give it a bridge on the tun0 openvpn network
  2. it looks like the relevant files for containers are in /var/lib/lxc/<containerName>/
[root@osedev1 osestaging1]# date
Wed Oct  2 12:47:07 CEST 2019
[root@osedev1 osestaging1]# pwd
/var/lib/lxc/osestaging1
[root@osedev1 osestaging1]# ls
config  rootfs  tmp_root_pass
[root@osedev1 osestaging1]# 
  1. here is the default config
[root@osedev1 osestaging1]# cat config 
# Template used to create this container: /usr/share/lxc/templates/lxc-centos
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.hwaddr = fe:07:06:a6:5f:1d
lxc.rootfs = /var/lib/lxc/osestaging1/rootfs

# Include common configuration
lxc.include = /usr/share/lxc/config/centos.common.conf

lxc.arch = x86_64
lxc.utsname = osestaging1

lxc.autodev = 1

# When using LXC with apparmor, uncomment the next line to run unconfined:
#lxc.aa_profile = unconfined

# example simple networking setup, uncomment to enable
#lxc.network.type = veth
#lxc.network.flags = up
#lxc.network.link = lxcbr0
#lxc.network.name = eth0
# Additional example for veth network type
#    static MAC address,
#lxc.network.hwaddr = 00:16:3e:77:52:20
#    persistent veth device name on host side
#        Note: This may potentially collide with other containers of same name!
#lxc.network.veth.pair = v-osestaging1-e0

[root@osedev1 osestaging1]# 
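The 'failed to attach ... to the bridge' error above comes from the `lxc.network.link` value in this config naming a bridge that does not exist on the host. A minimal sketch of a pre-flight check follows; both function names are hypothetical, and the optional second argument to `lxc_bridge_present` exists only so the check can be exercised against a scratch directory instead of /sys/class/net.

```shell
# Hypothetical helper: print the bridge named by lxc.network.link in a config.
# Commented-out lines start with '#', so the anchored pattern skips them.
lxc_config_bridge() {
    awk -F= '/^[ \t]*lxc\.network\.link[ \t]*=/ {
        gsub(/[ \t]/, "", $2); print $2; exit
    }' "$1"
}

# Hypothetical helper: succeed only if that bridge exists as a network device.
# $2 (directory holding one entry per interface) defaults to /sys/class/net.
lxc_bridge_present() {
    bridge=$(lxc_config_bridge "$1")
    netdir=${2:-/sys/class/net}
    [ -n "$bridge" ] && [ -e "$netdir/$bridge" ]
}

# Usage on the host (comment only):
#   lxc_bridge_present /var/lib/lxc/osestaging1/config || echo "bridge missing"
```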
  1. to my horror, I discovered that iptables was disabled on the dev server! why!?!
[root@osedev1 osestaging1]# iptables-save
[root@osedev1 osestaging1]# ip6tables-save
[root@osedev1 osestaging1]# service iptables status
Redirecting to /bin/systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@osedev1 osestaging1]# service iptables start
Redirecting to /bin/systemctl start iptables.service
[root@osedev1 osestaging1]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct  2 12:58:21 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [17:1396]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -j DROP
COMMIT
# Completed on Wed Oct  2 12:58:21 2019
[root@osedev1 osestaging1]# ip6tables-save
[root@osedev1 osestaging1]# service ip6tables start
Redirecting to /bin/systemctl start ip6tables.service
[root@osedev1 osestaging1]# ip6tables-save
# Generated by ip6tables-save v1.4.21 on Wed Oct  2 12:59:51 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
# Completed on Wed Oct  2 12:59:51 2019
[root@osedev1 osestaging1]# 
  1. systemd says that both iptables.service & ip6tables.service are 'loaded active exited'
[root@osedev1 osestaging1]# systemctl list-units | grep -Ei 'iptables|ip6tables'
ip6tables.service                                                                           loaded active exited    IPv6 firewall with ip6tables
iptables.service                                                                            loaded active exited    IPv4 firewall with iptables
[root@osedev1 osestaging1]# 
  1. systemd status shows both services are 'disabled'
[root@osedev1 osestaging1]# systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 7min ago
  Process: 29121 ExecStart=/usr/libexec/iptables/iptables.init start (code=exited, status=0/SUCCESS)
 Main PID: 29121 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables...
Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [  OK  ]
Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables.
[root@osedev1 osestaging1]# systemctl status ip6tables.service
● ip6tables.service - IPv6 firewall with ip6tables
   Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; disabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 6min ago
  Process: 29233 ExecStart=/usr/libexec/iptables/ip6tables.init start (code=exited, status=0/SUCCESS)
 Main PID: 29233 (code=exited, status=0/SUCCESS)

Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables...
Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [  OK  ]
Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables.
[root@osedev1 osestaging1]# 
  1. I enabled both, and I confirmed that they're now set to 'enabled' (see second line)
[root@osedev1 osestaging1]# systemctl enable iptables.service
Created symlink from /etc/systemd/system/basic.target.wants/iptables.service to /usr/lib/systemd/system/iptables.service.
[root@osedev1 osestaging1]# systemctl enable ip6tables.service
Created symlink from /etc/systemd/system/basic.target.wants/ip6tables.service to /usr/lib/systemd/system/ip6tables.service.
[root@osedev1 osestaging1]# systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 8min ago
 Main PID: 29121 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables...
Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [  OK  ]
Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables.
[root@osedev1 osestaging1]# systemctl status ip6tables.service
● ip6tables.service - IPv6 firewall with ip6tables
   Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 7min ago
 Main PID: 29233 (code=exited, status=0/SUCCESS)

Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables...
Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [  OK  ]
Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables.
[root@osedev1 osestaging1]# 
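The steps above show that 'active' (running now) and 'enabled' (started at boot) are separate states, and a firewall that is only one of the two is easy to miss. A minimal sketch that prints both states per unit on one line follows; `unit_state` is a hypothetical helper built on `systemctl is-active` and `systemctl is-enabled`.

```shell
# Hypothetical helper: print "<unit> <active-state> <enabled-state>" for
# each unit name given as an argument.
unit_state() {
    for u in "$@"; do
        printf '%s %s %s\n' "$u" \
            "$(systemctl is-active "$u" 2>/dev/null)" \
            "$(systemctl is-enabled "$u" 2>/dev/null)"
    done
}

# Usage on the host (comment only):
#   unit_state iptables.service ip6tables.service
```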
  1. actually, it doesn't make sense for the staging server to have an ip address only on the openvpn subnet; if that were the case, then it couldn't access the internet...which would make developing a POC nearly impossible. We want to prevent forwarding ports from the internet to the machine, but we do want to let it reach OUT to the internet. Perhaps we should set up the bridge per normal and then just have the openvpn client running on the staging server. Indeed, we'll need the prod server to be running an openvpn client, so we should be able to just duplicate this config (they'll be the same anyway!)
  2. I looked into what options are available for 'lxc.network.type', which is listed in section 5 of the man page for 'lxc.container.conf' = `man 5 lxc.container.conf`
	   lxc.network.type
			  specify what kind of network virtualization to be used for the container. Each time a lxc.network.type field is found a
			  new round of network configuration begins. In this way, several network virtualization types can be specified  for  the
			  same  container,  as well as assigning several network interfaces for one container. The different virtualization types
			  can be:

			  none: will cause the container to share the host's network namespace. This means the host network devices are usable in
			  the  container.  It  also  means  that  if both the container and host have upstart as init, 'halt' in a container (for
			  instance) will shut down the host.

			  empty: will create only the loopback interface.

			  veth: a virtual ethernet pair device is created with one side assigned to the container and the other side attached  to
			  a  bridge  specified by the lxc.network.link option.  If the bridge is not specified, then the veth pair device will be
			  created but not attached to any bridge.  Otherwise, the bridge has to be created on the system before starting the con‐
			  tainer.   lxc  won't handle any configuration outside of the container.  By default, lxc chooses a name for the network
			  device belonging to the outside of the container, but if you wish to handle this name yourselves, you can tell  lxc  to
			  set  a  specific  name  with  the lxc.network.veth.pair option (except for unprivileged containers where this option is
			  ignored for security reasons).

			  vlan: a vlan interface is linked with the interface specified by the lxc.network.link and assigned  to  the  container.
			  The vlan identifier is specified with the option lxc.network.vlan.id.

			  macvlan:  a  macvlan  interface is linked with the interface specified by the lxc.network.link and assigned to the con‐
			  tainer.  lxc.network.macvlan.mode specifies the mode the macvlan will use to communicate between different  macvlan  on
			  the  same upper device. The accepted modes are private, the device never communicates with any other device on the same
			  upper_dev (default), vepa, the new Virtual Ethernet Port Aggregator (VEPA) mode, it assumes that  the  adjacent  bridge
			  returns  all  frames  where  both  source and destination are local to the macvlan port, i.e. the bridge is set up as a
			  reflective relay. Broadcast frames coming in from the upper_dev get flooded to all macvlan  interfaces  in  VEPA  mode,
			  local  frames  are  not  delivered  locally,  or  bridge, it provides the behavior of a simple bridge between different
			  macvlan interfaces on the same port. Frames from one interface to another one get delivered directly and are  not  sent
			  out  externally.  Broadcast  frames  get flooded to all other bridge ports and to the external interface, but when they
			  come back from a reflective relay, we don't deliver them again. Since we know all the MAC addresses, the macvlan bridge
			  mode does not require learning or STP like the bridge module does.

			  phys: an already existing interface specified by the lxc.network.link is assigned to the container.
    1. we want the container to be able to touch the internet, so that rules out 'empty'
    2. we don't have a spare physical interface on the server for each container, so that rules out 'phys'
    3. I'm unclear on the distinction between macvlan, vlan, veth, and none. Probably we want veth and we need to get the 'virbr0' interface actually working
  1. google says our error may be caused by libvirt not being installed
  2. I didn't have libvirt installed, so I did so
[root@osedev1 osestaging1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 50735sec preferred_lft 50735sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800
	   valid_lft forever preferred_lft forever
[root@osedev1 osestaging1]# rpm -qa | grep -i libvirt
[root@osedev1 osestaging1]# yum -y install libvirt
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de   
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu 
Resolving Dependencies
...
Complete!
[root@osedev1 osestaging1]# 
  1. but there didn't appear to be any changes; I had to manually start the libvirtd service to get the changes; now it shows two new interfaces: 'virbr0' & 'virbr0-nic'
[root@osedev1 osestaging1]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
	 Docs: man:libvirtd(8)
		   https://libvirt.org
[root@osedev1 osestaging1]# systemctl start libvirtd
[root@osedev1 osestaging1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 50619sec preferred_lft 50619sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800
	   valid_lft forever preferred_lft forever
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
	inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
	   valid_lft forever preferred_lft forever
7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
[root@osedev1 osestaging1]#
  1. and there are some changes to the routing table too
[root@osedev1 osestaging1]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
	link/none 
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
[root@osedev1 osestaging1]# ip r
default via 172.31.1.1 dev eth0 
10.241.189.0/24 via 10.241.189.2 dev tun0 
10.241.189.2 dev tun0 proto kernel scope link src 10.241.189.1 
169.254.0.0/16 dev eth0 scope link metric 1002 
172.31.1.1 dev eth0 scope link 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 
[root@osedev1 osestaging1]# 
  1. now I was successfully able to start the 'osestaging1' container
[root@osedev1 osestaging1]# lxc-start -n osestaging1
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to CentOS Linux 7 (Core)!

Running in a container, ignoring fstab device entry for /dev/root.
Cannot add dependency job for unit display-manager.service, ignoring: Unit not found.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Swap.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Created slice Root Slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Created slice System Slice.
[  OK  ] Created slice system-getty.slice.
		 Starting Journal Service...
		 Mounting POSIX Message Queue File System...
[  OK  ] Reached target Slices.
		 Starting Read and set NIS domainname from /etc/sysconfig/network...
		 Mounting Huge Pages File System...
		 Starting Remount Root and Kernel File Systems...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started Journal Service.
[  OK  ] Started Read and set NIS domainname from /etc/sysconfig/network.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Reached target Local File Systems (Pre).
		 Starting Configure read-only root support...
		 Starting Rebuild Hardware Database...
		 Starting Flush Journal to Persistent Storage...
<46>systemd-journald[14]: Received request to flush runtime journal from PID 1
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Configure read-only root support.
		 Starting Load/Save Random Seed...
[  OK  ] Reached target Local File Systems.
		 Starting Rebuild Journal Catalog...
		 Starting Mark the need to relabel after reboot...
		 Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Rebuild Journal Catalog.
[  OK  ] Started Mark the need to relabel after reboot.
[  OK  ] Started Create Volatile Files and Directories.
		 Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Rebuild Hardware Database.
		 Starting Update is Completed...
[  OK  ] Started Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
		 Starting LSB: Bring up/down networking...
		 Starting Permit User Sessions...
		 Starting Login Service...
		 Starting OpenSSH Server Key Generation...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Started Permit User Sessions.
		 Starting Cleanup of Temporary Directories...
[  OK  ] Started Command Scheduler.
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Cleanup of Temporary Directories.
[  OK  ] Started Login Service.
[  OK  ] Started OpenSSH Server Key Generation.

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

osestaging1 login:
  1. I was successfully able to login as root, but it made me change the password immediately. I just set it to the same root password as our prod server
osestaging1 login: root
Password: 
You are required to change your password immediately (root enforced)
Changing password for root.
(current) UNIX password: 
New password: 
Retype new password: 
[root@osestaging1 ~]# 
  1. this new container has an ip address of '192.168.122.201', and it does have access to the internet
[root@osestaging1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
	link/ether fe:07:06:a6:5f:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
	inet 192.168.122.201/24 brd 192.168.122.255 scope global dynamic eth0
	   valid_lft 3310sec preferred_lft 3310sec
	inet6 fe80::fc07:6ff:fea6:5f1d/64 scope link 
	   valid_lft forever preferred_lft forever
[root@osestaging1 ~]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=5.46 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=5.48 ms

--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 5.468/5.474/5.480/0.006 ms
[root@osestaging1 ~]# 
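Since the container's address is handed out dynamically on the bridge, later steps that ssh or rsync to it may want to look the address up rather than hard-code it. A minimal sketch follows, assuming the one-line-per-address format printed by `ip -o -4 addr show`; `ipv4_from_ip_output` is a hypothetical helper, not part of iproute2.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# each IPv4 address without its /prefix length.
ipv4_from_ip_output() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1] }'
}

# Usage inside the container (comment only):
#   ip -o -4 addr show dev eth0 | ipv4_from_ip_output
```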
  1. on the dev node host, we can also see the bridge with `brctl`
[root@osedev1 osestaging1]# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.5254007d0171       yes             vethYMJVGD
														virbr0-nic
[root@osedev1 osestaging1]# 
  1. now I think we're about ready to initiate this sync. Interesting decision: we could either rsync (via ssh) to the dev node or to the staging container. I think it would be safer to go to the container, as you can't fuck up the host dev node in that case.
  2. I confirmed that ssh is listening on the default install of the staging container
[root@osestaging1 ~]# ss -plan | grep -i ssh
u_str  ESTAB      0      0         * 162265                * 0                   users:(("sshd",pid=298,fd=2),("sshd",pid=298,fd=1))
tcp    LISTEN     0      128       *:22                    *:*                   users:(("sshd",pid=298,fd=3))
tcp    LISTEN     0      128    [::]:22                 [::]:*                   users:(("sshd",pid=298,fd=4))
[root@osestaging1 ~]# 
  1. I did some basic bootstrap config of the staging container, following my documentation for doing the same to its host dev server Maltfield_Log/2019_Q3#Tue_Aug_20.2C_2019
[root@osestaging1 ~]# useradd maltfield
[root@osestaging1 ~]# su - maltfield
[maltfield@osestaging1 ~]$ mkdir .ssh
[maltfield@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== michael@opensourceecology.org" > .ssh/authorized_keys
[maltfield@osestaging1 ~]$ chmod 0700 .ssh
[maltfield@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys
[maltfield@osestaging1 ~]$ 
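The key installation above can be wrapped into one repeatable step for the next user or container. This is a minimal sketch under the same assumptions as the manual commands (directory mode 0700, file mode 0600); `install_pubkey` is a hypothetical helper, and it appends rather than overwrites so existing keys survive.

```shell
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys,
# creating the directory and file with the restrictive modes used above.
install_pubkey() {
    home=$1
    pubkey=$2
    (
        umask 077    # anything created here is private from the start
        mkdir -p "$home/.ssh"
        printf '%s\n' "$pubkey" >> "$home/.ssh/authorized_keys"
    )
    # Also tighten modes in case the directory or file already existed.
    chmod 0700 "$home/.ssh"
    chmod 0600 "$home/.ssh/authorized_keys"
}

# Usage as the target user (comment only):
#   install_pubkey "$HOME" "ssh-rsa AAAA... user@example.org"
```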
  1. I confirmed that I could now successfully ssh in as 'maltfield' using my key into staging from within dev
user@ose:~$ ssh -A osedev1
Last login: Wed Oct  2 12:09:35 2019 from 5.254.96.238
[maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname
The authenticity of host '192.168.122.201 (192.168.122.201)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.201' (ECDSA) to the list of known hosts.
osestaging1
[maltfield@osedev1 ~]$ 
  1. and continued with the bootstrap of my user, giving myself sudo rights
[root@osestaging1 ~]# yum -y install sudo
...
Installed:
  sudo.x86_64 0:1.8.23-4.el7                                                                                                             

Complete!
[root@osestaging1 ~]# passwd maltfield
Changing password for user maltfield.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@osestaging1 ~]# gpasswd -a maltfield wheel
Adding user maltfield to group wheel
[root@osestaging1 ~]# su - maltfield
Last login: Wed Oct  2 13:00:29 UTC 2019 on lxc/console
[maltfield@osestaging1 ~]$ sudo su -

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

	#1) Respect the privacy of others.
	#2) Think before you type.
	#3) With great power comes great responsibility.

[sudo] password for maltfield: 
Last login: Wed Oct  2 12:33:00 UTC 2019 on lxc/console
[root@osestaging1 ~]# 
  1. this time I took the hardened config from dev and gave it to staging; first on dev I ran:
user@ose:~$ ssh osedev1
Last login: Wed Oct  2 14:57:15 2019 from 5.254.96.238
[maltfield@osedev1 ~]$ sudo cp /etc/ssh/sshd_config .
[maltfield@osedev1 ~]$ sudo chown maltfield sshd_config 
[maltfield@osedev1 ~]$ scp sshd_config 192.168.122.201:
sshd_config                                   100% 4455     5.7MB/s   00:00    
[maltfield@osedev1 ~]$ 
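A hardened sshd_config copied from another machine may restrict logins with AllowGroups, and if the named group is missing on the target, nobody can ssh in once sshd reloads. A minimal sketch of a check to run before restarting sshd follows. `missing_allow_groups` is a hypothetical helper; it treats each AllowGroups entry as a literal group name (no wildcard patterns) and reads a flat group file, where `getent group` would be more thorough.

```shell
# Hypothetical helper: print every group named by AllowGroups in an
# sshd_config that has no entry in the group file (default /etc/group).
missing_allow_groups() {
    conf=$1
    groupfile=${2:-/etc/group}
    for g in $(awk 'tolower($1) == "allowgroups" {
                       for (i = 2; i <= NF; i++) print $i
                   }' "$conf"); do
        grep -q "^$g:" "$groupfile" || printf '%s\n' "$g"
    done
}

# Usage on the target machine (comment only):
#   missing_allow_groups /etc/ssh/sshd_config
```

Any name it prints would need a `groupadd` (and the relevant users added to it) before the new config takes effect.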
  1. and then in staging
[maltfield@osestaging1 ~]$ ls
sshd_config
[maltfield@osestaging1 ~]$ sudo su -
[sudo] password for maltfield:
Last login: Wed Oct  2 13:02:02 UTC 2019 on lxc/console
[root@osestaging1 ~]# cd /etc/ssh
[root@osestaging1 ssh]# mv sshd_config sshd_config.20191002.orig
[root@osestaging1 ssh]# mv /home/maltfield/sshd_config .
[root@osestaging1 ssh]# ls -lah
total 620K
drwxr-xr-x.  2 root      root      4.0K Oct  2 13:16 .
drwxr-xr-x. 60 root      root      4.0K Oct  2 13:01 ..
-rw-r--r--.  1 root      root      569K Aug  9 01:40 moduli
-rw-r--r--.  1 root      root      2.3K Aug  9 01:40 ssh_config
-rw-r-----.  1 root      ssh_keys   227 Oct  2 12:28 ssh_host_ecdsa_key
-rw-r--r--.  1 root      root       162 Oct  2 12:28 ssh_host_ecdsa_key.pub
-rw-r-----.  1 root      ssh_keys   387 Oct  2 12:28 ssh_host_ed25519_key
-rw-r--r--.  1 root      root        82 Oct  2 12:28 ssh_host_ed25519_key.pub
-rw-r-----.  1 root      ssh_keys  1.7K Oct  2 12:28 ssh_host_rsa_key
-rw-r--r--.  1 root      root       382 Oct  2 12:28 ssh_host_rsa_key.pub
-rw-------.  1 maltfield maltfield 4.4K Oct  2 13:07 sshd_config
-rw-------.  1 root      root      3.9K Aug  9 01:40 sshd_config.20191002.orig
[root@osestaging1 ssh]# chown root:root sshd_config
[root@osestaging1 ssh]# ls -lah
total 620K
drwxr-xr-x.  2 root root     4.0K Oct  2 13:16 .
drwxr-xr-x. 60 root root     4.0K Oct  2 13:01 ..
-rw-r--r--.  1 root root     569K Aug  9 01:40 moduli
-rw-r--r--.  1 root root     2.3K Aug  9 01:40 ssh_config
-rw-r-----.  1 root ssh_keys  227 Oct  2 12:28 ssh_host_ecdsa_key
-rw-r--r--.  1 root root      162 Oct  2 12:28 ssh_host_ecdsa_key.pub
-rw-r-----.  1 root ssh_keys  387 Oct  2 12:28 ssh_host_ed25519_key
-rw-r--r--.  1 root root       82 Oct  2 12:28 ssh_host_ed25519_key.pub
-rw-r-----.  1 root ssh_keys 1.7K Oct  2 12:28 ssh_host_rsa_key
-rw-r--r--.  1 root root      382 Oct  2 12:28 ssh_host_rsa_key.pub
-rw-------.  1 root root     4.4K Oct  2 13:07 sshd_config
-rw-------.  1 root root     3.9K Aug  9 01:40 sshd_config.20191002.orig
[root@osestaging1 ssh]# grep AllowGroups sshd_config
AllowGroups sshaccess
[root@osestaging1 ssh]# grep sshaccess /etc/group
[root@osestaging1 ssh]# groupadd sshaccess
[root@osestaging1 ssh]# gpasswd -a maltfield sshaccess
Adding user maltfield to group sshaccess
[root@osestaging1 ssh]# grep sshaccess /etc/group
sshaccess:x:1001:maltfield
[root@osestaging1 ssh]# systemctl restart sshd
[root@osestaging1 ssh]# 
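The copied config names a group in AllowGroups that did not yet exist on staging, which would have locked everyone out after the restart. A minimal sketch of a pre-restart check follows; the function name `missing_allowgroups` is hypothetical, and it assumes AllowGroups lists plain group names (sshd also accepts patterns, which this does not handle).

```shell
# Print every group named on an uncommented AllowGroups line of an
# sshd_config that has no entry in a group file. Both paths are arguments,
# so the check can be tried on copies before touching /etc.
# Assumption: entries are literal group names, not wildcard patterns.
missing_allowgroups() {
    config="$1"       # e.g. /etc/ssh/sshd_config
    group_file="$2"   # e.g. /etc/group
    # Emit the words that follow the AllowGroups keyword, one per line.
    awk 'tolower($1) == "allowgroups" { for (i = 2; i <= NF; i++) print $i }' "$config" |
    while read -r group; do
        # The first colon-separated field of each group-file line is the name.
        cut -d: -f1 "$group_file" | grep -qxF -- "$group" || echo "$group"
    done
}
```

Running `missing_allowgroups /etc/ssh/sshd_config /etc/group` before `systemctl restart sshd` should print nothing once the groupadd step is done.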
  1. confirmed that I could still ssh-in on the new non-standard port from dev to staging
user@ose:~$ ssh osedev1
Last login: Wed Oct  2 15:13:21 2019 from 5.254.96.225
[maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname
ssh: connect to host 192.168.122.201 port 22: Connection refused
[maltfield@osedev1 ~]$ ssh -p 32415 maltfield@192.168.122.201 hostname
osestaging1
[maltfield@osedev1 ~]$ 
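The first attempt above failed only because the hardened config moves sshd off port 22. As a rough sketch, the port can be read out of a config file instead of remembered; the function name `sshd_port` is hypothetical, and it only reports the first Port line even though sshd permits several.

```shell
# Print the port from the first uncommented Port line of an sshd_config,
# or 22 (the sshd default) when the file sets none.
# Assumption: only the first Port line matters for the caller.
sshd_port() {
    awk 'tolower($1) == "port" { print $2; found = 1; exit }
         END { if (!found) print 22 }' "$1"
}
```

On dev, where a copy of the config is still in my home directory, this would allow something like `ssh -p "$(sshd_port ~/sshd_config)" maltfield@192.168.122.201 hostname`.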
  1. I could go further and set up iptables to block incoming traffic, but the beauty of this being a container with a NAT'd private ip address, on a host whose iptables is locked down on its internet-facing ip address, is that we really don't need to. It's already inaccessible from the internet, and it will only be accessible from the dev node, which our developers must first vpn into as a prerequisite to reaching this staging node
  2. let's make it so that prod can touch staging; we'll create a cert for openvpn for our prod node, and install it on both our prod & staging nodes. Then we'll update our openvpn config to include the client-to-client option https://openvpn.net/community-resources/how-to/#scope
  3. before continuing, it would be wise to create a snapshot of the staging container
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list
No snapshots
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
lxc_container: lxccontainer.c: lxcapi_clone: 2643 error: Original container (osestaging1) is running
lxc_container: lxccontainer.c: lxcapi_snapshot: 2899 clone of /var/lib/lxc:osestaging1 failed
lxc_container: lxc_snapshot.c: do_snapshot: 55 Error creating a snapshot
[root@osedev1 ssh]#
  1. I tried to create a snapshot; it told me that it can't do deltas unless I use overlayfs or aufs (or probably also zfs, btrfs, etc). It failed because the container was still running (see the lxcapi_clone error above). I stopped it and tried again.
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ssh]# 
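Since a directory-backed container has to be stopped before it can be snapshotted, the whole sequence could be wrapped up roughly as below. This is a sketch only: the function name `snapshot_stopped` and the `RUN` dry-run variable are hypothetical, and restarting the container afterwards is my assumption (the log above does not show it).

```shell
# Stop a container, take a (full-copy) snapshot, then start it again.
# Set RUN=echo to print the commands instead of executing them.
# If the snapshot step fails, the container is left stopped on purpose.
snapshot_stopped() {
    name="$1"   # container name, e.g. osestaging1
    ${RUN:-} lxc-stop --name "$name" &&
    ${RUN:-} lxc-snapshot --name "$name" &&
    ${RUN:-} lxc-start --name "$name" --daemon
}
```

A dry run with `RUN=echo snapshot_stopped osestaging1` prints the three commands without touching the container.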
  1. so our container is 0.5G, and so is our 1x snapshot
[root@osedev1 ssh]# du -sh /var/lib/lxcsnaps/*
459M    /var/lib/lxcsnaps/osestaging1
[root@osedev1 ssh]# du -sh /var/lib/lxc/*
459M    /var/lib/lxc/osestaging1
[root@osedev1 ssh]# 
  1. eventually we'll need to mount the external block volume to /var/, especially before the sync from prod
[root@osedev1 ssh]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        19G  2.4G   16G  14% /
devtmpfs        873M     0  873M   0% /dev
tmpfs           896M     0  896M   0% /dev/shm
tmpfs           896M   17M  879M   2% /run
tmpfs           896M     0  896M   0% /sys/fs/cgroup
/dev/sdb        9.8G   37M  9.3G   1% /mnt/HC_Volume_3110278
tmpfs           180M     0  180M   0% /run/user/1000
[root@osedev1 ssh]# 
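Because each snapshot here is a full copy rather than a delta, disk usage grows by one container size per snapshot. A back-of-the-envelope sketch of how many would fit follows; the function name `snapshots_that_fit` is hypothetical, and the inputs are the rounded figures from du and df above, so the result is only approximate.

```shell
# How many full-copy snapshots of a given size fit in the available space?
# Both arguments are whole megabytes; the result is rounded down.
snapshots_that_fit() {
    avail_mb="$1"   # free space, e.g. 16G on / taken as 16384
    snap_mb="$2"    # size of one snapshot, e.g. 459
    echo $(( avail_mb / snap_mb ))
}
```

With the numbers above, `snapshots_that_fit 16384 459` gives 35 on the root filesystem. That headroom will shrink quickly once the container holds a copy of the production data.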
  1. as for backups, I created new API keys that have access to only the 'ose-dev-server-backups' bucket.
  2. because ransomware is a topic of concern (especially ransomware that deletes your backups), I also noticed that when we create the api key, we can remove the 'deleteFiles' and 'deleteBuckets' capabilities (the cleanup is actually done by the storage rules on backblaze's side, not by our script's logic). Apparently there's no way to edit the capabilities of existing keys, so making this change for keys already in use would be non-trivial.
  3. I wrote the api key creds to osedev1:/root/scripts/backup.settings
  4. And I created a new 4K encryption key. To make it clearer, I named it 'ose-dev-backups-cron.201910.key'. I added it to the shared ose keepass db under "backups" (files attached are under the "Advanced" tab)
  5. I also installed the b2cli dependencies on the dev node; unfortunately I hit some issues https://wiki.opensourceecology.org/wiki/Backblaze#Install_CLI
[root@osedev1 backups]# yum install python-virtualenv
...
Installed:
  python-virtualenv.noarch 0:15.1.0-2.el7

Dependency Installed:
  python-devel.x86_64 0:2.7.5-86.el7    python-rpm-macros.noarch 0:3-32.el7  python-srpm-macros.noarch 0:3-32.el7
  python2-rpm-macros.noarch 0:3-32.el7

Dependency Updated:
  python.x86_64 0:2.7.5-86.el7                          python-libs.x86_64 0:2.7.5-86.el7

Complete!
[root@osedev1 backups]# yum install python-setuptools
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
Package python-setuptools-0.9.8-7.el7.noarch already installed and latest version
Nothing to do
[root@osedev1 backups]# yum install git
...
Installed:
  git.x86_64 0:1.8.3.1-20.el7                                                                                     

Dependency Installed:
  perl-Error.noarch 1:0.17020-2.el7   perl-Git.noarch 0:1.8.3.1-20.el7   perl-TermReadKey.x86_64 0:2.30-20.el7  

Complete!
[root@osedev1 backups]# adduser b2user
[root@osedev1 backups]# sudo su - b2user
[b2user@osedev1 ~]$ mkdir virtualenv
[b2user@osedev1 ~]$ cd virtualenv/
[b2user@osedev1 virtualenv]$ virtualenv .
New python executable in /home/b2user/virtualenv/bin/python
Installing setuptools, pip, wheel...done.
[b2user@osedev1 virtualenv]$ cd ..
[b2user@osedev1 ~]$ mkdir sandbox
[b2user@osedev1 ~]$ cd sandbox/
[b2user@osedev1 sandbox]$ git clone https://github.com/Backblaze/B2_Command_Line_Tool.git
Cloning into 'B2_Command_Line_Tool'...
remote: Enumerating objects: 151, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 7130 (delta 90), reused 102 (delta 55), pack-reused 6979
Receiving objects: 100% (7130/7130), 1.80 MiB | 3.35 MiB/s, done.
Resolving deltas: 100% (5127/5127), done.
[b2user@osedev1 sandbox]$ cd B2_Command_Line_Tool/
[b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install
setuptools 20.2 or later is required. To fix, try running: pip install "setuptools>=20.2"
[b2user@osedev1 B2_Command_Line_Tool]$ 
  1. I hate using pip; it often breaks the OS and apps installed, but I bit my tongue & proceeded (I wouldn't do this on prod)
[root@osedev1 backups]# yum install python3-setuptools
Installed:
  python3-setuptools.noarch 0:39.2.0-10.el7

Dependency Installed:   
  python3.x86_64 0:3.6.8-10.el7      python3-libs.x86_64 0:3.6.8-10.el7      python3-pip.noarch 0:9.0.3-5.el7

Complete!
[root@osedev1 backups]#
[root@osedev1 backups]# pip install "setuptools>=20.2"
-bash: pip: command not found
[root@osedev1 backups]# yum install python-pip
...
Installed:
  python2-pip.noarch 0:8.1.2-10.el7

Complete!
[root@osedev1 backups]# pip install "setuptools>=20.2"
Collecting setuptools>=20.2
  Downloading https://files.pythonhosted.org/packages/b2/86/095d2f7829badc207c893dd4ac767e871f6cd547145df797ea26baea4e2e/setuptools-41.2.0-py2.py3-none-any.whl (576kB)
	100% || 583kB 832kB/s
Installing collected packages: setuptools
  Found existing installation: setuptools 0.9.8
	Uninstalling setuptools-0.9.8:
	  Successfully uninstalled setuptools-0.9.8
Successfully installed setuptools-41.2.0
You are using pip version 8.1.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@osedev1 backups]# pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl (1.4MB)
	100% || 1.4MB 511kB/s
Installing collected packages: pip
  Found existing installation: pip 8.1.2
	Uninstalling pip-8.1.2:
	  Successfully uninstalled pip-8.1.2
Successfully installed pip-19.2.3
[root@osedev1 backups]# 
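A possible way to avoid touching the system-wide setuptools and pip as root: the virtualenv created earlier was never activated, so `python setup.py install` ran against the system interpreter. The sketch below calls the virtualenv's own interpreter instead. It is untested here and assumes the setuptools that virtualenv installed is 20.2 or later, which the log does not show; the function name `install_with_venv_python` and the `RUN` dry-run variable are hypothetical.

```shell
# Install a checked-out Python project with a virtualenv's own interpreter,
# leaving the system site-packages alone.
# Set RUN=echo to print the command instead of executing it.
install_with_venv_python() {
    venv="$1"   # e.g. /home/b2user/virtualenv
    src="$2"    # e.g. /home/b2user/sandbox/B2_Command_Line_Tool
    # Subshell so the caller's working directory is not changed.
    ( cd "$src" && ${RUN:-} "$venv/bin/python" setup.py install )
}
```

If that works, the `b2` entry point should end up under the virtualenv's bin directory rather than in ~/.local/bin.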
  1. when it came time to install it, I had to add the '--user' flag
[b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install --user
...
Installed /home/b2user/.local/lib/python2.7/site-packages/python_dateutil-2.8.0-py2.7.egg
Searching for setuptools==41.2.0
Best match: setuptools 41.2.0
Adding setuptools 41.2.0 to easy-install.pth file
Installing easy_install script to /home/b2user/.local/bin
Installing easy_install-3.6 script to /home/b2user/.local/bin

Using /usr/lib/python2.7/site-packages
Finished processing dependencies for b2==1.4.1
[b2user@osedev1 B2_Command_Line_Tool]$ 
[b2user@osedev1 B2_Command_Line_Tool]$ ^C
[b2user@osedev1 B2_Command_Line_Tool]$  ~/.local/bin/b2 version
b2 command line tool, version 1.4.1
[b2user@osedev1 B2_Command_Line_Tool]$
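Coming back to the ransomware concern from earlier in this entry: if the backup key is ever recreated, its capability list could be derived from an existing one by dropping the two delete capabilities. This is a small sketch of just that string handling; the function name `without_delete_caps` is hypothetical, and the other capability names in the usage note are an example input, not what our key actually has.

```shell
# Remove 'deleteFiles' and 'deleteBuckets' from a comma-separated
# capability list and print what is left, still comma-separated.
without_delete_caps() {
    echo "$1" | tr ',' '\n' |
        grep -v -x -e deleteFiles -e deleteBuckets |
        paste -s -d ',' -
}
```

For example, `without_delete_caps listBuckets,listFiles,readFiles,writeFiles,deleteFiles` prints `listBuckets,listFiles,readFiles,writeFiles`. The result could then be supplied as the capabilities when creating the new key, whether through the web UI or the CLI (assuming the installed b2 version offers a key-creation command).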