Maltfield Log/2019 Q4
My work log from the year 2019 Quarter 4. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.
Tue Oct 08, 2019
- continuing from yesterday, I checked-up on the rsync running from prod to staging, and it appears to have stalled
75497472 100% 2.90MB/s 0:00:24 (xfer#4297, to-check=1538/7463) run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal 75497472 100% 2.80MB/s 0:00:25 (xfer#4298, to-check=1537/7463) run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal 23429120 31% 2.91MB/s 0:00:17
- it's probably not a good idea to sync the /run dir..
- attempting to ssh into the server fails
user@ose:~/openvpn$ ssh osestaging1 The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established. ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts. Permission denied (publickey). user@ose:~/openvpn$
- I _can_ get into the staging server from the lxc-console on the dev server, but it doesn't look like anything is wrong with the setup of my user
[root@osestaging1 ~]# grep maltfield /etc/passwd maltfield:x:1005:1005::/home/maltfield:/bin/bash [root@osestaging1 ~]# grep maltfield /etc/shadow maltfield:TRUNCATED [root@osestaging1 ~]# grep maltfield /etc/group wheel:x:10:maltfield,crupp,tgriffing,root apache:x:48:cmota,crupp,maltfield,wp,apache,marcin maltfield:x:1005:apache sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp keepass:x:993:maltfield,marcin,cmota,crupp apache-admins:x:1012:cmota,maltfield,marcin,crupp,tgriffing,wp,apache [root@osestaging1 ~]# ls -lah /home/maltfield/.ssh total 16K drwxr-xr-x. 2 tgriffing maltfield 4.0K Jan 19 2018 . drwx------. 10 tgriffing maltfield 4.0K Oct 3 07:06 .. -rw-r--r--. 1 root root 750 Jun 20 2017 authorized_keys -rw-r--r--. 1 tgriffing tgriffing 1.1K Oct 3 13:44 known_hosts [root@osestaging1 ~]# cat /home/maltfield/.ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== guttersnipe@guttersnipe [root@osestaging1 ~]#
- ssh appears to be running too
[root@osestaging1 ~]# systemctl list-units | grep -i ssh sshd.service loaded active running OpenSSH server daemon [root@osestaging1 ~]# ss -plan | grep -i ssh u_str ESTAB 0 0 * 32621 * 32622 users:(("sshd",pid=350,fd=5)) u_dgr UNCONN 0 0 * 32618 * 29344 users:(("sshd",pid=350,fd=4),("sshd",pid=348,fd=4)) u_str ESTAB 0 0 * 31143 * 0 users:(("sshd",pid=274,fd=2),("sshd",pid=274,fd=1)) u_str ESTAB 0 0 * 32622 * 32621 users:(("sshd",pid=348,fd=7)) tcp LISTEN 0 128 *:32415 *:* users:(("sshd",pid=274,fd=3)) tcp ESTAB 0 0 10.241.189.11:32415 10.241.189.10:41270 users:(("sshd",pid=350,fd=3),("sshd",pid=348,fd=3)) tcp LISTEN 0 128 [::]:32415 [::]:* users:(("sshd",pid=274,fd=4)) [root@osestaging1 ~]#
- the ssh server logs say that the client just disconnects
Oct 8 05:57:01 localhost sshd[3586]: Connection closed by 10.241.189.10 port 41334 [preauth]
- the ssh client says that the server rejected our public key
user@ose:~/openvpn$ ssh -vvv osestaging1 ... debug1: Next authentication method: publickey debug1: Offering RSA public key: /home/user/.ssh/id_rsa.ose debug3: send_pubkey_test debug3: send packet: type 50 debug2: we sent a publickey packet, wait for reply debug3: receive packet: type 51 debug1: Authentications that can continue: publickey debug2: we did not send a packet, disable method debug1: No more authentication methods to try. Permission denied (publickey). user@ose:~/openvpn$
- I did notice that the ownership of the relevant /home/maltfield/.ssh dir differs on the prod & staging servers
[maltfield@opensourceecology ~]$ ls -lahn /home/maltfield/.ssh total 16K drwxr-xr-x 2 1005 1005 4.0K Jan 19 2018 . drwx------ 10 1005 1005 4.0K Oct 3 07:06 .. -rw-r--r-- 1 0 0 750 Jun 20 2017 authorized_keys -rw-r--r-- 1 1005 1005 1.1K Oct 3 13:44 known_hosts [maltfield@opensourceecology ~]$
[root@osestaging1 ~]# ls -lahn /home/maltfield/.ssh total 16K drwxr-xr-x. 2 1000 1005 4.0K Jan 19 2018 . drwx------. 10 1000 1005 4.0K Oct 3 07:06 .. -rw-r--r--. 1 0 0 750 Jun 20 2017 authorized_keys -rw-r--r--. 1 1000 1000 1.1K Oct 3 13:44 known_hosts [root@osestaging1 ~]#
- while the passwd, group, and shadow files all match
[root@opensourceecology ~]# md5sum /etc/passwd cabf495ca12f7f32605eb764dd12c861 /etc/passwd [root@opensourceecology ~]# md5sum /etc/group 04a70553d59a646406ecb89f2f7b17b5 /etc/group [root@opensourceecology ~]# md5sum /etc/shadow 6f27deaf639ae2db1a1d94739a8bb834 /etc/shadow [root@opensourceecology ~]#
[root@osestaging1 ~]# md5sum /etc/passwd cabf495ca12f7f32605eb764dd12c861 /etc/passwd [root@osestaging1 ~]# md5sum /etc/group 04a70553d59a646406ecb89f2f7b17b5 /etc/group [root@osestaging1 ~]# md5sum /etc/shadow 6f27deaf639ae2db1a1d94739a8bb834 /etc/shadow [root@osestaging1 ~]#
- for some reason my '/home/maltfield' dir was also owned by 'tgriffing'. I was able to ssh-in again after fixing this
[root@osestaging1 ~]# chown -R maltfield:maltfield /home/maltfield/ [root@osestaging1 ~]# ls -lah /home total 52K drwxr-xr-x. 13 root root 4.0K Jul 28 2018 . dr-xr-xr-x. 20 root root 4.0K Oct 7 10:05 .. drwx------. 7 b2user b2user 4.0K Oct 7 07:46 b2user drwx------. 5 cmota cmota 4.0K Jul 14 2017 cmota drwx------. 5 crupp crupp 4.0K Aug 12 2017 crupp drwx------. 2 Flipo Flipo 4.0K Sep 20 2016 Flipo drwx------. 2 hart hart 4.0K Mar 30 2017 hart drwx------. 3 lberezhny lberezhny 4.0K Jul 20 2017 lberezhny drwx------. 10 maltfield maltfield 4.0K Oct 3 07:06 maltfield drwx------. 4 marcin marcin 4.0K Jul 6 2017 marcin drwx------. 2 not-apache not-apache 4.0K Feb 12 2018 not-apache drwx------. 5 tgriffing tgriffing 4.0K Aug 1 09:19 tgriffing drwx------. 5 wp wp 4.0K Oct 7 2017 wp [root@osestaging1 ~]#
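- side note for future admins: sshd (with its default 'StrictModes yes') silently ignores authorized_keys whenever the home dir, ~/.ssh, or the key file itself is owned by the wrong user or is group/world-writable, which matches the symptoms above (the client offered the key; the server rejected it with nothing useful in its log). A minimal ownership sanity-check, using the paths from this log:
for f in /home/maltfield /home/maltfield/.ssh /home/maltfield/.ssh/authorized_keys; do
  # print owner:group, octal mode, and path for each element sshd checks
  stat -c '%U:%G %a %n' "$f"
done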
- I re-opened the screen session for the rsync, and found that it had exited with an error
75497472 100% 2.90MB/s 0:00:24 (xfer#4297, to-check=1538/7463) run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal 75497472 100% 2.80MB/s 0:00:25 (xfer#4298, to-check=1537/7463) run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal 23429120 31% 2.91MB/s 0:00:17 packet_write_wait: Connection to 10.241.189.11 port 32415: Broken pipe rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32) rsync: connection unexpectedly closed (119371 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9] real 1059m42.282s user 12m34.775s sys 3m5.253s [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ [maltfield@opensourceecology ~]$ time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- I updated the rsync command to exclude /run, and I kicked-off the rsync again
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- ah, ffs! my internet connection here failed me, and I was silently disconnected from my ssh session with the prod node and dumped into a local shell. So I ended up kicking off this rsync not from the prod node (on which I thought I was still ssh'd), but from my personal laptop. By the time I realized it, the fucking staging server was broken!
- fucking hell, I had successfully copied 35G overnight; now I have to restore from snapshot and start over.
- I prepended a fucking hostname check to make sure this stupid shit doesn't happen again
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- I had a bunch of issues restoring from snapshot; eventually I just did an rsync of the '/var/lib/lxcsnaps/osestaging1/snap1' dir to '/var/lib/lxc/osestaging1', and I was finally successfully able to `lxc-start -n osestaging1`
- I re-did the `visudo` edit and the install of rsync, and re-initiated the rsync from prod to staging using the above command. I noticed that I forgot to exclude the backups; here's what I should use next time
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- while that ran, I checked our munin graphs. I nice'd & bwlimit'd the above rsync, but it's still good to check.
- there's a spike in varnish requests, which is a bit odd
- there was a shift in memory usage, but no issues there
- load spiked to ~2, but our box has 8 cores; no problems
- there was a spike in 'nice' to ~100% cpu usage; cool
- firewall throughput, eth0 traffic spiked to about the same level as our backups. excellent
- there's a huge spike in disk reads, and disk IO that's much higher than during backups; hmm
- I also noted that the apache graphs that I added some time ago are blank; I probably have to setup an apache stats vhost for munin to scrape
- munin processing graphs are also blank; hmm
- all mysql graphs are also blank
- even nginx graphs are all blank
- I also added plugins for monitoring the 'mysqld' process and the memory of a bunch of processes
[root@opensourceecology plugins]# ls apache_access if_err_eth0 mysql_slowqueries uptime varnish_memory_usage.bak apache_processes if_eth0 mysql_threads users varnish_objects apache_volume interrupts nginx_request varnish4_ varnish_objects.bak cpu irqstats nginx_status varnish_backend_traffic varnish_request_rate df load open_files varnish_backend_traffic.bak varnish_request_rate.bak df_inode memory open_inodes varnish_bad varnish_threads diskstats munin_stats postfix_mailqueue varnish_bad.bak varnish_threads.bak entropy mysql_ postfix_mailvolume varnish_expunge varnish_transfer_rates forks mysql_bytes processes varnish_expunge.bak varnish_transfer_rates.bak fw_conntrack mysql_innodb proc_pri varnish_hit_rate varnish_uptime fw_forwarded_local mysql_isam_space_ swap varnish_hit_rate.bak varnish_uptime.bak fw_packets mysql_queries threads varnish_memory_usage vmstat [root@opensourceecology plugins]# ls -lah | head -n 5 total 36K drwxr-xr-x 2 root root 4.0K Sep 7 07:37 . drwxr-xr-x 8 root root 4.0K Jun 24 16:05 .. lrwxrwxrwx 1 root root 38 Sep 7 07:36 apache_access -> /usr/share/munin/plugins/apache_access lrwxrwxrwx 1 root root 41 Sep 7 07:36 apache_processes -> /usr/share/munin/plugins/apache_processes [root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multip multiping multips multips_memory [root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multips_memory [root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/ps_ ps_mysqld [root@opensourceecology plugins]#
- for the munin mysql graphs, it looks like I need to grant access for the 'munin' user
[root@opensourceecology plugin-conf.d]# munin-run --debug mysql_queries # Processing plugin configuration from /etc/munin/plugin-conf.d/amavis # Processing plugin configuration from /etc/munin/plugin-conf.d/df # Processing plugin configuration from /etc/munin/plugin-conf.d/fw_ # Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl # Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node # Processing plugin configuration from /etc/munin/plugin-conf.d/postfix # Processing plugin configuration from /etc/munin/plugin-conf.d/postgres # Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail # Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose # Setting /rgid/ruid/ to /99/99/ # Setting /egid/euid/ to /99 99/99/ # Setting up environment # Environment mysqlopts = -u munin # About to run '/etc/munin/plugins/mysql_queries' mysqladmin: connect to server at 'localhost' failed error: 'Access denied for user 'munin'@'localhost' (using password: NO)' [root@opensourceecology plugin-conf.d]#
- woah, this guide suggests that there are a ton more graphs available than just what is symlink-able https://blog.penumbra.be/2010/04/monitoring-mysql-munin-directadmin/
[root@opensourceecology plugins]# ls -lah mysql_* lrwxrwxrwx 1 root root 31 Sep 7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_ lrwxrwxrwx 1 root root 36 Sep 7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes lrwxrwxrwx 1 root root 37 Sep 7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb lrwxrwxrwx 1 root root 42 Sep 7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_ lrwxrwxrwx 1 root root 38 Sep 7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries lrwxrwxrwx 1 root root 42 Sep 7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries lrwxrwxrwx 1 root root 38 Sep 7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads [root@opensourceecology plugins]# ls -lah /usr/share/munin/plugins/mysql_* -rwxr-xr-x 1 root root 33K Mar 3 2017 /usr/share/munin/plugins/mysql_ -rwxr-xr-x 1 root root 1.8K Mar 3 2017 /usr/share/munin/plugins/mysql_bytes -rwxr-xr-x 1 root root 5.4K Mar 3 2017 /usr/share/munin/plugins/mysql_innodb -rwxr-xr-x 1 root root 5.7K Mar 3 2017 /usr/share/munin/plugins/mysql_isam_space_ -rwxr-xr-x 1 root root 2.5K Mar 3 2017 /usr/share/munin/plugins/mysql_queries -rwxr-xr-x 1 root root 1.5K Mar 3 2017 /usr/share/munin/plugins/mysql_slowqueries -rwxr-xr-x 1 root root 1.7K Mar 3 2017 /usr/share/munin/plugins/mysql_threads [root@opensourceecology plugins]# /usr/share/munin/plugins/mysql_ suggest bin_relay_log commands connections files_tables innodb_bpool innodb_bpool_act innodb_insert_buf innodb_io innodb_io_pend innodb_log innodb_rows innodb_semaphores innodb_tnx myisam_indexes network_traffic qcache qcache_mem replication select_types slow sorts table_locks tmp_tables [root@opensourceecology plugins]#
- I added all the mysql things
root@opensourceecology plugins]# ls -lah mysql_* lrwxrwxrwx 1 root root 31 Sep 7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_ lrwxrwxrwx 1 root root 36 Sep 7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes lrwxrwxrwx 1 root root 37 Sep 7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb lrwxrwxrwx 1 root root 42 Sep 7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_ lrwxrwxrwx 1 root root 38 Sep 7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries lrwxrwxrwx 1 root root 42 Sep 7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries lrwxrwxrwx 1 root root 38 Sep 7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads [root@opensourceecology plugins]# rm -rf mysql_* [root@opensourceecology plugins]# ln -sf /usr/share/munin/plugins/mysql_ mysql_ [root@opensourceecology plugins]# for i in `./mysql_ suggest`; \ > do ln -sf /usr/share/munin/plugins/mysql_ $i; done [root@opensourceecology plugins]# ls -lah mysql_* lrwxrwxrwx 1 root root 31 Oct 8 08:06 mysql_ -> /usr/share/munin/plugins/mysql_ [root@opensourceecology plugins]# ls -lah commands lrwxrwxrwx 1 root root 31 Oct 8 08:06 commands -> /usr/share/munin/plugins/mysql_ [root@opensourceecology plugins]#
- according to this guide, munin just needs a mysql user; it doesn't need any GRANTs to any databases, and that alone is sufficient http://www.mbrando.com/2007/08/06/how-to-get-your-mysql-munin-graphs-working/
create user munin@localhost identified by 'CHANGEME'; flush privileges;
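- a quick sanity-check that the grant-less user is enough (hedged; the counters munin wants come from SHOW GLOBAL STATUS, which any user with bare USAGE can run):
# should return the Queries counter without an access-denied error
mysql -u munin -p'CHANGEME' -e 'SHOW GLOBAL STATUS LIKE "Queries";'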
- and I added this stanza to /etc/munin/plugin-conf.d/zzz-ose
[mysql*]
user root
group wheel
env.mysqlopts -u munin_user -pOBFUSCATED
- test worked
[root@opensourceecology plugins]# munin-run --debug mysql_queries # Processing plugin configuration from /etc/munin/plugin-conf.d/amavis # Processing plugin configuration from /etc/munin/plugin-conf.d/df # Processing plugin configuration from /etc/munin/plugin-conf.d/fw_ # Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl # Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node # Processing plugin configuration from /etc/munin/plugin-conf.d/postfix # Processing plugin configuration from /etc/munin/plugin-conf.d/postgres # Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail # Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose # Setting /rgid/ruid/ to /99/0/ # Setting /egid/euid/ to /99 99 10/0/ # Setting up environment # Environment mysqlopts = -u munin_user -pqd2qQiFdeNGepvhv5dsQx4rVt7pRyFJ # About to run '/etc/munin/plugins/mysql_queries' delete.value 837242 insert.value 896145 replace.value 1197242 select.value 148647861 update.value 1721521 cache_hits.value 0 [root@opensourceecology plugins]#
- now for nginx, I confirmed that we do have the ability to spit out the status page
[root@opensourceecology plugins]# nginx -V 2>&1 | grep -o with-http_stub_status_module with-http_stub_status_module [root@opensourceecology plugins]#
- I tried adding a block for '/nginx_status' only accessible to '127.0.0.1', but I still got 403'd when attempting to access it via curl on the local machine
- the access logs showed it being accessed from an ipv6 address
2a01:4f8:172:209e::2 - - [08/Oct/2019:08:37:49 +0000] "GET /nginx_status HTTP/1.1" 403 162 "-" "curl/7.29.0" "-"
- I guess the request has to go out over eth0 because nginx is necessarily bound to that public ip (it's not listening on 127.0.0.1)
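- a quick way to confirm which source address nginx will see (and therefore what the acl must allow) is to force curl over v4 vs v6 and compare; a sketch:
# whichever request succeeds/fails tells us which source address hit the acl;
# the exact address appears in the access log either way
curl -4 -s -o /dev/null -w 'v4: %{http_code}\n' https://www.opensourceecology.org/nginx_status
curl -6 -s -o /dev/null -w 'v6: %{http_code}\n' https://www.opensourceecology.org/nginx_status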
- I used the following block
# stats for munin
location /nginx_status {
  stub_status on;
  access_log off;
  allow 127.0.0.1/32;
  allow 138.201.84.223/32;
  allow 138.201.84.243/32;
  allow ::1/128;
  allow 2a01:4f8:172:209e::2/128;
  allow fe80::921b:eff:fe94:7c4/128;
  deny all;
}
- and it worked!
[root@opensourceecology conf.d]# nginx -t nginx: the configuration file /etc/nginx/nginx.conf syntax is ok nginx: configuration file /etc/nginx/nginx.conf test is successful [root@opensourceecology conf.d]# service nginx reload Redirecting to /bin/systemctl reload nginx.service [root@opensourceecology conf.d]# curl https://www.opensourceecology.org/nginx_status Active connections: 1 server accepts handled requests 16063989 16063989 27383851 Reading: 0 Writing: 1 Waiting: 0 [root@opensourceecology conf.d]#
- I found that the munin nginx plugins wouldn't work unless I installed the 'perl-LWP-Protocol-https' package
[root@opensourceecology plugins]# yum install perl-LWP-Protocol-https ... Installed: perl-LWP-Protocol-https.noarch 0:6.04-4.el7 Dependency Installed: perl-Mozilla-CA.noarch 0:20130114-5.el7 Complete! [root@opensourceecology plugins]#
- I added nginx configs for both the wiki & osemain. If all is well, I'll add the configs for our other vhosts
- I didn't bother with apache for now (also, the acl will be confusing since apache sees all traffic coming from 127.0.0.1 via varnish)
- meanwhile, some of the mysql graphs are populating. good!
- and meanwhile, the rsync is still going; it's currently at "var/lib/mysql", copying our mysql databases' data. cool.
- ...
- after a few hours, I checked-up on rsync; it was stuck again
var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722193549!CEBPressJuneGroup.fcstd 4840012 100% 2.56MB/s 0:00:01 (xfer#344966, to-check=1043/396314) var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722195024!CEBPressJuneGroup.fcstd 950272 19% 879.62kB/s 0:00:04
- the vpn client appears to have disconnected, and I can't ping the staging host at all from prod
[maltfield@opensourceecology ~]$ ping 10.241.189.11 PING 10.241.189.11 (10.241.189.11) 56(84) bytes of data. ^C --- 10.241.189.11 ping statistics --- 59 packets transmitted, 0 received, 100% packet loss, time 57999ms [maltfield@opensourceecology ~]$
- I manually exited-out of the openvpn connection & reinitiated it; pings now work. After about 60 seconds, the rsync started outputting again..
- when I went to check the size of the lxc container, I was told <1G, which can't be right
[root@osedev1 lxc]# du -sh /var/lib/lxc/osestaging1 604M /var/lib/lxc/osestaging1 [root@osedev1 lxc]#
- ncdu pointed me to the snap1 dir, which is currently 48G
[root@osedev1 lxc]# du -sh /var/lib/lxcsnaps/osestaging1/snap1 48G /var/lib/lxcsnaps/osestaging1/snap1 [root@osedev1 lxc]#
- apparently this is the consequence of restoring a snapshot just by doing an rsync; the snapshot's config file has a new line that identifies the rootfs path explicitly as the snapshot's rootfs
[root@osedev1 lxc]# tail /var/lib/lxc/osestaging1/config lxc.cap.drop = mac_admin lxc.cap.drop = mac_override lxc.cap.drop = setfcap lxc.cap.drop = sys_module lxc.cap.drop = sys_nice lxc.cap.drop = sys_pacct lxc.cap.drop = sys_rawio lxc.cap.drop = sys_time lxc.hook.clone = /usr/share/lxc/hooks/clonehostname lxc.rootfs = /var/lib/lxcsnaps/osestaging1/snap1/rootfs [root@osedev1 lxc]#
- perhaps that means the snap1 dir now holds the container's *real* data, and what's left under /var/lib/lxc/osestaging1 is effectively stale
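- if so, the fix would presumably be to copy the rootfs back under the container's own dir and repoint lxc.rootfs; a rough sketch with the paths from above (untested as written):
lxc-stop -n osestaging1
rsync -a /var/lib/lxcsnaps/osestaging1/snap1/rootfs/ /var/lib/lxc/osestaging1/rootfs/
# point the container back at its own rootfs instead of the snapshot's
sed -i 's|^lxc.rootfs = .*|lxc.rootfs = /var/lib/lxc/osestaging1/rootfs|' /var/lib/lxc/osestaging1/config
lxc-start -n osestaging1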
- while rsync continued, I noted that my nginx graphs are appearing, but there's no label that differentiates the wiki from osemain's graphs
- I can see the list of variables my plugin defines by default with the `munin-run <plugin> config` command https://munin.opensourceecology.org:4443/nginx-day.html
[root@opensourceecology plugins]# munin-run nginx_www.opensourceecology.org_status config graph_title NGINX status graph_args --base 1000 graph_category nginx graph_vlabel Connections total.label Active connections total.info Active connections total.draw LINE2 reading.label Reading reading.info Reading reading.draw LINE2 writing.label Writing writing.info Writing writing.draw LINE2 waiting.label Waiting waiting.info Waiting waiting.draw LINE2 [root@opensourceecology plugins]#
- so it looks like I can set this as 'graph_title' or 'graph_info'
- I restarted munin-node and triggered the munin-cron to update the html pages
[root@opensourceecology plugins]# service munin-node restart Redirecting to /bin/systemctl restart munin-node.service [root@opensourceecology plugins]# [root@opensourceecology plugins]# sudo -u munin /usr/bin/munin-cron
- the new variables didn't affect anything, so I started grepping the logs
- unrelated, the logs complained about mysql auth failure for:
- network_traffic
- select_types
- innodb_tnx
- innodb_log
- sorts
- myisam_indexes
- qcache_mem
- innodb_io
- connections
- qcache
- innodb_insert_buf
- replication
- bin_relay_log
- mysql_queries
- innodb_rows
- innodb_bpool_act
- files_tables
- commands
- innodb_bpool
- tmp_tables
- innodb_semaphores
- innodb_io_pend
- table_locks
- slow
- but there was nothing related to nginx
- I tried overriding the graph_title in the plugins, but it didn't work
- I found the datafile for munin in /var/lib/munin/datafile. This is clearly where the graph title is defined before being generated into html files
[root@opensourceecology plugins]# grep nginx /var/lib/munin/datafile | grep -i graph_title localhost;localhost:nginx_wiki_opensourceecology_org_request.graph_title Nginx requests localhost;localhost:nginx_wiki_opensourceecology_org_status.graph_title NGINX status localhost;localhost:nginx_www_opensourceecology_org_status.graph_title NGINX status localhost;localhost:nginx_www_opensourceecology_org_request.graph_title Nginx requests [root@opensourceecology plugins]#
- I found that I *could* override the title in /etc/munin/munin.conf https://www.aroundmyroom.com/2015/01/10/munin-help-needed/
[localhost]
address 127.0.0.1
use_node_name yes
nginx_www_opensourceecology_org_status.graph_title Nginx Status (www.opensourceecology.org)
nginx_wiki_opensourceecology_org_status.graph_title Nginx Status (wiki.opensourceecology.org)
- ...
- meanwhile, the rsync finished!
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/ ... var/www/html/www.opensourceecology.org/htdocs/wp-includes/widgets/class-wp-widget-text.php 20735 100% 21.05kB/s 0:00:00 (xfer#450852, to-check=0/517755) var/yp/ sent 59229738371 bytes received 11198208 bytes 2959309.47 bytes/sec total size is 77965794338 speedup is 1.32 rsync warning: some files vanished before they could be transferred (code 24) at main.c(1052) [sender=3.0.9] real 333m37.655s user 19m50.292s sys 6m0.997s [maltfield@opensourceecology ~]$
- but I still can't ssh into it; again, my home dir is owned by the wrong user
[root@osestaging1 ~]# ls -lah /home/maltfield/.ssh total 16K drwxr-xr-x. 2 tgriffing tgriffing 4.0K Jan 19 2018 . drwx------. 10 tgriffing tgriffing 4.0K Oct 3 07:06 .. -rw-r--r--. 1 root root 750 Jun 20 2017 authorized_keys -rw-r--r--. 1 tgriffing tgriffing 1.1K Oct 3 13:44 known_hosts [root@osestaging1 ~]#
- maybe I should add the '--numeric-ids' option, since rsync appears to be re-mapping the uids?
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- I found that the 'sync.old' dir was still trying to sync, so I updated the command to add a wildcard after the exclude; it worked
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- this time the double-tap (second full rsync pass) took only 3 minutes of wall time
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/ ... var/www/html/munin/static/zoom.js 4760 100% 1.13MB/s 0:00:00 (xfer#2239, to-check=1002/321739) sent 224884435 bytes received 1668273 bytes 1352553.48 bytes/sec total size is 41283867704 speedup is 182.23 real 2m46.967s user 0m32.382s sys 0m8.095s [maltfield@opensourceecology ~]$
- this time the permissions of my home dir didn't break, and I was able to ssh-in.
- I'd like to take a snapshot of the staging server, but at this point we don't have space for it
[root@osedev1 lxc]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 19G 3.4G 15G 19% / devtmpfs 873M 0 873M 0% /dev tmpfs 896M 0 896M 0% /dev/shm tmpfs 896M 17M 879M 2% /run tmpfs 896M 0 896M 0% /sys/fs/cgroup /dev/mapper/ose_dev_volume_1 125G 94G 25G 80% /mnt/ose_dev_volume_1 tmpfs 180M 0 180M 0% /run/user/1000 [root@osedev1 lxc]#
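- for reference, the space math (lxc-snapshot on a plain directory-backed container copies the whole rootfs, so we'd need roughly another rootfs-worth of free space); a sketch:
du -sh /var/lib/lxc/osestaging1/rootfs   # approximate size a new snapshot would consume
df -h /var/lib/lxcsnaps                  # free space on whatever filesystem holds the snapshots
# if there were room: lxc-stop -n osestaging1 && lxc-snapshot -n osestaging1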
- ok, now, drum roll: did we break the staging server? let's try to shut it down & start it again.
- aaaaand: IT CAME BACK UP! Now it said its hostname isn't 'osestaging1' but 'opensourceecology'. Coolz.
- I was successfully able to ssh into it, but then it froze. And my attempts to login to the lxc-console all end in timeouts
opensourceecology login: maltfield Password: login: timed out after 60 seconds CentOS Linux 7 (Core) Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64 opensourceecology login:
- if I attempt to login as root, then it just times-out before it even asks me for a password
opensourceecology login: root login: timed out after 60 seconds CentOS Linux 7 (Core) Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64 opensourceecology login:
- ssh auth succeeds, but it also fails before I get a shell
... debug1: Authentication succeeded (publickey). Authenticated to 10.241.189.11 ([10.241.189.11]:32415). debug1: channel 0: new [client-session] debug3: ssh_session2_open: channel_new: 0 debug2: channel 0: send open debug3: send packet: type 90 debug1: Requesting no-more-sessions@openssh.com debug3: send packet: type 80 debug1: Entering interactive session. debug1: pledge: network
- I stopped the container again. This time when I tried to start it, I got an error
[root@osedev1 ~]# lxc-start -n -osestaging1 lxc-start: lxc_start.c: main: 290 Executing '/sbin/init' with no configuration file may crash the host [root@osedev1 ~]#
- I moved some dirs around so that I'm no longer using the 'rootfs' dir from the snaps dir, but now I get this damn message. my duckduckgo searches are dead-ends
[root@osedev1 lxc]# lxc-start -n osestaging1 lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1' [root@osedev1 lxc]# lxc-start -P /var/lib/lxc/ -n osestaging1 lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1' [root@osedev1 lxc]#
- I tried rebooting the dev server. after it came up, I still got the same error when attempting to `lxc-start`
- I found I could get debug logs by adding `-l log -o <file>` https://github.com/lxc/lxc/issues/1555
[root@osedev1 ~]# lxc-start -n osestaging1 -l debug -o lxc-start.log lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1' [root@osedev1 ~]# cat lxc-start.log ...
- all the god damn google results on this "sync wake failure" shit (which are already few) are regarding configs of multiple containers sharing a network. I'll destroy the whole network namespace if needed. but how? why does nobody else encounter this damn issue?
- well, I found the source code. could be an issue with an open file descriptor or something? https://fossies.org/linux/lxc/src/lxc/sync.c
- my best guess is that it's an issue with the 'rootfs.dev' symlink
[root@osedev1 lxc]# ls -lah osestaging1 total 28K drwxrwx---. 5 root root 4.0K Oct 8 16:17 . drwxr-xr-x. 6 root root 4.0K Oct 8 16:05 .. -rw-r--r--. 1 root root 1.1K Oct 8 15:46 config drwxr-xr-x. 3 root root 4.0K Oct 8 15:46 dev drwxr-xr-x. 2 root root 4.0K Oct 8 15:52 osestaging1 dr-xr-xr-x. 20 root root 4.0K Oct 8 15:21 rootfs lrwxrwxrwx. 1 root root 38 Oct 8 16:17 rootfs.dev -> /dev/.lxc/osestaging1.72930b02843095eb -rw-r--r--. 1 root root 19 Oct 3 15:40 ts [root@osedev1 lxc]#
- I commented-out every fucking line in the config file that had the word 'dev' in it...and the system started! Except that, umm, I couldn't connect to its console?
[root@osedev1 lxc]# lxc-start -n osestaging1 -f osestaging1/config -l trace -o lxc-start.log Failed to create unit file /run/systemd/generator.late/netconsole.service: File exists Failed to create unit file /run/systemd/generator.late/network.service: File exists Running in a container, ignoring fstab device entry for /dev/disk/by-uuid/1e457b76-5100-4b53-bcdc-667ca122b941. Running in a container, ignoring fstab device entry for /dev/mapper/ose_dev_volume_1. Failed to create unit file /run/systemd/generator/systemd-cryptsetup@ose_dev_volume_1.service: File exists lxc-start: console.c: lxc_console_peer_proxy_alloc: 315 console not set up
- I found that if I commented-out the first line and added-back a rootfs line, I could get it to boot again, but I couldn't login from the console (same 60 second timeout) or ssh in (or ping it)
#lxc.mount.entry = /dev/net dev/net none bind,create=dir
...
lxc.rootfs = /var/lib/lxc/osestaging1/rootfs
- I uncommented the first line, and it still started! looks like the issue was that I didn't explicitly define a rootfs..
- this time I could ping the server from my laptop over the vpn
- I was able to login as 'maltfield' from the console, but it locked-up when I tried to `sudo su -`
- on the next reboot, I tailed all the files in /var/log from the osedev1 server (inside the staging container's rootfs dir); I saw some interesting results
==> osestaging1/rootfs/var/log/messages <== Oct 8 14:50:00 opensourceecology NET[248]: /usr/sbin/dhclient-script : updated /etc/resolv.conf Oct 8 14:50:00 opensourceecology dhclient[201]: bound to 192.168.122.201 -- renewal in 1588 seconds. Oct 8 14:50:00 opensourceecology network: Determining IP information for eth0... done. Oct 8 14:50:00 opensourceecology network: [ OK ] Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/kernel/yama/ptrace_scope': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '16' to '/proc/sys/kernel/sysrq': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/kernel/core_uses_pid': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/rp_filter': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/rp_filter': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/default/accept_source_route': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/all/accept_source_route': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/promote_secondaries': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/promote_secondaries': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_hardlinks': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_symlinks': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/autoconf': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_dad': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_defrtr': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_rtr_pref': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_pinfo': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_source_route': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_redirects': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/forwarding': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/autoconf': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_dad': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_defrtr': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_pinfo': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_source_route': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_redirects': Read-only file system Oct 8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/forwarding': Read-only file system Oct 8 14:50:01 opensourceecology systemd: Started LSB: Bring up/down networking.
- and issues with /run
Oct 8 14:50:05 opensourceecology systemd-logind: Failed to remove runtime directory /run/user/0: Device or resource busy
Mon Oct 07, 2019
- I added a comment to our long-standing feature request with the Libre Office Online CODE project for the ability to draw lines & arrows in their online version of "present" https://bugs.documentfoundation.org/show_bug.cgi?id=113386#c4
- wiki updates & logging
- I tried to login to my hetzner cloud account, but I got "Account is disabled". fucking hell, so much for user-specific auditing. I logged in with our shared account..
- I confirmed that our osedev1 node has a 20G disk + 10G volume.
- we're currently using 3.4G of 19G on osedev1; I never setup the 10G volume that appears to be at /mnt/HC_Volume_3110278. It has 10G avail
[maltfield@osedev1 ~]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 19G 3.4G 15G 19% / devtmpfs 873M 0 873M 0% /dev tmpfs 896M 0 896M 0% /dev/shm tmpfs 896M 25M 871M 3% /run tmpfs 896M 0 896M 0% /sys/fs/cgroup /dev/sdb 9.8G 37M 9.3G 1% /mnt/HC_Volume_3110278 tmpfs 180M 0 180M 0% /run/user/1000 [maltfield@osedev1 ~]$ ls -lah /mnt/HC_Volume_3110278/ total 24K drwxr-xr-x. 3 root root 4.0K Aug 20 11:50 . drwxr-xr-x. 3 root root 4.0K Aug 20 12:16 .. drwx------. 2 root root 16K Aug 20 11:50 lost+found [maltfield@osedev1 ~]$
- the RAID1'd disk on prod is 197G with 75G used
[maltfield@opensourceecology ~]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/md2 197G 75G 113G 40% / devtmpfs 32G 0 32G 0% /dev tmpfs 32G 8.0K 32G 1% /dev/shm tmpfs 32G 2.6G 29G 9% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md1 488M 289M 174M 63% /boot tmpfs 6.3G 0 6.3G 0% /run/user/0 tmpfs 6.3G 0 6.3G 0% /run/user/1005 [maltfield@opensourceecology ~]$
- a quick duckduck pulled up this guide for using luks to create an encrypted volume out of hetzner block volumes; this is a good idea https://angristan.xyz/how-to-use-encrypted-block-storage-volumes-hetzner-cloud/
- the guide shows a method for resizing the encrypted volume. I didn't think that would be trivial, but it appears that resize2fs can increase the size of a luks-encrypted volume without issue; this is good to know. if we run out of space (or maybe we create a second staging node or ad-hoc dev nodes), we should be able to shutdown all our lxc containers, unmount the block drive, resize it, and remount it. That said, I don't think we'll be making backups of these (dev/staging) containers, so if we fuck up, it would be bad.
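- the grow procedure, as I understand it from that guide, would look roughly like this (device/mapper names are assumptions based on our setup; newer cryptsetup may prompt for the key on resize):
# stop the lxc containers first, then:
umount /mnt/ose_dev_volume_1
cryptsetup resize ose_dev_volume_1        # extend the open dm-crypt mapping to the resized block device
e2fsck -f /dev/mapper/ose_dev_volume_1    # required before an offline grow
resize2fs /dev/mapper/ose_dev_volume_1    # grow ext4 to fill the new space
mount /mnt/ose_dev_volume_1               # assumes an fstab entry for the mountpoint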
- our 10G hetzner cloud block volume has been costing 0.48 EUR/mo = 5.76 EUR/yr
- the min needed for our current prod server is 75G. The slider on the product page has weird increments, but the actual "resize volume" option in the cloud console wui permits resizing in 1G increments. A 75G volume would cost 3.00 EUR/mo = 36 EUR/yr
- A much more sane choice would be equal to the disk on prod = 197G = 7.88 EUR/mo = 94.56 EUR/yr
- fuck, I asked Marcin for $100/yr. Currently we're spending 2.49 EUR/mo on the osedev1 instance alone. That's 29.88 EUR/yr = 32.81 USD/yr. For a 100 USD/yr budget, that leaves 67.19 USD for disk space = 61.19 EUR/yr. That's 5.09 EUR/mo, which will buy us a 127G volume at 5.08 EUR/mo.
- 127/197 = 0.64. Therefore, a 127G block volume will allow an lxc staging node to replicate our prod node until our prod node grows beyond 64% capacity. 70% is a good general high-water-mark at which we'd need to look at migrating prod anyway. This (127G) seems like a reasonable low-budget solution that meets the 100 USD/yr line.
- I resized our 10G 'ose-dev-volume-1' volume to 127G in the hetzner WUI.
- I clicked the 'enable protection' option, which prevents it from being deleted until the protection is manually removed
- the 'show configuration' window in the wui tells us that the volume is '/dev/disk/by-id/scsi-0HC_Volume_3110278' on osedev1
- on the box itself, the volume appears as /dev/sdb
[maltfield@osedev1 ~]$ mount sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=893568k,nr_inodes=223392,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory) cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,clone_children) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) debugfs on /sys/kernel/debug type debugfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11033) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) /dev/sdb on /mnt/HC_Volume_3110278 type ext4 (rw,relatime,seclabel,discard,data=ordered) tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=183308k,mode=700,uid=1000,gid=1000) [maltfield@osedev1 ~]$
- but the other name appears in fstab
[root@osedev1 ~]# cat /etc/fstab # # /etc/fstab # Created by anaconda on Sun Jul 14 04:14:25 2019 # # Accessible filesystems, by reference, are maintained under '/dev/disk' # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info # UUID=1e457b76-5100-4b53-bcdc-667ca122b941 / ext4 defaults 1 1 /dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/HC_Volume_3110278 ext4 discard,nofail,defaults 0 0 [root@osedev1 ~]#
- ah, indeed, the above disk is just a link back to /dev/sdb
[root@osedev1 ~]# ls -lah /dev/disk/by-id/scsi-0HC_Volume_3110278 lrwxrwxrwx. 1 root root 9 Oct 7 10:31 /dev/disk/by-id/scsi-0HC_Volume_3110278 -> ../../sdb [root@osedev1 ~]#
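- for future reference, `readlink -f` resolves these by-id links in one shot:
readlink -f /dev/disk/by-id/scsi-0HC_Volume_3110278   # prints /dev/sdb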
- before I rebuild this volume, the cryptsetup command raises the question: where do I store the key?
- assuming I want the server to be able to restart by itself without user interaction, the key should probably be stored in a file somewhere under '/root' on 'osedev1'. but while my OS would lock-down the permissions on that file, the key file itself would likely be stored unencrypted on some hetzner drive somewhere. Is it worth encrypting the contents of the block volume when the encryption key itself might be stored unencrypted somewhere in hetzner's datacenter?
- as a test, I ran `testdisk` to see if I could find any deleted files from previous customers in the 10G volume that hetzner gave us; I couldn't.
- someone asked about this, but there wasn't much great discussion on how hetzner provisions their disks https://serverfault.com/questions/950790/cloud-server-vulnerability-analysis?noredirect=1
- so risk assessment: when working in a cloud, we have to accept the integrity of the cloud provider. If a rogue hetzner employee wants to steal all our data, they can. There's absolutely nothing we can do about that other than building the servers ourselves and physically locking them down. The decision to use hetzner predates me, but I agree with it. It does not make sense for OSE to buy a server rack and host our equipment at FeF. So, I accept the risk and trust that hetzner will not do something malicious that puts our data at risk
- the real concern here is that we resize our volume (or hetzner in the background shuffles some abstracted blocks around physical devices in a way that's black-boxed to us), and a different customer suddenly gets, for example, our users' PII in their new volume. Or a malicious hetzner cloud user triggers some shuffling and successfully exfiltrates our data from their cloud without breaking into our server. This is the risk that we're trying to prevent. In this case, I think it *is* worthwhile to encrypt our block volume. The chances that someone is able to get chunks of our data from an old 127G block volume that lacked encryption are significantly higher than them getting those chunks *and* the key from our server *and* being able to use the key to extract meaningful data from the likely non-contiguous bits that may be extracted from our recycled block volume data.
- hetzner does not have a clean record, but hardly anybody does. That incident was only customer account data, though, not their customers' server contents https://mybroadband.co.za/news/cloud-hosting/279181-hetzner-client-data-exposed-after-attack.html
- so, while recognizing that it has limitations, I also recognize that there are sufficient benefits to justify encrypting this block volume with a key stored unencrypted on our cloud instance
- meanwhile, I found a guide for how to migrate the contents of /var to a block volume. It suggested doing so from a rescue disk, then editing fstab for the next reboot https://serverfault.com/questions/947732/how-to-add-hetzner-cloud-disk-volume-to-extend-var-partition
- I created a new key file on my laptop, stored it in our shared keepass, and uploaded it to the server at /root/keys/ose-dev-volume-1.201910.key
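- the rough build plan for the encrypted volume, following the angristan guide above (a sketch; device and mapper names are taken from this log, and luksFormat will destroy the existing ext4 fs):
key=/root/keys/ose-dev-volume-1.201910.key
dev=/dev/disk/by-id/scsi-0HC_Volume_3110278
cryptsetup luksFormat "${dev}" "${key}"                            # WARNING: wipes the volume
cryptsetup luksOpen --key-file "${key}" "${dev}" ose_dev_volume_1
mkfs.ext4 /dev/mapper/ose_dev_volume_1
mkdir -p /mnt/ose_dev_volume_1
mount /dev/mapper/ose_dev_volume_1 /mnt/ose_dev_volume_1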
- let's shutdown osedev1 and migrate its /var/ to a block volume. First I'll shutdown the osestaging1 staging lxc container, then the host osedev1
[root@osedev1 ~]# lxc-stop -n osestaging1 [root@osedev1 ~]# shutdown -h now Connection to 195.201.233.113 closed by remote host. Connection to 195.201.233.113 closed. user@ose:~$
- I confirmed that the server was off in the hetzner cloud console wui
- I clicked on the server. I'm not clear if I should mount a rescue disk or click the "rescue" option. No idea what the latter is, so I navigated to "ISO IMAGES", found SystemRescueCD, and clicked the "MOUNT" button next to it. I went back to the "servers"
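- for the record, the rough /var migration plan from the rescue environment, per the serverfault guide above (a sketch; device names assumed from earlier in this log):
# from the SystemRescueCD shell:
mount /dev/sda1 /mnt/root                                        # osedev1's root fs
cryptsetup luksOpen --key-file ose-dev-volume-1.201910.key /dev/sdb ose_dev_volume_1
mkdir -p /mnt/newvar
mount /dev/mapper/ose_dev_volume_1 /mnt/newvar
rsync -aHAX /mnt/root/var/ /mnt/newvar/
mv /mnt/root/var /mnt/root/var.old && mkdir /mnt/root/var        # keep a fallback until verified
# then add the /var mount to /mnt/root/etc/fstab (plus the luksOpen at boot) and reboot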
- wiki updates & logging
- I tried to login to my hetzner cloud account, but I got "Account is disabled" fucking hell. so much for user-specific auditing. I logged-in with our shared account..
- I confirmed that our osedev1 node has a 20G disk + 10G volume.
- we currently are using 3.4/19G on osedev1; I never setup the 10G volume that appears to be at /mnt/HC_Volume_3110278. It has 10G avail
[maltfield@osedev1 ~]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 19G 3.4G 15G 19% / devtmpfs 873M 0 873M 0% /dev tmpfs 896M 0 896M 0% /dev/shm tmpfs 896M 25M 871M 3% /run tmpfs 896M 0 896M 0% /sys/fs/cgroup /dev/sdb 9.8G 37M 9.3G 1% /mnt/HC_Volume_3110278 tmpfs 180M 0 180M 0% /run/user/1000 [maltfield@osedev1 ~]$ ls -lah /mnt/HC_Volume_3110278/ total 24K drwxr-xr-x. 3 root root 4.0K Aug 20 11:50 . drwxr-xr-x. 3 root root 4.0K Aug 20 12:16 .. drwx------. 2 root root 16K Aug 20 11:50 lost+found [maltfield@osedev1 ~]$
- the disk RAID1'd disk on prod is 197G with 75G used
[maltfield@opensourceecology ~]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/md2 197G 75G 113G 40% / devtmpfs 32G 0 32G 0% /dev tmpfs 32G 8.0K 32G 1% /dev/shm tmpfs 32G 2.6G 29G 9% /run tmpfs 32G 0 32G 0% /sys/fs/cgroup /dev/md1 488M 289M 174M 63% /boot tmpfs 6.3G 0 6.3G 0% /run/user/0 tmpfs 6.3G 0 6.3G 0% /run/user/1005 [maltfield@opensourceecology ~]$
- a quick duckduck pulled up this guide for using luks to create an encrypted volume out of hetzner block volumes; this is a good idea https://angristan.xyz/how-to-use-encrypted-block-storage-volumes-hetzner-cloud/
- the guide shows a method for resizing the encrypted volume. I didn't think that would be trivial, but it appears that resize2fs can increase the size of a luks-encrypted volume without issue. this is good to know. if we run out of space (or maybe we create a second staging node or ad-hoc dev nodes), we should be able to shutdown all our lxc containers, unmount the block drive, resize it, and remount it. That said, I don't think we'll be making backups of these (dev/staging) containers, so if we fuck up it would be bad.
- our 10G hetzner cloud block volume has been costing 0.48 EUR/mo = 5.76 EUR/yr
- the min needed for our current prod server is 75G. The slider on the product page has weird increments, but the actual "resize volume" option in the cloud console wui permits resizing in 1G increments. A 75G volume would cost 3.00 EUR/mo = 35 EUR/yr
- A much more sane choice would be equal to the disk on prod = 197G = 7.88 EUR/mo = 94.56 EUR/yr
- fuck, I asked Marcin for $100/yr. Currently we're spending 2.49/mo on the osedev1 instance alone. That's 29.88 EUR/yr = 32.81 USD/yr. For a 100 USD/yr budget, that leaves 67.19 USD for disk space = 61.19 EUR/yr. That's 5.09 EUR/mo, which will buy us a 127G volume at 5.08 EUR/mo.
- 127/197 = 0.64. Therefore, a 127G block volume will allow for an lxc staging node to replicate our prod node until our prod node grows beyond 64% capacity. 70% is a good general high-water-mark at which we'd need to look at migrating prod anyway. This (127G) seems like a resonable low-budget solution that meets the 100 USD/yr line.
- I resized our 10G 'ose-dev-volume-1' volume to 127G in the hetzner WUI.
- I clicked the 'enable protection' option, which prevents it from being deleted until the protection is manually removed
- the 'show configuration' window in the wui tells us that the volume is '/dev/disk/by-id/scsi-0HC_Volume_3110278' on osedev1
- the box itself looks like it's really /dev/sdb
[maltfield@osedev1 ~]$ mount sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=893568k,nr_inodes=223392,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory) cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,clone_children) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) debugfs on /sys/kernel/debug type debugfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11033) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) /dev/sdb on /mnt/HC_Volume_3110278 type ext4 (rw,relatime,seclabel,discard,data=ordered) tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=183308k,mode=700,uid=1000,gid=1000) [maltfield@osedev1 ~]$
- but the other name appears in fstab
[root@osedev1 ~]# cat /etc/fstab # # /etc/fstab # Created by anaconda on Sun Jul 14 04:14:25 2019 # # Accessible filesystems, by reference, are maintained under '/dev/disk' # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info # UUID=1e457b76-5100-4b53-bcdc-667ca122b941 / ext4 defaults 1 1 /dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/HC_Volume_3110278 ext4 discard,nofail,defaults 0 0 [root@osedev1 ~]#
- ah, indeed, the above disk is just a link back to /dev/sdb
[root@osedev1 ~]# ls -lah /dev/disk/by-id/scsi-0HC_Volume_3110278 lrwxrwxrwx. 1 root root 9 Oct 7 10:31 /dev/disk/by-id/scsi-0HC_Volume_3110278 -> ../../sdb [root@osedev1 ~]#
- before I rebuild this volume, the cryptsetup command raises the question: where do I store the key?
- assuming I want the server to be able to restart by itself without user interaction, the key should probably be stored in a file somewhere on '/root' on 'osedev1' but while my OS would lock-down the permissions to that file, the key file itself would likely be stored unencrypted on some hetzner drive somewhere. Is it worth encrypting the contents of the block volume when the encryption key itself might be stored unencrypted somewhere at hetzner's datacenter?
- as a test, I ran `testdisk` to see if I could find any deleted files from previous customers in the 10G volume that hetzner gave us; I couldn't.
- someone asked about this, but there wasn't much great discussion on how hetzner provisions their disks https://serverfault.com/questions/950790/cloud-server-vulnerability-analysis?noredirect=1
- so, risk assessment: when working in a cloud, we have to accept the integrity of the cloud provider. If a rogue hetzner employee wants to steal all our data, they can; there's absolutely nothing we can do about that other than building the servers ourselves and physically locking them down. The decision to use hetzner predates me, but I agree with it. It does not make sense for OSE to buy a server rack and host our equipment at FeF. So I accept the risk and trust that hetzner will not do something malicious that would put our data at risk
- the real concern here is that we resize our volume (or hetzner in the background shuffles some abstracted blocks around physical devices in a way that's black-boxed to us), and a different customer suddenly gets, for example, our users' PII in their new volume. Or a malicious hetzner cloud user triggers some shuffling and successfully exfiltrates our data from their cloud without breaking into our server. This is the risk that we're trying to prevent. In this case, I think it *is* worthwhile to encrypt our block volume. The chances that someone is able to get chunks of our data from an old 127G block volume that lacked encryption are significantly higher than them being able to get those chunks *and* the key from our server *and* be able to use the key to extract meaningful data from the likely non-contiguous bits that may be extracted from our recycled block volume data.
- hetzner does not have a clean record, but hardly anybody does. That incident exposed only customer account data, though, not their customers' server contents https://mybroadband.co.za/news/cloud-hosting/279181-hetzner-client-data-exposed-after-attack.html
- so, while recognizing that it has limitations, I also recognize that there are sufficient benefits to justify encrypting this block volume with a key stored unencrypted on our cloud instance
- meanwhile, I found a guide for how to migrate the contents of /var to a block volume. It suggested doing so from a rescue disk, then editing fstab for the next reboot https://serverfault.com/questions/947732/how-to-add-hetzner-cloud-disk-volume-to-extend-var-partition
- I created a new key file on my laptop, stored it in our shared keepass, and uploaded it to the server at /root/keys/ose-dev-volume-1.201910.key
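- the session above only records the key's upload destination; a minimal sketch of one way such a keyfile can be generated and locked down (the dd/chmod/chown commands are my reconstruction, not from the log; the 4K size just matches the other keys in this log):
mkdir -p /root/keys
dd if=/dev/urandom of=/root/keys/ose-dev-volume-1.201910.key bs=4096 count=1
chown root:root /root/keys/ose-dev-volume-1.201910.key
chmod 0400 /root/keys/ose-dev-volume-1.201910.key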
- let's shut down osedev1 and migrate its /var/ to a block volume. First I'll shut down the osestaging1 staging lxc container, then the host osedev1
[root@osedev1 ~]# lxc-stop -n osestaging1 [root@osedev1 ~]# shutdown -h now Connection to 195.201.233.113 closed by remote host. Connection to 195.201.233.113 closed. user@ose:~$
- I confirmed that the server was off in the hetzner cloud console wui
- I clicked on the server. I'm not clear if I should mount a rescue disk or click the "rescue" option. No idea what the latter is, so I navigated to "ISO IMAGES", found SystemRescueCD, and clicked the "MOUNT" button next to it. I went back to the "servers" page, opened a console for 'osedev1', and clicked "Power on"
- the console showed the boot options for the rescue cd. I chose the first menu item = "SystemRescueCd: default boot options"
- I can't copy & paste from the console, but I basically found 5x items in /dev/disk/by-id/
- the DVD for systemrescue
- my 127G block volume with the same name shown above (scsi-0HC_Volume_3110278)
- scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0
- scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part1
- another DVD?
- so 3 & 4 must be our osedev1 disk. Both are 19.1G
- attempting to mount the one without '-part1' failed, but the one with '-part1' succeeded, and all my data was there. It was mounted to '/mnt/osedev1-part/'
- I formatted the new 127G ebs volume using cryptsetup
cryptsetup luksFormat /dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key
- I opened the new encrypted luks volume and created an ext4 filesystem on it
cryptsetup luksOpen --key-file /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key /dev/disk/by-id/scsi-0HC_Volume_3110278 ebs
mkfs.ext4 -j /dev/mapper/ebs
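- a few optional sanity checks at this point (I didn't run these in the session above, but they're standard cryptsetup/blkid invocations):
cryptsetup luksDump /dev/disk/by-id/scsi-0HC_Volume_3110278    # show the LUKS header & keyslots
cryptsetup status ebs                                          # confirm the 'ebs' mapping is active
blkid /dev/mapper/ebs                                          # should report TYPE="ext4" after the mkfs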
- I mounted the new FS & began syncing osedev1's 'var' dir (now only 2.3G) to it
mkdir /mnt/ebs
mount /dev/mapper/ebs /mnt/ebs
rsync -av --progress /mnt/osedev1/var /mnt/ebs/
- I added entries to /etc/fstab & /etc/crypttab to auto-mount the volume to /mnt/ose_dev_volume_1/ (sketched below)
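- I didn't paste the exact entries; reconstructed from the device path, key location, and the 'ose_dev_volume_1' mapper name visible in the df output below, they would look roughly like this (a reconstruction, not verbatim):
# /etc/crypttab
ose_dev_volume_1 /dev/disk/by-id/scsi-0HC_Volume_3110278 /root/keys/ose-dev-volume-1.201910.key luks
# /etc/fstab
/dev/mapper/ose_dev_volume_1 /mnt/ose_dev_volume_1 ext4 defaults,nofail 0 0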
- I moved the existing /var/ dir to /var.old and made a symlink from /var/ to /mnt/ose_dev_volume_1/var
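- roughly, from the rescue environment (a sketch; the old root was mounted at /mnt/osedev1, and the absolute symlink target only resolves once the real system boots):
mv /mnt/osedev1/var /mnt/osedev1/var.old
ln -s /mnt/ose_dev_volume_1/var /mnt/osedev1/var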
- I safely unmounted & closed all the disks and shut down
- I removed the systemrescue iso from the server and started it up again
- I was able to ssh in, and the new '/var/' dir *appeared* to be set up properly
[maltfield@osedev1 /]$ ls -lah /var lrwxrwxrwx. 1 root root 25 Oct 7 13:47 /var -> /mnt/ose_dev_volume_1/var [maltfield@osedev1 /]$ ls -lah /var/ total 80K drwxr-xr-x. 19 root root 4.0K Jul 14 06:18 . drwxr-xr-x. 4 root root 4.0K Oct 7 13:22 .. drwxr-xr-x. 2 root root 4.0K Apr 11 2018 adm drwxr-xr-x. 7 root root 4.0K Oct 2 14:24 cache drwxr-xr-x. 2 root root 4.0K Apr 24 16:03 crash drwxr-xr-x. 3 root root 4.0K Jul 14 06:15 db drwxr-xr-x. 3 root root 4.0K Jul 14 06:15 empty drwxr-xr-x. 2 root root 4.0K Apr 11 2018 games drwxr-xr-x. 2 root root 4.0K Apr 11 2018 gopher drwxr-xr-x. 3 root root 4.0K Jul 14 06:14 kerberos drwxr-xr-x. 34 root root 4.0K Oct 2 15:34 lib drwxr-xr-x. 2 root root 4.0K Apr 11 2018 local lrwxrwxrwx. 1 root root 11 Jul 14 06:14 lock -> ../run/lock drwxr-xr-x. 11 root root 4.0K Oct 7 13:49 log lrwxrwxrwx. 1 root root 10 Jul 14 06:14 mail -> spool/mail drwxr-xr-x. 2 root root 4.0K Apr 11 2018 nis drwxr-xr-x. 2 root root 4.0K Apr 11 2018 opt drwxr-xr-x. 2 root root 4.0K Apr 11 2018 preserve lrwxrwxrwx. 1 root root 6 Jul 14 06:14 run -> ../run drwxr-xr-x. 8 root root 4.0K Oct 3 08:06 spool drwxrwxrwt. 4 root root 4.0K Oct 7 13:49 tmp -rw-r--r--. 1 root root 163 Jul 14 06:14 .updated drwxr-xr-x. 2 root root 4.0K Apr 11 2018 yp [maltfield@osedev1 /]$
- but I immediately noticed that, for example, screen wasn't working
[maltfield@osedev1 /]$ screen -S ebs Cannot make directory '/var/run/screen': No such file or directory [maltfield@osedev1 /]$
- oh, damn, '/var/run' is a relative symlink to '../run', which won't work now that /var lives on the block volume (it resolves to /mnt/ose_dev_volume_1/run, which doesn't exist)
[maltfield@osedev1 /]$ ls -lah /var/run lrwxrwxrwx. 1 root root 6 Jul 14 06:14 /var/run -> ../run [maltfield@osedev1 /]$
- I made it an absolute symlink instead
[root@osedev1 var]# rm -rf lock [root@osedev1 var]# rm -rf run [root@osedev1 var]# ln -s /run [root@osedev1 var]# ln -s /run/lock [root@osedev1 var]# ls -lah run lrwxrwxrwx. 1 root root 4 Oct 7 13:54 run -> /run [root@osedev1 var]# ls -lah lock lrwxrwxrwx. 1 root root 9 Oct 7 13:54 lock -> /run/lock [root@osedev1 var]#
- it still failed, but everything looked ok; I gave the system a reboot
- when the system came back up, `screen` had no issues, and everything looked good.
[maltfield@osedev1 ~]$ screen -ls There is a screen on: 4362.ebs (Attached) 1 Socket in /var/run/screen/S-maltfield. [maltfield@osedev1 ~]$ sudo su - Last login: Mon Oct 7 13:54:28 CEST 2019 on pts/0 [root@osedev1 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 19G 3.4G 15G 19% / devtmpfs 873M 0 873M 0% /dev tmpfs 896M 0 896M 0% /dev/shm tmpfs 896M 17M 879M 2% /run tmpfs 896M 0 896M 0% /sys/fs/cgroup /dev/mapper/ose_dev_volume_1 125G 2.5G 116G 3% /mnt/ose_dev_volume_1 tmpfs 180M 0 180M 0% /run/user/1000 [root@osedev1 ~]# ls -lah /var lrwxrwxrwx. 1 root root 25 Oct 7 13:47 /var -> /mnt/ose_dev_volume_1/var [root@osedev1 ~]# ls -lah /mnt/ose_dev_volume_1/ total 28K drwxr-xr-x. 4 root root 4.0K Oct 7 13:22 . drwxr-xr-x. 4 root root 4.0K Oct 7 13:46 .. drwx------. 2 root root 16K Oct 7 13:18 lost+found drwxr-xr-x. 19 root root 4.0K Oct 7 13:54 var [root@osedev1 ~]#
- I started the staging server, connected to the vpn from my laptop, and was successfully able to ssh into it (though with a long delay)
- I ssh'd into prod and kicked-off the rsync!
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
- that also copied the old backups, which is probably unnecessary. I should also exclude the following (example flag below):
- home/b2user/sync
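- i.e. on the next run, add another exclude flag to the rsync invocation above, e.g.:
--exclude=/home/b2user/sync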
- this sync is going at a rate of about 1G every 5 minutes. I expect it'll be done in 5-10 hours. I'll check on it tomorrow.
Sat Oct 05, 2019
Fri Oct 04, 2019
Thu Oct 03, 2019
- continuing from yesterday, I copied the dev-specific encryption key from our shared keepass for the backups to the dev node
[root@osedev1 backups]# mv /home/maltfield/ose-dev-backups-cron.201910.key /root/backups/ [root@osedev1 backups]# chown root:root ose-dev-backups-cron.201910.key [root@osedev1 backups]# chmod 0400 ose-dev-backups-cron.201910.key [root@osedev1 backups]# ls -lah total 32K drwxr-xr-x. 4 root root 4.0K Oct 3 07:09 . dr-xr-x---. 7 root root 4.0K Oct 3 07:03 .. -rw-r--r--. 1 root root 747 Oct 2 15:57 backup.settings -rwxr-xr-x. 1 root root 5.7K Oct 3 07:03 backup.sh drwxr-xr-x. 3 root root 4.0K Sep 9 09:02 iptables -r--------. 1 root root 4.0K Oct 3 07:05 ose-dev-backups-cron.201910.key drwxr-xr-x. 2 root root 4.0K Oct 3 07:04 sync [root@osedev1 backups]#
- note that I also had to install `trickle` on the dev node
[root@osedev1 backups]# ./backup.sh ================================================================================ INFO: Beginning Backup Run on 20191003_051037 INFO: Cleaning up old backup files ... INFO: moving encrypted backup file to b2user's sync dir INFO: Beginning upload to backblaze b2 sudo: /bin/trickle: command not found real 0m0.030s user 0m0.009s sys 0m0.021s [root@osedev1 backups]# yum install trickle ... Installed: trickle.x86_64 0:1.07-19.el7 Complete! [root@osedev1 backups]#
- note that something changed in the install process of the b2 CLI that required me to use the '--user' flag, which changed the path to the b2 binary. To keep the mods to the backup.sh script minimal, I just created a symlink
[root@osedev1 backups]# ./backup.sh ... + echo 'INFO: Beginning upload to backblaze b2' INFO: Beginning upload to backblaze b2 + /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg trickle: exec(): No such file or directory real 0m0.040s user 0m0.012s sys 0m0.020s + exit 0 [root@osedev1 backups]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg trickle: exec(): No such file or directory [root@osedev1 b2user]# ln -s /home/b2user/.local/bin/b2 /home/b2user/virtualenv/bin/b2 [root@osedev1 b2user]#
- the backup script still failed at the upload to b2
[root@osedev1 backups]# ./backup.sh ... INFO: Beginning upload to backblaze b2 + /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg ERROR: Missing account data: 'NoneType' object has no attribute '__getitem__' Use: b2 authorize-account real 0m0.363s user 0m0.281s sys 0m0.076s + exit 0 [root@osedev1 b2user]# [root@osedev1 b2user]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg ERROR: Missing account data: 'NoneType' object has no attribute '__getitem__' Use: b2 authorize-account [root@osedev1 b2user]#
- per the error, I used `b2 authorize-account` and added my creds for the user 'b2user'
[root@osedev1 b2user]# su - b2user Last login: Wed Oct 2 16:15:28 CEST 2019 on pts/8 [b2user@osedev1 ~]$ .local/bin/b2 authorize-account Using https://api.backblazeb2.com Backblaze application key ID: XXXXXXXXXXXXXXXXXXXXXXXXX Backblaze application key: [b2user@osedev1 ~]$
- this time the backup succeeded!
[root@osedev1 b2user]# /root/backups/backup.sh ... INFO: moving encrypted backup file to b2user's sync dir + /bin/mv /root/backups/sync/daily_osedev1_20191003_052448.tar.gpg /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg + /bin/chown b2user /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg + echo 'INFO: Beginning upload to backblaze b2' INFO: Beginning upload to backblaze b2 + /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg daily_osedev1_20191003_052448.tar.gpg URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191003_052448.tar.gpg URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038 { "action": "upload", "fileId": "4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038", "fileName": "daily_osedev1_20191003_052448.tar.gpg", "size": 17233113, "uploadTimestamp": 1570080299000 } real 0m26.435s user 0m0.706s sys 0m0.251s + exit 0 [root@osedev1 b2user]#
- as an out-of-band restore validation, I downloaded the 17.2M backup file from the backblaze b2 wui onto my laptop
- again, I downloaded the encryption key from our shared keepass
user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar ose-dev-backups-cron.201910.key gpg: WARNING: no command supplied. Trying to guess what you mean ... gpg: no valid OpenPGP data found. gpg: processing message failed: Unknown system error user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar daily_osedev1_20191003_052448.tar.gpg gpg: WARNING: no command supplied. Trying to guess what you mean ... gpg: AES256 encrypted data gpg: encrypted with 1 passphrase user@disp5653:~/Downloads$ tar -xf daily_osedev1_20191003_052448.tar user@disp5653:~/Downloads$ ls daily_osedev1_20191003_052448.tar ose-dev-backups-cron.201910.key daily_osedev1_20191003_052448.tar.gpg root user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f root/backups/sync/daily_osedev1_20191003_052448/www/www.20191003_052448.tar.gz root/backups/sync/daily_osedev1_20191003_052448/root/root.20191003_052448.tar.gz root/backups/sync/daily_osedev1_20191003_052448/log/log.20191003_052448.tar.gz root/backups/sync/daily_osedev1_20191003_052448/etc/etc.20191003_052448.tar.gz root/backups/sync/daily_osedev1_20191003_052448/home/home.20191003_052448.tar.gz user@disp5653:~/Downloads$
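- note the 'no command supplied' warnings above: gpg guessed that I wanted a decrypt. The operation can be made explicit with the same flags plus --decrypt, e.g.:
gpg --batch --decrypt --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar daily_osedev1_20191003_052448.tar.gpg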
- it looks like it's working; here's the contents of the backup file (note there are some varnish config files in here from when I did my test rsync back on Sep 9th Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019)
user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f -exec tar -tvf '{}' \; | awk '{print $6}' | cut -d/ -f 1-2 | sort -u etc/adjtime etc/aliases etc/alternatives etc/anacrontab etc/audisp etc/audit etc/bash_completion.d etc/bashrc etc/binfmt.d etc/centos-release etc/centos-release-upstream etc/chkconfig.d etc/chrony.conf etc/chrony.keys etc/cloud etc/cron.d etc/cron.daily etc/cron.deny etc/cron.hourly etc/cron.monthly etc/crontab etc/cron.weekly etc/crypttab etc/csh.cshrc etc/csh.login etc/dbus-1 etc/default etc/depmod.d etc/dhcp etc/DIR_COLORS etc/DIR_COLORS.256color etc/DIR_COLORS.lightbgcolor etc/dnsmasq.conf etc/dnsmasq.d etc/dracut.conf etc/dracut.conf.d etc/e2fsck.conf etc/environment etc/ethertypes etc/exports etc/exports.d etc/filesystems etc/firewalld etc/fstab etc/gcrypt etc/GeoIP.conf etc/GeoIP.conf.default etc/gnupg etc/GREP_COLORS etc/groff etc/group etc/group- etc/grub2.cfg etc/grub.d etc/gshadow etc/gshadow- etc/gss etc/gssproxy etc/host.conf etc/hostname etc/hosts etc/hosts.allow etc/hosts.deny etc/idmapd.conf etc/init.d etc/inittab etc/inputrc etc/iproute2 etc/iscsi etc/issue etc/issue.net etc/kdump.conf etc/kernel etc/krb5.conf etc/krb5.conf.d etc/ld.so.cache etc/ld.so.conf etc/ld.so.conf.d etc/libaudit.conf etc/libnl etc/libuser.conf etc/libvirt etc/locale.conf etc/localtime etc/login.defs etc/logrotate.conf etc/logrotate.d etc/lvm etc/lxc etc/machine-id etc/magic etc/makedumpfile.conf.sample etc/man_db.conf etc/mke2fs.conf etc/modprobe.d etc/modules-load.d etc/motd etc/mtab etc/netconfig etc/NetworkManager etc/networks etc/nfs.conf etc/nfsmount.conf etc/nsswitch.conf etc/nsswitch.conf.bak etc/numad.conf etc/openldap etc/openvpn etc/opt etc/os-release etc/pam.d etc/passwd etc/passwd- etc/pkcs11 etc/pki etc/pm etc/polkit-1 etc/popt.d etc/ppp etc/prelink.conf.d etc/printcap etc/profile etc/profile.d etc/protocols etc/python etc/qemu-ga etc/radvd.conf etc/rc0.d etc/rc1.d etc/rc2.d etc/rc3.d etc/rc4.d etc/rc5.d etc/rc6.d etc/rc.d etc/rc.local etc/redhat-release etc/request-key.conf etc/request-key.d etc/resolv.conf etc/rpc etc/rpm etc/rsyncd.conf etc/rsyslog.conf etc/rsyslog.d etc/rwtab etc/rwtab.d etc/sasl2 etc/screenrc etc/securetty etc/security etc/selinux etc/services etc/sestatus.conf etc/shadow etc/shadow- etc/shells etc/skel etc/ssh etc/ssl etc/statetab etc/statetab.d etc/subgid etc/subuid etc/sudo.conf etc/sudoers etc/sudoers.d etc/sudo-ldap.conf etc/sysconfig etc/sysctl.conf etc/sysctl.d etc/systemd etc/system-release etc/system-release-cpe etc/tcsd.conf etc/terminfo etc/timezone etc/tmpfiles.d etc/trickled.conf etc/tuned etc/udev etc/unbound etc/varnish etc/vconsole.conf etc/vimrc etc/virc etc/wpa_supplicant etc/X11 etc/xdg etc/xinetd.d etc/yum etc/yum.conf etc/yum.repos.d home/b2user home/maltfield root/anaconda-ks.cfg root/backups root/Finished root/original-ks.cfg root/Package root/pki root/Running var/log user@disp5653:~/Downloads$
- and as a true end-to-end test, I restored the sshd_config file
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ pwd /home/user/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ date Thu Oct 3 11:37:49 +0545 2019 user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ ls etc.20191003_052448.tar.gz user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tar -xzf etc.20191003_052448.tar.gz user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tail etc/ssh/sshd_config # override default of no subsystems Subsystem sftp /usr/libexec/openssh/sftp-server # Example of overriding settings on a per-user basis #Match User anoncvs # X11Forwarding no # AllowTcpForwarding no # PermitTTY no # ForceCommand cvs server user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$
- I also copied the cron job and the backup report script to the dev node
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze 20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log 20 04 03 * * root time /bin/nice /root/backups/backupReport.sh [root@opensourceecology ~]#
- I tried testing the backup report script, but it complained that the `mail` command was absent; otherwise it appears to be working without modifications
[root@osedev1 backups]# ./backupReport.sh ./backupReport.sh: line 90: /usr/bin/mail: No such file or directory INFO: email body below ATTENTION: BACKUPS MISSING! WARNING: First of this month's backup (20191001) is missing! WARNING: First of last month's backup (20190901) is missing! WARNING: Yesterday's backup (20191002) is missing! WARNING: The day before yesterday's backup (20191001) is missing! See below for the contents of the backblaze b2 bucket = ose-dev-server-backups daily_osedev1_20191003_052448.tar.gpg --- Note: This report was generated on 20191003_060036 UTC by script '/root/backups/backupReport.sh' This script was triggered by '/etc/cron.d/backup_to_backblaze' For more information about OSE backups, please see the relevant documentation pages on the wiki: * https://wiki.opensourceecology.org/wiki/Backblaze * https://wiki.opensourceecology.org/wiki/OSE_Server#Backups [root@osedev1 backups]#
- I installed mailx and re-ran the script
[root@osedev1 backups]# yum install mailx ... Installed: mailx.x86_64 0:12.5-19.el7 Complete! [root@osedev1 backups]#
- this time it failed because sendmail is not installed; I *could* install postfix, but I decided just to install sendmail
[root@osedev1 backups]# ./backupReport.sh ... /usr/sbin/sendmail: No such file or directory "/root/dead.letter" 30/1215 . . . message not sent. [root@osedev1 backups]# rpm -qa | grep postfix [root@osedev1 backups]# rpm -qa | grep exim [root@osedev1 backups]# yum install sendmail ... Installed: sendmail.x86_64 0:8.14.7-5.el7 Dependency Installed: hesiod.x86_64 0:3.2.1-3.el7 procmail.x86_64 0:3.22-36.el7_4.1 Complete! [root@osedev1 backups]#
- this time it ran without error, but I never got an email. This is probably because gmail is rejecting it; we don't have DNS set up properly for this server to send mail. Anyway, this is good enough for our dev node's backups for now.
- I also added the same lifecycle rules that we have for the 'ose-server-backups' bucket to the 'ose-dev-server-backups' bucket in the backblaze b2 wui
- let's proceed with getting openvpn clients configured for the prod node (and its clone, the staging node, which will use the same client cert)
- as I did on Sep 9 to create my client cert for 'maltfield', I created a new cert for 'hetzner2' Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019
- again, the ca and cert files are located in /usr/share/easy-rsa/3/pki/
- I documented this dir on the wiki OpenVPN
- interestingly, I could only execute these commands from the dir above the pki dir
[root@osedev1 pki]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2 Easy-RSA error: EASYRSA_PKI does not exist (perhaps you need to run init-pki)? Expected to find the EASYRSA_PKI at: /usr/share/easy-rsa/3/pki/pki Run easyrsa without commands for usage and command help. [root@osedev1 pki]# [root@osedev1 pki]# cd .. [root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2 Using SSL: openssl OpenSSL 1.0.2k-fips 26 Jan 2017 Generating a 2048 bit RSA private key .......................................................................+++ ............................................+++ writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.7F3A32KzES' Enter PEM pass phrase:
- note I appended the option 'nopass' so that the hetzner2 prod server can connect to the vpn automatically, using only a private certificate file and no password (it may be a good idea to look into whether we can whitelist a specific IP for this user, since this hetzner2 client will only connect from the prod or staging server's static ip addresses; see the sketch below)
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa help build-client-full build-client-full <filename_base> [ cmd-opts ] build-server-full <filename_base> [ cmd-opts ] build-serverClient-full <filename_base> [ cmd-opts ] Generate a keypair and sign locally for a client and/or server This mode uses the <filename_base> as the X509 CN. cmd-opts is an optional set of command options from this list: nopass - do not encrypt the private key (default is encrypted) [root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2 nopass Using SSL: openssl OpenSSL 1.0.2k-fips 26 Jan 2017 Generating a 2048 bit RSA private key ..................................................................................................+++ .....+++ writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.qQ1HGf7ovg' ----- Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key: Check that the request matches the signature Signature ok The Subject's Distinguished Name is as follows commonName :ASN.1 12:'hetzner2' Certificate is to be certified until Sep 17 06:42:28 2022 GMT (1080 days) Write out database with 1 new entries Data Base Updated [root@osedev1 3]#
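- on that whitelist idea: openvpn doesn't restrict source IPs per-certificate out of the box, but a port-level iptables rule on osedev1 could at least limit who can reach the vpn at all. An untested sketch (138.201.84.223 is prod's primary IP from the `ip a` output below; note this would also lock out road-warrior clients like my laptop, so it's a thought experiment only):
# drop vpn handshakes except from known sources: prod's static IP
# and the libvirt bridge subnet used by the staging container
iptables -I INPUT -p udp --dport 1194 ! -s 138.201.84.223 -j DROP
iptables -I INPUT -p udp --dport 1194 -s 192.168.122.0/24 -j ACCEPT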
- I copied the necessary files to the prod server
[root@osedev1 3]# cp pki/private/hetzner2.key /home/maltfield/ [root@osedev1 3]# cp pki/issued/hetzner2.crt /home/maltfield/ [root@osedev1 3]# cp pki/private/ta.key /home/maltfield/ [root@osedev1 3]# cp pki/ca.crt /home/maltfield/ [root@osedev1 3]# chown maltfield /home/maltfield/*.cert [root@osedev1 3]# chown maltfield /home/maltfield/*.key [root@osedev1 3]# logout [maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/hetzner2* opensourceecology.org: hetzner2.crt 100% 5675 2.8MB/s 00:00 hetzner2.key 100% 1708 1.0MB/s 00:00 [maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/*.key opensourceecology.org: hetzner2.key 100% 1708 1.0MB/s 00:00 ta.key 100% 636 368.9KB/s 00:00 [maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key [maltfield@osedev1 ~]$ shred -u /home/maltfield/hetzner2.* [maltfield@osedev1 ~]$
- and I moved them to '/root/openvpn' and locked down the files on the prod hetzner2 server
[root@opensourceecology maltfield]# cd /root [root@opensourceecology ~]# ls backups bin iptables output.json rsyncTest sandbox staging.opensourceecology.org tmp [root@opensourceecology ~]# mkdir openvpn [root@opensourceecology ~]# cd openvpn [root@opensourceecology openvpn]# mv /home/maltfield/hetzner2* . [root@opensourceecology openvpn]# mv /home/maltfield/*.key . [root@opensourceecology openvpn]# mv /home/maltfield/ca.crt . [root@opensourceecology openvpn]# ls -lah total 28K drwxr-xr-x 2 root root 4.0K Oct 3 06:53 . dr-xr-x---. 20 root root 4.0K Oct 3 06:53 .. -rw------- 1 maltfield maltfield 3.3K Oct 3 06:51 ca.crt -rw------- 1 maltfield maltfield 5.6K Oct 3 06:51 hetzner2.crt -rw------- 1 maltfield maltfield 1.7K Oct 3 06:51 hetzner2.key -rw------- 1 maltfield maltfield 636 Oct 3 06:51 ta.key [root@opensourceecology openvpn]# chown root:root * [root@opensourceecology openvpn]# ls -lah total 28K drwxr-xr-x 2 root root 4.0K Oct 3 06:53 . dr-xr-x---. 20 root root 4.0K Oct 3 06:53 .. -rw------- 1 root root 3.3K Oct 3 06:51 ca.crt -rw------- 1 root root 5.6K Oct 3 06:51 hetzner2.crt -rw------- 1 root root 1.7K Oct 3 06:51 hetzner2.key -rw------- 1 root root 636 Oct 3 06:51 ta.key [root@opensourceecology openvpn]# chmod 0700 . [root@opensourceecology openvpn]# ls -lah total 28K drwx------ 2 root root 4.0K Oct 3 06:53 . dr-xr-x---. 20 root root 4.0K Oct 3 06:53 .. -rw------- 1 root root 3.3K Oct 3 06:51 ca.crt -rw------- 1 root root 5.6K Oct 3 06:51 hetzner2.crt -rw------- 1 root root 1.7K Oct 3 06:51 hetzner2.key -rw------- 1 root root 636 Oct 3 06:51 ta.key [root@opensourceecology openvpn]#
- then I created a client.conf file from my personal client.conf file & modified it to use the new cert & key files
[root@opensourceecology openvpn]# vim client.conf [root@opensourceecology openvpn]# ls -lah client.conf -rw-r--r-- 1 root root 3.6K Oct 3 06:56 client.conf [root@opensourceecology openvpn]# chmod 0600 client.conf [root@opensourceecology openvpn]# cat client.conf ############################################## # Sample client-side OpenVPN 2.0 config file # # for connecting to multi-client server. # # # # This configuration can be used by multiple # # clients, however each client should have # # its own cert and key files. # # # # On Windows, you might want to rename this # # file so it has a .ovpn extension # ############################################## # Specify that we are a client and that we # will be pulling certain config file directives # from the server. client # Use the same setting as you are using on # the server. # On most systems, the VPN will not function # unless you partially or fully disable # the firewall for the TUN/TAP interface. ;dev tap dev tun # Windows needs the TAP-Win32 adapter name # from the Network Connections panel # if you have more than one. On XP SP2, # you may need to disable the firewall # for the TAP adapter. ;dev-node MyTap # Are we connecting to a TCP or # UDP server? Use the same setting as # on the server. ;proto tcp proto udp # The hostname/IP and port of the server. # You can have multiple remote entries # to load balance between the servers. remote 195.201.233.113 1194 ;remote my-server-2 1194 # Choose a random host from the remote # list for load-balancing. Otherwise # try hosts in the order specified. ;remote-random # Keep trying indefinitely to resolve the # host name of the OpenVPN server. Very useful # on machines which are not permanently connected # to the internet such as laptops. resolv-retry infinite # Most clients don't need to bind to # a specific local port number. nobind # Downgrade privileges after initialization (non-Windows only) ;user nobody ;group nobody # Try to preserve some state across restarts. persist-key persist-tun # If you are connecting through an # HTTP proxy to reach the actual OpenVPN # server, put the proxy server/IP and # port number here. See the man page # if your proxy server requires # authentication. ;http-proxy-retry # retry on connection failures ;http-proxy [proxy server] [proxy port #] # Wireless networks often produce a lot # of duplicate packets. Set this flag # to silence duplicate packet warnings. ;mute-replay-warnings # SSL/TLS parms. # See the server config file for more # description. It's best to use # a separate .crt/.key file pair # for each client. A single ca # file can be used for all clients. ca ca.crt cert hetzner2.crt key hetzner2.key # Verify server certificate by checking that the # certicate has the correct key usage set. # This is an important precaution to protect against # a potential attack discussed here: # http://openvpn.net/howto.html#mitm # # To use this feature, you will need to generate # your server certificates with the keyUsage set to # digitalSignature, keyEncipherment # and the extendedKeyUsage to # serverAuth # EasyRSA can do this for you. remote-cert-tls server # If a tls-auth key is used on the server # then every client must also have the key. tls-auth ta.key 1 # Select a cryptographic cipher. # If the cipher option is used on the server # then you must also specify it here. # Note that v2.4 client/server will automatically # negotiate AES-256-GCM in TLS mode. # See also the ncp-cipher option in the manpage cipher AES-256-GCM # Enable compression on the VPN link. 
# Don't enable this unless it is also # enabled in the server config file. #comp-lzo # Set log file verbosity. verb 3 # Silence repeating messages ;mute 20 # hardening tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384 [root@opensourceecology openvpn]#
- I installed the 'openvpn' package on the production hetzner2 server
[root@opensourceecology openvpn]# yum install openvpn ... Installed: openvpn.x86_64 0:2.4.7-1.el7 Dependency Installed: lz4.x86_64 0:1.7.5-3.el7 pkcs11-helper.x86_64 0:1.11-3.el7 Complete! [root@opensourceecology openvpn]#
- I was successfully able to connect to the vpn on the dev node from the prod node
[root@opensourceecology openvpn]# openvpn client.conf Thu Oct 3 07:06:45 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019 Thu Oct 3 07:06:45 2019 library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06 Thu Oct 3 07:06:45 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 07:06:45 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 07:06:45 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 Thu Oct 3 07:06:45 2019 Socket Buffers: R=[212992->212992] S=[212992->212992] Thu Oct 3 07:06:45 2019 UDP link local: (not bound) Thu Oct 3 07:06:45 2019 UDP link remote: [AF_INET]195.201.233.113:1194 Thu Oct 3 07:06:45 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=865b6fa1 7dcf4731 Thu Oct 3 07:06:45 2019 VERIFY OK: depth=1, CN=osedev1 Thu Oct 3 07:06:45 2019 VERIFY KU OK Thu Oct 3 07:06:45 2019 Validating certificate extended key usage Thu Oct 3 07:06:45 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication Thu Oct 3 07:06:45 2019 VERIFY EKU OK Thu Oct 3 07:06:45 2019 VERIFY OK: depth=0, CN=server Thu Oct 3 07:06:45 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA Thu Oct 3 07:06:45 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194 Thu Oct 3 07:06:46 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 07:06:46 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM' Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: timers and/or timeouts modified Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: --ifconfig/up options modified Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: route options modified Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: peer-id set Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: adjusting link_mtu to 1624 Thu Oct 3 07:06:46 2019 OPTIONS IMPORT: data channel crypto options modified Thu Oct 3 07:06:46 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 07:06:46 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 07:06:46 2019 ROUTE_GATEWAY 138.201.84.193 Thu Oct 3 07:06:46 2019 TUN/TAP device tun0 opened Thu Oct 3 07:06:46 2019 TUN/TAP TX queue length set to 100 Thu Oct 3 07:06:46 2019 /sbin/ip link set dev tun0 up mtu 1500 Thu Oct 3 07:06:46 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9 Thu Oct 3 07:06:46 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9 Thu Oct 3 07:06:46 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this Thu Oct 3 07:06:46 2019 Initialization Sequence Completed
- the prod server now has a tun0 interface with an ip address of 10.241.189.10 on the VPN private network subnet
[root@opensourceecology ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 90:1b:0e:94:07:c4 brd ff:ff:ff:ff:ff:ff inet 138.201.84.223 peer 138.201.84.193/32 brd 138.201.84.223 scope global eth0 valid_lft forever preferred_lft forever inet 138.201.84.223/32 scope global eth0 valid_lft forever preferred_lft forever inet 138.201.84.243/16 scope global eth0 valid_lft forever preferred_lft forever inet 138.201.84.243 peer 138.201.84.193/32 brd 138.201.255.255 scope global secondary eth0 valid_lft forever preferred_lft forever inet6 2a01:4f8:172:209e::2/64 scope global valid_lft forever preferred_lft forever inet6 fe80::921b:eff:fe94:7c4/64 scope link valid_lft forever preferred_lft forever 4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100 link/none inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 valid_lft forever preferred_lft forever [root@opensourceecology ~]#
- I confirmed that the website didn't break ☺
- now I created the same dir on the staging node (note this weird systemd journal corruption error that slowed things down quite a bit)
[root@osedev1 ~]# lxc-start -n osestaging1 systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN) Detected virtualization lxc. Detected architecture x86-64. Welcome to CentOS Linux 7 (Core)! ... Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64 osestaging1 login: maltfield Password: Last login: Wed Oct 2 13:01:56 on lxc/console [maltfield@osestaging1 ~]$ sudo su - [sudo] password for maltfield: <44>systemd-journald[297]: File /run/log/journal/dd9978e8797e4112832634fa4d174c7b/system.journal corrupted or uncleanly shut down, renaming and replacing. Last login: Wed Oct 2 13:15:46 UTC 2019 on lxc/console Last failed login: Thu Oct 3 07:11:57 UTC 2019 on lxc/console There was 1 failed login attempt since the last successful login. [root@osestaging1 ~]#
- on the dev node again
[root@osedev1 pki]# cp private/hetzner2.key /home/maltfield/ [root@osedev1 pki]# cp issued/hetzner2.crt /home/maltfield/ [root@osedev1 pki]# cp private/ta.key /home/maltfield/ [root@osedev1 pki]# chown maltfield /home/maltfield/*.key [root@osedev1 pki]# chown maltfield /home/maltfield/*.crt [root@osedev1 pki]# logout [maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.key 192.168.122.201: hetzner2.key 100% 1708 2.4MB/s 00:00 ta.key 100% 636 1.2MB/s 00:00 [maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.crt 192.168.122.201: ca.crt 100% 1850 2.6MB/s 00:00 hetzner2.crt 100% 5675 9.0MB/s 00:00 [maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key [maltfield@osedev1 ~]$ shred -u /home/maltfield/*.crt [maltfield@osedev1 ~]$
- and back on the staging container node
[root@osestaging1 ~]# cd /root/openvpn [root@osestaging1 openvpn]# ls [root@osestaging1 openvpn]# mv /home/maltfield/*.crt . [root@osestaging1 openvpn]# mv /home/maltfield/*.key . [root@osestaging1 openvpn]# ls -lah total 28K drwxr-xr-x. 2 root root 4.0K Oct 3 07:23 . dr-xr-x---. 3 root root 4.0K Oct 3 07:18 .. -rw-------. 1 maltfield maltfield 1.9K Oct 3 07:21 ca.crt -rw-------. 1 maltfield maltfield 5.6K Oct 3 07:21 hetzner2.crt -rw-------. 1 maltfield maltfield 1.7K Oct 3 07:21 hetzner2.key -rw-------. 1 maltfield maltfield 636 Oct 3 07:21 ta.key [root@osestaging1 openvpn]# chown root:root * [root@osestaging1 openvpn]# chmod 0700 . [root@osestaging1 openvpn]# ls -lah total 28K drwx------. 2 root root 4.0K Oct 3 07:23 . dr-xr-x---. 3 root root 4.0K Oct 3 07:18 .. -rw-------. 1 root root 1.9K Oct 3 07:21 ca.crt -rw-------. 1 root root 5.6K Oct 3 07:21 hetzner2.crt -rw-------. 1 root root 1.7K Oct 3 07:21 hetzner2.key -rw-------. 1 root root 636 Oct 3 07:21 ta.key [root@osestaging1 openvpn]#
- I also installed vim, epel-release, and openvpn on the staging node
- I had an issue connecting to the vpn from within the staging node; this appears to be a known issue when trying to connect to a vpn from within a docker or lxc container https://serverfault.com/questions/429461/no-tun-device-in-lxc-guest-for-openvpn
[root@osestaging1 openvpn]# openvpn client.conf Thu Oct 3 07:29:17 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019 Thu Oct 3 07:29:17 2019 library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06 Thu Oct 3 07:29:17 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 07:29:17 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 07:29:17 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 Thu Oct 3 07:29:17 2019 Socket Buffers: R=[212992->212992] S=[212992->212992] Thu Oct 3 07:29:17 2019 UDP link local: (not bound) Thu Oct 3 07:29:17 2019 UDP link remote: [AF_INET]195.201.233.113:1194 Thu Oct 3 07:29:17 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=f2e8fcad efdb9311 Thu Oct 3 07:29:17 2019 VERIFY OK: depth=1, CN=osedev1 Thu Oct 3 07:29:17 2019 VERIFY KU OK Thu Oct 3 07:29:17 2019 Validating certificate extended key usage Thu Oct 3 07:29:17 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication Thu Oct 3 07:29:17 2019 VERIFY EKU OK Thu Oct 3 07:29:17 2019 VERIFY OK: depth=0, CN=server Thu Oct 3 07:29:17 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA Thu Oct 3 07:29:17 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194 Thu Oct 3 07:29:18 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 07:29:18 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM' Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: timers and/or timeouts modified Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: --ifconfig/up options modified Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: route options modified Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: peer-id set Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: adjusting link_mtu to 1624 Thu Oct 3 07:29:18 2019 OPTIONS IMPORT: data channel crypto options modified Thu Oct 3 07:29:18 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 07:29:18 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 07:29:18 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d Thu Oct 3 07:29:18 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2) Thu Oct 3 07:29:18 2019 Exiting due to fatal error [root@osestaging1 openvpn]#
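- per that serverfault link, the usual container-side fix is to expose the tun device in the container's lxc config on the host and restart the container. A sketch using the lxc 1.x/2.x key names (the config path is the conventional one for this lxc version, not confirmed in this log):
# /var/lib/lxc/osestaging1/config
lxc.cgroup.devices.allow = c 10:200 rwm
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file 0 0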
- the above link suggests following the arch linux guide to create an openvpn client systemd unit within the container
[root@osestaging1 openvpn]# ls /usr/lib/systemd/system/openvpn-client\@.service /usr/lib/systemd/system/openvpn-client@.service [root@osestaging1 openvpn]# ls /etc/systemd/system/ basic.target.wants default.target.wants local-fs.target.wants sysinit.target.wants default.target getty.target.wants multi-user.target.wants system-update.target.wants [root@osestaging1 openvpn]# cp /usr/lib/systemd/system/openvpn-client\@.service /etc/systemd/system/ [root@osestaging1 openvpn]# grep /etc/systemd/system/openvpn-client\@.service LimitNPROC grep: LimitNPROC: No such file or directory [root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service LimitNPROC=10 [root@osestaging1 openvpn]# vim /etc/systemd/system/openvpn-client\@.service [root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service #LimitNPROC=10 [root@osestaging1 openvpn]#
- that didn't work; it wants something after the '@'. I did that, and then realized that I'll need to further modify it with the correct config file
[root@osestaging1 openvpn]# cd /etc/systemd/system [root@osestaging1 system]# ls basic.target.wants getty.target.wants openvpn-client@.service default.target local-fs.target.wants sysinit.target.wants default.target.wants multi-user.target.wants system-update.target.wants [root@osestaging1 system]# mv openvpn-client\@.service openvpn-client\@dev.service [root@osestaging1 system]# systemctl status openvpn-client\@dev.service ● openvpn-client@dev.service - OpenVPN tunnel for dev Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled) Active: inactive (dead) Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO [root@osestaging1 system]# systemctl start openvpn-client\@dev.service Job for openvpn-client@dev.service failed because the control process exited with error code. See "systemctl status openvpn-client@dev.service" and "journalctl -xe" for details. [root@osestaging1 system]# systemctl status openvpn-client\@dev.service ● openvpn-client@dev.service - OpenVPN tunnel for dev Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2019-10-03 07:44:09 UTC; 16s ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Process: 557 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf (code=exited, status=1/FAILURE) Main PID: 557 (code=exited, status=1/FAILURE) Oct 03 07:44:08 osestaging1 systemd[1]: Starting OpenVPN tunnel for dev... Oct 03 07:44:09 osestaging1 openvpn[557]: Options error: In [CMD-LINE]:1: Error opening configuration file: dev.conf Oct 03 07:44:09 osestaging1 openvpn[557]: Use --help for more information. Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service: main process exited, code=exited, status=...ILURE Oct 03 07:44:09 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for dev. Oct 03 07:44:09 osestaging1 systemd[1]: Unit openvpn-client@dev.service entered failed state. Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service failed. Hint: Some lines were ellipsized, use -l to show in full. [root@osestaging1 system]# vim openvpn-client\@dev.service
- I updated the working dir and changed the service name to match the name of the config file in there
[root@osestaging1 system]# cat openvpn-client\@dev.service [Unit] Description=OpenVPN tunnel for %I After=syslog.target network-online.target Wants=network-online.target Documentation=man:openvpn(8) Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO [Service] Type=notify PrivateTmp=true WorkingDirectory=/etc/openvpn/client ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE #LimitNPROC=10 DeviceAllow=/dev/null rw DeviceAllow=/dev/net/tun rw ProtectSystem=true ProtectHome=true KillMode=process [Install] WantedBy=multi-user.target [root@osestaging1 system]# vim openvpn-client\@dev.service [root@osestaging1 system]# mv openvpn-client\@dev.service openvpn-client\@client.service [root@osestaging1 system]# cat openvpn-client\@client.service [Unit] Description=OpenVPN tunnel for %I After=syslog.target network-online.target Wants=network-online.target Documentation=man:openvpn(8) Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO [Service] Type=notify PrivateTmp=true #WorkingDirectory=/etc/openvpn/client WorkingDirectory=/root/openvpn ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE #LimitNPROC=10 DeviceAllow=/dev/null rw DeviceAllow=/dev/net/tun rw ProtectSystem=true ProtectHome=true KillMode=process [Install] WantedBy=multi-user.target [root@osestaging1 system]#
- this failed; I gave up on the systemd unit and went with manually creating the tun interface per the guide, even though someone else commented that this would no longer work; it worked!
[root@osestaging1 openvpn]# openvpn client.conf Thu Oct 3 08:02:50 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019 Thu Oct 3 08:02:50 2019 library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06 Thu Oct 3 08:02:50 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 08:02:50 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 08:02:50 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:02:50 2019 Socket Buffers: R=[212992->212992] S=[212992->212992] Thu Oct 3 08:02:50 2019 UDP link local: (not bound) Thu Oct 3 08:02:50 2019 UDP link remote: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:02:50 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=10846fe0 74bf0345 Thu Oct 3 08:02:50 2019 VERIFY OK: depth=1, CN=osedev1 Thu Oct 3 08:02:50 2019 VERIFY KU OK Thu Oct 3 08:02:50 2019 Validating certificate extended key usage Thu Oct 3 08:02:50 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication Thu Oct 3 08:02:50 2019 VERIFY EKU OK Thu Oct 3 08:02:50 2019 VERIFY OK: depth=0, CN=server Thu Oct 3 08:02:50 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA Thu Oct 3 08:02:50 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194 Thu Oct 3 08:02:51 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:02:51 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM' Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: timers and/or timeouts modified Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: --ifconfig/up options modified Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: route options modified Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: peer-id set Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: adjusting link_mtu to 1624 Thu Oct 3 08:02:51 2019 OPTIONS IMPORT: data channel crypto options modified Thu Oct 3 08:02:51 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 08:02:51 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 08:02:51 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d Thu Oct 3 08:02:51 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2) Thu Oct 3 08:02:51 2019 Exiting due to fatal error [root@osestaging1 openvpn]# mkdir /dev/net [root@osestaging1 openvpn]# mknod /dev/net/tun c 10 200 [root@osestaging1 openvpn]# chmod 666 /dev/net/tun [root@osestaging1 openvpn]# openvpn client.conf Thu Oct 3 08:03:42 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019 Thu Oct 3 08:03:42 2019 library versions: OpenSSL 1.0.2k-fips 26 Jan 2017, LZO 2.06 Thu Oct 3 08:03:42 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 08:03:42 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication Thu Oct 3 08:03:42 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:03:42 2019 Socket Buffers: R=[212992->212992] S=[212992->212992] Thu Oct 3 08:03:42 2019 
UDP link local: (not bound) Thu Oct 3 08:03:42 2019 UDP link remote: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:03:42 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=dcadaef9 7ebea8f1 Thu Oct 3 08:03:42 2019 VERIFY OK: depth=1, CN=osedev1 Thu Oct 3 08:03:42 2019 VERIFY KU OK Thu Oct 3 08:03:42 2019 Validating certificate extended key usage Thu Oct 3 08:03:42 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication Thu Oct 3 08:03:42 2019 VERIFY EKU OK Thu Oct 3 08:03:42 2019 VERIFY OK: depth=0, CN=server Thu Oct 3 08:03:42 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA Thu Oct 3 08:03:42 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194 Thu Oct 3 08:03:43 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:03:48 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:03:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:03:59 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:04 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:09 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:15 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:20 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:25 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:30 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:35 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:41 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:46 2019 No reply from server after sending 12 push requests Thu Oct 3 08:04:46 2019 SIGUSR1[soft,no-push-reply] received, process restarting Thu Oct 3 08:04:46 2019 Restart pause, 5 second(s) Thu Oct 3 08:04:51 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:04:51 2019 Socket Buffers: R=[212992->212992] S=[212992->212992] Thu Oct 3 08:04:51 2019 UDP link local: (not bound) Thu Oct 3 08:04:51 2019 UDP link remote: [AF_INET]195.201.233.113:1194 Thu Oct 3 08:04:51 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=c3f6bcfa 04f701bb Thu Oct 3 08:04:51 2019 VERIFY OK: depth=1, CN=osedev1 Thu Oct 3 08:04:51 2019 VERIFY KU OK Thu Oct 3 08:04:51 2019 Validating certificate extended key usage Thu Oct 3 08:04:51 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication Thu Oct 3 08:04:51 2019 VERIFY EKU OK Thu Oct 3 08:04:51 2019 VERIFY OK: depth=0, CN=server Thu Oct 3 08:04:51 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA Thu Oct 3 08:04:51 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194 Thu Oct 3 08:04:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) Thu Oct 3 08:04:53 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 1,cipher AES-256-GCM' Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: timers and/or timeouts modified Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: --ifconfig/up options modified Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: route options modified Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: peer-id set Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: adjusting link_mtu to 1624 Thu Oct 3 08:04:53 2019 OPTIONS IMPORT: data channel crypto options modified Thu Oct 3 08:04:53 2019 Outgoing Data 
Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 08:04:53 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Thu Oct 3 08:04:53 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d Thu Oct 3 08:04:53 2019 TUN/TAP device tun0 opened Thu Oct 3 08:04:53 2019 TUN/TAP TX queue length set to 100 Thu Oct 3 08:04:53 2019 /sbin/ip link set dev tun0 up mtu 1500 Thu Oct 3 08:04:53 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9 Thu Oct 3 08:04:53 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9 Thu Oct 3 08:04:53 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this Thu Oct 3 08:04:53 2019 Initialization Sequence Completed
- I found that I'd become stuck in an lxc console, since its escape sequence uses the same keystroke as screen (ctrl-a). The solution is to define an alternate escape sequence (e.g. ctrl-e) using `-e '^e'` https://serverfault.com/questions/567696/byobu-how-to-disconnect-from-lxc-console
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1 Connected to tty 1 Type <Ctrl+e q> to exit the console, <Ctrl+e Ctrl+e> to enter Ctrl+e itself [root@osedev1 ~]#
- I also had to change the tty to 0 to actually get access
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1 -t 0 lxc_container: commands.c: lxc_cmd_console: 724 Console 0 invalid, busy or all consoles busy. [root@osedev1 ~]# [root@osedev1 ~]#
- I went ahead and connected to the vpn from 3x clients: my laptop, the staging container, and the prod server
- oddly, I noticed that the ip addresses given to the staging server and the prod server were the same (they do use the same client cert, but I expected them to have distinct ip addresses)
user@ose:~/openvpn$ ip address show dev tun0 4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.6 peer 10.241.189.5/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::2ab6:3617:63cc:c654/64 scope link flags 800 valid_lft forever preferred_lft forever user@ose:~/openvpn$
[root@opensourceecology openvpn]# ip address show dev tun0 4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100 link/none inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 valid_lft forever preferred_lft forever [root@opensourceecology openvpn]#
[root@osestaging1 ~]# ip address show dev tun0 2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::5e8c:3af2:2e6:4aea/64 scope link flags 800 valid_lft forever preferred_lft forever [root@osestaging1 ~]#
- I noticed a few relevant options in our openvpn server config
- by default, I have 'ifconfig-pool-persist ipp.txt' defined, which makes clients have the same ip address persistently across the server's reboots; we appear to be using '/etc/openvpn/ipp.txt' here. The one in the 'server' dir appears to be from earlier, probably when I started the server manually rather than through systemd. Interestingly, this isn't even right! From above, we see that my 'maltfield' user has '.6' while the 'hetzner2' users have '.10'. Hmm.
[root@osedev1 server]# grep -iB5 ipp server.conf # Maintain a record of client <-> virtual IP address # associations in this file. If OpenVPN goes down or # is restarted, reconnecting clients can be assigned # the same virtual IP address from the pool that was # previously assigned. ifconfig-pool-persist ipp.txt [root@osedev1 server]# find /etc/openvpn | grep -i ipp.txt /etc/openvpn/server/ipp.txt /etc/openvpn/ipp.txt [root@osedev1 server]# cat /etc/openvpn/server/ipp.txt maltfield,10.241.189.4 [root@osedev1 server]# cat /etc/openvpn/ipp.txt maltfield,10.241.189.4 hetzner2,10.241.189.8
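- if we ever need to reset these persisted leases, a minimal sketch (my addition; I did not actually run this here) would be to stop the server, remove the file, and start it again:
# hypothetical cleanup of stale ifconfig-pool-persist leases
systemctl stop openvpn@server.service
rm /etc/openvpn/ipp.txt
systemctl start openvpn@server.service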
- there's also an option that I have commented-out, whose comments say it should be uncommented if multiple clients will share the same cert
[root@osedev1 server]# grep -iB5 duplicate server.conf # # IF YOU HAVE NOT GENERATED INDIVIDUAL # CERTIFICATE/KEY PAIRS FOR EACH CLIENT, # EACH HAVING ITS OWN UNIQUE "COMMON NAME", # UNCOMMENT THIS LINE OUT. ;duplicate-cn [root@osedev1 server]#
- I uncommented the above 'duplicate-cn' line and restarted openvpn on the dev node
[root@osedev1 server]# vim server.conf [root@osedev1 server]# systemctl restart openvpn@server.service
- I reconnected to the vpn from the staging & prod servers; they got new IP addresses
[root@opensourceecology openvpn]# ip address show dev tun0 5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100 link/none inet 10.241.189.14 peer 10.241.189.13/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::e5fb:f261:801b:1c3d/64 scope link flags 800 valid_lft forever preferred_lft forever [root@opensourceecology openvpn]#
[root@osestaging1 openvpn]# ip address show dev tun0 4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.18 peer 10.241.189.17/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::27f3:9643:5530:bd0e/64 scope link flags 800 valid_lft forever preferred_lft forever [root@osestaging1 openvpn]#
- I confirmed that each client could ping itself, but not the others, so I uncommented the 'client-to-client' line and restarted the openvpn server again
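- that change, sketched as commands (I actually edited server.conf by hand in vim; path assumed to be /etc/openvpn/server.conf):
# uncomment the ';client-to-client' line and restart the server
sed -i 's/^;client-to-client/client-to-client/' /etc/openvpn/server.conf
systemctl restart openvpn@server.service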
- after that, I confirmed that staging could ping prod, prod could ping staging, and my laptop could ping both staging & prod. Cool!
- for some reason the servers could still not ping my laptop; maybe that's some complication in my quad-NAT'd QubesOS networking stack flowing through two nested VPN connections. Anyway, that shouldn't be required *shrug*
- and, holy shit, I was successfully able to ssh into the staging node from the production node through the private VPN IP
[maltfield@opensourceecology ~]$ ssh -p 32415 10.241.189.18 The authenticity of host '[10.241.189.18]:32415 ([10.241.189.18]:32415)' can't be established. ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI. ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[10.241.189.18]:32415' (ECDSA) to the list of known hosts. Last login: Thu Oct 3 08:56:23 2019 from gateway [maltfield@osestaging1 ~]$
- but I was unable to ssh into our staging node from my laptop. oddly, it *is* able to establish a connection, but it gets stuck at some handshake step
user@ose:~/openvpn$ ssh -vvvvvvp 32415 maltfield@10.241.189.18 OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t 10 Sep 2019 debug1: Reading configuration data /home/user/.ssh/config debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 19: Applying options for * debug2: resolving "10.241.189.18" port 32415 debug2: ssh_connect_direct: needpriv 0 debug1: Connecting to 10.241.189.18 [10.241.189.18] port 32415. debug1: Connection established. ... debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none debug3: send packet: type 30 debug1: expecting SSH2_MSG_KEX_ECDH_REPLY Connection closed by 10.241.189.18 port 32415 user@ose:~/openvpn$
- ok, I fixed this by removing the second VPN (qubes was configured to use a vpn qube as its NetVM; changing this to 'sys-firewall' resolved it)
user@ose:~/openvpn$ ssh -p 32415 maltfield@10.241.189.18 Last login: Thu Oct 3 09:20:50 2019 from 10.241.189.6 [maltfield@osestaging1 ~]$
- on second thought, I really should have unique static IP addresses for both the prod & staging nodes. To achieve this, I can't share the same cert; I'll just make '/root/openvpn' one of those dirs (like the networking config dirs) that is not changed by the rsync
- I commented-out the 'duplicate-cn' line again in the openvpn server config & restarted the openvpn server
[root@osedev1 openvpn]# systemctl restart openvpn@server.service (reverse-i-search)`grep': ss -plan | ^Cep -i 8080 [root@osedev1 openvpn]# grep -B5 duplicate-cn server.conf # # IF YOU HAVE NOT GENERATED INDIVIDUAL # CERTIFICATE/KEY PAIRS FOR EACH CLIENT, # EACH HAVING ITS OWN UNIQUE "COMMON NAME", # UNCOMMENT THIS LINE OUT. ;duplicate-cn [root@osedev1 openvpn]# systemctl restart openvpn@server.service
- and I created a distinct cert for 'osestaging1'
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full osestaging1 nopass Using SSL: openssl OpenSSL 1.0.2k-fips 26 Jan 2017 Generating a 2048 bit RSA private key ....+++ ...........................+++ writing new private key to '/usr/share/easy-rsa/3/pki/private/osestaging1.key.WsJhUsDCny' ----- Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key: Check that the request matches the signature Signature ok The Subject's Distinguished Name is as follows commonName :ASN.1 12:'osestaging1' Certificate is to be certified until Sep 17 10:34:03 2022 GMT (1080 days) Write out database with 1 new entries Data Base Updated [root@osedev1 3]# cp pki/private/osestaging1.key /home/maltfield/ [root@osedev1 3]# cp pki/private/ta.key /home/maltfield/ [root@osedev1 3]# cp pki/issued/osestaging1.crt /home/maltfield/ [root@osedev1 3]# cp pki/ca.crt /home/maltfield/ [root@osedev1 3]# chown maltfield /home/maltfield/*.key [root@osedev1 3]# chown maltfield /home/maltfield/*.crt [root@osedev1 3]# logout
- and on the staging server
[root@osestaging1 ~]# cd /root/openvpn/ [root@osestaging1 openvpn]# mv /home/maltfield/*.key . mv: overwrite './ta.key'? y [root@osestaging1 openvpn]# mv /home/maltfield/*.crt . mv: overwrite './ca.crt'? y [root@osestaging1 openvpn]# ls ca.crt hetzner2.crt osestaging1.crt ta.key client.conf hetzner2.key osestaging1.key [root@osestaging1 openvpn]# shred -u hetzner2.* [root@osestaging1 openvpn]# ls -lah total 32K drwx------. 2 root root 4.0K Oct 3 10:40 . dr-xr-x---. 4 root root 4.0K Oct 3 07:59 .. -rw-------. 1 maltfield maltfield 1.9K Oct 3 10:36 ca.crt -rw-r--r--. 1 root root 3.6K Oct 3 07:27 client.conf -rw-------. 1 maltfield maltfield 5.6K Oct 3 10:36 osestaging1.crt -rw-------. 1 maltfield maltfield 1.7K Oct 3 10:36 osestaging1.key -rw-------. 1 maltfield maltfield 636 Oct 3 10:36 ta.key [root@osestaging1 openvpn]# chown root:root *.crt [root@osestaging1 openvpn]# chown root:root *.key [root@osestaging1 openvpn]# chmod 0600 client.conf [root@osestaging1 openvpn]# ls -lah total 32K drwx------. 2 root root 4.0K Oct 3 10:40 . dr-xr-x---. 4 root root 4.0K Oct 3 07:59 .. -rw-------. 1 root root 1.9K Oct 3 10:36 ca.crt -rw-------. 1 root root 3.6K Oct 3 07:27 client.conf -rw-------. 1 root root 5.6K Oct 3 10:36 osestaging1.crt -rw-------. 1 root root 1.7K Oct 3 10:36 osestaging1.key -rw-------. 1 root root 636 Oct 3 10:36 ta.key [root@osestaging1 openvpn]# vim client.conf
- I decided to make the following static IPs
- 10.241.189.10 hetzner2 (prod)
- 10.241.189.11 osestaging1
- I did this by uncommenting the line 'client-config-dir ccd', creating a client-specific config file in the '/etc/openvpn/ccd/' dir whose name matches the CN (Common Name) on the client cert, and restarting the openvpn server service
[root@osedev1 openvpn]# vim server.conf [root@osedev1 openvpn]# grep -Ei '^client-config-dir ccd' server.conf client-config-dir ccd [root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.11 255.255.255.255" > ccd/osestaging1 [root@osedev1 openvpn]# systemctl restart openvpn@server.service [root@osedev1 openvpn]#
- I did the same for prod
[root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.10 255.255.255.255" > ccd/hetzner2 [root@osedev1 openvpn]# systemctl restart openvpn@server.service [root@osedev1 openvpn]#
- now that it's static, I can update my ssh config to make connecting to the staging node easy after connecting to the vpn from my laptop
user@ose:~/openvpn$ vim ~/.ssh/config user@ose:~/openvpn$ head -n21 ~/.ssh/config # OSE Host openbuildinginstitute.org *.openbuildinginstitute.org opensourceecology.org *.opensourceecology.org Port 32415 ForwardAgent yes IdentityFile /home/user/.ssh/id_rsa.ose User maltfield Host osedev1 HostName 195.201.233.113 Port 32415 ForwardAgent yes IdentityFile /home/user/.ssh/id_rsa.ose User maltfield Host osestaging1 HostName 10.241.189.11 Port 32415 ForwardAgent yes IdentityFile /home/user/.ssh/id_rsa.ose User maltfield user@ose:~/openvpn$ ssh osestaging1 The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established. ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts. Last login: Thu Oct 3 10:42:40 2019 from 10.241.189.10 [maltfield@osestaging1 ~]$
- another issue remains: we need the staging node to connect to the vpn on startup, but I can't get the fucking systemd unit to work
[root@osestaging1 system]# systemctl start openvpn-client\@client.service Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details. [root@osestaging1 system]# systemctl status openvpn-client\@client.service ● openvpn-client@client.service - OpenVPN tunnel for client Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2019-10-03 12:34:56 UTC; 8s ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Process: 1295 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR) Main PID: 1295 (code=exited, status=200/CHDIR) Oct 03 12:34:56 osestaging1 systemd[1]: Starting OpenVPN tunnel for client... Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR Oct 03 12:34:56 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client. Oct 03 12:34:56 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state. Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service failed. [root@osestaging1 system]# tail -n 7 /var/log/messages Oct 3 12:29:29 localhost systemd: openvpn-client@client.service failed. Oct 3 12:34:56 localhost systemd: Starting OpenVPN tunnel for client... Oct 3 12:34:56 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory Oct 3 12:34:56 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR Oct 3 12:34:56 localhost systemd: Failed to start OpenVPN tunnel for client. Oct 3 12:34:56 localhost systemd: Unit openvpn-client@client.service entered failed state. Oct 3 12:34:56 localhost systemd: openvpn-client@client.service failed. [root@osestaging1 system]#
- the /usr/sbin/openvpn file definitely exists; despite the misleading message, 'status=200/CHDIR' means systemd failed at its chdir step, so the 'No such file or directory' likely refers to the unit's WorkingDirectory rather than to the openvpn binary
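- a quick way to test that theory (hypothetical; not part of the original session) is to compare the unit's WorkingDirectory against what actually exists on disk:
# show the unit file and check its WorkingDirectory
systemctl cat openvpn-client@client.service | grep -i workingdirectory
ls -d /etc/openvpn/client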
- I gave the osestaging1 container a reboot
- after a reboot, osestaging1 now says that the openvpn-client@client.service doesn't exist!
[maltfield@osestaging1 ~]$ systemctl start openvpn-client\@client.service Failed to start openvpn-client@client.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files See system logs and 'systemctl status openvpn-client@client.service' for details. [maltfield@osestaging1 ~]$ systemctl list-unit-files | grep -i vpn openvpn-client@.service disabled openvpn-client@client.service disabled openvpn-server@.service disabled openvpn@.service disabled [maltfield@osestaging1 ~]$
- attempting to enable it fails
[maltfield@osestaging1 ~]$ systemctl enable /etc/systemd/system/openvpn-client\@client.service Failed to execute operation: The name org.freedesktop.PolicyKit1 was not provided by any .service files [maltfield@osestaging1 ~]$
- oh, duh, I wasn't root
[root@osestaging1 ~]# systemctl status openvpn-client\@client.service ● openvpn-client@client.service - OpenVPN tunnel for client Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled) Active: inactive (dead) Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO [root@osestaging1 ~]# systemctl start openvpn-client\@client.service Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details. [root@osestaging1 ~]# systemctl status openvpn-client\@client.service ● openvpn-client@client.service - OpenVPN tunnel for client Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2019-10-03 12:52:39 UTC; 7s ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Process: 379 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR) Main PID: 379 (code=exited, status=200/CHDIR) Oct 03 12:52:38 osestaging1 systemd[1]: Starting OpenVPN tunnel for client... Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR Oct 03 12:52:39 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client. Oct 03 12:52:39 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state. Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service failed. [root@osestaging1 ~]# tail -n 7 /var/log/messages Oct 3 12:52:38 localhost systemd: Created slice system-openvpn\x2dclient.slice. Oct 3 12:52:38 localhost systemd: Starting OpenVPN tunnel for client... Oct 3 12:52:39 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory Oct 3 12:52:39 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR Oct 3 12:52:39 localhost systemd: Failed to start OpenVPN tunnel for client. Oct 3 12:52:39 localhost systemd: Unit openvpn-client@client.service entered failed state. Oct 3 12:52:39 localhost systemd: openvpn-client@client.service failed. [root@osestaging1 ~]#
- after fighting with this shit for hours, I finally just copied all my files from /root/openvpn into /etc/openvpn/client/ and it worked!
[root@osestaging1 system]# cp /root/openvpn/* /etc/openvpn/client [root@osestaging1 system]# vim openvpn-client\@client.service ... [root@osestaging1 system]# systemctl daemon-reload <30>systemd-fstab-generator[425]: Running in a container, ignoring fstab device entry for /dev/root. [root@osestaging1 system]# systemctl restart openvpn-client\@client.service [root@osestaging1 system]# systemctl status openvpn-client\@client.service ● openvpn-client@client.service - OpenVPN tunnel for client Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; enabled; vendor preset: disabled) Active: active (running) since Thu 2019-10-03 13:33:32 UTC; 1s ago Docs: man:openvpn(8) https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage https://community.openvpn.net/openvpn/wiki/HOWTO Main PID: 432 (openvpn) Status: "Initialization Sequence Completed" CGroup: /user.slice/user-1000.slice/session-582.scope/system.slice/system-openvpn\x2dclient.slice/openvpn-client@client.service └─432 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf Oct 03 13:33:33 osestaging1 openvpn[432]: Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: Since you are using --dev tun with a point-to-point topology, the second arg...nowarn) Oct 03 13:33:33 osestaging1 openvpn[432]: ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP device tun0 opened Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP TX queue length set to 100 Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip link set dev tun0 up mtu 1500 Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip addr add dev tun0 local 10.241.189.11 peer 255.255.255.255 Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip route add 10.241.189.0/24 via 255.255.255.255 Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: this configuration may cache passwords in memory -- use the auth-nocache opt...nt this Oct 03 13:33:33 osestaging1 openvpn[432]: Initialization Sequence Completed Hint: Some lines were ellipsized, use -l to show in full. [root@osestaging1 system]# ip address show dev tun0 2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.11 peer 255.255.255.255/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::927:fae4:1356:9b90/64 scope link flags 800 valid_lft forever preferred_lft forever [root@osestaging1 system]#
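- for the record, making the unit start on boot (which the reboot test below depends on) is just:
systemctl enable openvpn-client@client.service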
- I confirmed that I could ssh into the staging node from my laptop
- I rebooted the staging node
- I confirmed that I could ssh into the staging node again after the reboot!
- I'm not going to bother with trying to setup this with the prod node for now; I'm not in a place where I want to make & test that prod change by rebooting the server..
- this is a good stopping point; I created another snapshot of the staging node
[root@osedev1 ~]# lxc-stop -n osestaging1 [root@osedev1 ~]# lxc-snapshot --name osestaging1 --list snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58 [root@osedev1 ~]# lxc-snapshot --name osestaging1 afterVPN lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested. lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone. If you do want snapshots, then lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine. [root@osedev1 ~]# lxc-snapshot --name osestaging1 --list snap1 (/var/lib/lxcsnaps/osestaging1) 2019:10:03 15:40:16 snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58 [root@osedev1 ~]#
- I started the staging container again, and I tested an rsync from prod to staging; first let's see the contents of /etc/varnish on staging
[root@osestaging1 ~]# ls -lah /etc | grep -i varnish [root@osestaging1 ~]#
- and the rsync; it failed. Right, I need to set up passwordless sudo on the staging node
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.10:/etc/ [sudo] password for maltfield: The authenticity of host '[10.241.189.10]:32415 ([10.241.189.10]:32415)' can't be established. ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk. ECDSA key fingerprint is MD5:cd:87:b1:bb:c1:3e:d1:d1:d4:5d:16:c9:e8:30:6a:71. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[10.241.189.10]:32415' (ECDSA) to the list of known hosts. sudo: no tty present and no askpass program specified rsync: connection unexpectedly closed (0 bytes received so far) [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9] [maltfield@opensourceecology ~]$
- I added this line to the end of the sudoers file on the staging node with 'visudo'
maltfield ALL=(ALL) NOPASSWD: ALL
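- a quick sanity check of the NOPASSWD rule (my addition; rsync's --rsync-path="sudo rsync" exercises this same non-interactive path):
# 'sudo -n' fails instead of prompting if a password would be required
sudo -n true && echo "passwordless sudo works"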
- doh, I gotta install rsync on the staging node. so many prereqs...
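- on the staging container that's just (assuming the base yum repos are reachable):
yum -y install rsync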
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/ The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established. ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI. ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts. sudo: rsync: command not found rsync: connection unexpectedly closed (0 bytes received so far) [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9] [maltfield@opensourceecology ~]$
- this time the rsync worked!
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/ ... sent 192211 bytes received 503 bytes 128476.00 bytes/sec total size is 190106 speedup is 0.99 [maltfield@opensourceecology ~]$
- here's the dir on the staging node's side
[root@osestaging1 ~]# ls -lah /etc/varnish total 44K drwxr-xr-x. 5 root root 4.0K Aug 27 06:19 . drwxr-xr-x. 63 root root 4.0K Oct 3 13:52 .. -rw-r--r--. 1 root root 1.4K Apr 9 19:10 all-vhosts.vcl -rw-r--r--. 1 root root 697 Nov 19 2017 catch-all.vcl drwxr-xr-x. 2 root root 4.0K Aug 27 06:17 conf -rw-rw-r--. 1 1011 1011 737 Nov 23 2017 default.vcl drwxr-xr-x. 2 root root 4.0K Apr 12 2018 lib -rw-------. 1 root root 129 Apr 12 2018 secret -rw-------. 1 root root 129 Apr 12 2018 secret.20180412.bak drwxr-xr-x. 2 root root 4.0K Aug 27 06:18 sites-enabled -rw-r--r--. 1 root root 1.1K Oct 21 2017 varnish.params [root@osestaging1 ~]#
- again, here are the dirs we want to exclude; the openvpn configs are already preserved
/root /etc/sudo* /etc/openvpn /usr/share/easy-rsa /dev /sys /proc /boot/ /etc/sysconfig/network* /tmp /var/tmp /etc/fstab /etc/mtab /etc/mdadm.conf
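- a tidier alternative (a sketch, not what I ran) would be to keep that list in a file and pass it to rsync via --exclude-from:
# hypothetical exclude file; one pattern per line
cat > /root/rsync-excludes.txt <<'EOF'
/root
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
EOF
# then: rsync --exclude-from=/root/rsync-excludes.txt -av --progress / maltfield@10.241.189.11:/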
- aaaand *fingers crossed* I kicked-off the rsync
[maltfield@opensourceecology ~]$ time sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/ ...
- whoops, I got ahead of myself! I killed it & left the staging server in a broken state, so I restored from snapshot & re-did the visudo & rsync-install steps. But before we actually kick off this whole-system rsync, I need to attach a hetzner cloud volume and mount it to /var. Else, the dev node's little disk will fill up!
[root@osedev1 ~]# lxc-snapshot --name osestaging1 -r snap1 [root@osedev1 ~]# lxc-start -n osestaging1
Wed Oct 02, 2019
- continuing on the dev node, I want to create an lxc container. First I installed 'lxc'
[root@osedev1 ~]# yum install lxc Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile epel/x86_64/metalink | 27 kB 00:00:00 * base: mirror.alpix.eu * epel: mirror.wiuwiu.de * extras: centosmirror.netcup.net * updates: mirror.alpix.eu base | 3.6 kB 00:00:00 epel | 5.3 kB 00:00:00 extras | 2.9 kB 00:00:00 updates | 2.9 kB 00:00:00 (1/6): base/7/x86_64/group_gz | 165 kB 00:00:00 (2/6): base/7/x86_64/primary_db | 6.0 MB 00:00:00 (3/6): epel/x86_64/updateinfo | 1.0 MB 00:00:00 (4/6): updates/7/x86_64/primary_db | 1.1 MB 00:00:00 (5/6): epel/x86_64/primary_db | 6.8 MB 00:00:00 (6/6): extras/7/x86_64/primary_db | 152 kB 00:00:00 Resolving Dependencies --> Running transaction check ---> Package lxc.x86_64 0:1.0.11-2.el7 will be installed --> Processing Dependency: lua-lxc(x86-64) = 1.0.11-2.el7 for package: lxc-1.0.11-2.el7.x86_64 --> Processing Dependency: lua-alt-getopt for package: lxc-1.0.11-2.el7.x86_64 --> Processing Dependency: liblxc.so.1()(64bit) for package: lxc-1.0.11-2.el7.x86_64 --> Running transaction check ---> Package lua-alt-getopt.noarch 0:0.7.0-4.el7 will be installed ---> Package lua-lxc.x86_64 0:1.0.11-2.el7 will be installed --> Processing Dependency: lua-filesystem for package: lua-lxc-1.0.11-2.el7.x86_64 ---> Package lxc-libs.x86_64 0:1.0.11-2.el7 will be installed --> Running transaction check ---> Package lua-filesystem.x86_64 0:1.6.2-2.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ========================================================================================================================================= Package Arch Version Repository Size ========================================================================================================================================= Installing: lxc x86_64 1.0.11-2.el7 epel 140 k Installing for dependencies: lua-alt-getopt noarch 0.7.0-4.el7 epel 7.4 k lua-filesystem x86_64 1.6.2-2.el7 epel 28 k lua-lxc x86_64 1.0.11-2.el7 epel 17 k lxc-libs x86_64 1.0.11-2.el7 epel 276 k Transaction Summary ========================================================================================================================================= Install 1 Package (+4 Dependent packages) Total download size: 468 k Installed size: 1.0 M Is this ok [y/d/N]: y Downloading packages: (1/5): lua-alt-getopt-0.7.0-4.el7.noarch.rpm | 7.4 kB 00:00:00 (2/5): lua-filesystem-1.6.2-2.el7.x86_64.rpm | 28 kB 00:00:00 (3/5): lua-lxc-1.0.11-2.el7.x86_64.rpm | 17 kB 00:00:00 (4/5): lxc-1.0.11-2.el7.x86_64.rpm | 140 kB 00:00:00 (5/5): lxc-libs-1.0.11-2.el7.x86_64.rpm | 276 kB 00:00:00 ----------------------------------------------------------------------------------------------------------------------------------------- Total 717 kB/s | 468 kB 00:00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : lxc-libs-1.0.11-2.el7.x86_64 1/5 Installing : lua-filesystem-1.6.2-2.el7.x86_64 2/5 Installing : lua-lxc-1.0.11-2.el7.x86_64 3/5 Installing : lua-alt-getopt-0.7.0-4.el7.noarch 4/5 Installing : lxc-1.0.11-2.el7.x86_64 5/5 Verifying : lua-lxc-1.0.11-2.el7.x86_64 1/5 Verifying : lua-alt-getopt-0.7.0-4.el7.noarch 2/5 Verifying : lxc-1.0.11-2.el7.x86_64 3/5 Verifying : lua-filesystem-1.6.2-2.el7.x86_64 4/5 Verifying : lxc-libs-1.0.11-2.el7.x86_64 5/5 Installed: lxc.x86_64 0:1.0.11-2.el7 Dependency Installed: lua-alt-getopt.noarch 0:0.7.0-4.el7 lua-filesystem.x86_64 0:1.6.2-2.el7 lua-lxc.x86_64 0:1.0.11-2.el7 lxc-libs.x86_64 0:1.0.11-2.el7 Complete! 
[root@osedev1 ~]#
- by default, it appears that we have no lxc templates
[root@osedev1 ~]# ls -lah /usr/share/lxc/templates/ total 8.0K drwxr-xr-x. 2 root root 4.0K Mar 7 2019 . drwxr-xr-x. 6 root root 4.0K Oct 2 12:16 .. [root@osedev1 ~]#
- I installed the 'lxc-templates' package (also from epel), and it gave me templates for many distros, including centos
[root@osedev1 ~]# yum -y install lxc-templates Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.alpix.eu * epel: mirror.wiuwiu.de * extras: centosmirror.netcup.net * updates: mirror.alpix.eu Resolving Dependencies --> Running transaction check ---> Package lxc-templates.x86_64 0:1.0.11-2.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ========================================================================================================================================= Package Arch Version Repository Size ========================================================================================================================================= Installing: lxc-templates x86_64 1.0.11-2.el7 epel 81 k Transaction Summary ========================================================================================================================================= Install 1 Package Total download size: 81 k Installed size: 333 k Downloading packages: lxc-templates-1.0.11-2.el7.x86_64.rpm | 81 kB 00:00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : lxc-templates-1.0.11-2.el7.x86_64 1/1 Verifying : lxc-templates-1.0.11-2.el7.x86_64 1/1 Installed: lxc-templates.x86_64 0:1.0.11-2.el7 Complete! [root@osedev1 ~]# ls -lah /usr/share/lxc/templates/ total 348K drwxr-xr-x. 2 root root 4.0K Oct 2 12:29 . drwxr-xr-x. 6 root root 4.0K Oct 2 12:16 .. -rwxr-xr-x. 1 root root 11K Mar 7 2019 lxc-alpine -rwxr-xr-x. 1 root root 14K Mar 7 2019 lxc-altlinux -rwxr-xr-x. 1 root root 11K Mar 7 2019 lxc-archlinux -rwxr-xr-x. 1 root root 9.5K Mar 7 2019 lxc-busybox -rwxr-xr-x. 1 root root 30K Mar 7 2019 lxc-centos -rwxr-xr-x. 1 root root 11K Mar 7 2019 lxc-cirros -rwxr-xr-x. 1 root root 18K Mar 7 2019 lxc-debian -rwxr-xr-x. 1 root root 18K Mar 7 2019 lxc-download -rwxr-xr-x. 1 root root 49K Mar 7 2019 lxc-fedora -rwxr-xr-x. 1 root root 28K Mar 7 2019 lxc-gentoo -rwxr-xr-x. 1 root root 14K Mar 7 2019 lxc-openmandriva -rwxr-xr-x. 1 root root 14K Mar 7 2019 lxc-opensuse -rwxr-xr-x. 1 root root 35K Mar 7 2019 lxc-oracle -rwxr-xr-x. 1 root root 12K Mar 7 2019 lxc-plamo -rwxr-xr-x. 1 root root 6.7K Mar 7 2019 lxc-sshd -rwxr-xr-x. 1 root root 24K Mar 7 2019 lxc-ubuntu -rwxr-xr-x. 1 root root 12K Mar 7 2019 lxc-ubuntu-cloud [root@osedev1 ~]#
- now I was successfully able to create an lxc container for our staging node named 'osestaging1' from the template 'centos'. I didn't specify the version, but it does appear to be centos7
[root@osedev1 ~]# lxc-create -n osestaging1 -t centos Host CPE ID from /etc/os-release: cpe:/o:centos:centos:7 Checking cache download in /var/cache/lxc/centos/x86_64/7/rootfs ... Downloading CentOS minimal ... ... Download complete. Copy /var/cache/lxc/centos/x86_64/7/rootfs to /var/lib/lxc/osestaging1/rootfs ... Copying rootfs to /var/lib/lxc/osestaging1/rootfs ... sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/init/tty.conf: No such file or directory Storing root password in '/var/lib/lxc/osestaging1/tmp_root_pass' Expiring password for user root. passwd: Success sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.sysinit: No such file or directory sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.d/rc.sysinit: No such file or directory Container rootfs and config have been created. Edit the config file to check/enable networking setup. The temporary root password is stored in: '/var/lib/lxc/osestaging1/tmp_root_pass' The root password is set up as expired and will require it to be changed at first login, which you should do as soon as possible. If you lose the root password or wish to change it without starting the container, you can change it from the host by running the following command (which will also reset the expired flag): chroot /var/lib/lxc/osestaging1/rootfs passwd [root@osedev1 ~]#
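- note: template-specific args go after a '--'; from memory (unverified here), pinning the release explicitly would look something like:
lxc-create -n osestaging1 -t centos -- --release=7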
- the sync from prod to staging is going to overwrite the staging root password, so I won't bother creating & setting a distinct root password for this staging container
- `lxc-top` shows that we have 0 containers running
[root@osedev1 ~]# lxc-top Container CPU CPU CPU BlkIO Mem Name Used Sys User Total Used TOTAL (0 ) 0.00 0.00 0.00 0.00 0.00
- I tried to start the staging container, but I got a networking error
[root@osedev1 ~]# lxc-start -n osestaging1 lxc-start: conf.c: instantiate_veth: 3115 failed to attach 'vethWX1L1G' to the bridge 'virbr0': No such device lxc-start: conf.c: lxc_create_network: 3407 failed to create netdev lxc-start: start.c: lxc_spawn: 875 failed to create the network lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1' lxc-start: lxc_start.c: main: 336 The container failed to start. lxc-start: lxc_start.c: main: 340 Additional information can be obtained by setting the --logfile and --logpriority options. [root@osedev1 ~]#
- it looks like there is no 'virbr0' device; we only have the loopback, ethernet, and tun device for openvpn
[root@osedev1 ~]# ip -all address show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0 valid_lft 56775sec preferred_lft 56775sec inet6 2a01:4f8:c010:3ca0::1/64 scope global valid_lft forever preferred_lft forever inet6 fe80::9400:ff:fe2e:489d/64 scope link valid_lft forever preferred_lft forever 3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800 valid_lft forever preferred_lft forever [root@osedev1 ~]#
- Ideally, the container would not be given an internet-facing ip address, anyway. It would be better to give it a bridge on the tun0 openvpn network
- it looks like the relevant files for containers are in /var/lib/lxc/<containerName>/
[root@osedev1 osestaging1]# date Wed Oct 2 12:47:07 CEST 2019 [root@osedev1 osestaging1]# pwd /var/lib/lxc/osestaging1 [root@osedev1 osestaging1]# ls config rootfs tmp_root_pass [root@osedev1 osestaging1]#
- here is the default config
[root@osedev1 osestaging1]# cat config # Template used to create this container: /usr/share/lxc/templates/lxc-centos # Parameters passed to the template: # For additional config options, please look at lxc.container.conf(5) lxc.network.type = veth lxc.network.flags = up lxc.network.link = virbr0 lxc.network.hwaddr = fe:07:06:a6:5f:1d lxc.rootfs = /var/lib/lxc/osestaging1/rootfs # Include common configuration lxc.include = /usr/share/lxc/config/centos.common.conf lxc.arch = x86_64 lxc.utsname = osestaging1 lxc.autodev = 1 # When using LXC with apparmor, uncomment the next line to run unconfined: #lxc.aa_profile = unconfined # example simple networking setup, uncomment to enable #lxc.network.type = veth #lxc.network.flags = up #lxc.network.link = lxcbr0 #lxc.network.name = eth0 # Additional example for veth network type # static MAC address, #lxc.network.hwaddr = 00:16:3e:77:52:20 # persistent veth device name on host side # Note: This may potentially collide with other containers of same name! #lxc.network.veth.pair = v-osestaging1-e0 [root@osedev1 osestaging1]#
- to my horror, I discovered that iptables was disabled on the dev server! why!?!
[root@osedev1 osestaging1]# iptables-save [root@osedev1 osestaging1]# ip6tables-save [root@osedev1 osestaging1]# service iptables status Redirecting to /bin/systemctl status iptables.service ● iptables.service - IPv4 firewall with iptables Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled) Active: inactive (dead) [root@osedev1 osestaging1]# service iptables start Redirecting to /bin/systemctl start iptables.service [root@osedev1 osestaging1]# iptables-save # Generated by iptables-save v1.4.21 on Wed Oct 2 12:58:21 2019 *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [17:1396] -A INPUT -i lo -j ACCEPT -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT -A INPUT -p icmp -j ACCEPT -A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT -A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT -A INPUT -j DROP COMMIT # Completed on Wed Oct 2 12:58:21 2019 [root@osedev1 osestaging1]# ip6tables-save root@osedev1 osestaging1]# service ip6tables start Redirecting to /bin/systemctl start ip6tables.service [root@osedev1 osestaging1]# ip6tables-save # Generated by ip6tables-save v1.4.21 on Wed Oct 2 12:59:51 2019 *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT -A INPUT -p ipv6-icmp -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT -A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT -A INPUT -j REJECT --reject-with icmp6-adm-prohibited -A FORWARD -j REJECT --reject-with icmp6-adm-prohibited COMMIT # Completed on Wed Oct 2 12:59:51 2019 [root@osedev1 osestaging1]#
- systemd says that both iptables.service & ip6tables.service are 'loaded active exited'
[root@osedev1 osestaging1]# systemctl list-units | grep -Ei 'iptables|ip6tables' ip6tables.service loaded active exited IPv6 firewall with ip6tables iptables.service loaded active exited IPv4 firewall with iptables [root@osedev1 osestaging1]#
- systemd status shows both services are 'disabled'
[root@osedev1 osestaging1]# systemctl status iptables.service ● iptables.service - IPv4 firewall with iptables Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled) Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 7min ago Process: 29121 ExecStart=/usr/libexec/iptables/iptables.init start (code=exited, status=0/SUCCESS) Main PID: 29121 (code=exited, status=0/SUCCESS) CGroup: /system.slice/iptables.service Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables... Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [ OK ] Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables. [root@osedev1 osestaging1]# systemctl status ip6tables.service ● ip6tables.service - IPv6 firewall with ip6tables Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; disabled; vendor preset: disabled) Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 6min ago Process: 29233 ExecStart=/usr/libexec/iptables/ip6tables.init start (code=exited, status=0/SUCCESS) Main PID: 29233 (code=exited, status=0/SUCCESS) Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables... Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [ OK ] Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables. [root@osedev1 osestaging1]#
- I enabled both, and I confirmed that they're now set to 'enabled' (see second line)
[root@osedev1 osestaging1]# systemctl enable iptables.service Created symlink from /etc/systemd/system/basic.target.wants/iptables.service to /usr/lib/systemd/system/iptables.service. [root@osedev1 osestaging1]# systemctl enable ip6tables.service Created symlink from /etc/systemd/system/basic.target.wants/ip6tables.service to /usr/lib/systemd/system/ip6tables.service. [root@osedev1 osestaging1]# systemctl status iptables.service ● iptables.service - IPv4 firewall with iptables Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled) Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 8min ago Main PID: 29121 (code=exited, status=0/SUCCESS) CGroup: /system.slice/iptables.service Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables... Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [ OK ] Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables. [root@osedev1 osestaging1]# systemctl status ip6tables.service ● ip6tables.service - IPv6 firewall with ip6tables Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; enabled; vendor preset: disabled) Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 7min ago Main PID: 29233 (code=exited, status=0/SUCCESS) Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables... Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [ OK ] Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables. [root@osedev1 osestaging1]#
- actually, it doesn't make sense for the staging server to only have an ip address on the openvpn subnet; if that were the case, then it couldn't access the internet... which would make developing a POC nearly impossible. We want to prevent forwarding ports from the internet to the machine, but we do want to let it reach OUT to the internet. Perhaps we should set up the bridge per normal and then just have the openvpn client running on the staging server. Indeed, we'll need the prod server to be running an openvpn client, so we should be able to just duplicate this config (they'll be the same anyway!)
- I looked into what options are available for 'lxc.network.type', which is listed in section 5 of the man page for 'lxc.container.conf' = `man 5 lxc.container.conf`
lxc.network.type
specify what kind of network virtualization to be used for the container. Each time a lxc.network.type field is found a new round of network configuration begins. In this way, several network virtualization types can be specified for the same container, as well as assigning several network interfaces for one container. The different virtualization types can be:
none: will cause the container to share the host's network namespace. This means the host network devices are usable in the container. It also means that if both the container and host have upstart as init, 'halt' in a container (for instance) will shut down the host.
empty: will create only the loopback interface.
veth: a virtual ethernet pair device is created with one side assigned to the container and the other side attached to a bridge specified by the lxc.network.link option. If the bridge is not specified, then the veth pair device will be created but not attached to any bridge. Otherwise, the bridge has to be created on the system before starting the container. lxc won't handle any configuration outside of the container. By default, lxc chooses a name for the network device belonging to the outside of the container, but if you wish to handle this name yourselves, you can tell lxc to set a specific name with the lxc.network.veth.pair option (except for unprivileged containers where this option is ignored for security reasons).
vlan: a vlan interface is linked with the interface specified by the lxc.network.link and assigned to the container. The vlan identifier is specified with the option lxc.network.vlan.id.
macvlan: a macvlan interface is linked with the interface specified by the lxc.network.link and assigned to the container. lxc.network.macvlan.mode specifies the mode the macvlan will use to communicate between different macvlan on the same upper device. The accepted modes are private, the device never communicates with any other device on the same upper_dev (default), vepa, the new Virtual Ethernet Port Aggregator (VEPA) mode, it assumes that the adjacent bridge returns all frames where both source and destination are local to the macvlan port, i.e. the bridge is set up as a reflective relay. Broadcast frames coming in from the upper_dev get flooded to all macvlan interfaces in VEPA mode, local frames are not delivered locally, or bridge, it provides the behavior of a simple bridge between different macvlan interfaces on the same port. Frames from one interface to another one get delivered directly and are not sent out externally. Broadcast frames get flooded to all other bridge ports and to the external interface, but when they come back from a reflective relay, we don't deliver them again. Since we know all the MAC addresses, the macvlan bridge mode does not require learning or STP like the bridge module does.
phys: an already existing interface specified by the lxc.network.link is assigned to the container.
- we want the container to be able to touch the internet, so that rules out 'empty'
- we don't have a spare physical interface on the server for each container, so that rules out 'phys'
- I'm unclear on the distinction between macvlan, vlan, veth, and none. Probably we want veth and we need to get the 'virbr0' interface actually working
- google says our error may be caused by libvirt not being installed
- I didn't have libvirt installed, so I did so
[root@osedev1 osestaging1]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0 valid_lft 50735sec preferred_lft 50735sec inet6 2a01:4f8:c010:3ca0::1/64 scope global valid_lft forever preferred_lft forever inet6 fe80::9400:ff:fe2e:489d/64 scope link valid_lft forever preferred_lft forever 3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800 valid_lft forever preferred_lft forever [root@osedev1 osestaging1]# rpm -qa | grep -i libvirt [root@osedev1 osestaging1]# yum -y install libvirt Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.alpix.eu * epel: mirror.wiuwiu.de * extras: centosmirror.netcup.net * updates: mirror.alpix.eu Resolving Dependencies ... Complete! [root@osedev1 osestaging1]#
- but there didn't appear to be any changes; I had to manually start the libvirtd service to see them; now it shows two new interfaces: 'virbr0' & 'virbr0-nic'
[root@osedev1 osestaging1]# systemctl status libvirtd ● libvirtd.service - Virtualization daemon Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled) Active: inactive (dead) Docs: man:libvirtd(8) https://libvirt.org [root@osedev1 osestaging1]# systemctl start libvirtd [root@osedev1 osestaging1]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0 valid_lft 50619sec preferred_lft 50619sec inet6 2a01:4f8:c010:3ca0::1/64 scope global valid_lft forever preferred_lft forever inet6 fe80::9400:ff:fe2e:489d/64 scope link valid_lft forever preferred_lft forever 3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100 link/none inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800 valid_lft forever preferred_lft forever 6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever 7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff [root@osedev1 osestaging1]#
- and there are some changes to the routing table too
[root@osedev1 osestaging1]# ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff 3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100 link/none 6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff 7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff [root@osedev1 osestaging1]# ip r default via 172.31.1.1 dev eth0 10.241.189.0/24 via 10.241.189.2 dev tun0 10.241.189.2 dev tun0 proto kernel scope link src 10.241.189.1 169.254.0.0/16 dev eth0 scope link metric 1002 172.31.1.1 dev eth0 scope link 192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 [root@osedev1 osestaging1]#
- now I was successfully able to start the 'osestaging1' container
[root@osedev1 osestaging1]# lxc-start -n osestaging1 systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN) Detected virtualization lxc. Detected architecture x86-64. Welcome to CentOS Linux 7 (Core)! Running in a container, ignoring fstab device entry for /dev/root. Cannot add dependency job for unit display-manager.service, ignoring: Unit not found. [ OK ] Reached target Remote File Systems. [ OK ] Reached target Swap. [ OK ] Started Forward Password Requests to Wall Directory Watch. [ OK ] Created slice Root Slice. [ OK ] Created slice User and Session Slice. [ OK ] Listening on /dev/initctl Compatibility Named Pipe. [ OK ] Listening on Journal Socket. [ OK ] Started Dispatch Password Requests to Console Directory Watch. [ OK ] Reached target Local Encrypted Volumes. [ OK ] Reached target Paths. [ OK ] Listening on Delayed Shutdown Socket. [ OK ] Created slice System Slice. [ OK ] Created slice system-getty.slice. Starting Journal Service... Mounting POSIX Message Queue File System... [ OK ] Reached target Slices. Starting Read and set NIS domainname from /etc/sysconfig/network... Mounting Huge Pages File System... Starting Remount Root and Kernel File Systems... [ OK ] Mounted Huge Pages File System. [ OK ] Mounted POSIX Message Queue File System. [ OK ] Started Journal Service. [ OK ] Started Read and set NIS domainname from /etc/sysconfig/network. [ OK ] Started Remount Root and Kernel File Systems. [ OK ] Reached target Local File Systems (Pre). Starting Configure read-only root support... Starting Rebuild Hardware Database... Starting Flush Journal to Persistent Storage... <46>systemd-journald[14]: Received request to flush runtime journal from PID 1 [ OK ] Started Flush Journal to Persistent Storage. [ OK ] Started Configure read-only root support. Starting Load/Save Random Seed... [ OK ] Reached target Local File Systems. Starting Rebuild Journal Catalog... Starting Mark the need to relabel after reboot... Starting Create Volatile Files and Directories... [ OK ] Started Load/Save Random Seed. [ OK ] Reached target Local File Systems. Starting Rebuild Journal Catalog... Starting Mark the need to relabel after reboot... Starting Create Volatile Files and Directories... [ OK ] Started Load/Save Random Seed. [ OK ] Started Rebuild Journal Catalog. [ OK ] Started Mark the need to relabel after reboot. [ OK ] Started Create Volatile Files and Directories. Starting Update UTMP about System Boot/Shutdown... [ OK ] Started Update UTMP about System Boot/Shutdown. [ OK ] Started Rebuild Hardware Database. Starting Update is Completed... [ OK ] Started Update is Completed. [ OK ] Reached target System Initialization. [ OK ] Listening on D-Bus System Message Bus Socket. [ OK ] Reached target Sockets. [ OK ] Reached target Basic System. Starting LSB: Bring up/down networking... Starting Permit User Sessions... Starting Login Service... Starting OpenSSH Server Key Generation... [ OK ] Started D-Bus System Message Bus. [ OK ] Started Daily Cleanup of Temporary Directories. [ OK ] Reached target Timers. [ OK ] Started Permit User Sessions. Starting Cleanup of Temporary Directories... [ OK ] Started Command Scheduler. [ OK ] Started Console Getty. [ OK ] Reached target Login Prompts. [ OK ] Started Cleanup of Temporary Directories. [ OK ] Started Login Service. [ OK ] Started OpenSSH Server Key Generation. 
CentOS Linux 7 (Core) Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64 osestaging1 login:
- I was able to login as root, but it made me change the password immediately. I just set it to the same root password as our prod server
osestaging1 login: root Password: You are required to change your password immediately (root enforced) Changing password for root. (current) UNIX password: New password: Retype new password: [root@osestaging1 ~]#
- this new container has an ip address of '192.168.122.201', and it does have access to the internet
[root@osestaging1 ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether fe:07:06:a6:5f:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.168.122.201/24 brd 192.168.122.255 scope global dynamic eth0 valid_lft 3310sec preferred_lft 3310sec inet6 fe80::fc07:6ff:fea6:5f1d/64 scope link valid_lft forever preferred_lft forever [root@osestaging1 ~]# ping 1.1.1.1 PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data. 64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=5.46 ms 64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=5.48 ms --- 1.1.1.1 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms rtt min/avg/max/mdev = 5.468/5.474/5.480/0.006 ms [root@osestaging1 ~]#
- on the dev node host, we can also see the bridge with `brctl`
[root@osedev1 osestaging1]# brctl show bridge name bridge id STP enabled interfaces virbr0 8000.5254007d0171 yes vethYMJVGD virbr0-nic [root@osedev1 osestaging1]#
- now I think we're about ready to initiate this sync. Interesting decision: we could rsync (via ssh) either to the dev node or to the staging container. I think it would be safer to go to the container, since a mistake there can't clobber the dev node host
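- a minimal sketch of what that invocation could look like (the destination host & path are hypothetical placeholders; the excludes are standard practice, since pseudo-filesystems like /proc, /sys, /dev, and /run hold runtime state, not data we want to copy):
# run from prod once it can reach staging; destination is a placeholder
rsync -av --progress --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/run --exclude=/tmp -e ssh / maltfield@osestaging1:/var/tmp/prod-sync/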
- I confirmed that ssh is listening on the default install of the staging container
[root@osestaging1 ~]# ss -plan | grep -i ssh u_str ESTAB 0 0 * 162265 * 0 users:(("sshd",pid=298,fd=2),("sshd",pid=298,fd=1)) tcp LISTEN 0 128 *:22 *:* users:(("sshd",pid=298,fd=3)) tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=298,fd=4)) [root@osestaging1 ~]#
- I did some basic bootstrap config of the staging container, following my documentation for doing the same to its host dev server Maltfield_Log/2019_Q3#Tue_Aug_20.2C_2019
[root@osestaging1 ~]# useradd maltfield [root@osestaging1 ~]# su - maltfield [maltfield@osestaging1 ~]$ mkdir .ssh [maltfield@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== michael@opensourceecology.org" > .ssh/authorized_keys [maltfield@osestaging1 ~]$ chmod 0700 .ssh [maltfield@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys [maltfield@osestaging1 ~]$
- I confirmed that I could now successfully ssh in as 'maltfield' using my key into staging from within dev
user@ose:~$ ssh -A osedev1 Last login: Wed Oct 2 12:09:35 2019 from 5.254.96.238 [maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname The authenticity of host '192.168.122.201 (192.168.122.201)' can't be established. ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI. ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '192.168.122.201' (ECDSA) to the list of known hosts. osestaging1 [maltfield@osedev1 ~]$
- and continued with the bootstrap of my user, giving myself sudo rights
[root@osestaging1 ~]# yum -y install sudo ... Installed: sudo.x86_64 0:1.8.23-4.el7 Complete! [root@osestaging1 ~]# passwd maltfield Changing password for user maltfield. New password: Retype new password: passwd: all authentication tokens updated successfully. [root@osestaging1 ~]# gpasswd -a maltfield wheel Adding user maltfield to group wheel [root@osestaging1 ~]# su - maltfield Last login: Wed Oct 2 13:00:29 UTC 2019 on lxc/console [maltfield@osestaging1 ~]$ sudo su - We trust you have received the usual lecture from the local System Administrator. It usually boils down to these three things: #1) Respect the privacy of others. #2) Think before you type. #3) With great power comes great responsibility. [sudo] password for maltfield: Last login: Wed Oct 2 12:33:00 UTC 2019 on lxc/console [root@osestaging1 ~]#
- this time I took the hardened sshd config from dev and gave it to staging; first, on dev I ran:
user@ose:~$ ssh osedev1 Last login: Wed Oct 2 14:57:15 2019 from 5.254.96.238 [maltfield@osedev1 ~]$ sudo cp /etc/ssh/sshd_config . [maltfield@osedev1 ~]$ sudo chown maltfield sshd_config [maltfield@osedev1 ~]$ scp sshd_config 192.168.122.201: sshd_config 100% 4455 5.7MB/s 00:00 [maltfield@osedev1 ~]$
- and then in staging
[maltfield@osestaging1 ~]$ ls sshd_config [maltfield@osestaging1 ~]$ sudo su - [sudo] password for maltfield: Last login: Wed Oct 2 13:02:02 UTC 2019 on lxc/console [root@osestaging1 ~]# cd /etc/ssh [root@osestaging1 ssh]# mv sshd_config sshd_config.20191002.orig [root@osestaging1 ssh]# mv /home/maltfield/sshd_config . [root@osestaging1 ssh]# ls -lah total 620K drwxr-xr-x. 2 root root 4.0K Oct 2 13:16 . drwxr-xr-x. 60 root root 4.0K Oct 2 13:01 .. -rw-r--r--. 1 root root 569K Aug 9 01:40 moduli -rw-r--r--. 1 root root 2.3K Aug 9 01:40 ssh_config -rw-r-----. 1 root ssh_keys 227 Oct 2 12:28 ssh_host_ecdsa_key -rw-r--r--. 1 root root 162 Oct 2 12:28 ssh_host_ecdsa_key.pub -rw-r-----. 1 root ssh_keys 387 Oct 2 12:28 ssh_host_ed25519_key -rw-r--r--. 1 root root 82 Oct 2 12:28 ssh_host_ed25519_key.pub -rw-r-----. 1 root ssh_keys 1.7K Oct 2 12:28 ssh_host_rsa_key -rw-r--r--. 1 root root 382 Oct 2 12:28 ssh_host_rsa_key.pub -rw-------. 1 maltfield maltfield 4.4K Oct 2 13:07 sshd_config -rw-------. 1 root root 3.9K Aug 9 01:40 sshd_config.20191002.orig [root@osestaging1 ssh]# chown root:root sshd_config [root@osestaging1 ssh]# ls -lah total 620K drwxr-xr-x. 2 root root 4.0K Oct 2 13:16 . drwxr-xr-x. 60 root root 4.0K Oct 2 13:01 .. -rw-r--r--. 1 root root 569K Aug 9 01:40 moduli -rw-r--r--. 1 root root 2.3K Aug 9 01:40 ssh_config -rw-r-----. 1 root ssh_keys 227 Oct 2 12:28 ssh_host_ecdsa_key -rw-r--r--. 1 root root 162 Oct 2 12:28 ssh_host_ecdsa_key.pub -rw-r-----. 1 root ssh_keys 387 Oct 2 12:28 ssh_host_ed25519_key -rw-r--r--. 1 root root 82 Oct 2 12:28 ssh_host_ed25519_key.pub -rw-r-----. 1 root ssh_keys 1.7K Oct 2 12:28 ssh_host_rsa_key -rw-r--r--. 1 root root 382 Oct 2 12:28 ssh_host_rsa_key.pub -rw-------. 1 root root 4.4K Oct 2 13:07 sshd_config -rw-------. 1 root root 3.9K Aug 9 01:40 sshd_config.20191002.orig [root@osestaging1 ssh]# grep AllowGroups sshd_config AllowGroups sshaccess [root@osestaging1 ssh]# grep sshaccess /etc/group [root@osestaging1 ssh]# groupadd sshaccess [root@osestaging1 ssh]# gpasswd -a maltfield sshaccess Adding user maltfield to group sshaccess [root@osestaging1 ssh]# grep sshaccess /etc/group sshaccess:x:1001:maltfield [root@osestaging1 ssh]# systemctl restart sshd [root@osestaging1 ssh]#
- confirmed that I could still ssh in on the new non-standard port from dev to staging
user@ose:~$ ssh osedev1 Last login: Wed Oct 2 15:13:21 2019 from 5.254.96.225 [maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname ssh: connect to host 192.168.122.201 port 22: Connection refused [maltfield@osedev1 ~]$ ssh -p 32415 maltfield@192.168.122.201 hostname osestaging1 [maltfield@osedev1 ~]$
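- for reference, the relevant directives in that hardened sshd_config boil down to something like this (Port and AllowGroups are confirmed by the greps and ssh test above; the remaining lines are typical hardening and an assumption about the file's exact contents):
Port 32415
AllowGroups sshaccess
# the following are assumed, not verified against the actual file
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes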
- I could go further and set up iptables to block incoming traffic, but the beauty of this being a container with a NAT'd private ip address, on a host whose internet-facing ip address is already locked down with iptables, is that we really don't need to. It's already unreachable from the internet, and it will only be accessible from the dev node, into which our developers must first vpn as a prerequisite to reaching this staging node
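- for the record, if we ever did want belt-and-suspenders filtering inside the container anyway, it would only take a few rules like these (hypothetical; not applied, since NAT + the host firewall already cover this):
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 32415 -s 192.168.122.0/24 -j ACCEPT
iptables -P INPUT DROP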
- let's make it so that prod can reach staging: we'll create an openvpn cert for our prod node and install it on both our prod & staging nodes, then update our openvpn config to include the 'client-to-client' option https://openvpn.net/community-resources/how-to/#scope
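- concretely, that last part is a one-line change to the openvpn server config on the dev node ('client-to-client' is the documented directive; the config path is an assumption):
# /etc/openvpn/server.conf (path is an assumption)
# allow vpn clients (e.g. prod) to reach other vpn clients (e.g. the route to staging)
client-to-client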
- before continuing, it would be wise to create a snapshot of the staging container
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list No snapshots [root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested. lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone. If you do want snapshots, then lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine. lxc_container: lxccontainer.c: lxcapi_clone: 2643 error: Original container (osestaging1) is running lxc_container: lxccontainer.c: lxcapi_snapshot: 2899 clone of /var/lib/lxc:osestaging1 failed lxc_container: lxc_snapshot.c: do_snapshot: 55 Error creating a snapshot [root@osedev1 ssh]#
- I tried to create a snapshot; it warned that it can't do deltas (it makes a full copy-clone) unless the container is backed by overlayfs or aufs (or probably also zfs, btrfs, etc). It then failed because the container was still running, so I stopped it and tried again.
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested. lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone. If you do want snapshots, then lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine. [root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58 [root@osedev1 ssh]#
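- for future reference, rolling back to (or deleting) that snapshot should be as simple as the following, per the lxc-snapshot man page (a sketch; untested here):
lxc-stop --name osestaging1
lxc-snapshot --name osestaging1 --restore snap0   # untested; per the man page
lxc-snapshot --name osestaging1 --destroy snap0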
- so our container is ~0.5G, and so is our single snapshot (a full copy, since the container is directory-backed)
[root@osedev1 ssh]# du -sh /var/lib/lxcsnaps/* 459M /var/lib/lxcsnaps/osestaging1 [root@osedev1 ssh]# du -sh /var/lib/lxc/* 459M /var/lib/lxc/osestaging1 [root@osedev1 ssh]#
- eventually we'll need to mount the external block volume at /var/, especially before the sync from prod
[root@osedev1 ssh]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 19G 2.4G 16G 14% / devtmpfs 873M 0 873M 0% /dev tmpfs 896M 0 896M 0% /dev/shm tmpfs 896M 17M 879M 2% /run tmpfs 896M 0 896M 0% /sys/fs/cgroup /dev/sdb 9.8G 37M 9.3G 1% /mnt/HC_Volume_3110278 tmpfs 180M 0 180M 0% /run/user/1000 [root@osedev1 ssh]#
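- a rough sketch of that migration (device name taken from the df output above; the exact steps and filesystem type are assumptions, and it would be done while the container is stopped):
lxc-stop --name osestaging1
rsync -aAX /var/ /mnt/HC_Volume_3110278/   # copy the existing /var onto the volume
umount /mnt/HC_Volume_3110278
mount /dev/sdb /var
# plus a matching /etc/fstab entry, e.g. '/dev/sdb /var ext4 defaults 0 0'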
- as for backups, I created new API keys that have access to only the 'ose-dev-server-backups' bucket.
- because ransomware is a topic of concern (specifically ransomware that deletes your backups), I also noticed that when we create the api key, we can remove the 'deleteFiles' and 'deleteBuckets' capabilities (the cleanup of old backups is actually done by the lifecycle rules on backblaze's side, not our script's logic). Apparently there's no way to edit the capabilities of existing keys, so this would be a non-trivial change.
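- ours were created in the web UI, but for reference the CLI equivalent is roughly the following (the point is the capability list, which omits deleteFiles & deleteBuckets; check `b2 create-key --help` for the exact syntax of your version):
# syntax per the b2 docs; not tested here
b2 create-key --bucket ose-dev-server-backups ose-dev-server-backups-key listBuckets,listFiles,readFiles,writeFiles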
- I wrote the api key creds to osedev1:/root/scripts/backup.settings
- And I created a new 4K encryption key. To make it clearer, I named it 'ose-dev-backups-cron.201910.key'. I added it to the shared ose keepass db under "backups" (the attached files are under the "Advanced" tab)
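- generating such a keyfile is a one-liner with openssl (a sketch; I'm assuming here that the key is just a 4096-byte random symmetric keyfile):
openssl rand 4096 > ose-dev-backups-cron.201910.key   # assumes a 4096-byte random keyfile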
- I also installed the b2 CLI dependencies on the dev node; unfortunately I hit some issues https://wiki.opensourceecology.org/wiki/Backblaze#Install_CLI
[root@osedev1 backups]# yum install python-virtualenv ... Installed: python-virtualenv.noarch 0:15.1.0-2.el7 Dependency Installed: python-devel.x86_64 0:2.7.5-86.el7 python-rpm-macros.noarch 0:3-32.el7 python-srpm-macros.noarch 0:3-32.el7 python2-rpm-macros.noarch 0:3-32.el7 Dependency Updated: python.x86_64 0:2.7.5-86.el7 python-libs.x86_64 0:2.7.5-86.el7 Complete! [root@osedev1 backups]# yum install python-setuptools Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.alpix.eu * epel: mirror.wiuwiu.de * extras: centosmirror.netcup.net * updates: mirror.alpix.eu Package python-setuptools-0.9.8-7.el7.noarch already installed and latest version Nothing to do [root@osedev1 backups]# yum install git ... Installed: git.x86_64 0:1.8.3.1-20.el7 Dependency Installed: perl-Error.noarch 1:0.17020-2.el7 perl-Git.noarch 0:1.8.3.1-20.el7 perl-TermReadKey.x86_64 0:2.30-20.el7 Complete! [root@osedev1 backups]# adduser b2user [root@osedev1 backups]# sudo su - b2user [b2user@osedev1 ~]$ mkdir virtualenv [b2user@osedev1 ~]$ cd virtualenv/ [b2user@osedev1 virtualenv]$ virtualenv . New python executable in /home/b2user/virtualenv/bin/python Installing setuptools, pip, wheel...done. [b2user@osedev1 virtualenv]$ cd .. [b2user@osedev1 ~]$ mkdir sandbox [b2user@osedev1 ~]$ cd sandbox/ [b2user@osedev1 sandbox]$ git clone https://github.com/Backblaze/B2_Command_Line_Tool.git Cloning into 'B2_Command_Line_Tool'... remote: Enumerating objects: 151, done. remote: Counting objects: 100% (151/151), done. remote: Compressing objects: 100% (93/93), done. remote: Total 7130 (delta 90), reused 102 (delta 55), pack-reused 6979 Receiving objects: 100% (7130/7130), 1.80 MiB | 3.35 MiB/s, done. Resolving deltas: 100% (5127/5127), done. [b2user@osedev1 sandbox]$ cd B2_Command_Line_Tool/ [b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install setuptools 20.2 or later is required. To fix, try running: pip install "setuptools>=20.2" [b2user@osedev1 B2_Command_Line_Tool]$
- I hate using pip; it often breaks the OS and its installed apps, but I bit my tongue & proceeded (I wouldn't do this on prod)
[root@osedev1 backups]# yum install python3-setuptools Installed: python3-setuptools.noarch 0:39.2.0-10.el7 Dependency Installed: python3.x86_64 0:3.6.8-10.el7 python3-libs.x86_64 0:3.6.8-10.el7 python3-pip.noarch 0:9.0.3-5.el7 Complete! [root@osedev1 backups]# [root@osedev1 backups]# pip install "setuptools>=20.2" -bash: pip: command not found [root@osedev1 backups]# yum install python-pip ... Installed: python2-pip.noarch 0:8.1.2-10.el7 Complete! [root@osedev1 backups]# pip install "setuptools>=20.2" Collecting setuptools>=20.2 Downloading https://files.pythonhosted.org/packages/b2/86/095d2f7829badc207c893dd4ac767e871f6cd547145df797ea26baea4e2e/setuptools-41.2.0-py2.py3-none-any.whl (576kB) 100% || 583kB 832kB/s Installing collected packages: setuptools Found existing installation: setuptools 0.9.8 Uninstalling setuptools-0.9.8: Successfully uninstalled setuptools-0.9.8 Successfully installed setuptools-41.2.0 You are using pip version 8.1.2, however version 19.2.3 is available. You should consider upgrading via the 'pip install --upgrade pip' command. [root@osedev1 backups]# pip install --upgrade pip Collecting pip Downloading https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl (1.4MB) 100% || 1.4MB 511kB/s Installing collected packages: pip Found existing installation: pip 8.1.2 Uninstalling pip-8.1.2: Successfully uninstalled pip-8.1.2 Successfully installed pip-19.2.3 [root@osedev1 backups]#
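- in hindsight, the less invasive fix would have been to upgrade setuptools only inside the b2user virtualenv instead of system-wide, something like:
sudo su - b2user
source ~/virtualenv/bin/activate          # use the venv's pip, not the system pip
pip install --upgrade 'setuptools>=20.2'
cd ~/sandbox/B2_Command_Line_Tool && python setup.py install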
- when it came time to install the b2 CLI itself, I had to add the '--user' flag so that setup.py would install into b2user's home directory instead of the system site-packages
[b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install --user ... Installed /home/b2user/.local/lib/python2.7/site-packages/python_dateutil-2.8.0-py2.7.egg Searching for setuptools==41.2.0 Best match: setuptools 41.2.0 Adding setuptools 41.2.0 to easy-install.pth file Installing easy_install script to /home/b2user/.local/bin Installing easy_install-3.6 script to /home/b2user/.local/bin Using /usr/lib/python2.7/site-packages Finished processing dependencies for b2==1.4.1 [b2user@osedev1 B2_Command_Line_Tool]$ [b2user@osedev1 B2_Command_Line_Tool]$ ^C [b2user@osedev1 B2_Command_Line_Tool]$ ~/.local/bin/b2 version b2 command line tool, version 1.4.1 [b2user@osedev1 B2_Command_Line_Tool]$