Maltfield Log/2019 Q4

From Open Source Ecology

# [[Special:Contributions/Maltfield]]


=Fri Oct 25, 2019=
# now that I've finished a script to automate the sync from prod to staging, I can finally proceed with a POC of Discourse or AskBot
# I emailed Marcin asking which was higher priority, which I'll begin next week
# Marcin said he's getting 414 request-uri too large errors from wordpress when attempting spam moderation. I checked our nginx config, which uses a 10M limit on 'client_max_body_size' (10x the default of 1M); note, though, that client_max_body_size governs 413 request entity too large, while a 414 is about the length of the request line, controlled by nginx's 'large_client_header_buffers'.
# I responded to our old email chain with Christian from almost 2 months ago, asking if he heard back from kiwix regarding our offline zim wiki archive, and asking if he could write a howto article about using the archive on Android, which we'd publish on osemain
# Marcin confirmed: I should work on the Discourse POC next week
# I updated my TODO list https://wiki.opensourceecology.org/wiki/OSE_Server#TODO
## namely, in addition to this Discourse POC, I also need to add 2FA support to our VPN and put together guides for OSE devs to gain access to the VPN and also guides for the OSE sysadmin to grant them access
=Thu Oct 24, 2019=
# Marcin mentioned yesterday that the ajax signup form for osemail on our phplist post on osemain is broken https://www.opensourceecology.org/moving-to-open-source-email-list-software/
# looks like it's wordpress wrapping our javascript in paragraph <p> tags again; I fixed this back in January by using the wpautop-control plugin https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q1#Sat_Jan_26.2C_2019
# Marcin said he couldn't fix it today by doing a restore (fucking wordpress doesn't do an actual restore; it *still* tries to parse the old content & add paragraph tags??), so he just made an image of the form and linked to the signup page on phplist.opensourceecology.org. That sucks.
# I logged into osemain's wordpress wui. Oh, no, the 'wpautop-control' plugin isn't activated anymore. I'm assuming that Marcin disabled it when doing some cleanup to debug slowdown after we added the social media and seo plugins https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
# I activated the plugin, and I restored to the most recent revision that was made by me. And it worked! https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
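A quick way to regression-check this kind of breakage is to grep the page HTML for a script tag that got wrapped in a paragraph. This is only a sketch: the `<p><script` pattern is an assumption about how wpautop's damage appears in the page source.

```shell
# Returns success if the given HTML shows a <script> tag that wpautop has
# wrapped in a paragraph; '<p><script' is an assumed signature of the damage.
detect_wpautop_damage() {
	echo "$1" | grep -q '<p><script'
}

# illustrative snippets (not fetched from the live site):
detect_wpautop_damage '<p><script type="text/javascript">var a;</script></p>' \
	&& echo 'mangled: script is wrapped in <p>'
detect_wpautop_damage '<div><script type="text/javascript">var a;</script></div>' \
	|| echo 'clean: script is not wrapped'
```

Against the live post, something like `curl -s <post-url> | grep -c '<p><script'` would give a crude pass/fail.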
# ...
# continuing from yesterday, I need to create a new non-root user (which will have to exist on both staging & production) that I'll both [a] give NOPASSWD sudo access on the staging server only and [b] grant ssh key authorized access to only on the staging server
# I named this user stagingsync. On staging, I added an authorized_keys file with the root-owned public key for the 4096-bit passwordless rsa ssh key that I generated on prod yesterday
<pre>
[root@osestaging1 ~]# adduser stagingsync
[root@osestaging1 ~]# ls -lah /home/stagingsync
total 20K
drwx------.  2 stagingsync stagingsync 4.0K Oct 24 12:12 .
drwxr-xr-x. 14 root        root        4.0K Oct 24 12:12 ..
-rw-r--r--.  1 stagingsync stagingsync  18 Sep  6  2017 .bash_logout
-rw-r--r--.  1 stagingsync stagingsync  193 Sep  6  2017 .bash_profile
-rw-r--r--.  1 stagingsync stagingsync  231 Sep  6  2017 .bashrc
[root@osestaging1 ~]# su - stagingsync
[stagingsync@osestaging1 ~]$ mkdir .ssh
[stagingsync@osestaging1 ~]$ chmod 0700 .ssh
[stagingsync@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC4FqRKRYw8qgLqbgfH1Yze+EWQ9wJudNU4+jrHHsatKag3yl90zE557NukZGfIcNP6sFp6+f8VeK0W9g6yhMiAq9wrsS6VrgZw1frjsFaflBaDPwPQb8s5uvj5O6P+9R0jg05t5kiHtkSrgXD7uFXkYbXeUm7xaeQRgOk+0Lt1tnVcT8g+EJDnQ7XlChLd+AXGUCiyRv+kLYCO9014Yd0Q4zlLfpRvHwXgE2gPjJDUqjiVM4SDtCqP1wSSp6JvW+bGAnFKEof/n1MyuYWajicJBijLkooCamI6VY20Qed1mv0V4E/9q2E3eQa/itd/Ai3SiEHxZURl3sVL3MPpKWqX9SG7ygZYIcnfnRah/JRjEkS84drIhdPgvF+W+X8r9i3/jRduP4H5nY9giqQBkchgZ+zixduVsjJk69oaxW3bMsJDH/UfX96gKl4HZaboJecBbKm3ZZi1YKsmAWBl6FdfsLT2FERHxWpb3PUsrfUGza187N9UHnPQESqyhpI0SRd+xMF/nZypDQEv1dSHnl4W/d6iaotZ4/RSMUF+nNHzbL/hjtusnd0f9llaEkc+v0IzRMtL6DB5XMmp9wWVkfE0Mg9qWIaqWgJKu1/wp4GABjpt2T5D2OkksgePWUQgHzXVC7By0I3XoEswFfFV/FTpp4r16lZc36s4dkDGsXT/6Q== root@opensourceecology.org" > .ssh/authorized_keys
[stagingsync@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys
[stagingsync@osestaging1 ~]$
</pre>
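For reference, the passwordless 4096-bit RSA key mentioned above could have been generated on prod with something like the following; the filename id_rsa.201910 and the key comment match what appears elsewhere in this log, and a scratch directory is used here just so the sketch is side-effect-free.

```shell
# Generate a 4096-bit RSA keypair with no passphrase (-N '') so scripts and
# cron jobs can use it non-interactively. On prod the real target path was
# /root/.ssh/id_rsa.201910; a scratch dir is used for demonstration.
dir=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -N '' \
	-C 'root@opensourceecology.org' \
	-f "$dir/id_rsa.201910"
cat "$dir/id_rsa.201910.pub"   # the public half goes into authorized_keys
```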
# then, per requirement, I added the stagingsync user to the sshaccess group
<pre>
[root@osestaging1 ~]# gpasswd -a stagingsync sshaccess
Adding user stagingsync to group sshaccess
[root@osestaging1 ~]#
</pre>
# I don't want stagingsync to have ssh access to prod (without an authorized_keys file on prod it couldn't ssh in anyway, but it's wise to also leave it out of the sshaccess group there), so I'll *not* do this on prod. Because I do want to sync the /etc/group file from prod to staging, I'll add a step in the sync script that appends ',stagingsync' to the 'sshaccess' line in /etc/group
# cool, it works
<pre>
[root@opensourceecology ~]# ssh -i /root/.ssh/id_rsa.201910 -p 32415 stagingsync@10.241.189.11 hostname
osestaging1
[root@opensourceecology ~]#
</pre>
# now I gave the 'stagingsync' user NOPASSWD sudo rights on staging only; note that this will not get overwritten, as our rsync command explicitly excludes the sudo config
<pre>
[root@osestaging1 ~]# tail /etc/sudoers
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom
## Allows members of the users group to shutdown this system
# %users  localhost=/sbin/shutdown -h now
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d
maltfield      ALL=(ALL)      NOPASSWD: ALL
stagingsync      ALL=(ALL)      NOPASSWD: ALL
[root@osestaging1 ~]#
</pre>
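Worth noting: a typo in /etc/sudoers can lock you out of sudo entirely, and `visudo -c` validates the syntax before that can happen (`-f` points it at an arbitrary file). A sketch against a scratch copy, since reading the real /etc/sudoers requires root:

```shell
# Write a candidate sudoers fragment and syntax-check it before installing;
# visudo exits non-zero on parse errors (skip gracefully where it's absent).
f=$(mktemp)
cat > "$f" <<'EOF'
stagingsync      ALL=(ALL)      NOPASSWD: ALL
EOF
if command -v visudo >/dev/null 2>&1; then
	visudo -cf "$f" && echo 'syntax OK'
else
	echo 'visudo not installed here'
fi
```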
# I'm having issues with connections to staging suddenly failing from other vpn clients (my laptop and the prod server) after some time, even though my own connection appears to remain up. closing & reconnecting re-enables me to access staging.
# I initiated a new rsync using my new script. here's what it looks like now
<pre>
############
# SETTINGS #
############
STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync
#########
# RSYNC #
#########
# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user just saves space & time
# exclude /home/stagingsync because 'stagingsync' should be able to access
#                          staging but not production
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root
time nice rsync \
-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
--bwlimit=3000 \
--numeric-ids \
--rsync-path="sudo rsync" \
--exclude=/root \
--exclude=/run \
--exclude=/home/b2user/sync* \
--exclude=/home/stagingsync* \
--exclude=/etc/sudo* \
--exclude=/etc/openvpn \
--exclude=/usr/share/easy-rsa \
--exclude=/dev \
--exclude=/sys \
--exclude=/proc \
--exclude=/boot/ \
--exclude=/etc/sysconfig/network* \
--exclude=/tmp \
--exclude=/var/tmp \
--exclude=/etc/fstab \
--exclude=/etc/mtab \
--exclude=/etc/mdadm.conf \
--exclude=/etc/hostname \
-av \
--progress \
/ ${SYNC_USERNAME}@${STAGING_HOST}:/
</pre>
# it works!
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh
...
var/www/html/www.opensourceecology.org/htdocs/wp-content/uploads/2019/10/workshop9sm.jpg
  97794 100%  113.29kB/s    0:00:00 (xfer#4820, to-check=898/518063)
sent 810748552 bytes  received 2196940 bytes  1910565.20 bytes/sec
total size is 41443449279  speedup is 50.98
+ exit 0
[root@opensourceecology bin]#
</pre>
# A double-tap fails, probably because the sync updated /etc/group, removing 'stagingsync' from the 'sshaccess' group
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh
+ STAGING_HOST=10.241.189.11
+ STAGING_SSH_PORT=32415
+ SYNC_USERNAME=stagingsync
+ nice rsync -e 'ssh -p 32415 -i /root/.ssh/id_rsa.201910' --bwlimit=3000 --numeric-ids '--rsync-path=sudo rsync' --exclude=/root --exclude=/run '--exclude=/home/b2user/sync*' '--exclude=/home/stagingsync*' '--exclude=/etc/sudo*' --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ '--exclude=/etc/sysconfig/network*' --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf --exclude=/etc/hostname -av --progress / stagingsync@10.241.189.11:/
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
real    0m0.147s
user    0m0.031s
sys    0m0.007s
+ exit 0
[root@opensourceecology bin]#
</pre>
# I ran this sed command, which I'll add to the script
<pre>
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp
[root@osestaging1 ~]# sed -i 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp,stagingsync
[root@osestaging1 ~]#
</pre>
# now the second tap works; it wasn't quite as fast as I'd like, though; it spent a lot of time on mysql, log, ossec, munin, etc. files that changed in just the few minutes since the first sync
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh
...
var/www/html/munin/static/zoom.js
4760 100%  422.59kB/s    0:00:00 (xfer#773, to-check=1002/322356)
sent 61086821 bytes  received 1431809 bytes  454680.95 bytes/sec
total size is 41445400614  speedup is 662.93
real    2m17.019s
user    0m22.964s
sys    0m8.157s
+ exit 0
[root@opensourceecology bin]#
</pre>
# I went to add the sed command to be executed after the rsync but--well--that's a new line with a new connection, necessarily. And I can't connect after rsync has copied over the /etc/group file. I'm in a catch-22.
# my solution: exclude the rsync of /etc/group, and do it manually with sed piping to the file over ssh
<pre>
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p32415 -i /root/.ssh/id_rsa.201910 stagingsync@10.241.189.11 'sudo tee /etc/group'
</pre>
# I was also able to fix all the nginx configs by adding this to the script
<pre>
############
# SETTINGS #
############
...
PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'
...
#############
# FUNCTIONS #
#############
runOnStaging () {
ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1
}
...
##################
# NGINX BINDINGS #
##################
# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses
# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
# since the main config file has both listens (for redirecting port 80 to
# port 443), we just do it once & comment-out the second one to avoid errors
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"
# just comment-out all of the ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"
# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"
</pre>
# because the staging websites necessarily look exactly the same as prod, I decided to add a quick one-liner that adds an 'is_staging' file with the contents 'true' into the docroot of the vhosts on the staging box after the sync. On prod, a GET for '/is_staging' should return a 404.
<pre>
for docroot in $(find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo 'true' > "$docroot/is_staging"; done
</pre>
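A quick sanity check of that one-liner against a scratch tree shaped like /var/www/html/&lt;vhost&gt;/htdocs (the directory names here are made up):

```shell
# Build a fake docroot layout and run the same find-based loop; only
# directories literally named 'htdocs' one level down get the marker.
root=$(mktemp -d)
mkdir -p "$root/site-a/htdocs" "$root/site-b/htdocs" "$root/site-c/logs"
for docroot in $(find "$root"/* -maxdepth 1 -name htdocs -type d); do
	echo 'true' > "$docroot/is_staging"
done
find "$root" -name is_staging   # markers appear only under the two htdocs dirs
```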
# ok, I finished the sync script! I haven't added it to a cron yet (which I would also have to comment-out on the staging box; super meta), but here's what I got so far
<pre>
[root@opensourceecology bin]# cat syncToStaging.sh
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-24
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################
############
# SETTINGS #
############
STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync
PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'
#############
# FUNCTIONS #
#############
runOnStaging () {
ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1
}
#########
# RSYNC #
#########
# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user/sync* just saves space & time
# exclude /home/stagingsync* because 'stagingsync' should be able to access
#                            staging but not production
# exclude /etc/group so 'stagingsync' is in the 'sshaccess' group on staging
#                    but not on prod
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root
time nice rsync \
-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
--bwlimit=3000 \
--numeric-ids \
--rsync-path="sudo rsync" \
--exclude=/root \
--exclude=/run \
--exclude=/home/b2user/sync* \
--exclude=/home/stagingsync* \
--exclude=/etc/sudo* \
--exclude=/etc/group \
--exclude=/etc/openvpn \
--exclude=/usr/share/easy-rsa \
--exclude=/dev \
--exclude=/sys \
--exclude=/proc \
--exclude=/boot/ \
--exclude=/etc/sysconfig/network* \
--exclude=/tmp \
--exclude=/var/tmp \
--exclude=/etc/fstab \
--exclude=/etc/mtab \
--exclude=/etc/mdadm.conf \
--exclude=/etc/hostname \
-av \
--progress \
/ ${SYNC_USERNAME}@${STAGING_HOST}:/
##################
# NGINX BINDINGS #
##################
# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses
# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
# since the main config file has both listens (for redirecting port 80 to
# port 443), we just do it once & comment-out the second one to avoid errors
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"
# just comment-out all of the ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"
# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"
# finally, restart nginx
runOnStaging "sudo systemctl restart nginx.service"
#########################
# MAKE THE STAGING MARK #
#########################
# we leave a mark so we can test to see if we're looking at staging by doing a
# GET request against '/is_staging'. It should 404 on prod but return 200 on
# staging
runOnStaging 'for docroot in $(sudo find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo 'true' | sudo tee "$docroot/is_staging"; done'
###################
# OSSEC SILENCING #
###################
# we don't need ossec email alerts from our staging server
runOnStaging "sudo sed -i 's^<email_notification>yes</email_notification>^<email_notification>no</email_notification>^' /var/ossec/etc/ossec.conf"
##################
# CRON DISABLING #
##################
# disable certbot cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/letsencrypt"
# disable backups cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/backup_to_backblaze"
###############
# USER/GROUPS #
###############
# append ',stagingsync' to the 'sshaccess' line in /etc/group to permit this
# user to ssh into staging (we don't do this on prod so it can't
# ssh into prod)
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910 ${SYNC_USERNAME}@${STAGING_HOST} 'sudo tee /etc/group'
########
# EXIT #
########
# clean exit
exit 0
[root@opensourceecology bin]#
</pre>
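For when the script does get cronned, a /etc/cron.d entry might look like the following (the schedule and log path are assumptions, not anything decided above); per the note above, this same file would then need to be commented out on the staging box after each sync.

```
# /etc/cron.d/sync_to_staging  (hypothetical schedule & log path)
20 04 * * * root /root/bin/syncToStaging.sh > /var/log/syncToStaging.log 2>&1
```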
=Wed Oct 23, 2019=
# I updated the wiki documentation on the development server, added an article on the staging server, and added some bits about the /var network block mount and the vpn config
# ...
# it does not appear that I can simply add entries to the client's /etc/hosts file or otherwise override resolution on a per-ip or per-domain basis. It appears that I can only add a "dhcp-option DNS" item to the server (or client) configs to override the dns server used on the client https://openvpn.net/community-resources/pushing-dhcp-options-to-clients/
# so I can run a dns server on osedev1 with a few entries for each of our websites pointing them to the VPN IP of osestaging1 (10.241.189.11), and defer the rest to 1.1.1.1 or something.
# this question suggests using dnsmasq https://askubuntu.com/questions/885497/openvpn-and-dns
# cool, dnsmasq-2.76-9 is already installed on our cent7 osedev1 box. Let's take that low-hanging fruit
<pre>
[root@osedev1 3]# rpm -qa | grep -i dns
dnsmasq-2.76-9.el7.x86_64
[root@osedev1 3]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@osedev1 3]#
</pre>
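The plan above would translate into a dnsmasq drop-in along these lines; the file name and the exact vhost list are assumptions. `address=` answers those names with the staging VPN IP, and `server=` forwards everything else upstream.

```
# hypothetical /etc/dnsmasq.d/staging.conf
address=/www.opensourceecology.org/10.241.189.11
address=/wiki.opensourceecology.org/10.241.189.11
address=/www.openbuildinginstitute.org/10.241.189.11
server=1.1.1.1
```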
# It also appears to already be running
<pre>
[root@osedev1 3]# ps -ef | grep dnsmasq
nobody    1346    1  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1347  1346  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root    18856 14405  0 13:55 pts/10  00:00:00 grep --color=auto dnsmasq
[root@osedev1 3]#
</pre>
# oh, shit, it appears to be listening only on 192.168.122.1:53, which is our libvirt guest network
<pre>
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN    0      0        * 18757                * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN    0      0      192.168.122.1:53                    *:*                  users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN    0      0      *%virbr0:67                    *:*                  users:(("dnsmasq",pid=1346,fd=3))
tcp    LISTEN    0      5      192.168.122.1:53                    *:*                  users:(("dnsmasq",pid=1346,fd=6))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN    0      0      192.168.122.1:53                      *:*                  users:(("dnsmasq",pid=1346,fd=5))
UNCONN    0      0      *%virbr0:67                      *:*                  users:(("dnsmasq",pid=1346,fd=3))
[root@osedev1 etc]#
</pre>
# I could find no entries in the dnsmasq.conf file for the bind address
<pre>
[root@osedev1 etc]# ls -lah /etc/dnsmasq.*
-rw-r--r--. 1 root root  27K Aug  9 01:12 /etc/dnsmasq.conf
/etc/dnsmasq.d:
total 8.0K
drwxr-xr-x.  2 root root 4.0K Aug  9 01:12 .
drwxr-xr-x. 86 root root 4.0K Oct 23 14:20 ..
[root@osedev1 etc]# grep '192.168.122' /etc/dnsmasq.conf
[root@osedev1 etc]#
</pre>
# I found two unrelated files that specify this network--unless dnsmasq is somehow configured by libvirt?
<pre>
[root@osedev1 etc]# grep -irl '192.168.122' /etc
/etc/libvirt/qemu/networks/default.xml
/etc/openvpn/openvpn-status.log
[root@osedev1 etc]# cat /etc/libvirt/qemu/networks/default.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh net-edit default
or other application using the libvirt API.
-->
<network>
  <name>default</name>
  <uuid>a11767e5-cc15-4acd-9443-bbffc220fa4d</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:7d:01:71'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
<dhcp>
  <range start='192.168.122.2' end='192.168.122.254'/>
</dhcp>
  </ip>
</network>
[root@osedev1 etc]# cat /etc/openvpn/openvpn-status.log
OpenVPN CLIENT LIST
Updated,Wed Oct 23 14:28:10 2019
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
hetzner2,138.201.84.223:34914,8122227,496427,Tue Oct 22 17:48:40 2019
osestaging1,192.168.122.201:51674,2340646,949941,Tue Oct 22 18:07:40 2019
maltfield,27.7.149.58:51080,44891,39735,Wed Oct 23 13:28:30 2019
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.241.189.10,maltfield,27.7.149.58:51080,Wed Oct 23 14:28:09 2019
10.241.189.11,osestaging1,192.168.122.201:51674,Wed Oct 23 13:48:08 2019
GLOBAL STATS
Max bcast/mcast queue length,1
END
[root@osedev1 etc]#
</pre>
# I think it is libvirt; this libvirt guide describes how to avoid conflicts when trying to use a distinct "global" dnsmasq config https://wiki.libvirt.org/page/Libvirtd_and_dnsmasq
# I made a backup of the existing /etc/dnsmasq.conf file and added lines to the config to bind dnsmasq only to tun0
<pre>
[root@osedev1 etc]# cp dnsmasq.conf dnsmasq.20191023.orig.conf
[root@osedev1 etc]# vim dnsmasq.conf
...
[root@osedev1 etc]# tail /etc/dnsmasq.conf
#conf-dir=/etc/dnsmasq.d,.bak
# Include all files in a directory which end in .conf
#conf-dir=/etc/dnsmasq.d/,*.conf
# Include all files in /etc/dnsmasq.d except RPM backup files
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig
interface=tun0
bind-interfaces
[root@osedev1 etc]#
</pre>
# And then I verified that the dnsmasq.service is disabled
<pre>
[root@osedev1 etc]# systemctl list-units | grep -i dns
  unbound-anchor.timer                                                                        loaded active waiting  daily update of the root trust anchor for DNSSEC
[root@osedev1 etc]# systemctl list-unit-files | grep -i dns
chrony-dnssrv@.service                        static 
dnsmasq.service                              disabled
chrony-dnssrv@.timer                          disabled
[root@osedev1 etc]#
</pre>
# I started it
<pre>
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
  Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
  Active: inactive (dead)
[root@osedev1 etc]# systemctl start dnsmasq.service
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
  Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
  Active: active (running) since Wed 2019-10-23 14:40:58 CEST; 2s ago
Main PID: 29666 (dnsmasq)
Tasks: 1
  CGroup: /system.slice/dnsmasq.service
  └─29666 /usr/sbin/dnsmasq -k
Oct 23 14:40:58 osedev1 systemd[1]: Started DNS caching server..
Oct 23 14:40:58 osedev1 dnsmasq[29666]: started, version 2.76 cachesize 150
Oct 23 14:40:58 osedev1 dnsmasq[29666]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-...inotify
Oct 23 14:40:58 osedev1 dnsmasq[29666]: reading /etc/resolv.conf
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.100.100#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.99.99#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.98.98#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: read /etc/hosts - 6 addresses
Hint: Some lines were ellipsized, use -l to show in full.
[root@osedev1 etc]#
</pre>
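One gotcha visible in the status output above: the unit is still "disabled", so this dnsmasq won't come back after a reboot of osedev1 unless it's also enabled. A suggested follow-up, not something done in this log:

```
systemctl enable dnsmasq.service
```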
# cool, now it looks like it's running on both the 192.168.122 virbr0 libvirt network and the 10.241.189 tun0 vpn network
<pre>
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN    0      0        * 20573881              * 8150                users:(("dnsmasq",pid=29666,fd=15))
u_dgr  UNCONN    0      0        * 18757                * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN    0      0      127.0.0.1:53                    *:*                  users:(("dnsmasq",pid=29666,fd=6))
udp    UNCONN    0      0      10.241.189.1:53                    *:*                  users:(("dnsmasq",pid=29666,fd=4))
udp    UNCONN    0      0      192.168.122.1:53                    *:*                  users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN    0      0      *%virbr0:67                    *:*                  users:(("dnsmasq",pid=1346,fd=3))
udp    UNCONN    0      0      ::1:53                  :::*                  users:(("dnsmasq",pid=29666,fd=10))
udp    UNCONN    0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                  :::*                  users:(("dnsmasq",pid=29666,fd=8))
tcp    LISTEN    0      5      127.0.0.1:53                    *:*                  users:(("dnsmasq",pid=29666,fd=7))
tcp    LISTEN    0      5      10.241.189.1:53                    *:*                  users:(("dnsmasq",pid=29666,fd=5))
tcp    LISTEN    0      5      192.168.122.1:53                    *:*                  users:(("dnsmasq",pid=1346,fd=6))
tcp    LISTEN    0      5      ::1:53                  :::*                  users:(("dnsmasq",pid=29666,fd=11))
tcp    LISTEN    0      5      fe80::fd4a:7df9:169:e7e2%tun0:53                  :::*                  users:(("dnsmasq",pid=29666,fd=9))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN    0      0      127.0.0.1:53                      *:*                  users:(("dnsmasq",pid=29666,fd=6))
UNCONN    0      0      10.241.189.1:53                      *:*                  users:(("dnsmasq",pid=29666,fd=4))
UNCONN    0      0      192.168.122.1:53                      *:*                  users:(("dnsmasq",pid=1346,fd=5))
UNCONN    0      0      *%virbr0:67                      *:*                  users:(("dnsmasq",pid=1346,fd=3))
UNCONN    0      0          ::1:53                      :::*                  users:(("dnsmasq",pid=29666,fd=10))
UNCONN    0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                      :::*                  users:(("dnsmasq",pid=29666,fd=8))
[root@osedev1 etc]#
</pre>
# cool, from my laptop the 53 udp port on osedev1's vpn address appears to be open. or, uh, filtered?
<pre>
user@ose:~/openvpn$ sudo nmap -Pn -sU -p53 10.137.0.1
Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:31 +0545
Nmap scan report for 10.137.0.1 (10.137.0.1)
Host is up.
PORT  STATE        SERVICE
53/udp open|filtered domain
Nmap done: 1 IP address (1 host up) scanned in 2.12 seconds
user@ose:~/openvpn$
</pre>
# nope, fail.
<pre>
user@ose:~/openvpn$ dig @10.137.0.1 google.com
; <<>> DiG 9.10.3-P4-Debian <<>> @10.137.0.1 google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
user@ose:~/openvpn$
</pre>
# Indeed, all ports are reported as "filtered"
<pre>
user@ose:~/openvpn$ nmap -Pn 10.241.189.1
Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:29 +0545
Nmap scan report for 10.241.189.1 (10.241.189.1)
Host is up.
All 1000 scanned ports on 10.241.189.1 (10.241.189.1) are filtered
Nmap done: 1 IP address (1 host up) scanned in 201.47 seconds
user@ose:~/openvpn$
</pre>
# I bet this is an iptables issue. And, christ, this iptables ruleset looks more complex than anything I built; I guess this is libvirt's doing?
<pre>
[root@osedev1 etc]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target    prot opt in    out    source              destination       
2929  205K ACCEPT    udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
0    0 ACCEPT    tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
  54 17712 ACCEPT    udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
0    0 ACCEPT    tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
  101 15196 ACCEPT    all  --  lo    *      0.0.0.0/0            0.0.0.0/0         
107K  15M ACCEPT    all  --  *      *      0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
  17  706 ACCEPT    icmp --  *      *      0.0.0.0/0            0.0.0.0/0         
4  628 ACCEPT    tcp  --  *      *      0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:32415
9  804 ACCEPT    udp  --  *      *      0.0.0.0/0            0.0.0.0/0            state NEW udp dpt:1194
11218  621K DROP      all  --  *      *      0.0.0.0/0            0.0.0.0/0         
Chain FORWARD (policy ACCEPT 320 packets, 26880 bytes)
pkts bytes target    prot opt in    out    source              destination       
7135  30M ACCEPT    all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24    ctstate RELATED,ESTABLISHED
7756  935K ACCEPT    all  --  virbr0 *      192.168.122.0/24    0.0.0.0/0         
0    0 ACCEPT    all  --  virbr0 virbr0  0.0.0.0/0            0.0.0.0/0         
0    0 REJECT    all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
0    0 REJECT    all  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT 38266 packets, 6557K bytes)
pkts bytes target    prot opt in    out    source              destination       
  54 18295 ACCEPT    udp  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            udp dpt:68
[root@osedev1 etc]#
[root@osedev1 etc]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*mangle
:PREROUTING ACCEPT [136754:47136918]
:INPUT ACCEPT [121507:15710076]
:FORWARD ACCEPT [15247:31426842]
:OUTPUT ACCEPT [38360:6581630]
:POSTROUTING ACCEPT [53607:38008472]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*nat
:PREROUTING ACCEPT [13853:810289]
:INPUT ACCEPT [1821:140336]
:OUTPUT ACCEPT [2275:162484]
:POSTROUTING ACCEPT [2276:162568]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [320:26880]
:OUTPUT ACCEPT [38306:6563335]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
[root@osedev1 etc]#
</pre>
# I just added a single rule before the final DROP to permit UDP packets to port 53 (DNS) from tun0
<pre>
[root@osedev1 20191023]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
[root@osedev1 20191023]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*mangle
:PREROUTING ACCEPT [279:24925]
:INPUT ACCEPT [279:24925]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
:POSTROUTING ACCEPT [106:13445]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*nat
:PREROUTING ACCEPT [30:1478]
:INPUT ACCEPT [3:218]
:OUTPUT ACCEPT [4:304]
:POSTROUTING ACCEPT [4:304]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -i tun0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
[root@osedev1 20191023]#
</pre>
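# the log doesn't capture the exact command used for that; it was presumably an insert (`-I`) rather than an append (`-A`), since an appended rule would land after the final DROP and never match. A hedged sketch, with a hypothetical helper (`drop_index`) that computes the insert position from `iptables -S INPUT`-style output instead of hardcoding it:
<pre>
# print the 1-based position of the first "-j DROP" rule among the
# "-A INPUT" rules fed on stdin (e.g. from `iptables -S INPUT`)
drop_index() {
  grep '^-A INPUT' | grep -n -m1 -- '-j DROP$' | cut -d: -f1
}

# usage (as root):
#   idx="$(iptables -S INPUT | drop_index)"
#   iptables -I INPUT "$idx" -i tun0 -p udp -m udp --dport 53 -j ACCEPT
#   service iptables save
</pre>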
# And now it works!
<pre>
user@ose:~/openvpn$ dig @10.241.189.1 michaelaltfield.net
; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 michaelaltfield.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34648
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;michaelaltfield.net. IN A
;; ANSWER SECTION:
michaelaltfield.net. 3554 IN A 176.56.237.113
;; Query time: 148 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:07:28 +0545 2019
;; MSG SIZE  rcvd: 64
user@ose:~/openvpn$
</pre>
# now let's see if I can hardcode www.opensourceecology.org. By default, it returns the public IP address of our prod server per our public DNS records
<pre>
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org
; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40391
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org. IN A
;; ANSWER SECTION:
www.opensourceecology.org. 120 IN A 138.201.84.243
;; Query time: 214 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:09:07 +0545 2019
;; MSG SIZE  rcvd: 70
user@ose:~/openvpn$
</pre>
# The dnsmasq.conf config says that it reads from /etc/hosts, so I just added a line to osedev1:/etc/hosts
<pre>
[root@osedev1 20191023]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
# staging
10.241.189.11 www.opensourceecology.org
[root@osedev1 20191023]#
</pre>
# I tried the query again, but I still got the 138.201.84.243 address
<pre>
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org
; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62221
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org. IN A
;; ANSWER SECTION:
www.opensourceecology.org. 87 IN A 138.201.84.243
;; Query time: 158 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:46 +0545 2019
;; MSG SIZE  rcvd: 70
user@ose:~/openvpn$
</pre>
# I gave dnsmasq a restart (maybe caching issue?)
<pre>
[root@osedev1 20191023]# service dnsmasq restart
Redirecting to /bin/systemctl restart dnsmasq.service
[root@osedev1 20191023]#
</pre>
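# for what it's worth, a full restart probably isn't necessary here: per its man page, dnsmasq clears its cache and re-reads /etc/hosts on SIGHUP. A guarded sketch (`hup_dnsmasq` is a hypothetical helper, not something that exists on the box):
<pre>
# re-read /etc/hosts & clear the dnsmasq cache without a full restart;
# quietly no-op if dnsmasq isn't running
hup_dnsmasq() {
  local pid
  pid="$(pidof dnsmasq 2>/dev/null)" || return 0
  kill -HUP $pid
}
</pre>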
# And I tried again; it worked this time!
<pre>
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org
; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34890
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org. IN A
;; ANSWER SECTION:
www.opensourceecology.org. 0 IN A 10.241.189.11
;; Query time: 146 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:56 +0545 2019
;; MSG SIZE  rcvd: 70
user@ose:~/openvpn$
</pre>
# cool, so now to push that option from the VPN server; I added the "push dhcp-option" line to /etc/openvpn/server/server.conf
<pre>
push "dhcp-option DNS 10.241.189.1"
</pre>
# I reconnected to the vpn from my laptop, but there were no changes to my /etc/resolv.conf. I tried to restart the openvpn server on osedev1
<pre>
[root@osedev1 server]# systemctl restart openvpn@server.service
[root@osedev1 server]#
</pre>
# I still see no changes in my resolv.conf, but I do see the option in the client's output
<pre>
Wed Oct 23 19:20:42 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
</pre>
# ah, fuck, apparently the linux client of openvpn doesn't support the dhcp-option push https://unix.stackexchange.com/questions/201946/how-to-define-dns-server-in-openvpn
# this archlinux wiki has a solution for linux, but the pull-resolv-conf scripts live in a distinct location on centos https://wiki.archlinux.org/index.php/OpenVPN#DNS
<pre>
[root@osedev1 server]# find / | grep -i pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.down
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.up
...
</pre>
# these scripts actually have to live client-side, though. My client is debian-9. It doesn't have the 'pull-resolv-conf' scripts on it, but it does have 'update-resolv-conf' and 'systemd-resolved'. The latter isn't openvpn-specific, however, so I think I should use '/etc/openvpn/update-resolv-conf'
<pre>
root@ose:~# find / | grep -i pull-resolv-conf
root@ose:~# find / | grep -i resolv-conf
/etc/openvpn/update-resolv-conf
root@ose:~# find / | grep -i systemd-resolved
/usr/share/man/man8/systemd-resolved.service.8.gz
/usr/share/man/man8/systemd-resolved.8.gz
/lib/systemd/system/systemd-resolved.service.d
/lib/systemd/system/systemd-resolved.service.d/resolvconf.conf
/lib/systemd/system/systemd-resolved.service
/lib/systemd/systemd-resolved
root@ose:~# cat /etc/issue
Debian GNU/Linux 9 \n \l
root@ose:~# ls -lah /etc/openvpn/update-resolv-conf
-rwxr-xr-x 1 root root 1.3K Oct 15  2018 /etc/openvpn/update-resolv-conf
root@ose:~#
</pre>
# I added the needed lines to my client.conf file, but they didn't do anything when I reconnected to the vpn
<pre>
root@ose:/home/user/openvpn# tail client.conf
# Silence repeating messages
;mute 20
# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384
# dns for staging
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
root@ose:/home/user/openvpn#
</pre>
# well, the first non-commented line in that script is to check for the existence of /sbin/resolvconf and `exit 0` if it doesn't exist. Yeah, it doesn't exist.
<pre>
root@ose:/home/user/openvpn# grep resolvconf /etc/openvpn/update-resolv-conf
# Used snippets of resolvconf script by Thomas Hood and Chris Hanson.
[ -x /sbin/resolvconf ] || exit 0
echo -n "$R" | /sbin/resolvconf -a "${dev}.openvpn"
/sbin/resolvconf -d "${dev}.openvpn"
root@ose:/home/user/openvpn# ls -lah /sbin/resolvconf
ls: cannot access '/sbin/resolvconf': No such file or directory
root@ose:/home/user/openvpn#
</pre>
# per the archlinux guide linked above, I installed the 'openresolv' package via apt-get. This time it worked!
<pre>
user@ose:~/openvpn$ sudo openvpn client.conf
...
Wed Oct 23 19:43:00 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: timers and/or timeouts modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ifconfig/up options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: route options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: peer-id set
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: data channel crypto options modified
Wed Oct 23 19:43:00 2019 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 ROUTE_GATEWAY 10.137.0.6
Wed Oct 23 19:43:00 2019 TUN/TAP device tun0 opened
Wed Oct 23 19:43:00 2019 TUN/TAP TX queue length set to 100
Wed Oct 23 19:43:00 2019 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Wed Oct 23 19:43:00 2019 /sbin/ip link set dev tun0 up mtu 1500
Wed Oct 23 19:43:00 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Wed Oct 23 19:43:00 2019 /etc/openvpn/update-resolv-conf tun0 1500 1552 10.241.189.10 10.241.189.9 init
dhcp-option DNS 10.241.189.1
Too few arguments.
Wed Oct 23 19:43:00 2019 /sbin/ip route add 10.241.189.0/24 via 10.241.189.9
Wed Oct 23 19:43:00 2019 Initialization Sequence Completed
</pre>
# And my laptop's new resolv.conf file
<pre>
user@ose:~/openvpn$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 10.241.189.1
user@ose:~/openvpn$
</pre>
# I refreshed the 'www.opensourceecology.org' page on my browser, and--boom--it's now showing staging! Success!!1one
# now, I finished adding the other hostnames to osedev1:/etc/hosts. Unfortunately, this will have to be updated as-needed in the future
<pre>
[root@osedev1 pull-resolv-conf]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
# staging
10.241.189.11 www.opensourceecology.org opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org www.openbuildinginstitute.org
[root@osedev1 pull-resolv-conf]#
</pre>
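# since that line will need manual updates every time a vhost is added, a small idempotent helper could take some of the pain out of it. A hedged sketch (`add_staging_host` is hypothetical; the defaults match the real IP and file used above):
<pre>
# append a hostname to the staging line of a hosts file, but only if
# it's not already there; the caller can then HUP/restart dnsmasq
add_staging_host() {
  local host="$1"
  local hosts_file="${2:-/etc/hosts}"
  local staging_ip="${3:-10.241.189.11}"
  if ! grep -qE "^${staging_ip}[[:space:]].*${host}" "$hosts_file"; then
    sed -i "s|^${staging_ip} .*|& ${host}|" "$hosts_file"
  fi
}
</pre>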
# I restarted dnsmasq and attempted to test www.openbuildinginstitute.org. Well, it kinda worked: it pointed to the staging server, which has an expired certificate. This means that I need to do another sync & automate this nginx config sed process. It also means that I need to somehow kill the certbot cron on staging
# ...
# meanwhile, I logged-into backblaze b2 to check the status of our backups of the dev node
# first of all, the prod 'ose-server-backups' bucket has 19 files totaling 300G. One file appears to be uploading at the moment. There are two from 2018-11 & 2018-12 at <20M, but the others vary in size from 17.5G - 18.4G.
# as for the new dev-specific 'ose-dev-server-backups' bucket, there's 0 fucking files
# I kicked-off a backup; it completed relatively fast. There were no obvious errors during the upload, but the file is not visible on the wui
<pre>
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191023_144309.tar.gpg
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041
{
  "action": "upload",
  "fileId": "4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041",
  "fileName": "daily_osedev1_20191023_144309.tar.gpg",
  "size": 18465051,
  "uploadTimestamp": 1571841805000
}
real    0m27.979s
user    0m1.037s
sys    0m0.321s
[root@osedev1 backups]#
[root@osedev1 backups]# ./backup.sh
</pre>
# the last upload appears to be from 20 days ago
<pre>
[root@osedev1 backups]# ls -lah /home/b2user/sync
total 18M
drwxr-xr-x. 2 root  root  4.0K Oct 23 16:43 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    18M Oct 23 16:43 daily_osedev1_20191023_144309.tar.gpg
[root@osedev1 backups]# ls -lah /home/b2user/sync.old
total 17M
drwxr-xr-x. 2 root  root  4.0K Oct  3 07:24 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    17M Oct  3 07:24 daily_osedev1_20191003_052448.tar.gpg
[root@osedev1 backups]#
</pre>
# the cron job looks good
<pre>
[root@osedev1 backups]# cat /etc/cron.d/backup_to_backblaze
20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh
[root@osedev1 backups]#
</pre>
# but the logging dir doesn't exist; I created it
<pre>
[root@osedev1 backups]# ls -lah /var/log/backups
ls: cannot access /var/log/backups: No such file or directory
[root@osedev1 backups]# mkdir /var/log/backups
[root@osedev1 backups]#
</pre>
# actually, after some time, the b2 wui now shows the files I just uploaded, totaling 36.9M. Wasn't the dev server in a broken state recently? That's probably what happened..
# well, I'll follow-up in a few days. Hopefully it'll be stable for ~10 days through the monthly backup on 2019-11-01, which will have a 1-year retention time.
# ..
# ok, back to the sync. First, I fixed the hostname of the staging node so I don't do the sync the wrong way (!)
<pre>
[root@opensourceecology ~]# vim /etc/hostname
[root@opensourceecology ~]# cat /etc/hostname
osestaging1
[root@opensourceecology ~]#
[root@opensourceecology ~]# hostname osestaging1
[root@opensourceecology ~]# exit
logout
[maltfield@osestaging1 ~]$
</pre>
# oh, shit, weird. I went to ssh into the prod server using `ssh opensourceecology.org`, but it ssh'd into staging because of the new dns changes. I fixed this by updating my .ssh/config file for the 'oseprod' Host line
<pre>
user@ose:~$ head .ssh/config
# OSE
Host oseprod
HostName 138.201.84.243
Port 32415
ForwardAgent yes
IdentityFile /home/user/.ssh/id_rsa.ose
User maltfield
Host osedev1
HostName 195.201.233.113
user@ose:~$
user@ose:~$ ssh oseprod
Last login: Wed Oct 23 15:01:19 2019 from 116.75.124.97
[maltfield@opensourceecology ~]$
</pre>
# so I think I should put this sync & sed process into a script that lives on prod. This was the last command I see executed in screen on prod
<pre>
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
</pre>
# In order to automate this, I'll also need to give root an ssh key that lives on prod and has the ability to ssh into the staging node as some sync user which has NOPASSWD sudo rights. Of course, I do *not* want *any* such config that permits someone to do such a thing to our prod node, but granting prod access to staging in this way seems fair enough. If someone gains this locked-down key file from the prod server, we have bigger problems..
# I created a new script for this & locked it down
<pre>
[root@opensourceecology bin]# date
Wed Oct 23 15:10:06 UTC 2019
[root@opensourceecology bin]# pwd
/root/bin
[root@opensourceecology bin]# ls -lah syncToStaging.sh
-rwx------ 1 root root 469 Oct 23 15:09 syncToStaging.sh
[root@opensourceecology bin]# cat syncToStaging.sh
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-23
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################
############
# SETTINGS #
############
########
# EXIT #
########
# clean exit
exit 0
[root@opensourceecology bin]#
</pre>
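# a hedged sketch of where this script is headed, reusing the manual rsync command from above; the `osesync` username and the new key path are assumptions (neither is wired up yet at this point):
<pre>
sync_to_staging() {
  # hard guard: only ever run from the prod node, so the sync can never
  # accidentally flow the other way
  if [ "$(hostname)" != "opensourceecology.org" ]; then
    echo "refusing to sync: this is not the prod node" >&2
    return 1
  fi
  time nice rsync -e 'ssh -p 32415 -i /root/.ssh/id_rsa.201910' \
   --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" \
   --exclude=/root --exclude=/run --exclude=/home/b2user/sync* \
   --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa \
   --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ \
   --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp \
   --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf \
   -av --progress / osesync@10.241.189.11:/
}
</pre>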
# There is an existing rsa key for the root user on our prod server, but it's only 2048-bits. I think this was used to auth to our dreamhost server for scp-ing backups back in the day. In any case, it's too small; I generated a new one. Note that this key should only be used for ssh-ing into the staging server as a non-root (on the staging server). It should *not* be used to ssh into the prod server. And, of course, we should *never* allow root to ssh into any server anywhere. Oh, and, the staging server is also not exposed on the Internet; it's only accessible behind the VPN..
<pre>
[root@opensourceecology bin]# ssh-keygen -lf /root/.ssh/id_rsa.pub
2048 SHA256:/LpjdDSJFVAt0a4d2PM3fWu7ci3VVwqQT0UxobZel2s root@CentOS-72-64-minimal (RSA)
[root@opensourceecology bin]# ssh-keygen -t rsa -b 4096 -o -a 100
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): /root/.ssh/id_rsa.201910
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.201910.
Your public key has been saved in /root/.ssh/id_rsa.201910.pub.
...
[root@opensourceecology bin]# ls -lah /root/.ssh/id_rsa.201910*
-rw------- 1 root root 3.4K Oct 23 15:27 /root/.ssh/id_rsa.201910
-rw-r--r-- 1 root root  752 Oct 23 15:27 /root/.ssh/id_rsa.201910.pub
[root@opensourceecology bin]#
</pre>
# now I need a non-root user (which will have to exist on both staging & production) that I'll both [a] give NOPASSWD sudo access on the staging server only and [b] grant ssh key authorized access to only on the staging server
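# a sketch of what that could look like on the staging side (the `osesync` username and the from= source restriction are assumptions; nothing is created yet):
<pre>
# staging-only /etc/sudoers.d/osesync -- NOPASSWD, but only for rsync,
# which is all that --rsync-path="sudo rsync" needs
osesync ALL=(ALL) NOPASSWD: /usr/bin/rsync

# staging:/home/osesync/.ssh/authorized_keys -- accept the new prod key,
# and only from prod's VPN address (placeholder)
from="<prod-vpn-ip>" ssh-rsa AAAA... root@opensourceecology
</pre>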
=Mon Oct 21, 2019=
# earlier this month a critical vulnerability was fixed in sudo 1.8.28 https://www.sudo.ws/alerts/minus_1_uid.html
# I configured this server to auto-update security-related updates, but I didn't see any changes to `sudo` since I've been away. I *did* see updates to nginx, but why didn't sudo update? Indeed, it's stuck at 1.8.19p2-11
<pre>
[root@opensourceecology ~]# rpm -qa | grep -i sudo
sudo-1.8.19p2-11.el7_4.x86_64
[root@opensourceecology ~]#
</pre>
# fortunately the issue is an edge-case that doesn't affect us, specifically when the sudo config is setup to allow a defined user to run a defined command as any user except root https://access.redhat.com/security/cve/cve-2019-14287
# the fucking redhat solution is to fix your config, not to update sudo. A check-update run shows there *is* a newer version of sudo available
<pre>
[root@opensourceecology ~]# yum check-update sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
* base: mirror.checkdomain.de
* epel: mirror.wiuwiu.de
* extras: centosmirror.netcup.net
* updates: mirror.checkdomain.de
* webtatic: uk.repo.webtatic.com
sudo.x86_64                          1.8.23-4.el7                          base
[root@opensourceecology ~]#
</pre>
# it looks like the '--changelog' arg to `rpm` only shows changes for what's installed, not prospective updates. So I updated
<pre>
[root@opensourceecology ~]# yum install sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
* base: mirror.checkdomain.de
* epel: mirror.wiuwiu.de
* extras: centosmirror.netcup.net
* updates: mirror.checkdomain.de
* webtatic: uk.repo.webtatic.com
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.19p2-11.el7_4 will be updated
---> Package sudo.x86_64 0:1.8.23-4.el7 will be an update
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package        Arch            Version                  Repository      Size
================================================================================
Updating:
sudo          x86_64          1.8.23-4.el7              base          841 k
Transaction Summary
================================================================================
Upgrade  1 Package
Total download size: 841 k
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
sudo-1.8.23-4.el7.x86_64.rpm                              | 841 kB  00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating  : sudo-1.8.23-4.el7.x86_64                                    1/2
warning: /etc/sudoers created as /etc/sudoers.rpmnew
  Cleanup    : sudo-1.8.19p2-11.el7_4.x86_64                                2/2
  Verifying  : sudo-1.8.23-4.el7.x86_64                                    1/2
  Verifying  : sudo-1.8.19p2-11.el7_4.x86_64                                2/2
Updated:
  sudo.x86_64 0:1.8.23-4.el7
Complete!
[root@opensourceecology ~]#
</pre>
# apparently the update doesn't patch this bug. ugh, I'm losing faith in cent/rhel over debian..
<pre>
[root@opensourceecology ~]# rpm -q --changelog sudo | head
* Wed Feb 20 2019 Radovan Sroka <rsroka@redhat.com> 1.8.23-4
- RHEL-7.7 erratum
  Resolves: rhbz#1672876 - Backporting sudo bug with expired passwords
  Resolves: rhbz#1665285 - Problem with sudo-1.8.23 and 'who am i'
* Mon Sep 24 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-3
- RHEL-7.6 erratum
  Resolves: rhbz#1547974 - Rebase sudo to latest stable upstream version
* Fri Sep 21 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-2
[root@opensourceecology ~]#
</pre>
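# side note: `rpm -q --changelog` can only ever show the installed package, but the yum-plugin-changelog package reportedly adds a `yum changelog` subcommand that can also show entries for versions still sitting in the repos (hedged; I haven't run this here):
<pre>
yum install yum-plugin-changelog
# e.g. list all changelog entries for sudo, including versions
# available in the repos but not yet installed
yum changelog all sudo
</pre>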
# well, that's all I can do for now on sudo
# regarding the package that *did* update: I got an email from ossec two days ago on Oct 20th about changed packages and the checksums of their binaries changing
<pre>
OSSEC HIDS Notification.
2019 Oct 20 04:39:44
Received From: opensourceecology->/var/log/messages
Rule: 2932 fired (level 7) -> "New Yum package installed."
Portion of the log(s):
Oct 20 04:39:42 opensourceecology yum[29637]: Installed: nginx.x86_64 1:1.16.1-1.el7
</pre>
# the changelog shows a security update from 2 months ago. why so delayed?
<pre>
[root@opensourceecology ~]# rpm -q --changelog nginx | head
* Sun Sep 15 2019 Warren Togami <warren@blockstream.com>
- add conditionals for EPEL7, see rhbz#1750857
* Tue Aug 13 2019 Jamie Nguyen <jamielinux@fedoraproject.org> - 1:1.16.1-1
- Update to upstream release 1.16.1
- Fixes CVE-2019-9511, CVE-2019-9513, CVE-2019-9516
* Thu Jul 25 2019 Fedora Release Engineering <releng@fedoraproject.org> - 1:1.16.0-5
- Rebuilt for https://fedoraproject.org/wiki/Fedora_31_Mass_Rebuild
[root@opensourceecology ~]#
</pre>
# the yum-cron package is responsible for updating security packages; it's kicked-off daily
<pre>
[root@opensourceecology log]# ls -lah /etc/cron.daily/0yum-daily.cron
-rwxr-xr-x 1 root root 332 Aug  5  2017 /etc/cron.daily/0yum-daily.cron
[root@opensourceecology log]# cat /etc/cron.daily/0yum-daily.cron
#!/bin/bash
# Only run if this flag is set. The flag is created by the yum-cron init
# script when the service is started -- this allows one to use chkconfig and
# the standard "service stop|start" commands to enable or disable yum-cron.
if [ ! -f /var/lock/subsys/yum-cron ]; then
  exit 0
fi
# Action!
exec /usr/sbin/yum-cron
[root@opensourceecology log]#
</pre>
# the logs show that nginx was only updated on Oct 20
<pre>
[root@opensourceecology log]# grep -ir nginx yum.log
May 26 06:30:47 Updated: nginx-filesystem.noarch 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-perl.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-mail.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-stream.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-image-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-geoip.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-all-modules.noarch 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx.x86_64 1:1.12.2-3.el7
Oct 20 04:39:42 Updated: nginx-filesystem.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-mail.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-image-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-stream.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Installed: nginx.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-perl.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-all-modules.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Erased: nginx-mod-http-geoip
[root@opensourceecology log]#
</pre>
# and the yum-cron config looks sane
<pre>
[root@opensourceecology log]# head /etc/yum/yum-cron.conf
[commands]
#  What kind of update to use:
# default                            = yum upgrade
# security                          = yum --security upgrade
# security-severity:Critical        = yum --sec-severity=Critical upgrade
# minimal                            = yum --bugfix update-minimal
# minimal-security                  = yum --security update-minimal
# minimal-security-severity:Critical =  --sec-severity=Critical update-minimal
update_cmd = minimal-security
[root@opensourceecology log]#
</pre>
# I still don't understand why it was delayed, but everything seems to be setup properly..
# ...
# anyway, returning to the dev/staging server setup: it looks like I can't VPN into our dev server anymore
<pre>
user@ose:~/openvpn$ Tue Oct 22 21:24:32 2019 OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
Tue Oct 22 21:24:32 2019 library versions: OpenSSL 1.0.2t  10 Sep 2019, LZO 2.08
Enter Private Key Password: *
Tue Oct 22 21:24:35 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Tue Oct 22 21:24:35 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:24:35 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Tue Oct 22 21:24:35 2019 UDP link local: (not bound)
Tue Oct 22 21:24:35 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:25:35 2019 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Tue Oct 22 21:25:35 2019 TLS Error: TLS handshake failed
Tue Oct 22 21:25:35 2019 SIGUSR1[soft,tls-error] received, process restarting
Tue Oct 22 21:25:35 2019 Restart pause, 5 second(s)
</pre>
# And I can't ping the server either
<pre>
user@ose:~$ ping 195.201.233.113
PING 195.201.233.113 (195.201.233.113) 56(84) bytes of data.
^C
--- 195.201.233.113 ping statistics ---
104 packets transmitted, 0 received, 100% packet loss, time 105449ms
user@ose:~$
</pre>
# and ssh fails
<pre>
user@ose:~$ ssh -vvvv osedev1
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t  10 Sep 2019
debug1: Reading configuration data /home/user/.ssh/config
debug1: /home/user/.ssh/config line 8: Applying options for osedev1
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "195.201.233.113" port 32415
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 195.201.233.113 [195.201.233.113] port 32415.
debug1: connect to address 195.201.233.113 port 32415: Connection timed out
ssh: connect to host 195.201.233.113 port 32415: Connection timed out
user@ose:~$
</pre>
# logging into the hetzner cloud console shows that the box is online and sitting on the login screen. I tried to login, but after typing the username it freezes. Now my dev node is acting like my damn staging node was.
# I gave the dev server a reboot
# after a few minutes, I could ssh-in.
# and I could VPN-in as well.
# now when I start the staging container, I still get timeout issues
<pre>
opensourceecology login: maltfield
Password:
login: timed out after 60 seconds
CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64
opensourceecology login:
</pre>
# it's worth noting that systemd-journal is chewing up >90% of the CPU on the osedev1 host
# I added these two lines to the lxc container's config file per https://serverfault.com/questions/658052/systemd-journal-in-debian-jessie-lxc-container-eats-100-cpu
<pre>
lxc.autodev = 1
lxc.kmsg = 0
</pre>
# I stopped the container & started it again; this time systemd on the host was <10% CPU usage, and I was able to login without any delay!
# I became root too, and that worked great!
# I had issues with ssh-ing in from my laptop, but after I disconnected from the VPN and reconnected, I was able to ssh into osestaging1 from my laptop!
# this time I was also able to become root and poke around at our new, shiny production clone server, cool!
<pre>
user@ose:~/openvpn$ ssh -vvvvv osestaging1
...
Last login: Tue Oct 22 16:08:03 2019
[maltfield@opensourceecology ~]$ sudo su -
Last login: Tue Oct 22 16:09:19 UTC 2019 on lxc/console
[root@opensourceecology ~]# ls -lah /var/www/html/ | head
total 100K
drwxr-xr-x. 25 root      root  4.0K Apr  9  2019 .
drwxr-xr-x.  5 root      root  4.0K Aug 23  2017 ..
d---r-x---.  3 not-apache apache 4.0K Aug  8  2018 3dp.opensourceecology.org
drwxr-xr-x.  3 root      root  4.0K Dec 24  2017 awstats.openbuildinginstitute.org
drwxr-xr-x.  3 root      root  4.0K Feb  9  2018 awstats.opensourceecology.org
drwxr-xr-x.  2 root      root  4.0K Mar  2  2018 cacti.opensourceecology.org.old
drwxr-xr-x.  3 apache    apache 4.0K Feb  9  2018 certbot
d---r-x---.  3 not-apache apache 4.0K Aug  7  2018 d3d.opensourceecology.org
d---r-x---.  3 not-apache apache 4.0K Apr  9  2019 fef.opensourceecology.org
[root@opensourceecology ~]#
</pre>
# ss shows that varnish & apache are listening
<pre>
[root@opensourceecology ~]# ss -plan | grep -i LISTEN
u_str  LISTEN    0      100    private/proxymap 183064                * 0                  users:(("master",pid=782,fd=49))
u_str  LISTEN    0      100    public/pickup 183032                * 0                  users:(("pickup",pid=791,fd=6),("master",pid=782,fd=17))
u_str  LISTEN    0      100    public/cleanup 183036                * 0                  users:(("master",pid=782,fd=21))
u_str  LISTEN    0      100    public/qmgr 183039                * 0                  users:(("qmgr",pid=792,fd=6),("master",pid=782,fd=24))
u_str  LISTEN    0      100    private/tlsmgr 183043                * 0                  users:(("master",pid=782,fd=28))
u_str  LISTEN    0      100    private/rewrite 183046                * 0                  users:(("master",pid=782,fd=31))
u_str  LISTEN    0      100    private/bounce 183049                * 0                  users:(("master",pid=782,fd=34))
u_str  LISTEN    0      100    private/defer 183052                * 0                  users:(("master",pid=782,fd=37))
u_str  LISTEN    0      100    private/trace 183055                * 0                  users:(("master",pid=782,fd=40))
u_str  LISTEN    0      128    /run/systemd/private 174128                * 0                  users:(("systemd",pid=1,fd=12))
u_str  LISTEN    0      128    /run/lvm/lvmpolld.socket 174135                * 0                  users:(("systemd",pid=1,fd=20))
u_str  LISTEN    0      128    /run/lvm/lvmetad.socket 174138                * 0                  users:(("lvmetad",pid=24,fd=3),("systemd",pid=1,fd=21))
u_str  LISTEN    0      128    /run/systemd/journal/stdout 174140                * 0                  users:(("systemd-journal",pid=18,fd=3),("systemd",pid=1,fd=22))
u_str  LISTEN    0      100    private/verify 183058                * 0                  users:(("master",pid=782,fd=43))
u_str  LISTEN    0      128    /tmp/ssh-bd3GlfYKNm/agent.1751 223092                * 0                  users:(("sshd",pid=1751,fd=9))
u_str  LISTEN    0      100    private/retry 183082                * 0                  users:(("master",pid=782,fd=67))
u_str  LISTEN    0      50    /var/lib/mysql/mysql.sock 187559                * 0                  users:(("mysqld",pid=1011,fd=14))
u_str  LISTEN    0      100    private/discard 183085                * 0                  users:(("master",pid=782,fd=70))
u_str  LISTEN    0      100    public/flush 183061                * 0                  users:(("master",pid=782,fd=46))
u_str  LISTEN    0      100    private/local 183088                * 0                  users:(("master",pid=782,fd=73))
u_str  LISTEN    0      100    private/virtual 183091                * 0                  users:(("master",pid=782,fd=76))
u_str  LISTEN    0      100    private/lmtp 183094                * 0                  users:(("master",pid=782,fd=79))
u_str  LISTEN    0      100    private/anvil 183097                * 0                  users:(("master",pid=782,fd=82))
u_str  LISTEN    0      100    private/scache 183100                * 0                  users:(("master",pid=782,fd=85))
u_str  LISTEN    0      100    private/proxywrite 183067                * 0                  users:(("master",pid=782,fd=52))
u_str  LISTEN    0      100    private/smtp 183070                * 0                  users:(("master",pid=782,fd=55))
u_str  LISTEN    0      100    private/relay 183073                * 0                  users:(("master",pid=782,fd=58))
u_str  LISTEN    0      100    public/showq 183076                * 0                  users:(("master",pid=782,fd=61))
u_str  LISTEN    0      100    private/error 183079                * 0                  users:(("master",pid=782,fd=64))
u_str  LISTEN    0      10    /var/run/acpid.socket 176097                * 0                  users:(("acpid",pid=48,fd=5))
u_str  LISTEN    0      128    /var/run/dbus/system_bus_socket 175844                * 0                  users:(("dbus-daemon",pid=51,fd=3),("systemd",pid=1,fd=31))
tcp    LISTEN    0      128    127.0.0.1:8000                  *:*                  users:(("httpd",pid=520,fd=3),("httpd",pid=519,fd=3),("httpd",pid=518,fd=3),("httpd",pid=517,fd=3),("httpd",pid=516,fd=3),("httpd",pid=314,fd=3))
tcp    LISTEN    0      128    127.0.0.1:6081                  *:*                  users:(("varnishd",pid=1165,fd=6))
tcp    LISTEN    0      10    127.0.0.1:6082                  *:*                  users:(("varnishd",pid=1109,fd=5))
tcp    LISTEN    0      128    127.0.0.1:8010                  *:*                  users:(("httpd",pid=520,fd=4),("httpd",pid=519,fd=4),("httpd",pid=518,fd=4),("httpd",pid=517,fd=4),("httpd",pid=516,fd=4),("httpd",pid=314,fd=4))
tcp    LISTEN    0      128      *:10000                *:*                  users:(("miniserv.pl",pid=533,fd=5))
tcp    LISTEN    0      100    127.0.0.1:25                    *:*                  users:(("master",pid=782,fd=13))
tcp    LISTEN    0      128      *:32415                *:*                  users:(("sshd",pid=326,fd=3))
tcp    LISTEN    0      128      :::4949                :::*                  users:(("munin-node",pid=379,fd=5))
tcp    LISTEN    0      128      :::32415                :::*                  users:(("sshd",pid=326,fd=4))
[root@opensourceecology ~]#
</pre>
# as expected, nginx is failing because it can't bind to the hardcoded external IP addresses that don't exist on this distinct server; we'll have to sed this later
<pre>
[root@opensourceecology ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology ~]#
</pre>
# note that the hostname above is an exact match of the production server. This is confusing for my logs and creates a risk of running commands on the wrong server. If possible, I should try to sed this back to 'osestaging1' or exclude the relevant configs from the rsync as well
# so it looks like apache is listening on 127.0.0.1:8000 for name-based vhosts, except certbot, which listens on 127.0.0.1:8010
<pre>
[root@opensourceecology conf.d]# grep VirtualHost *
000-www.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
000-www.opensourceecology.org.conf:</VirtualHost>
00-fef.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-fef.opensourceecology.org.conf:</VirtualHost>
00-forum.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-forum.opensourceecology.org.conf:</VirtualHost>
00-microfactory.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-microfactory.opensourceecology.org.conf:</VirtualHost>
00-oswh.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-oswh.opensourceecology.org.conf:</VirtualHost>
00-phplist.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-phplist.opensourceecology.org.conf:</VirtualHost>
00-seedhome.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-seedhome.openbuildinginstitute.org.conf:</VirtualHost>
00-store.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-store.opensourceecology.org.conf:</VirtualHost>
00-wiki.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-wiki.opensourceecology.org.conf:</VirtualHost>
00-www.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-www.openbuildinginstitute.org.conf:</VirtualHost>
awstats.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.openbuildinginstitute.org.conf:</VirtualHost>
awstats.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.opensourceecology.org.conf:</VirtualHost>
certbot.conf:<VirtualHost 127.0.0.1:8010>
certbot.conf:</VirtualHost>
munin.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
munin.opensourceecology.org.conf:</VirtualHost>
ssl.conf.disabled:<VirtualHost _default_:443>
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:</VirtualHost>
ssl.conf.orig:<VirtualHost _default_:443>
ssl.conf.orig:</VirtualHost>                                 
ssl.openbuildinginstitute.org:# Purpose: To be included inside the <VirtualHost> block for all
ssl.opensourceecology.org:# Purpose: To be included inside the <VirtualHost> block for all
staging.openbuildinginstitute.org.conf.bak:<VirtualHost staging.openbuildinginstitute.org:8000>
staging.openbuildinginstitute.org.conf.bak:</VirtualHost>
staging.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
staging.opensourceecology.org.conf:</VirtualHost>
varnishTest.conf.disabled:<VirtualHost 127.0.0.1:8000>
varnishTest.conf.disabled:</VirtualHost>
[root@opensourceecology conf.d]#
</pre>
# unfortunately I get 403 forbiddens for both with curl
<pre>
[root@opensourceecology conf.d]# curl 127.0.0.1:8000/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]# curl 127.0.0.1:8010/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]#
</pre>
# tailing the logs shows modsec blocking us from the fef vhost because the request's Host header is a numeric IP address. Well, ok.
<pre>
==> fef.opensourceecology.org/error_log <==
[Tue Oct 22 16:20:34.573535 2019] [:error] [pid 518] [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Pattern match "^[\\\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"] [hostname "127.0.0.1"] [uri "/"] [unique_id "Xa8sUlvJZ8GVfznr1gxo6AAAAAI"]
==> modsec_audit.log <==
--cbc91b75-A--
[22/Oct/2019:16:20:34 +0000] Xa8sUlvJZ8GVfznr1gxo6AAAAAI 127.0.0.1 33594 127.0.0.1 8000
--cbc91b75-B--
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: 127.0.0.1:8000
Accept: */*
--cbc91b75-F--
HTTP/1.1 403 Forbidden
Content-Length: 202
Content-Type: text/html; charset=iso-8859-1
--cbc91b75-E--
--cbc91b75-H--
Message: Access denied with code 403 (phase 2). Pattern match "^[\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"]
Action: Intercepted (phase 2)
Stopwatch: 1571761234559472 14421 (- - -)
Stopwatch2: 1571761234559472 14421; combined=4661, p1=4516, p2=113, p3=0, p4=0, p5=32, sr=3976, sw=0, l=0, gc=0
Response-Body-Transformed: Dechunked
Producer: ModSecurity for Apache/2.7.3 (http://www.modsecurity.org/); OWASP_CRS/2.2.9.
Server: Apache
Engine-Mode: "ENABLED"
--cbc91b75-Z--
==> fef.opensourceecology.org/access_log <==
127.0.0.1 - - [22/Oct/2019:16:20:34 +0000] "GET / HTTP/1.1" 403 202 "-" "curl/7.29.0"
</pre>
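For the record, CRS rule 960017 fires whenever the Host header is purely numeric (digits, dots, and colons), so any request addressed by bare IP:port will trip it. A self-contained sketch of that check, using `grep -E` to approximate modsec's PCRE `\d` class (the helper function name is mine):

```shell
# approximate the rule 960017 pattern ^[\d.:]+$ with a POSIX character class
is_numeric_host() { echo "$1" | grep -qE '^[0-9.:]+$'; }

is_numeric_host '127.0.0.1:8000' && echo 'blocked by rule 960017'
is_numeric_host 'www.opensourceecology.org' || echo 'allowed'
```

The practical workaround for local curl tests is presumably to send a real vhost name, e.g. `curl -H 'Host: fef.opensourceecology.org' http://127.0.0.1:8000/`, so the rule never matches.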
# attempting 8000 does a redirect that strips its own port; attempting 8010 works! The latter is just an empty docroot that gets populated by `certbot` for renewing certs on some complicated non-public vhost sites
<pre>
[root@opensourceecology ~]# curl -i http://localhost:8000/
HTTP/1.1 301 Moved Permanently
Date: Tue, 22 Oct 2019 16:22:24 GMT
Server: Apache
X-VC-Enabled: true
X-VC-TTL: 86400
Location: http://localhost/
X-XSS-Protection: 1; mode=block
Content-Length: 0
Content-Type: text/html; charset=UTF-8
[root@opensourceecology ~]# curl -i http://localhost:8010/
HTTP/1.1 200 OK
Date: Tue, 22 Oct 2019 16:23:43 GMT
Server: Apache
Last-Modified: Fri, 09 Feb 2018 20:56:47 GMT
Accept-Ranges: bytes
Content-Length: 18
X-XSS-Protection: 1; mode=block
Content-Type: text/html; charset=UTF-8
can you see this?
[root@opensourceecology ~]#
</pre>
# this is going to be a pain; let's see if I can get nginx working; we have to fix '138.201.84.223'
<pre>
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]#
[root@opensourceecology nginx]# grep -irl '138.201.84.223' *
conf.d/www.openbuildinginstitute.org.conf
conf.d/wiki.opensourceecology.org.conf
conf.d/seedhome.openbuildinginstitute.org.conf
conf.d/www.opensourceecology.org.conf
conf.d/awstats.openbuildinginstitute.org.conf
nginx.conf
[root@opensourceecology nginx]#
</pre>
# I replaced the first IP for OBI with our VPN IP
<pre>
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]#
</pre>
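A side note on these seds: the unescaped dots in '138.201.84.223' are regex wildcards, which is harmless here but looser than it needs to be. A quick sketch against a throwaway file, with the dots escaped (the listen line is a fabricated one-line sample, not the real config):

```shell
# demo in a temp dir; sample.conf stands in for the real nginx configs
tmp=$(mktemp -d)
echo 'listen 138.201.84.223:4443 ssl;' > "$tmp/sample.conf"

# same substitution, with dots escaped so '.' only matches a literal dot
sed -i 's/138\.201\.84\.223/10.241.189.11/g' "$tmp/sample.conf"

cat "$tmp/sample.conf"   # listen 10.241.189.11:4443 ssl;
rm -rf "$tmp"
```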
# And then I replaced the second IP for OSE with our VPN IP as well
<pre>
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.243:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]#
</pre>
# well, now there is a duplicate listen line for this same IP; I removed that from nginx.conf
# And now I'm having issues with a duplicate default_server line. Oh, right, now that OBI and OSE share the same IP I'll make OSE the default server and remove it from OBI
<pre>
[root@opensourceecology conf.d]# nginx -t
nginx: [emerg] a duplicate default server for 10.241.189.11:443 in /etc/nginx/conf.d/www.opensourceecology.org.conf:58
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl 'default_server' *
www.openbuildinginstitute.org.conf
www.opensourceecology.org.conf
[root@opensourceecology conf.d]# vim www.openbuildinginstitute.org.conf
</pre>
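This manual vim edit is what the sync script later automates with a sed; the substitution just drops the 'default_server' token from a listen line. A minimal sketch on fabricated input:

```shell
# remove the 'default_server' token; the listen line here is a made-up sample
echo 'listen 10.241.189.11:443 ssl default_server;' \
  | sed 's^listen \(.*\) default_server^listen \1^'
# => listen 10.241.189.11:443 ssl;
```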
# Aaand now it's failing on the same issue but for the IPv6 addresses. I'm just going to comment those out entirely for the staging server
<pre>
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to [2a01:4f8:172:209e::2]:443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl '2a01:4f8:172:209e::2' *
awstats.opensourceecology.org.conf
fef.opensourceecology.org.conf
forum.opensourceecology.org.conf
microfactory.opensourceecology.org
munin.opensourceecology.org.conf
oswh.opensourceecology.org.conf
store.opensourceecology.org.conf
wiki.opensourceecology.org.conf
www.opensourceecology.org.conf
</pre>
# This last sed fixed it!
<pre>
[root@opensourceecology conf.d]# sed -i 's^\(\s*\)[^#]*listen \[2a01:4f8:172:209e::2\(.*\)^\1#listen \[2a01:4f8:172:209e::2\2^' *
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@opensourceecology conf.d]#
</pre>
# I added these lines to /etc/hosts to make a new domain 'staging.www.opensourceecology.org' point to this IP address; it works!
<pre>
[root@opensourceecology conf.d]# tail /etc/hosts
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
2a01:4f8:172:209e::2 hetzner2.opensourceecology.org hetzner2
# staging
127.0.0.1 staging.www.opensourceecology.org
[root@opensourceecology conf.d]#
[root@opensourceecology conf.d]# curl -si https://staging.opensourceecology.org | tail
var mo_theme = {"name_required":"Please provide your name","name_format":"Your name must consist of at least 5 characters","email_required":"Please provide a valid email address","url_required":"Please provide a valid URL","phone_required":"Minimum 5 characters required","human_check_failed":"The input the correct value for the equation above","message_required":"Please input the message","message_format":"Your message must be at least 15 characters long","success_message":"Your message has been sent. Thanks!","blog_url":"https:\/\/staging.opensourceecology.org","loading_portfolio":"Loading the next set of posts...","finished_loading":"No more items to load..."};
/* ]]> */
</script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-content/themes/enigmatic/js/main.js?ver=1.6'></script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-includes/js/wp-embed.min.js?ver=4.9.4'></script>
</body>
</html>
[root@opensourceecology conf.d]#
</pre>
# I then added a line for 'staging.www.opensourceecology.org' and 'www.opensourceecology.org' to point to my staging server's VPN IP address on my laptop and fired up firefox; I was successfully able to access the staging site's nginx -> varnish -> httpd stack!
<pre>
10.241.189.11 www.opensourceecology.org
10.241.189.11 staging.www.opensourceecology.org
</pre>
# note that, of course, I get a cert error when attempting to access 'staging.www.opensourceecology.org', but it loads fine when hitting 'www.opensourceecology.org'. I'll have to think more about how I want to fix this. If one is on the VPN, should they automatically be forced to use the staging site? That seems like it could create confusion, but if the names are *not* the same, then I'm sure lots of errors will be encountered with links and such; so perhaps that *is* the most logical thing to do...
# oh fuck. now, somehow, I am getting emails from OSSEC on the staging server. I'll have to fix that too. For now I just stopped the ossec service on the staging server


=Tue Oct 08, 2019=
Revision as of 11:51, 28 October 2019

My work log from the year 2019 Quarter 4. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.

=See Also=

# [[Maltfield_Log]]
# [[User:Maltfield]]
# [[Special:Contributions/Maltfield]]


=Tue Oct 24, 2019=
# I logged into osemain's wordpress WUI. Oh, no, the 'wpautop-control' plugin isn't activated anymore. I'm assuming that Marcin disabled it when doing some cleanup to debug slowdown after we added the social media and seo plugins https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
# I activated the plugin, and I restored to the most recent revision that was made by me. And it worked! https://wiki.opensourceecology.org/wiki/Maltfield_Log/2019_Q3#Mon_Sep_10.2C_2019
# ...
# continuing from yesterday, I need to create a new non-root user (which will have to exist on both staging & production) that I'll both [a] give NOPASSWD sudo access on the staging server only and [b] grant ssh key authorized access to only on the staging server
# I named this user stagingsync. On staging, I added an authorized_keys file with the root-owned public key for the 4096-bit passwordless rsa ssh key that I generated on prod yesterday
<pre>
[root@osestaging1 ~]# adduser stagingsync
[root@osestaging1 ~]# ls -lah /home/stagingsync
total 20K
drwx------.  2 stagingsync stagingsync 4.0K Oct 24 12:12 .
drwxr-xr-x. 14 root        root        4.0K Oct 24 12:12 ..
-rw-r--r--.  1 stagingsync stagingsync   18 Sep  6  2017 .bash_logout
-rw-r--r--.  1 stagingsync stagingsync  193 Sep  6  2017 .bash_profile
-rw-r--r--.  1 stagingsync stagingsync  231 Sep  6  2017 .bashrc
[root@osestaging1 ~]# su - stagingsync
[stagingsync@osestaging1 ~]$ mkdir .ssh
[stagingsync@osestaging1 ~]$ chmod 0700 .ssh
[stagingsync@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC4FqRKRYw8qgLqbgfH1Yze+EWQ9wJudNU4+jrHHsatKag3yl90zE557NukZGfIcNP6sFp6+f8VeK0W9g6yhMiAq9wrsS6VrgZw1frjsFaflBaDPwPQb8s5uvj5O6P+9R0jg05t5kiHtkSrgXD7uFXkYbXeUm7xaeQRgOk+0Lt1tnVcT8g+EJDnQ7XlChLd+AXGUCiyRv+kLYCO9014Yd0Q4zlLfpRvHwXgE2gPjJDUqjiVM4SDtCqP1wSSp6JvW+bGAnFKEof/n1MyuYWajicJBijLkooCamI6VY20Qed1mv0V4E/9q2E3eQa/itd/Ai3SiEHxZURl3sVL3MPpKWqX9SG7ygZYIcnfnRah/JRjEkS84drIhdPgvF+W+X8r9i3/jRduP4H5nY9giqQBkchgZ+zixduVsjJk69oaxW3bMsJDH/UfX96gKl4HZaboJecBbKm3ZZi1YKsmAWBl6FdfsLT2FERHxWpb3PUsrfUGza187N9UHnPQESqyhpI0SRd+xMF/nZypDQEv1dSHnl4W/d6iaotZ4/RSMUF+nNHzbL/hjtusnd0f9llaEkc+v0IzRMtL6DB5XMmp9wWVkfE0Mg9qWIaqWgJKu1/wp4GABjpt2T5D2OkksgePWUQgHzXVC7By0I3XoEswFfFV/FTpp4r16lZc36s4dkDGsXT/6Q== root@opensourceecology.org" > .ssh/authorized_keys
[stagingsync@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys
[stagingsync@osestaging1 ~]$
</pre>
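For reference, the passwordless 4096-bit key pair used above was presumably generated on prod with something like the following; the log doesn't show the actual keygen command, so this is my reconstruction (the filename id_rsa.201910 matches the one used later):

```shell
# NOTE: reconstruction/assumption -- not copied from the prod shell history.
# -N '' gives an empty passphrase so the sync can run unattended.
# Demo writes into a temp dir instead of /root/.ssh to stay side-effect free.
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N '' -C 'root@opensourceecology.org' -f "$tmp/id_rsa.201910"
ls "$tmp"
rm -rf "$tmp"
```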
# then, per requirement, I added the stagingsync user to the sshaccess group
<pre>
[root@osestaging1 ~]# gpasswd -a stagingsync sshaccess
Adding user stagingsync to group sshaccess
[root@osestaging1 ~]#
</pre>
# I don't want stagingsync to have ssh access to prod (which, without an authorized_keys file on prod, it wouldn't be able to do anyway--but it would be wise to leave it out of the sshaccess group on prod regardless), so I'll *not* do this on prod. Because I do want to sync the /etc/group file from prod to staging, I'll add a step in the sync script that appends ',stagingsync' to the 'sshaccess' line in /etc/group
# cool, it works
<pre>
[root@opensourceecology ~]# ssh -i /root/.ssh/id_rsa.201910 -p 32415 stagingsync@10.241.189.11 hostname
osestaging1
[root@opensourceecology ~]#
</pre>
# now I added the 'stagingsync' user to have NOPASSWD rights on staging only; note that this will not get overwritten as our rsync command explicitly excludes the sudo config
<pre>
[root@osestaging1 ~]# tail /etc/sudoers
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom

## Allows members of the users group to shutdown this system
# %users  localhost=/sbin/shutdown -h now

## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d

maltfield       ALL=(ALL)       NOPASSWD: ALL
stagingsync       ALL=(ALL)       NOPASSWD: ALL
[root@osestaging1 ~]#
</pre>
# I'm having issues with connections to staging suddenly failing from other vpn clients (my laptop and the prod server) after some time, even though my connection appears to remain successful. closing & reconnecting re-enables me to access staging.
# I initiated a new rsync using my new script. Here's what it looks like now
<pre>
############
# SETTINGS #
############

STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync

#########
# RSYNC #
#########

# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user just saves space & time
# exclude /home/stagingsync because 'stagingsync' should be able to access
#                           staging but not production
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root

time nice rsync \
		-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
		--bwlimit=3000 \
		--numeric-ids \
		--rsync-path="sudo rsync" \
		--exclude=/root \
		--exclude=/run \
		--exclude=/home/b2user/sync* \
		--exclude=/home/stagingsync* \
		--exclude=/etc/sudo* \
		--exclude=/etc/openvpn \
		--exclude=/usr/share/easy-rsa \
		--exclude=/dev \
		--exclude=/sys \
		--exclude=/proc \
		--exclude=/boot/ \
		--exclude=/etc/sysconfig/network* \
		--exclude=/tmp \
		--exclude=/var/tmp \
		--exclude=/etc/fstab \
		--exclude=/etc/mtab \
		--exclude=/etc/mdadm.conf \
		--exclude=/etc/hostname \
		-av \
		--progress \
		/ ${SYNC_USERNAME}@${STAGING_HOST}:/
</pre>
# it works!
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh 
...
var/www/html/www.opensourceecology.org/htdocs/wp-content/uploads/2019/10/workshop9sm.jpg
	   97794 100%  113.29kB/s    0:00:00 (xfer#4820, to-check=898/518063)

sent 810748552 bytes  received 2196940 bytes  1910565.20 bytes/sec
total size is 41443449279  speedup is 50.98
+ exit 0
[root@opensourceecology bin]#
</pre>
# A double-tap fails, probably because the sync updated /etc/group, removing 'stagingsync' from the 'sshaccess' group
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh 
+ STAGING_HOST=10.241.189.11
+ STAGING_SSH_PORT=32415
+ SYNC_USERNAME=stagingsync
+ nice rsync -e 'ssh -p 32415 -i /root/.ssh/id_rsa.201910' --bwlimit=3000 --numeric-ids '--rsync-path=sudo rsync' --exclude=/root --exclude=/run '--exclude=/home/b2user/sync*' '--exclude=/home/stagingsync*' '--exclude=/etc/sudo*' --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ '--exclude=/etc/sysconfig/network*' --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf --exclude=/etc/hostname -av --progress / stagingsync@10.241.189.11:/
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]

real    0m0.147s
user    0m0.031s
sys     0m0.007s
+ exit 0
[root@opensourceecology bin]#
</pre>
# I ran this sed command, which I'll add to the script
<pre>
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp
[root@osestaging1 ~]# sed -i 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group
[root@osestaging1 ~]# grep sshaccess /etc/group
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp,stagingsync
[root@osestaging1 ~]#
</pre>
# now the second tap works; it wasn't quite as fast as I'd like; it spent a lot of time on mysql, logs, ossec, munin, etc. files that changed from just a few minutes ago
<pre>
[root@opensourceecology bin]# ./syncToStaging.sh 
...
var/www/html/munin/static/zoom.js
		4760 100%  422.59kB/s    0:00:00 (xfer#773, to-check=1002/322356)

sent 61086821 bytes  received 1431809 bytes  454680.95 bytes/sec
total size is 41445400614  speedup is 662.93

real    2m17.019s
user    0m22.964s
sys     0m8.157s
+ exit 0
[root@opensourceecology bin]#
</pre>
# I went to add the sed command to be executed after the rsync, but--well--that's necessarily a new command over a new ssh connection. And I can't connect after rsync copied-over the /etc/group file. I'm in a catch-22.
# my solution: exclude /etc/group from the rsync, and sync it manually with sed piping to the file over ssh
<pre>
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p32415 -i /root/.ssh/id_rsa.201910 stagingsync@10.241.189.11 'sudo tee /etc/group'
</pre>
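One caveat with that substitution: it appends ',stagingsync' unconditionally, so if the input line ever already named stagingsync you'd end up with a duplicate entry. A guarded variant (my tweak, not what the script uses) restricts the substitution to lines that don't already contain the user:

```shell
# /stagingsync/! skips lines already naming the user, making the sed idempotent;
# the input line is a sample standing in for prod's /etc/group
line='sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp'
echo "$line" | sed '/stagingsync/! s/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/'
# => sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp,stagingsync

# a second pass over the already-updated line is now a no-op:
echo "$line,stagingsync" | sed '/stagingsync/! s/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/'
```

In practice this doesn't matter since prod's /etc/group never contains stagingsync, but it makes the sync safe to re-run against arbitrary input.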
# I was also able to fix all the nginx configs by adding this to the script
<pre>
############
# SETTINGS #
############
...
PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'
...
#############
# FUNCTIONS #
#############

runOnStaging () {

		ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1

}
...
##################
# NGINX BINDINGS #
##################

# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses

# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"

# since the main config file has both listens (for redirecting port 80 to
# port 443), we just do it once & comment-out the second one to avoid errors
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"

# just remove all of the ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"

# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"
</pre>
# because the staging websites necessarily look exactly the same as production, I decided to add a quick one-liner that adds an 'is_staging' file (with the contents 'true') into the docroot of each vhost on the staging box after the sync. On prod, a GET for '/is_staging' should return a 404.
<pre>
for docroot in $(find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo 'true' > "$docroot/is_staging"; done
</pre>
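That one-liner can be exercised end-to-end against a mock docroot tree (the paths under mktemp stand in for /var/www/html):

```shell
# build a mock vhost layout: two vhosts with an htdocs/ docroot, one without
tmp=$(mktemp -d)
mkdir -p "$tmp/site-a/htdocs" "$tmp/site-b/htdocs" "$tmp/site-c"

# same find/echo loop as above, pointed at the mock tree
for docroot in $(find "$tmp"/* -maxdepth 1 -name htdocs -type d); do
  echo 'true' > "$docroot/is_staging"
done

find "$tmp" -name is_staging   # two hits: site-a and site-b only
rm -rf "$tmp"
```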
# ok, I finished the sync script! I haven't added it to a cron yet (which I would also have to comment-out on the staging box; super meta), but here's what I've got so far
[root@opensourceecology bin]# cat syncToStaging.sh
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-24
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################

############
# SETTINGS #
############

STAGING_HOST=10.241.189.11
STAGING_SSH_PORT=32415
SYNC_USERNAME=stagingsync

PRODUCTION_IP1=138.201.84.243
PRODUCTION_IP2=138.201.84.223
PRODUCTION_IPv6='2a01:4f8:172:209e::2'

#############
# FUNCTIONS #
#############

runOnStaging () {

		ssh -p ${STAGING_SSH_PORT} -i '/root/.ssh/id_rsa.201910' ${SYNC_USERNAME}@${STAGING_HOST} $1

}

#########
# RSYNC #
#########

# bwlimit prevents saturating the network on prod
# rsync-path makes a non-root ssh user become root on the staging side
# exclude /home/b2user/sync* just saves space & time
# exclude /home/stagingsync* because 'stagingsync' should be able to access
#                            staging but not production
# exclude /etc/group so 'stagingsync' is in the 'sshaccess' group on staging
#                    but not on prod
# exclude /etc/sudo* as we want 'stagingsync' NOPASSWD on staging, not root

time nice rsync \
		-e "ssh -p ${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910" \
		--bwlimit=3000 \
		--numeric-ids \
		--rsync-path="sudo rsync" \
		--exclude=/root \
		--exclude=/run \
		--exclude=/home/b2user/sync* \
		--exclude=/home/stagingsync* \
		--exclude=/etc/sudo* \
		--exclude=/etc/group \
		--exclude=/etc/openvpn \
		--exclude=/usr/share/easy-rsa \
		--exclude=/dev \
		--exclude=/sys \
		--exclude=/proc \
		--exclude=/boot/ \
		--exclude=/etc/sysconfig/network* \
		--exclude=/tmp \
		--exclude=/var/tmp \  
		--exclude=/etc/fstab \
		--exclude=/etc/mtab \
		--exclude=/etc/mdadm.conf \
		--exclude=/etc/hostname \
		-av \
		--progress \
		/ ${SYNC_USERNAME}@${STAGING_HOST}:/

##################
# NGINX BINDINGS #
##################

# nginx configs must be updated to bind to our staging server's VPN address
# instead of the prod server's internet-facing IP addresses

# update the listen lines to use the VPN IP
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"
runOnStaging "sudo sed -i 's/${PRODUCTION_IP2}/${STAGING_HOST}/g' /etc/nginx/conf.d/*"

# since the main config file has listen lines for both prod IPs (for
# redirecting port 80 to port 443), we replace the first one & comment-out
# the second to avoid a duplicate-listen error
runOnStaging "sudo sed -i 's/${PRODUCTION_IP1}/${STAGING_HOST}/g' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen ${PRODUCTION_IP2}\(.*\)^\1#listen ${PRODUCTION_IP2}\2^' /etc/nginx/nginx.conf"

# just comment-out all of the ipv6 listens
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/nginx.conf"
runOnStaging "sudo sed -i 's^\(\s*\)[^#]*listen \[${PRODUCTION_IPv6}\(.*\)^\1#listen \[${PRODUCTION_IPv6}\2^' /etc/nginx/conf.d/*"

# since we went from 2 prod IPs to 1 staging IP, we must remove one of the
# default_server entries. We choose to make OSE default & remove it from OBI
runOnStaging "sudo sed -i 's^listen \(.*\) default_server^listen \1^' /etc/nginx/conf.d/www.openbuildinginstitute.org.conf"

# finally, restart nginx
runOnStaging "sudo systemctl restart nginx.service"

#########################
# MAKE THE STAGING MARK #
#########################

# we leave a mark so we can test to see if we're looking at staging by doing a
# GET request against '/is_staging'. It should 404 on prod but return 200 on
# staging

runOnStaging 'for docroot in $(sudo find /var/www/html/* -maxdepth 1 -name htdocs -type d); do echo true | sudo tee "$docroot/is_staging"; done'

###################
# OSSEC SILENCING #
###################

# we don't need ossec email alerts from our staging server
runOnStaging "sudo sed -i 's^<email_notification>yes</email_notification>^<email_notification>no</email_notification>^' /var/ossec/etc/ossec.conf"

##################
# CRON DISABLING #
##################

# disable certbot cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/letsencrypt"

# disable backups cron
runOnStaging "sudo sed -i 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^' /etc/cron.d/backup_to_backblaze"

###############
# USER/GROUPS #
###############

# append ',stagingsync' to the 'sshaccess' line in /etc/group to permit this
# user to be able to ssh into staging (we don't do this on prod so they can't
# ssh into prod)
sed 's/^sshaccess:\(.*\)$/sshaccess:\1,stagingsync/' /etc/group | ssh -p${STAGING_SSH_PORT} -i /root/.ssh/id_rsa.201910 ${SYNC_USERNAME}@${STAGING_HOST} 'sudo tee /etc/group'

########
# EXIT #
########

# clean exit
exit 0
[root@opensourceecology bin]# 
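The comment-out sed used for the cron files above is dense enough to warrant a sanity check; here it is run offline against some made-up sample lines:

```shell
# prefix the first non-'#' character of each line with '#' (GNU sed \s)
printf '20 07 * * * root backup.sh\n#already commented\n' \
  | sed 's^\(\s*\)\([^#]\)\(.*\)^\1#\2\3^'
# prints:
#   #20 07 * * * root backup.sh
#   ##already commented
```

Note that lines which are already commented pick up a second '#', which is harmless for cron.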

Wed Oct 23, 2019

  1. I updated the wiki documentation on the development server, added an article on the staging server, and added some bits about the /var network block mount and the vpn config
  2. ...
  3. it does not appear that I can simply add items to the client's /etc/hosts file or otherwise on a per-ip or per-dns basis. It appears that I can only add a "dhcp-option DNS" item to the server (or client) configs to override the dns server used on the client https://openvpn.net/community-resources/pushing-dhcp-options-to-clients/
  4. so then I can run a dns server on osedev1 that has an entry for each of our websites pointing to the VPN IP of osestaging1 (10.241.189.11), and forwards the rest to 1.1.1.1 or something.
  5. this question suggests using dnsmasq https://askubuntu.com/questions/885497/openvpn-and-dns
  6. cool, dnsmasq-2.76-9 is already installed on our cent7 osedev1 box. Let's take that low-hanging fruit
[root@osedev1 3]# rpm -qa | grep -i dns
dnsmasq-2.76-9.el7.x86_64
[root@osedev1 3]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
[root@osedev1 3]# 
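That plan (answer our own hostnames with the staging VPN IP, forward everything else upstream) would look something like the dnsmasq fragment below; the 1.1.1.1 upstream and the drop-in file path are hypothetical, and as it turns out I ended up feeding dnsmasq from /etc/hosts instead:

```
# hypothetical /etc/dnsmasq.d/staging.conf sketch (not what was deployed)
address=/www.opensourceecology.org/10.241.189.11
server=1.1.1.1
```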
  1. It also appears to already be running
[root@osedev1 3]# ps -ef | grep dnsmasq
nobody    1346     1  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      1347  1346  0 Oct22 ?        00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root     18856 14405  0 13:55 pts/10   00:00:00 grep --color=auto dnsmasq
[root@osedev1 3]# 
  1. oh, shit, it appears to be only listening on 192.168.122.1:53, which is our libvirt network
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN     0      0         * 18757                 * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN     0      0      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN     0      0      *%virbr0:67                    *:*                   users:(("dnsmasq",pid=1346,fd=3))
tcp    LISTEN     0      5      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=6))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN     0      0      192.168.122.1:53                       *:*                   users:(("dnsmasq",pid=1346,fd=5))
UNCONN     0      0      *%virbr0:67                       *:*                   users:(("dnsmasq",pid=1346,fd=3))
[root@osedev1 etc]# 
  1. I could find no entries in the dnsmasq.conf file for the bind address
[root@osedev1 etc]# ls -lah /etc/dnsmasq.*
-rw-r--r--. 1 root root  27K Aug  9 01:12 /etc/dnsmasq.conf

/etc/dnsmasq.d:
total 8.0K
drwxr-xr-x.  2 root root 4.0K Aug  9 01:12 .
drwxr-xr-x. 86 root root 4.0K Oct 23 14:20 ..
[root@osedev1 etc]# grep '192.168.122' /etc/dnsmasq.conf 
[root@osedev1 etc]# 
  1. I found two unrelated files that specify this network--unless dnsmasq is somehow configured by libvirt?
[root@osedev1 etc]# grep -irl '192.168.122' /etc
/etc/libvirt/qemu/networks/default.xml
/etc/openvpn/openvpn-status.log
[root@osedev1 etc]# cat /etc/libvirt/qemu/networks/default.xml 
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh net-edit default
or other application using the libvirt API.
-->

<network>
  <name>default</name>
  <uuid>a11767e5-cc15-4acd-9443-bbffc220fa4d</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:7d:01:71'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
	<dhcp>
	  <range start='192.168.122.2' end='192.168.122.254'/>
	</dhcp>
  </ip>
</network>
[root@osedev1 etc]# cat /etc/openvpn/openvpn-status.log 
OpenVPN CLIENT LIST
Updated,Wed Oct 23 14:28:10 2019
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
hetzner2,138.201.84.223:34914,8122227,496427,Tue Oct 22 17:48:40 2019
osestaging1,192.168.122.201:51674,2340646,949941,Tue Oct 22 18:07:40 2019
maltfield,27.7.149.58:51080,44891,39735,Wed Oct 23 13:28:30 2019
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.241.189.10,maltfield,27.7.149.58:51080,Wed Oct 23 14:28:09 2019
10.241.189.11,osestaging1,192.168.122.201:51674,Wed Oct 23 13:48:08 2019
GLOBAL STATS
Max bcast/mcast queue length,1
END
[root@osedev1 etc]# 
  1. I think it is libvirt; this libvirt guide describes how to avoid conflicts when trying to use a distinct "global" dnsmasq config https://wiki.libvirt.org/page/Libvirtd_and_dnsmasq
  2. I made a backup of the existing /etc/dnsmasq.conf file and added lines to the config to bind dnsmasq only to tun0
[root@osedev1 etc]# cp dnsmasq.conf dnsmasq.20191023.orig.conf
[root@osedev1 etc]# vim dnsmasq.conf 
...
[root@osedev1 etc]# tail /etc/dnsmasq.conf 
#conf-dir=/etc/dnsmasq.d,.bak

# Include all files in a directory which end in .conf
#conf-dir=/etc/dnsmasq.d/,*.conf

# Include all files in /etc/dnsmasq.d except RPM backup files
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig

interface=tun0
bind-interfaces
[root@osedev1 etc]# 
  1. And then I verified that the dnsmasq.service is disabled
[root@osedev1 etc]# systemctl list-units | grep -i dns
  unbound-anchor.timer                                                                        loaded active waiting   daily update of the root trust anchor for DNSSEC
[root@osedev1 etc]# systemctl list-unit-files | grep -i dns
chrony-dnssrv@.service                        static  
dnsmasq.service                               disabled
chrony-dnssrv@.timer                          disabled
[root@osedev1 etc]# 
  1. I started it
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@osedev1 etc]# systemctl start dnsmasq.service
[root@osedev1 etc]# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-10-23 14:40:58 CEST; 2s ago
 Main PID: 29666 (dnsmasq)
	Tasks: 1
   CGroup: /system.slice/dnsmasq.service
		   └─29666 /usr/sbin/dnsmasq -k

Oct 23 14:40:58 osedev1 systemd[1]: Started DNS caching server..
Oct 23 14:40:58 osedev1 dnsmasq[29666]: started, version 2.76 cachesize 150
Oct 23 14:40:58 osedev1 dnsmasq[29666]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-...inotify
Oct 23 14:40:58 osedev1 dnsmasq[29666]: reading /etc/resolv.conf
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.100.100#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.99.99#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: using nameserver 213.133.98.98#53
Oct 23 14:40:58 osedev1 dnsmasq[29666]: read /etc/hosts - 6 addresses
Hint: Some lines were ellipsized, use -l to show in full.
[root@osedev1 etc]# 
  1. cool, now it looks like it's running on both the 192.168.122 virbr0 libvirt network and the 10.241.189 tun0 vpn network
[root@osedev1 etc]# ss -plan | grep -i dnsmasq
u_dgr  UNCONN     0      0         * 20573881              * 8150                users:(("dnsmasq",pid=29666,fd=15))
u_dgr  UNCONN     0      0         * 18757                 * 8150                users:(("dnsmasq",pid=1346,fd=10))
udp    UNCONN     0      0      127.0.0.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=6))
udp    UNCONN     0      0      10.241.189.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=4))
udp    UNCONN     0      0      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=5))
udp    UNCONN     0      0      *%virbr0:67                    *:*                   users:(("dnsmasq",pid=1346,fd=3))
udp    UNCONN     0      0       ::1:53                   :::*                   users:(("dnsmasq",pid=29666,fd=10))
udp    UNCONN     0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                   :::*                   users:(("dnsmasq",pid=29666,fd=8))
tcp    LISTEN     0      5      127.0.0.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=7))
tcp    LISTEN     0      5      10.241.189.1:53                    *:*                   users:(("dnsmasq",pid=29666,fd=5))
tcp    LISTEN     0      5      192.168.122.1:53                    *:*                   users:(("dnsmasq",pid=1346,fd=6))
tcp    LISTEN     0      5       ::1:53                   :::*                   users:(("dnsmasq",pid=29666,fd=11))
tcp    LISTEN     0      5      fe80::fd4a:7df9:169:e7e2%tun0:53                   :::*                   users:(("dnsmasq",pid=29666,fd=9))
[root@osedev1 etc]# ss -planu | grep -i dnsmasq
UNCONN     0      0      127.0.0.1:53                       *:*                   users:(("dnsmasq",pid=29666,fd=6))
UNCONN     0      0      10.241.189.1:53                       *:*                   users:(("dnsmasq",pid=29666,fd=4))
UNCONN     0      0      192.168.122.1:53                       *:*                   users:(("dnsmasq",pid=1346,fd=5))
UNCONN     0      0      *%virbr0:67                       *:*                   users:(("dnsmasq",pid=1346,fd=3))
UNCONN     0      0          ::1:53                      :::*                   users:(("dnsmasq",pid=29666,fd=10))
UNCONN     0      0      fe80::fd4a:7df9:169:e7e2%tun0:53                      :::*                   users:(("dnsmasq",pid=29666,fd=8))
[root@osedev1 etc]# 
  1. cool, from my laptop the 53 udp port on osedev1's vpn address appears to be open. or, uh, filtered?
user@ose:~/openvpn$ sudo nmap -Pn -sU -p53 10.137.0.1

Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:31 +0545
Nmap scan report for 10.137.0.1 (10.137.0.1)
Host is up.
PORT   STATE         SERVICE
53/udp open|filtered domain

Nmap done: 1 IP address (1 host up) scanned in 2.12 seconds
user@ose:~/openvpn$ 
  1. nope, fail.
user@ose:~/openvpn$ dig @10.137.0.1 google.com

; <<>> DiG 9.10.3-P4-Debian <<>> @10.137.0.1 google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
user@ose:~/openvpn$ 
  1. Indeed, all ports are reported as "filtered"
user@ose:~/openvpn$ nmap -Pn 10.241.189.1

Starting Nmap 7.40 ( https://nmap.org ) at 2019-10-23 18:29 +0545
Nmap scan report for 10.241.189.1 (10.241.189.1)
Host is up.
All 1000 scanned ports on 10.241.189.1 (10.241.189.1) are filtered

Nmap done: 1 IP address (1 host up) scanned in 201.47 seconds
user@ose:~/openvpn$ 
  1. I bet this is an iptables issue. And, christ, the iptables ruleset looks more complex than anything I built; I guess this is libvirt's doing?
[root@osedev1 etc]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2929  205K ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
	0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53
   54 17712 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
	0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:67
  101 15196 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
 107K   15M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
   17   706 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           
	4   628 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:32415
	9   804 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW udp dpt:1194
11218  621K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 320 packets, 26880 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 7135   30M ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24     ctstate RELATED,ESTABLISHED
 7756  935K ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0           
	0     0 ACCEPT     all  --  virbr0 virbr0  0.0.0.0/0            0.0.0.0/0           
	0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
	0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT 38266 packets, 6557K bytes)
 pkts bytes target     prot opt in     out     source               destination         
   54 18295 ACCEPT     udp  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            udp dpt:68
[root@osedev1 etc]# 
[root@osedev1 etc]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*mangle
:PREROUTING ACCEPT [136754:47136918]
:INPUT ACCEPT [121507:15710076]
:FORWARD ACCEPT [15247:31426842]
:OUTPUT ACCEPT [38360:6581630]
:POSTROUTING ACCEPT [53607:38008472]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*nat
:PREROUTING ACCEPT [13853:810289]
:INPUT ACCEPT [1821:140336]
:OUTPUT ACCEPT [2275:162484]
:POSTROUTING ACCEPT [2276:162568]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 14:50:44 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [320:26880]
:OUTPUT ACCEPT [38306:6563335]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 14:50:44 2019
[root@osedev1 etc]# 
  1. I just added a single line before the drop to permit udp packets to 53 from tun0
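The exact command wasn't captured in this log; reconstructed from the saved ruleset below, it was presumably something like this (the rule position 10 is an assumption, chosen to land just above the final DROP):

```
iptables -I INPUT 10 -i tun0 -p udp -m udp --dport 53 -j ACCEPT
```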
[root@osedev1 20191023]# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
[root@osedev1 20191023]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*mangle
:PREROUTING ACCEPT [279:24925]
:INPUT ACCEPT [279:24925]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
:POSTROUTING ACCEPT [106:13445]
-A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*nat
:PREROUTING ACCEPT [30:1478]
:INPUT ACCEPT [3:218]
:OUTPUT ACCEPT [4:304]
:POSTROUTING ACCEPT [4:304]
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
# Generated by iptables-save v1.4.21 on Wed Oct 23 15:23:13 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [106:13445]
-A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -i tun0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Wed Oct 23 15:23:13 2019
[root@osedev1 20191023]# 
  1. And now it works!
user@ose:~/openvpn$ dig @10.241.189.1 michaelaltfield.net

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 michaelaltfield.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34648
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;michaelaltfield.net.		IN	A

;; ANSWER SECTION:
michaelaltfield.net.	3554	IN	A	176.56.237.113

;; Query time: 148 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:07:28 +0545 2019
;; MSG SIZE  rcvd: 64

user@ose:~/openvpn$ 
  1. now let's see if I can hardcode www.opensourceecology.org. By default, it returns the internet IP address of our prod server per our public dns records
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40391
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 120	IN	A	138.201.84.243

;; Query time: 214 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:09:07 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
  1. The dnsmasq.conf config says that it reads from /etc/hosts, so I just added a line to osedev1:/etc/hosts
[root@osedev1 20191023]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

# staging
10.241.189.11 www.opensourceecology.org
[root@osedev1 20191023]# 
  1. I tried the query again, but I still got the 138.201.84.243 address
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62221
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 87	IN	A	138.201.84.243

;; Query time: 158 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:46 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
  1. I gave dnsmasq a restart (maybe caching issue?)
[root@osedev1 20191023]# service dnsmasq restart
Redirecting to /bin/systemctl restart dnsmasq.service
[root@osedev1 20191023]#
  1. And I tried again; it worked this time!
user@ose:~/openvpn$ dig @10.241.189.1 www.opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> @10.241.189.1 www.opensourceecology.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34890
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.opensourceecology.org.	IN	A

;; ANSWER SECTION:
www.opensourceecology.org. 0	IN	A	10.241.189.11

;; Query time: 146 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Wed Oct 23 19:11:56 +0545 2019
;; MSG SIZE  rcvd: 70

user@ose:~/openvpn$ 
  1. cool, so now to push that option via the vpn; I added the "push dhcp-option" line to /etc/openvpn/server/server.conf
push "dhcp-option DNS 10.241.189.1"
  1. I reconnected to the vpn from my laptop, but there were no changes to my /etc/resolv.conf. I tried to restart the openvpn server on osedev1
[root@osedev1 server]# systemctl restart openvpn@server.service
[root@osedev1 server]# 
  1. I still have no changes on my resolv.conf, but I do see the option in the output of the client
Wed Oct 23 19:20:42 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
  1. ah, fuck, apparently the linux client of openvpn doesn't support the dhcp-option push https://unix.stackexchange.com/questions/201946/how-to-define-dns-server-in-openvpn
  2. this archlinux wiki has a solution for linux, but the location of the scripts pull-resolv-conf are in a distinct location on centos https://wiki.archlinux.org/index.php/OpenVPN#DNS
[root@osedev1 server]# find / | grep -i pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.down
/usr/share/doc/openvpn-2.4.7/contrib/pull-resolv-conf/client.up
...
  1. these scripts actually have to live client-side, though. My client is debian-9. It doesn't have the 'pull-resolv-conf' scripts on it. But it does have 'update-resolv-conf' and 'systemd-resolved'. The latter isn't openvpn-specific, however. I think I should use '/etc/openvpn/update-resolv-conf'
root@ose:~# find / | grep -i pull-resolv-conf
root@ose:~# find / | grep -i resolv-conf
/etc/openvpn/update-resolv-conf
root@ose:~# find / | grep -i systemd-resolved
/usr/share/man/man8/systemd-resolved.service.8.gz
/usr/share/man/man8/systemd-resolved.8.gz
/lib/systemd/system/systemd-resolved.service.d
/lib/systemd/system/systemd-resolved.service.d/resolvconf.conf
/lib/systemd/system/systemd-resolved.service
/lib/systemd/systemd-resolved
root@ose:~# cat /etc/issue
Debian GNU/Linux 9 \n \l

root@ose:~# ls -lah /etc/openvpn/update-resolv-conf 
-rwxr-xr-x 1 root root 1.3K Oct 15  2018 /etc/openvpn/update-resolv-conf
root@ose:~# 
  1. I added the needed lines to my client.conf file, but it didn't do anything when I reconnected to the vpn
root@ose:/home/user/openvpn# tail client.conf
# Silence repeating messages
;mute 20

# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384

# dns for staging
script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
root@ose:/home/user/openvpn# 
  1. well, the first non-commented line in that script is to check for the existence of /sbin/resolvconf and `exit 0` if it doesn't exist. Yeah, it doesn't exist.
root@ose:/home/user/openvpn# grep resolvconf /etc/openvpn/update-resolv-conf 
# Used snippets of resolvconf script by Thomas Hood and Chris Hanson.
[ -x /sbin/resolvconf ] || exit 0
	echo -n "$R" | /sbin/resolvconf -a "${dev}.openvpn"
	/sbin/resolvconf -d "${dev}.openvpn"
root@ose:/home/user/openvpn# ls -lah /sbin/resolvconf
ls: cannot access '/sbin/resolvconf': No such file or directory
root@ose:/home/user/openvpn# 
  1. per the archlinux guide linked above, I installed the 'openresolv' package from apt-get. This time it worked!
user@ose:~/openvpn$ sudo openvpn client.conf
...
Wed Oct 23 19:43:00 2019 PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.241.189.1,route 10.241.189.0 255.255.255.0,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 2,cipher AES-256-GCM'
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: timers and/or timeouts modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ifconfig/up options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: route options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: peer-id set
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Wed Oct 23 19:43:00 2019 OPTIONS IMPORT: data channel crypto options modified
Wed Oct 23 19:43:00 2019 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Wed Oct 23 19:43:00 2019 ROUTE_GATEWAY 10.137.0.6
Wed Oct 23 19:43:00 2019 TUN/TAP device tun0 opened
Wed Oct 23 19:43:00 2019 TUN/TAP TX queue length set to 100
Wed Oct 23 19:43:00 2019 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Wed Oct 23 19:43:00 2019 /sbin/ip link set dev tun0 up mtu 1500
Wed Oct 23 19:43:00 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Wed Oct 23 19:43:00 2019 /etc/openvpn/update-resolv-conf tun0 1500 1552 10.241.189.10 10.241.189.9 init
dhcp-option DNS 10.241.189.1
Too few arguments.
Wed Oct 23 19:43:00 2019 /sbin/ip route add 10.241.189.0/24 via 10.241.189.9
Wed Oct 23 19:43:00 2019 Initialization Sequence Completed
  1. And my laptop's new resolv.conf file
user@ose:~/openvpn$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 10.241.189.1
user@ose:~/openvpn$ 
  1. I refreshed the 'www.opensourceecology.org' page on my browser, and--boom--it's now showing staging! Success!!1one
  2. now, I finished adding the other hostnames to osedev1:/etc/hosts. Unfortunately, this will have to be updated as-needed in the future
[root@osedev1 pull-resolv-conf]# tail /etc/hosts
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 osedev1 osedev1
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

# staging
10.241.189.11 www.opensourceecology.org opensourceecology.org awstats.opensourceecology.org fef.opensourceecology.org forum.opensourceecology.org microfactory.opensourceecology.org munin.opensourceecology.org opensourceecology.org oswh.opensourceecology.org phplist.opensourceecology.org store.opensourceecology.org wiki.opensourceecology.org awstats.openbuildinginstitute.org openbuildinginstitute.org seedhome.openbuildinginstitute.org www.openbuildinginstitute.org
[root@osedev1 pull-resolv-conf]# 
  1. I restarted dnsmasq and attempted to test www.openbuildinginstitute.org. Well, it kinda worked. It pointed to the staging server--which has an expired certificate. This means that I need to do another sync & automate this nginx config sed process. But it also means that I need to somehow kill the certbot cron on staging
  2. ...
  3. meanwhile, I logged-into backblaze b2 to check the status of our backups of the dev node
  4. first of all, the prod 'ose-server-backups' bucket has 19 files totaling 300G. One file appears to be uploading at the moment. There are two from 2018-11 & 2018-12 at <20M, but the others vary in size from 17.5G to 18.4G.
  5. as for the new dev-specific 'ose-dev-server-backups' bucket, there's 0 fucking files
  6. I kicked-off a backup; it completed relatively fast. There were no obvious errors during the upload, but the file is not visible on the wui
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191023_144309.tar.gpg
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041
{
  "action": "upload", 
  "fileId": "4_z2675c17c55dd1d696edd0118_f10281b8779570cee_d20191023_m144325_c001_v0001130_t0041", 
  "fileName": "daily_osedev1_20191023_144309.tar.gpg", 
  "size": 18465051, 
  "uploadTimestamp": 1571841805000
}

real    0m27.979s
user    0m1.037s
sys     0m0.321s
[root@osedev1 backups]# 
[root@osedev1 backups]# ./backup.sh
  1. the last upload appears to be from 20 days ago
[root@osedev1 backups]# ls -lah /home/b2user/sync
total 18M
drwxr-xr-x. 2 root   root   4.0K Oct 23 16:43 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    18M Oct 23 16:43 daily_osedev1_20191023_144309.tar.gpg
[root@osedev1 backups]# ls -lah /home/b2user/sync.old
total 17M
drwxr-xr-x. 2 root   root   4.0K Oct  3 07:24 .
drwx------. 8 b2user b2user 4.0K Oct 23 16:43 ..
-rw-r--r--. 1 b2user root    17M Oct  3 07:24 daily_osedev1_20191003_052448.tar.gpg
[root@osedev1 backups]# 
  1. the cron job looks good
[root@osedev1 backups]# cat /etc/cron.d/backup_to_backblaze
20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh
[root@osedev1 backups]# 
  1. but the logging dir doesn't exist; I created it
[root@osedev1 backups]# ls -lah /var/log/backups
ls: cannot access /var/log/backups: No such file or directory
[root@osedev1 backups]# mkdir /var/log/backups
[root@osedev1 backups]# 
  1. actually, after some time, the b2 wui now shows the files I just uploaded, totaling 36.9M. Wasn't the dev server in a broken state recently? That's probably what happened..
  2. well, I'll follow-up in a few days. Hopefully it'll be stable for ~10 days through the monthly backup on 2019-11-01, which will have a 1-year retention time.
  3. ..
  4. ok, back to the sync. First, I fixed the hostname of the staging node so I don't do the sync the wrong way (!)
[root@opensourceecology ~]# vim /etc/hostname
[root@opensourceecology ~]# cat /etc/hostname
osestaging1
[root@opensourceecology ~]# 
[root@opensourceecology ~]# hostname osestaging1
[root@opensourceecology ~]# exit
logout
[maltfield@osestaging1 ~]$ 
  1. oh, shit, weird. I went to ssh into the prod server using `ssh opensourceecology.org`, but it ssh'd into staging because of the new dns changes. I fixed this by updating my .ssh/config file for the 'oseprod' Host line
user@ose:~$ head .ssh/config
# OSE
Host oseprod
	HostName 138.201.84.243
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osedev1
	HostName 195.201.233.113
user@ose:~$ 
user@ose:~$ ssh oseprod
Last login: Wed Oct 23 15:01:19 2019 from 116.75.124.97
[maltfield@opensourceecology ~]$ 
  1. so I think I should put this sync & sed process into a script that lives on prod. This was the last command I see executed in screen on prod
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. In order to automate this, I'll also need to give root an ssh key that lives on prod and can ssh into the staging node as some sync user with NOPASSWD sudo rights. Of course, I do *not* want *any* such config that permits someone to do this to our prod node, but granting prod access to staging in this way seems fair enough. If someone gains this locked-down key file from the prod server, we have bigger problems..
  2. I created a new script for this & locked it down
[root@opensourceecology bin]# date
Wed Oct 23 15:10:06 UTC 2019
[root@opensourceecology bin]# pwd
/root/bin
[root@opensourceecology bin]# ls -lah syncToStaging.sh 
-rwx------ 1 root root 469 Oct 23 15:09 syncToStaging.sh
[root@opensourceecology bin]# cat syncToStaging.sh 
#!/bin/bash
set -x
################################################################################
# Author:  Michael Altfield <michael at opensourceecology dot org>
# Created: 2019-10-23
# Updated: 2019-10-23
# Version: 0.1
# Purpose: Syncs 99% of the prod node state to staging & staging-ifys it
################################################################################

############
# SETTINGS #
############

########
# EXIT #
########

# clean exit
exit 0
[root@opensourceecology bin]# 
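A sketch of what the body of this stub might grow into, combining the hostname guard, the rsync command from above, and the nginx IP sed from later in this log. The 'stagingsync' user, the key path, and the idea of running the sed remotely are all assumptions for illustration, not the final script.

```shell
#!/bin/bash
# Sketch only: sync prod to staging, then staging-ify the nginx config.
# Assumptions: a 'stagingsync' user on staging with NOPASSWD sudo, and
# the /root/.ssh/id_rsa.201910 key authorized for it.

sync_to_staging() {
  # push ~99% of the prod filesystem to the staging node over the VPN
  rsync -e 'ssh -p 32415 -i /root/.ssh/id_rsa.201910' --bwlimit=3000 \
    --numeric-ids --rsync-path="sudo rsync" \
    --exclude=/root --exclude=/run --exclude=/home/b2user/sync* \
    --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa \
    --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ \
    --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp \
    --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf \
    -a / stagingsync@10.241.189.11:/

  # staging-ify: rebind nginx from the two prod IPs to the staging VPN IP
  ssh -p 32415 -i /root/.ssh/id_rsa.201910 stagingsync@10.241.189.11 \
    "sudo sed -i 's/138.201.84.223\|138.201.84.243/10.241.189.11/g' \
       /etc/nginx/nginx.conf /etc/nginx/conf.d/*"
}

# never fire anywhere but the prod box
if [ "$(hostname)" = "opensourceecology.org" ]; then
  sync_to_staging
fi
```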
  1. There is an existing rsa key for the root user on our prod server, but it's only 2048-bits. I think this was used to auth to our dreamhost server for scp-ing backups back in the day. In any case, it's too small; I generated a new one. Note that this key should only be used for ssh-ing into the staging server as a non-root (on the staging server). It should *not* be used to ssh into the prod server. And, of course, we should *never* allow root to ssh into any server anywhere. Oh, and, the staging server is also not exposed on the Internet; it's only accessible behind the VPN..
[root@opensourceecology bin]# ssh-keygen -lf /root/.ssh/id_rsa.pub 
2048 SHA256:/LpjdDSJFVAt0a4d2PM3fWu7ci3VVwqQT0UxobZel2s root@CentOS-72-64-minimal (RSA)
[root@opensourceecology bin]# ssh-keygen -t rsa -b 4096 -o -a 100
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): /root/.ssh/id_rsa.201910
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.201910.
Your public key has been saved in /root/.ssh/id_rsa.201910.pub.
...
[root@opensourceecology bin]# ls -lah /root/.ssh/id_rsa.201910*
-rw------- 1 root root 3.4K Oct 23 15:27 /root/.ssh/id_rsa.201910
-rw-r--r-- 1 root root  752 Oct 23 15:27 /root/.ssh/id_rsa.201910.pub
[root@opensourceecology bin]# 
  1. now I need a non-root user (which will have to exist on both staging & production) that I'll [a] give NOPASSWD sudo access on the staging server only and [b] grant key-based ssh access on the staging server only
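Something like the following could set that up on the staging side; the user name 'stagingsync' and the pubkey path are assumptions, and the sudoers drop-in must obviously never exist on prod.

```shell
# Sketch (run on the staging node only): a 'stagingsync' user with key-only
# ssh plus passwordless sudo. The user name and pubkey path are assumptions.
setup_sync_user() {
  useradd --create-home stagingsync
  install -d -m 0700 -o stagingsync -g stagingsync /home/stagingsync/.ssh
  # authorize only prod's new root key (copied over out-of-band)
  install -m 0600 -o stagingsync -g stagingsync \
    /tmp/id_rsa.201910.pub /home/stagingsync/.ssh/authorized_keys
  # NOPASSWD sudo via a drop-in; this file must never exist on prod
  echo 'stagingsync ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/stagingsync
  chmod 0440 /etc/sudoers.d/stagingsync
  visudo -c    # sanity-check the sudoers syntax before logging out
}
# guard: only meaningful as root on the staging box
if [ "$(id -u)" = "0" ] && [ "$(hostname)" = "osestaging1" ]; then
  setup_sync_user
fi
```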

Mon Oct 21, 2019

  1. earlier this month a critical vulnerability was fixed in sudo 1.8.28 https://www.sudo.ws/alerts/minus_1_uid.html
  2. I configured this server to auto-update security-related updates, but I didn't see any changes to `sudo` since I've been away. I *did* see updates to nginx, but why didn't sudo update? Indeed, it's stuck at 1.8.19p2-11
[root@opensourceecology ~]# rpm -qa | grep -i sudo
sudo-1.8.19p2-11.el7_4.x86_64
[root@opensourceecology ~]# 
  1. fortunately the issue is an edge-case that doesn't affect us, specifically when the sudo config is setup to allow a defined user to run a defined command as any user except root https://access.redhat.com/security/cve/cve-2019-14287
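For the record, the CVE-2019-14287 edge case looks like this (the user and command in the rule are hypothetical):

```shell
# A sudoers rule like the following (hypothetical user & command) intends
# "bob may run vi as any user except root":
#
#   bob ALL=(ALL, !root) /usr/bin/vi
#
# On sudo < 1.8.28, bob can dodge the !root exclusion by requesting user
# id -1 (or its unsigned form, 4294967295); sudo passes -1 to setresuid(),
# which treats it as "no change", so the command keeps sudo's uid 0:
#
#   sudo -u#-1 /usr/bin/vi
#
# None of our sudoers entries use an (ALL, !root)-style Runas spec, so this
# doesn't apply to us.
```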
  2. the fucking redhat solution is to fix your config, not to update sudo. A check-update run shows there *is* a newer version of sudo available
[root@opensourceecology ~]# yum check-update sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: mirror.checkdomain.de
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.checkdomain.de
 * webtatic: uk.repo.webtatic.com

sudo.x86_64                          1.8.23-4.el7                           base
[root@opensourceecology ~]# 
  1. it looks like the '--changelog' arg to `rpm` only shows changes for what's installed, not prospective updates. So I updated
[root@opensourceecology ~]# yum install sudo
Loaded plugins: fastestmirror, replace
Loading mirror speeds from cached hostfile
 * base: mirror.checkdomain.de
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.checkdomain.de
 * webtatic: uk.repo.webtatic.com
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.19p2-11.el7_4 will be updated
---> Package sudo.x86_64 0:1.8.23-4.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch             Version                   Repository      Size
================================================================================
Updating:
 sudo           x86_64           1.8.23-4.el7              base           841 k

Transaction Summary
================================================================================
Upgrade  1 Package

Total download size: 841 k
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
sudo-1.8.23-4.el7.x86_64.rpm                               | 841 kB   00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : sudo-1.8.23-4.el7.x86_64                                     1/2
warning: /etc/sudoers created as /etc/sudoers.rpmnew
  Cleanup    : sudo-1.8.19p2-11.el7_4.x86_64                                2/2
  Verifying  : sudo-1.8.23-4.el7.x86_64                                     1/2
  Verifying  : sudo-1.8.19p2-11.el7_4.x86_64                                2/2

Updated:
  sudo.x86_64 0:1.8.23-4.el7

Complete!
[root@opensourceecology ~]# 
  1. apparently the update doesn't patch this bug. ugh, I'm losing faith in cent/rhel over debian..
[root@opensourceecology ~]# rpm -q --changelog sudo | head
* Wed Feb 20 2019 Radovan Sroka <rsroka@redhat.com> 1.8.23-4
- RHEL-7.7 erratum
  Resolves: rhbz#1672876 - Backporting sudo bug with expired passwords
  Resolves: rhbz#1665285 - Problem with sudo-1.8.23 and 'who am i'

* Mon Sep 24 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-3
- RHEL-7.6 erratum
  Resolves: rhbz#1547974 - Rebase sudo to latest stable upstream version

* Fri Sep 21 2018 Daniel Kopecek <dkopecek@redhat.com> 1.8.23-2
[root@opensourceecology ~]# 
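Since `rpm -q --changelog` only covers what's installed, a possible workaround for next time (untested here) is the yum-plugin-changelog package, which can print the changelog of the candidate update before installing it:

```shell
# yum-plugin-changelog adds a `yum changelog` subcommand; unlike
# `rpm -q --changelog`, it can read the changelog of an *available* update:
#
#   yum install yum-plugin-changelog
#   yum changelog 1 sudo    # newest changelog entry for the available sudo
```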
  1. well, that's all I can do for now on sudo
  2. regarding the package that *did* update, I got an email from ossec two days ago on Oct 20th about changed packages and the checksums of the binaries changing
OSSEC HIDS Notification.
2019 Oct 20 04:39:44

Received From: opensourceecology->/var/log/messages
Rule: 2932 fired (level 7) -> "New Yum package installed."
Portion of the log(s):

Oct 20 04:39:42 opensourceecology yum[29637]: Installed: nginx.x86_64 1:1.16.1-1.el7
  1. the changelog shows a sec update from 2 months ago. why so delayed?
[root@opensourceecology ~]# rpm -q --changelog nginx | head
* Sun Sep 15 2019 Warren Togami <warren@blockstream.com>
- add conditionals for EPEL7, see rhbz#1750857

* Tue Aug 13 2019 Jamie Nguyen <jamielinux@fedoraproject.org> - 1:1.16.1-1
- Update to upstream release 1.16.1
- Fixes CVE-2019-9511, CVE-2019-9513, CVE-2019-9516

* Thu Jul 25 2019 Fedora Release Engineering <releng@fedoraproject.org> - 1:1.16.0-5
- Rebuilt for https://fedoraproject.org/wiki/Fedora_31_Mass_Rebuild

[root@opensourceecology ~]# 
  1. the yum-cron package is responsible for updating security packages; it's kicked-off daily
[root@opensourceecology log]# ls -lah /etc/cron.daily/0yum-daily.cron 
-rwxr-xr-x 1 root root 332 Aug  5  2017 /etc/cron.daily/0yum-daily.cron
[root@opensourceecology log]# cat /etc/cron.daily/0yum-daily.cron 
#!/bin/bash

# Only run if this flag is set. The flag is created by the yum-cron init
# script when the service is started -- this allows one to use chkconfig and
# the standard "service stop|start" commands to enable or disable yum-cron.
if [ ! -f /var/lock/subsys/yum-cron ]; then
  exit 0
fi

# Action!
exec /usr/sbin/yum-cron
[root@opensourceecology log]# 
  1. the logs show that it was only updated on Oct 20
[root@opensourceecology log]# grep -ir nginx yum.log
May 26 06:30:47 Updated: nginx-filesystem.noarch 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-perl.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-mail.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-stream.x86_64 1:1.12.2-3.el7
May 26 06:30:47 Updated: nginx-mod-http-image-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-mod-http-geoip.x86_64 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx-all-modules.noarch 1:1.12.2-3.el7
May 26 06:30:48 Updated: nginx.x86_64 1:1.12.2-3.el7
Oct 20 04:39:42 Updated: nginx-filesystem.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-mail.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-image-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-stream.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-xslt-filter.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Installed: nginx.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-mod-http-perl.x86_64 1:1.16.1-1.el7
Oct 20 04:39:42 Updated: nginx-all-modules.noarch 1:1.16.1-1.el7
Oct 20 04:39:42 Erased: nginx-mod-http-geoip
[root@opensourceecology log]# 
  1. and the yum-cron config looks sane
[root@opensourceecology log]# head /etc/yum/yum-cron.conf 
[commands]
#  What kind of update to use:
# default                            = yum upgrade
# security                           = yum --security upgrade
# security-severity:Critical         = yum --sec-severity=Critical upgrade
# minimal                            = yum --bugfix update-minimal
# minimal-security                   = yum --security update-minimal
# minimal-security-severity:Critical =  --sec-severity=Critical update-minimal
update_cmd = minimal-security

[root@opensourceecology log]# 
  1. I still don't understand why it was delayed, but everything seems to be setup properly..
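One plausible explanation, worth verifying: `update_cmd = minimal-security` only acts on packages whose repos publish updateinfo.xml security metadata. The CentOS base/updates repos notoriously don't ship it, while EPEL does, which would explain why the EPEL-packaged nginx eventually auto-updated but the base-repo sudo never will. A quick check of how much security metadata yum can actually see:

```shell
# If the base repos ship no updateinfo.xml, these will list few or no
# security notices outside of EPEL, confirming why minimal-security
# has almost nothing to act on:
#
#   yum updateinfo summary
#   yum updateinfo list security all
```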
  2. ...
  3. anyway, returning to the dev/staging server setup; it looks like I can't VPN into our dev server anymore
user@ose:~/openvpn$ Tue Oct 22 21:24:32 2019 OpenVPN 2.4.0 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Oct 14 2018
Tue Oct 22 21:24:32 2019 library versions: OpenSSL 1.0.2t  10 Sep 2019, LZO 2.08
Enter Private Key Password: *
Tue Oct 22 21:24:35 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Tue Oct 22 21:24:35 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Tue Oct 22 21:24:35 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:24:35 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Tue Oct 22 21:24:35 2019 UDP link local: (not bound)
Tue Oct 22 21:24:35 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Tue Oct 22 21:25:35 2019 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Tue Oct 22 21:25:35 2019 TLS Error: TLS handshake failed
Tue Oct 22 21:25:35 2019 SIGUSR1[soft,tls-error] received, process restarting
Tue Oct 22 21:25:35 2019 Restart pause, 5 second(s)
  1. And I can't ping the server either
user@ose:~$ ping 195.201.233.113
PING 195.201.233.113 (195.201.233.113) 56(84) bytes of data.
^C
--- 195.201.233.113 ping statistics ---
104 packets transmitted, 0 received, 100% packet loss, time 105449ms

user@ose:~$
  1. and ssh fails
user@ose:~$ ssh -vvvv osedev1
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t  10 Sep 2019
debug1: Reading configuration data /home/user/.ssh/config
debug1: /home/user/.ssh/config line 8: Applying options for osedev1
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "195.201.233.113" port 32415
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 195.201.233.113 [195.201.233.113] port 32415.
debug1: connect to address 195.201.233.113 port 32415: Connection timed out
ssh: connect to host 195.201.233.113 port 32415: Connection timed out
user@ose:~$ 
  1. logging into the hetzner cloud console shows that the box is online and sitting on the login screen. I tried to login, but after typing the username it freezes. Now my dev node is acting like my damn staging node was.
  2. I gave the dev server a reboot
  3. after a few minutes, I could ssh-in.
  4. and I could VPN-in as well.
  5. now when I start the staging container, I still get timeout issues
opensourceecology login: maltfield
Password:
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. it's worth noting that systemd-journal is chewing up >90% of the CPU on the host osedev1 server
  2. I added these 2x lines to the lxc container's config file per https://serverfault.com/questions/658052/systemd-journal-in-debian-jessie-lxc-container-eats-100-cpu
lxc.autodev = 1
lxc.kmsg = 0
  1. I stopped the container & started it again; this time systemd on the host was <10% CPU usage, and I was able to login without any delay!
  2. I became root too, and that worked great!
  3. I had issues with ssh-ing in from my laptop, but after I disconnected from the VPN and reconnected, I was able to ssh into osestaging1 from my laptop!
  4. this time I was also able to become root and poke around at our new, shiny production clone server, cool!
user@ose:~/openvpn$ ssh -vvvvv osestaging1
...
Last login: Tue Oct 22 16:08:03 2019
[maltfield@opensourceecology ~]$ sudo su -
Last login: Tue Oct 22 16:09:19 UTC 2019 on lxc/console
[root@opensourceecology ~]# ls -lah /var/www/html/ | head
total 100K
drwxr-xr-x. 25 root       root   4.0K Apr  9  2019 .
drwxr-xr-x.  5 root       root   4.0K Aug 23  2017 ..
d---r-x---.  3 not-apache apache 4.0K Aug  8  2018 3dp.opensourceecology.org
drwxr-xr-x.  3 root       root   4.0K Dec 24  2017 awstats.openbuildinginstitute.org
drwxr-xr-x.  3 root       root   4.0K Feb  9  2018 awstats.opensourceecology.org
drwxr-xr-x.  2 root       root   4.0K Mar  2  2018 cacti.opensourceecology.org.old
drwxr-xr-x.  3 apache     apache 4.0K Feb  9  2018 certbot
d---r-x---.  3 not-apache apache 4.0K Aug  7  2018 d3d.opensourceecology.org
d---r-x---.  3 not-apache apache 4.0K Apr  9  2019 fef.opensourceecology.org
[root@opensourceecology ~]# 
  1. ss shows that varnish & apache are listening
[root@opensourceecology ~]# ss -plan | grep -i LISTEN
u_str  LISTEN     0      100    private/proxymap 183064                * 0                   users:(("master",pid=782,fd=49))
u_str  LISTEN     0      100    public/pickup 183032                * 0                   users:(("pickup",pid=791,fd=6),("master",pid=782,fd=17))
u_str  LISTEN     0      100    public/cleanup 183036                * 0                   users:(("master",pid=782,fd=21))
u_str  LISTEN     0      100    public/qmgr 183039                * 0                   users:(("qmgr",pid=792,fd=6),("master",pid=782,fd=24))
u_str  LISTEN     0      100    private/tlsmgr 183043                * 0                   users:(("master",pid=782,fd=28))
u_str  LISTEN     0      100    private/rewrite 183046                * 0                   users:(("master",pid=782,fd=31))
u_str  LISTEN     0      100    private/bounce 183049                * 0                   users:(("master",pid=782,fd=34))
u_str  LISTEN     0      100    private/defer 183052                * 0                   users:(("master",pid=782,fd=37))
u_str  LISTEN     0      100    private/trace 183055                * 0                   users:(("master",pid=782,fd=40))
u_str  LISTEN     0      128    /run/systemd/private 174128                * 0                   users:(("systemd",pid=1,fd=12))
u_str  LISTEN     0      128    /run/lvm/lvmpolld.socket 174135                * 0                   users:(("systemd",pid=1,fd=20))
u_str  LISTEN     0      128    /run/lvm/lvmetad.socket 174138                * 0                   users:(("lvmetad",pid=24,fd=3),("systemd",pid=1,fd=21))
u_str  LISTEN     0      128    /run/systemd/journal/stdout 174140                * 0                   users:(("systemd-journal",pid=18,fd=3),("systemd",pid=1,fd=22))
u_str  LISTEN     0      100    private/verify 183058                * 0                   users:(("master",pid=782,fd=43))
u_str  LISTEN     0      128    /tmp/ssh-bd3GlfYKNm/agent.1751 223092                * 0                   users:(("sshd",pid=1751,fd=9))
u_str  LISTEN     0      100    private/retry 183082                * 0                   users:(("master",pid=782,fd=67))
u_str  LISTEN     0      50     /var/lib/mysql/mysql.sock 187559                * 0                   users:(("mysqld",pid=1011,fd=14))
u_str  LISTEN     0      100    private/discard 183085                * 0                   users:(("master",pid=782,fd=70))
u_str  LISTEN     0      100    public/flush 183061                * 0                   users:(("master",pid=782,fd=46))
u_str  LISTEN     0      100    private/local 183088                * 0                   users:(("master",pid=782,fd=73))
u_str  LISTEN     0      100    private/virtual 183091                * 0                   users:(("master",pid=782,fd=76))
u_str  LISTEN     0      100    private/lmtp 183094                * 0                   users:(("master",pid=782,fd=79))
u_str  LISTEN     0      100    private/anvil 183097                * 0                   users:(("master",pid=782,fd=82))
u_str  LISTEN     0      100    private/scache 183100                * 0                   users:(("master",pid=782,fd=85))
u_str  LISTEN     0      100    private/proxywrite 183067                * 0                   users:(("master",pid=782,fd=52))
u_str  LISTEN     0      100    private/smtp 183070                * 0                   users:(("master",pid=782,fd=55))
u_str  LISTEN     0      100    private/relay 183073                * 0                   users:(("master",pid=782,fd=58))
u_str  LISTEN     0      100    public/showq 183076                * 0                   users:(("master",pid=782,fd=61))
u_str  LISTEN     0      100    private/error 183079                * 0                   users:(("master",pid=782,fd=64))
u_str  LISTEN     0      10     /var/run/acpid.socket 176097                * 0                   users:(("acpid",pid=48,fd=5))
u_str  LISTEN     0      128    /var/run/dbus/system_bus_socket 175844                * 0                   users:(("dbus-daemon",pid=51,fd=3),("systemd",pid=1,fd=31))
tcp    LISTEN     0      128    127.0.0.1:8000                  *:*                   users:(("httpd",pid=520,fd=3),("httpd",pid=519,fd=3),("httpd",pid=518,fd=3),("httpd",pid=517,fd=3),("httpd",pid=516,fd=3),("httpd",pid=314,fd=3))
tcp    LISTEN     0      128    127.0.0.1:6081                  *:*                   users:(("varnishd",pid=1165,fd=6))
tcp    LISTEN     0      10     127.0.0.1:6082                  *:*                   users:(("varnishd",pid=1109,fd=5))
tcp    LISTEN     0      128    127.0.0.1:8010                  *:*                   users:(("httpd",pid=520,fd=4),("httpd",pid=519,fd=4),("httpd",pid=518,fd=4),("httpd",pid=517,fd=4),("httpd",pid=516,fd=4),("httpd",pid=314,fd=4))
tcp    LISTEN     0      128       *:10000                 *:*                   users:(("miniserv.pl",pid=533,fd=5))
tcp    LISTEN     0      100    127.0.0.1:25                    *:*                   users:(("master",pid=782,fd=13))
tcp    LISTEN     0      128       *:32415                 *:*                   users:(("sshd",pid=326,fd=3))
tcp    LISTEN     0      128      :::4949                 :::*                   users:(("munin-node",pid=379,fd=5))
tcp    LISTEN     0      128      :::32415                :::*                   users:(("sshd",pid=326,fd=4))
[root@opensourceecology ~]# 
  1. as expected, nginx is failing because it can't bind to the hardcoded external ip addresses that don't exist on this distinct server; we'll have to sed this later
[root@opensourceecology ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology ~]# 
  1. note that the hostname above is an exact match of the production server. This is confusing for my logs and creates a risk of running commands on the wrong server. If possible, I should try to sed this back to 'osestaging1' or exclude the relevant configs from the rsync as well
  2. so it looks like apache is listening to 127.0.0.1:8000 for name-based-vhosts, except certbot which listens on 127.0.0.1:8010
[root@opensourceecology conf.d]# grep VirtualHost *
000-www.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
000-www.opensourceecology.org.conf:</VirtualHost>
00-fef.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-fef.opensourceecology.org.conf:</VirtualHost>
00-forum.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-forum.opensourceecology.org.conf:</VirtualHost>
00-microfactory.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-microfactory.opensourceecology.org.conf:</VirtualHost>
00-oswh.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-oswh.opensourceecology.org.conf:</VirtualHost>
00-phplist.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-phplist.opensourceecology.org.conf:</VirtualHost>
00-seedhome.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-seedhome.openbuildinginstitute.org.conf:</VirtualHost>
00-store.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-store.opensourceecology.org.conf:</VirtualHost>
00-wiki.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
00-wiki.opensourceecology.org.conf:</VirtualHost>
00-www.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
00-www.openbuildinginstitute.org.conf:</VirtualHost>
awstats.openbuildinginstitute.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.openbuildinginstitute.org.conf:</VirtualHost>
awstats.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
awstats.opensourceecology.org.conf:</VirtualHost>
certbot.conf:<VirtualHost 127.0.0.1:8010>
certbot.conf:</VirtualHost>
munin.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
munin.opensourceecology.org.conf:</VirtualHost>
ssl.conf.disabled:<VirtualHost _default_:443>
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:# moved outside VirtualHost block (see below)
ssl.conf.disabled:</VirtualHost>
ssl.conf.orig:<VirtualHost _default_:443>
ssl.conf.orig:</VirtualHost>                                  
ssl.openbuildinginstitute.org:# Purpose: To be included inside the <VirtualHost> block for all
ssl.opensourceecology.org:# Purpose: To be included inside the <VirtualHost> block for all
staging.openbuildinginstitute.org.conf.bak:<VirtualHost staging.openbuildinginstitute.org:8000>
staging.openbuildinginstitute.org.conf.bak:</VirtualHost>
staging.opensourceecology.org.conf:<VirtualHost 127.0.0.1:8000>
staging.opensourceecology.org.conf:</VirtualHost>
varnishTest.conf.disabled:<VirtualHost 127.0.0.1:8000>
varnishTest.conf.disabled:</VirtualHost>
[root@opensourceecology conf.d]# 
  1. unfortunately I get 403 forbiddens for both with curl
[root@opensourceecology conf.d]# curl 127.0.0.1:8000/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]# curl 127.0.0.1:8010/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
[root@opensourceecology conf.d]# 
  1. tailing the logs shows modsec blocking us from the fef vhost because we specified the URI as an IP address. Well, ok.
==> fef.opensourceecology.org/error_log <==
[Tue Oct 22 16:20:34.573535 2019] [:error] [pid 518] [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Pattern match "^[\\\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"] [hostname "127.0.0.1"] [uri "/"] [unique_id "Xa8sUlvJZ8GVfznr1gxo6AAAAAI"]

==> modsec_audit.log <==
--cbc91b75-A--
[22/Oct/2019:16:20:34 +0000] Xa8sUlvJZ8GVfznr1gxo6AAAAAI 127.0.0.1 33594 127.0.0.1 8000
--cbc91b75-B--
GET / HTTP/1.1
User-Agent: curl/7.29.0
Host: 127.0.0.1:8000
Accept: */*

--cbc91b75-F--
HTTP/1.1 403 Forbidden
Content-Length: 202
Content-Type: text/html; charset=iso-8859-1

--cbc91b75-E--

--cbc91b75-H--
Message: Access denied with code 403 (phase 2). Pattern match "^[\\d.:]+$" at REQUEST_HEADERS:Host. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "98"] [id "960017"] [rev "2"] [msg "Host header is a numeric IP address"] [data "127.0.0.1:8000"] [severity "WARNING"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "9"] [tag "OWASP_CRS/PROTOCOL_VIOLATION/IP_HOST"] [tag "WASCTC/WASC-21"] [tag "OWASP_TOP_10/A7"] [tag "PCI/6.5.10"] [tag "http://technet.microsoft.com/en-us/magazine/2005.01.hackerbasher.aspx"]
Action: Intercepted (phase 2)
Stopwatch: 1571761234559472 14421 (- - -)
Stopwatch2: 1571761234559472 14421; combined=4661, p1=4516, p2=113, p3=0, p4=0, p5=32, sr=3976, sw=0, l=0, gc=0
Response-Body-Transformed: Dechunked
Producer: ModSecurity for Apache/2.7.3 (http://www.modsecurity.org/); OWASP_CRS/2.2.9.
Server: Apache
Engine-Mode: "ENABLED"

--cbc91b75-Z--


==> fef.opensourceecology.org/access_log <==
127.0.0.1 - - [22/Oct/2019:16:20:34 +0000] "GET / HTTP/1.1" 403 202 "-" "curl/7.29.0"
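Modsec rule 960017 only fires here because curl derives the Host header from the numeric URI; sending a real vhost name should work around it when testing apache directly (sketch, untested here):

```shell
# Request a name-based vhost without tripping modsec's
# "Host header is a numeric IP address" rule:
#
#   curl -H 'Host: fef.opensourceecology.org' http://127.0.0.1:8000/
```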
  1. attempting 8000 does a redirect that strips the port; attempting 8010 works! The latter is just an empty docroot that gets populated by `certbot` for renewing certs on some complicated non-public vhost sites
[root@opensourceecology ~]# curl -i http://localhost:8000/
HTTP/1.1 301 Moved Permanently
Date: Tue, 22 Oct 2019 16:22:24 GMT
Server: Apache
X-VC-Enabled: true
X-VC-TTL: 86400
Location: http://localhost/
X-XSS-Protection: 1; mode=block
Content-Length: 0
Content-Type: text/html; charset=UTF-8

[root@opensourceecology ~]# curl -i http://localhost:8010/
HTTP/1.1 200 OK
Date: Tue, 22 Oct 2019 16:23:43 GMT
Server: Apache
Last-Modified: Fri, 09 Feb 2018 20:56:47 GMT
Accept-Ranges: bytes
Content-Length: 18
X-XSS-Protection: 1; mode=block
Content-Type: text/html; charset=UTF-8

can you see this?
[root@opensourceecology ~]# 
  1. this is going to be a pain; let's see if I can get nginx working; we have to fix '138.201.84.223'
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.223:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]# 
[root@opensourceecology nginx]# grep -irl '138.201.84.223' *
conf.d/www.openbuildinginstitute.org.conf
conf.d/wiki.opensourceecology.org.conf
conf.d/seedhome.openbuildinginstitute.org.conf
conf.d/www.opensourceecology.org.conf
conf.d/awstats.openbuildinginstitute.org.conf
nginx.conf
[root@opensourceecology nginx]# 
  1. I replaced the first IP for OBI with our VPN IP
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.223/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]# 
  1. And then I replaced the second IP for OSE with our VPN IP as well
[root@opensourceecology nginx]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to 138.201.84.243:4443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' nginx.conf
[root@opensourceecology nginx]# sed -i 's/138.201.84.243/10.241.189.11/g' conf.d/*
[root@opensourceecology nginx]#
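  1. a quick sanity-check that the seds caught everything; sketched here against a sample line (a real check would grep recursively under /etc/nginx):

```shell
# after the in-place seds, no config line should still reference the old
# public IPs; grep exits non-zero when nothing matches, so the || fires
sample='listen 10.241.189.11:4443 ssl http2;'
echo "$sample" | grep -E '138\.201\.84\.(223|243)' || echo "no stale IPs"
```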
  1. well now there is a duplicate line to listen on this same IP; I removed that from nginx.conf
  2. And now I'm having issues with a duplicate default_server line. Oh, right, now that OBI and OSE share the same IP I'll make OSE the default server and remove it from OBI
[root@opensourceecology conf.d]# nginx -t
nginx: [emerg] a duplicate default server for 10.241.189.11:443 in /etc/nginx/conf.d/www.opensourceecology.org.conf:58
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl 'default_server' *
www.openbuildinginstitute.org.conf
www.opensourceecology.org.conf
[root@opensourceecology conf.d]# vim www.openbuildinginstitute.org.conf 
  1. Aaand now it's failing on the same issue but for the IPv6 addresses. I'm just going to comment those out entirely for the staging server
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] bind() to [2a01:4f8:172:209e::2]:443 failed (99: Cannot assign requested address)
nginx: configuration file /etc/nginx/nginx.conf test failed
[root@opensourceecology conf.d]# grep -irl '2a01:4f8:172:209e::2' *
awstats.opensourceecology.org.conf
fef.opensourceecology.org.conf
forum.opensourceecology.org.conf
microfactory.opensourceecology.org
munin.opensourceecology.org.conf
oswh.opensourceecology.org.conf
store.opensourceecology.org.conf
wiki.opensourceecology.org.conf
www.opensourceecology.org.conf
  1. This last sed fixed it!
[root@opensourceecology conf.d]# sed -i 's^\(\s*\)[^#]*listen \[2a01:4f8:172:209e::2\(.*\)^\1#listen \[2a01:4f8:172:209e::2\2^' *
[root@opensourceecology conf.d]# nginx -t
nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@opensourceecology conf.d]# 
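  1. that worked because sed's `s` command accepts almost any delimiter; a simpler equivalent (a sketch on a hypothetical config line, using ',' as the delimiter so nothing in the address needs escaping) shows the comment-out trick while preserving indentation:

```shell
# comment out a listen line for a given IPv6 address, keeping its leading
# whitespace; ',' as the s-command delimiter avoids escaping headaches
line='    listen [2a01:4f8:172:209e::2]:443 ssl;'
echo "$line" | sed 's,^\(\s*\)listen \[2a01:4f8:172:209e::2,\1#listen [2a01:4f8:172:209e::2,'
# →     #listen [2a01:4f8:172:209e::2]:443 ssl;
```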
  1. I added these lines to /etc/hosts to make a new domain 'staging.www.opensourceecology.org' point back to the local machine; it works!
[root@opensourceecology conf.d]# tail /etc/hosts
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
2a01:4f8:172:209e::2 hetzner2.opensourceecology.org hetzner2

# staging
127.0.0.1 staging.www.opensourceecology.org
[root@opensourceecology conf.d]# 
[root@opensourceecology conf.d]# curl -si https://staging.opensourceecology.org | tail
var mo_theme = {"name_required":"Please provide your name","name_format":"Your name must consist of at least 5 characters","email_required":"Please provide a valid email address","url_required":"Please provide a valid URL","phone_required":"Minimum 5 characters required","human_check_failed":"The input the correct value for the equation above","message_required":"Please input the message","message_format":"Your message must be at least 15 characters long","success_message":"Your message has been sent. Thanks!","blog_url":"https:\/\/staging.opensourceecology.org","loading_portfolio":"Loading the next set of posts...","finished_loading":"No more items to load..."};
/* ]]> */
</script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-content/themes/enigmatic/js/main.js?ver=1.6'></script>
<script type='text/javascript' src='https://staging.opensourceecology.org/wp-includes/js/wp-embed.min.js?ver=4.9.4'></script>

</body>
</html>


[root@opensourceecology conf.d]# 
  1. I then added lines for 'staging.www.opensourceecology.org' and 'www.opensourceecology.org' to point to my staging server's VPN IP address on my laptop and fired up firefox; I was successfully able to access the staging site through the nginx -> varnish -> httpd stack!
10.241.189.11 www.opensourceecology.org
10.241.189.11 staging.www.opensourceecology.org
  1. note that, of course, I get a cert error when attempting to access 'staging.www.opensourceecology.org', but it loads fine when hitting 'www.opensourceecology.org'. I'll have to think more about how I want to fix this. If one is on the VPN, should they be automatically forced to using the staging site? That seems like it could create confusion, but if the names are *not* the same, then I'm sure lots of errors will be encountered with links and such; so perhaps that *is* the most logical thing to do...
  2. oh fuck. now, somehow, I am getting emails from OSSEC on the staging server. I'll have to fix that too. For now I just stopped the ossec service on the staging server

=Tue Oct 08, 2019=

  1. continuing from yesterday, I checked-up on the rsync running from prod to staging, and it appears to have stalled
	75497472 100%    2.90MB/s    0:00:24 (xfer#4297, to-check=1538/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal   
	75497472 100%    2.80MB/s    0:00:25 (xfer#4298, to-check=1537/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal   
	23429120  31%    2.91MB/s    0:00:17
  1. it's probably not a good idea to sync the /run dir...
  2. attempting to ssh into the server fails
user@ose:~/openvpn$ ssh osestaging1
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
Permission denied (publickey).
user@ose:~/openvpn$ 
  1. I _can_ get into the staging server from the lxc-console on the dev server, and nothing looks wrong with my user's setup
[root@osestaging1 ~]# grep maltfield /etc/passwd
maltfield:x:1005:1005::/home/maltfield:/bin/bash
[root@osestaging1 ~]# grep maltfield /etc/shadow
maltfield:TRUNCATED
[root@osestaging1 ~]# grep maltfield /etc/group
wheel:x:10:maltfield,crupp,tgriffing,root
apache:x:48:cmota,crupp,maltfield,wp,apache,marcin
maltfield:x:1005:apache
sshaccess:x:1006:cmota,marcin,tgriffing,maltfield,lberezhny,crupp
keepass:x:993:maltfield,marcin,cmota,crupp
apache-admins:x:1012:cmota,maltfield,marcin,crupp,tgriffing,wp,apache
[root@osestaging1 ~]# ls -lah /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 tgriffing maltfield 4.0K Jan 19  2018 .
drwx------. 10 tgriffing maltfield 4.0K Oct  3 07:06 ..
-rw-r--r--.  1 root      root       750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 tgriffing tgriffing 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# cat /home/maltfield/.ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== guttersnipe@guttersnipe

[root@osestaging1 ~]# 
  1. ssh appears to be running too
[root@osestaging1 ~]# systemctl list-units | grep -i ssh
sshd.service                      loaded active running   OpenSSH server daemon
[root@osestaging1 ~]# ss -plan | grep -i ssh
u_str  ESTAB      0      0         * 32621                 * 32622               users:(("sshd",pid=350,fd=5))
u_dgr  UNCONN     0      0         * 32618                 * 29344               users:(("sshd",pid=350,fd=4),("sshd",pid=348,fd=4))
u_str  ESTAB      0      0         * 31143                 * 0                   users:(("sshd",pid=274,fd=2),("sshd",pid=274,fd=1))
u_str  ESTAB      0      0         * 32622                 * 32621               users:(("sshd",pid=348,fd=7))
tcp    LISTEN     0      128       *:32415                 *:*                   users:(("sshd",pid=274,fd=3))
tcp    ESTAB      0      0      10.241.189.11:32415              10.241.189.10:41270               users:(("sshd",pid=350,fd=3),("sshd",pid=348,fd=3))
tcp    LISTEN     0      128    [::]:32415              [::]:*                   users:(("sshd",pid=274,fd=4))
[root@osestaging1 ~]# 
  1. the ssh server logs say that the client just disconnects
Oct  8 05:57:01 localhost sshd[3586]: Connection closed by 10.241.189.10 port 41334 [preauth]
  1. the ssh client says that the server rejected our public key
user@ose:~/openvpn$ ssh -vvv osestaging1
...
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/user/.ssh/id_rsa.ose
debug3: send_pubkey_test
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey).
user@ose:~/openvpn$ 
  1. I did notice that the ownership of the relevant /home/.ssh/authorized_keys file differs on the prod & staging servers
[maltfield@opensourceecology ~]$ ls -lahn /home/maltfield/.ssh
total 16K
drwxr-xr-x  2 1005 1005 4.0K Jan 19  2018 .
drwx------ 10 1005 1005 4.0K Oct  3 07:06 ..
-rw-r--r--  1    0    0  750 Jun 20  2017 authorized_keys
-rw-r--r--  1 1005 1005 1.1K Oct  3 13:44 known_hosts
[maltfield@opensourceecology ~]$ 
[root@osestaging1 ~]# ls -lahn /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 1000 1005 4.0K Jan 19  2018 .
drwx------. 10 1000 1005 4.0K Oct  3 07:06 ..
-rw-r--r--.  1    0    0  750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 1000 1000 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# 
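  1. this matters because sshd's StrictModes (on by default) rejects public keys when the home dir, ~/.ssh, or authorized_keys is owned by another non-root user; a sketch of auditing exactly what it inspects, exercised on a throwaway dir (on the real box the arguments would be /home/maltfield and its children):

```shell
# print owner + path for the three things StrictModes inspects; all three
# should be owned by the login user (or root). A temp dir stands in for
# /home/maltfield here.
home=$(mktemp -d)
mkdir -p "$home/.ssh"
touch "$home/.ssh/authorized_keys"
stat -c '%U %n' "$home" "$home/.ssh" "$home/.ssh/authorized_keys"
rm -rf "$home"
```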
  1. while the passwd, group, and shadow files all match
[root@opensourceecology ~]# md5sum /etc/passwd
cabf495ca12f7f32605eb764dd12c861  /etc/passwd
[root@opensourceecology ~]# md5sum /etc/group
04a70553d59a646406ecb89f2f7b17b5  /etc/group
[root@opensourceecology ~]# md5sum /etc/shadow
6f27deaf639ae2db1a1d94739a8bb834  /etc/shadow
[root@opensourceecology ~]# 
[root@osestaging1 ~]# md5sum /etc/passwd
cabf495ca12f7f32605eb764dd12c861  /etc/passwd
[root@osestaging1 ~]# md5sum /etc/group
04a70553d59a646406ecb89f2f7b17b5  /etc/group
[root@osestaging1 ~]# md5sum /etc/shadow
6f27deaf639ae2db1a1d94739a8bb834  /etc/shadow
[root@osestaging1 ~]# 
  1. for some reason my '/home/maltfield' dir was also owned by 'tgriffing'; sshd's default StrictModes check rejects public keys when the home dir is owned by the wrong user, which explains the silent 'Permission denied'. I was able to ssh-in again after fixing this
[root@osestaging1 ~]# chown -R maltfield:maltfield /home/maltfield/
[root@osestaging1 ~]# ls -lah /home
total 52K
drwxr-xr-x. 13 root       root       4.0K Jul 28  2018 .
dr-xr-xr-x. 20 root       root       4.0K Oct  7 10:05 ..
drwx------.  7 b2user     b2user     4.0K Oct  7 07:46 b2user
drwx------.  5 cmota      cmota      4.0K Jul 14  2017 cmota
drwx------.  5 crupp      crupp      4.0K Aug 12  2017 crupp
drwx------.  2 Flipo      Flipo      4.0K Sep 20  2016 Flipo
drwx------.  2 hart       hart       4.0K Mar 30  2017 hart
drwx------.  3 lberezhny  lberezhny  4.0K Jul 20  2017 lberezhny
drwx------. 10 maltfield  maltfield  4.0K Oct  3 07:06 maltfield
drwx------.  4 marcin     marcin     4.0K Jul  6  2017 marcin
drwx------.  2 not-apache not-apache 4.0K Feb 12  2018 not-apache
drwx------.  5 tgriffing  tgriffing  4.0K Aug  1 09:19 tgriffing
drwx------.  5 wp         wp         4.0K Oct  7  2017 wp
[root@osestaging1 ~]# 
  1. I re-opened the screen for the rsync, and found that it had exited
	75497472 100%    2.90MB/s    0:00:24 (xfer#4297, to-check=1538/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000d4a57-00058e887df34962.journal   
	75497472 100%    2.80MB/s    0:00:25 (xfer#4298, to-check=1537/7463)
run/log/journal/34a04596e14a410d9f2f816d507c55ab/system@fb40211581a0421d8abbe026c6a270ac-00000000000e7f5a-00058ec8f2c8422b.journal   
	23429120  31%    2.91MB/s    0:00:17





packet_write_wait: Connection to 10.241.189.11 port 32415: Broken pipe

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (119371 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]

real    1059m42.282s
user    12m34.775s
sys     3m5.253s
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$
[maltfield@opensourceecology ~]$ time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. I updated the rsync command to exclude /run, and I kicked-off the rsync again
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. ah, ffs! my internet connection here failed me: I was silently disconnected from my ssh session with the prod node and dumped into a local shell. So I ended-up kicking off this rsync not from the prod node, but from my personal laptop. By the time I realized it, the fucking staging server was broken!
  2. fucking hell, I had successfully copied 35G overnight; now I have to restore from snapshot and start over.
  3. I prepended a fucking hostname check to make sure this stupid shit doesn't happen again
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
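  1. the shape of that guard, as a standalone sketch (`uname -n` stands in for `hostname` here; the `[ ... ] &&` in the real command short-circuits the same way this if/else does):

```shell
# refuse to run unless we're really on prod; when the hostname test fails,
# the guarded command never starts
expected='opensourceecology.org'
if [ "$(uname -n)" = "$expected" ]; then
  echo "on prod: safe to run the rsync"
else
  echo "refusing: this host is $(uname -n), not $expected"
fi
```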
  1. I had a bunch of issues restoring from snapshot; eventually I just did an rsync of the '/var/lib/lxcsnaps/osestaging1/snap1' dir to '/var/lib/lxc/osestaging1', and I was finally able to `lxc-start -n osestaging1`
  2. I did the `visudo` and installed rsync, then re-initiated the rsync from prod to staging using the above command. I noticed that I forgot to exclude the backups; here's what I should use next time
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. while that ran, I checked our munin graphs. I nice'd & bwlimit'd the above rsync, but it's still good to check.
    1. there's a spike in varnish requests, which is a bit odd
    2. there was a shift in memory usage, but no issues there
    3. load spiked to ~2, but our box has 8; no problems
    4. there was a spike in 'nice' to ~100% cpu usage; cool
    5. firewall throughput, eth0 traffic spiked to about the same level as our backups. excellent
    6. there's a huge spike in disk usage read, disk IO that's much higher than backups; hmm
  2. I also noted that the apache graphs that I added some time ago are blank; I probably have to set up an apache stats vhost for munin to scrape
  3. munin processing graphs are also blank; hmm
  4. all mysql graphs are also blank
  5. even nginx graphs are all blank
  6. I also added plugins for monitoring the 'mysqld' process and the memory of a bunch of processes
[root@opensourceecology plugins]# ls
apache_access       if_err_eth0        mysql_slowqueries   uptime                       varnish_memory_usage.bak
apache_processes    if_eth0            mysql_threads       users                        varnish_objects
apache_volume       interrupts         nginx_request       varnish4_                    varnish_objects.bak
cpu                 irqstats           nginx_status        varnish_backend_traffic      varnish_request_rate
df                  load               open_files          varnish_backend_traffic.bak  varnish_request_rate.bak
df_inode            memory             open_inodes         varnish_bad                  varnish_threads
diskstats           munin_stats        postfix_mailqueue   varnish_bad.bak              varnish_threads.bak
entropy             mysql_             postfix_mailvolume  varnish_expunge              varnish_transfer_rates
forks               mysql_bytes        processes           varnish_expunge.bak          varnish_transfer_rates.bak
fw_conntrack        mysql_innodb       proc_pri            varnish_hit_rate             varnish_uptime
fw_forwarded_local  mysql_isam_space_  swap                varnish_hit_rate.bak         varnish_uptime.bak
fw_packets          mysql_queries      threads             varnish_memory_usage         vmstat
[root@opensourceecology plugins]# ls -lah | head -n 5
total 36K
drwxr-xr-x 2 root root 4.0K Sep  7 07:37 .
drwxr-xr-x 8 root root 4.0K Jun 24 16:05 ..
lrwxrwxrwx 1 root root   38 Sep  7 07:36 apache_access -> /usr/share/munin/plugins/apache_access
lrwxrwxrwx 1 root root   41 Sep  7 07:36 apache_processes -> /usr/share/munin/plugins/apache_processes
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multip
multiping       multips         multips_memory  
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/multips_memory
[root@opensourceecology plugins]# ln -s /usr/share/munin/plugins/ps_ ps_mysqld
[root@opensourceecology plugins]# 
  1. for the munin mysql graphs, it looks like I need to grant access for the 'munin' user
[root@opensourceecology plugin-conf.d]# munin-run --debug mysql_queries
# Processing plugin configuration from /etc/munin/plugin-conf.d/amavis
# Processing plugin configuration from /etc/munin/plugin-conf.d/df
# Processing plugin configuration from /etc/munin/plugin-conf.d/fw_
# Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl
# Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node
# Processing plugin configuration from /etc/munin/plugin-conf.d/postfix
# Processing plugin configuration from /etc/munin/plugin-conf.d/postgres
# Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail
# Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose
# Setting /rgid/ruid/ to /99/99/
# Setting /egid/euid/ to /99 99/99/
# Setting up environment
# Environment mysqlopts = -u munin
# About to run '/etc/munin/plugins/mysql_queries'
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'munin'@'localhost' (using password: NO)'
[root@opensourceecology plugin-conf.d]# 
  1. woah, this guide suggests that there are a ton more graphs available than just what's symlink-able by default https://blog.penumbra.be/2010/04/monitoring-mysql-munin-directadmin/
[root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Sep  7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_
lrwxrwxrwx 1 root root 36 Sep  7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes
lrwxrwxrwx 1 root root 37 Sep  7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# ls -lah /usr/share/munin/plugins/mysql_*
-rwxr-xr-x 1 root root  33K Mar  3  2017 /usr/share/munin/plugins/mysql_
-rwxr-xr-x 1 root root 1.8K Mar  3  2017 /usr/share/munin/plugins/mysql_bytes
-rwxr-xr-x 1 root root 5.4K Mar  3  2017 /usr/share/munin/plugins/mysql_innodb
-rwxr-xr-x 1 root root 5.7K Mar  3  2017 /usr/share/munin/plugins/mysql_isam_space_
-rwxr-xr-x 1 root root 2.5K Mar  3  2017 /usr/share/munin/plugins/mysql_queries
-rwxr-xr-x 1 root root 1.5K Mar  3  2017 /usr/share/munin/plugins/mysql_slowqueries
-rwxr-xr-x 1 root root 1.7K Mar  3  2017 /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# /usr/share/munin/plugins/mysql_ suggest
bin_relay_log
commands
connections
files_tables
innodb_bpool
innodb_bpool_act
innodb_insert_buf
innodb_io
innodb_io_pend
innodb_log
innodb_rows
innodb_semaphores
innodb_tnx
myisam_indexes
network_traffic
qcache
qcache_mem
replication
select_types
slow
sorts
table_locks
tmp_tables
[root@opensourceecology plugins]# 
  1. I added all the mysql things
root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Sep  7 07:36 mysql_ -> /usr/share/munin/plugins/mysql_
lrwxrwxrwx 1 root root 36 Sep  7 07:36 mysql_bytes -> /usr/share/munin/plugins/mysql_bytes
lrwxrwxrwx 1 root root 37 Sep  7 07:36 mysql_innodb -> /usr/share/munin/plugins/mysql_innodb
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_isam_space_ -> /usr/share/munin/plugins/mysql_isam_space_
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_queries -> /usr/share/munin/plugins/mysql_queries
lrwxrwxrwx 1 root root 42 Sep  7 07:36 mysql_slowqueries -> /usr/share/munin/plugins/mysql_slowqueries
lrwxrwxrwx 1 root root 38 Sep  7 07:36 mysql_threads -> /usr/share/munin/plugins/mysql_threads
[root@opensourceecology plugins]# rm -rf mysql_*
[root@opensourceecology plugins]# ln -sf /usr/share/munin/plugins/mysql_ mysql_
[root@opensourceecology plugins]# for i in `./mysql_ suggest`; \
> do ln -sf /usr/share/munin/plugins/mysql_ $i; done
[root@opensourceecology plugins]# ls -lah mysql_*
lrwxrwxrwx 1 root root 31 Oct  8 08:06 mysql_ -> /usr/share/munin/plugins/mysql_
[root@opensourceecology plugins]# ls -lah commands
lrwxrwxrwx 1 root root 31 Oct  8 08:06 commands -> /usr/share/munin/plugins/mysql_
[root@opensourceecology plugins]# 
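  1. the loop creates one symlink per wildcard-plugin mode, all resolving to the same script; the same pattern sketched against a throwaway dir and a fixed sample list (since `mysql_ suggest` needs a live mysql plugin):

```shell
# one symlink per suggested mode, all pointing at the single wildcard
# plugin script; temp dir and fixed list stand in for /etc/munin/plugins
# and the output of `mysql_ suggest`
plugdir=$(mktemp -d)
plugin=$(mktemp)   # stands in for /usr/share/munin/plugins/mysql_
for i in commands connections slow; do
  ln -sf "$plugin" "$plugdir/$i"
done
ls "$plugdir"
rm -f "$plugin"; rm -rf "$plugdir"
```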
  1. according to this guide, munin just needs a mysql user; no GRANTs on any databases are required http://www.mbrando.com/2007/08/06/how-to-get-your-mysql-munin-graphs-working/
create user munin@localhost identified by 'CHANGEME';
flush privileges;
  1. and I added this stanza to /etc/munin/plugin-conf.d/zzz-ose
[mysql*]
user root
group wheel
env.mysqlopts -u munin_user -pOBFUSCATED
  1. test worked
[root@opensourceecology plugins]# munin-run --debug mysql_queries
# Processing plugin configuration from /etc/munin/plugin-conf.d/amavis
# Processing plugin configuration from /etc/munin/plugin-conf.d/df
# Processing plugin configuration from /etc/munin/plugin-conf.d/fw_
# Processing plugin configuration from /etc/munin/plugin-conf.d/hddtemp_smartctl
# Processing plugin configuration from /etc/munin/plugin-conf.d/munin-node
# Processing plugin configuration from /etc/munin/plugin-conf.d/postfix
# Processing plugin configuration from /etc/munin/plugin-conf.d/postgres
# Processing plugin configuration from /etc/munin/plugin-conf.d/sendmail
# Processing plugin configuration from /etc/munin/plugin-conf.d/zzz-ose
# Setting /rgid/ruid/ to /99/0/
# Setting /egid/euid/ to /99 99 10/0/
# Setting up environment
# Environment mysqlopts = -u munin_user -pOBFUSCATED
# About to run '/etc/munin/plugins/mysql_queries'
delete.value 837242
insert.value 896145
replace.value 1197242
select.value 148647861
update.value 1721521
cache_hits.value 0
[root@opensourceecology plugins]# 
  1. now for nginx, I confirmed that we do have the ability to spit out the status page
[root@opensourceecology plugins]# nginx -V 2>&1 | grep -o with-http_stub_status_module
with-http_stub_status_module
[root@opensourceecology plugins]# 
  1. I tried adding a block for '/nginx_status' only accessible to '127.0.0.1', but I still got 403'd when attempting to access it via curl on the local machine
  2. the access logs showed it being accessed from an ipv6 address
2a01:4f8:172:209e::2 - - [08/Oct/2019:08:37:49 +0000] "GET /nginx_status HTTP/1.1" 403 162 "-" "curl/7.29.0" "-"
  1. I guess the request goes out over eth0 because the vhost name resolves to our public address and nginx is bound to it (not to 127.0.0.1), so the connection's source address is the server's own IP rather than localhost
  2. I used the following block
		# stats for munin
		location /nginx_status {
				stub_status on;
				access_log off;
				allow 127.0.0.1/32;
				allow 138.201.84.223/32;
				allow 138.201.84.243/32;
				allow ::1/128;
				allow 2a01:4f8:172:209e::2/128;
				allow fe80::921b:eff:fe94:7c4/128;
				deny all;
		}
  1. and it worked!
[root@opensourceecology conf.d]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@opensourceecology conf.d]# service nginx reload
Redirecting to /bin/systemctl reload nginx.service
[root@opensourceecology conf.d]# curl https://www.opensourceecology.org/nginx_status
Active connections: 1 
server accepts handled requests
 16063989 16063989 27383851 
Reading: 0 Writing: 1 Waiting: 0 
[root@opensourceecology conf.d]# 
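  1. the munin nginx plugins just scrape this page; the third line holds the cumulative accepts/handled/requests counters, which can be parsed by hand as a sketch on the sample output above:

```shell
# extract the cumulative accepts/handled/requests counters from the third
# line of nginx stub_status output
status='Active connections: 1
server accepts handled requests
 16063989 16063989 27383851
Reading: 0 Writing: 1 Waiting: 0'
echo "$status" | awk 'NR==3 {print "accepts=" $1, "handled=" $2, "requests=" $3}'
# → accepts=16063989 handled=16063989 requests=27383851
```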
  1. I found that my nginx module wouldn't work unless I installed the 'perl-LWP-Protocol-https' package
[root@opensourceecology plugins]# yum install perl-LWP-Protocol-https
...
Installed:
  perl-LWP-Protocol-https.noarch 0:6.04-4.el7                                                                                    

Dependency Installed:
  perl-Mozilla-CA.noarch 0:20130114-5.el7                                                                                        

Complete!
[root@opensourceecology plugins]# 
  1. I added nginx configs for both the wiki & osemain. If all is well, I'll add the configs for our other vhosts
  2. I didn't bother with apache for now (also, the acl will be confusing since it sees all traffic coming from 127.0.0.1 via varnish)
  3. meanwhile, some of the mysql graphs are populating. good!
  4. and meanwhile, the rsync is still going; it's currently at "var/lib/mysql", copying our mysql databases' data. cool.
  5. ...
  6. after a few hours, I checked-up on rsync; it was stuck again
var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722193549!CEBPressJuneGroup.fcstd
	 4840012 100%    2.56MB/s    0:00:01 (xfer#344966, to-check=1043/396314)
var/www/html/wiki.opensourceecology.org/htdocs/images/archive/5/5f/20170722195024!CEBPressJuneGroup.fcstd
	  950272  19%  879.62kB/s    0:00:04
  1. the vpn client appears to have disconnected, and I can't ping the staging host at all from prod
[maltfield@opensourceecology ~]$ ping 10.241.189.11
PING 10.241.189.11 (10.241.189.11) 56(84) bytes of data.
^C
--- 10.241.189.11 ping statistics ---
59 packets transmitted, 0 received, 100% packet loss, time 57999ms

[maltfield@opensourceecology ~]$ 
  1. I manually exited-out of the openvpn connection & reinitiated it; pings now work. After about 60 seconds, the rsync started outputting again..
  2. when I went to check the size of the lxc container, I was told <1G, which can't be right
[root@osedev1 lxc]# du -sh /var/lib/lxc/osestaging1
604M    /var/lib/lxc/osestaging1
[root@osedev1 lxc]# 
  1. ncdu pointed me to the snap1 dir, which is currently 48G
[root@osedev1 lxc]# du -sh /var/lib/lxcsnaps/osestaging1/snap1
48G     /var/lib/lxcsnaps/osestaging1/snap1
[root@osedev1 lxc]# 
  1. apparently this is the consequence of restoring a snapshot just by doing an rsync; the snapshot's config file has a new line that explicitly points the rootfs path at the snapshot's rootfs
[root@osedev1 lxc]# tail /var/lib/lxc/osestaging1/config 
lxc.cap.drop = mac_admin
lxc.cap.drop = mac_override
lxc.cap.drop = setfcap
lxc.cap.drop = sys_module
lxc.cap.drop = sys_nice
lxc.cap.drop = sys_pacct
lxc.cap.drop = sys_rawio
lxc.cap.drop = sys_time
lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
lxc.rootfs = /var/lib/lxcsnaps/osestaging1/snap1/rootfs
[root@osedev1 lxc]# 
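  1. a one-line fix I could apply next time (a sketch run against sample text; on the real config it would be `sed -i` against /var/lib/lxc/osestaging1/config with the container stopped):

```shell
# rewrite lxc.rootfs from the snapshot path back to the container's own
# rootfs; '|' as the sed delimiter avoids escaping the path's slashes
printf 'lxc.hook.clone = /usr/share/lxc/hooks/clonehostname\nlxc.rootfs = /var/lib/lxcsnaps/osestaging1/snap1/rootfs\n' \
  | sed 's|^lxc.rootfs = .*|lxc.rootfs = /var/lib/lxc/osestaging1/rootfs|'
```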
  1. perhaps that means the snapshot dir now holds my container's *real* data
  2. while rsync continued, I noted that my nginx graphs are appearing, but there's no label that differentiates the wiki from osemain's graphs
  3. I can see a list of variables defined by my plugin by default with the `munin-run <plugin> config` command https://munin.opensourceecology.org:4443/nginx-day.html
[root@opensourceecology plugins]# munin-run nginx_www.opensourceecology.org_status config
graph_title NGINX status
graph_args --base 1000
graph_category nginx
graph_vlabel Connections
total.label Active connections
total.info  Active connections
total.draw LINE2
reading.label Reading
reading.info  Reading
reading.draw LINE2
writing.label Writing
writing.info  Writing
writing.draw LINE2
waiting.label Waiting
waiting.info  Waiting
waiting.draw LINE2
[root@opensourceecology plugins]# 
  1. so it looks like I can set this as 'graph_title' or 'graph_info'
  2. I restarted munin-node and triggered the munin-cron to update the html pages
[root@opensourceecology plugins]# service munin-node restart
Redirecting to /bin/systemctl restart munin-node.service
[root@opensourceecology plugins]# 
[root@opensourceecology plugins]# sudo -u munin /usr/bin/munin-cron
  1. the new variables didn't affect anything, so I started grepping the logs
  2. unrelated, the logs complained about mysql auth failure for:
    1. network_traffic
    2. select_types
    3. innodb_tnx
    4. innodb_log
    5. sorts
    6. myisam_indexes
    7. qcache_mem
    8. innodb_io
    9. connections
    10. qcache
    11. innodb_insert_buf
    12. replication
    13. bin_relay_log
    14. mysql_queries
    15. innodb_rows
    16. innodb_bpool_act
    17. files_tables
    18. commands
    19. innodb_bpool
    20. tmp_tables
    21. innodb_semaphores
    22. innodb_io_pend
    23. table_locks
    24. slow
  3. but there was nothing related to nginx
  4. I tried overriding the graph_title in the plugins, but it didn't work
  5. I found the datafile for munin in /var/lib/munin/datafile. This is clearly where the graph title is defined before being generated into html files
[root@opensourceecology plugins]# grep nginx /var/lib/munin/datafile | grep -i graph_title
localhost;localhost:nginx_wiki_opensourceecology_org_request.graph_title Nginx requests
localhost;localhost:nginx_wiki_opensourceecology_org_status.graph_title NGINX status
localhost;localhost:nginx_www_opensourceecology_org_status.graph_title NGINX status
localhost;localhost:nginx_www_opensourceecology_org_request.graph_title Nginx requests
[root@opensourceecology plugins]# 
  1. I found that I *could* override the title in /etc/munin/munin.conf https://www.aroundmyroom.com/2015/01/10/munin-help-needed/
[localhost]                                                                                                                      
	address 127.0.0.1                                                                                                            
	use_node_name yes                                                                                                            
	nginx_www_opensourceecology_org_status.graph_title Nginx Status (www.opensourceecology.org)                                  
	nginx_wiki_opensourceecology_org_status.graph_title Nginx Status (wiki.opensourceecology.org)
  1. ...
  2. meanwhile, the rsync finished!
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
var/www/html/www.opensourceecology.org/htdocs/wp-includes/widgets/class-wp-widget-text.php
	   20735 100%   21.05kB/s    0:00:00 (xfer#450852, to-check=0/517755)
var/yp/

sent 59229738371 bytes  received 11198208 bytes  2959309.47 bytes/sec
total size is 77965794338  speedup is 1.32
rsync warning: some files vanished before they could be transferred (code 24) at main.c(1052) [sender=3.0.9]

real    333m37.655s
user    19m50.292s
sys     6m0.997s
[maltfield@opensourceecology ~]$
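The reported transfer rate lines up with the `--bwlimit=3000` cap (rsync interprets that value in KBytes/s); a quick check of the sent bytes against the wall time confirms the run was bandwidth-bound the whole way:

```shell
# sent bytes and wall time from the rsync run above; --bwlimit is in KBytes/s
rate=$(awk 'BEGIN { printf "%.0f", 59229738371 / (333*60 + 37.655) / 1000 }')
echo "${rate} KB/s"   # → 2959 KB/s, right at the 3000 KB/s cap
```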
  1. but I still can't ssh into it; again, my home dir is owned by the wrong user
[root@osestaging1 ~]# ls -lah /home/maltfield/.ssh
total 16K
drwxr-xr-x.  2 tgriffing tgriffing 4.0K Jan 19  2018 .
drwx------. 10 tgriffing tgriffing 4.0K Oct  3 07:06 ..
-rw-r--r--.  1 root      root       750 Jun 20  2017 authorized_keys
-rw-r--r--.  1 tgriffing tgriffing 1.1K Oct  3 13:44 known_hosts
[root@osestaging1 ~]# 
  1. maybe I should add the '--numeric-ids' option; by default rsync maps uids/gids by user/group *name* on the receiving side, which would explain my home dir ending up owned by 'tgriffing'
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. I found that the 'sync.old' dir was still trying to sync, so I updated the command to add a wildcard to that exclude path; it worked
[ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. this time the second pass ("double-tap") took only ~3 minutes of wall time
[maltfield@opensourceecology ~]$ [ "`hostname`" = "opensourceecology.org" ] && time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --numeric-ids --rsync-path="sudo rsync" --exclude=/root --exclude=/run --exclude=/home/b2user/sync* --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
var/www/html/munin/static/zoom.js
		4760 100%    1.13MB/s    0:00:00 (xfer#2239, to-check=1002/321739)

sent 224884435 bytes  received 1668273 bytes  1352553.48 bytes/sec
total size is 41283867704  speedup is 182.23

real    2m46.967s
user    0m32.382s
sys     0m8.095s
[maltfield@opensourceecology ~]$ 
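For reference, the "speedup" figure rsync prints is just the total data size divided by the bytes actually sent and received over the wire, which is why a mostly-no-op second pass scores so high:

```shell
# speedup = total size / (sent + received), per the rsync summary above
awk 'BEGIN { printf "%.2f\n", 41283867704 / (224884435 + 1668273) }'   # → 182.23
```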
  1. this time the permissions of my home dir didn't break, and I was able to ssh-in.
  2. I'd like to take a snapshot of the staging server, but at this point we don't have space for it
[root@osedev1 lxc]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M   17M  879M   2% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G   94G   25G  80% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[root@osedev1 lxc]# 
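A rough check of the shortfall, assuming a snapshot costs a full copy of the container's rootfs (using the ~41G "total size" from the rsync summary above) against the 25G still free on ose_dev_volume_1:

```shell
rootfs_bytes=41283867704                    # rsync "total size" of the rootfs
avail_bytes=$(( 25 * 1024 * 1024 * 1024 ))  # 25G free per df
if [ "$rootfs_bytes" -gt "$avail_bytes" ]; then
  echo "snapshot will not fit"
fi
```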
  1. ok, now, drum roll: did we break the staging server? let's try to shut it down & start it again.
  2. aaaaand: IT CAME BACK UP! Now it said its hostname isn't 'osestaging1' but 'opensourceecology'. Coolz.
  3. I was successfully able to ssh into it, but then it froze. And my attempts to login to the lxc-console all end in timeouts
opensourceecology login: maltfield
Password: 
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. if I attempt to login as root, then it just times-out before it even asks me for a password
opensourceecology login: root
login: timed out after 60 seconds

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

opensourceecology login: 
  1. ssh auth succeeds, but the connection still hangs before I get a shell
...
debug1: Authentication succeeded (publickey).
Authenticated to 10.241.189.11 ([10.241.189.11]:32415).
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: network
  1. I stopped the container again. This time when I tried to start it, I got an error
[root@osedev1 ~]# lxc-start -n -osestaging1
lxc-start: lxc_start.c: main: 290 Executing '/sbin/init' with no configuration file may crash the host
[root@osedev1 ~]# 
  1. (that error was my typo: `-n -osestaging1` has a stray leading dash, so lxc couldn't find the container's config.) I moved some dirs around so that I'm no longer using the 'rootfs' dir from the snaps dir, but now I get this damn message. duckduck searches are dead-ends
[root@osedev1 lxc]# lxc-start -n osestaging1
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 lxc]# lxc-start -P /var/lib/lxc/ -n osestaging1
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 lxc]# 
  1. I tried rebooting the dev server. after it came up, I still got the same error when attempting to `lxc-start`
  2. I found I could get debug logs by adding `-l log -o <file>` https://github.com/lxc/lxc/issues/1555
[root@osedev1 ~]# lxc-start -n osestaging1 -l debug -o lxc-start.log
lxc-start: sync.c: __sync_wake: 74 sync wake failure : Broken pipe
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
[root@osedev1 ~]# cat lxc-start.log
...
  1. all the god damn google results on this "sync wake failure" shit (which are already few) are regarding configs of multiple containers sharing a network. I'll destroy the whole network namespace if needed. but how? why does nobody else encounter this damn issue?
  2. well, I found the source code. could be an issue with an open file descriptor or something? https://fossies.org/linux/lxc/src/lxc/sync.c
  3. my best guess is that it's an issue with the 'rootfs.dev' symlink
[root@osedev1 lxc]# ls -lah osestaging1
total 28K
drwxrwx---.  5 root root 4.0K Oct  8 16:17 .
drwxr-xr-x.  6 root root 4.0K Oct  8 16:05 ..
-rw-r--r--.  1 root root 1.1K Oct  8 15:46 config
drwxr-xr-x.  3 root root 4.0K Oct  8 15:46 dev
drwxr-xr-x.  2 root root 4.0K Oct  8 15:52 osestaging1
dr-xr-xr-x. 20 root root 4.0K Oct  8 15:21 rootfs
lrwxrwxrwx.  1 root root   38 Oct  8 16:17 rootfs.dev -> /dev/.lxc/osestaging1.72930b02843095eb
-rw-r--r--.  1 root root   19 Oct  3 15:40 ts
[root@osedev1 lxc]# 
  1. I commented-out every fucking line in the config file that had the word 'dev' in it...and the system started! Except that, umm, I couldn't connect to its console?
[root@osedev1 lxc]# lxc-start -n osestaging1 -f osestaging1/config -l trace -o lxc-start.log
Failed to create unit file /run/systemd/generator.late/netconsole.service: File exists
Failed to create unit file /run/systemd/generator.late/network.service: File exists
Running in a container, ignoring fstab device entry for /dev/disk/by-uuid/1e457b76-5100-4b53-bcdc-667ca122b941.
Running in a container, ignoring fstab device entry for /dev/mapper/ose_dev_volume_1.
Failed to create unit file /run/systemd/generator/systemd-cryptsetup@ose_dev_volume_1.service: File exists

lxc-start: console.c: lxc_console_peer_proxy_alloc: 315 console not set up
  1. I found that if I commented-out the first line and added-back a rootfs line, I could get it to boot again, but I couldn't login from the console (same 60 second timeout) or ssh in (or ping it)
#lxc.mount.entry = /dev/net dev/net none bind,create=dir
...
lxc.rootfs = /var/lib/lxc/osestaging1/rootfs
  1. I uncommented the first line, and it still started! looks like the issue was that I hadn't explicitly defined a rootfs.
  2. this time I could ping the server from my laptop over the vpn
  3. I was able to login as 'maltfield' from the console, but it locked-up when I tried to `sudo su -`
  4. on the next reboot, I tailed all the files in /var/log from the osedev1 server (inside the staging container's rootfs dir); I saw some interesting results
==> osestaging1/rootfs/var/log/messages <==
Oct  8 14:50:00 opensourceecology NET[248]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Oct  8 14:50:00 opensourceecology dhclient[201]: bound to 192.168.122.201 -- renewal in 1588 seconds.
Oct  8 14:50:00 opensourceecology network: Determining IP information for eth0... done.
Oct  8 14:50:00 opensourceecology network: [  OK  ]
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/kernel/yama/ptrace_scope': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '16' to '/proc/sys/kernel/sysrq': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/kernel/core_uses_pid': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/rp_filter': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/rp_filter': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/default/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv4/conf/all/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/default/promote_secondaries': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/conf/all/promote_secondaries': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_hardlinks': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/fs/protected_symlinks': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '1' to '/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/autoconf': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_dad': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_defrtr': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_rtr_pref': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_ra_pinfo': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/accept_redirects': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/default/forwarding': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/autoconf': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_dad': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_defrtr': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_ra_pinfo': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_source_route': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/accept_redirects': Read-only file system
Oct  8 14:50:00 opensourceecology systemd-sysctl: Failed to write '0' to '/proc/sys/net/ipv6/conf/all/forwarding': Read-only file system
Oct  8 14:50:01 opensourceecology systemd: Started LSB: Bring up/down networking.
  1. and issues with /run
Oct  8 14:50:05 opensourceecology systemd-logind: Failed to remove runtime directory /run/user/0: Device or resource busy

=Mon Oct 07, 2019=

  1. I added a comment to our long-standing feature request with the Libre Office Online CODE project for the ability to draw lines & arrows in their online version of "present" https://bugs.documentfoundation.org/show_bug.cgi?id=113386#c4
  2. wiki updates & logging
  3. I tried to log in to my hetzner cloud account, but I got "Account is disabled". fucking hell, so much for user-specific auditing. I logged in with our shared account..
  4. I confirmed that our osedev1 node has a 20G disk + 10G volume.
  5. we're currently using 3.4G of the 19G on osedev1; I never set up the 10G volume that appears to be at /mnt/HC_Volume_3110278. It has 10G avail
[maltfield@osedev1 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        19G  3.4G   15G  19% /
devtmpfs        873M     0  873M   0% /dev
tmpfs           896M     0  896M   0% /dev/shm
tmpfs           896M   25M  871M   3% /run
tmpfs           896M     0  896M   0% /sys/fs/cgroup
/dev/sdb        9.8G   37M  9.3G   1% /mnt/HC_Volume_3110278
tmpfs           180M     0  180M   0% /run/user/1000
[maltfield@osedev1 ~]$ ls -lah /mnt/HC_Volume_3110278/
total 24K
drwxr-xr-x. 3 root root 4.0K Aug 20 11:50 .
drwxr-xr-x. 3 root root 4.0K Aug 20 12:16 ..
drwx------. 2 root root  16K Aug 20 11:50 lost+found
[maltfield@osedev1 ~]$ 
  1. the RAID1'd disk on prod is 197G with 75G used
[maltfield@opensourceecology ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        197G   75G  113G  40% /
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G  8.0K   32G   1% /dev/shm
tmpfs            32G  2.6G   29G   9% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md1        488M  289M  174M  63% /boot
tmpfs           6.3G     0  6.3G   0% /run/user/0
tmpfs           6.3G     0  6.3G   0% /run/user/1005
[maltfield@opensourceecology ~]$ 
  1. a quick duckduck pulled up this guide for using luks to create an encrypted volume out of hetzner block volumes; this is a good idea https://angristan.xyz/how-to-use-encrypted-block-storage-volumes-hetzner-cloud/
  2. the guide shows a method for resizing the encrypted volume. I didn't think that would be trivial, but it appears that resize2fs can increase the size of a luks-encrypted volume without issue. this is good to know: if we run out of space (or maybe we create a second staging node or ad-hoc dev nodes), we should be able to shut down all our lxc containers, unmount the block drive, resize it, and remount it. That said, I don't think we'll be making backups of these (dev/staging) containers, so if we fuck up it would be bad.
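The grow procedure from the guide boils down to two commands after resizing the volume in the hetzner wui. A sketch only, with the device path and mapper name assumed from elsewhere in this log; it needs root on osedev1 with the volume attached, so it's guarded to be a no-op anywhere else:

```shell
DEV=/dev/disk/by-id/scsi-0HC_Volume_3110278   # path shown in the hetzner wui
MAPPER=ose_dev_volume_1                       # assumed LUKS mapping name
if [ -b "$DEV" ] && [ "$(id -u)" = "0" ]; then
  cryptsetup resize "$MAPPER"        # grow the LUKS mapping (may prompt for the key)
  resize2fs "/dev/mapper/$MAPPER"    # then grow the ext4 filesystem to fill it
else
  echo "skipping: volume not attached or not running as root"
fi
```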
  3. our 10G hetzner cloud block volume has been costing 0.48 EUR/mo = 5.76 EUR/yr
  4. the min needed for our current prod server is 75G. The slider on the product page has weird increments, but the actual "resize volume" option in the cloud console wui permits resizing in 1G increments. A 75G volume would cost 3.00 EUR/mo = 36 EUR/yr
  5. A much more sane choice would be equal to the disk on prod = 197G = 7.88 EUR/mo = 94.56 EUR/yr
  6. fuck, I asked Marcin for $100/yr. Currently we're spending 2.49 EUR/mo on the osedev1 instance alone. That's 29.88 EUR/yr = 32.81 USD/yr. For a 100 USD/yr budget, that leaves 67.19 USD for disk space = 61.19 EUR/yr. That's 5.09 EUR/mo, which will buy us a 127G volume at 5.08 EUR/mo.
  7. 127/197 = 0.64. Therefore, a 127G block volume will allow an lxc staging node to replicate our prod node until prod grows beyond 64% capacity. 70% is a good general high-water-mark at which we'd need to look at migrating prod anyway. This (127G) seems like a reasonable low-budget solution that fits the 100 USD/yr budget.
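Re-deriving those numbers (the EUR-to-USD rate of 1.098 is an assumption back-fitted from the figures above, and the ~0.04 EUR/GB/mo volume price is implied by the 75G and 197G quotes):

```shell
awk 'BEGIN {
  eur_usd  = 1.098                       # assumed exchange rate
  node_usd = 2.49 * 12 * eur_usd         # osedev1 instance cost, USD/yr
  left_eur = (100 - node_usd) / eur_usd  # remaining disk budget, EUR/yr
  gb       = int(left_eur / 12 / 0.04)   # at ~0.04 EUR/GB/mo
  printf "node: %.2f USD/yr; affordable volume: %d GB\n", node_usd, gb
}'
# → node: 32.81 USD/yr; affordable volume: 127 GB
```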
  8. I resized our 10G 'ose-dev-volume-1' volume to 127G in the hetzner WUI.
  9. I clicked the 'enable protection' option, which prevents it from being deleted until the protection is manually removed
  10. the 'show configuration' window in the wui tells us that the volume is '/dev/disk/by-id/scsi-0HC_Volume_3110278' on osedev1
  11. the box itself looks like it's really /dev/sdb
[maltfield@osedev1 ~]$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=893568k,nr_inodes=223392,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,clone_children)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11033)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
/dev/sdb on /mnt/HC_Volume_3110278 type ext4 (rw,relatime,seclabel,discard,data=ordered)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=183308k,mode=700,uid=1000,gid=1000)
[maltfield@osedev1 ~]$
  1. but the other name appears in fstab
[root@osedev1 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Sun Jul 14 04:14:25 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=1e457b76-5100-4b53-bcdc-667ca122b941 /                       ext4    defaults        1 1
/dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/HC_Volume_3110278 ext4 discard,nofail,defaults 0 0
[root@osedev1 ~]# 
  1. ah, indeed, the above disk is just a link back to /dev/sdb
[root@osedev1 ~]# ls -lah /dev/disk/by-id/scsi-0HC_Volume_3110278 
lrwxrwxrwx. 1 root root 9 Oct  7 10:31 /dev/disk/by-id/scsi-0HC_Volume_3110278 -> ../../sdb
[root@osedev1 ~]# 
  1. before I rebuild this volume, the cryptsetup step raises the question: where do I store the key?
    1. assuming I want the server to be able to restart by itself without user interaction, the key should probably be stored in a file somewhere under '/root' on 'osedev1'. But while my OS would lock down the permissions on that file, the key file itself would likely sit unencrypted on some hetzner drive somewhere. Is it worth encrypting the contents of the block volume when the encryption key itself might be stored unencrypted somewhere in hetzner's datacenter?
  2. as a test, I ran `testdisk` to see if I could find any deleted files in the 10G volume that hetzner gave us from previous customers; I couldn't.
  3. someone asked about this, but there wasn't much great discussion on how hetzner provisions their disks https://serverfault.com/questions/950790/cloud-server-vulnerability-analysis?noredirect=1
  4. so risk assessment: when working in a cloud, we have to accept the integrity of the cloud provider. If a rogue hetzner employee wants to steal all our data, they can. There's absolutely nothing we can do about that other than building the servers ourselves and physically locking them down. The decision to use hetzner predates me, but I agree with it. It does not make sense for OSE to buy a server rack and host our equipment at FeF. So, I accept the risk and trust that hetzner will not do something malicious that puts our data at risk
  5. the real concern here is that we resize our volume (or hetzner shuffles abstracted blocks around physical devices in the background, black-boxed to us), and a different customer suddenly gets, for example, our users' PII in their new volume. Or a malicious hetzner cloud user triggers some shuffling and successfully exfiltrates our data from their cloud without breaking into our server. This is the risk that we're trying to prevent, and here I think it *is* worthwhile to encrypt our block volume. The chance that someone gets chunks of our data from an old, unencrypted 127G block volume is significantly higher than the chance that they get those chunks *and* the key from our server *and* manage to use that key to extract meaningful data from the likely non-contiguous bits recovered from our recycled block volume.
  6. hetzner does not have a clean record, but hardly anybody does. That incident exposed only customer account data, though, not the contents of their customers' servers https://mybroadband.co.za/news/cloud-hosting/279181-hetzner-client-data-exposed-after-attack.html
  7. so, while recognizing that it has limitations, I also recognize that there are sufficient benefits to justify encrypting this block volume with a key stored unencrypted on our cloud instance
  8. meanwhile, I found a guide for how to migrate the contents of /var to a block volume. It suggested doing so from a rescue disk, then editing fstab for the next reboot https://serverfault.com/questions/947732/how-to-add-hetzner-cloud-disk-volume-to-extend-var-partition
  9. I created a new key file on my laptop, stored it in our shared keepass, and uploaded it to the server at /root/keys/ose-dev-volume-1.201910.key
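For reference, a hypothetical reconstruction of how such a keyfile can be generated (the filename here is illustrative; the real key lives at /root/keys/ose-dev-volume-1.201910.key, with a copy in the shared keepass):

```shell
umask 077                                   # so the key is born with 0600 perms
dd if=/dev/urandom of=./volume.key bs=512 count=4 2>/dev/null   # 2048 random bytes
stat -c '%a %s' ./volume.key                # → 600 2048
```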
  10. let's shutdown osedev1 and migrate its /var/ to a block volume. First I'll shutdown the osestaging1 staging lxc container, then the host osedev1
[root@osedev1 ~]# lxc-stop -n osestaging1
[root@osedev1 ~]# shutdown -h now
Connection to 195.201.233.113 closed by remote host.
Connection to 195.201.233.113 closed.
user@ose:~$ 
  1. I confirmed that the server was off in the hetzner cloud console wui
  2. I clicked on the server. I'm not clear if I should mount a rescue disk or click the "rescue" option. No idea what the latter is, so I navigated to "ISO IMAGES", found SystemRescueCD, and clicked the "MOUNT" button next to it. I went back to the "Servers" page.
lrwxrwxrwx. 1 root root 9 Oct  7 10:31 /dev/disk/by-id/scsi-0HC_Volume_3110278 -> ../../sdb
[root@osedev1 ~]# 
  1. before I rebuild this volume, the cryptsetup command raises the question: where do I store the key?
    1. assuming I want the server to be able to restart by itself without user interaction, the key should probably be stored in a file somewhere under '/root' on 'osedev1'. But while my OS would lock down the permissions on that file, the key file itself would likely be stored unencrypted on some hetzner drive somewhere. Is it worth encrypting the contents of the block volume when the encryption key itself might be stored unencrypted somewhere in hetzner's datacenter?
  2. as a test, I ran `testdisk` to see if I could find any deleted files in the 10G volume that hetzner gave us from previous customers; I couldn't.
  3. someone asked about this, but there wasn't much great discussion on how hetzner provisions their disks https://serverfault.com/questions/950790/cloud-server-vulnerability-analysis?noredirect=1
  4. so, risk assessment: when working in a cloud, we have to accept the integrity of the cloud provider. If a rogue hetzner employee wants to steal all our data, they can. There's absolutely nothing we can do about that other than building the servers ourselves and physically locking them down. The decision to use hetzner predates me, but I agree with it. It does not make sense for OSE to buy a server rack and host our equipment at FeF. So, I accept the risk and trust that hetzner will not do something malicious that puts our data at risk
  5. the real concern here is that we resize our volume (or hetzner shuffles abstracted blocks around physical devices in the background, which is black-boxed to us), and a different customer suddenly gets, for example, our users' PII in their new volume. Or a malicious hetzner cloud user triggers some shuffling and successfully exfiltrates our data from the cloud without breaking into our server. This is the risk we're trying to prevent, and in this case I think it *is* worthwhile to encrypt our block volume. The chances that someone can recover chunks of our data from an old, unencrypted 127G block volume are significantly higher than the chances of them getting those chunks *and* the key from our server *and* being able to use the key to extract meaningful data from the likely non-contiguous bits recovered from our recycled block volume
  6. hetzner does not have a clean record, but hardly anybody does. That incident exposed only customer account data, though, not the contents of customers' servers https://mybroadband.co.za/news/cloud-hosting/279181-hetzner-client-data-exposed-after-attack.html
  7. so, while recognizing its limitations, I also recognize that there are sufficient benefits to justify encrypting this block volume with a key stored unencrypted on our cloud instance
  8. meanwhile, I found a guide for migrating the contents of /var to a block volume. It suggests doing so from a rescue disk, then editing fstab for the next reboot https://serverfault.com/questions/947732/how-to-add-hetzner-cloud-disk-volume-to-extend-var-partition
  9. I created a new key file on my laptop, stored it in our shared keepass, and uploaded it to the server at /root/keys/ose-dev-volume-1.201910.key
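The log doesn't record how the key file itself was generated; a plausible sketch is a few KiB of random data with owner-only read permissions (the /tmp path below is just for illustration; the real file lives at /root/keys/ose-dev-volume-1.201910.key):

```shell
# hypothetical keyfile generation: 2048 random bytes, readable only by its owner
keyfile="${TMPDIR:-/tmp}/ose-dev-volume-1.201910.key"
dd if=/dev/urandom of="$keyfile" bs=512 count=4 2>/dev/null
chmod 0400 "$keyfile"
ls -l "$keyfile"
```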
  10. let's shut down osedev1 and migrate its /var/ to a block volume. First I'll stop the osestaging1 staging lxc container, then shut down the host osedev1
[root@osedev1 ~]# lxc-stop -n osestaging1
[root@osedev1 ~]# shutdown -h now
Connection to 195.201.233.113 closed by remote host.
Connection to 195.201.233.113 closed.
user@ose:~$ 
  1. I confirmed that the server was off in the hetzner cloud console wui
  2. I clicked on the server. It wasn't clear whether I should mount a rescue disk or click the "rescue" option. Having no idea what the latter does, I navigated to "ISO IMAGES", found SystemRescueCD, and clicked the "MOUNT" button next to it. I went back to the "servers" page, opened a console for 'osedev1', and clicked "Power on"
  3. the console showed the boot options for the rescue cd. I chose the first menu item = "SystemRescueCd: default boot options"
  4. I can't copy & paste from the console, but I basically found 5x items in /dev/disk/by-id/
    1. the DVD for systemrescue
    2. my 127G block volume with the same name shown above (scsi-0HC_Volume_3110278 )
    3. scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0
    4. scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part1
    5. another DVD?
  5. so 3 & 4 must be our osedev1 disk. Both are 19.1G
  6. attempting to mount the one without '-part1' failed, but the one with '-part1' succeeded, and all my data was there. It was mounted to '/mnt/osedev1-part/'
  7. I formatted the new 127G ebs volume using cryptsetup
cryptsetup luksFormat /dev/disk/by-id/scsi-0HC_Volume_3110278 /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key
  1. I opened the new encrypted luks volume and created an ext4 filesystem on it
cryptsetup luksOpen --key-file /mnt/osedev1/root/keys/ose-dev-volume-1.201910.key /dev/disk/by-id/scsi-0HC_Volume_3110278 ebs
mkfs.ext4 -j /dev/mapper/ebs
  1. I mounted the new FS & began a sync of osedev1's 'var' dir (now only 2.3G) to it
mkdir /mnt/ebs
mount /dev/mapper/ebs /mnt/ebs
rsync -av --progress /mnt/osedev1/var /mnt/ebs/
  1. I added entries for fstab & crypttab to auto-mount the volume to /mnt/ose_dev_volume_1/
  2. I moved the existing /var/ dir to /var.old and made a symlink from /var/ to /mnt/ose_dev_volume_1/var
  3. I safely umounted & closed all the disks and shutdown
  4. I removed the systemrescue iso from the server and started it up again
  5. I was able to ssh in, and the new '/var/' dir *appeared* to be set up properly
[maltfield@osedev1 /]$ ls -lah /var
lrwxrwxrwx. 1 root root 25 Oct  7 13:47 /var -> /mnt/ose_dev_volume_1/var
[maltfield@osedev1 /]$ ls -lah /var/
total 80K
drwxr-xr-x. 19 root root 4.0K Jul 14 06:18 .
drwxr-xr-x.  4 root root 4.0K Oct  7 13:22 ..
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 adm
drwxr-xr-x.  7 root root 4.0K Oct  2 14:24 cache
drwxr-xr-x.  2 root root 4.0K Apr 24 16:03 crash
drwxr-xr-x.  3 root root 4.0K Jul 14 06:15 db
drwxr-xr-x.  3 root root 4.0K Jul 14 06:15 empty
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 games
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 gopher
drwxr-xr-x.  3 root root 4.0K Jul 14 06:14 kerberos
drwxr-xr-x. 34 root root 4.0K Oct  2 15:34 lib
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 local
lrwxrwxrwx.  1 root root   11 Jul 14 06:14 lock -> ../run/lock
drwxr-xr-x. 11 root root 4.0K Oct  7 13:49 log
lrwxrwxrwx.  1 root root   10 Jul 14 06:14 mail -> spool/mail
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 nis
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 opt
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 preserve
lrwxrwxrwx.  1 root root    6 Jul 14 06:14 run -> ../run
drwxr-xr-x.  8 root root 4.0K Oct  3 08:06 spool
drwxrwxrwt.  4 root root 4.0K Oct  7 13:49 tmp
-rw-r--r--.  1 root root  163 Jul 14 06:14 .updated
drwxr-xr-x.  2 root root 4.0K Apr 11  2018 yp
[maltfield@osedev1 /]$ 
  1. but I immediately noticed that, for example, screen wasn't working
[maltfield@osedev1 /]$ screen -S ebs
Cannot make directory '/var/run/screen': No such file or directory
[maltfield@osedev1 /]$ 
  1. oh, damn, '/var/run' is a relative symlink to '../run', which won't resolve correctly now that /var is itself a symlink onto the volume
[maltfield@osedev1 /]$ ls -lah /var/run
lrwxrwxrwx. 1 root root 6 Jul 14 06:14 /var/run -> ../run
[maltfield@osedev1 /]$ 
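The breakage is easy to reproduce in a scratch directory: once /var is itself a symlink onto the volume, a relative '../run' target resolves inside the volume instead of at the filesystem root (the paths here are sandbox stand-ins, not the real ones):

```shell
cd "$(mktemp -d)"
mkdir -p run mnt/vol/var                   # 'run' stands in for the real /run
ln -s mnt/vol/var var                      # 'var' is now a symlink onto the volume, like /var
ln -s ../run var/run                       # the old relative target now points at mnt/vol/run
ls var/run 2>/dev/null || echo "relative symlink is dangling"
rm var/run
ln -s "$PWD/run" var/run                   # an absolute target resolves correctly
ls -ld var/run
```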
  1. I made it an absolute symlink instead
[root@osedev1 var]# rm -rf lock
[root@osedev1 var]# rm -rf run
[root@osedev1 var]# ln -s /run 
[root@osedev1 var]# ln -s /run/lock
[root@osedev1 var]# ls -lah run
lrwxrwxrwx. 1 root root 4 Oct  7 13:54 run -> /run
[root@osedev1 var]# ls -lah lock
lrwxrwxrwx. 1 root root 9 Oct  7 13:54 lock -> /run/lock
[root@osedev1 var]# 
  1. screen still failed, but everything looked ok, so I gave the system a reboot
  1. when the system came back up, `screen` had no issues, and everything looked good.
[maltfield@osedev1 ~]$ screen -ls
There is a screen on:
		4362.ebs        (Attached)
1 Socket in /var/run/screen/S-maltfield.

[maltfield@osedev1 ~]$ sudo su -
Last login: Mon Oct  7 13:54:28 CEST 2019 on pts/0
[root@osedev1 ~]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                      19G  3.4G   15G  19% /
devtmpfs                      873M     0  873M   0% /dev
tmpfs                         896M     0  896M   0% /dev/shm
tmpfs                         896M   17M  879M   2% /run
tmpfs                         896M     0  896M   0% /sys/fs/cgroup
/dev/mapper/ose_dev_volume_1  125G  2.5G  116G   3% /mnt/ose_dev_volume_1
tmpfs                         180M     0  180M   0% /run/user/1000
[root@osedev1 ~]# ls -lah /var
lrwxrwxrwx. 1 root root 25 Oct  7 13:47 /var -> /mnt/ose_dev_volume_1/var
[root@osedev1 ~]# ls -lah /mnt/ose_dev_volume_1/
total 28K
drwxr-xr-x.  4 root root 4.0K Oct  7 13:22 .
drwxr-xr-x.  4 root root 4.0K Oct  7 13:46 ..
drwx------.  2 root root  16K Oct  7 13:18 lost+found
drwxr-xr-x. 19 root root 4.0K Oct  7 13:54 var
[root@osedev1 ~]# 
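For reference, the crypttab and fstab entries added earlier presumably look something like this; the exact field values are a sketch inferred from the device, keyfile, mapper name, and mountpoint shown elsewhere in this log:

```
# /etc/crypttab: open the LUKS volume at boot using the keyfile
ose_dev_volume_1  /dev/disk/by-id/scsi-0HC_Volume_3110278  /root/keys/ose-dev-volume-1.201910.key  luks

# /etc/fstab: mount the opened mapper device ('nofail' lets boot continue if the volume is missing)
/dev/mapper/ose_dev_volume_1  /mnt/ose_dev_volume_1  ext4  discard,nofail,defaults  0  0
```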
  1. I started the staging server, connected to the vpn from my laptop, and successfully ssh'd into it (though only after a long delay)
  2. I ssh'd into prod and kicked-off the rsync!
time sudo -E nice rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
  1. that also copied the old backups, which is probably unnecessary. I should also exclude
    1. home/b2user/sync
  2. this sync is going at a rate of about 1G every 5 minutes. I expect it'll be done in 5-10 hours. I'll check on it tomorrow.
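Rather than growing the already-long --exclude list further, the excludes (including the newly identified home/b2user/sync) could be collected into a file and passed with --exclude-from; the filename here is an assumption:

```shell
# hypothetical exclude file gathering the --exclude flags from the rsync command above
cat > rsync-excludes.txt <<'EOF'
/root
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
/home/b2user/sync
EOF
# the sync command then shrinks to (not executed here):
#   rsync -e 'ssh -p 32415' --bwlimit=3000 --rsync-path="sudo rsync" \
#     --exclude-from=rsync-excludes.txt -av --progress / maltfield@10.241.189.11:/
wc -l < rsync-excludes.txt
```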

=Sat Oct 05, 2019=

  1. email

=Fri Oct 04, 2019=

  1. email

=Thu Oct 03, 2019=

  1. continuing from yesterday, I copied the dev-specific encryption key for the backups from our shared keepass to the dev node
[root@osedev1 backups]# mv /home/maltfield/ose-dev-backups-cron.201910.key /root/backups/
[root@osedev1 backups]# chown root:root ose-dev-backups-cron.201910.key 
[root@osedev1 backups]# chmod 0400 ose-dev-backups-cron.201910.key 
[root@osedev1 backups]# ls -lah 
total 32K
drwxr-xr-x. 4 root root 4.0K Oct  3 07:09 .
dr-xr-x---. 7 root root 4.0K Oct  3 07:03 ..
-rw-r--r--. 1 root root  747 Oct  2 15:57 backup.settings
-rwxr-xr-x. 1 root root 5.7K Oct  3 07:03 backup.sh
drwxr-xr-x. 3 root root 4.0K Sep  9 09:02 iptables
-r--------. 1 root root 4.0K Oct  3 07:05 ose-dev-backups-cron.201910.key
drwxr-xr-x. 2 root root 4.0K Oct  3 07:04 sync
[root@osedev1 backups]# 
  1. note that I also had to install `trickle` on the dev node
[root@osedev1 backups]# ./backup.sh
================================================================================
INFO: Beginning Backup Run on 20191003_051037
INFO: Cleaning up old backup files
...
INFO: moving encrypted backup file to b2user's sync dir
INFO: Beginning upload to backblaze b2
sudo: /bin/trickle: command not found

real    0m0.030s
user    0m0.009s
sys     0m0.021s
[root@osedev1 backups]# yum install trickle
...
Installed:
  trickle.x86_64 0:1.07-19.el7                                                                                                 

Complete!
[root@osedev1 backups]# 
  1. note that something changed in the install process of the b2cli that required me to use the '--user' flag, which changed the path to the b2 binary. To keep the mods to the backup.sh script minimal, I just created a symlink
[root@osedev1 backups]# ./backup.sh
...
+ echo 'INFO: Beginning upload to backblaze b2'
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg
trickle: exec(): No such file or directory

real    0m0.040s
user    0m0.012s
sys     0m0.020s
+ exit 0
[root@osedev1 backups]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev120191003_051511.tar.gpg daily_osedev120191003_051511.tar.gpg
trickle: exec(): No such file or directory
[root@osedev1 b2user]# ln -s /home/b2user/.local/bin/b2 /home/b2user/virtualenv/bin/b2
[root@osedev1 b2user]# 
  1. the backup script still failed at the upload to b2
[root@osedev1 backups]# ./backup.sh
...
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg
ERROR: Missing account data: 'NoneType' object has no attribute 'getitem'  Use: b2 authorize-account

real    0m0.363s
user    0m0.281s
sys     0m0.076s
+ exit 0
[root@osedev1 b2user]# 
[root@osedev1 b2user]# /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052059.tar.gpg daily_osedev1_20191003_052059.tar.gpg
ERROR: Missing account data: 'NoneType' object has no attribute 'getitem'  Use: b2 authorize-account
[root@osedev1 b2user]# 
  1. per the error, I used `b2 authorize-account` and added my creds for the user 'b2user'
[root@osedev1 b2user]# su - b2user
Last login: Wed Oct  2 16:15:28 CEST 2019 on pts/8
[b2user@osedev1 ~]$ .local/bin/b2 authorize-account
Using https://api.backblazeb2.com
Backblaze application key ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Backblaze application key: 
[b2user@osedev1 ~]$ 
  1. this time the backup succeeded!
[root@osedev1 b2user]# /root/backups/backup.sh
...
INFO: moving encrypted backup file to b2user's sync dir
+ /bin/mv /root/backups/sync/daily_osedev1_20191003_052448.tar.gpg /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg
+ /bin/chown b2user /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg
+ echo 'INFO: Beginning upload to backblaze b2'
INFO: Beginning upload to backblaze b2
+ /bin/sudo -u b2user /bin/trickle -s -u 3000 /home/b2user/virtualenv/bin/b2 upload-file --noProgress --threads 1 ose-dev-server-backups /home/b2user/sync/daily_osedev1_20191003_052448.tar.gpg daily_osedev1_20191003_052448.tar.gpg
URL by file name: https://f001.backblazeb2.com/file/ose-dev-server-backups/daily_osedev1_20191003_052448.tar.gpg
URL by fileId: https://f001.backblazeb2.com/b2api/v2/b2_download_file_by_id?fileId=4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038
{ 
  "action": "upload",
  "fileId": "4_z2675c17c55dd1d696edd0118_f1082387e9ca2c0d4_d20191003_m052459_c001_v0001109_t0038",
  "fileName": "daily_osedev1_20191003_052448.tar.gpg",
  "size": 17233113,
  "uploadTimestamp": 1570080299000
}

real    0m26.435s
user    0m0.706s
sys     0m0.251s
+ exit 0
[root@osedev1 b2user]#
  1. as an out-of-band restore validation, I downloaded the 17.2M backup file from the backblaze b2 wui onto my laptop
  2. again, I downloaded the encryption key from our shared keepass
user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar ose-dev-backups-cron.201910.key 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
gpg: no valid OpenPGP data found.
gpg: processing message failed: Unknown system error
user@disp5653:~/Downloads$ gpg --batch --passphrase-file ose-dev-backups-cron.201910.key --output daily_osedev1_20191003_052448.tar daily_osedev1_20191003_052448.tar.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
gpg: AES256 encrypted data
gpg: encrypted with 1 passphrase
user@disp5653:~/Downloads$ tar -xf daily_osedev1_20191003_052448.tar 
user@disp5653:~/Downloads$ ls
daily_osedev1_20191003_052448.tar      ose-dev-backups-cron.201910.key
daily_osedev1_20191003_052448.tar.gpg  root
user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f
root/backups/sync/daily_osedev1_20191003_052448/www/www.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/root/root.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/log/log.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/etc/etc.20191003_052448.tar.gz
root/backups/sync/daily_osedev1_20191003_052448/home/home.20191003_052448.tar.gz
user@disp5653:~/Downloads$ 
  1. it looks like it's working; here's the contents of the backup file (note there are some varnish config files in here from my test rsync back on Sep 9th; see Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019)
user@disp5653:~/Downloads$ find root/backups/sync/daily_osedev1_20191003_052448/ -type f -exec tar -tvf '{}' \; | awk '{print $6}' | cut -d/ -f 1-2 | sort -u
etc/adjtime
etc/aliases
etc/alternatives
etc/anacrontab
etc/audisp
etc/audit
etc/bash_completion.d
etc/bashrc
etc/binfmt.d
etc/centos-release
etc/centos-release-upstream
etc/chkconfig.d
etc/chrony.conf
etc/chrony.keys
etc/cloud
etc/cron.d
etc/cron.daily
etc/cron.deny
etc/cron.hourly
etc/cron.monthly
etc/crontab
etc/cron.weekly
etc/crypttab
etc/csh.cshrc
etc/csh.login
etc/dbus-1
etc/default
etc/depmod.d
etc/dhcp
etc/DIR_COLORS
etc/DIR_COLORS.256color
etc/DIR_COLORS.lightbgcolor
etc/dnsmasq.conf
etc/dnsmasq.d
etc/dracut.conf
etc/dracut.conf.d
etc/e2fsck.conf
etc/environment
etc/ethertypes
etc/exports
etc/exports.d
etc/filesystems
etc/firewalld
etc/fstab
etc/gcrypt
etc/GeoIP.conf
etc/GeoIP.conf.default
etc/gnupg
etc/GREP_COLORS
etc/groff
etc/group
etc/group-
etc/grub2.cfg
etc/grub.d
etc/gshadow
etc/gshadow-
etc/gss
etc/gssproxy
etc/host.conf
etc/hostname
etc/hosts
etc/hosts.allow
etc/hosts.deny
etc/idmapd.conf
etc/init.d
etc/inittab
etc/inputrc
etc/iproute2
etc/iscsi
etc/issue
etc/issue.net
etc/kdump.conf
etc/kernel
etc/krb5.conf
etc/krb5.conf.d
etc/ld.so.cache
etc/ld.so.conf
etc/ld.so.conf.d
etc/libaudit.conf
etc/libnl
etc/libuser.conf
etc/libvirt
etc/locale.conf
etc/localtime
etc/login.defs
etc/logrotate.conf
etc/logrotate.d
etc/lvm
etc/lxc
etc/machine-id
etc/magic
etc/makedumpfile.conf.sample
etc/man_db.conf
etc/mke2fs.conf
etc/modprobe.d
etc/modules-load.d
etc/motd
etc/mtab
etc/netconfig
etc/NetworkManager
etc/networks
etc/nfs.conf
etc/nfsmount.conf
etc/nsswitch.conf
etc/nsswitch.conf.bak
etc/numad.conf
etc/openldap
etc/openvpn
etc/opt
etc/os-release
etc/pam.d
etc/passwd
etc/passwd-
etc/pkcs11
etc/pki
etc/pm
etc/polkit-1
etc/popt.d
etc/ppp
etc/prelink.conf.d
etc/printcap
etc/profile
etc/profile.d
etc/protocols
etc/python
etc/qemu-ga
etc/radvd.conf
etc/rc0.d
etc/rc1.d
etc/rc2.d
etc/rc3.d
etc/rc4.d
etc/rc5.d
etc/rc6.d
etc/rc.d
etc/rc.local
etc/redhat-release
etc/request-key.conf
etc/request-key.d
etc/resolv.conf
etc/rpc
etc/rpm
etc/rsyncd.conf
etc/rsyslog.conf
etc/rsyslog.d
etc/rwtab
etc/rwtab.d
etc/sasl2
etc/screenrc
etc/securetty
etc/security
etc/selinux
etc/services
etc/sestatus.conf
etc/shadow
etc/shadow-
etc/shells
etc/skel
etc/ssh
etc/ssl
etc/statetab
etc/statetab.d
etc/subgid
etc/subuid
etc/sudo.conf
etc/sudoers
etc/sudoers.d
etc/sudo-ldap.conf
etc/sysconfig
etc/sysctl.conf
etc/sysctl.d
etc/systemd
etc/system-release
etc/system-release-cpe
etc/tcsd.conf
etc/terminfo
etc/timezone
etc/tmpfiles.d
etc/trickled.conf
etc/tuned
etc/udev
etc/unbound
etc/varnish
etc/vconsole.conf
etc/vimrc
etc/virc
etc/wpa_supplicant
etc/X11
etc/xdg
etc/xinetd.d
etc/yum
etc/yum.conf
etc/yum.repos.d
home/b2user
home/maltfield
root/anaconda-ks.cfg
root/backups
root/Finished
root/original-ks.cfg
root/Package
root/pki
root/Running
var/log
user@disp5653:~/Downloads$ 
  1. and as a true end-to-end test, I restored the sshd_config file
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ pwd
/home/user/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ date
Thu Oct  3 11:37:49 +0545 2019
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ ls
etc.20191003_052448.tar.gz
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tar -xzf etc.20191003_052448.tar.gz 
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ tail etc/ssh/sshd_config

# override default of no subsystems
Subsystem	sftp	/usr/libexec/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#	X11Forwarding no
#	AllowTcpForwarding no
#	PermitTTY no
#	ForceCommand cvs server
user@disp5653:~/Downloads/root/backups/sync/daily_osedev1_20191003_052448/etc$ 
  1. I also copied the cron job and the backup report script to the dev node
[root@opensourceecology ~]# cat /etc/cron.d/backup_to_backblaze 
20 07 * * * root time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
20 04 03 * * root time /bin/nice /root/backups/backupReport.sh
[root@opensourceecology ~]# 
  1. I tried testing the backup report script, but it complained that the `mail` command was absent; otherwise it appears to work without modifications
[root@osedev1 backups]# ./backupReport.sh 
./backupReport.sh: line 90: /usr/bin/mail: No such file or directory
INFO: email body below
ATTENTION: BACKUPS MISSING!


WARNING: First of this month's backup (20191001) is missing!
WARNING: First of last month's backup (20190901) is missing!
WARNING: Yesterday's backup (20191002) is missing!
WARNING: The day before yesterday's backup (20191001) is missing!

See below for the contents of the backblaze b2 bucket = ose-dev-server-backups

daily_osedev1_20191003_052448.tar.gpg
---
Note: This report was generated on 20191003_060036 UTC by script '/root/backups/backupReport.sh'
	  This script was triggered by '/etc/cron.d/backup_to_backblaze'

	  For more information about OSE backups, please see the relevant documentation pages on the wiki:
	   * https://wiki.opensourceecology.org/wiki/Backblaze
	   * https://wiki.opensourceecology.org/wiki/OSE_Server#Backups

[root@osedev1 backups]# 
  1. I installed mailx and re-ran the script
[root@osedev1 backups]# yum install mailx
...
Installed:
  mailx.x86_64 0:12.5-19.el7                                                                                                   

Complete!
[root@osedev1 backups]# 
  1. this time it failed because sendmail is not installed; I *could* install postfix, but I decided just to install sendmail
[root@osedev1 backups]# ./backupReport.sh 
...
 /usr/sbin/sendmail: No such file or directory
"/root/dead.letter" 30/1215
. . . message not sent.
[root@osedev1 backups]# rpm -qa | grep postfix
[root@osedev1 backups]# rpm -qa | grep exim
[root@osedev1 backups]# yum install sendmail
...
Installed:
  sendmail.x86_64 0:8.14.7-5.el7                                                                                               

Dependency Installed:
  hesiod.x86_64 0:3.2.1-3.el7                                 procmail.x86_64 0:3.22-36.el7_4.1                                

Complete!
[root@osedev1 backups]# 
  1. this time it ran without error, but I never got an email. This is probably because gmail is rejecting it; we don't have DNS set up properly for this server to send mail. Anyway, this is good enough for our dev node's backups for now.
  2. I also added the same lifecycle rules that we have for the 'ose-server-backups' bucket to the 'ose-dev-server-backups' bucket in the backblaze b2 wui
  3. let's proceed with getting openvpn clients configured for the prod node (and its clone the staging node, which will use the same client cert)
  4. following the same procedure I used on Sep 9 to create my client cert for 'maltfield', I created a new cert for 'hetzner2' (see Maltfield_Log/2019_Q3#Mon_Sep_09.2C_2019)
  5. again, the ca and cert files are located in /usr/share/easy-rsa/3/pki/
    1. I documented this dir on the wiki's OpenVPN page
  6. interestingly, I could only execute this command from the dir above the pki dir; running it from inside the pki dir made easyrsa look for the pki at pki/pki
[root@osedev1 pki]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2

Easy-RSA error:

EASYRSA_PKI does not exist (perhaps you need to run init-pki)?
Expected to find the EASYRSA_PKI at: /usr/share/easy-rsa/3/pki/pki
Run easyrsa without commands for usage and command help.

[root@osedev1 pki]#
[root@osedev1 pki]# cd ..
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
.......................................................................+++
............................................+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.7F3A32KzES'
Enter PEM pass phrase:
  1. note I appended the option 'nopass' so that the hetzner2 prod server can connect to the vpn automatically, using only its private certificate file and no password (it may be a good idea to look into whitelisting a specific IP for this user, since this hetzner2 client will only ever connect from the prod or staging server's static ip addresses)
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa help build-client-full

  build-client-full <filename_base> [ cmd-opts ]
  build-server-full <filename_base> [ cmd-opts ]
  build-serverClient-full <filename_base> [ cmd-opts ]
	  Generate a keypair and sign locally for a client and/or server

	  This mode uses the <filename_base> as the X509 CN.

	  cmd-opts is an optional set of command options from this list:

		nopass  - do not encrypt the private key (default is encrypted)
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full hetzner2 nopass

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
..................................................................................................+++
.....+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/hetzner2.key.qQ1HGf7ovg'
-----
Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf
Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'hetzner2'
Certificate is to be certified until Sep 17 06:42:28 2022 GMT (1080 days)

Write out database with 1 new entries
Data Base Updated
[root@osedev1 3]# 
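  1. a quick way to sanity-check an issued cert's CN and validity window; this sketch generates a throwaway key+cert in /tmp so it's self-contained — on osedev1 you'd point the second command at pki/issued/hetzner2.crt instead

```shell
# Sketch: make a throwaway 2048-bit key + self-signed cert with CN=hetzner2,
# then inspect its subject and validity dates -- the same inspection you'd
# run against /usr/share/easy-rsa/3/pki/issued/hetzner2.crt
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=hetzner2" \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 1080 2>/dev/null
openssl x509 -in /tmp/demo.crt -noout -subject -dates
```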
  1. I copied the necessary files to the prod server
[root@osedev1 3]# cp pki/private/hetzner2.key /home/maltfield/
[root@osedev1 3]# cp pki/issued/hetzner2.crt /home/maltfield/
[root@osedev1 3]# cp pki/private/ta.key /home/maltfield/
[root@osedev1 3]# cp pki/ca.crt /home/maltfield/
[root@osedev1 3]# chown maltfield /home/maltfield/*.cert
[root@osedev1 3]# chown maltfield /home/maltfield/*.key
[root@osedev1 3]# logout
[maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/hetzner2* opensourceecology.org:
hetzner2.crt                                                                                 100% 5675     2.8MB/s   00:00    
hetzner2.key                                                                                 100% 1708     1.0MB/s   00:00    
[maltfield@osedev1 ~]$ scp -P32415 /home/maltfield/*.key opensourceecology.org:
hetzner2.key                                                                                 100% 1708     1.0MB/s   00:00    
ta.key                                                                                       100%  636   368.9KB/s   00:00    
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key
[maltfield@osedev1 ~]$ shred -u /home/maltfield/hetzner2.*
[maltfield@osedev1 ~]$ 
  1. and I moved them to '/root/openvpn' and locked-down the files on the prod hetzner2 server
[root@opensourceecology maltfield]# cd /root
[root@opensourceecology ~]# ls
backups  bin  iptables  output.json  rsyncTest  sandbox  staging.opensourceecology.org  tmp
[root@opensourceecology ~]# mkdir openvpn
[root@opensourceecology ~]# cd openvpn
[root@opensourceecology openvpn]# mv /home/maltfield/hetzner2* .
[root@opensourceecology openvpn]# mv /home/maltfield/*.key .
[root@opensourceecology openvpn]# mv /home/maltfield/ca.crt .
[root@opensourceecology openvpn]# ls -lah
total 28K
drwxr-xr-x   2 root      root      4.0K Oct  3 06:53 .
dr-xr-x---. 20 root      root      4.0K Oct  3 06:53 ..
-rw-------   1 maltfield maltfield 3.3K Oct  3 06:51 ca.crt
-rw-------   1 maltfield maltfield 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 maltfield maltfield 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 maltfield maltfield  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# chown root:root *
[root@opensourceecology openvpn]# ls -lah
total 28K
drwxr-xr-x   2 root root 4.0K Oct  3 06:53 .
dr-xr-x---. 20 root root 4.0K Oct  3 06:53 ..
-rw-------   1 root root 3.3K Oct  3 06:51 ca.crt
-rw-------   1 root root 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 root root 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 root root  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# chmod 0700 .
[root@opensourceecology openvpn]# ls -lah
total 28K
drwx------   2 root root 4.0K Oct  3 06:53 .
dr-xr-x---. 20 root root 4.0K Oct  3 06:53 ..
-rw-------   1 root root 3.3K Oct  3 06:51 ca.crt
-rw-------   1 root root 5.6K Oct  3 06:51 hetzner2.crt
-rw-------   1 root root 1.7K Oct  3 06:51 hetzner2.key
-rw-------   1 root root  636 Oct  3 06:51 ta.key
[root@opensourceecology openvpn]# 
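  1. the lock-down above can be verified by checking the octal modes; a throwaway sketch (paths are hypothetical, not the real /root/openvpn dir)

```shell
# Sketch: reproduce the lock-down on a throwaway dir and confirm the modes --
# 0700 on the secrets dir, 0600 on the key material inside it
d=$(mktemp -d)
touch "$d/demo.key"
chmod 0700 "$d"
chmod 0600 "$d/demo.key"
stat -c '%a' "$d" "$d/demo.key"
```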
  1. then I created a client.conf file from my personal client.conf file & modified it to use the new cert & key files
[root@opensourceecology openvpn]# vim client.conf
[root@opensourceecology openvpn]# ls -lah client.conf 
-rw-r--r-- 1 root root 3.6K Oct  3 06:56 client.conf
[root@opensourceecology openvpn]# chmod 0600 client.conf 
[root@opensourceecology openvpn]# cat client.conf 
##############################################
# Sample client-side OpenVPN 2.0 config file #
# for connecting to multi-client server.     #
#                                            #
# This configuration can be used by multiple #
# clients, however each client should have   #
# its own cert and key files.                #
#                                            #
# On Windows, you might want to rename this  #
# file so it has a .ovpn extension           #
##############################################

# Specify that we are a client and that we
# will be pulling certain config file directives
# from the server.
client

# Use the same setting as you are using on
# the server.
# On most systems, the VPN will not function
# unless you partially or fully disable
# the firewall for the TUN/TAP interface.
;dev tap
dev tun

# Windows needs the TAP-Win32 adapter name
# from the Network Connections panel
# if you have more than one.  On XP SP2,
# you may need to disable the firewall
# for the TAP adapter.
;dev-node MyTap

# Are we connecting to a TCP or
# UDP server?  Use the same setting as
# on the server.
;proto tcp
proto udp

# The hostname/IP and port of the server.
# You can have multiple remote entries
# to load balance between the servers.
remote 195.201.233.113 1194
;remote my-server-2 1194

# Choose a random host from the remote
# list for load-balancing.  Otherwise
# try hosts in the order specified.
;remote-random

# Keep trying indefinitely to resolve the
# host name of the OpenVPN server.  Very useful
# on machines which are not permanently connected
# to the internet such as laptops.
resolv-retry infinite

# Most clients don't need to bind to
# a specific local port number.
nobind

# Downgrade privileges after initialization (non-Windows only)
;user nobody
;group nobody

# Try to preserve some state across restarts.
persist-key
persist-tun

# If you are connecting through an
# HTTP proxy to reach the actual OpenVPN
# server, put the proxy server/IP and
# port number here.  See the man page
# if your proxy server requires
# authentication.
;http-proxy-retry # retry on connection failures
;http-proxy [proxy server] [proxy port #]

# Wireless networks often produce a lot
# of duplicate packets.  Set this flag
# to silence duplicate packet warnings.
;mute-replay-warnings

# SSL/TLS parms.
# See the server config file for more
# description.  It's best to use
# a separate .crt/.key file pair
# for each client.  A single ca
# file can be used for all clients.
ca ca.crt
cert hetzner2.crt
key hetzner2.key

# Verify server certificate by checking that the
# certicate has the correct key usage set.
# This is an important precaution to protect against
# a potential attack discussed here:
#  http://openvpn.net/howto.html#mitm
#
# To use this feature, you will need to generate
# your server certificates with the keyUsage set to
#   digitalSignature, keyEncipherment
# and the extendedKeyUsage to
#   serverAuth
# EasyRSA can do this for you.
remote-cert-tls server

# If a tls-auth key is used on the server
# then every client must also have the key.
tls-auth ta.key 1

# Select a cryptographic cipher.
# If the cipher option is used on the server
# then you must also specify it here.
# Note that v2.4 client/server will automatically
# negotiate AES-256-GCM in TLS mode.
# See also the ncp-cipher option in the manpage
cipher AES-256-GCM

# Enable compression on the VPN link.
# Don't enable this unless it is also
# enabled in the server config file.
#comp-lzo

# Set log file verbosity.
verb 3

# Silence repeating messages
;mute 20

# hardening
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384
[root@opensourceecology openvpn]# 
  1. I installed the 'openvpn' package on the production hetzner2 server
[root@opensourceecology openvpn]# yum install openvpn
...
Installed:
  openvpn.x86_64 0:2.4.7-1.el7                                                                           

Dependency Installed:
  lz4.x86_64 0:1.7.5-3.el7                       pkcs11-helper.x86_64 0:1.11-3.el7                      

Complete!
[root@opensourceecology openvpn]# 
  1. I successfully connected to the vpn on the dev node from the prod node
[root@opensourceecology openvpn]# openvpn client.conf 
Thu Oct  3 07:06:45 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 07:06:45 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 07:06:45 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:06:45 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:06:45 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:45 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 07:06:45 2019 UDP link local: (not bound)
Thu Oct  3 07:06:45 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:45 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=865b6fa1 7dcf4731
Thu Oct  3 07:06:45 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 07:06:45 2019 VERIFY KU OK
Thu Oct  3 07:06:45 2019 Validating certificate extended key usage
Thu Oct  3 07:06:45 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 07:06:45 2019 VERIFY EKU OK
Thu Oct  3 07:06:45 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 07:06:45 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 07:06:45 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 07:06:46 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 07:06:46 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 07:06:46 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 07:06:46 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:06:46 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:06:46 2019 ROUTE_GATEWAY 138.201.84.193
Thu Oct  3 07:06:46 2019 TUN/TAP device tun0 opened
Thu Oct  3 07:06:46 2019 TUN/TAP TX queue length set to 100
Thu Oct  3 07:06:46 2019 /sbin/ip link set dev tun0 up mtu 1500
Thu Oct  3 07:06:46 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Thu Oct  3 07:06:46 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9
Thu Oct  3 07:06:46 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Thu Oct  3 07:06:46 2019 Initialization Sequence Completed
  1. the prod server now has a tun0 interface with an ip address of 10.241.189.10 on the VPN private network subnet
[root@opensourceecology ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
	link/ether 90:1b:0e:94:07:c4 brd ff:ff:ff:ff:ff:ff
	inet 138.201.84.223 peer 138.201.84.193/32 brd 138.201.84.223 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.223/32 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.243/16 scope global eth0
	   valid_lft forever preferred_lft forever
	inet 138.201.84.243 peer 138.201.84.193/32 brd 138.201.255.255 scope global secondary eth0
	   valid_lft forever preferred_lft forever
	inet6 2a01:4f8:172:209e::2/64 scope global 
	   valid_lft forever preferred_lft forever
	inet6 fe80::921b:eff:fe94:7c4/64 scope link 
	   valid_lft forever preferred_lft forever
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology ~]# 
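  1. the tun0 local/peer pair in the 'ip a' output comes straight from the server's PUSH_REPLY; a small shell sketch of that mapping (the PUSH_REPLY string is copied from the connection log above)

```shell
# Sketch: extract the 'ifconfig <local> <peer>' option that OpenVPN applied
# to tun0 (topology net30 hands each client a local/peer address pair)
push_reply='PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
ifconfig_opt=$(printf '%s\n' "$push_reply" | tr ',' '\n' | grep '^ifconfig')
local_ip=$(printf '%s\n' "$ifconfig_opt" | awk '{print $2}')
peer_ip=$(printf '%s\n' "$ifconfig_opt" | awk '{print $3}')
echo "tun0 local=$local_ip peer=$peer_ip"   # matches 'inet 10.241.189.10 peer 10.241.189.9/32'
```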
  1. I confirmed that the website didn't break ☺
  2. now I created the same dir on the staging node (note this weird systemd journal corruption error that slowed things down quite a bit)
[root@osedev1 ~]# lxc-start -n osestaging1
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to CentOS Linux 7 (Core)!
...
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

osestaging1 login: maltfield
Password:
Last login: Wed Oct  2 13:01:56 on lxc/console
[maltfield@osestaging1 ~]$ sudo su -
[sudo] password for maltfield:

<44>systemd-journald[297]: File /run/log/journal/dd9978e8797e4112832634fa4d174c7b/system.journal corrupted or uncleanly shut down, renaming and replacing.
Last login: Wed Oct  2 13:15:46 UTC 2019 on lxc/console
Last failed login: Thu Oct  3 07:11:57 UTC 2019 on lxc/console
There was 1 failed login attempt since the last successful login.
[root@osestaging1 ~]# 
  1. on the dev node again
[root@osedev1 pki]# cp private/hetzner2.key /home/maltfield/
[root@osedev1 pki]# cp issued/hetzner2.crt /home/maltfield/
[root@osedev1 pki]# cp private/ta.key /home/maltfield/
[root@osedev1 pki]# chown maltfield /home/maltfield/*.key
[root@osedev1 pki]# chown maltfield /home/maltfield/*.crt
[root@osedev1 pki]# logout
[maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.key 192.168.122.201:
hetzner2.key                                                                       100% 1708     2.4MB/s   00:00    
ta.key                                                                             100%  636     1.2MB/s   00:00    
[maltfield@osedev1 ~]$ scp -P 32415 /home/maltfield/*.crt 192.168.122.201:
ca.crt                                                                             100% 1850     2.6MB/s   00:00    
hetzner2.crt                                                                       100% 5675     9.0MB/s   00:00    
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.key
[maltfield@osedev1 ~]$ shred -u /home/maltfield/*.crt
[maltfield@osedev1 ~]$ 
  1. and back on the staging container node
[root@osestaging1 ~]# cd /root/openvpn 
[root@osestaging1 openvpn]# ls 
[root@osestaging1 openvpn]# mv /home/maltfield/*.crt .
[root@osestaging1 openvpn]# mv /home/maltfield/*.key .
[root@osestaging1 openvpn]# ls -lah
total 28K
drwxr-xr-x. 2 root      root      4.0K Oct  3 07:23 .
dr-xr-x---. 3 root      root      4.0K Oct  3 07:18 ..
-rw-------. 1 maltfield maltfield 1.9K Oct  3 07:21 ca.crt
-rw-------. 1 maltfield maltfield 5.6K Oct  3 07:21 hetzner2.crt
-rw-------. 1 maltfield maltfield 1.7K Oct  3 07:21 hetzner2.key
-rw-------. 1 maltfield maltfield  636 Oct  3 07:21 ta.key
[root@osestaging1 openvpn]# chown root:root *
[root@osestaging1 openvpn]# chmod 0700 .
[root@osestaging1 openvpn]# ls -lah
total 28K
drwx------. 2 root root 4.0K Oct  3 07:23 .
dr-xr-x---. 3 root root 4.0K Oct  3 07:18 ..
-rw-------. 1 root root 1.9K Oct  3 07:21 ca.crt
-rw-------. 1 root root 5.6K Oct  3 07:21 hetzner2.crt
-rw-------. 1 root root 1.7K Oct  3 07:21 hetzner2.key
-rw-------. 1 root root  636 Oct  3 07:21 ta.key
[root@osestaging1 openvpn]# 
  1. I also installed vim, epel-release, and openvpn on the staging node
  2. I had an issue connecting to the vpn from within the staging node; this appears to be a known issue when connecting to a vpn from within a docker or lxc container https://serverfault.com/questions/429461/no-tun-device-in-lxc-guest-for-openvpn
[root@osestaging1 openvpn]# openvpn client.conf 
Thu Oct  3 07:29:17 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 07:29:17 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 07:29:17 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:29:17 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 07:29:17 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:17 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 07:29:17 2019 UDP link local: (not bound)
Thu Oct  3 07:29:17 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:17 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=f2e8fcad efdb9311
Thu Oct  3 07:29:17 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 07:29:17 2019 VERIFY KU OK
Thu Oct  3 07:29:17 2019 Validating certificate extended key usage
Thu Oct  3 07:29:17 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 07:29:17 2019 VERIFY EKU OK
Thu Oct  3 07:29:17 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 07:29:17 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 07:29:17 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 07:29:18 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 07:29:18 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 07:29:18 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 07:29:18 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:29:18 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 07:29:18 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 07:29:18 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Thu Oct  3 07:29:18 2019 Exiting due to fatal error
[root@osestaging1 openvpn]#
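  1. for reference, the usual container-side fix for a missing /dev/net/tun lives in the container's LXC config on the host; the path and exact keys below are assumptions for our LXC version, not something I had applied at this point

```
# /var/lib/lxc/osestaging1/config (assumed path on osedev1)
# allow the container to use the tun character device (major 10, minor 200)
lxc.cgroup.devices.allow = c 10:200 rwm
# bind /dev/net from the host so /dev/net/tun exists inside the container
lxc.mount.entry = /dev/net dev/net none bind,create=dir
```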
  1. the above link suggests following the arch linux guide to create an openvpn client systemd unit within the container
[root@osestaging1 openvpn]# ls /usr/lib/systemd/system/openvpn-client\@.service
/usr/lib/systemd/system/openvpn-client@.service
[root@osestaging1 openvpn]# ls /etc/systemd/system/
basic.target.wants  default.target.wants  local-fs.target.wants    sysinit.target.wants
default.target      getty.target.wants    multi-user.target.wants  system-update.target.wants
[root@osestaging1 openvpn]# cp /usr/lib/systemd/system/openvpn-client\@.service /etc/systemd/system/
[root@osestaging1 openvpn]# grep /etc/systemd/system/openvpn-client\@.service LimitNPROC
grep: LimitNPROC: No such file or directory
[root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service
LimitNPROC=10
[root@osestaging1 openvpn]# vim /etc/systemd/system/openvpn-client\@.service
[root@osestaging1 openvpn]# grep LimitNPROC /etc/systemd/system/openvpn-client\@.service
#LimitNPROC=10
[root@osestaging1 openvpn]# 
  1. that didn't work; systemd wants an instance name after the '@'. I added one, then realized I'd also need to modify the unit to point it at the correct config file
[root@osestaging1 openvpn]# cd /etc/systemd/system
[root@osestaging1 system]# ls
basic.target.wants    getty.target.wants       openvpn-client@.service
default.target        local-fs.target.wants    sysinit.target.wants
default.target.wants  multi-user.target.wants  system-update.target.wants
[root@osestaging1 system]# mv openvpn-client\@.service openvpn-client\@dev.service 
[root@osestaging1 system]# systemctl status openvpn-client\@dev.service 
● openvpn-client@dev.service - OpenVPN tunnel for dev
   Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
[root@osestaging1 system]# systemctl start openvpn-client\@dev.service 
Job for openvpn-client@dev.service failed because the control process exited with error code. See "systemctl status openvpn-client@dev.service" and "journalctl -xe" for details.
[root@osestaging1 system]# systemctl status openvpn-client\@dev.service 
● openvpn-client@dev.service - OpenVPN tunnel for dev
   Loaded: loaded (/etc/systemd/system/openvpn-client@dev.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 07:44:09 UTC; 16s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 557 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf (code=exited, status=1/FAILURE)
 Main PID: 557 (code=exited, status=1/FAILURE)

Oct 03 07:44:08 osestaging1 systemd[1]: Starting OpenVPN tunnel for dev...
Oct 03 07:44:09 osestaging1 openvpn[557]: Options error: In [CMD-LINE]:1: Error opening configuration file: dev.conf
Oct 03 07:44:09 osestaging1 openvpn[557]: Use --help for more information.
Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service: main process exited, code=exited, status=...ILURE
Oct 03 07:44:09 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for dev.
Oct 03 07:44:09 osestaging1 systemd[1]: Unit openvpn-client@dev.service entered failed state.
Oct 03 07:44:09 osestaging1 systemd[1]: openvpn-client@dev.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@osestaging1 system]# vim openvpn-client\@dev.service 
  1. I updated the working dir and renamed the service so that its instance name matches the config file in that dir
[root@osestaging1 system]# cat openvpn-client\@dev.service
[Unit]
Description=OpenVPN tunnel for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
WorkingDirectory=/etc/openvpn/client
ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE
#LimitNPROC=10
DeviceAllow=/dev/null rw   
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process

[Install]
WantedBy=multi-user.target 
[root@osestaging1 system]# vim openvpn-client\@dev.service
[root@osestaging1 system]# mv openvpn-client\@dev.service openvpn-client\@client.service 
[root@osestaging1 system]# cat openvpn-client\@client.service 
[Unit]
Description=OpenVPN tunnel for %I
After=syslog.target network-online.target
Wants=network-online.target
Documentation=man:openvpn(8)
Documentation=https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
Documentation=https://community.openvpn.net/openvpn/wiki/HOWTO

[Service]
Type=notify
PrivateTmp=true
#WorkingDirectory=/etc/openvpn/client
WorkingDirectory=/root/openvpn
ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config %i.conf
CapabilityBoundingSet=CAP_IPC_LOCK CAP_NET_ADMIN CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SYS_CHROOT CAP_DAC_OVERRIDE
#LimitNPROC=10
DeviceAllow=/dev/null rw
DeviceAllow=/dev/net/tun rw
ProtectSystem=true
ProtectHome=true
KillMode=process

[Install]
WantedBy=multi-user.target
[root@osestaging1 system]# 
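  1. the naming rule that tripped me up: systemd turns the text after '@' into %i, which the ExecStart line expands to '<instance>.conf' inside WorkingDirectory; a pure-shell sketch of that expansion

```shell
# Sketch: mimic how 'openvpn-client@client.service' resolves its config file.
# ExecStart has '--config %i.conf' and WorkingDirectory=/root/openvpn, so the
# instance name 'client' must match client.conf in that dir.
unit='openvpn-client@client.service'
instance=${unit#*@}            # strip everything up to and including '@'
instance=${instance%.service}  # strip the '.service' suffix -> 'client'
conf="${instance}.conf"
echo "$conf"                   # -> client.conf
```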
  1. this failed; I gave up on the systemd unit and manually created the tun device node per the guide, even though a commenter claimed this would no longer work; it worked!
[root@osestaging1 openvpn]# openvpn client.conf
Thu Oct  3 08:02:50 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20
 2019
Thu Oct  3 08:02:50 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 08:02:50 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:02:50 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:02:50 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:50 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:02:50 2019 UDP link local: (not bound)
Thu Oct  3 08:02:50 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:50 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=10846fe0 74bf0345
Thu Oct  3 08:02:50 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:02:50 2019 VERIFY KU OK
Thu Oct  3 08:02:50 2019 Validating certificate extended key usage
Thu Oct  3 08:02:50 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:02:50 2019 VERIFY EKU OK
Thu Oct  3 08:02:50 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:02:50 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:02:50 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:02:51 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:02:51 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 0,cipher AES-256-GCM'
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 08:02:51 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 08:02:51 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:02:51 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:02:51 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 08:02:51 2019 ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Thu Oct  3 08:02:51 2019 Exiting due to fatal error
[root@osestaging1 openvpn]# mkdir /dev/net
[root@osestaging1 openvpn]# mknod /dev/net/tun c 10 200
[root@osestaging1 openvpn]# chmod 666 /dev/net/tun
[root@osestaging1 openvpn]# openvpn client.conf
Thu Oct  3 08:03:42 2019 OpenVPN 2.4.7 x86_64-redhat-linux-gnu [Fedora EPEL patched] [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
Thu Oct  3 08:03:42 2019 library versions: OpenSSL 1.0.2k-fips  26 Jan 2017, LZO 2.06
Thu Oct  3 08:03:42 2019 Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:03:42 2019 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Thu Oct  3 08:03:42 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:42 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:03:42 2019 UDP link local: (not bound)
Thu Oct  3 08:03:42 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:42 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=dcadaef9 7ebea8f1
Thu Oct  3 08:03:42 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:03:42 2019 VERIFY KU OK
Thu Oct  3 08:03:42 2019 Validating certificate extended key usage
Thu Oct  3 08:03:42 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:03:42 2019 VERIFY EKU OK
Thu Oct  3 08:03:42 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:03:42 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:03:42 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:03:43 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:48 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:03:59 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:04 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:09 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:15 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:20 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:25 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:30 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:35 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:41 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:46 2019 No reply from server after sending 12 push requests
Thu Oct  3 08:04:46 2019 SIGUSR1[soft,no-push-reply] received, process restarting
Thu Oct  3 08:04:46 2019 Restart pause, 5 second(s)
Thu Oct  3 08:04:51 2019 TCP/UDP: Preserving recently used remote address: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:51 2019 Socket Buffers: R=[212992->212992] S=[212992->212992]
Thu Oct  3 08:04:51 2019 UDP link local: (not bound)
Thu Oct  3 08:04:51 2019 UDP link remote: [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:51 2019 TLS: Initial packet from [AF_INET]195.201.233.113:1194, sid=c3f6bcfa 04f701bb
Thu Oct  3 08:04:51 2019 VERIFY OK: depth=1, CN=osedev1
Thu Oct  3 08:04:51 2019 VERIFY KU OK
Thu Oct  3 08:04:51 2019 Validating certificate extended key usage
Thu Oct  3 08:04:51 2019 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Thu Oct  3 08:04:51 2019 VERIFY EKU OK
Thu Oct  3 08:04:51 2019 VERIFY OK: depth=0, CN=server
Thu Oct  3 08:04:51 2019 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 DHE-RSA-AES256-GCM-SHA384, 4096 bit RSA
Thu Oct  3 08:04:51 2019 [server] Peer Connection Initiated with [AF_INET]195.201.233.113:1194
Thu Oct  3 08:04:53 2019 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1)
Thu Oct  3 08:04:53 2019 PUSH: Received control message: 'PUSH_REPLY,route 10.241.189.1,topology net30,ping 10,ping-restart 120,ifconfig 10.241.189.10 10.241.189.9,peer-id 1,cipher AES-256-GCM'
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: timers and/or timeouts modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: --ifconfig/up options modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: route options modified
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: peer-id set
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: adjusting link_mtu to 1624
Thu Oct  3 08:04:53 2019 OPTIONS IMPORT: data channel crypto options modified
Thu Oct  3 08:04:53 2019 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:04:53 2019 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Thu Oct  3 08:04:53 2019 ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Thu Oct  3 08:04:53 2019 TUN/TAP device tun0 opened
Thu Oct  3 08:04:53 2019 TUN/TAP TX queue length set to 100
Thu Oct  3 08:04:53 2019 /sbin/ip link set dev tun0 up mtu 1500
Thu Oct  3 08:04:53 2019 /sbin/ip addr add dev tun0 local 10.241.189.10 peer 10.241.189.9
Thu Oct  3 08:04:53 2019 /sbin/ip route add 10.241.189.1/32 via 10.241.189.9
Thu Oct  3 08:04:53 2019 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Thu Oct  3 08:04:53 2019 Initialization Sequence Completed
  1. I found that I'd become stuck in an lxc console because its escape sequence uses the same keystroke as screen (ctrl-a). The solution is to define an alternate escape sequence (e.g. ctrl-e) using `-e '^e'` https://serverfault.com/questions/567696/byobu-how-to-disconnect-from-lxc-console
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1

Connected to tty 1
				  Type <Ctrl+e q> to exit the console, <Ctrl+e Ctrl+e> to enter Ctrl+e itself

[root@osedev1 ~]# 
  1. I also had to change the tty to 0 to actually get access
[root@osedev1 ~]# lxc-console -e '^e' -n osestaging1 -t 0
lxc_container: commands.c: lxc_cmd_console: 724 Console 0 invalid, busy or all consoles busy.
																							 [root@osedev1 ~]# 
[root@osedev1 ~]# 
  1. I went ahead and connected to the vpn from 3x clients: my laptop, the staging container, and the prod server
  2. oddly, I noticed that the IP address given to the staging server and the prod server was the same (they do use the same client cert, but I expected them to get distinct IP addresses)
user@ose:~/openvpn$ ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.6 peer 10.241.189.5/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::2ab6:3617:63cc:c654/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
user@ose:~/openvpn$ 
[root@opensourceecology openvpn]# ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::a834:c77a:f65f:76fc/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology openvpn]# 
[root@osestaging1 ~]# ip address show dev tun0
2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.10 peer 10.241.189.9/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::5e8c:3af2:2e6:4aea/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 ~]# 
  1. I noticed a few relevant options in our openvpn server config
    1. by default, I have 'ifconfig-pool-persist ipp.txt' defined, which gives clients the same IP address persistently across server restarts; we appear to be using '/etc/openvpn/ipp.txt' here. The one in the 'server' dir appears to be from earlier, probably from when I started the server manually rather than through systemd. Interestingly, it isn't even right! From above, we see that my 'maltfield' user has '.6' while the 'hetzner2' clients have '.10'. Hmm.
[root@osedev1 server]# grep -iB5 ipp server.conf
# Maintain a record of client <-> virtual IP address
# associations in this file.  If OpenVPN goes down or
# is restarted, reconnecting clients can be assigned
# the same virtual IP address from the pool that was
# previously assigned.
ifconfig-pool-persist ipp.txt
[root@osedev1 server]# find /etc/openvpn | grep -i ipp.txt
/etc/openvpn/server/ipp.txt
/etc/openvpn/ipp.txt
[root@osedev1 server]# cat /etc/openvpn/server/ipp.txt 
maltfield,10.241.189.4
[root@osedev1 server]# cat /etc/openvpn/ipp.txt 
maltfield,10.241.189.4
hetzner2,10.241.189.8
    1. there's also an option that I have commented out whose surrounding comments say it should be uncommented if multiple clients will share the same cert
[root@osedev1 server]# grep -iB5 duplicate server.conf
#
# IF YOU HAVE NOT GENERATED INDIVIDUAL
# CERTIFICATE/KEY PAIRS FOR EACH CLIENT,
# EACH HAVING ITS OWN UNIQUE "COMMON NAME",
# UNCOMMENT THIS LINE OUT.
;duplicate-cn
[root@osedev1 server]# 
  1. I uncommented the above 'duplicate-cn' line and restarted openvpn on the dev node
[root@osedev1 server]# vim server.conf
[root@osedev1 server]# systemctl restart openvpn@server.service
  1. I reconnected to the vpn from the staging & prod servers; they got new IP addresses
[root@opensourceecology openvpn]# ip address show dev tun0
5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 100
	link/none 
	inet 10.241.189.14 peer 10.241.189.13/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::e5fb:f261:801b:1c3d/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@opensourceecology openvpn]# 
[root@osestaging1 openvpn]# ip address show dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.18 peer 10.241.189.17/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::27f3:9643:5530:bd0e/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 openvpn]# 
  1. I confirmed that each client could ping itself, but not the others, so I uncommented the line 'client-to-client' and restarted the openvpn server again
  2. after that, I confirmed that staging could ping prod, prod could ping staging, and my laptop could ping both staging & prod. Cool!
    1. for some reason the servers could still not ping my laptop; maybe that's some complication from my quad-NAT'd QubesOS networking stack flowing through two nested VPN connections. Anyway, that shouldn't be required *shrug*
  3. and, holy shit, I was successfully able to ssh into the staging node from the production node through the private VPN IP
[maltfield@opensourceecology ~]$ ssh -p 32415 10.241.189.18
The authenticity of host '[10.241.189.18]:32415 ([10.241.189.18]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.18]:32415' (ECDSA) to the list of known hosts.
Last login: Thu Oct  3 08:56:23 2019 from gateway
[maltfield@osestaging1 ~]$ 
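A minimal sketch of the 'client-to-client' change described above, run against a stand-in config file so it's reproducible anywhere (the real file lives under /etc/openvpn on osedev1; the stock config ships the directive commented out with a leading ';'):

```shell
# create a stand-in for server.conf; on osedev1 this would be the real config
printf ';client-to-client\n' > server.conf.sample
# strip the leading ';' that marks the directive as commented-out
sed -i 's/^;client-to-client/client-to-client/' server.conf.sample
grep '^client-to-client' server.conf.sample
# on the real dev node, follow up with:
#   systemctl restart openvpn@server.service
```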
  1. but I was unable to ssh into our staging node from my laptop. oddly, it *is* able to establish a TCP connection, but it gets stuck at the key-exchange step
user@ose:~/openvpn$ ssh -vvvvvvp 32415 maltfield@10.241.189.18
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2t  10 Sep 2019
debug1: Reading configuration data /home/user/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "10.241.189.18" port 32415
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 10.241.189.18 [10.241.189.18] port 32415.
debug1: Connection established.
...
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Connection closed by 10.241.189.18 port 32415
user@ose:~/openvpn$ 
  1. ok, I fixed this by removing the second VPN (qubes was configured to use a vpn qube as its NetVM; changing it to 'sys-firewall' resolved the issue)
user@ose:~/openvpn$ ssh -p 32415 maltfield@10.241.189.18
Last login: Thu Oct  3 09:20:50 2019 from 10.241.189.6
[maltfield@osestaging1 ~]$ 
  1. on second thought, I really should have unique static IP addresses for both the prod & staging nodes. to achieve this, they can't share the same cert; I'll just make '/root/openvpn' one of those dirs (like the networking config dirs) that the rsync doesn't change
  2. I commented-out the 'duplicate-cn' line again in the openvpn server config & restarted the openvpn server
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# grep -B5 duplicate-cn server.conf 
#
# IF YOU HAVE NOT GENERATED INDIVIDUAL
# CERTIFICATE/KEY PAIRS FOR EACH CLIENT,
# EACH HAVING ITS OWN UNIQUE "COMMON NAME",
# UNCOMMENT THIS LINE OUT.
;duplicate-cn
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
  1. and I created a distinct cert for 'osestaging1'
[root@osedev1 3]# /usr/share/easy-rsa/3.0.6/easyrsa build-client-full osestaging1 nopass

Using SSL: openssl OpenSSL 1.0.2k-fips  26 Jan 2017
Generating a 2048 bit RSA private key
....+++
...........................+++
writing new private key to '/usr/share/easy-rsa/3/pki/private/osestaging1.key.WsJhUsDCny'
-----
Using configuration from /usr/share/easy-rsa/3/pki/safessl-easyrsa.cnf
Enter pass phrase for /usr/share/easy-rsa/3/pki/private/ca.key:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'osestaging1'
Certificate is to be certified until Sep 17 10:34:03 2022 GMT (1080 days)

Write out database with 1 new entries
Data Base Updated
[root@osedev1 3]# cp pki/private/osestaging1.key /home/maltfield/
[root@osedev1 3]# cp pki/private/ta.key /home/maltfield/
[root@osedev1 3]# cp pki/issued/osestaging1.crt /home/maltfield/
[root@osedev1 3]# cp pki/ca.crt /home/maltfield/
[root@osedev1 3]# chown maltfield /home/maltfield/*.key
[root@osedev1 3]# chown maltfield /home/maltfield/*.crt
[root@osedev1 3]# logout
  1. and on the staging server
[root@osestaging1 ~]# cd /root/openvpn/
[root@osestaging1 openvpn]# mv /home/maltfield/*.key .
mv: overwrite './ta.key'? y
[root@osestaging1 openvpn]# mv /home/maltfield/*.crt .
mv: overwrite './ca.crt'? y
[root@osestaging1 openvpn]# ls
ca.crt       hetzner2.crt  osestaging1.crt  ta.key
client.conf  hetzner2.key  osestaging1.key
[root@osestaging1 openvpn]# shred -u hetzner2.*
[root@osestaging1 openvpn]# ls -lah
total 32K
drwx------. 2 root      root      4.0K Oct  3 10:40 .
dr-xr-x---. 4 root      root      4.0K Oct  3 07:59 ..
-rw-------. 1 maltfield maltfield 1.9K Oct  3 10:36 ca.crt
-rw-r--r--. 1 root      root      3.6K Oct  3 07:27 client.conf
-rw-------. 1 maltfield maltfield 5.6K Oct  3 10:36 osestaging1.crt
-rw-------. 1 maltfield maltfield 1.7K Oct  3 10:36 osestaging1.key
-rw-------. 1 maltfield maltfield  636 Oct  3 10:36 ta.key
[root@osestaging1 openvpn]# chown root:root *.crt
[root@osestaging1 openvpn]# chown root:root *.key
[root@osestaging1 openvpn]# chmod 0600 client.conf 
[root@osestaging1 openvpn]# ls -lah
total 32K
drwx------. 2 root root 4.0K Oct  3 10:40 .
dr-xr-x---. 4 root root 4.0K Oct  3 07:59 ..
-rw-------. 1 root root 1.9K Oct  3 10:36 ca.crt
-rw-------. 1 root root 3.6K Oct  3 07:27 client.conf
-rw-------. 1 root root 5.6K Oct  3 10:36 osestaging1.crt
-rw-------. 1 root root 1.7K Oct  3 10:36 osestaging1.key
-rw-------. 1 root root  636 Oct  3 10:36 ta.key
[root@osestaging1 openvpn]# vim client.conf
  1. I decided to assign the following static IPs
    1. 10.241.189.10 hetzner2 (prod)
    2. 10.241.189.11 osestaging1
  2. I did this by uncommenting the line 'client-config-dir ccd', creating a client-specific config file in the '/etc/openvpn/ccd/' dir whose name matches the CN (Common Name) on the client cert, and restarting the openvpn server service
[root@osedev1 openvpn]# vim server.conf
[root@osedev1 openvpn]# grep -Ei '^client-config-dir ccd' server.conf
client-config-dir ccd
[root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.11 255.255.255.255" > ccd/osestaging1
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# 
  1. I did the same for prod
[root@osedev1 openvpn]# echo "ifconfig-push 10.241.189.10 255.255.255.255" > ccd/hetzner2
[root@osedev1 openvpn]# systemctl restart openvpn@server.service
[root@osedev1 openvpn]# 
  1. now that it's static, I can update my ssh config to make connecting to the staging node easy after connecting to the vpn from my laptop
user@ose:~/openvpn$ vim ~/.ssh/config
user@ose:~/openvpn$ head -n21 ~/.ssh/config
# OSE
Host openbuildinginstitute.org *.openbuildinginstitute.org opensourceecology.org *.opensourceecology.org
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osedev1
	HostName 195.201.233.113
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

Host osestaging1
	HostName 10.241.189.11
	Port 32415
	ForwardAgent yes
	IdentityFile /home/user/.ssh/id_rsa.ose
	User maltfield

user@ose:~/openvpn$ ssh osestaging1
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
Last login: Thu Oct  3 10:42:40 2019 from 10.241.189.10
[maltfield@osestaging1 ~]$ 
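One way to sanity-check an ssh_config stanza like the one above without actually connecting is ssh's -G flag (OpenSSH >= 6.8), which prints the options it would resolve for a host alias. The sample filename here is illustrative:

```shell
# write a throwaway config so the check is reproducible anywhere
cat > ssh_config.sample <<'EOF'
Host osestaging1
    HostName 10.241.189.11
    Port 32415
    User maltfield
EOF
# -F points at our sample config (and skips /etc/ssh/ssh_config);
# -G evaluates the config for the alias without connecting
ssh -G -F ssh_config.sample osestaging1 | grep -E '^(hostname|port|user) '
```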
  1. another issue remains: we need the staging node to connect to the vpn on startup, but I can't get the fucking systemd unit to work
[root@osestaging1 system]# systemctl start openvpn-client\@client.service 
Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details.
[root@osestaging1 system]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 12:34:56 UTC; 8s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 1295 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR)
 Main PID: 1295 (code=exited, status=200/CHDIR)

Oct 03 12:34:56 osestaging1 systemd[1]: Starting OpenVPN tunnel for client...
Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct 03 12:34:56 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client.
Oct 03 12:34:56 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state.
Oct 03 12:34:56 osestaging1 systemd[1]: openvpn-client@client.service failed.
[root@osestaging1 system]# tail -n 7 /var/log/messages 
Oct  3 12:29:29 localhost systemd: openvpn-client@client.service failed.
Oct  3 12:34:56 localhost systemd: Starting OpenVPN tunnel for client...
Oct  3 12:34:56 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory
Oct  3 12:34:56 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct  3 12:34:56 localhost systemd: Failed to start OpenVPN tunnel for client.
Oct  3 12:34:56 localhost systemd: Unit openvpn-client@client.service entered failed state.
Oct  3 12:34:56 localhost systemd: openvpn-client@client.service failed.
[root@osestaging1 system]# 
  1. the /usr/sbin/openvpn file definitely exists; the 'status=200/CHDIR' in the logs suggests systemd failed to chdir into the unit's WorkingDirectory before exec'ing openvpn, not that the binary is missing
  2. I gave the osestaging1 container a reboot
  3. after a reboot, osestaging1 now says that the openvpn-client@client.service doesn't exist!
[maltfield@osestaging1 ~]$ systemctl start openvpn-client\@client.service
Failed to start openvpn-client@client.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
See system logs and 'systemctl status openvpn-client@client.service' for details.
[maltfield@osestaging1 ~]$ systemctl list-unit-files | grep -i vpn
openvpn-client@.service                disabled
openvpn-client@client.service          disabled
openvpn-server@.service                disabled
openvpn@.service                       disabled
[maltfield@osestaging1 ~]$ 
  1. attempting to enable it fails
[maltfield@osestaging1 ~]$ systemctl enable /etc/systemd/system/openvpn-client\@client.service 
Failed to execute operation: The name org.freedesktop.PolicyKit1 was not provided by any .service files
[maltfield@osestaging1 ~]$ 
  1. oh, duh, I wasn't root
[root@osestaging1 ~]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
[root@osestaging1 ~]# systemctl start openvpn-client\@client.service 
Job for openvpn-client@client.service failed because the control process exited with error code. See "systemctl status openvpn-client@client.service" and "journalctl -xe" for details.
[root@osestaging1 ~]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-10-03 12:52:39 UTC; 7s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
  Process: 379 ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf (code=exited, status=200/CHDIR)
 Main PID: 379 (code=exited, status=200/CHDIR)

Oct 03 12:52:38 osestaging1 systemd[1]: Starting OpenVPN tunnel for client...
Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct 03 12:52:39 osestaging1 systemd[1]: Failed to start OpenVPN tunnel for client.
Oct 03 12:52:39 osestaging1 systemd[1]: Unit openvpn-client@client.service entered failed state.
Oct 03 12:52:39 osestaging1 systemd[1]: openvpn-client@client.service failed.
[root@osestaging1 ~]# tail -n 7 /var/log/messages 
Oct  3 12:52:38 localhost systemd: Created slice system-openvpn\x2dclient.slice.
Oct  3 12:52:38 localhost systemd: Starting OpenVPN tunnel for client...
Oct  3 12:52:39 localhost systemd: Failed at step CHDIR spawning /usr/sbin/openvpn: No such file or directory
Oct  3 12:52:39 localhost systemd: openvpn-client@client.service: main process exited, code=exited, status=200/CHDIR
Oct  3 12:52:39 localhost systemd: Failed to start OpenVPN tunnel for client.
Oct  3 12:52:39 localhost systemd: Unit openvpn-client@client.service entered failed state.
Oct  3 12:52:39 localhost systemd: openvpn-client@client.service failed.
[root@osestaging1 ~]#
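The repeated 'status=200/CHDIR' above is systemd saying it couldn't chdir into the unit's WorkingDirectory before exec'ing openvpn (the 'No such file or directory' message misleadingly points at the binary). A minimal way to pull that directory out of a unit file and check it, shown here against a sample unit (filename and contents assumed to match the stock openvpn-client@.service layout):

```shell
# stand-in for /etc/systemd/system/openvpn-client@client.service
cat > unit.sample <<'EOF'
[Service]
WorkingDirectory=/etc/openvpn/client
ExecStart=/usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf
EOF
# extract the directory systemd will chdir into before running ExecStart
wd="$(sed -n 's/^WorkingDirectory=//p' unit.sample)"
echo "unit working directory: $wd"
# on the real container, verify it exists:
#   [ -d "$wd" ] || echo "missing: $wd"
```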
  1. after fighting with this shit for hours, I finally just copied all my files from /root/openvpn into /etc/openvpn/client/ (the unit's WorkingDirectory) and it worked!
[root@osestaging1 system]# cp /root/openvpn/* /etc/openvpn/client
[root@osestaging1 system]# vim openvpn-client\@client.service
...
[root@osestaging1 system]# systemctl daemon-reload
<30>systemd-fstab-generator[425]: Running in a container, ignoring fstab device entry for /dev/root.
[root@osestaging1 system]# systemctl restart openvpn-client\@client.service 
[root@osestaging1 system]# systemctl status openvpn-client\@client.service 
● openvpn-client@client.service - OpenVPN tunnel for client
   Loaded: loaded (/etc/systemd/system/openvpn-client@client.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-03 13:33:32 UTC; 1s ago
	 Docs: man:openvpn(8)
		   https://community.openvpn.net/openvpn/wiki/Openvpn24ManPage
		   https://community.openvpn.net/openvpn/wiki/HOWTO
 Main PID: 432 (openvpn)
   Status: "Initialization Sequence Completed"
   CGroup: /user.slice/user-1000.slice/session-582.scope/system.slice/system-openvpn\x2dclient.slice/openvpn-client@client.service
		   └─432 /usr/sbin/openvpn --suppress-timestamps --nobind --config client.conf

Oct 03 13:33:33 osestaging1 openvpn[432]: Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: Since you are using --dev tun with a point-to-point topology, the second arg...nowarn)
Oct 03 13:33:33 osestaging1 openvpn[432]: ROUTE_GATEWAY 192.168.122.1/255.255.255.0 IFACE=eth0 HWADDR=fe:07:06:a6:5f:1d
Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP device tun0 opened
Oct 03 13:33:33 osestaging1 openvpn[432]: TUN/TAP TX queue length set to 100
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip link set dev tun0 up mtu 1500
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip addr add dev tun0 local 10.241.189.11 peer 255.255.255.255
Oct 03 13:33:33 osestaging1 openvpn[432]: /sbin/ip route add 10.241.189.0/24 via 255.255.255.255
Oct 03 13:33:33 osestaging1 openvpn[432]: WARNING: this configuration may cache passwords in memory -- use the auth-nocache opt...nt this
Oct 03 13:33:33 osestaging1 openvpn[432]: Initialization Sequence Completed
Hint: Some lines were ellipsized, use -l to show in full.
[root@osestaging1 system]# ip address show dev tun0
2: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.11 peer 255.255.255.255/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::927:fae4:1356:9b90/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osestaging1 system]# 
  1. I confirmed that I could ssh into the staging node from my laptop
  2. I rebooted the staging node
  3. I confirmed that I could ssh into the staging node again after the reboot!
  4. I'm not going to bother setting this up on the prod node for now; I'm not in a place where I want to make & test that prod change by rebooting the server.
  5. this is a good stopping point; I created another snapshot of the staging node
[root@osedev1 ~]# lxc-stop -n osestaging1
[root@osedev1 ~]# lxc-snapshot --name osestaging1 --list
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ~]# lxc-snapshot --name osestaging1 afterVPN
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
[root@osedev1 ~]# lxc-snapshot --name osestaging1 --list
snap1 (/var/lib/lxcsnaps/osestaging1) 2019:10:03 15:40:16
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ~]# 
  1. I started the staging container again, and I tested an rsync from prod to staging; first let's see the contents of /etc/varnish on staging
[root@osestaging1 ~]# ls -lah /etc | grep -i varnish
[root@osestaging1 ~]# 
  1. and the rsync failed; right, I need passwordless sudo set up on the staging node
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.10:/etc/
[sudo] password for maltfield: 
The authenticity of host '[10.241.189.10]:32415 ([10.241.189.10]:32415)' can't be established.
ECDSA key fingerprint is SHA256:HclF8ZQOjGqx+9TmwL111kZ7QxgKkoEw8g3l2YxV0gk.
ECDSA key fingerprint is MD5:cd:87:b1:bb:c1:3e:d1:d1:d4:5d:16:c9:e8:30:6a:71.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.10]:32415' (ECDSA) to the list of known hosts.
sudo: no tty present and no askpass program specified
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
[maltfield@opensourceecology ~]$ 
  1. I added this line to the end of the sudoers file on the staging node with 'visudo'
maltfield       ALL=(ALL)       NOPASSWD: ALL
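Before installing a sudoers line like that, it's worth validating the syntax, since a broken sudoers file can lock you out of sudo entirely. A sketch using a scratch fragment file (the filename is an assumption; 'visudo -cf' is the standard check mode):

```shell
# write the fragment to a scratch file
cat > sudoers.fragment <<'EOF'
maltfield       ALL=(ALL)       NOPASSWD: ALL
EOF
grep 'NOPASSWD' sudoers.fragment
# on the staging node, check the syntax before installing it:
#   visudo -cf sudoers.fragment
```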
  1. doh, I gotta install rsync on the staging node. so many prereqs...
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/
The authenticity of host '[10.241.189.11]:32415 ([10.241.189.11]:32415)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[10.241.189.11]:32415' (ECDSA) to the list of known hosts.
sudo: rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
[maltfield@opensourceecology ~]$ 
  1. this time the rsync worked!
[maltfield@opensourceecology ~]$ sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" -av --progress /etc/varnish maltfield@10.241.189.11:/etc/
...
sent 192211 bytes  received 503 bytes  128476.00 bytes/sec
total size is 190106  speedup is 0.99
[maltfield@opensourceecology ~]$ 
  1. here's the dir on staging node's side
[root@osestaging1 ~]# ls -lah /etc/varnish 
total 44K
drwxr-xr-x.  5 root root 4.0K Aug 27 06:19 .
drwxr-xr-x. 63 root root 4.0K Oct  3 13:52 ..
-rw-r--r--.  1 root root 1.4K Apr  9 19:10 all-vhosts.vcl
-rw-r--r--.  1 root root  697 Nov 19  2017 catch-all.vcl
drwxr-xr-x.  2 root root 4.0K Aug 27 06:17 conf
-rw-rw-r--.  1 1011 1011  737 Nov 23  2017 default.vcl
drwxr-xr-x.  2 root root 4.0K Apr 12  2018 lib
-rw-------.  1 root root  129 Apr 12  2018 secret
-rw-------.  1 root root  129 Apr 12  2018 secret.20180412.bak
drwxr-xr-x.  2 root root 4.0K Aug 27 06:18 sites-enabled
-rw-r--r--.  1 root root 1.1K Oct 21  2017 varnish.params
[root@osestaging1 ~]# 
  1. again, here are the dirs we want to exclude; the openvpn configs are already preserved
	 /root
	/etc/sudo*
	/etc/openvpn
	/usr/share/easy-rsa
	/dev
	/sys
	/proc
	/boot/
	/etc/sysconfig/network*
	/tmp
	/var/tmp
	/etc/fstab
	/etc/mtab
	/etc/mdadm.conf
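That exclude list could also live in a file passed to rsync via --exclude-from, which keeps the (long) command line readable; the filename below is an assumption:

```shell
# persist the exclude list from above into a file
cat > /tmp/rsync-excludes.txt <<'EOF'
/root
/etc/sudo*
/etc/openvpn
/usr/share/easy-rsa
/dev
/sys
/proc
/boot/
/etc/sysconfig/network*
/tmp
/var/tmp
/etc/fstab
/etc/mtab
/etc/mdadm.conf
EOF
grep -c '' /tmp/rsync-excludes.txt   # count of exclude patterns (14)
# usage on prod (sketch):
#   sudo rsync -e 'ssh -p 32415' --rsync-path='sudo rsync' \
#     --exclude-from=/tmp/rsync-excludes.txt -av --progress / maltfield@10.241.189.11:/
```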
  1. aaaand *fingers crossed* I kicked off the rsync
[maltfield@opensourceecology ~]$ time sudo -E rsync -e 'ssh -p 32415' --rsync-path="sudo rsync" --exclude=/root --exclude=/etc/sudo* --exclude=/etc/openvpn --exclude=/usr/share/easy-rsa --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/boot/ --exclude=/etc/sysconfig/network* --exclude=/tmp --exclude=/var/tmp --exclude=/etc/fstab --exclude=/etc/mtab --exclude=/etc/mdadm.conf -av --progress / maltfield@10.241.189.11:/
...
  1. whoops, I got ahead of myself! I killed it & left the staging server in a broken state, so I restored from snapshot & re-did the visudo & rsync-install steps. But before we actually kick off this whole-system rsync, I need to attach a hetzner cloud volume and mount it at /var. Otherwise the dev node's little disk will fill up!
[root@osedev1 ~]# lxc-snapshot --name osestaging1 -r snap1
[root@osedev1 ~]# lxc-start -n osestaging1

=Wed Oct 02, 2019=

  1. continuing on the dev node, I want to create an lxc container. First I installed 'lxc'
[root@osedev1 ~]# yum install lxc
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                                                                              |  27 kB  00:00:00
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
base                                                                                                              | 3.6 kB  00:00:00
epel                                                                                                              | 5.3 kB  00:00:00
extras                                                                                                            | 2.9 kB  00:00:00
updates                                                                                                           | 2.9 kB  00:00:00
(1/6): base/7/x86_64/group_gz                                                                                     | 165 kB  00:00:00
(2/6): base/7/x86_64/primary_db                                                                                   | 6.0 MB  00:00:00
(3/6): epel/x86_64/updateinfo                                                                                     | 1.0 MB  00:00:00
(4/6): updates/7/x86_64/primary_db                                                                                | 1.1 MB  00:00:00
(5/6): epel/x86_64/primary_db                                                                                     | 6.8 MB  00:00:00
(6/6): extras/7/x86_64/primary_db                                                                                 | 152 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package lxc.x86_64 0:1.0.11-2.el7 will be installed
--> Processing Dependency: lua-lxc(x86-64) = 1.0.11-2.el7 for package: lxc-1.0.11-2.el7.x86_64
--> Processing Dependency: lua-alt-getopt for package: lxc-1.0.11-2.el7.x86_64
--> Processing Dependency: liblxc.so.1()(64bit) for package: lxc-1.0.11-2.el7.x86_64
--> Running transaction check
---> Package lua-alt-getopt.noarch 0:0.7.0-4.el7 will be installed
---> Package lua-lxc.x86_64 0:1.0.11-2.el7 will be installed
--> Processing Dependency: lua-filesystem for package: lua-lxc-1.0.11-2.el7.x86_64
---> Package lxc-libs.x86_64 0:1.0.11-2.el7 will be installed
--> Running transaction check
---> Package lua-filesystem.x86_64 0:1.6.2-2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================================================
 Package                              Arch                         Version                              Repository                  Size
=========================================================================================================================================
Installing:
 lxc                                  x86_64                       1.0.11-2.el7                         epel                       140 k
Installing for dependencies:
 lua-alt-getopt                       noarch                       0.7.0-4.el7                          epel                       7.4 k
 lua-filesystem                       x86_64                       1.6.2-2.el7                          epel                        28 k
 lua-lxc                              x86_64                       1.0.11-2.el7                         epel                        17 k
 lxc-libs                             x86_64                       1.0.11-2.el7                         epel                       276 k

Transaction Summary
=========================================================================================================================================
Install  1 Package (+4 Dependent packages)

Total download size: 468 k
Installed size: 1.0 M
Is this ok [y/d/N]: y
Downloading packages:
(1/5): lua-alt-getopt-0.7.0-4.el7.noarch.rpm                                                                      | 7.4 kB  00:00:00
(2/5): lua-filesystem-1.6.2-2.el7.x86_64.rpm                                                                      |  28 kB  00:00:00
(3/5): lua-lxc-1.0.11-2.el7.x86_64.rpm                                                                            |  17 kB  00:00:00
(4/5): lxc-1.0.11-2.el7.x86_64.rpm                                                                                | 140 kB  00:00:00
(5/5): lxc-libs-1.0.11-2.el7.x86_64.rpm                                                                           | 276 kB  00:00:00
-----------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                    717 kB/s | 468 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : lxc-libs-1.0.11-2.el7.x86_64                                                                                          1/5
  Installing : lua-filesystem-1.6.2-2.el7.x86_64                                                                                     2/5
  Installing : lua-lxc-1.0.11-2.el7.x86_64                                                                                           3/5
  Installing : lua-alt-getopt-0.7.0-4.el7.noarch                                                                                     4/5
  Installing : lxc-1.0.11-2.el7.x86_64                                                                                               5/5
  Verifying  : lua-lxc-1.0.11-2.el7.x86_64                                                                                           1/5
  Verifying  : lua-alt-getopt-0.7.0-4.el7.noarch                                                                                     2/5
  Verifying  : lxc-1.0.11-2.el7.x86_64                                                                                               3/5
  Verifying  : lua-filesystem-1.6.2-2.el7.x86_64                                                                                     4/5
  Verifying  : lxc-libs-1.0.11-2.el7.x86_64                                                                                          5/5

Installed:
  lxc.x86_64 0:1.0.11-2.el7

Dependency Installed:
  lua-alt-getopt.noarch 0:0.7.0-4.el7 lua-filesystem.x86_64 0:1.6.2-2.el7 lua-lxc.x86_64 0:1.0.11-2.el7 lxc-libs.x86_64 0:1.0.11-2.el7

Complete!
[root@osedev1 ~]#
  1. by default, it appears that we have no lxc templates installed; the templates directory is empty
[root@osedev1 ~]# ls -lah /usr/share/lxc/templates/
total 8.0K
drwxr-xr-x. 2 root root 4.0K Mar  7  2019 .
drwxr-xr-x. 6 root root 4.0K Oct  2 12:16 ..
[root@osedev1 ~]# 
  1. I installed the 'lxc-templates' package (also from epel), and it gave me templates for many distros, including centos
[root@osedev1 ~]# yum -y install lxc-templates
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
Resolving Dependencies
--> Running transaction check
---> Package lxc-templates.x86_64 0:1.0.11-2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=========================================================================================================================================
 Package                             Arch                         Version                               Repository                  Size
=========================================================================================================================================
Installing:
 lxc-templates                       x86_64                       1.0.11-2.el7                          epel                        81 k

Transaction Summary
=========================================================================================================================================
Install  1 Package

Total download size: 81 k
Installed size: 333 k
Downloading packages:
lxc-templates-1.0.11-2.el7.x86_64.rpm                                                                             |  81 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : lxc-templates-1.0.11-2.el7.x86_64                                                                                     1/1
  Verifying  : lxc-templates-1.0.11-2.el7.x86_64                                                                                     1/1

Installed:
  lxc-templates.x86_64 0:1.0.11-2.el7

Complete!
[root@osedev1 ~]# ls -lah /usr/share/lxc/templates/
total 348K
drwxr-xr-x. 2 root root 4.0K Oct  2 12:29 .
drwxr-xr-x. 6 root root 4.0K Oct  2 12:16 ..
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-alpine
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-altlinux
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-archlinux
-rwxr-xr-x. 1 root root 9.5K Mar  7  2019 lxc-busybox
-rwxr-xr-x. 1 root root  30K Mar  7  2019 lxc-centos
-rwxr-xr-x. 1 root root  11K Mar  7  2019 lxc-cirros
-rwxr-xr-x. 1 root root  18K Mar  7  2019 lxc-debian
-rwxr-xr-x. 1 root root  18K Mar  7  2019 lxc-download
-rwxr-xr-x. 1 root root  49K Mar  7  2019 lxc-fedora
-rwxr-xr-x. 1 root root  28K Mar  7  2019 lxc-gentoo
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-openmandriva
-rwxr-xr-x. 1 root root  14K Mar  7  2019 lxc-opensuse
-rwxr-xr-x. 1 root root  35K Mar  7  2019 lxc-oracle
-rwxr-xr-x. 1 root root  12K Mar  7  2019 lxc-plamo
-rwxr-xr-x. 1 root root 6.7K Mar  7  2019 lxc-sshd
-rwxr-xr-x. 1 root root  24K Mar  7  2019 lxc-ubuntu
-rwxr-xr-x. 1 root root  12K Mar  7  2019 lxc-ubuntu-cloud
[root@osedev1 ~]# 
  1. now I was successfully able to create an lxc container for our staging node named 'osestaging1' from the template 'centos'. I didn't specify the version, but it does appear to be centos7
[root@osedev1 ~]# lxc-create -n osestaging1 -t centos
Host CPE ID from /etc/os-release: cpe:/o:centos:centos:7
Checking cache download in /var/cache/lxc/centos/x86_64/7/rootfs ...
Downloading CentOS minimal ...
...
Download complete.
Copy /var/cache/lxc/centos/x86_64/7/rootfs to /var/lib/lxc/osestaging1/rootfs ... 
Copying rootfs to /var/lib/lxc/osestaging1/rootfs ...
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/init/tty.conf: No such file or directory
Storing root password in '/var/lib/lxc/osestaging1/tmp_root_pass'
Expiring password for user root.
passwd: Success
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.sysinit: No such file or directory
sed: can't read /var/lib/lxc/osestaging1/rootfs/etc/rc.d/rc.sysinit: No such file or directory

Container rootfs and config have been created.
Edit the config file to check/enable networking setup.

The temporary root password is stored in:

		'/var/lib/lxc/osestaging1/tmp_root_pass'


The root password is set up as expired and will require it to be changed
at first login, which you should do as soon as possible.  If you lose the
root password or wish to change it without starting the container, you
can change it from the host by running the following command (which will
also reset the expired flag):

		chroot /var/lib/lxc/osestaging1/rootfs passwd

[root@osedev1 ~]# 
  1. the sync from prod to staging is going to overwrite the staging root password, so I won't bother creating & setting a distinct root password for this staging container
  2. `lxc-top` shows that we have 0 containers running
[root@osedev1 ~]# lxc-top

Container            CPU      CPU      CPU      BlkIO        Mem
Name                Used      Sys     User      Total       Used
TOTAL (0 )          0.00     0.00     0.00    0.00       0.00   
  1. I tried to start the staging container, but I got a networking error
[root@osedev1 ~]# lxc-start -n osestaging1
lxc-start: conf.c: instantiate_veth: 3115 failed to attach 'vethWX1L1G' to the bridge 'virbr0': No such device
lxc-start: conf.c: lxc_create_network: 3407 failed to create netdev
lxc-start: start.c: lxc_spawn: 875 failed to create the network
lxc-start: start.c: __lxc_start: 1149 failed to spawn 'osestaging1'
lxc-start: lxc_start.c: main: 336 The container failed to start.
lxc-start: lxc_start.c: main: 340 Additional information can be obtained by setting the --logfile and --logpriority options.
[root@osedev1 ~]# 
  1. it looks like there is no 'virbr0' device; we only have the loopback, ethernet, and tun device for openvpn
[root@osedev1 ~]# ip -all address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 56775sec preferred_lft 56775sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global 
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link 
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none 
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800 
	   valid_lft forever preferred_lft forever
[root@osedev1 ~]# 
  1. Ideally, the container would not be given an internet-facing ip address, anyway. It would be better to give it a bridge on the tun0 openvpn network
  2. it looks like the relevant files for containers are in /var/lib/lxc/<containerName>/
[root@osedev1 osestaging1]# date
Wed Oct  2 12:47:07 CEST 2019
[root@osedev1 osestaging1]# pwd
/var/lib/lxc/osestaging1
[root@osedev1 osestaging1]# ls
config  rootfs  tmp_root_pass
[root@osedev1 osestaging1]# 
  1. here is the default config
[root@osedev1 osestaging1]# cat config 
# Template used to create this container: /usr/share/lxc/templates/lxc-centos
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.hwaddr = fe:07:06:a6:5f:1d
lxc.rootfs = /var/lib/lxc/osestaging1/rootfs

# Include common configuration
lxc.include = /usr/share/lxc/config/centos.common.conf

lxc.arch = x86_64
lxc.utsname = osestaging1

lxc.autodev = 1

# When using LXC with apparmor, uncomment the next line to run unconfined:
#lxc.aa_profile = unconfined

# example simple networking setup, uncomment to enable
#lxc.network.type = veth
#lxc.network.flags = up
#lxc.network.link = lxcbr0
#lxc.network.name = eth0
# Additional example for veth network type
#    static MAC address,
#lxc.network.hwaddr = 00:16:3e:77:52:20
#    persistent veth device name on host side
#        Note: This may potentially collide with other containers of same name!
#lxc.network.veth.pair = v-osestaging1-e0

[root@osedev1 osestaging1]# 
  1. to my horror, I discovered that iptables was disabled on the dev server! why!?!
[root@osedev1 osestaging1]# iptables-save
[root@osedev1 osestaging1]# ip6tables-save
[root@osedev1 osestaging1]# service iptables status
Redirecting to /bin/systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@osedev1 osestaging1]# service iptables start
Redirecting to /bin/systemctl start iptables.service
[root@osedev1 osestaging1]# iptables-save
# Generated by iptables-save v1.4.21 on Wed Oct  2 12:58:21 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [17:1396]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 32415 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 1194 -j ACCEPT
-A INPUT -j DROP
COMMIT
# Completed on Wed Oct  2 12:58:21 2019
[root@osedev1 osestaging1]# ip6tables-save
[root@osedev1 osestaging1]# service ip6tables start
Redirecting to /bin/systemctl start ip6tables.service
[root@osedev1 osestaging1]# ip6tables-save
# Generated by ip6tables-save v1.4.21 on Wed Oct  2 12:59:51 2019
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
# Completed on Wed Oct  2 12:59:51 2019
[root@osedev1 osestaging1]# 
  1. systemd says that both iptables.service & ip6tables.service are 'loaded active exited'
[root@osedev1 osestaging1]# systemctl list-units | grep -Ei 'iptables|ip6tables'
ip6tables.service                                                                           loaded active exited    IPv6 firewall with ip6tables
iptables.service                                                                            loaded active exited    IPv4 firewall with iptables
[root@osedev1 osestaging1]# 
  1. systemd status shows both services are 'disabled'
[root@osedev1 osestaging1]# systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 7min ago
  Process: 29121 ExecStart=/usr/libexec/iptables/iptables.init start (code=exited, status=0/SUCCESS)
 Main PID: 29121 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables...
Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [  OK  ]
Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables.
[root@osedev1 osestaging1]# systemctl status ip6tables.service
● ip6tables.service - IPv6 firewall with ip6tables
   Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; disabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 6min ago
  Process: 29233 ExecStart=/usr/libexec/iptables/ip6tables.init start (code=exited, status=0/SUCCESS)
 Main PID: 29233 (code=exited, status=0/SUCCESS)

Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables...
Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [  OK  ]
Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables.
[root@osedev1 osestaging1]# 
  1. I enabled both, and I confirmed that they're now set to 'enabled' (see second line)
[root@osedev1 osestaging1]# systemctl enable iptables.service
Created symlink from /etc/systemd/system/basic.target.wants/iptables.service to /usr/lib/systemd/system/iptables.service.
[root@osedev1 osestaging1]# systemctl enable ip6tables.service
Created symlink from /etc/systemd/system/basic.target.wants/ip6tables.service to /usr/lib/systemd/system/ip6tables.service.
[root@osedev1 osestaging1]# systemctl status iptables.service
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:58:17 CEST; 8min ago
 Main PID: 29121 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Oct 02 12:58:17 osedev1 systemd[1]: Starting IPv4 firewall with iptables...
Oct 02 12:58:17 osedev1 iptables.init[29121]: iptables: Applying firewall rules: [  OK  ]
Oct 02 12:58:17 osedev1 systemd[1]: Started IPv4 firewall with iptables.
[root@osedev1 osestaging1]# systemctl status ip6tables.service
● ip6tables.service - IPv6 firewall with ip6tables
   Loaded: loaded (/usr/lib/systemd/system/ip6tables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2019-10-02 12:59:46 CEST; 7min ago
 Main PID: 29233 (code=exited, status=0/SUCCESS)

Oct 02 12:59:46 osedev1 systemd[1]: Starting IPv6 firewall with ip6tables...
Oct 02 12:59:46 osedev1 ip6tables.init[29233]: ip6tables: Applying firewall rules: [  OK  ]
Oct 02 12:59:46 osedev1 systemd[1]: Started IPv6 firewall with ip6tables.
[root@osedev1 osestaging1]# 
  1. actually, it doesn't make sense for the staging server to have an ip address only on the openvpn subnet; if that were the case, then it couldn't access the internet...which would make developing a POC nearly impossible. We want to prevent forwarding ports from the internet to the machine, but we do want to let it reach OUT to the internet. Perhaps we should set up the bridge per normal and then just run the openvpn client on the staging server. Indeed, we'll need the prod server to run an openvpn client too, so we should be able to just duplicate this config (they'll be the same anyway!)
  2. I looked into what options are available for 'lxc.network.type', which is listed in section 5 of the man page for 'lxc.container.conf' = `man 5 lxc.container.conf`
	   lxc.network.type
			  specify what kind of network virtualization to be used for the container. Each time a lxc.network.type field is found a
			  new round of network configuration begins. In this way, several network virtualization types can be specified  for  the
			  same  container,  as well as assigning several network interfaces for one container. The different virtualization types
			  can be:

			  none: will cause the container to share the host's network namespace. This means the host network devices are usable in
			  the  container.  It  also  means  that  if both the container and host have upstart as init, 'halt' in a container (for
			  instance) will shut down the host.

			  empty: will create only the loopback interface.

			  veth: a virtual ethernet pair device is created with one side assigned to the container and the other side attached  to
			  a  bridge  specified by the lxc.network.link option.  If the bridge is not specified, then the veth pair device will be
			  created but not attached to any bridge.  Otherwise, the bridge has to be created on the system before starting the con‐
			  tainer.   lxc  won't handle any configuration outside of the container.  By default, lxc chooses a name for the network
			  device belonging to the outside of the container, but if you wish to handle this name yourselves, you can tell  lxc  to
			  set  a  specific  name  with  the lxc.network.veth.pair option (except for unprivileged containers where this option is
			  ignored for security reasons).

			  vlan: a vlan interface is linked with the interface specified by the lxc.network.link and assigned  to  the  container.
			  The vlan identifier is specified with the option lxc.network.vlan.id.

			  macvlan:  a  macvlan  interface is linked with the interface specified by the lxc.network.link and assigned to the con‐
			  tainer.  lxc.network.macvlan.mode specifies the mode the macvlan will use to communicate between different  macvlan  on
			  the  same upper device. The accepted modes are private, the device never communicates with any other device on the same
			  upper_dev (default), vepa, the new Virtual Ethernet Port Aggregator (VEPA) mode, it assumes that  the  adjacent  bridge
			  returns  all  frames  where  both  source and destination are local to the macvlan port, i.e. the bridge is set up as a
			  reflective relay. Broadcast frames coming in from the upper_dev get flooded to all macvlan  interfaces  in  VEPA  mode,
			  local  frames  are  not  delivered  locally,  or  bridge, it provides the behavior of a simple bridge between different
			  macvlan interfaces on the same port. Frames from one interface to another one get delivered directly and are  not  sent
			  out  externally.  Broadcast  frames  get flooded to all other bridge ports and to the external interface, but when they
			  come back from a reflective relay, we don't deliver them again. Since we know all the MAC addresses, the macvlan bridge
			  mode does not require learning or STP like the bridge module does.

			  phys: an already existing interface specified by the lxc.network.link is assigned to the container.
    1. we want the container to be able to reach the internet, so that rules out 'empty'
    2. we don't have a spare physical interface on the server for each container, so that rules out 'phys'
    3. I'm unclear on the distinction between macvlan, vlan, veth, and none. Probably we want veth and we need to get the 'virbr0' interface actually working
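Going by the man page excerpt above, a hedged sketch of what the relevant network lines in /var/lib/lxc/osestaging1/config would look like for a working veth setup (the 'virbr0' bridge name comes from the template's default config; the MAC address shown is illustrative, since lxc-create generates a real one):

```
# lxc 1.x network config sketch -- assumes the 'virbr0' bridge exists on the host
lxc.network.type = veth        # veth pair: one end inside the container, one on the host
lxc.network.flags = up         # bring the interface up when the container starts
lxc.network.link = virbr0      # host bridge to attach the host-side veth to
lxc.network.name = eth0        # interface name as seen inside the container
lxc.network.hwaddr = fe:07:06:a6:5f:1d  # illustrative; normally auto-generated
```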
  1. google says our error may be caused by libvirt not being installed
  2. I didn't have libvirt installed, so I did so
[root@osedev1 osestaging1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 50735sec preferred_lft 50735sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800
	   valid_lft forever preferred_lft forever
[root@osedev1 osestaging1]# rpm -qa | grep -i libvirt
[root@osedev1 osestaging1]# yum -y install libvirt
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de   
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu 
Resolving Dependencies
...
Complete!
[root@osedev1 osestaging1]# 
  1. but there didn't appear to be any changes until I manually started the libvirtd service; now `ip a` shows two new interfaces: 'virbr0' & 'virbr0-nic'
[root@osedev1 osestaging1]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
	 Docs: man:libvirtd(8)
		   https://libvirt.org
[root@osedev1 osestaging1]# systemctl start libvirtd
[root@osedev1 osestaging1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
	inet 195.201.233.113/32 brd 195.201.233.113 scope global dynamic eth0
	   valid_lft 50619sec preferred_lft 50619sec
	inet6 2a01:4f8:c010:3ca0::1/64 scope global
	   valid_lft forever preferred_lft forever
	inet6 fe80::9400:ff:fe2e:489d/64 scope link
	   valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
	link/none
	inet 10.241.189.1 peer 10.241.189.2/32 scope global tun0
	   valid_lft forever preferred_lft forever
	inet6 fe80::4ca6:2d27:e97f:1a66/64 scope link flags 800
	   valid_lft forever preferred_lft forever
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
	inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
	   valid_lft forever preferred_lft forever
7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
[root@osedev1 osestaging1]#
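For context, the 'virbr0' bridge and its 192.168.122.0/24 subnet come from libvirt's built-in 'default' NAT network (viewable with `virsh net-dumpxml default`). A sketch of roughly what that definition looks like -- exact DHCP range and attributes may differ by libvirt version -- which also happens to match our requirement of outbound NAT with no inbound port forwarding:

```xml
<network>
  <name>default</name>
  <forward mode='nat'/>            <!-- outbound NAT; no inbound forwarding -->
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>
```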
  1. and there are some changes to the routing table too
[root@osedev1 osestaging1]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
	link/ether 96:00:00:2e:48:9d brd ff:ff:ff:ff:ff:ff
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
	link/none 
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
7: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
	link/ether 52:54:00:7d:01:71 brd ff:ff:ff:ff:ff:ff
[root@osedev1 osestaging1]# ip r
default via 172.31.1.1 dev eth0 
10.241.189.0/24 via 10.241.189.2 dev tun0 
10.241.189.2 dev tun0 proto kernel scope link src 10.241.189.1 
169.254.0.0/16 dev eth0 scope link metric 1002 
172.31.1.1 dev eth0 scope link 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 
[root@osedev1 osestaging1]# 
  1. now I was successfully able to start the 'osestaging1' container
[root@osedev1 osestaging1]# lxc-start -n osestaging1
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to CentOS Linux 7 (Core)!

Running in a container, ignoring fstab device entry for /dev/root.
Cannot add dependency job for unit display-manager.service, ignoring: Unit not found.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Swap.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Created slice Root Slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Created slice System Slice.
[  OK  ] Created slice system-getty.slice.
		 Starting Journal Service...
		 Mounting POSIX Message Queue File System...
[  OK  ] Reached target Slices.
		 Starting Read and set NIS domainname from /etc/sysconfig/network...
		 Mounting Huge Pages File System...
		 Starting Remount Root and Kernel File Systems...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started Journal Service.
[  OK  ] Started Read and set NIS domainname from /etc/sysconfig/network.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Reached target Local File Systems (Pre).
		 Starting Configure read-only root support...
		 Starting Rebuild Hardware Database...
		 Starting Flush Journal to Persistent Storage...
<46>systemd-journald[14]: Received request to flush runtime journal from PID 1
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Configure read-only root support.
		 Starting Load/Save Random Seed...
[  OK  ] Reached target Local File Systems.
		 Starting Rebuild Journal Catalog...
		 Starting Mark the need to relabel after reboot...
		 Starting Create Volatile Files and Directories...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Rebuild Journal Catalog.
[  OK  ] Started Mark the need to relabel after reboot.
[  OK  ] Started Create Volatile Files and Directories.
		 Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Rebuild Hardware Database.
		 Starting Update is Completed...
[  OK  ] Started Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
		 Starting LSB: Bring up/down networking...
		 Starting Permit User Sessions...
		 Starting Login Service...
		 Starting OpenSSH Server Key Generation...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Started Permit User Sessions.
		 Starting Cleanup of Temporary Directories...
[  OK  ] Started Command Scheduler.
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Cleanup of Temporary Directories.
[  OK  ] Started Login Service.
[  OK  ] Started OpenSSH Server Key Generation.

CentOS Linux 7 (Core)
Kernel 3.10.0-957.21.3.el7.x86_64 on an x86_64

osestaging1 login:
  1. I was successfully able to login as root, but it made me change the password immediately. I just set it to the same root password as our prod server
osestaging1 login: root
Password: 
You are required to change your password immediately (root enforced)
Changing password for root.
(current) UNIX password: 
New password: 
Retype new password: 
[root@osestaging1 ~]# 
  1. this new container has an ip address of '192.168.122.201', and it does have access to the internet
[root@osestaging1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
	link/ether fe:07:06:a6:5f:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
	inet 192.168.122.201/24 brd 192.168.122.255 scope global dynamic eth0
	   valid_lft 3310sec preferred_lft 3310sec
	inet6 fe80::fc07:6ff:fea6:5f1d/64 scope link 
	   valid_lft forever preferred_lft forever
[root@osestaging1 ~]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=5.46 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=5.48 ms

--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 5.468/5.474/5.480/0.006 ms
[root@osestaging1 ~]# 
  1. on the dev node host, we can also see the bridge with `brctl`
[root@osedev1 osestaging1]# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.5254007d0171       yes             vethYMJVGD
														virbr0-nic
[root@osedev1 osestaging1]# 
  1. now I think we're about ready to initiate this sync. Interesting decision: we could rsync (over ssh) either to the dev node or to the staging container. I think it would be safer to go to the container, as you can't fuck up the host dev node in that case.
  2. I confirmed that ssh is listening on the default install of the staging container
[root@osestaging1 ~]# ss -plan | grep -i ssh
u_str  ESTAB      0      0         * 162265                * 0                   users:(("sshd",pid=298,fd=2),("sshd",pid=298,fd=1))
tcp    LISTEN     0      128       *:22                    *:*                   users:(("sshd",pid=298,fd=3))
tcp    LISTEN     0      128    [::]:22                 [::]:*                   users:(("sshd",pid=298,fd=4))
[root@osestaging1 ~]# 
  1. I did some basic bootstrap config of the staging container, following my documentation for doing the same to its host dev server: [[Maltfield_Log/2019_Q3#Tue_Aug_20.2C_2019]]
[root@osestaging1 ~]# useradd maltfield
[root@osestaging1 ~]# su - maltfield
[maltfield@osestaging1 ~]$ mkdir .ssh
[maltfield@osestaging1 ~]$ echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGNYjR7UKiJSAG/AbP+vlCBqNfQZ2yuSXfsEDuM7cEU8PQNJyuJnS7m0VcA48JRnpUpPYYCCB0fqtIEhpP+szpMg2LByfTtbU0vDBjzQD9mEfwZ0mzJsfzh1Nxe86l/d6h6FhxAqK+eG7ljYBElDhF4l2lgcMAl9TiSba0pcqqYBRsvJgQoAjlZOIeVEvM1lyfWfrmDaFK37jdUCBWq8QeJ98qpNDX4A76f9T5Y3q5EuSFkY0fcU+zwFxM71bGGlgmo5YsMMdSsW+89fSG0652/U4sjf4NTHCpuD0UaSPB876NJ7QzeDWtOgyBC4nhPpS8pgjsnl48QZuVm6FNDqbXr9bVk5BdntpBgps+gXdSL2j0/yRRayLXzps1LCdasMCBxCzK+lJYWGalw5dNaIDHBsEZiK55iwPp0W3lU9vXFO4oKNJGFgbhNmn+KAaW82NBwlTHo/tOlj2/VQD9uaK5YLhQqAJzIq0JuWZWFLUC2FJIIG0pJBIonNabANcN+vq+YJqjd+JXNZyTZ0mzuj3OAB/Z5zS6lT9azPfnEjpcOngFs46P7S/1hRIrSWCvZ8kfECpa8W+cTMus4rpCd40d1tVKzJA/n0MGJjEs2q4cK6lC08pXxq9zAyt7PMl94PHse2uzDFhrhh7d0ManxNZE+I5/IPWOnG1PJsDlOe4Yqw== michael@opensourceecology.org" > .ssh/authorized_keys
[maltfield@osestaging1 ~]$ chmod 0700 .ssh
[maltfield@osestaging1 ~]$ chmod 0600 .ssh/authorized_keys
[maltfield@osestaging1 ~]$ 
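  1. as an aside: sshd (with the default StrictModes) refuses keys whose file or directory permissions are too loose, so it's worth sanity-checking the modes set above; a quick sketch (same paths as above, run as the new user):
<pre>
# recreate the expected layout and verify the modes sshd requires
mkdir -p ~/.ssh && chmod 0700 ~/.ssh
touch ~/.ssh/authorized_keys && chmod 0600 ~/.ssh/authorized_keys
stat -c '%a %n' ~/.ssh ~/.ssh/authorized_keys
</pre>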
  1. I confirmed that I could now successfully ssh in as 'maltfield' using my key into staging from within dev
user@ose:~$ ssh -A osedev1
Last login: Wed Oct  2 12:09:35 2019 from 5.254.96.238
[maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname
The authenticity of host '192.168.122.201 (192.168.122.201)' can't be established.
ECDSA key fingerprint is SHA256:a6NpVsq/qdOCV8o7u3TXeVfZIxp7hpgMqXFOifTuNrI.
ECDSA key fingerprint is MD5:ab:eb:7f:f2:bb:83:a1:e5:21:49:1e:22:93:17:70:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.201' (ECDSA) to the list of known hosts.
osestaging1
[maltfield@osedev1 ~]$ 
  1. and continued with the bootstrap of my user, giving myself sudo rights
[root@osestaging1 ~]# yum -y install sudo
...
Installed:
  sudo.x86_64 0:1.8.23-4.el7                                                                                                             

Complete!
[root@osestaging1 ~]# passwd maltfield
Changing password for user maltfield.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@osestaging1 ~]# gpasswd -a maltfield wheel
Adding user maltfield to group wheel
[root@osestaging1 ~]# su - maltfield
Last login: Wed Oct  2 13:00:29 UTC 2019 on lxc/console
[maltfield@osestaging1 ~]$ sudo su -

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

	#1) Respect the privacy of others.
	#2) Think before you type.
	#3) With great power comes great responsibility.

[sudo] password for maltfield: 
Last login: Wed Oct  2 12:33:00 UTC 2019 on lxc/console
[root@osestaging1 ~]# 
  1. this time I took the hardened config from dev and gave it to staging; first on dev I ran:
user@ose:~$ ssh osedev1
Last login: Wed Oct  2 14:57:15 2019 from 5.254.96.238
[maltfield@osedev1 ~]$ sudo cp /etc/ssh/sshd_config .
[maltfield@osedev1 ~]$ sudo chown maltfield sshd_config 
[maltfield@osedev1 ~]$ scp sshd_config 192.168.122.201:
sshd_config                                   100% 4455     5.7MB/s   00:00    
[maltfield@osedev1 ~]$ 
  1. and then in staging
[maltfield@osestaging1 ~]$ ls
sshd_config
[maltfield@osestaging1 ~]$ sudo su -
[sudo] password for maltfield:
Last login: Wed Oct  2 13:02:02 UTC 2019 on lxc/console
[root@osestaging1 ~]# cd /etc/ssh
[root@osestaging1 ssh]# mv sshd_config sshd_config.20191002.orig
[root@osestaging1 ssh]# mv /home/maltfield/sshd_config .
[root@osestaging1 ssh]# ls -lah
total 620K
drwxr-xr-x.  2 root      root      4.0K Oct  2 13:16 .
drwxr-xr-x. 60 root      root      4.0K Oct  2 13:01 ..
-rw-r--r--.  1 root      root      569K Aug  9 01:40 moduli
-rw-r--r--.  1 root      root      2.3K Aug  9 01:40 ssh_config
-rw-r-----.  1 root      ssh_keys   227 Oct  2 12:28 ssh_host_ecdsa_key
-rw-r--r--.  1 root      root       162 Oct  2 12:28 ssh_host_ecdsa_key.pub
-rw-r-----.  1 root      ssh_keys   387 Oct  2 12:28 ssh_host_ed25519_key
-rw-r--r--.  1 root      root        82 Oct  2 12:28 ssh_host_ed25519_key.pub
-rw-r-----.  1 root      ssh_keys  1.7K Oct  2 12:28 ssh_host_rsa_key
-rw-r--r--.  1 root      root       382 Oct  2 12:28 ssh_host_rsa_key.pub
-rw-------.  1 maltfield maltfield 4.4K Oct  2 13:07 sshd_config
-rw-------.  1 root      root      3.9K Aug  9 01:40 sshd_config.20191002.orig
[root@osestaging1 ssh]# chown root:root sshd_config
[root@osestaging1 ssh]# ls -lah
total 620K
drwxr-xr-x.  2 root root     4.0K Oct  2 13:16 .
drwxr-xr-x. 60 root root     4.0K Oct  2 13:01 ..
-rw-r--r--.  1 root root     569K Aug  9 01:40 moduli
-rw-r--r--.  1 root root     2.3K Aug  9 01:40 ssh_config
-rw-r-----.  1 root ssh_keys  227 Oct  2 12:28 ssh_host_ecdsa_key
-rw-r--r--.  1 root root      162 Oct  2 12:28 ssh_host_ecdsa_key.pub
-rw-r-----.  1 root ssh_keys  387 Oct  2 12:28 ssh_host_ed25519_key
-rw-r--r--.  1 root root       82 Oct  2 12:28 ssh_host_ed25519_key.pub
-rw-r-----.  1 root ssh_keys 1.7K Oct  2 12:28 ssh_host_rsa_key
-rw-r--r--.  1 root root      382 Oct  2 12:28 ssh_host_rsa_key.pub
-rw-------.  1 root root     4.4K Oct  2 13:07 sshd_config
-rw-------.  1 root root     3.9K Aug  9 01:40 sshd_config.20191002.orig
[root@osestaging1 ssh]# grep AllowGroups sshd_config
AllowGroups sshaccess
[root@osestaging1 ssh]# grep sshaccess /etc/group
[root@osestaging1 ssh]# groupadd sshaccess
[root@osestaging1 ssh]# gpasswd -a maltfield sshaccess
Adding user maltfield to group sshaccess
[root@osestaging1 ssh]# grep sshaccess /etc/group
sshaccess:x:1001:maltfield
[root@osestaging1 ssh]# systemctl restart sshd
[root@osestaging1 ssh]# 
  1. confirmed that I could still ssh in on the new non-standard port (32415) from dev to staging
user@ose:~$ ssh osedev1
Last login: Wed Oct  2 15:13:21 2019 from 5.254.96.225
[maltfield@osedev1 ~]$ ssh maltfield@192.168.122.201 hostname
ssh: connect to host 192.168.122.201 port 22: Connection refused
[maltfield@osedev1 ~]$ ssh -p 32415 maltfield@192.168.122.201 hostname
osestaging1
[maltfield@osedev1 ~]$ 
  1. I could go on further to setup iptables to block things incoming, but the beauty of the fact that this is a container with a NAT'd private ip address on a host with iptables locked-down on its internet-facing ip address is that we really don't need to do that. It's already inaccessible to the internet, and it will only be accessible from the dev node--onto which our developers will vpn into as a necessary prerequisite to reach this staging node
  2. let's make it so that prod can touch staging; we'll create a cert for openvpn for our prod node, and install it on both our prod & staging nodes. Then we'll update our openvpn config to include the client-to-client option https://openvpn.net/community-resources/how-to/#scope
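  1. note-to-self: the server-side half of that change is tiny; a hypothetical excerpt (path assumed, not yet applied) of what we'd add to the OpenVPN server config on osedev1:
<pre>
# /etc/openvpn/server.conf on osedev1 (hypothetical excerpt)
# let connected clients (e.g. prod and developer laptops) reach each
# other through the tun interface, not just the server itself
client-to-client
</pre>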
  3. before continuing, it would be wise to create a snapshot of the staging container
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list
No snapshots
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
lxc_container: lxccontainer.c: lxcapi_clone: 2643 error: Original container (osestaging1) is running
lxc_container: lxccontainer.c: lxcapi_snapshot: 2899 clone of /var/lib/lxc:osestaging1 failed
lxc_container: lxc_snapshot.c: do_snapshot: 55 Error creating a snapshot
[root@osedev1 ssh]#
  1. I tried to create a snapshot; it told me that it can't do deltas unless I use overlayfs or aufs (or probably also zfs, btrfs, etc). It then failed because the container was still running, so I stopped it and tried again.
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 afterBootstrap
lxc_container: lxccontainer.c: lxcapi_snapshot: 2891 Snapshot of directory-backed container requested.
lxc_container: lxccontainer.c: lxcapi_snapshot: 2892 Making a copy-clone.  If you do want snapshots, then
lxc_container: lxccontainer.c: lxcapi_snapshot: 2893 please create an aufs or overlayfs clone first, snapshot that
lxc_container: lxccontainer.c: lxcapi_snapshot: 2894 and keep the original container pristine.
[root@osedev1 ssh]# lxc-snapshot --name osestaging1 --list
snap0 (/var/lib/lxcsnaps/osestaging1) 2019:10:02 15:37:58
[root@osedev1 ssh]# 
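  1. per the warnings above, delta snapshots would require an overlayfs- (or aufs-) backed copy of the container first; an untested sketch of how that might look with the lxc 1.x tools (clone name is illustrative):
<pre>
# stop the container, then make an overlayfs-backed clone of it;
# snapshots of the clone would then be copy-on-write deltas
lxc-stop --name osestaging1
lxc-clone -o osestaging1 -n osestaging1-overlay -s -B overlayfs
lxc-snapshot --name osestaging1-overlay
</pre>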
  1. so our container is ~0.5G, and so is its single snapshot; directory-backed snapshots are full copies, not deltas
[root@osedev1 ssh]# du -sh /var/lib/lxcsnaps/*
459M    /var/lib/lxcsnaps/osestaging1
[root@osedev1 ssh]# du -sh /var/lib/lxc/*
459M    /var/lib/lxc/osestaging1
[root@osedev1 ssh]# 
  1. eventually we'll need to mount the external block volume to /var/, especially before the sync from prod
[root@osedev1 ssh]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        19G  2.4G   16G  14% /
devtmpfs        873M     0  873M   0% /dev
tmpfs           896M     0  896M   0% /dev/shm
tmpfs           896M   17M  879M   2% /run
tmpfs           896M     0  896M   0% /sys/fs/cgroup
/dev/sdb        9.8G   37M  9.3G   1% /mnt/HC_Volume_3110278
tmpfs           180M     0  180M   0% /run/user/1000
[root@osedev1 ssh]# 
  1. as for backups, I created new API keys that have access to only the 'ose-dev-server-backups' bucket.
  2. because ransomware is a topic of concern (especially ransomware that deletes your backups), I also noticed that when we create the API key, we can remove the 'deleteFiles' and 'deleteBuckets' capabilities (the cleanup is actually done by the lifecycle rules on Backblaze's side--not our script's logic). Apparently there's no way to edit the capabilities of existing keys, so this would be a non-trivial change.
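  1. if we ever recreate these keys from the CLI instead of the web UI, the same scoping should be expressible with `b2 create-key`; an untested sketch (key name is illustrative, capability names are from my reading of the B2 docs):
<pre>
# create an application key scoped to one bucket, with no delete* capabilities
b2 create-key --bucket ose-dev-server-backups ose-dev-backup-key \
   listBuckets,listFiles,readFiles,writeFiles
</pre>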
  3. I wrote the api key creds to osedev1:/root/scripts/backup.settings
  4. And I created a new 4K encryption key. To make it clearer, I named it 'ose-dev-backups-cron.201910.key'. I added it to the shared OSE keepass db under "backups" (the attached files are under the "Advanced" tab)
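  1. the idea behind that key file is symmetric encryption of the archive before upload; a minimal round-trip sketch of the concept using openssl with throwaway names (our actual backup script's tool and flags may differ; `-md sha256` is used so this also runs on CentOS 7's openssl 1.0.2):
<pre>
# generate a throwaway key file, then encrypt and decrypt a sample file
head -c 64 /dev/urandom | base64 -w0 > /tmp/demo.key
echo "backup payload" > /tmp/demo.txt
openssl enc -aes-256-cbc -md sha256 -salt \
  -pass file:/tmp/demo.key -in /tmp/demo.txt -out /tmp/demo.enc
openssl enc -d -aes-256-cbc -md sha256 \
  -pass file:/tmp/demo.key -in /tmp/demo.enc -out /tmp/demo.out
diff /tmp/demo.txt /tmp/demo.out && echo "round-trip OK"
</pre>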
  5. I also installed the b2 CLI dependencies on the dev node; unfortunately, I hit some issues https://wiki.opensourceecology.org/wiki/Backblaze#Install_CLI
[root@osedev1 backups]# yum install python-virtualenv
...
Installed:
  python-virtualenv.noarch 0:15.1.0-2.el7

Dependency Installed:
  python-devel.x86_64 0:2.7.5-86.el7    python-rpm-macros.noarch 0:3-32.el7  python-srpm-macros.noarch 0:3-32.el7
  python2-rpm-macros.noarch 0:3-32.el7

Dependency Updated:
  python.x86_64 0:2.7.5-86.el7                          python-libs.x86_64 0:2.7.5-86.el7

Complete!
[root@osedev1 backups]# yum install python-setuptools
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.alpix.eu
 * epel: mirror.wiuwiu.de
 * extras: centosmirror.netcup.net
 * updates: mirror.alpix.eu
Package python-setuptools-0.9.8-7.el7.noarch already installed and latest version
Nothing to do
[root@osedev1 backups]# yum install git
...
Installed:
  git.x86_64 0:1.8.3.1-20.el7                                                                                     

Dependency Installed:
  perl-Error.noarch 1:0.17020-2.el7   perl-Git.noarch 0:1.8.3.1-20.el7   perl-TermReadKey.x86_64 0:2.30-20.el7  

Complete!
[root@osedev1 backups]# adduser b2user
[root@osedev1 backups]# sudo su - b2user
[b2user@osedev1 ~]$ mkdir virtualenv
[b2user@osedev1 ~]$ cd virtualenv/
[b2user@osedev1 virtualenv]$ virtualenv .
New python executable in /home/b2user/virtualenv/bin/python
Installing setuptools, pip, wheel...done.
[b2user@osedev1 virtualenv]$ cd ..
[b2user@osedev1 ~]$ mkdir sandbox
[b2user@osedev1 ~]$ cd sandbox/
[b2user@osedev1 sandbox]$ git clone https://github.com/Backblaze/B2_Command_Line_Tool.git
Cloning into 'B2_Command_Line_Tool'...
remote: Enumerating objects: 151, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 7130 (delta 90), reused 102 (delta 55), pack-reused 6979
Receiving objects: 100% (7130/7130), 1.80 MiB | 3.35 MiB/s, done.
Resolving deltas: 100% (5127/5127), done.
[b2user@osedev1 sandbox]$ cd B2_Command_Line_Tool/
[b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install
setuptools 20.2 or later is required. To fix, try running: pip install "setuptools>=20.2"
[b2user@osedev1 B2_Command_Line_Tool]$ 
  1. I hate using pip; it often breaks the OS and installed apps, but I bit my tongue & proceeded (I wouldn't do this on prod)
[root@osedev1 backups]# yum install python3-setuptools
Installed:
  python3-setuptools.noarch 0:39.2.0-10.el7

Dependency Installed:   
  python3.x86_64 0:3.6.8-10.el7      python3-libs.x86_64 0:3.6.8-10.el7      python3-pip.noarch 0:9.0.3-5.el7

Complete!
[root@osedev1 backups]#
[root@osedev1 backups]# pip install "setuptools>=20.2"
-bash: pip: command not found
[root@osedev1 backups]# yum install python-pip
...
Installed:
  python2-pip.noarch 0:8.1.2-10.el7

Complete!
[root@osedev1 backups]# pip install "setuptools>=20.2"
Collecting setuptools>=20.2
  Downloading https://files.pythonhosted.org/packages/b2/86/095d2f7829badc207c893dd4ac767e871f6cd547145df797ea26baea4e2e/setuptools-41.2.0-py2.py3-none-any.whl (576kB)
	100% || 583kB 832kB/s
Installing collected packages: setuptools
  Found existing installation: setuptools 0.9.8
	Uninstalling setuptools-0.9.8:
	  Successfully uninstalled setuptools-0.9.8
Successfully installed setuptools-41.2.0
You are using pip version 8.1.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@osedev1 backups]# pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl (1.4MB)
	100% || 1.4MB 511kB/s
Installing collected packages: pip
  Found existing installation: pip 8.1.2
	Uninstalling pip-8.1.2:
	  Successfully uninstalled pip-8.1.2
Successfully installed pip-19.2.3
[root@osedev1 backups]# 
  1. when it came time to install it, I had to add the '--user' flag
[b2user@osedev1 B2_Command_Line_Tool]$ python setup.py install --user
...
Installed /home/b2user/.local/lib/python2.7/site-packages/python_dateutil-2.8.0-py2.7.egg
Searching for setuptools==41.2.0
Best match: setuptools 41.2.0
Adding setuptools 41.2.0 to easy-install.pth file
Installing easy_install script to /home/b2user/.local/bin
Installing easy_install-3.6 script to /home/b2user/.local/bin

Using /usr/lib/python2.7/site-packages
Finished processing dependencies for b2==1.4.1
[b2user@osedev1 B2_Command_Line_Tool]$ 
[b2user@osedev1 B2_Command_Line_Tool]$ ^C
[b2user@osedev1 B2_Command_Line_Tool]$  ~/.local/bin/b2 version
b2 command line tool, version 1.4.1
[b2user@osedev1 B2_Command_Line_Tool]$
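  1. given that `pip install setuptools` above clobbered the RPM-owned copy, the pattern I'd reach for next time is to keep pip entirely inside the virtualenv; a minimal sketch (venv path is illustrative; installing b2 itself needs network, so that step is only noted):
<pre>
# create an isolated environment; its pip can't touch system site-packages
python3 -m venv /tmp/b2-venv
/tmp/b2-venv/bin/python -m pip --version
# then, inside it:  /tmp/b2-venv/bin/pip install b2   (network required)
</pre>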