Maltfield Log/2020 Q3
My work log from the year 2020 Quarter 3. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.
Mon Jun 15, 2020
1. Rob mentioned he couldn't get into staging
2. I checked and it was down
3. I checked the dev node, and all my screens were gone. Uptime is listed as 13 minutes?!?
4. I launched a new screen and started the staging server with the lxc-start command
screen -S staging lxc-start --name osestaging1
6. I still couldn't get to the staging sites; dns didn't return anything
user@ose:~/openvpn$ echo "nameserver 10.241.189.1" | sudo tee /etc/resolv.conf
nameserver 10.241.189.1
user@ose:~/openvpn$ dig opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> opensourceecology.org
;; global options: +cmd
;; connection timed out; no servers could be reached
user@ose:~/openvpn$
8. looks like dnsmasq isn't set up to start on boot on the dev node; I changed that
[maltfield@osedev1 ~]$ sudo su -
Last login: Mon Jun 15 14:56:22 CEST 2020 on pts/2
[root@osedev1 ~]# systemctl list-unit-files | grep dnsm
dnsmasq.service disabled
[root@osedev1 ~]# systemctl enable dnsmasq.service
Created symlink from /etc/systemd/system/multi-user.target.wants/dnsmasq.service to /usr/lib/systemd/system/dnsmasq.service.
[root@osedev1 ~]# systemctl list-unit-files | grep dnsm
dnsmasq.service enabled
[root@osedev1 ~]# systemctl start dnsmasq.service
[root@osedev1 ~]#
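The check-then-enable sequence above could be wrapped in a small helper so it is repeatable on other nodes. This is only a sketch: the function name `ensure_unit_on_boot` is hypothetical (it is not part of the original session), and it assumes it runs as root on a systemd host. It uses only `systemctl is-enabled`, `is-active`, `enable` and `start`.

```shell
# Hypothetical helper (not from the original log): make sure a systemd unit
# is enabled at boot and currently running. Assumes root on a systemd host.
ensure_unit_on_boot() {
    unit="$1"

    # is-enabled exits non-zero when the unit is disabled
    if ! systemctl is-enabled --quiet "$unit"; then
        echo "enabling $unit"
        systemctl enable "$unit" || return 1
    fi

    # is-active exits non-zero when the unit is not running
    if ! systemctl is-active --quiet "$unit"; then
        echo "starting $unit"
        systemctl start "$unit" || return 1
    fi
}
```

For the case in this log, the call would be `ensure_unit_on_boot dnsmasq.service`.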
10. now dns works
user@ose:~/openvpn$ dig opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> opensourceecology.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62798
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;opensourceecology.org. IN A

;; ANSWER SECTION:
opensourceecology.org. 0 IN A 10.241.189.11

;; Query time: 172 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Mon Jun 15 18:51:58 +0545 2020
;; MSG SIZE rcvd: 66

user@ose:~/openvpn$
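The before/after `dig` checks above can be reduced to a quick pass/fail probe against one specific nameserver. This is a minimal sketch: `resolve_via` and `dns_ok` are hypothetical names, and the 2-second timeout and single try are arbitrary choices. It relies on `dig`'s `+short`, `+time` and `+tries` options and on the `@server` syntax.

```shell
# Hypothetical helper: ask one specific nameserver for an A record and
# print the first answer. With +short, dig prints only the record data;
# diagnostic lines (such as the timeout notice) start with ';' and are dropped.
resolve_via() {
    server="$1"
    name="$2"
    dig +short +time=2 +tries=1 "@$server" "$name" A 2>/dev/null \
        | grep -v '^;' \
        | head -n 1
}

# Report pass/fail for a nameserver; returns non-zero when nothing resolves.
dns_ok() {
    answer="$(resolve_via "$1" "$2" || true)"
    if [ -n "$answer" ]; then
        echo "OK: $2 -> $answer via $1"
    else
        echo "FAIL: no answer for $2 from $1"
        return 1
    fi
}
```

With the values from this log, the probe would be `dns_ok 10.241.189.1 opensourceecology.org`.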
12. The first entry in the journalctl on the dev server starts at June 15 (today) at 01:50:03
[root@osedev1 ~]# journalctl | head
-- Logs begin at Mon 2020-06-15 01:50:03 CEST, end at Mon 2020-06-15 15:07:45 CEST. --
Jun 15 01:50:03 localhost systemd-journal[101]: Runtime journal is using 8.0M (max allowed 89.5M, trying to leave 134.2M free of 887.0M available → current limit 89.5M).
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpuset
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpu
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpuacct
Jun 15 01:50:03 localhost kernel: Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019
Jun 15 01:50:03 localhost kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.21.3.el7.x86_64 root=UUID=1e457b76-5100-4b53-bcdc-667ca122b941 ro crashkernel=auto consoleblank=0 systemd.show_status=true elevator=noop console=tty1 console=ttyS0
Jun 15 01:50:03 localhost kernel: e820: BIOS-provided physical RAM map:
Jun 15 01:50:03 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Jun 15 01:50:03 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[root@osedev1 ~]#
14. the entries preceding that in /var/log/messages aren't very helpful
Jun 15 00:56:34 osedev1 chronyd[674]: System clock wrong by -3.278855 seconds, adjustment started
Jun 15 00:57:39 osedev1 chronyd[674]: System clock wrong by -1.017910 seconds, adjustment started
Jun 15 00:58:44 osedev1 chronyd[674]: System clock wrong by 1.292079 seconds, adjustment started
Jun 15 00:59:49 osedev1 chronyd[674]: System clock wrong by -1.323920 seconds, adjustment started
Jun 15 01:01:01 osedev1 systemd: Created slice User Slice of root.
Jun 15 01:01:01 osedev1 systemd: Started Session 205569 of user root.
Jun 15 01:01:01 osedev1 systemd: Removed slice User Slice of root.
Jun 15 01:01:59 osedev1 chronyd[674]: System clock wrong by 1.104784 seconds, adjustment started
Jun 15 01:02:08 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:02:08 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:02:11 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:03:04 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:03:13 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:03:15 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:04:08 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:04:17 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:04:20 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:05:13 osedev1 chronyd[674]: Selected source 2a02:c207:3003:930::1
Jun 15 01:05:22 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:09:30 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:09:30 osedev1 chronyd[674]: System clock wrong by 1.213375 seconds, adjustment started
Jun 15 01:12:55 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:12:56 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:14:50 osedev1 dnsmasq-dhcp[1346]: DHCPREQUEST(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d
Jun 15 01:14:50 osedev1 dnsmasq-dhcp[1346]: DHCPACK(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d osestaging1
Jun 15 01:16:09 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:16:09 osedev1 chronyd[674]: Selected source 195.201.19.162
Jun 15 01:17:13 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:17:15 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:19:15 osedev1 chronyd[674]: Selected source 2a02:c207:3003:930::1
Jun 15 01:19:22 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:20:19 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:20:26 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:22:28 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:25:42 osedev1 chronyd[674]: Source 176.9.103.244 replaced with 213.209.109.44
Jun 15 01:28:56 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:31:06 osedev1 chronyd[674]: Can't synchronise: no selectable sources
Jun 15 01:34:21 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:36:29 osedev1 chronyd[674]: Can't synchronise: no selectable sources
Jun 15 01:36:50 osedev1 dnsmasq-dhcp[1346]: DHCPREQUEST(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d
Jun 15 01:36:50 osedev1 dnsmasq-dhcp[1346]: DHCPACK(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d osestaging1
Jun 15 01:37:34 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:37:34 osedev1 chronyd[674]: System clock wrong by 1.293791 seconds, adjustment started
Jun 15 01:50:03 osedev1 kernel: Initializing cgroup subsys cpuset
Jun 15 01:50:03 osedev1 kernel: Initializing cgroup subsys cpu
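Finding "the entries preceding the boot" by eye gets tedious in a long file, so a small filter may help. The sketch below prints the last line logged before each boot marker. The function name `last_line_before_boot` is hypothetical, and treating the `Initializing cgroup subsys cpuset` kernel line as the start of a boot is an assumption taken from the excerpt above; other kernels may log a different first line.

```shell
# Hypothetical helper: print the last line logged before each boot in a
# syslog-style file. ASSUMPTION: a boot starts with the kernel line
# "Initializing cgroup subsys cpuset", as in the excerpt above.
last_line_before_boot() {
    awk '
        /kernel: Initializing cgroup subsys cpuset$/ {
            # prev holds the line immediately before the boot marker
            if (prev != "") print prev
        }
        { prev = $0 }
    ' "$1"
}
```

On the dev node this would be run as `last_line_before_boot /var/log/messages`.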
16. meanwhile, I still couldn't get to the sites on staging because nginx had failed on startup (see the bind() error on 10.241.189.11:4443 below), but just a simple nginx restart fixed it *shrug*
[maltfield@osestaging1 ~]$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2020-06-15 12:56:46 UTC; 13min ago
Process: 343 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
Process: 327 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)

Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] the "ssl" directive is deprecated, use the "listen ...de:11
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] the "ssl" directive is deprecated, use the "listen ...de:11
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Jun 15 12:56:46 osestaging1 systemd[1]: nginx.service: control process exited, code=exited status=1
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [emerg] bind() to 10.241.189.11:4443 failed (99: Cannot as...ress)
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: configuration file /etc/nginx/nginx.conf test failed
Jun 15 12:56:46 osestaging1 systemd[1]: Failed to start The nginx HTTP and reverse proxy server.
Jun 15 12:56:46 osestaging1 systemd[1]: Unit nginx.service entered failed state.
Jun 15 12:56:46 osestaging1 systemd[1]: nginx.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
Last login: Sun Jun 14 21:02:44 UTC 2020 on pts/1
[root@osestaging1 ~]# systemctl restart nginx
[root@osestaging1 ~]#
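In the status output above, the `[warn]` lines are noise and the single `[emerg]` line is what actually made the config test fail. A tiny filter can pull that out when the log is longer. This is a sketch only; `nginx_emerg_lines` is a hypothetical name, and it simply matches the literal `[emerg]` tag that nginx prints.

```shell
# Hypothetical filter: keep only nginx's fatal "[emerg]" lines from log
# text on stdin, dropping the "[warn]" lines around them.
# "|| true" keeps the exit status zero when nothing matches.
nginx_emerg_lines() {
    grep -F '[emerg]' || true
}
```

One possible use, following the hint in the output about `-l`, is `systemctl status -l nginx | nginx_emerg_lines`.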
18. Anyway, high availability on dev/staging isn't really a priority, so whatever. If this is our first outage and it was fixable in 20 minutes, it's fine.
19. Rob is unblocked; I asked him to continue on his WordPress upgrade task and let me know if he encounters any other issues