Maltfield Log/2020 Q3

From Open Source Ecology
Revision as of 11:06, 27 September 2020 by Maltfield

My work log from the year 2020 Quarter 3. I intentionally made this verbose to make future admins' work easier when troubleshooting. The more keywords, error messages, etc. that are listed in this log, the more helpful it will be for the future OSE Sysadmin.

See Also

  1. Maltfield_Log
  2. User:Maltfield
  3. Special:Contributions/Maltfield

Mon Jun 15, 2020

1. Rob mentioned he couldn't get into staging
2. I checked and it was down
3. I checked the dev node, and all my screens were gone. Uptime is listed as 13 minutes?!?
4. I launched a new screen and started the staging server with the lxc-start command

screen -S staging lxc-start --name osestaging1


6. I still couldn't get to the staging sites; DNS didn't return anything

user@ose:~/openvpn$ echo "nameserver 10.241.189.1" | sudo tee /etc/resolv.conf
nameserver 10.241.189.1
user@ose:~/openvpn$ dig opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> opensourceecology.org
;; global options: +cmd
;; connection timed out; no servers could be reached
user@ose:~/openvpn$


8. Looks like dnsmasq isn't set up to start on boot on the dev node; I changed that

[maltfield@osedev1 ~]$ sudo su -
Last login: Mon Jun 15 14:56:22 CEST 2020 on pts/2
[root@osedev1 ~]# systemctl list-unit-files | grep dnsm
dnsmasq.service                               disabled
[root@osedev1 ~]# systemctl enable dnsmasq.service
Created symlink from /etc/systemd/system/multi-user.target.wants/dnsmasq.service to /usr/lib/systemd/system/dnsmasq.service.
[root@osedev1 ~]# systemctl list-unit-files | grep dnsm
dnsmasq.service                               enabled
[root@osedev1 ~]# systemctl start dnsmasq.service
[root@osedev1 ~]#
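Since dnsmasq turned out to be disabled at boot, other services may be in the same state. The following is a minimal sketch for auditing that. The function name `not_enabled` is hypothetical, and it reads `systemctl list-unit-files` output on stdin so it can be checked offline.

```shell
# Sketch: report which of the named services are NOT enabled at boot.
# Reads `systemctl list-unit-files` output on stdin (assumption: the
# usual two-column "UNIT FILE  STATE" layout).
# The body runs in a subshell so its variables stay local.
not_enabled() (
  listing="$(cat)"
  for unit in "$@"; do
    state="$(printf '%s\n' "$listing" |
      awk -v u="${unit}.service" '$1 == u { print $2 }')"
    if [ "$state" != "enabled" ]; then
      # Units absent from the listing are reported as "missing".
      printf '%s %s\n' "$unit" "${state:-missing}"
    fi
  done
)
```

On the dev node this could be run as `systemctl list-unit-files --type=service | not_enabled dnsmasq`; no output means every named service is enabled.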


10. Now DNS works

user@ose:~/openvpn$ dig opensourceecology.org

; <<>> DiG 9.10.3-P4-Debian <<>> opensourceecology.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62798
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;opensourceecology.org.		IN	A

;; ANSWER SECTION:
opensourceecology.org.	0	IN	A	10.241.189.11

;; Query time: 172 msec
;; SERVER: 10.241.189.1#53(10.241.189.1)
;; WHEN: Mon Jun 15 18:51:58 +0545 2020
;; MSG SIZE  rcvd: 66

user@ose:~/openvpn$
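The two dig runs in this entry show the two outcomes worth telling apart: a timeout (no DNS server reachable) and a NOERROR answer. As a rough sketch, a small filter can reduce dig's output to one word for use in a check script. The function name `dig_status` is hypothetical.

```shell
# Sketch: summarize `dig` output (read from stdin) as a single word:
# "timeout", the response status (e.g. NOERROR), or "unknown".
dig_status() {
  awk '
    /connection timed out/ { print "timeout"; done = 1; exit }
    /status:/ {
      sub(/.*status: /, "")   # drop everything up to the status value
      sub(/,.*/, "")          # drop the trailing ", id: ..." part
      print; done = 1; exit
    }
    END { if (!done) print "unknown" }
  '
}
```

For example, `dig opensourceecology.org | dig_status` would be expected to print `timeout` while dnsmasq is stopped on the dev node and `NOERROR` once it is running.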


12. The first entry in the journalctl on the dev server starts at June 15 (today) at 01:50:03

[root@osedev1 ~]# journalctl | head
-- Logs begin at Mon 2020-06-15 01:50:03 CEST, end at Mon 2020-06-15 15:07:45 CEST. --
Jun 15 01:50:03 localhost systemd-journal[101]: Runtime journal is using 8.0M (max allowed 89.5M, trying to leave 134.2M free of 887.0M available → current limit 89.5M).
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpuset
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpu
Jun 15 01:50:03 localhost kernel: Initializing cgroup subsys cpuacct
Jun 15 01:50:03 localhost kernel: Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019
Jun 15 01:50:03 localhost kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.21.3.el7.x86_64 root=UUID=1e457b76-5100-4b53-bcdc-667ca122b941 ro crashkernel=auto consoleblank=0 systemd.show_status=true elevator=noop console=tty1 console=ttyS0
Jun 15 01:50:03 localhost kernel: e820: BIOS-provided physical RAM map:
Jun 15 01:50:03 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Jun 15 01:50:03 localhost kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[root@osedev1 ~]#
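The journal above begins at the current boot, which suggests (an assumption, not confirmed in this log) that journald on the dev node only keeps a volatile journal under /run, so nothing from before the reboot survives. With journald's default `Storage=auto`, creating /var/log/journal is the documented way to make it persistent. The function name below is hypothetical, and this is a sketch rather than something that was run here.

```shell
# Sketch: make the systemd journal persistent across reboots.
# Assumes journald uses the default Storage=auto; must be run as root.
# Only defines the function; nothing happens until it is called.
enable_persistent_journal() {
  mkdir -p /var/log/journal &&
    systemd-tmpfiles --create --prefix /var/log/journal &&
    systemctl restart systemd-journald
}
```

After calling it and going through another reboot, `journalctl --list-boots` should list more than one boot, so the last messages before an unexpected restart can be read with `journalctl -b -1`.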


14. The entries preceding that in /var/log/messages aren't very helpful

Jun 15 00:56:34 osedev1 chronyd[674]: System clock wrong by -3.278855 seconds, adjustment started
Jun 15 00:57:39 osedev1 chronyd[674]: System clock wrong by -1.017910 seconds, adjustment started
Jun 15 00:58:44 osedev1 chronyd[674]: System clock wrong by 1.292079 seconds, adjustment started
Jun 15 00:59:49 osedev1 chronyd[674]: System clock wrong by -1.323920 seconds, adjustment started
Jun 15 01:01:01 osedev1 systemd: Created slice User Slice of root.
Jun 15 01:01:01 osedev1 systemd: Started Session 205569 of user root.
Jun 15 01:01:01 osedev1 systemd: Removed slice User Slice of root.
Jun 15 01:01:59 osedev1 chronyd[674]: System clock wrong by 1.104784 seconds, adjustment started
Jun 15 01:02:08 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:02:08 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:02:11 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:03:04 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:03:13 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:03:15 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:04:08 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:04:17 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:04:20 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:05:13 osedev1 chronyd[674]: Selected source 2a02:c207:3003:930::1
Jun 15 01:05:22 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:09:30 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:09:30 osedev1 chronyd[674]: System clock wrong by 1.213375 seconds, adjustment started
Jun 15 01:12:55 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:12:56 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:14:50 osedev1 dnsmasq-dhcp[1346]: DHCPREQUEST(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d
Jun 15 01:14:50 osedev1 dnsmasq-dhcp[1346]: DHCPACK(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d osestaging1
Jun 15 01:16:09 osedev1 chronyd[674]: Selected source 176.9.103.244
Jun 15 01:16:09 osedev1 chronyd[674]: Selected source 195.201.19.162
Jun 15 01:17:13 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:17:15 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:19:15 osedev1 chronyd[674]: Selected source 2a02:c207:3003:930::1
Jun 15 01:19:22 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:20:19 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:20:26 osedev1 chronyd[674]: Can't synchronise: no majority
Jun 15 01:22:28 osedev1 chronyd[674]: Selected source 193.30.35.11
Jun 15 01:25:42 osedev1 chronyd[674]: Source 176.9.103.244 replaced with 213.209.109.44
Jun 15 01:28:56 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:31:06 osedev1 chronyd[674]: Can't synchronise: no selectable sources
Jun 15 01:34:21 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:36:29 osedev1 chronyd[674]: Can't synchronise: no selectable sources
Jun 15 01:36:50 osedev1 dnsmasq-dhcp[1346]: DHCPREQUEST(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d
Jun 15 01:36:50 osedev1 dnsmasq-dhcp[1346]: DHCPACK(virbr0) 192.168.122.201 fe:07:06:a6:5f:1d osestaging1
Jun 15 01:37:34 osedev1 chronyd[674]: Selected source 213.209.109.44
Jun 15 01:37:34 osedev1 chronyd[674]: System clock wrong by 1.293791 seconds, adjustment started
Jun 15 01:50:03 osedev1 kernel: Initializing cgroup subsys cpuset
Jun 15 01:50:03 osedev1 kernel: Initializing cgroup subsys cpu
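Finding the gap between the last message before the reboot (01:37:34) and the first kernel message after it (01:50:03) meant scrolling through the file by hand. A minimal sketch of automating that follows. It assumes, as in the excerpts in this entry, that `kernel: Initializing cgroup subsys cpuset` is the first line logged on boot for this kernel; the function name `last_before_boot` is hypothetical.

```shell
# Sketch: print the last log line written before each boot marker.
# Reads the named files, or stdin when no file is given.
# Assumption: "Initializing cgroup subsys cpuset" marks the start of a boot.
last_before_boot() {
  awk '
    /kernel: Initializing cgroup subsys cpuset/ { if (prev != "") print prev }
    { prev = $0 }
  ' "$@"
}
```

Running `last_before_boot /var/log/messages` on the dev node would then show roughly when the machine stopped logging before each restart.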


16. Meanwhile, I still couldn't reach the sites on staging because nginx had failed to start, but a simple nginx restart fixed it *shrug*

[maltfield@osestaging1 ~]$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-06-15 12:56:46 UTC; 13min ago
  Process: 343 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)
  Process: 327 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)

Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] the "ssl" directive is deprecated, use the "listen ...de:11
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] the "ssl" directive is deprecated, use the "listen ...de:11
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [warn] conflicting server name "_" on 10.241.189.11:443, ignored
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Jun 15 12:56:46 osestaging1 systemd[1]: nginx.service: control process exited, code=exited status=1
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: [emerg] bind() to 10.241.189.11:4443 failed (99: Cannot as...ress)
Jun 15 12:56:46 osestaging1 nginx[343]: nginx: configuration file /etc/nginx/nginx.conf test failed
Jun 15 12:56:46 osestaging1 systemd[1]: Failed to start The nginx HTTP and reverse proxy server.
Jun 15 12:56:46 osestaging1 systemd[1]: Unit nginx.service entered failed state.
Jun 15 12:56:46 osestaging1 systemd[1]: nginx.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
Last login: Sun Jun 14 21:02:44 UTC 2020 on pts/1
[root@osestaging1 ~]# systemctl restart nginx
[root@osestaging1 ~]#
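The `[emerg] bind() ... failed (99: ...)` line is errno 99 (EADDRNOTAVAIL), which normally means the address was not assigned to any interface when nginx started. A plausible explanation, though not confirmed in this log, is that the interface carrying 10.241.189.11 was not up yet at boot, which would also explain why a later restart worked. A small sketch for checking this before restarting nginx follows. The function name `addr_present` is hypothetical, and it reads `ip -o -4 addr show` output on stdin.

```shell
# Sketch: succeed (exit 0) if the given IPv4 address appears in
# `ip -o -4 addr show` output read from stdin, else exit 1.
# The body runs in a subshell so its variable stays local.
addr_present() (
  want="$1"
  awk -v want="$want" '
    $3 == "inet" {
      split($4, a, "/")          # strip the /prefix length, if any
      if (a[1] == want) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
)
```

A hedged usage example: `ip -o -4 addr show | addr_present 10.241.189.11 && systemctl restart nginx` restarts nginx only once the address actually exists on the staging container.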


18. Anyway, high availability on dev/staging isn't really a priority, so whatever. If this is our first outage and it was fixable in 20 minutes, it's fine.
19. Rob is unblocked; I asked him to continue on his WordPress upgrade task and let me know if he encounters any other issues