Web server configuration

From Open Source Ecology

This document will describe how our web server is configured.

We should refrain from posting configuration files here, lest we create a documentation maintenance nightmare (which would invariably result in stale, useless content). The source of truth for the contents of our ever-changing configuration files should be the server itself.

Rather, in this document, we will describe the overall architecture. For specific directories and configuration files that are relevant, we will simply name their location on the server.

The files on the server are backed up on a daily basis. This wiki does not and should not serve as a backup of our configuration files.

For more information on how to access our server and on our backups, see OSE Server.

Architecture

Our http(s) content is served using the following web servers:

  1. Nginx
  2. Varnish
  3. Apache

The traffic flows in order, i.e.: Internet -> nginx -> varnish -> apache

And then back out to the client in the reverse order: apache -> varnish -> nginx -> Internet
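A quick way to confirm that a response really passed through varnish is to inspect its headers from outside the server. Below is a minimal sketch, assuming our VCL does not strip Varnish's default `X-Varnish` header (unverified): on a cache miss that header carries one transaction ID, and on a hit it carries two (the current request's and the cached object's). The function name `varnish_cache_status` is hypothetical.

```shell
# varnish_cache_status: read HTTP response headers on stdin and report
# whether Varnish served the response from its cache.
# Assumption: our VCL leaves the default X-Varnish header in place.
varnish_cache_status() {
  awk 'BEGIN { status = "no-varnish-header" }
       # header names are case-insensitive (HTTP/2 clients print them lowercase)
       tolower($1) == "x-varnish:" {
         # one ID = miss; two IDs (this request + cached object) = hit
         status = (NF >= 3) ? "hit" : "miss"
       }
       END { print status }'
}
```

For example, `curl -sI https://www.opensourceecology.org/ | varnish_cache_status`; for a cacheable page, running it twice in a row will typically report `hit` the second time.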

Additionally, the following software supports the operation of the above web servers:

  1. logrotate
  2. awstats

Why??

OSE's principles aim for simplicity, so you may ask, "Why aren't we simply using only Apache? Why Nginx as well? And why Varnish?" Great question!

Before simplicity, OSE is radically committed to using FLOSS. We're also an ecologically-aware, low-budget, non-profit with limited financial & computational resources. Keeping this in mind, below are the reasons for the complexity described in this documentation:

  1. Varnish is a cache. It's an essential component that allows us to serve a very high volume of requests on many websites from a single server. Unfortunately, the free version of Varnish does not speak https.
  2. Nginx is our TLS terminator. It accepts our encrypted traffic over https & passes unencrypted http traffic internally on to varnish.
  3. Nginx has great DoS protection and rate-limiting built in.
  4. Nginx being distinct from Apache gives us the ability to serve a SITE_DOWN vhost to users for a specific domain, while devs are still able to iterate & test changes to the backend Apache server. Note that we have only 1 dedicated host, and we don't have a load balancer.
  5. Most people who use https + varnish specifically use Nginx to terminate their https. Therefore, there is better documentation & a bigger user support base for this architecture.
  6. History. I (Michael Altfield) came on board in 2017 with only Apache running. I added https to protect our users' passwords, which were being sent in cleartext, and, in doing so, I had to abandon CloudFront so that a third party didn't have our private keys. At the time, CF was our CDN cache, so I had to implement a self-hosted cache. I chose varnish, and then had to add nginx in front of it for https termination.

Nginx

This section will describe our use of the Nginx server in our web server configuration, which serves as our https terminator, basic DOS protection, and SITE_DOWN tool.

Why Nginx?

Nginx is necessary to terminate https prior to our varnish cache, as the free version of varnish does not speak https. Hitch was an option as well, but it lacked many features:

  1. Nginx has DoS protection. Hitch does not.
  2. Nginx has powerful rewrite/redirect rules, such as http->https or for subdir-to-subdomain redirects. Hitch can't do redirects.
  3. Nginx is generally very popular and very well documented. Hitch is generally very poorly documented.
  4. Specifically for the case of terminating https for varnish, more varnish users use Nginx for this than Hitch.
  5. Nginx allows you to define a dhparams file directly. Hitch requires a silly process of concatenating the file into a hitch-specific pem file, which convolutes our every-90-day Let's Encrypt cert renewal process.
  6. Nginx permits us to use the special "return 444" to close a connection without sending any response, dropping the request entirely. Neither Apache, Varnish, nor Hitch has this awesome feature.

Important Files & Directories

For more information about our nginx web server's configuration, please see the following files & directories on the server:

/etc/nginx/nginx.conf
/etc/nginx/conf.d/<vhost_fqdn>.conf
/var/log/nginx/<vhost_fqdn>/access.log
/var/log/nginx/<vhost_fqdn>/error.log
/var/log/nginx/access.log
/var/log/nginx/error.log
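When investigating a suspected DoS, it can help to see which clients are sending the most requests to a vhost. The sketch below assumes the vhost access logs use nginx's default `combined` format, where the client address is the first field (an assumption; check the `log_format` in /etc/nginx/nginx.conf). `top_clients` is a hypothetical helper name.

```shell
# top_clients: print the N most frequent client IPs in an nginx access log.
# Assumption: client address is the first whitespace-separated field,
# as in nginx's default "combined" log format.
top_clients() {
  local log="$1" n="${2:-10}"
  awk '{ print $1 }' "$log" | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `top_clients /var/log/nginx/www.opensourceecology.org/access.log 20` lists the 20 busiest client addresses with their request counts.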

https

In 2017 & 2018, Michael Altfield migrated OSE sites to use https with Let's Encrypt certificates.

Because the free version of varnish does not speak https, we terminate https using nginx.

Nginx's https config was hardened using Mozilla's ssl-config-generator and the Qualys ssllabs.com SSL Server Test.

Let's Encrypt

We use Let's Encrypt, a FLOSS CA, to generate our free https certificates.

Let's Encrypt certificates are valid for 90 days, and they are automatically renewed using the `certbot` tool via a cron job. For more information, see the following files & directories:

/etc/letsencrypt/
/var/log/letsencrypt/
/etc/cron.d/letsencrypt
/root/bin/letsencrypt/renew.sh
/var/log/letsEncryptRenew.log
/etc/nginx/conf.d/ssl.<domain>.<tld>.include
/etc/letsencrypt/live/<domain>.<tld>/
/etc/pki/tls/hpkpBackupKeys/
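Because the certificates only last 90 days, it is worth being able to check how long a live certificate has left, for instance when verifying that the cron renewal is working. Below is a minimal sketch, assuming openssl and GNU `date` are available on the server; the helper names are hypothetical. (certbot itself can also rehearse a renewal with `certbot renew --dry-run`.)

```shell
# enddate_to_days: convert openssl's "notAfter=..." line into whole days left.
#   $1: output of `openssl x509 -enddate -noout`,
#       e.g. "notAfter=Apr  3 22:37:19 2018 GMT"
#   $2: "now" as a Unix timestamp (defaults to the current time)
# Requires GNU date for `date -d`.
enddate_to_days() {
  local end="${1#notAfter=}" now="${2:-$(date +%s)}"
  echo $(( ( $(date -d "$end" +%s) - now ) / 86400 ))
}

# cert_days_left: days until the PEM certificate in $1 expires.
cert_days_left() {
  enddate_to_days "$(openssl x509 -enddate -noout -in "$1")"
}
```

For example, `cert_days_left /etc/letsencrypt/live/opensourceecology.org/cert.pem`. A value that keeps shrinking well below 30 suggests the renewal cron job is not succeeding.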

To add a new subdomain (more precisely, a SAN = Subject Alternative Name) to a certificate, you must re-issue the certificate, listing all the existing domains as well as the new one. First, get a list of all the existing domains:

[root@hetzner2 htdocs]# certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log

-------------------------------------------------------------------------------
Found the following certs:
  Certificate Name: opensourceecology.org
    Domains: fef.opensourceecology.org osemain.opensourceecology.org oswh.opensourceecology.org
    Expiry Date: 2018-04-03 22:37:19+00:00 (VALID: 74 days)
    Certificate Path: /etc/letsencrypt/live/opensourceecology.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/opensourceecology.org/privkey.pem
  Certificate Name: openbuildinginstitute.org
    Domains: openbuildinginstitute.org awstats.openbuildinginstitute.org seedhome.openbuildinginstitute.org www.openbuildinginstitute.org
    Expiry Date: 2018-03-18 03:46:32+00:00 (VALID: 57 days)
    Certificate Path: /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
-------------------------------------------------------------------------------
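Rather than copying the SAN list off the screen, the domains for one certificate can be pulled out of that output mechanically. A sketch that relies on the `Certificate Name:` / `Domains:` layout shown above (which may differ between certbot versions); `cert_domains` is a hypothetical name.

```shell
# cert_domains: print the domains (SANs) of one certificate, one per line,
# from `certbot certificates` output on stdin.
#   $1: the certificate name, e.g. opensourceecology.org
cert_domains() {
  awk -v name="$1" '
    # remember which certificate block we are in
    $1 == "Certificate" && $2 == "Name:" { cur = $3 }
    # print every domain listed for the requested certificate
    cur == name && $1 == "Domains:" { for (i = 2; i <= NF; i++) print $i }
  '
}
```

For example, `certbot certificates 2>/dev/null | cert_domains opensourceecology.org`.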

The webroots for existing domains can be determined by checking each certificate's renewal config file:

[root@hetzner2 htdocs]# cat /etc/letsencrypt/renewal/*.conf
# renew_before_expiry = 30 days
version = 0.19.0
archive_dir = /etc/letsencrypt/archive/openbuildinginstitute.org
cert = /etc/letsencrypt/live/openbuildinginstitute.org/cert.pem
privkey = /etc/letsencrypt/live/openbuildinginstitute.org/privkey.pem
chain = /etc/letsencrypt/live/openbuildinginstitute.org/chain.pem
fullchain = /etc/letsencrypt/live/openbuildinginstitute.org/fullchain.pem

# Options used in the renewal process
[renewalparams]
authenticator = webroot
...
webroot_path = /var/www/html/www.openbuildinginstitute.org/htdocs, /var/www/html/seedhome.openbuildinginstitute.org
[[webroot_map]]
openbuildinginstitute.org = /var/www/html/www.openbuildinginstitute.org/htdocs
awstats.openbuildinginstitute.org = /var/www/html/www.openbuildinginstitute.org/htdocs
seedhome.openbuildinginstitute.org = /var/www/html/seedhome.openbuildinginstitute.org
www.openbuildinginstitute.org = /var/www/html/www.openbuildinginstitute.org/htdocs
# renew_before_expiry = 30 days
version = 0.19.0
archive_dir = /etc/letsencrypt/archive/opensourceecology.org
cert = /etc/letsencrypt/live/opensourceecology.org/cert.pem
privkey = /etc/letsencrypt/live/opensourceecology.org/privkey.pem
chain = /etc/letsencrypt/live/opensourceecology.org/chain.pem
fullchain = /etc/letsencrypt/live/opensourceecology.org/fullchain.pem

# Options used in the renewal process
[renewalparams]
authenticator = webroot
...
webroot_path = /var/www/html/fef.opensourceecology.org/htdocs, /var/www/html/oswh.opensourceecology.org/htdocs, /var/www/html/www.opensourceecology.org/htdocs
[[webroot_map]]
fef.opensourceecology.org = /var/www/html/fef.opensourceecology.org/htdocs
www.opensourceecology.org = /var/www/html/www.opensourceecology.org/htdocs
oswh.opensourceecology.org = /var/www/html/oswh.opensourceecology.org/htdocs
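The `[[webroot_map]]` section above already pairs every domain with its webroot, which is exactly what the `-w`/`-d` arguments of the expand command need. A sketch that converts one renewal file into those pairs, assuming `[[webroot_map]]` is the last section of the file as in the output above; `webroot_args` is hypothetical.

```shell
# webroot_args: turn the [[webroot_map]] section of one certbot renewal
# config file into "-w <webroot> -d <domain>" pairs, one pair per line.
webroot_args() {
  awk '
    /^\[\[webroot_map\]\]/ { inmap = 1; next }   # start of the map
    /^\[/                  { inmap = 0 }         # any other section header
    inmap && NF == 3 && $2 == "=" { printf "-w %s -d %s\n", $3, $1 }
  ' "$1"
}
```

For example, `webroot_args /etc/letsencrypt/renewal/opensourceecology.org.conf`. Note that any domain missing from the map (compare it with the `certbot certificates` domain list) has to be added by hand.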

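Then expand the certificate to include the new domain (forum.opensourceecology.org in this example), restrict the private keys' permissions, and reload nginx: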

certbot -nv --expand --cert-name opensourceecology.org certonly -v --webroot \
  -w /var/www/html/fef.opensourceecology.org/htdocs/ -d fef.opensourceecology.org \
  -w /var/www/html/www.opensourceecology.org/htdocs -d www.opensourceecology.org \
  -w /var/www/html/oswh.opensourceecology.org/htdocs/ -d oswh.opensourceecology.org \
  -w /var/www/html/forum.opensourceecology.org/htdocs -d forum.opensourceecology.org
/bin/chmod 0400 /etc/letsencrypt/archive/*/pri*
nginx -t && service nginx reload
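The three steps above can be wrapped in one function so that a failed `certbot` run does not go on to touch permissions or reload nginx. This is a sketch, not our actual renewal script (/root/bin/letsencrypt/renew.sh); the name `expand_cert` and the `DRY_RUN` switch are hypothetical, and the certbot flags mirror the command above.

```shell
# expand_cert: re-issue CERT_NAME for every domain passed as
# "-w <webroot> -d <domain>" pairs, then lock down keys and reload nginx.
# Set DRY_RUN=1 to print the commands instead of running them.
expand_cert() {
  local cert_name="$1"; shift
  local run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi

  # stop here if certbot fails, so nginx keeps serving the old cert
  $run certbot -n -v --expand --cert-name "$cert_name" certonly --webroot "$@" || return 1
  $run /bin/chmod 0400 /etc/letsencrypt/archive/*/pri* || return 1
  # only reload if the nginx config still validates
  $run nginx -t && $run service nginx reload
}
```

For example, `DRY_RUN=1 expand_cert opensourceecology.org -w /var/www/html/fef.opensourceecology.org/htdocs/ -d fef.opensourceecology.org -w /var/www/html/forum.opensourceecology.org/htdocs -d forum.opensourceecology.org` prints the commands; drop `DRY_RUN=1` (and run as root) to execute them. Remember to pass every existing domain, not just the new one.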

HPKP

HTTP Public Key Pinning (HPKP) can brick your domain if not done properly. For safety, 14 keys were pinned following the Let's Encrypt HPKP Best Practices Guide, including:

  1. Two distinct, pre-generated backup keys' CSRs @ /etc/pki/tls/hpkpBackupKeys/
  2. Our leaf certificate issued by Let's Encrypt using certbot @ /etc/letsencrypt/live/opensourceecology.org/cert.pem
  3. The intermediate Let's Encrypt certificate that signed our certificate @ /etc/letsencrypt/live/opensourceecology.org/chain.pem
  4. The Internet Security Research Group (ISRG) Root Certificate for Let's Encrypt
  5. The IdenTrust Root Certificate, which cross-signed the Let's Encrypt Root Certificate
  6. In case Let's Encrypt is no longer usable in the future, all root certificates & the root certificates of their cross-signers for CloudFlare, since they offer free certificates. This includes DigiCert, AddTrust, GlobalSign, and GTE CyberTrust (now DigiCert)
  7. In case Let's Encrypt is no longer usable in the future, all root certificates & the root certificates of their cross-signers for SSL.com, since they offer free certificates for 90 days.

Moreover, apache was configured with a report-uri, which can be checked on the server to debug potential client-side HPKP issues:

report-uri="http://opensourceecology.org/hpkp-report"

For more information on our HPKP config, see the following files & directories:

/etc/nginx/conf.d/ssl.<domain>.<tld>.include
/etc/pki/tls/hpkpBackupKeys/
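Each pinned key is published as a `pin-sha256` value: the base64-encoded SHA-256 hash of the key's DER-encoded SubjectPublicKeyInfo. Below is a sketch of computing it with the standard openssl pipeline, for either a certificate or a CSR such as the backup keys' CSRs; the function names are hypothetical.

```shell
# spki_pin: HPKP pin-sha256 value (base64 of the SHA-256 of the DER-encoded
# SubjectPublicKeyInfo) for a PEM public key read on stdin.
spki_pin() {
  openssl pkey -pubin -outform der | openssl dgst -sha256 -binary | openssl enc -base64
}

# Feed spki_pin from a certificate or from a certificate signing request.
pin_from_cert() { openssl x509 -in "$1" -pubkey -noout | spki_pin; }
pin_from_csr()  { openssl req  -in "$1" -pubkey -noout | spki_pin; }
```

For example, `pin_from_cert /etc/letsencrypt/live/opensourceecology.org/cert.pem` or `pin_from_csr /etc/pki/tls/hpkpBackupKeys/<backup>.csr`; the outputs should match the `pin-sha256` values in the nginx include file.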

Varnish

This section will describe our use of the Varnish server in our web server configuration, which serves as our in-memory cache.

Why Varnish?

Our biggest site is this wiki (running on Mediawiki). As of 2017, Wikipedia (the largest site running on Mediawiki) has chosen Varnish as their cache-of-choice, after experimenting with Squid & Nginx caching. If the biggest user of our biggest site's application backend is using Varnish, we should use it too. And I found good wordpress plugins that play nicely with Varnish as well.

Useful Commands

Below are some useful commands for working with varnish on our server:

# check for valid configuration
varnishd -Cf /etc/varnish/default.vcl

# reload configuration
service varnish reload

# check for valid config + reload if OK
varnishd -Cf /etc/varnish/default.vcl &> /dev/null && service varnish reload

# see current varnish requests
varnishlog

# see varnish requests for a specific client ip address
varnishlog -q "ReqHeader:X-Forwarded-For eq '209.208.216.133'"

# see recent varnish statistics
varnishstat

# purge the entire varnish cache for all vhosts
varnishadm 'ban req.url ~ "."'

# purge the varnish cache for a specific vhost
varnishadm 'ban req.http.host ~ "www.opensourceecology.org"'

# purge the varnish cache for urls containing a specific string
varnishadm 'ban req.url ~ "css"'
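`varnishstat` shows many counters; the one we usually care about is the cache hit ratio. Below is a sketch that computes it from the one-shot `varnishstat -1` output, assuming Varnish 4 or later counter names (`MAIN.cache_hit`, `MAIN.cache_miss`); `varnish_hit_ratio` is hypothetical. The counters are cumulative since varnishd started, so this is a lifetime ratio rather than a recent one.

```shell
# varnish_hit_ratio: cache hit ratio (percent) from `varnishstat -1`
# output on stdin, using the MAIN.cache_hit and MAIN.cache_miss counters.
varnish_hit_ratio() {
  awk '$1 == "MAIN.cache_hit"  { hit  = $2 }
       $1 == "MAIN.cache_miss" { miss = $2 }
       END {
         if (hit + miss > 0) printf "%.1f\n", 100 * hit / (hit + miss)
         else print "n/a"   # counters missing or no traffic yet
       }'
}
```

For example, `varnishstat -1 | varnish_hit_ratio`.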

Important Files & Directories

For more information about our varnish web server's configuration, please see the following files & directories on the server:

/etc/varnish/

Apache

This section will describe our use of the Apache server in our web server configuration, which serves as our backend application web server.

Why Apache?

While apache is not without its issues, it is extraordinarily popular. At any time, if we were to ask the active OSE Devs which web servers they have experience with, more hands would probably go up for Apache than for any other web server. This maintains a low barrier to entry, which is extremely important when choosing the software to run a long-lived nonprofit with short-lived volunteers.

Debugging Apache Directly

Sometimes when debugging a site, it may be useful to isolate tests to just apache, in order to eliminate potential issues with nginx/https or the varnish cache. This section will describe how to use ssh tcp port forwarding to test a vhost on apache directly over 127.0.0.1:8000.

Step 1: /etc/hosts

Edit the hosts file on your workstation to point the domain you're testing to 127.0.0.1.

user@workstation:~$ cat /etc/hosts
127.0.0.1 opensourceecology.org www.opensourceecology.org
...
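It is easy to forget to undo the hosts change afterwards, which leaves your workstation silently bypassing production. Below is a sketch of adding and later removing a tagged line; the marker text and function names are hypothetical, and since /etc/hosts is owned by root, run them from a root shell (e.g. after `sudo -i`).

```shell
# Marker appended to our temporary line so it can be found again (arbitrary).
HOSTS_MARKER="# ose-apache-debug"

# hosts_pin FILE DOMAIN...: point the given domains at 127.0.0.1.
hosts_pin() {
  local file="$1"; shift
  printf '127.0.0.1 %s %s\n' "$*" "$HOSTS_MARKER" >> "$file"
}

# hosts_unpin FILE: remove every line carrying the marker.
hosts_unpin() {
  local file="$1" tmp rc=0
  tmp=$(mktemp) || return 1
  grep -vF -- "$HOSTS_MARKER" "$file" > "$tmp" || rc=$?
  # grep exits 1 when no lines remain; only >1 is a real error
  if [ "$rc" -gt 1 ]; then rm -f "$tmp"; return 1; fi
  cat "$tmp" > "$file"   # rewrite in place, keeping the file's owner/mode
  rm -f "$tmp"
}
```

For example, `hosts_pin /etc/hosts opensourceecology.org www.opensourceecology.org` before testing, and `hosts_unpin /etc/hosts` when done.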

Step 2: SSH Port Forward

Forward your workstation's 127.0.0.1:80 to the server's 127.0.0.1:8000. Run this on your workstation.

user@workstation:~$ sudo sh -c 'ssh -F /home/${SUDO_USER}/.ssh/config -p 32415 -L 80:127.0.0.1:8000 openbuildinginstitute.org'

Note that, because we're using a privileged port (<1024) on the workstation, ssh requires administrator privileges there, so we use sudo. A side effect is that we must explicitly point ssh at the normal user's ssh config with -F.

Step 3: Visit site in Browser

You should now be able to access the website from your workstation's browser. SSH is listening for traffic on your workstation's port 80 & seamlessly forwarding it to 127.0.0.1:8000 on the server. Therefore, you're hitting Apache directly without going through nginx or varnish.

For example, open "http://www.opensourceecology.org" in your browser.

Tip: Use private/incognito browsing to avoid cached DNS addresses.

Note: In order for this to work, the protocol (http:// or https://) must *not* be specified in the wp-config.php file's WP_HOME & WP_SITEURL variables.
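If you only need to check a status code or headers, you may be able to skip the hosts and ssh steps entirely by running curl on the server itself and setting the Host header by hand. Below is a sketch, assuming Apache listens on 127.0.0.1:8000 as described above; `apache_direct` and the `CURL` override variable are hypothetical. The application may still answer with a redirect (for example to its canonical https URL), which is itself useful information when debugging.

```shell
# apache_direct HOST [PATH]: request PATH from Apache's 127.0.0.1:8000
# listener as vhost HOST, bypassing nginx and varnish, and print the
# HTTP status code. Run this on the server, not on a workstation.
# CURL can be overridden (e.g. CURL=echo) to inspect the command.
apache_direct() {
  local host="$1" path="${2:-/}"
  "${CURL:-curl}" -sS -o /dev/null -w '%{http_code}\n' \
    -H "Host: ${host}" "http://127.0.0.1:8000${path}"
}
```

For example, `apache_direct www.opensourceecology.org /` on the server.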

Useful Commands

This section will describe useful commands for working with Apache:

# see all running vhosts
httpd -S
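`httpd -S` output is verbose; when you only want the list of name-based vhosts and where each one is defined, it can be filtered. Below is a sketch assuming Apache 2.4's `port <n> namevhost <name> (<file>:<line>)` output format; `list_vhosts` is hypothetical.

```shell
# list_vhosts: print "<server name> <config file:line>" for each name-based
# vhost found in `httpd -S` output on stdin.
list_vhosts() {
  awk '$1 == "port" && $3 == "namevhost" { print $4, $5 }' | tr -d '()'
}
```

For example, `httpd -S 2>&1 | list_vhosts`.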

mod_security

Our OSE Server uses mod_security & the OWASP Core Rule Set (CRS) for additional web application security in Apache. This can interfere with some applications' normal & expected behaviour. If mod_security is blocking requests, your browser's debugger will show "403 Forbidden" responses to your requests. These will correspond to entries in the mod_security log file at /var/log/httpd/modsec_audit.log. Below is an example entry from modsec_audit.log:

--df82886e-A--
[11/Aug/2017:22:56:32 +0000] WY42IEb1WWRl5vtNXLPk4QAAAA4 216.244.66.245 41996 138.201.84.223 80
--df82886e-B--
GET /?s=%E5%B0%8F%E6%81%92%E6%8C%8720%E5%85%83%E6%89%8B%E7%BB%AD%E8%B4%B9%E5%A4%9A%E5%B0%91cpyx18.com HTTP/1.1
Host: openbuildinginstitute.org
Accept: */*
User-agent: Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
Accept-Charset: utf-8;q=0.7,iso-8859-1;q=0.2,*;q=0.1

--df82886e-F--
HTTP/1.1 403 Forbidden
X-Frame-Options: SAMEORIGIN
Last-Modified: Thu, 16 Oct 2014 13:20:58 GMT
Accept-Ranges: bytes
Content-Length: 4897
X-XSS-Protection: 1; mode=block
Content-Type: text/html; charset=UTF-8

--df82886e-H--
Message: Access denied with code 403 (phase 2). Pattern match "\\W{4,}" at ARGS:s. [file "/etc/httpd/modsecurity.d/activated_rules/modsecurity_crs_40_generic_attacks.conf"] [line "37"] [id "960024"] [rev "2"] [msg "Meta-Character Anomaly Detection Alert - Repetative Non-Word Characters"] [data "Matched Data: \xe5\xb0\x8f\xe6\x81\x92\xe6\x8c\x87 found within ARGS:s: \xe5\xb0\x8f\xe6\x81\x92\xe6\x8c\x8720\xe5\x85\x83\xe6\x89\x8b\xe7\xbb\xad\xe8\xb4\xb9\xe5\xa4\x9a\xe5\xb0\x91cpyx18.com"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "8"]
Action: Intercepted (phase 2)
Stopwatch: 1502492192808118 605 (- - -)
Stopwatch2: 1502492192808118 605; combined=235, p1=145, p2=76, p3=0, p4=0, p5=14, sr=42, sw=0, l=0, gc=0
Producer: ModSecurity for Apache/2.7.3 (http://www.modsecurity.org/); OWASP_CRS/2.2.9.
Server: Apache
Engine-Mode: "ENABLED"

--df82886e-Z--

The above request shows that mod_security rule id 960024 blocked a request to openbuildinginstitute.org because the request matched the "Repetative Non-Word Characters" anomaly check. In this case, the block appears valid. If a block is a false positive, you can remove the offending rules by id in the apache vhost file, such as /etc/httpd/conf.d/00-openbuildinginstitute.org.conf:

<Location "/wp-admin/">
   <IfModule security2_module>
      SecRuleRemoveById 960015 981173 960024 960904 960017
   </IfModule>
</Location>

Or, if needed, disable mod_security for the entire vhost:

<Location "/">
   <IfModule security2_module>
      SecRuleEngine Off
   </IfModule>
</Location>

But try not to disable mod_security entirely.
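Before adding ids to `SecRuleRemoveById`, it helps to see which rules fire most often. Below is a sketch that tallies the `[id "..."]` fields in the audit log, matching the format of the example entry above; `modsec_top_rules` is hypothetical, and a frequently firing rule is a candidate for review, not automatically a false positive.

```shell
# modsec_top_rules [LOGFILE]: count how often each ModSecurity rule id
# appears in the audit log, most frequent first.
modsec_top_rules() {
  grep -o '\[id "[0-9]*"\]' "${1:-/var/log/httpd/modsec_audit.log}" \
    | tr -dc '0-9\n' \
    | sort | uniq -c | sort -rn
}
```

For example, `modsec_top_rules | head`, then look up the matching `Message:` lines for the top ids before deciding whether to remove them for a given `<Location>`.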

Web Applications

Our apache server runs the following Web Applications. Please see their corresponding articles for more info:

  1. Wordpress
  2. Mediawiki
  3. Munin
  4. Awstats

Important Files & Directories

For more information about our apache web server's configuration, please see the following files & directories on the server:

/etc/httpd/conf/httpd.conf
/etc/httpd/conf.d/
/var/www/html/<vhost>/
/var/log/httpd/
/var/log/httpd/modsec_audit.log
/var/log/httpd/<vhost>/

Logrotate

Logrotate is an essential utility for any production server. If log files aren't rotated, sooner or later your disks will fill and the server will malfunction.

We have configured logrotate to manage our logfiles on the server, including all the software described in this document.

Important Files & Directories

For more information about our logrotate configuration, please see the following files & directories on the server:

/etc/logrotate.conf
/etc/logrotate.d/
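To check whether logrotate is keeping up, you can look for unusually large files under /var/log. Below is a sketch; `big_logs` and its 100 MiB default threshold are hypothetical choices. To see what logrotate would do without rotating anything, `logrotate -d /etc/logrotate.conf` runs it in debug mode.

```shell
# big_logs [DIR] [MIB]: list files under DIR larger than MIB mebibytes,
# biggest first, as "<size in MiB><TAB><path>".
big_logs() {
  local dir="${1:-/var/log}" mb="${2:-100}"
  find "$dir" -type f -size +"${mb}"M -exec du -m {} + | sort -rn
}
```

For example, `big_logs /var/log 500`; any file that shows up repeatedly is probably missing from /etc/logrotate.d/.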

awstats

We use Awstats to monitor statistics on our site. This is preferred to using Google Analytics because [a] we care about the privacy of our users, [b] we prefer self-hosted FLOSS to closed-source SaaS, and [c] we prefer not to ship our users' demographics to a third party (one of the largest multinational corporations) that profits directly from this data.

We investigated the use of Piwik as an alternative to awstats in 2017, but found too many security concerns to justify the few benefits over awstats. For more information, see Piwik.

Accessing

Important Files & Directories

For more information about our awstats configuration, please see the following files & directories on the server:

/etc/awstats/
/etc/cron.d/awstats_generate_static_files
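Awstats conventionally keeps one config file per site, named awstats.<config>.conf, and that config name is what its -config= option expects. Below is a sketch that lists the names defined in /etc/awstats/, assuming that convention holds on our server; `awstats_sites` is hypothetical.

```shell
# awstats_sites [DIR]: print the awstats config names defined in DIR,
# i.e. "<config>" for every awstats.<config>.conf file.
awstats_sites() {
  local dir="${1:-/etc/awstats}" f name
  for f in "$dir"/awstats.*.conf; do
    [ -e "$f" ] || continue          # glob matched nothing
    name=${f##*/awstats.}            # strip the directory and "awstats." prefix
    echo "${name%.conf}"             # strip the ".conf" suffix
  done
}
```

For example, `awstats_sites` shows which sites have their own stats config.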

See Also