Mediawiki: Difference between revisions

From Open Source Ecology

Revision as of 21:50, 15 June 2018

Deleting Users by Request

Users cannot be safely deleted from Mediawiki without damaging the wiki:

* https://meta.wikimedia.org/wiki/MediaWiki_FAQ#How_do_I_delete_a_user_from_my_list_of_users.3F

Instead, if a user requests to be deleted from the wiki, we should do the following:

  1. Replace the email address associated with their account with something bogus, like 'no@example.com'. Users can do this themselves on the Special:ChangeEmail page, but an Administrator must do it from the command line.
pushd /var/www/html/wiki.opensourceecology.org/htdocs/maintenance
# example.com is a reserved domain name that can never be registered, so it is safe to use here
php resetUserEmail.php 'SomeUser' 'no@example.com'
popd
  2. Rename their username to something bogus, like deleteduser001
  3. Block the user account with 'indefinite' expiration and uncheck all the boxes.

Note that this is distinct from the process for blocking malicious or spamming users.

LocalSettings.php

This section will describe some of our decisions in configuring Mediawiki via LocalSettings.php

$wgMaxUploadSize

As of 2018, we set the maximum upload size to 1M. Prior to the wiki migration to hetzner2 on 2018-05-24, there was no limit. As a result, people were casually dropping unnecessarily large files (ie: >2M images) into articles, and our wiki was growing at an unsustainable rate.
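To see which existing uploads already exceed the cap, something like the following could be used. This is only a sketch: the function name 'list_oversized' is hypothetical (it is not part of Mediawiki), and the default of 1048576 bytes assumes that "1M" above means 1 MiB.

```shell
# list_oversized DIR [MAX_BYTES]
# Prints every regular file under DIR that is larger than MAX_BYTES.
# NOTE: 'list_oversized' is a hypothetical helper, not part of Mediawiki.
# The default (1048576 bytes) assumes the 1M cap means 1 MiB.
list_oversized() {
  local dir="$1"
  local maxBytes="${2:-1048576}"
  # '-size +Nc' matches files strictly larger than N bytes
  find "${dir}" -type f -size "+${maxBytes}c" | sort
}
```

For example: list_oversized /var/www/html/wiki.opensourceecology.org/htdocs/images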

A note on growth: yes, mediawiki scales. Yes, wikipedia doesn't need to implement such caps. But we currently don't have a defined budget for IT while wikipedia spends literally millions of dollars per year on their infrastructure.

When we begin to compare the scalability of wikipedia, it's important to remember that their system is composed of many distinct servers. For example, they have:

  1. Load Balancers
  2. Nginx servers (ssl termination)
  3. Varnish front-end servers
  4. Varnish back-end servers
  5. Apache servers
  6. Memcached servers
  7. DB Master servers
  8. DB Slave servers
  9. Swift (Open Stack) servers
  10. Kafka and logstash servers

source: https://meta.wikimedia.org/wiki/Wikimedia_servers

So, yes, the OSE wiki can certainly scale. But the complexity grows significantly as it does scale. So until we're ready to handle that growth (ie: budget for $100k-$1m per year for server and salary expenses), we should keep our footprint as reasonably small as possible.

That said, the most obvious expense that currently grows with our wiki is our backups (a 20G mediawiki quickly becomes much, much larger once you consider a few copies of daily backups and several copies of monthly backups encrypted & shipped off-site to some durable, geographically distinct location). As of 2018-06, we're spending about $100/year on ~1T of backup storage split between Amazon Glacier (for our long-term monthly backups) and Amazon S3 (for our daily backups).
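The multiplication effect can be sketched with some quick arithmetic. The retention counts below (3 dailies, 12 monthlies) are assumptions chosen for illustration, not our actual retention policy, and the estimate ignores compression.

```shell
# backup_footprint_gb WIKI_GB DAILY_COPIES MONTHLY_COPIES
# Rough estimate of off-site storage, treating every retained backup
# as a full, uncompressed copy of the wiki.
# NOTE: hypothetical helper for illustration only.
backup_footprint_gb() {
  local wikiGb="$1" dailies="$2" monthlies="$3"
  echo $(( wikiGb * (dailies + monthlies) ))
}

# assumed retention: 3 dailies + 12 monthlies of a 20G wiki
backup_footprint_gb 20 3 12   # prints 300
```

So under these assumed numbers, every additional 1G on the wiki costs roughly 15G of backup storage.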

Proper File/Directory Ownership & Permissions

This section will describe how the file permissions should be set on an OSE mediawiki site.

For the purposes of this documentation, let's assume:

  1. vhost dir = /var/www/html/wiki.opensourceecology.org
  2. mediawiki docroot = /var/www/html/wiki.opensourceecology.org/htdocs

Then the ideal permissions are:

  1. Files containing passwords (ie: LocalSettings.php) should be located outside the wiki docroot with not-apache:apache-admins 0040
  2. Files in the 'images/' dir should be apache:apache 0660
  3. Directories in the 'images/' dir should be apache:apache 0770
  4. Files in the 'cache/' dir (outside the docroot) should be apache:apache 0660
  5. Directories in the 'cache/' dir (outside the docroot) should be apache:apache 0770
  6. All other files in the vhost dir should be not-apache:apache 0040
  7. All other directories in the vhost dir should be not-apache:apache 0050

This is achievable with the following idempotent commands:

vhostDir="/var/www/html/wiki.opensourceecology.org"
mwDocroot="${vhostDir}/htdocs"

chown -R not-apache:apache "${vhostDir}"
find "${vhostDir}" -type d -exec chmod 0050 {} \;
find "${vhostDir}" -type f -exec chmod 0040 {} \;

chown not-apache:apache-admins "${vhostDir}/LocalSettings.php"
chmod 0040 "${vhostDir}/LocalSettings.php"

[ -d "${mwDocroot}/images" ] || mkdir "${mwDocroot}/images"
chown -R apache:apache "${mwDocroot}/images"
find "${mwDocroot}/images" -type f -exec chmod 0660 {} \;
find "${mwDocroot}/images" -type d -exec chmod 0770 {} \;

[ -d "${vhostDir}/cache" ] || mkdir "${vhostDir}/cache"
chown -R apache:apache "${vhostDir}/cache"
find "${vhostDir}/cache" -type f -exec chmod 0660 {} \;
find "${vhostDir}/cache" -type d -exec chmod 0770 {} \;

Such that:

  1. the 'not-apache' user is a new user that doesn't run any software (ie: a daemon such as a web server) and whose shell is "/sbin/nologin" and home is "/dev/null".
  2. the apache user is in the apache-admins group
  3. the apache user is in the apache group
  4. any human users that need read-only access to the mediawiki vhost files for debugging purposes and/or write access to the 'images/' directory (ie: to upload files that are too large to be handled by the web server chain) should be added to the 'apache' group
  5. any human users that need read-only access to the mediawiki vhost files, including config files containing passwords (ie: LocalSettings.php), should be added to the 'apache-admins' group
  6. for anyone to make changes to any files in the docroot (other than 'images/'), they must be the root user. I think this is fair: if they don't have the skills necessary to become root, they probably shouldn't modify the mediawiki core files anyway.

Why?

The following explains why the above permissions are ideal:

  1. All of the files & directories that don't need write permissions should not have write permissions. That's every file in a mediawiki docroot except the folder "images/" and its subfiles/dirs.
  2. World permissions (not-user && not-group) for all files & directories inside the docroot (and including the docroot dir itself!) should be set to 0 for all files & all directories.
  3. Excluding 'images/', these files should also not be owned by the user that runs the webserver (in CentOS, that's the 'apache' user). Even if a file is set to '0400', its owner can always chmod it and then write to it, so the permissions offer no protection against the owning user. We don't want the apache user (which runs the apache process) to be able to modify files. If it could, then a compromised webserver could modify a php file and effectively achieve remote code execution.
  4. Excluding 'images/', all directories in the docroot (including the docroot dir itself!) should be owned by a group that contains the user that runs our webserver (in CentOS, that's the apache user). The permissions for this group must not include write access for files or directories. Even if a file is set to '0040', if the containing directory is '0070', any user in the group that owns the directory can delete the existing file and replace it with a new file, effectively bypassing the read-only permission set on the file.
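A quick audit against the first two rules above might look like the following. This is a minimal sketch: the function name 'audit_vhost_perms' is hypothetical, it assumes the layout described earlier ('htdocs/images' and 'cache' directly under the vhost dir), and it checks permission bits only, not ownership.

```shell
# audit_vhost_perms VHOST_DIR
# Prints one line per path that violates the permission rules;
# prints nothing if no violations are found.
# NOTE: hypothetical helper; checks mode bits only, not ownership.
audit_vhost_perms() {
  local vhostDir="$1"

  # rule: world permissions must be 0 everywhere
  find "${vhostDir}" \( -perm -001 -o -perm -002 -o -perm -004 \) -print \
    | sed 's/^/world-accessible: /'

  # rule: nothing outside images/ and cache/ may be writable by owner or group
  find "${vhostDir}" \
    \( -path "${vhostDir}/htdocs/images" -o -path "${vhostDir}/cache" \) -prune \
    -o \( -perm -200 -o -perm -020 \) -print \
    | sed 's/^/writable: /'
}
```

For example: audit_vhost_perms /var/www/html/wiki.opensourceecology.org (pass the path without a trailing slash, otherwise the images/ and cache/ exclusions will not match).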

For more information, see the official Manual:Security guide on mediawiki.org.

Updating Mediawiki

First of all, it is not uncommon for an attempt to update mediawiki to result in an entirely broken site. If you do not have linux and bash literacy, do not attempt to update mediawiki. Moreover, you should be well-versed in how to work with mysqldump, tar, rsync, chmod, chown, & sudo. If you are not confident in how all of these commands work, do not proceed. Hire someone with sysops experience to follow this guide; it should take them less than a couple hours to update and/or revert if the update fails.

Step 0: Trigger Backup Scripts for System-Wide backup

For good measure, trigger a backup of the entire system's database & files:

sudo su -
sudo time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log

When finished, SSH into the dreamhost server to verify that the whole system backup was successful before proceeding

source /root/backups/backup.settings
ssh $RSYNC_USER@$RSYNC_HOST 'du -sh backups/hetzner2/*'

Step 1: Set variables

Type these commands to set some variables, which will be used by the commands in the sections below. Replace the path with the vhost directory of the wiki you're updating.

export vhostDir=/var/www/html/wiki.opensourceecology.org

Step 2: Make Vhost-specific backups

The backups made in the previous step are huge. Because it's easier to work with vhost-specific backups, let's make a redundant copy available in /var/tmp/:

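sudo su -

# 'sudo su -' starts a fresh login shell, so vhostDir must be set again here
vhostDir=/var/www/html/wiki.opensourceecology.org
dbName=osewiki_db
dbUser=osewiki_user
# the leading space keeps passwords out of bash history (when HISTCONTROL includes 'ignorespace')
 dbPass=CHANGEME
 rootDbPass=CHANGEME

stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/dbChange.$stamp
mkdir "$tmpDir"
chown root:root "$tmpDir"
chmod 0700 "$tmpDir"
pushd "$tmpDir"
# stop the web server so nothing writes to the DB or files during the backup
service httpd stop

# create backup of all DBs for good measure
 time nice mysqldump -uroot -p$rootDbPass --all-databases | gzip -c > preBackup.all_databases.$stamp.sql.gz

# dump wiki DB contents
 time nice mysqldump -u$dbUser -p$dbPass --databases $dbName > $dbName.$stamp.sql

# files backup
rsync -av --progress "${vhostDir}" "./vhostDir.${stamp}.bak/"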
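Before going further, it may be worth checking that the dumps are usable. The following is a sketch under a few assumptions: 'verify_dumps' is a hypothetical helper, and the completion check relies on mysqldump's default trailing '-- Dump completed' comment, which is absent if comments were disabled (e.g. with --skip-comments).

```shell
# verify_dumps GZ_DUMP SQL_DUMP
# Succeeds only if the gzip archive is intact and the plain SQL dump
# ends with mysqldump's completion marker.
# NOTE: hypothetical helper, not part of our backup scripts.
verify_dumps() {
  local gzDump="$1" sqlDump="$2"

  # 'gzip -t' tests archive integrity without extracting it
  gzip -t "${gzDump}" 2>/dev/null \
    || { echo "corrupt: ${gzDump}" >&2; return 1; }

  # by default, mysqldump ends a successful dump with '-- Dump completed ...'
  tail -n 5 "${sqlDump}" | grep -q '^-- Dump completed' \
    || { echo "incomplete: ${sqlDump}" >&2; return 1; }

  echo "ok"
}
```

For example, from inside the backup directory: verify_dumps preBackup.all_databases.$stamp.sql.gz $dbName.$stamp.sql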

Step 3: Permissions

TODO link to other section

Step 4: Download Latest Mediawiki Core

TODO copy from upgrade section below

Step 5: Extensions & Skins

Run the following commands to get your Extensions & Skins from git

TODO

Step 6: Set Permissions

TODO: link to above section

Step 7: Update database

TODO: maintenance/update.php

Step 8: Validate

TODO describe a test for sanity of successful upgrade

Revert

TODO restore procedure

CLI Guides

This section will provide commands to achieve certain actions for managing Mediawiki

See Also

  1. OSE Server
  2. 2FA
  3. Web server configuration
  4. Wordpress
  5. CHG-2018-05-22
  6. Wiki Validation