Mediawiki

Extensions

This section will provide a list of the extensions used in our wiki

  1. CategoryTree
  2. ConfirmAccount
  3. Cite
  4. ParserFunctions
  5. Gadgets
  6. UserMerge
  7. OATHAuth
    1. note that this will be included with Mediawiki on the next update (to v1.31)
  8. Replace Text
    1. note that this will be included with Mediawiki on the next update (to v1.31)
  9. Renameuser
    1. note that this will be included with Mediawiki on the next update (to v1.31)

For a more accurate list of what this wiki is currently running, see Special:Version

Proposed

We may want to consider testing & adding these extensions in the future

  1. 3DAlloy for displaying interactive 3D models (eg: stl files) directly within our wiki using WebGL
  2. WikiEditor
  3. WikiSEO
  4. CookieWarning for GDPR

Special Pages

This section will link to useful Special pages

  1. Special:EmailUser Email users. Note that this does not show up on Special:SpecialPages, but if you go to the user's page (ie: User:Maltfield) there's an "Email this user" link in the left-hand navigation panel.

Guides

This section will describe the process of routine actions needed for dealing with Mediawiki

Deleting Users by Request

Users cannot be safely deleted from Mediawiki without damaging the wiki:

* https://meta.wikimedia.org/wiki/MediaWiki_FAQ#How_do_I_delete_a_user_from_my_list_of_users.3F

Instead, if a user requests to be deleted from the wiki, we should do the following:

  1. Replace the email address associated with their account with something bogus, like 'no@example.com'. Users can do this themselves on the Special:ChangeEmail page, but an Administrator must do it from the command line:
pushd /var/www/html/wiki.opensourceecology.org/htdocs/maintenance
# example.com is a domain name reserved for documentation, so mail sent to it can never reach a real mailbox
php resetUserEmail.php 'SomeUser' 'no@example.com'
popd
  2. Rename their username to something bogus, like deleteduser001
  3. Block the user account with 'indefinite' expiration and uncheck all the boxes.

Note that this is distinct from the process for blocking malicious or spamming users.

Scaling

At the time of writing (2018), we host our wiki (as well as many other sites) on a single dedicated server with Hetzner (see OSE Server). However, looking forward, if we wish to scale up mediawiki, we may need to pay for many distinct servers and hire a full-time sysadmin to deal with the corresponding complexity.

Comparison to Wikipedia

When comparing our wiki's scalability to wikipedia's, it's important to remember that their system is composed of many distinct servers. For example, they have:

  1. Load Balancers
  2. Nginx servers (ssl termination)
  3. Varnish front-end servers
  4. Varnish back-end servers
  5. Apache servers
  6. Memcached servers
  7. DB Master servers
  8. DB Slave servers
  9. Swift (Open Stack) object servers
  10. Kafka, logstash, grafana, etc servers

source: https://meta.wikimedia.org/wiki/Wikimedia_servers

2017-2018

In the Wikimedia Foundation's 2017-2018 [1] financial statement, they listed an expense of

  • $2.1 million in "Internet hosting"

It's important to note, however, that this line item understates the true cost because WMF owns much of its equipment. If they were leasing server space (by the hour in a cloud or dedicated servers by the month, for example), then the cost would be much higher. In the same 2017-2018 financial statement, they also listed assets of:

  • $13.2 million of "computer equipment"
  • $0.9 million of "Capital lease computer equipment"

As well as

  • $1.5 million in "Purchase of computer equipment"

It is not trivial to find the current number of servers running wikipedia. In 2009, they reported 300 servers in Florida + 44 in Amsterdam [2]

And in 2015, they list 520 servers in the main cluster (eqiad), but don't list other clusters[3]. The source listed here is ganglia, which is no longer online[4]. Wikipedia's Grafana (which _is_ online) doesn't list such numbers[5].

2012

Regarding object size, the swift (open stack) media storage for wikipedia was 20T as of 2012 [6]

2009-2010

Looking back at 2009 when we know there were 344 servers, the financial report from 2009-2010 shows an expense of $1,067 on "internet hosting." The ownership of computer equipment is not explicitly broken down here, but their entire total assets are $15.4k in this year. [7]

In this year (2009-2010), wikipedia had 12 billion average monthly page views.

LocalSettings.php

This section will describe some of our decisions in configuring Mediawiki via LocalSettings.php

$maxUploadSize

As of 2018, we set the maximum upload size to 1M. Prior to the wiki migration to hetzner2 on 2018-05-24, there was no limit. As a result, people were casually dropping unnecessarily large images (ie: >2M) into articles, and our wiki was growing at an unsustainable rate.

A note on growth: yes, mediawiki scales. Yes, wikipedia doesn't need to implement such caps. But we currently don't have a defined budget for IT while wikipedia spends literally millions of dollars per year on their infrastructure[8] and owns over $13 million in "computer equipment" [9]

For more information on Wikipedia's scaling, see Mediawiki#Comparison_to_Wikipedia.

By comparison, in 2018, OSE has literally 1 server. We don't even have a budget for a development server or a paid sysadmin.

So, yes, the OSE wiki can certainly scale. But the complexity grows significantly as it does scale. So until we're ready to handle that growth (ie: budget for $100k-$1m per year for server and salary expenses), we should keep our footprint as reasonably small as possible.

That said, the current obvious expense that grows with the growth of our wiki is our backups (a 20G mediawiki quickly becomes much, much larger once you consider a few copies of daily backups and several copies of monthly backups encrypted & shipped off-site to some durable, geographically distinct location). As of 2018-06, we're spending about $100/year on ~1T of backup storage split between Amazon Glacier (for our long-term monthly backups) and Amazon S3 (for our daily backups).

Therefore, it's best to cap uploads to 1M. Files larger than this can be stored in github.com and/or archive.org, then simply linked-to from within our wiki.

For information on how to do a batch resize of images prior to uploading them to the wiki, see Batch Resize. I recommend 1024x768.
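
As a rough aid for spotting files that already exceed the cap, the following sketch lists everything under a directory that is larger than a byte limit. The function name list_oversize_files and its arguments are hypothetical (they are not part of Mediawiki or of our existing scripts); the only tool it relies on is find.

```shell
# list_oversize_files DIR [MAX_BYTES]
# Prints every regular file under DIR that is larger than MAX_BYTES
# (default: 1048576 bytes, ie: 1M). Hypothetical helper for illustration.
list_oversize_files() {
  # "-size +Nc" matches files whose size is strictly greater than N bytes
  find "$1" -type f -size "+${2:-1048576}c" -print
}
```

For example, list_oversize_files "${mwDocroot}/images" would print the uploads that are candidates for resizing or for moving to github.com or archive.org.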

Proper File/Directory Ownership & Permissions

This section will describe how the file permissions should be set on an OSE mediawiki site.

For the purposes of this documentation, let's assume:

  1. vhost dir = /var/www/html/wiki.opensourceecology.org
  2. mediawiki docroot = /var/www/html/wiki.opensourceecology.org/htdocs

Then the ideal permissions are:

  1. Files containing passwords (ie: LocalSettings.php) should be located outside the wiki docroot [10] with not-apache:apache-admins 0040
  2. Files in the 'images/' dir should be apache:apache 0660
  3. Directories in the 'images/' dir should be apache:apache 0770
  4. Files in the 'cache/' dir (outside the docroot) should be apache:apache 0660
  5. Directories in the 'cache/' dir (outside the docroot) should be apache:apache 0770
  6. All other files in the vhost dir should be not-apache:apache 0040
  7. All other directories in the vhost dir should be not-apache:apache 0050

This is achievable with the following idempotent commands:

vhostDir="/var/www/html/wiki.opensourceecology.org"
mwDocroot="${vhostDir}/htdocs"

chown -R not-apache:apache "${vhostDir}"
find "${vhostDir}" -type d -exec chmod 0050 {} \;
find "${vhostDir}" -type f -exec chmod 0040 {} \;

chown not-apache:apache-admins "${vhostDir}/LocalSettings.php"
chmod 0040 "${vhostDir}/LocalSettings.php"

[ -d "${mwDocroot}/images" ] || mkdir "${mwDocroot}/images"
chown -R apache:apache "${mwDocroot}/images"
find "${mwDocroot}/images" -type f -exec chmod 0660 {} \;
find "${mwDocroot}/images" -type d -exec chmod 0770 {} \;

[ -d "${vhostDir}/cache" ] || mkdir "${vhostDir}/cache"
chown -R apache:apache "${vhostDir}/cache"
find "${vhostDir}/cache" -type f -exec chmod 0660 {} \;
find "${vhostDir}/cache" -type d -exec chmod 0770 {} \;

Such that:

  1. the 'not-apache' user is a new user that doesn't run any software (ie: a daemon such as a web server) and whose shell is "/sbin/nologin" and home is "/dev/null".
  2. the apache user is in the apache-admins group
  3. the apache user is in the apache group
  4. any human user that needs read-only access to the mediawiki vhost files for debugging purposes and/or write access to the 'images/' directory (ie: to upload files that are too large to be handled by the web server chain) should be added to the 'apache' group
  5. any human users that need read-only access to the mediawiki vhost files, including config files containing passwords (ie: LocalSettings.php), should be added to the 'apache-admins' group
  6. for anyone to make changes to any files in the docroot (other than 'images/'), they must be the root user. I think this is fair: if they don't have the skills necessary to become root, they probably shouldn't modify the mediawiki core files anyway.

Why?

The following explains why the above permissions are ideal:

  1. All of the files & directories that don't need write permissions should not have write permissions. That's every file in a mediawiki docroot except the folder "images/" and its subfiles/dirs.
  2. World permissions (not-user && not-group) for all files & directories inside the docroot (and including the docroot dir itself!) should be set to 0 for all files & all directories.
  3. Excluding 'images/', these files should also not be owned by the user that runs the webserver (in CentOS, that's the 'apache' user). Even if a file is set to '0400', if it's owned by the 'apache' user, then the 'apache' user can simply chmod the file and write to it anyway. We don't want the apache user (which runs the apache process) to be able to modify files. If it could, then a compromised webserver could modify a php file and effectively achieve remote code execution.
  4. Excluding 'images/', all directories in the docroot (including the docroot dir itself!) should be owned by a group that contains the user that runs our webserver (in CentOS, that's the apache user). The permissions for this group must not include write access for files or directories. Even if a file is set to '0040', if the containing directory is group-writable (ie: '0070'), then any user in the group that owns the directory can delete the existing file and replace it with a new file, effectively ignoring the read-only permission set for the file.
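
The rules above can be spot-checked with a short audit. The following is a minimal sketch, assuming GNU find (for the "-perm /mode" syntax) and a vhost path given without a trailing slash. The function name list_writable_paths is hypothetical. It prints anything outside 'htdocs/images' and 'cache' that still has a write bit set, and does not check ownership.

```shell
# list_writable_paths VHOST_DIR
# Prints files and directories under VHOST_DIR that have any write bit set
# (user, group, or world), skipping the two trees that are meant to be
# writable: htdocs/images and cache. Hypothetical helper; assumes GNU find.
list_writable_paths() {
  find "$1" \
    \( -path "$1/htdocs/images" -o -path "$1/cache" \) -prune \
    -o -perm /0222 -print
}
```

If list_writable_paths "/var/www/html/wiki.opensourceecology.org" prints nothing, then no file or directory outside the two writable trees can be written to by its owner, its group, or the world.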

For more information, see the official mediawikiwiki:Manual:Security guide from Mediawiki

Updating Mediawiki

First of all, it is not uncommon for an attempt to update mediawiki to result in an entirely broken site. If you do not have linux and bash literacy, do not attempt to update mediawiki. Moreover, you should be well-versed in how to work with mysqldump, tar, rsync, chmod, chown, & sudo. If you are not confident in how all of these commands work, do not proceed. Hire someone with sysadmin experience to follow this guide; it should take them less than a couple of hours to update and/or revert if the update fails.

Note that you certainly want to do this on a staging site & thoroughly test it (follow the Wiki Validation document as a guide) before proceeding with production. There almost certainly will be issues.

Step 0: Trigger Backup Scripts for System-Wide backup

For good measure, trigger a backup of the entire system's database & files:

sudo su -
sudo time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log

When finished, SSH into the dreamhost server to verify that the whole system backup was successful before proceeding

source /root/backups/backup.settings
ssh $RSYNC_USER@$RSYNC_HOST 'du -sh backups/hetzner2/*'

Step 1: Set variables

Type these commands to set some variables, which will be used by the commands in the sections below. Carefully review the contents of each variable before proceeding; many may need updating.

export vhostDir=/var/www/html/wiki.opensourceecology.org
export mwDocroot="${vhostDir}/htdocs"

# get this from keepass
export dbSuperUser=CHANGEME
export dbSuperPass=CHANGEME

stamp=`date +%Y%m%d_%T`
mwUpgradeTmpDir="/var/tmp/mwUpgrade.${stamp}"

# set this to the latest stable version of mediawiki
export newMediawikiSourceUrl='https://releases.wikimedia.org/mediawiki/1.30/mediawiki-1.30.0.tar.gz'
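
Since several of these variables start out as placeholders, a quick check before continuing can catch a forgotten CHANGEME. The following is a minimal sketch. The function name check_upgrade_vars is hypothetical, and the URL pattern it tests is only an assumption based on the example URL above. It never prints the variable values, so the password does not end up in the terminal scrollback.

```shell
# check_upgrade_vars
# Returns non-zero (and explains why on stderr) if the Step 1 variables are
# unset, still hold the CHANGEME placeholder, or if the source URL does not
# look like a release tarball. Hypothetical helper; the body runs in a
# subshell so that it does not leave helper variables behind.
check_upgrade_vars() (
  status=0
  for name in vhostDir mwDocroot dbSuperUser dbSuperPass newMediawikiSourceUrl; do
    # indirect lookup of the variable called ${name}
    eval "value=\${${name}:-}"
    if [ -z "${value}" ] || [ "${value}" = "CHANGEME" ]; then
      echo "variable ${name} is unset or still CHANGEME" >&2
      status=1
    fi
  done
  case "${newMediawikiSourceUrl:-}" in
    https://releases.wikimedia.org/mediawiki/*/mediawiki-*.tar.gz) ;;
    *) echo "newMediawikiSourceUrl does not look like a release tarball URL" >&2
       status=1 ;;
  esac
  exit "${status}"
)
```

Running check_upgrade_vars && echo OK after setting the variables gives a simple go/no-go signal before Step 2.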

Step 2: Make Vhost-specific backups

The backups made in the previous step are huge. Because it's easier to work with vhost-specific backups, let's make a redundant copy available in /var/tmp/:

sudo su -

dbName=osewiki_db
dbUser=osewiki_user
 dbPass=CHANGEME
 rootDbPass=CHANGEME

stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/dbChange.$stamp
mkdir $tmpDir
chown root:root $tmpDir
chmod 0700 $tmpDir
pushd $tmpDir
service httpd stop

# create backup of all DBs for good measure
 time nice mysqldump -uroot -p$rootDbPass --all-databases | gzip -c > preBackup.all_databases.$stamp.sql.gz

# dump wiki DB contents
 time nice mysqldump -u$dbUser -p$dbPass --databases $dbName > $dbName.$stamp.sql

# files backup
rsync -av --progress "${vhostDir}" "./vhostDir.${stamp}.bak/"
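
Before relying on these backups, it is worth confirming that the compressed dump is not empty or truncated. The following is a minimal sketch. The function name verify_gzip_backup is hypothetical. It only checks that the file is non-empty and that the gzip stream is intact, not that the SQL inside it is complete.

```shell
# verify_gzip_backup FILE
# Succeeds only if FILE exists, is non-empty, and passes gzip's integrity
# test. Hypothetical helper for illustration.
verify_gzip_backup() {
  # -s: file exists and is larger than zero bytes
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  # gzip -t checks the compressed stream without extracting it
  gzip -t "$1" 2>/dev/null || { echo "corrupt gzip stream: $1" >&2; return 1; }
}
```

For example: verify_gzip_backup "preBackup.all_databases.${stamp}.sql.gz". The per-wiki dump above is not compressed, so for that file only a size check (ie: test -s) applies.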

Step 3: Permissions

Set the permissions in the vhost dir by running the commands listed in the Proper File/Directory Ownership & Permissions section.

Step 4: Download Latest Mediawiki Core

# download mediawiki core source code (note this must be done instead of using
# git since [a] git does not include the vendor dir contents and [b] we cannot
# use Composer since it would require breaking our hardened php.ini config)

# first, do some string analysis to determine the file, version, and branch name
mwSourceFile=`basename "${newMediawikiSourceUrl}"`
mwReleaseName=`echo "${mwSourceFile}" | awk -F'mediawiki-' '{print $2}' | awk -F. '{print "REL" $1 "_" $2}'`
mwExtractedDir="`echo $mwSourceFile | awk -F'.tar.gz' '{print $1}'`"


# create the working directory defined in Step 1 before entering it
mkdir -p "${mwUpgradeTmpDir}"
pushd "${mwUpgradeTmpDir}"
wget "${newMediawikiSourceUrl}"
tar -xzvf "${mwSourceFile}"
mkdir "${mwDocroot}"
rsync -av --progress "${mwExtractedDir}/" "${mwDocroot}/"
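
The string analysis at the top of this step can also be done with shell parameter expansion alone, which makes it easy to preview the derived names before downloading anything. The following is a sketch of that alternative, not the method used above. The function name parse_mw_release is hypothetical, and it assumes the URL ends in mediawiki-MAJOR.MINOR.PATCH.tar.gz as in Step 1.

```shell
# parse_mw_release URL
# Prints three lines: the tarball file name, the matching release branch, and
# the directory name that the tarball extracts to. Hypothetical helper; the
# body runs in a subshell so that it does not leave helper variables behind.
parse_mw_release() (
  file="${1##*/}"              # strip everything up to the last "/"
  dir="${file%.tar.gz}"        # drop the ".tar.gz" suffix
  version="${dir#mediawiki-}"  # eg: "1.30.0"
  major="${version%%.*}"       # text before the first "."
  rest="${version#*.}"         # text after the first "."
  minor="${rest%%.*}"          # text before the next "."
  printf '%s\n%s\n%s\n' "${file}" "REL${major}_${minor}" "${dir}"
)
```

For the example URL from Step 1, parse_mw_release "${newMediawikiSourceUrl}" prints mediawiki-1.30.0.tar.gz, REL1_30, and mediawiki-1.30.0, which match the values that the awk commands assign to mwSourceFile, mwReleaseName, and mwExtractedDir.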

Step 5: Extensions & Skins

Run the following commands to get your Extensions & Skins from git

# extensions
pushd "${mwDocroot}/extensions"
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CategoryTree.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ConfirmAccount.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Cite.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ParserFunctions.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Gadgets.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/UserMerge.git

git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Widgets.git
pushd Widgets
git submodule init
git submodule update
popd

# skins
pushd "${mwDocroot}/skins"
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/CologneBlue.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Modern.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/MonoBook.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Vector.git
popd
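
Because the extension clone commands above differ only in the extension name, they can be generated from a list. The following is a minimal sketch. The function name print_extension_clones is hypothetical, and the base URL is simply copied from the commands above. It prints the commands instead of running them, so the list can be reviewed first.

```shell
# print_extension_clones RELEASE_BRANCH EXTENSION...
# Prints one "git clone" command per extension instead of running it.
# Hypothetical helper; the body runs in a subshell so that it does not leave
# helper variables behind.
print_extension_clones() (
  branch="$1"
  shift
  base='https://gerrit.wikimedia.org/r/p/mediawiki/extensions'
  for ext in "$@"; do
    printf 'git clone -b "%s" %s/%s.git\n' "${branch}" "${base}" "${ext}"
  done
)
```

For example, print_extension_clones "${mwReleaseName}" CategoryTree ConfirmAccount Cite ParserFunctions Gadgets UserMerge prints the six clone commands listed above, ready to be reviewed and then run from within "${mwDocroot}/extensions".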

Step 6: Merge images

TODO: rsync images dir from old site into new $mwDocroot

Step 7: Merge data

TODO: import db dump from old site into new db

Step 8: Set Permissions

Once again set the permissions in the vhost dir by running the commands listed in the Proper File/Directory Ownership & Permissions section.

Step 9: Update script

Run the mediawiki update script using the superusr db user/pass found in keepass (note: the superusr password is intentionally *not* stored on the server outside of keepass)

# attempt to update
pushd "${mwDocroot}/maintenance"
php update.php --dbuser "${dbSuperUser}" --dbpass "${dbSuperPass}"

Step 10: Validate

See Wiki Validation

Revert

TODO restore procedure

CLI Guides

This section will provide commands to achieve certain actions for managing Mediawiki

Changes

As of 2018-07, we have no ticket tracking or change control process. For now, everything is on the wiki, as there are higher priorities. Hence, here are some articles used to track production changes:

  1. CHG-2018-05-22 - migration of wiki from hetzner1 to hetzner2 by Michael Altfield


See Also

  1. OSE Server
  2. 2FA
  3. Web server configuration
  4. Wordpress
  5. CHG-2018-05-22
  6. Wiki Validation
  7. Wiki instructions
  8. Wiki maintenance

References