Mediawiki
Contents
- 1 Extensions
- 2 Special Pages
- 3 Guides
- 4 Scaling
- 5 LocalSettings.php
- 6 Proper File/Directory Ownership & Permissions
- 7 Updating Mediawiki
- 7.1 Step 0: Trigger Backup Scripts for System-Wide backup
- 7.2 Step 1: Set variables
- 7.3 Step 2: Make Vhost-specific backups
- 7.4 Step 3: Permissions
- 7.5 Step 4: Download Latest Mediawiki Core
- 7.6 Step 5: Extensions & Skins
- 7.7 Step 6: Merge images
- 7.8 Step 7: Merge data
- 7.9 Step 8: Set Permissions
- 7.10 Step 9: Update script
- 7.11 Step 10: Validate
- 7.12 Revert
- 8 CLI Guides
- 9 Changes
- 10 See Also
- 11 References
Extensions
This section will provide a list of the extensions used in our wiki
- CategoryTree
- ConfirmAccount
- Cite
- ParserFunctions
- Gadgets
- UserMerge
- OATHAuth
- note that this will be included with Mediawiki on the next update (to v1.31)
- Replace Text
- note that this will be included with Mediawiki on the next update (to v1.31)
- Renameuser
- note that this will be included with Mediawiki on the next update (to v1.31)
For a more accurate list of what this wiki is currently running, see Special:Version
Proposed
We may want to consider testing & adding these extensions in the future
- 3DAlloy for displaying interactive 3D models (ie: stl files) directly within our wiki with WebGL
- WikiEditor
- WikiSEO
- CookieWarning for GDPR
Special Pages
This section will link to useful Special pages
- Special:EmailUser Email users. Note that this does not show up on Special:SpecialPages, but if you go to a user's page (ie: User:Maltfield) there's an "Email this user" link in the left-hand navigation panel.
Guides
This section will describe the process of routine actions needed for dealing with Mediawiki
Deleting Content by Request
Mediawiki intentionally makes permanently deleting content difficult. You may delete it from an article, but it will still appear in the revision history of the article.
To permanently delete PII at the request of a user, you must delete previous revisions using RevisionDelete
Deleting Users by Request
Users cannot be safely deleted from Mediawiki without damaging the wiki [1][2]
Instead, if a user requests to be deleted from the wiki, we should do the following:
- Replace the email address associated with their account with something bogus, like 'no@example.com'. The user can do this themselves with the Special:ChangeEmail page, but--as an Administrator--this must be done from the command line.
pushd /var/www/html/wiki.opensourceecology.org/htdocs/maintenance
# example.com is a reserved domain name that cannot actually exist, which is why we use it here
php resetUserEmail.php 'SomeUser' 'no@example.com'
popd
- Rename their username to something bogus, like deleteduser001
- Block the user account with 'indefinite' expiration and uncheck all the boxes.
Note that this is distinct from the process for blocking malicious or spamming users.
Tips
Here are some tips to help you correlate data stored in our wiki db with a specific user.
Lookup user by email address
Mediawiki appears to intentionally make it impossible to look up users by email address in the WUI, as a protection of users' privacy[3][4]
Instead, lookups can be done by manually querying the database.
[root@opensourceecology ~]# cd /var/www/html/wiki.opensourceecology.org/
[root@opensourceecology wiki.opensourceecology.org]# grep wgDB LocalSettings.php
...
$wgDBname = "OBFUSCATED_DB";
$wgDBuser = "OBFUSCATED_USER";
$wgDBpassword = "OBFUSCATED_PASSWORD";
$wgDBprefix = "OBFUSCATED_PREFIX";
...
[root@opensourceecology wiki.opensourceecology.org]# mysql -u OBFUSCATED_USER -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 12037358
Server version: 5.5.56-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use OBFUSCATED_DB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [OBFUSCATED_DB]> select user_id,user_name FROM OBFUSCATED_PREFIX_user WHERE user_email = 'michael@opensourceecology.org';
+---------+-----------+
| user_id | user_name |
+---------+-----------+
|    3709 | Maltfield |
+---------+-----------+
1 row in set (0.00 sec)

MariaDB [OBFUSCATED_DB]>
Find contributions from a given user
Visit Special:Contributions
Scaling
At the time of writing (2018), we host our wiki (as well as many other sites) on a single dedicated server with Hetzner (see OSE Server). However, looking forward, if we wish to scale up Mediawiki, we may need to pay for many distinct servers and hire a full-time sysadmin to deal with the corresponding complexity.
Comparison to Wikipedia
When comparing our scalability to Wikipedia's, it's important to remember that their system is composed of many distinct servers. For example, they have:
- Load Balancers
- Nginx servers (ssl termination)
- Varnish front-end servers
- Varnish back-end servers
- Apache servers
- Memcached servers
- DB Master servers
- DB Slave servers
- Swift (OpenStack) object servers
- Kafka, logstash, grafana, etc servers
source: https://meta.wikimedia.org/wiki/Wikimedia_servers
2017-2018
In the Wikimedia Foundation's 2017-2018 [5] financial statement, they listed an expense of
- $2.1 million in "Internet hosting"
It's important to note, however, that this line-item understates their true hosting costs, because WMF owns much of their equipment. If they were leasing server space (by the hour in a cloud, or dedicated servers by the month, for example), the cost would be much higher. In the same 2017-2018 financial statement, they also listed assets of:
- $13.2 million of "computer equipment"
- $0.9 million of "Capital lease computer equipment"
As well as
- $1.5 million in "Purchase of computer equipment"
It is not trivial to determine the current number of servers running Wikipedia; their most recently reported figures (from 2009) were 300 in Florida + 44 in Amsterdam [6]
In 2015, they listed 520 servers in the main cluster (eqiad), but didn't list other clusters[7]. The source listed here is ganglia, which is no longer online[8]. Wikipedia's Grafana (which _is_ online) doesn't list such numbers[9].
2012
Regarding data size, the Swift (OpenStack) media storage for Wikipedia was 20T as of 2012 [10]
2009-2010
Looking back at 2009, when we know there were 344 servers, the financial report from 2009-2010 shows an expense of $1,067 on "internet hosting." The ownership of computer equipment is not explicitly broken down here, but their entire total assets were $15.4k in this year. [11]
In this year (2009-2010), Wikipedia had 12 billion average monthly page views.
LocalSettings.php
This section will describe some of our decisions in configuring Mediawiki via LocalSettings.php
$wgMaxUploadSize
As of 2018, we set the maximum upload size to 1M. Prior to the wiki migration to hetzner2 on 2018-05-24, there was no limit, and people were casually dropping unnecessarily large images (ie: >2M) into articles. The result: our wiki was growing at an unsustainable rate.
A note on growth: yes, mediawiki scales, and yes, wikipedia doesn't need to implement such caps. But we currently don't have a defined budget for IT, while wikipedia spends literally millions of dollars per year on their infrastructure[12] and owns over $13 million in "computer equipment" [13]
For more information on Wikipedia's scaling, see Mediawiki#Comparison_to_Wikipedia.
By comparison, in 2018, OSE has literally 1 server. We don't even have a budget for a development server or a paid sysadmin.
So, yes, the OSE wiki can certainly scale. But the complexity grows significantly as it does scale. So until we're ready to handle that growth (ie: budget for $100k-$1m per year for server and salary expenses), we should keep our footprint as reasonably small as possible.
That said, the current obvious expense that grows with our wiki is backups (a 20G mediawiki quickly becomes much, much larger once you consider a few copies of daily backups and several copies of monthly backups encrypted & shipped off-site to some durable, geographically distinct location). As of 2018-06, we're spending about $100/year on ~1T of backup storage, split between Amazon Glacier (for our long-storage monthlies) and Amazon S3 (for our daily backups).
Therefore, it's best to cap uploads to 1M. Files larger than this can be stored in github.com and/or archive.org, then simply linked-to from within our wiki.
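For reference, the corresponding LocalSettings.php setting is $wgMaxUploadSize; a minimal sketch of what ours might look like (the exact value and PHP-level limits on our server may differ):

```php
<?php
// cap mediawiki uploads at 1 MiB ($wgMaxUploadSize is in bytes)
$wgMaxUploadSize = 1024 * 1024;

// note: PHP's own upload_max_filesize and post_max_size (set in php.ini)
// must also be at least this large, or they become the effective limit
```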
For information on how to do a batch resize of images prior to uploading them to the wiki, see Batch Resize. I recommend 1024x768.
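As a quick sketch of the batch-resize idea (assuming ImageMagick's mogrify is installed; the filenames here are hypothetical), you can build & review the commands as a dry run before executing them:

```shell
# dry-run: collect the mogrify command for each image instead of executing it
# ('1024x768>' only shrinks images larger than 1024x768; it never enlarges)
cmds=$(for f in IMG_0001.jpg IMG_0002.jpg; do
  echo "mogrify -resize '1024x768>' -quality 85 $f"
done)
echo "$cmds"
```

Once the printed commands look right, pipe them to a shell (or drop the echo) to resize the files in place.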
Proper File/Directory Ownership & Permissions
This section will describe how the file permissions should be set on an OSE mediawiki site.
For the purposes of this documentation, let's assume:
- vhost dir = /var/www/html/wiki.opensourceecology.org
- mediawiki docroot = /var/www/html/wiki.opensourceecology.org/htdocs
Then the ideal permissions are:
- Files containing passwords (ie: LocalSettings.php) should be located outside the wiki docroot [14] with not-apache:apache-admins 0040
- Files in the 'images/' dir should be apache:apache 0660
- Directories in the 'images/' dir should be apache:apache 0770
- Files in the 'cache/' dir (outside the docroot) should be apache:apache 0660
- Directories in the 'cache/' dir (outside the docroot) should be apache:apache 0770
- All other files in the vhost dir should be not-apache:apache 0040
- All other directories in the vhost dir should be not-apache:apache 0050
This is achievable with the following idempotent commands:
vhostDir="/var/www/html/wiki.opensourceecology.org"
mwDocroot="${vhostDir}/htdocs"

chown -R not-apache:apache "${vhostDir}"
find "${vhostDir}" -type d -exec chmod 0050 {} \;
find "${vhostDir}" -type f -exec chmod 0040 {} \;

chown not-apache:apache-admins "${vhostDir}/LocalSettings.php"
chmod 0040 "${vhostDir}/LocalSettings.php"

[ -d "${mwDocroot}/images" ] || mkdir "${mwDocroot}/images"
chown -R apache:apache "${mwDocroot}/images"
find "${mwDocroot}/images" -type f -exec chmod 0660 {} \;
find "${mwDocroot}/images" -type d -exec chmod 0770 {} \;

[ -d "${vhostDir}/cache" ] || mkdir "${vhostDir}/cache"
chown -R apache:apache "${vhostDir}/cache"
find "${vhostDir}/cache" -type f -exec chmod 0660 {} \;
find "${vhostDir}/cache" -type d -exec chmod 0770 {} \;
Such that:
- the 'not-apache' user is a new user that doesn't run any software (ie: a daemon such as a web server) and whose shell is "/sbin/nologin" and home is "/dev/null".
- the apache user is in the apache-admins group
- the apache user is in the apache group
- any human user who needs read-only access to the mediawiki vhost files for debugging purposes and/or write access to the 'images/' directory (ie: to upload files that are too large to be handled by the web server chain) should be added to the 'apache' group
- any human users that need read-only access to the mediawiki vhost files, including config files containing passwords (ie: LocalSettings.php), should be added to the 'apache-admins' group
- for anyone to make changes to any files in the docroot (other than 'images/'), they must be the root user. I think this is fair: if someone doesn't have the skills necessary to become root, they probably shouldn't be modifying the mediawiki core files anyway.
Why?
The following explains why the above permissions are ideal:
- All of the files & directories that don't need write permissions should not have write permissions. That's every file in a mediawiki docroot except the folder "images/" and its subfiles/dirs.
- World permissions (not-user && not-group) should be set to 0 for all files & directories inside the docroot (including the docroot dir itself!).
- Excluding 'images/', these files should also not be owned by the user that runs the webserver (in cent, that's the 'apache' user). Even if a file is set to '0400', if it's owned by the 'apache' user, that user can change the permissions & write to it anyway. We don't want the apache user (which runs the apache process) to be able to modify files: if it could, then a compromised webserver could modify a php file and effectively achieve remote code execution.
- Excluding 'images/', all directories in the docroot (including the docroot dir itself!) should be owned by a group that contains the user that runs our webserver (in cent, that's the apache user). The permissions for this group must not include write access for files or directories. Even if a file is set to '0040', if the containing directory is '0060', any user in the group that owns the directory can delete the existing file and replace it with a new one, effectively bypassing the read-only permission set on the file.
For more information, see the official mediawikiwiki:Manual:Security guide from Mediawiki
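The directory-write pitfall described above is easy to demonstrate on a throwaway directory (a self-contained sketch; it touches no mediawiki files):

```shell
# a read-only file inside a writable directory can still be replaced,
# because delete/create permission lives on the directory, not the file
d=$(mktemp -d)
echo "original" > "$d/file"
chmod 0440 "$d/file"       # the file itself is read-only
rm -f "$d/file"            # ...but removing it succeeds anyway
echo "replaced" > "$d/file"
result=$(cat "$d/file")
echo "$result"             # replaced
rm -rf "$d"
```

This is exactly why the group that contains the apache user must not have write access on the directories themselves.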
Updating Mediawiki
First of all, it is not uncommon for an attempt to update mediawiki to result in an entirely broken site. If you do not have linux and bash literacy, do not attempt to update mediawiki. Moreover, you should be well-versed in how to work with mysqldump, tar, rsync, chmod, chown, & sudo. If you are not confident in how all of these commands work, do not proceed. Hire someone with sysops experience to follow this guide; it should take them less than a couple hours to update and/or revert if the update fails.
Note that you certainly want to do this on a staging site & thoroughly test it (follow the Wiki Validation document as a guide) before proceeding with production. There almost certainly will be issues.
Step 0: Trigger Backup Scripts for System-Wide backup
For good measure, trigger a backup of the entire system's database & files:
sudo su -
sudo time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
When finished, SSH into the Dreamhost server to verify that the whole-system backup was successful before proceeding.
source /root/backups/backup.settings
ssh $RSYNC_USER@$RSYNC_HOST 'du -sh backups/hetzner2/*'
Step 1: Set variables
Type these commands to set some variables, which will be used by the commands in the sections below. Carefully review the contents of each variable before proceeding; many may need updating.
export vhostDir=/var/www/html/wiki.opensourceecology.org
export mwDocroot="${vhostDir}/htdocs"

# get this from keepass
export dbSuperUser=CHANGEME
export dbSuperPass=CHANGEME

stamp=`date +%Y%m%d_%T`
mwUpgradeTmpDir="/var/tmp/mwUpgrade.${stamp}"

# set this to the latest stable version of mediawiki
export newMediawikiSourceUrl='https://releases.wikimedia.org/mediawiki/1.30/mediawiki-1.30.0.tar.gz'
Step 2: Make Vhost-specific backups
The backups made in the previous step are huge. Because it's easier to work with vhost-specific backups, let's make a redundant copy available in /var/tmp/:
sudo su -

dbName=osewiki_db
dbUser=osewiki_user
dbPass=CHANGEME
rootDbPass=CHANGEME

stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/dbChange.$stamp
mkdir $tmpDir
chown root:root $tmpDir
chmod 0700 $tmpDir
pushd $tmpDir

service httpd stop

# create backup of all DBs for good measure
time nice mysqldump -uroot -p$rootDbPass --all-databases | gzip -c > preBackup.all_databases.$stamp.sql.gz

# dump wiki DB contents
time nice mysqldump -u$dbUser -p$dbPass --database $dbName > $dbName.$stamp.sql

# files backup
rsync -av --progress "${vhostDir}" "./vhostDir.${stamp}.bak/"
Step 3: Permissions
Set the permissions in the vhost dir by running the commands listed in the Proper File/Directory Ownership & Permissions section.
Step 4: Download Latest Mediawiki Core
# download mediawiki core source code (note this must be done instead of using
# git since [a] git does not include the vendor dir contents and [b] we cannot
# use Composer since it would require breaking our hardened php.ini config)

# first, do some string analysis to determine the file, version, and branch name
mwSourceFile=`basename "${newMediawikiSourceUrl}"`
mwReleaseName=`echo "${mwSourceFile}" | awk -F'mediawiki-' '{print $2}' | awk -F. '{print "REL" $1 "_" $2}'`
mwExtractedDir="`echo $mwSourceFile | awk -F'.tar.gz' '{print $1}'`"

mkdir -p "${mwUpgradeTmpDir}"
pushd "${mwUpgradeTmpDir}"
wget "${newMediawikiSourceUrl}"
tar -xzvf "${mwSourceFile}"
mkdir "${mwDocroot}"
rsync -av --progress "${mwExtractedDir}/" "${mwDocroot}/"
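The string analysis above can be sanity-checked offline; for the 1.30.0 tarball URL it derives the following values:

```shell
# worked example of the string analysis (no network access required)
newMediawikiSourceUrl='https://releases.wikimedia.org/mediawiki/1.30/mediawiki-1.30.0.tar.gz'
mwSourceFile=`basename "${newMediawikiSourceUrl}"`
mwReleaseName=`echo "${mwSourceFile}" | awk -F'mediawiki-' '{print $2}' | awk -F. '{print "REL" $1 "_" $2}'`
mwExtractedDir=`echo "${mwSourceFile}" | awk -F'.tar.gz' '{print $1}'`

echo "${mwSourceFile}"     # mediawiki-1.30.0.tar.gz
echo "${mwReleaseName}"    # REL1_30 (the git branch name used for extensions)
echo "${mwExtractedDir}"   # mediawiki-1.30.0
```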
Step 5: Extensions & Skins
Run the following commands to get your Extensions & Skins from git
# extensions
pushd "${mwDocroot}/extensions"
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CategoryTree.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ConfirmAccount.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Cite.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ParserFunctions.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Gadgets.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/UserMerge.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Widgets.git
pushd Widgets
git submodule init
git submodule update
popd

# skins
pushd "${mwDocroot}/skins"
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/CologneBlue.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Modern.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/MonoBook.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Vector.git
popd
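Since the extension clones differ only by repo name, the same step can be expressed as a loop. This dry-run sketch only prints the clone commands for review (the repo list matches the one above; it performs no network access):

```shell
# build the list of clone commands without executing them
base="https://gerrit.wikimedia.org/r/p/mediawiki/extensions"
mwReleaseName="REL1_30"   # normally derived in Step 4
cloneCmds=$(for ext in CategoryTree ConfirmAccount Cite ParserFunctions Gadgets UserMerge Widgets; do
  echo "git clone -b ${mwReleaseName} ${base}/${ext}.git"
done)
echo "$cloneCmds"
```

The loop form makes it harder to forget an extension when bumping the release branch; either form produces the same clones.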
Step 6: Merge images
TODO: rsync images dir from old site into new $mwDocroot
Step 7: Merge data
TODO: import db dump from old site into new db
Step 8: Set Permissions
Once again set the permissions in the vhost dir by running the commands listed in the Proper File/Directory Ownership & Permissions section.
Step 9: Update script
Run the mediawiki update script using the superuser db user/pass found in keepass (note: the superuser password is intentionally *not* stored on the server outside of keepass).
# attempt to update
pushd "${mwDocroot}/maintenance"
php update.php --dbuser "${dbSuperUser}" --dbpass "${dbSuperPass}"
Step 10: Validate
See Wiki Validation
Revert
TODO restore procedure
CLI Guides
This section will provide commands to achieve certain actions for managing Mediawiki
Changes
As of 2018-07, we have no ticket tracking or change control process. For now, everything is on the wiki, as there are higher priorities. Hence, here are some articles used to track production changes:
- CHG-2018-05-22 - migration of wiki from hetzner1 to hetzner2 by Michael Altfield
See Also
- OSE Server
- 2FA
- Web server configuration
- Wordpress
- CHG-2018-05-22
- Wiki Validation
- Wiki instructions
- Wiki maintenance
References
- ↑ https://www.mediawiki.org/wiki/GDPR_(General_Data_Protection_Regulation)_and_MediaWiki_software#Deleting_user_accounts
- ↑ https://www.mediawiki.org/wiki/Manual:FAQ#.E2.80.A6is_it_a_good_idea_to_keep_user_accounts.3F
- ↑ https://en.wikipedia.org/wiki/Wikipedia:Emailing_users#Privacy_issues_and_protecting_personal_information
- ↑ https://www.mediawiki.org/wiki/Extension:LookupUser
- ↑ "Financial Statements, year ending June 30, 2017", Wikimedia Foundation, Inc.
- ↑ "Wikipedia Hardware operations and support", Wikipedia
- ↑ "Wikipedia Foundation Hardware", Wikipedia
- ↑ "Wikimedia Ganglia"
- ↑ "Wikimedia Grafana"
- ↑ Hartshorne, Ben. "Swift at Wikimedia", Wikimedia Foundation, Inc.
- ↑ "Wikimedia Foundation annual report 2009-2010", Wikimedia Foundation, Inc.
- ↑ "Financial Summary Mid-year review 2017-2018", Wikimedia Foundation, Inc.
- ↑ "Financial Statements, year ending June 30, 2017", Wikimedia Foundation, Inc.
- ↑ https://www.mediawiki.org/wiki/Manual:Security#Move_sensitive_information