=Extensions=
 
This section will provide a list of the extensions used in our wiki
 
# [https://www.mediawiki.org/wiki/Extension:CategoryTree CategoryTree]
# [https://www.mediawiki.org/wiki/Extension:ConfirmAccount ConfirmAccount]
# [https://www.mediawiki.org/wiki/Extension:Cite Cite]
# [https://www.mediawiki.org/wiki/Extension:ParserFunctions ParserFunctions]
# [https://www.mediawiki.org/wiki/Extension:Gadgets Gadgets]
# [https://www.mediawiki.org/wiki/Extension:UserMerge UserMerge]
# [https://www.mediawiki.org/wiki/Extension:OATHAuth OATHAuth]
## note that this will be included with Mediawiki on the next update (to v1.31)
# [https://www.mediawiki.org/wiki/Extension:Replace_Text Replace Text]
## note that this will be included with Mediawiki on the next update (to v1.31)
# [https://www.mediawiki.org/wiki/Extension:Renameuser Renameuser]
## note that this will be included with Mediawiki on the next update (to v1.31)
 
For a more accurate list of what this wiki is currently running, see [[Special:Version]]
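
Each of these gets enabled from LocalSettings.php. Here's a minimal sketch; whether a given extension uses the newer wfLoadExtension() loader or the legacy require_once style depends on the extension's version, so check its docs:

<pre>
# LocalSettings.php: extension that supports the extension.json loader
wfLoadExtension( 'CategoryTree' );

# legacy loading style, still used by some older extensions
require_once "$IP/extensions/ConfirmAccount/ConfirmAccount.php";
</pre>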
 
==Proposed==
 
We may want to consider testing & adding these extensions in the future
 
# [https://www.mediawiki.org/wiki/Extension:3DAlloy 3DAlloy] for displaying interactive 3D models (ie: stl files) directly within our wiki with WebGL
# [https://www.mediawiki.org/wiki/Extension:WikiEditor WikiEditor]
# [https://www.mediawiki.org/wiki/Extension:WikiSEO WikiSEO]
# [https://www.mediawiki.org/wiki/Extension:CookieWarning CookieWarning] for GDPR
 
=Special Pages=
 
This section will link to useful Special pages
 
# [[Special:EmailUser]] Email users. Note that this does not show up on [[Special:SpecialPages]], but if you go to a user's page (ie: [[User:Maltfield]]), there's an "Email this user" link in the left-hand navigation panel.
 
=Guides=
 
This section will describe the processes for routine actions needed when dealing with Mediawiki
 
==Deleting Content by Request==
 
Mediawiki intentionally makes permanently deleting content difficult. You may delete content from an article, but it will still appear in the article's revision history.
 
To permanently delete PII at the request of a user, you must delete previous revisions using [https://www.mediawiki.org/wiki/Help:RevisionDelete RevisionDelete]
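
Note that RevisionDelete removes revisions from the WUI, but the underlying rows can persist in the text table, so it's worth double-checking at the database level that the PII string is really gone. Here's a sketch, using the db credentials & obfuscated table prefix from the Tips section below and a hypothetical phone number as the PII (caveat: if $wgCompressRevisions is enabled, old_text is compressed and a LIKE match won't find it):

<pre>
MariaDB [OBFUSCATED_DB]> SELECT old_id FROM OBFUSCATED_PREFIX_text WHERE old_text LIKE '%555-867-5309%';
</pre>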
 
==Deleting Users by Request==
 
Users cannot be safely deleted from Mediawiki without damaging the wiki <ref>https://www.mediawiki.org/wiki/GDPR_(General_Data_Protection_Regulation)_and_MediaWiki_software#Deleting_user_accounts</ref><ref>https://www.mediawiki.org/wiki/Manual:FAQ#.E2.80.A6is_it_a_good_idea_to_keep_user_accounts.3F</ref>
 
Instead, if a user requests to be deleted from the wiki, we should do the following:
 
# Replace the email address associated with their account with something bogus, like 'no@example.com'. The user can do this themselves via the [[Special:ChangeEmail]] page, but an Administrator must do it from the command line:
<pre>
pushd /var/www/html/wiki.opensourceecology.org/htdocs/maintenance
# example.com is reserved for documentation, so mail to it can never reach a real user; that's why we use it here
php resetUserEmail.php 'SomeUser' 'no@example.com'
popd
</pre>
# [[Special:RenameUser|Rename]] their username to something bogus, like deleteduser001
# [[Special:Block|Block]] the user account with 'indefinite' expiration and uncheck all the boxes.
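
Afterwards, it's worth verifying at the database level that the email reset & rename actually took. Here's a sketch using the db access pattern from the Tips section below (note that mediawiki capitalizes the first letter of usernames, so 'deleteduser001' is stored as 'Deleteduser001'):

<pre>
MariaDB [OBFUSCATED_DB]> SELECT user_name, user_email FROM OBFUSCATED_PREFIX_user WHERE user_name = 'Deleteduser001';
</pre>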
 
Note that this is distinct from the process for blocking malicious or spamming users.
 
===Tips===
 
Here are some tips to help you correlate data stored in our wiki db with a specific user.
 
====Lookup user by email address====
 
Mediawiki appears to intentionally make it impossible to look up users by email address in the WUI, to protect users' privacy<ref>https://en.wikipedia.org/wiki/Wikipedia:Emailing_users#Privacy_issues_and_protecting_personal_information</ref><ref>https://www.mediawiki.org/wiki/Extension:LookupUser</ref>
 
Instead, lookups can be done by manually querying the database.
 
<pre>
[root@opensourceecology ~]# cd /var/www/html/wiki.opensourceecology.org/
[root@opensourceecology wiki.opensourceecology.org]# grep wgDB LocalSettings.php
...
$wgDBname          = "OBFUSCATED_DB";
$wgDBuser          = "OBFUSCATED_USER";
$wgDBpassword      = "OBFUSCATED_PASSWORD";
$wgDBprefix        = "OBFUSCATED_PREFIX";
...
[root@opensourceecology wiki.opensourceecology.org]# mysql -u OBFUSCATED_USER -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 12037358
Server version: 5.5.56-MariaDB MariaDB Server
 
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> use OBFUSCATED_DB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [OBFUSCATED_DB]> select user_id,user_name FROM OBFUSCATED_PREFIX_user WHERE user_email = 'michael@opensourceecology.org';
+---------+-----------+
| user_id | user_name |
+---------+-----------+
|    3709 | Maltfield |
+---------+-----------+
1 row in set (0.00 sec)
 
MariaDB [OBFUSCATED_DB]>
</pre>
 
====Find contributions from a given user====
 
Visit [[Special:Contributions]]
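
If you need the same information at the database level (eg: to correlate with a user_id found above), here's a sketch; it assumes the pre-1.31 schema our wiki currently runs, where revisions still carry a rev_user column:

<pre>
MariaDB [OBFUSCATED_DB]> SELECT rev_id, rev_timestamp FROM OBFUSCATED_PREFIX_revision WHERE rev_user = 3709 ORDER BY rev_timestamp DESC LIMIT 10;
</pre>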
 
=Scaling=
 
At the time of writing (2018), we host our wiki (as well as many other sites) on a single dedicated server with Hetzner (see [[OSE Server]]). However, looking forward, if we wish to scale up mediawiki, we may need to pay for many distinct servers and hire a full-time sysadmin to deal with the corresponding complexity.
 
==Comparison to Wikipedia==
 
When comparing our scalability to wikipedia's, it's important to remember that their system is composed of many distinct types of servers. For example, they have:
 
# Load Balancers
# Nginx servers (ssl termination)
# Varnish front-end servers
# Varnish back-end servers
# Apache servers
# Memcached servers
# DB Master servers
# DB Slave servers
# Swift (Open Stack) object servers
# Kafka, logstash, grafana, etc servers
 
source: https://meta.wikimedia.org/wiki/Wikimedia_servers
 
===2017-2018===
 
In the Wikimedia Foundation's 2017-2018 <ref>[https://upload.wikimedia.org/wikipedia/foundation/d/da/Wikimedia_Foundation_Audit_Report_-_FY16-17.pdf "Financial Statements, year ending June 30, 2017"], ''Wikimedia Foundation, Inc.''</ref>
financial statement, they listed an expense of
 
* $2.1 million in "Internet hosting"
 
It's important to note, however, that this line-item understates their true infrastructure cost, because WMF owns much of their equipment. If they were leasing server space (by the hour in a cloud, or dedicated servers by the month, for example), then the cost would be much higher. In the same 2017-2018 financial statement, they also listed assets of:
 
* $13.2 million of "computer equipment"
* $0.9 million of "Capital lease computer equipment"
 
As well as
 
* $1.5 million in "Purchase of computer equipment"
 
It is not trivial to find the current number of servers running wikipedia; the most recent reported figures, from 2009, were 300 servers in Florida + 44 in Amsterdam <ref>[https://en.wikipedia.org/wiki/Wikipedia#Hardware_operations_and_support "Wikipedia Hardware operations and support"], ''Wikipedia''</ref>
 
In 2015, they listed 520 servers in the main cluster (eqiad), but didn't list other clusters<ref>[https://en.wikipedia.org/wiki/Wikimedia_Foundation#Hardware "Wikipedia Foundation Hardware"], ''Wikipedia''</ref>. The source cited there is ganglia, which is no longer online<ref>[https://ganglia.wikimedia.org/latest/ "Wikimedia Ganglia"]</ref>. Wikipedia's Grafana (which ''is'' online) doesn't list such numbers<ref>[https://grafana.wikimedia.org/ "Wikimedia Grafana"]</ref>.
 
===2012===
 
Regarding object storage size, the Swift (OpenStack) media storage for wikipedia was 20T as of 2012 <ref>[https://wikitech.wikimedia.org/w/index.php?title=File:Swift_Presentation_for_WMF_Openstack_Meetup_2012-03-22.pdf&page=2 Hartshorne, Ben. "Swift at Wikimedia"], ''Wikimedia Foundation, Inc.''</ref>
 
===2009-2010===
 
Looking back at 2009, when we know there were 344 servers, the financial report from 2009-2010 shows an expense of $1,067 on "internet hosting." The ownership of computer equipment is not explicitly broken down here, but their entire total assets were $15.4k that year. <ref>[https://upload.wikimedia.org/wikipedia/commons/9/9f/AR_web_all-spreads_24mar11_72_FINAL.pdf "Wikimedia Foundation annual report 2009-2010"], ''Wikimedia Foundation, Inc.''</ref>
 
In this year (2009-2010), wikipedia had 12 billion average monthly page views.
 
=LocalSettings.php=
 
This section will describe some of our decisions in configuring Mediawiki via LocalSettings.php
 
==$maxUploadSize==
 
As of 2018, we set the maximum upload size to 1M. Prior to the [[CHG-2018-05-22|wiki migration to hetzner2 on 2018-05-24]], there was no limit. The result: people were casually dropping unnecessarily large images (ie: >2M) into articles, and our wiki was growing at an unsustainable rate.
 
A note on growth: yes, mediawiki scales. Yes, wikipedia doesn't need to implement such caps. But we currently don't have a defined budget for IT, while wikipedia spends literally millions of dollars per year on their infrastructure<ref>[https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Mid-year_review#Financial_Summary "Financial Summary Mid-year review 2017-2018"], ''Wikimedia Foundation, Inc.''</ref> and owns over $13 million in "computer equipment" <ref>[https://upload.wikimedia.org/wikipedia/foundation/d/da/Wikimedia_Foundation_Audit_Report_-_FY16-17.pdf "Financial Statements, year ending June 30, 2017"], ''Wikimedia Foundation, Inc.''</ref>
 
For more information on Wikipedia's scaling, see [[Mediawiki#Comparison_to_Wikipedia]].
 
By comparison, in 2018, OSE has literally 1 server. We don't even have a budget for a development server or a paid sysadmin.
 
So, yes, the OSE wiki can certainly scale. But the complexity grows significantly as it does scale. So until we're ready to handle that growth (ie: budget for $100k-$1m per year for server and salary expenses), we should keep our footprint as reasonably small as possible.
 
That said, the current obvious expense that grows with the growth of our wiki is our backups (a 20G mediawiki quickly becomes much, much larger once you consider a few copies of daily backups and several copies of monthly backups encrypted & shipped off-site to some durable, geographically distinct location). As of 2018-06, we're spending about $100/year on ~1T of storage on backups split between Amazon Glacier (for our long-term monthly backups) and Amazon S3 (for our daily backups).
 
Therefore, it's best to cap uploads to 1M. Files larger than this can be stored in github.com and/or archive.org, then simply linked-to from within our wiki.
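
Here's a sketch of how this cap is expressed in LocalSettings.php. The actual Mediawiki setting is $wgMaxUploadSize (in bytes); the php.ini values shown are assumptions that must also be at least this large, or else they become the effective cap:

<pre>
# LocalSettings.php: cap uploads at 1MB (value is in bytes)
$wgMaxUploadSize = 1024 * 1024;

# php.ini must also permit at least this much, eg:
#   upload_max_filesize = 1M
#   post_max_size = 1M
</pre>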


For information on how to do a batch resize of images prior to uploading them to the wiki, see [[Batch Resize]]. I recommend 1024x768.
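
If you just need a quick one-liner and have ImageMagick installed, here's a sketch (mogrify overwrites files in place, so run it on a copy; the '>' suffix only shrinks images that exceed the target size):

<pre>
# shrink all jpgs in the current dir to fit within 1024x768, preserving aspect ratio
mogrify -resize '1024x768>' *.jpg
</pre>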


=Proper File/Directory Ownership & Permissions=

This section will describe how the file permissions should be set on an OSE mediawiki site.

For the purposes of this documentation, let's assume:

# vhost dir = /var/www/html/wiki.opensourceecology.org
# mediawiki docroot = /var/www/html/wiki.opensourceecology.org/htdocs

Then the ideal permissions are:

# Files containing passwords (ie: LocalSettings.php) should be located outside the wiki docroot <ref>https://www.mediawiki.org/wiki/Manual:Security#Move_sensitive_information</ref> with not-apache:apache-admins 0040
# Files in the 'images/' dir should be apache:apache 0660
# Directories in the 'images/' dir should be apache:apache 0770
# Files in the 'cache/' dir (outside the docroot) should be apache:apache 0660
# Directories in the 'cache/' dir (outside the docroot) should be apache:apache 0770
# All other files in the vhost dir should be not-apache:apache 0040
# All other directories in the vhost dir should be not-apache:apache 0050

This is achievable with the following idempotent commands:

<pre>
vhostDir="/var/www/html/wiki.opensourceecology.org"
mwDocroot="${vhostDir}/htdocs"

chown -R not-apache:apache "${vhostDir}"
find "${vhostDir}" -type d -exec chmod 0050 {} \;
find "${vhostDir}" -type f -exec chmod 0040 {} \;

chown not-apache:apache-admins "${vhostDir}/LocalSettings.php"
chmod 0040 "${vhostDir}/LocalSettings.php"

[ -d "${mwDocroot}/images" ] || mkdir "${mwDocroot}/images"
chown -R apache:apache "${mwDocroot}/images"
find "${mwDocroot}/images" -type f -exec chmod 0660 {} \;
find "${mwDocroot}/images" -type d -exec chmod 0770 {} \;

[ -d "${vhostDir}/cache" ] || mkdir "${vhostDir}/cache"
chown -R apache:apache "${vhostDir}/cache"
find "${vhostDir}/cache" -type f -exec chmod 0660 {} \;
find "${vhostDir}/cache" -type d -exec chmod 0770 {} \;
</pre>

Such that:

# the 'not-apache' user is a new user that doesn't run any software (ie: a daemon such as a web server) and whose shell is "/sbin/nologin" and home is "/dev/null"
# the apache user is in the apache-admins group
# the apache user is in the apache group
# any human users that need read-only access to the mediawiki vhost files for debugging purposes and/or write access to the 'images/' directory (ie: to upload files that are too large to be handled by the web server chain) should be added to the 'apache' group
# any human users that need read-only access to the mediawiki vhost files, including config files containing passwords (ie: LocalSettings.php), should be added to the 'apache-admins' group
# to make changes to any files in the docroot (other than 'images/'), one must be the root user. I think this is fair: if someone doesn't have the skills necessary to become root, they probably shouldn't modify the mediawiki core files anyway.

==Why?==

The following explains why the above permissions are ideal:

# All of the files & directories that don't need write permissions should not have write permissions. That's every file in a mediawiki docroot except the 'images/' folder and its subfiles/dirs.
# World permissions (not-user && not-group) should be set to 0 for all files & directories inside the docroot (including the docroot dir itself!).
# Excluding 'images/', these files should also not be owned by the user that runs the webserver (in cent, that's the 'apache' user). Even if a file is set to '0400', if it's owned by the 'apache' user, then the 'apache' user can ignore the permissions & write to it anyway. We don't want the apache user (which runs the apache process) to be able to modify files; if it could, then a compromised webserver could modify a php file and effectively achieve remote code execution.
# Excluding 'images/', all directories in the docroot (including the docroot dir itself!) should be owned by a group that contains the user that runs our webserver (in cent, that's the apache user). The permissions for this group must not include write access for files or directories. Even if a file is set to '0040', if its containing directory is '0060', then any user in the group that owns the directory can delete the existing file and replace it with a new one, effectively bypassing the file's read-only permission.

For more information, see the official [[mediawikiwiki:Manual:Security]] guide from Mediawiki
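
Here's a sketch of how the users & groups described above might be created on CentOS. The names match this section; the 'apache' user & group are assumed to already exist from the httpd package, and 'alice' is a hypothetical human user:

<pre>
# one-time setup: a system user that owns read-only files & can never log in
useradd --system --shell /sbin/nologin --home-dir /dev/null --no-create-home not-apache

# group for humans allowed to read configs containing passwords (ie: LocalSettings.php)
groupadd apache-admins
usermod -aG apache-admins apache

# a human user needing read-only vhost access and write access to images/
usermod -aG apache alice
</pre>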


=Updating Mediawiki=

First of all, it is not uncommon for an attempt to update mediawiki to result in an entirely broken site. If you do not have linux and bash literacy, do not attempt to update mediawiki. Moreover, you should be well-versed in how to work with mysqldump, tar, rsync, chmod, chown, & sudo. If you are not confident in how all of these commands work, do not proceed. Hire someone with sysops experience to follow this guide; it should take them less than a couple hours to update and/or revert if the update fails.

Note that you certainly want to do this on a staging site & thoroughly test it (follow the [[Wiki Validation]] document as a guide) before proceeding with production. There almost certainly will be issues.

==Step 0: Trigger Backup Scripts for System-Wide backup==

For good measure, trigger a backup of the entire system's database & files:

<pre>
sudo su -
sudo time /bin/nice /root/backups/backup.sh &>> /var/log/backups/backup.log
</pre>

When finished, SSH into the dreamhost server to verify that the whole system backup was successful before proceeding:

<pre>
source /root/backups/backup.settings
ssh $RSYNC_USER@$RSYNC_HOST 'du -sh backups/hetzner2/*'
</pre>


==Step 1: Set variables==

Type these commands to set some variables, which will be used by the commands in the sections below. Carefully review the contents of each variable before proceeding; many may need updating.

<pre>
export vhostDir=/var/www/html/wiki.opensourceecology.org
export mwDocroot="${vhostDir}/htdocs"

# get this from keepass
export dbSuperUser=CHANGEME
export dbSuperPass=CHANGEME

stamp=`date +%Y%m%d_%T`
mwUpgradeTmpDir="/var/tmp/mwUpgrade.${stamp}"

# set this to the latest stable version of mediawiki
export newMediawikiSourceUrl='https://releases.wikimedia.org/mediawiki/1.30/mediawiki-1.30.0.tar.gz'
</pre>


==Step 2: Make Vhost-specific backups==

The backups made in the previous step are huge. Because it's easier to work with vhost-specific backups, let's make a redundant copy available in /var/tmp/:

<pre>
sudo su -

dbName=osewiki_db
dbUser=osewiki_user
dbPass=CHANGEME
rootDbPass=CHANGEME

stamp=`date +%Y%m%d_%T`
tmpDir=/var/tmp/dbChange.$stamp
mkdir $tmpDir
chown root:root $tmpDir
chmod 0700 $tmpDir
pushd $tmpDir
service httpd stop

# create backup of all DBs for good measure
time nice mysqldump -uroot -p$rootDbPass --all-databases | gzip -c > preBackup.all_databases.$stamp.sql.gz

# dump wiki DB contents
time nice mysqldump -u$dbUser -p$dbPass --database $dbName > $dbName.$stamp.sql

# files backup
rsync -av --progress "${vhostDir}" "./vhostDir.${stamp}.bak/"
</pre>


==Step 3: Permissions==

Set the permissions in the vhost dir by running the commands listed in the [[Mediawiki#Proper_File.2FDirectory_Ownership_.26_Permissions|Proper File/Directory Ownership & Permissions]] section.

==Step 4: Download Latest Mediawiki Core==

<pre>
# download mediawiki core source code (note this must be done instead of using
# git since [a] git does not include the vendor dir contents and [b] we cannot
# use Composer since it would require breaking our hardened php.ini config

# first, do some string analysis to determine the file, version, and branch name
mwSourceFile=`basename "${newMediawikiSourceUrl}"`
mwReleaseName=`echo "${mwSourceFile}" | awk -F'mediawiki-' '{print $2}' | awk -F. '{print "REL" $1 "_" $2}'`
mwExtractedDir="`echo $mwSourceFile | awk -F'.tar.gz' '{print $1}'`"

# ensure the tmp dir from Step 1 actually exists before we pushd into it
mkdir -p "${mwUpgradeTmpDir}"
pushd "${mwUpgradeTmpDir}"
wget "${newMediawikiSourceUrl}"
tar -xzvf "${mwSourceFile}"
mkdir "${mwDocroot}"
rsync -av --progress "${mwExtractedDir}/" "${mwDocroot}/"
</pre>

==Step 5: Extensions & Skins==

Run the following commands to get your Extensions & Skins from git

<pre>
# extensions
pushd "${mwDocroot}/extensions"
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CategoryTree.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ConfirmAccount.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Cite.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ParserFunctions.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Gadgets.git
git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/UserMerge.git

git clone -b "${mwReleaseName}" https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Widgets.git
pushd Widgets
git submodule init
git submodule update
popd

# skins
pushd "${mwDocroot}/skins"
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/CologneBlue.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Modern.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/MonoBook.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/skins/Vector.git
popd
</pre>


==Step 6: Merge images==

TODO: rsync images dir from old site into new $mwDocroot
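
Until that TODO is written up properly, here's a minimal sketch, assuming the old site's files were backed up in Step 2 to ${tmpDir}/vhostDir.${stamp}.bak/ (adjust the source path to wherever your old images/ dir actually lives):

<pre>
# copy the old uploaded files into the fresh docroot;
# --ignore-existing avoids clobbering files shipped with mediawiki core
rsync -av --progress --ignore-existing \
  "${tmpDir}/vhostDir.${stamp}.bak/wiki.opensourceecology.org/htdocs/images/" \
  "${mwDocroot}/images/"
</pre>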

==Step 7: Merge data==

TODO: import db dump from old site into new db
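
Until that TODO is written up properly, here's a minimal sketch for the in-place upgrade case, assuming the dump taken in Step 2 is the one to restore. That dump was made with --database, so it includes its own CREATE DATABASE & USE statements, and update.php will migrate the schema in Step 9:

<pre>
# restore the wiki db dump taken in Step 2
time nice mysql -u"${dbSuperUser}" -p"${dbSuperPass}" < "${tmpDir}/${dbName}.${stamp}.sql"
</pre>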

==Step 8: Set Permissions==

Once again set the permissions in the vhost dir by running the commands listed in the [[Mediawiki#Proper_File.2FDirectory_Ownership_.26_Permissions|Proper File/Directory Ownership & Permissions]] section.

==Step 9: Update script==

Run the mediawiki update script using the superuser db user/pass found in keepass (note: the superuser password is intentionally *not* stored on the server outside of keepass)

<pre>
# attempt to update
pushd ${mwDocroot}/maintenance
php update.php --dbuser "${dbSuperUser}" --dbpass "${dbSuperPass}"
popd
</pre>

==Step 10: Validate==

See [[Wiki Validation]]

==Revert==

TODO restore procedure
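
Until that TODO is written up properly, here's a minimal sketch of a revert using the Step 2 backups (assumes ${tmpDir} is still intact; re-run [[Wiki Validation]] after restoring):

<pre>
# stop the webserver while we roll back
service httpd stop

# restore the pre-upgrade db; the --database dump includes DROP TABLE statements,
# so this replaces any tables that update.php already migrated
time nice mysql -uroot -p"${rootDbPass}" < "${tmpDir}/${dbName}.${stamp}.sql"

# set aside the broken docroot & restore the pre-upgrade vhost files
mv "${vhostDir}" "${vhostDir}.broken.${stamp}"
rsync -av --progress "${tmpDir}/vhostDir.${stamp}.bak/wiki.opensourceecology.org/" "${vhostDir}/"

service httpd start
</pre>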
 
=CLI Guides=
 
This section will provide commands to achieve certain actions for managing Mediawiki
 
=Changes=
 
As of 2018-07, we have no ticket tracking or change control process. For now, everything is on the wiki, as there are higher priorities. Hence, here are some articles used to track production changes:
 
# [[CHG-2018-05-22]] - migration of wiki from hetzner1 to hetzner2 by [[User:Maltfield|Michael Altfield]]


=See Also=

# [[OSE Server]]
# [[2FA]]
# [[Web server configuration]]
# [[Wordpress]]
# [[CHG-2018-05-22]]
# [[Wiki Validation]]
# [[Wiki instructions]]
# [[Wiki maintenance]]

[[Category: IT Infrastructure]]
[[Category: Software]]
[[Category: Wiki]]

=References=
{{Reflist}}
