Mass deletion of spam revisions in MediaWiki
Basically, my 'private' MediaWiki instance was about as secure as a toddler's piggy bank. I've tightened it up now, but I'm left with a hundred or so new pages and revisions created by hundreds of randomly generated users.
Two-part question:
Is there a way to delete all orphaned pages?
Can I say to roll back all revisions NOT made by a particular user (me)?
7 Comments
I strongly recommend against messing with MediaWiki's SQL! MediaWiki is a complex beast, heavily optimized for Wikipedia. There are some weird things going on in the SQL, and if you simply DELETE rows, things might lose consistency.
If you have some programming skills, go through the API. Pywikibot is a good choice.
Otherwise, check the tools in the maintenance/ directory. You could try my own tool, mewsh, to help with that (and I just added "anti-spam tools" as a todo there).
If it's only one hundred spammy pages you're not doing too badly. I had to clean up a wiki which had thousands of spammed pages. I came across some good tips by User:Halz on this page: www.mediawiki.org/wiki/User:Halz/Mass_despamming including a breakdown of the limitations of the various tools.
At the bottom he's provided a useful SQL query which runs a bit slowly but helps you find pages which are most likely spam, particularly if you can identify the time period when the wiki got taken over by spammers. Halz also has a hacked version of Extension:Nuke which presents these kinds of query-able parameters for easy mass-deletion. He gave me a copy to use, but I don't think he's published it.
I took over an installation and found over 47,000 spam entries in the user table and almost 900,000 spam externallinks. I used Sequel Pro and visited each table and deleted entries not made by authentic users. I found spam in externallinks, page, searchindex, user, and watchlist. It was fairly time-efficient; the bulk of my time was waiting for delete queries to run. I was lucky because most of the authentic edits happened early in the order of things.
If you don't want to use the export-and-reinstall method suggested by danlefree, you might also find the Nuke extension useful. Once installed, visiting the special page Special:Nuke as an administrator gives you a form for mass-deleting pages recently created by a given user or IP address.
There are also several built-in MediaWiki maintenance scripts that could be useful, including:
cleanupSpam.php, which can be used to rollback and/or delete all revisions containing a link to a particular hostname,
deleteBatch.php, which can be used to delete all pages listed in a file, and
rollbackEdits.php (which doesn't currently seem to have proper on-wiki documentation), which can be used to roll back all edits of a specified user.
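As a rough illustration of the deleteBatch.php route: the script reads a plain list file with one page title per line, so the only preparation is producing that file. The sketch below is not part of MediaWiki; the function name write_delete_list and the file name spam_titles.txt are made up for the example, and where the titles come from (a database query, Special:AllPages, manual review) is up to you.

```python
from pathlib import Path
from typing import Iterable


def write_delete_list(titles: Iterable[str], path: str) -> int:
    """Write one page title per line, skipping blanks and duplicates.

    Returns the number of titles written. The helper name and the output
    path are hypothetical; only the one-title-per-line format matters.
    """
    seen = set()
    cleaned = []
    for title in titles:
        title = title.strip()
        if title and title not in seen:
            seen.add(title)
            cleaned.append(title)
    Path(path).write_text("".join(t + "\n" for t in cleaned), encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    count = write_delete_list(["Buy pills", "Cheap watches"], "spam_titles.txt")
    print(f"wrote {count} titles")
```

The resulting file can then be passed to the script from the wiki's root directory, e.g. php maintenance/deleteBatch.php spam_titles.txt. Check the script's --help output for the options your MediaWiki version supports.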
Spam cleanup using direct database access
It's also possible to do what you want by directly manipulating the database. The details can vary a bit depending on your situation, but the basic steps would go something like this:
Set your wiki to read-only mode. You do not want someone to try editing the wiki while you're messing with the database.
Make a backup of your wiki. (This is highly recommended before any irreversible mass deletions anyway.)
Delete all user accounts created by the spammers. If, as in the question above, you were the only valid user, you can just do:
DELETE FROM user WHERE user_id != YOUR_USER_ID;
Alternatively, if no new valid accounts were created after the spammers discovered the wiki, you can find the highest valid user ID number and do:
DELETE FROM user WHERE user_id > LAST_VALID_USER_ID;
Or you can use an admin tool like phpMyAdmin to manually pick out the valid accounts and delete the rest.
Clean up the extra data associated with the deleted accounts. This is not strictly necessary, but those orphaned records have no use and will just clutter your database if you don't delete them:
DELETE FROM user_groups WHERE ug_user NOT IN (SELECT user_id FROM user);
DELETE FROM user_properties WHERE up_user NOT IN (SELECT user_id FROM user);
DELETE FROM user_newtalk WHERE user_id NOT IN (SELECT user_id FROM user);
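Before running deletes like these for real, it can be reassuring to see how many rows each one would hit by swapping DELETE for SELECT COUNT(*). Here is a minimal sketch of that idea, using an in-memory SQLite database as a stand-in for the real one. The toy schema, the sample rows, and the function names are all assumptions for the demo; only the table and column names come from the queries above.

```python
import sqlite3

# (table, column) pairs taken from the three cleanup queries above.
ORPHAN_CHECKS = [
    ("user_groups", "ug_user"),
    ("user_properties", "up_user"),
    ("user_newtalk", "user_id"),
]


def count_orphans(conn):
    """Return {table: number of rows that point at a deleted user}."""
    counts = {}
    for table, column in ORPHAN_CHECKS:
        sql = (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {table}.{column} NOT IN (SELECT user.user_id FROM user)"
        )
        counts[table] = conn.execute(sql).fetchone()[0]
    return counts


def build_orphan_demo():
    """Toy database: user 1 is valid, users 57 and 99 were already deleted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (user_id INTEGER PRIMARY KEY);
        CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
        CREATE TABLE user_properties (up_user INTEGER, up_property TEXT);
        CREATE TABLE user_newtalk (user_id INTEGER);
        INSERT INTO user VALUES (1);
        INSERT INTO user_groups VALUES (1, 'sysop'), (57, 'bot');
        INSERT INTO user_properties VALUES (57, 'skin');
        INSERT INTO user_newtalk VALUES (1), (99);
    """)
    return conn


if __name__ == "__main__":
    print(count_orphans(build_orphan_demo()))
```

If the counts look plausible against your backup, the matching DELETE statements should remove exactly those rows.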
Delete any revisions not made by a valid user:
This is the big step; everything before it was preparation, everything after it is cleanup. With all the spam accounts deleted, you can simply do:
DELETE FROM revision WHERE rev_user > 0 AND rev_user NOT IN (SELECT user_id FROM user);
If your wiki had anonymous editing disabled (which I strongly recommend for private / test wikis), the query above should be enough to get rid of all the spam revisions. If you had anon editing enabled, though, you'll have to nuke the anonymous spam separately.
If you're sure that all anon edits on your wiki are spam, the only edits made by UID 0 that we may need to preserve are those made by MediaWiki itself (such as pages imported from outside the wiki). In that case, something like the following query should work:
DELETE FROM revision WHERE rev_user = 0 AND rev_user_text BETWEEN '1' AND '999';
This will delete any revisions by UID 0 where the username looks (vaguely) like an IPv4 address; that is, it starts with a digit between 1 and 9.
If your wiki has some actual legitimate anon edits, you may have to get a bit more creative. If the number of IP addresses used by legitimate unregistered editors is limited, you can just add a clause like AND rev_user_text NOT IN ('1.2.3.4', '5.6.7.8', '9.10.11.12') to the query above to exclude contributions by those IPs from deletion. You can also add conditions like, say, AND rev_user_text NOT LIKE '192.168.%' to save all edits from IP addresses beginning with a particular prefix.
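The BETWEEN test above only looks at how the username starts, so it would miss any IPv6 address that begins with a letter. If you'd rather review the anonymous usernames before deleting anything, something like the following sketch could sort them for you. It assumes you have already pulled the distinct rev_user_text values for rev_user = 0 out of the database; the function names and the keep_ips / keep_prefixes parameters are invented for the example.

```python
import ipaddress


def is_ip_username(name: str) -> bool:
    """True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name.strip())
    except ValueError:
        return False
    return True


def anon_spam_candidates(usernames, keep_ips=(), keep_prefixes=()):
    """Return the IP-like usernames that are not on either keep list.

    keep_ips plays the role of the NOT IN (...) clause and keep_prefixes
    the role of the NOT LIKE '192.168.%' clause described above.
    """
    keep = set(keep_ips)
    return [
        name
        for name in usernames
        if is_ip_username(name)
        and name not in keep
        and not any(name.startswith(prefix) for prefix in keep_prefixes)
    ]


if __name__ == "__main__":
    names = ["1.2.3.4", "192.168.0.5", "MediaWiki default", "203.0.113.9"]
    print(anon_spam_candidates(names, keep_ips=["1.2.3.4"], keep_prefixes=["192.168."]))
```

Names that don't parse as IP addresses (such as the ones MediaWiki itself uses for imported pages) are left out of the candidate list, which matches the intent of the query above.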
The queries above will get rid of the spam revisions (although their content will still remain in the text table), but will leave the page_latest field of any affected pages pointing to a nonexistent revision. This could cause confusion, so we'd better fix it.
First, we need to wipe out the page_latest column for all pages:
UPDATE page SET page_latest = 0;
Next, we'll rebuild the column, either by running the attachLatest.php maintenance script (recommended; remember to use the --fix parameter so that the script actually changes the database) or with a manual SQL query:
UPDATE page SET page_latest =
COALESCE((SELECT MAX(rev_id) FROM revision WHERE rev_page = page_id), 0);
Finally, we'll delete all pages for which no valid revisions could be found (because they were created by spammers, and never had any valid content):
DELETE FROM page WHERE page_latest = 0;
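To make the order of operations concrete, here is a small dry run of the revision purge and page_latest rebuild against an in-memory SQLite database. This is only a sketch on a heavily simplified schema: the tables have a handful of columns, the sample rows are invented, and the function names are my own. The COALESCE wrapper is my addition, so that a page with no surviving revisions ends up with 0 rather than NULL.

```python
import sqlite3


def build_demo_db():
    """Toy wiki: user 1 is valid; spam user 57 has already been deleted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,
                           page_latest INTEGER NOT NULL);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                               rev_user INTEGER);
        INSERT INTO user VALUES (1, 'Admin');
        INSERT INTO page VALUES (1, 'Main_Page', 3), (2, 'Buy_pills', 4);
        INSERT INTO revision VALUES (1, 1, 1), (2, 1, 1), (3, 1, 57), (4, 2, 57);
    """)
    return conn


def purge_spam(conn):
    """Run the purge steps in one transaction and return the surviving pages."""
    with conn:  # commits on success, rolls back if any statement fails
        # Drop revisions whose author no longer exists in the user table.
        conn.execute(
            "DELETE FROM revision WHERE rev_user > 0 "
            "AND rev_user NOT IN (SELECT user_id FROM user)"
        )
        # Reset, then rebuild page_latest from the remaining revisions.
        conn.execute("UPDATE page SET page_latest = 0")
        conn.execute(
            "UPDATE page SET page_latest = COALESCE("
            "(SELECT MAX(rev_id) FROM revision WHERE rev_page = page_id), 0)"
        )
        # Pages with no surviving revision were pure spam.
        conn.execute("DELETE FROM page WHERE page_latest = 0")
    return conn.execute(
        "SELECT page_title, page_latest FROM page ORDER BY page_id"
    ).fetchall()


if __name__ == "__main__":
    print(purge_spam(build_demo_db()))
```

In the toy data, Main_Page falls back to its last legitimate revision and the spam-only page disappears. On a real wiki the same statements run against MySQL, where attachLatest.php remains the recommended way to do the rebuild.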
For a final touch, rebuild the links, text index and recent changes tables by running the rebuildall.php maintenance script. You may also want to remove the content of the deleted spam revisions from the database, so that they won't take up unnecessary space there, by running the purgeOldText.php maintenance script.
Once that's all done, check that everything looks good, and if so, turn off read-only mode, hopefully after installing some anti-spam features to keep the problem from recurring.
For small wikis, I highly recommend the QuestyCaptcha extension, which allows you to configure a simple custom text-based CAPTCHA. The trick is that, with every wiki having its own set of questions, programming a spambot to answer them correctly would be a lot of work for very little gain. I installed it on my own wiki after getting hit by XRumer a couple of times, and have seen no spam ever since.
Ps. I have used these instructions to nuke about 35,000 spam revisions created by equally many users from a small wiki. Everything went fine. In this particular case, the wiki (fortunately!) did not allow anonymous editing, and almost all of the legitimate users were created before the spammers found the wiki, so I could fairly easily first delete all the spam accounts, and then all the revisions they'd created. (I did accidentally delete one legitimate account at first, so I had to restore from backup and redo the process more carefully.) I've updated the instructions above to better reflect what I actually ended up doing, and to be a bit more generic.
The easiest way to handle this situation is to install the DeleteBatch extension. Use Special:AllPages on your wiki to get a list of the page names you want deleted, and load it into Special:DeleteBatch.
In theory, you could write a MediaWiki extension to do whatever you like to a MediaWiki instance, including to do the things you mentioned.
Short of that, and short of the "nuke'n'pave" suggested by danlefree, you might find the User Merge and Delete extension useful: you can use it to consolidate multiple spambot accounts into a single account whose edits can then be addressed more easily.
The easiest way to handle this situation (if you don't mind a nuke'n'pave) would be to export all wiki pages created or edited by your username, reinstall the wiki, and import the export file you'd generated.
"Reinstall" in this context would mean:
1. Export articles created by you (presumably logged in as the WikiSysop user or similar)
2. Drop the MW database
3. Create an empty MW database
4. Copy your LocalSettings.php file to a safe location
5. Re-upload the /config/ directory
6. Run the installation process on the new MW database (note that you will want to re-create your old admin user)
7. Delete the /config/ directory and move your old LocalSettings.php file back to the MW root
8. Import the file created in step 1
Edit: You may want to pull down a database backup (including spam revisions) in case you encounter any problems with this process or would like to experiment with alternate ways to purge the spam.