
How to dump a MediaWiki for offline use?

@Heady270

Posted in: #Mediawiki

I would like to be able to make an offline version of a MediaWiki site on a weekly basis.

The DumpHTML extension actually does what I want, as it dumps all articles and media files, but I can't see any index of the articles it has dumped, so I can't navigate the dump.

Reading about the XML dump feature MediaWiki has, I wonder if it would be possible to either use a program to view these files or perhaps convert them to HTML?

Or are there other ways to make an offline version of a MediaWiki site?


2 Comments


 

@Kimberly868

You can take the -pages-articles.xml.bz2 file from the Wikimedia dumps site and process it with WikiTaxi (the download link is in the upper-left corner of its site). The WikiTaxi importer turns the .bz2 file into a .taxi file (around 15 GB for the English Wikipedia), which the WikiTaxi program then uses to search through the articles. The experience is very similar to browsing the live site.
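For reference, here is a minimal Python sketch of fetching that dump, assuming the English Wikipedia and the standard "latest" naming convention on dumps.wikimedia.org (other wikis use their own database name in place of "enwiki"). It only downloads the .bz2 file, which you would then feed to the WikiTaxi importer:

# Download the pages-articles dump for offline processing.
# Assumes the English Wikipedia ("enwiki"); adjust the wiki name for other projects.
import shutil
import urllib.request

DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-articles.xml.bz2")
OUT_FILE = "enwiki-latest-pages-articles.xml.bz2"

# Stream the response to disk in 1 MB chunks; the file is many gigabytes,
# so it should never be loaded into memory at once.
request = urllib.request.Request(DUMP_URL,
                                 headers={"User-Agent": "offline-dump-example/0.1"})
with urllib.request.urlopen(request) as response, open(OUT_FILE, "wb") as out:
    shutil.copyfileobj(response, out, length=1024 * 1024)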

Or you can use Kiwix, which is faster to set up because it also provides already-processed dumps (.zim files).

Crawling Wikimedia sites with wget is not good practice: if too many people did that, it would flood the sites with requests.


Later edit, in case you also want the images offline:

XOWA Project

If you want a complete mirror of Wikipedia, including images, with the full HTML formatting intact, that will download in approximately 30 hours, you should use XOWA.

The English Wikipedia has a lot of data: more than 13.9 million pages with over 20.0 GB of text, as well as more than 3.7 million thumbnails.

XOWA:


Setting all this up on your computer will not be a quick process...
The import itself will require 80GB of disk space and five hours
processing time for the text version. If you want images as well, the
numbers increase to 100GB of disk space and 30 hours of processing
time. However, when you are done, you will have a complete, recent
copy of English Wikipedia with images that can fit on a 128GB SD card.


But the offline version is very much like the online version and includes photos, etc. (I tested an article completely offline.)




Later edit if none of the above apply:

If the wiki is not part of Wikimedia or doesn't have a dump available, there is a project on GitHub that downloads the wiki using its API:

WikiTeam - We archive wikis, from Wikipedia to tiniest wikis
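If you prefer to script it yourself, the following is a minimal Python sketch of the same API-based approach (an illustration of the idea, not WikiTeam's own code). The endpoint https://example.org/w/api.php is a placeholder you would replace with the target wiki's api.php:

# List every page on a MediaWiki site and fetch its wikitext through the API.
import json
import urllib.parse
import urllib.request

API = "https://example.org/w/api.php"  # placeholder; most wikis expose /w/api.php

def api_get(params):
    # Every request asks for JSON and sends a descriptive User-Agent.
    query = urllib.parse.urlencode(dict(params, format="json"))
    request = urllib.request.Request(API + "?" + query,
                                     headers={"User-Agent": "wiki-dump-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def all_page_titles():
    # Walk list=allpages, following the API's continuation parameters.
    params = {"action": "query", "list": "allpages", "aplimit": "500"}
    while True:
        data = api_get(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def page_wikitext(title):
    # Fetch the latest revision's wikitext for one page.
    data = api_get({"action": "query", "prop": "revisions", "rvprop": "content",
                    "rvslots": "main", "titles": title})
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["slots"]["main"]["*"]

if __name__ == "__main__":
    for title in all_page_titles():
        print(title)  # or write page_wikitext(title) to a file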



 

@Goswami781

You could use a web-crawler tool that saves the site as HTML files. All the links will be rewritten, so you can open the main page, say, and then click through the links to reach the whole site.

There are a number of these tools available. I use wget, which is command-line based and has thousands of options, so it is not very friendly. However, it is quite powerful.

For example, here is the command line I used to dump my own MediaWiki site. I suggest you understand each option before using it yourself:

"c:program fileswgetwget" -k -p -r -R '*Special*' -R '*Help*' -E example.com/wiki
