Holmes151

Google not indexing my MediaWiki's JPG file description pages

@Holmes151

Posted in: #Google #Googlebot #GoogleIndex #Indexing #Mediawiki

My website es.wikineos.com has been running for a few years with the MediaWiki Content Management System like the one they use at Wikipedia.

At a quick glance Google seems to be indexing my site correctly, but after investigating further I discovered that it is not indexing URLs ending in image file extensions such as .jpg, .png and .gif.

MediaWiki creates file description pages with the following format:

domain.com/wiki/Archivo:*.*name.description

Which will look something like:

domain.com/wiki/Archivo:image.jpg

The site has no noindex meta tags, and its robots.txt file does not block any URLs. Google Webmaster Tools reports no errors, and the problem seems to affect only image-based URLs. Google indexes the SVG and PDF file pages correctly.
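For anyone who wants to double-check the same two things on their own wiki, here is a minimal sketch using only the Python standard library. It works on text you have already fetched (the robots.txt body, the page HTML and the response headers); the function names `allowed_by_robots` and `has_noindex` are made up for this example, and it only covers the common cases (a robots meta tag and an `X-Robots-Tag` header).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr_map.get("content", "").lower())


def allowed_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html, headers=None):
    """Return True if the headers or the HTML carry a noindex directive."""
    lowered = {k.lower(): v.lower() for k, v in (headers or {}).items()}
    if "noindex" in lowered.get("x-robots-tag", ""):
        return True
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)
```

Feed it the robots.txt and the HTML of one file description page (for example a `/wiki/Archivo:image.jpg` URL). If `allowed_by_robots` is true and `has_noindex` is false, the page is at least not being excluded by the site itself.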

If you Google site:http://es.wikineos.com + url:jpg versus site:en.wikipedia.org + url:jpg, you can clearly see that Google is not indexing my site's JPEG pages while it is indexing Wikipedia's. How can I get Google to index these URLs?


@Turnbaugh106

Generally, Google does not like indexing URLs that end with certain file extensions, especially .jpg, regardless of the response headers it receives. You should typically avoid file extensions in URLs. Wikipedia is one of the biggest and busiest sites around and most likely gets special treatment; I wouldn't be surprised if it is crawled by a dedicated bot that is heavily optimized just for it.

There are a few reports around the net of other people having similar trouble. Your best bet would be to rewrite the URLs so that the file extension becomes something more index-friendly.

One user reported a fix that simply adds a trailing slash to the URLs, but this affects all URLs:


SOURCE

To configure your wiki that way is quite simple. I give the description here for the standard way to rewrite MediaWiki URLs using mod_rewrite - if you used another technique, you have to adapt it to that tool.

In the file "LocalSettings.php" in your wiki directory there is a
variable $wgArticlePath that you have to adapt. I added the slash
there, so set it to:

$wgArticlePath = "/wiki/$1/";


That makes MediaWiki format all URLs correctly with the additional
slash. Obviously, you also have to modify the mod_rewrite code that
resides in the .htaccess file:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^wiki/(.*)/ w/index.php?title=$1 [PT,L,QSA]
RewriteRule ^wiki/*$ wiki/Main_Page [L,QSA]
RewriteRule ^/*$ wiki/Main_Page [L,QSA]
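
The setting and the first rewrite rule are two halves of a round trip: MediaWiki builds each page URL by substituting the title into `$wgArticlePath`, and Apache maps the URL back to a `title` parameter. The Python sketch below imitates that round trip so the effect of the extra slash is easy to see. It assumes the article path uses MediaWiki's `$1` title placeholder and that the rule passes its captured group on as the title; `build_url` and `title_from_path` are illustrative names, not MediaWiki or Apache functions, and the `RewriteCond` file and directory checks are ignored.

```python
import re
from urllib.parse import quote, unquote

# Assumed value: the article path with the trailing slash added.
ARTICLE_PATH = "/wiki/$1/"

# Python equivalent of the pattern in the first RewriteRule.
SERVE_RULE = re.compile(r"^wiki/(.*)/")


def build_url(title):
    """Imitate how a page URL is derived from the article path."""
    # MediaWiki URLs use underscores where the title has spaces.
    encoded = quote(title.replace(" ", "_"), safe=":/")
    return ARTICLE_PATH.replace("$1", encoded)


def title_from_path(url_path):
    """Imitate the rewrite rule: return the captured title, or None."""
    # In .htaccess context the leading slash is stripped before matching.
    match = SERVE_RULE.match(url_path.lstrip("/"))
    return unquote(match.group(1)) if match else None


url = build_url("Archivo:image.jpg")
print(url)                   # /wiki/Archivo:image.jpg/
print(title_from_path(url))  # Archivo:image.jpg
```

The generated URL no longer ends in `.jpg`, which is the whole point of the change, and a URL without the trailing slash no longer matches the serving rule at all.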


Additionally, I added a few lines to rewrite the old URLs (without the
slash), so that older links to my site don't break but are
forwarded to the new location. Actually, I use a 301 HTTP redirect to
the new site, so I don't lose any link power. The code is not perfect,
but works in most cases (it does not work when a slash is included in
the article name itself):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^wiki/(.*)$ http://www.domain.com/wiki/$1/ [R=301,L]


Then your site will be indexed by Google as well. Have fun!
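
To see how the redirect for old URLs behaves, and where the caveat about slashes in article names may come from, here is a rough Python simulation of the two rule sets. It is only a sketch under stated assumptions: the rules are taken to run in the order quoted (serving rule first, legacy redirect second), the captured group is taken to be carried into the target, the `RewriteCond` checks and the Main_Page rules are left out, and the redirect target is shown as a path without the host. `route` is a hypothetical helper name.

```python
import re

# Python equivalents of the two quoted patterns.
SERVE_RULE = re.compile(r"^wiki/(.*)/")    # new-style URL containing a slash
LEGACY_RULE = re.compile(r"^wiki/(.*)$")   # anything else under wiki/


def route(url_path):
    """Return (action, value) for a request path, first matching rule wins."""
    path = url_path.lstrip("/")  # .htaccess rules see the path without it
    match = SERVE_RULE.match(path)
    if match:
        return ("serve", match.group(1))
    match = LEGACY_RULE.match(path)
    if match:
        return ("301", "/wiki/" + match.group(1) + "/")
    return ("pass", url_path)


for candidate in ("/wiki/Archivo:image.jpg",
                  "/wiki/Archivo:image.jpg/",
                  "/wiki/Help/Editing"):
    print(candidate, "->", route(candidate))
```

Under these assumptions an old link such as `/wiki/Archivo:image.jpg` is redirected to the slash version, which is then served normally. An old link to a page whose name contains a slash, such as `/wiki/Help/Editing`, already matches the serving rule and would be served as the page `Help` instead of being redirected, which is one plausible reading of the limitation mentioned in the quote.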


Searching the net for the term "Powered by MediaWiki", I can see many other MediaWiki sites with the same problem.

For example:

chrishecker.com/File:Chris_Hecker%27s_Home_Page_%2820070221%29.png
http://wiki.c2b2.columbia.edu/califanolab/index.php/File:Header.png


It would also seem that not even MediaWiki's own site has its image pages indexed: site:http://www.mediawiki.org/ + url:.jpg. It's also worth mentioning that Google will likely consider these pages low value, so why would you want them indexed? Chances are they would never be found in any case.
