
No description for any page on the website is available in Google despite robots.txt allowing crawling

@Samaraweera270

Posted in: #Google #GoogleSearch #Indexing #Joomla #RobotsTxt

I seem to have the weirdest issue with Search Engine Optimization. I asked the IT folks at my university and people on the Joomla forums, and I have been trying to sort this out using Google Webmaster Tools for more than 2 months, to little avail. I want to know whether some blatantly wrong configuration somewhere is preventing search engines from indexing this site. I noticed a similar issue with another website I searched for online (ECEGSA - The University of British Columbia at gsa.ece.ubc.ca), which makes me believe other people might be looking for an answer to this as well.

Here are the details:
The website in question is: gsa.ece.umd.edu/. It runs on Joomla 2.5.x (latest). The site has been up since around mid-December 2013, and I noticed right from the start that it was not being indexed correctly on Google. Specifically, I see the following message when I search for the website on Google:

A description for this result is not available because of this site's robots.txt – learn more.


From December until around March, I used the default Joomla robots.txt file, which is:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/


Nothing there should stop Google from crawling my website. Even more confusingly, when I test many of the site's links under the "Blocked URLs" tab in Google Webmaster Tools, they all show up as "Allowed". I then tried adding a sitemap and referencing it in the robots.txt file. That did not help: same exact search result, same behavior in the "Blocked URLs" tab. In addition, the "sitemaps" tab now reports the error "URL is robotted out" for several links. I tried those exact links in "Blocked URLs" and they are allowed!
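As a sanity check on the claim that the default file does not block public pages, here is a minimal sketch that feeds the rules above to Python's standard urllib.robotparser. The helper name is_allowed and the "Googlebot" agent string are assumptions for illustration, and Python's parser is not guaranteed to match Google's own matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The default Joomla 2.5 robots.txt quoted in the question.
JOOMLA_ROBOTS = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the text locally, so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The home page is not covered by any Disallow rule; /administrator/ is.
print(is_allowed(JOOMLA_ROBOTS, "http://gsa.ece.umd.edu/"))
print(is_allowed(JOOMLA_ROBOTS, "http://gsa.ece.umd.edu/administrator/"))
```

This agrees with what the "Blocked URLs" tab reports, which suggests the "robots.txt" message in the search result comes from somewhere other than this file.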

I then tried deleting the robots.txt file. No use. Same exact problem.

Here is an example screenshot from Google's Webmaster Tools: [screenshot not preserved]


At this point I cannot give a rational explanation for why this is happening, and neither can anyone in the IT department here. No one on the Joomla forums seems to understand what is going on either.

Based on what I explained, does it seem that I have misconfigured something in robots.txt, in .htaccess, or somewhere else?


2 Comments


 

@Ravi8258870

Thank you, Stephen, for your help. Your observation was correct: the 303 redirect was indeed the reason search engine indexing was not working correctly.

I want to add to this discussion and point people to my answer on Joomla Stack Exchange, "Bad configuration of Externallogin extension causing search engine indexing problem". It explains the configuration error in the externallogin plugin that causes this issue, so that others who encounter it know how to fix it.

FIX
In short, to fix this issue if you ever encounter it, correct the configuration of the plugin as follows:
1. In the extensions menu, under "External Login", go to the server configuration.
2. In the connections tab, for "Automatic Login/Logout", choose 'No'.

The error occurs if you choose 'Yes' in step 2. In that case, the extension adds a 303 redirect to every page of your website in order to check whether the user is already logged in to the CAS in the current browser session, and if so, it logs the user into the site automatically. Search engine crawlers receive the same redirect, which causes the indexing problems.
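To confirm that the change took effect, one option is to request a page without following redirects and see whether it still answers with a redirect to the login host. The sketch below is only an illustration: head_once and still_redirects_to_login are hypothetical helper names, and the default login host is taken from the Location header shown elsewhere in this thread.

```python
import http.client
from typing import Optional, Tuple

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_once(host: str, path: str = "/", port: int = 80) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request and return (status, Location header).

    http.client never follows redirects on its own, so this reports the
    first response, much like `curl --head` does.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def still_redirects_to_login(
    status: int,
    location: Optional[str],
    login_host: str = "login.umd.edu",  # assumption: host seen in the Location header
) -> bool:
    """Return True if the response is a redirect that points at the login host.

    A plain substring test is used because the Location value may or may not
    include a scheme.
    """
    if status not in REDIRECT_CODES or not location:
        return False
    return login_host in location
```

With "Automatic Login/Logout" set to 'No', passing the result of head_once("gsa.ece.umd.edu") to still_redirects_to_login would be expected to give False. With it set to 'Yes', the 303 response described above would give True.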



 

@Heady270

You are 303 redirecting to a login page which then redirects back:

$ curl --head gsa.ece.umd.edu/
HTTP/1.1 303 See other
Date: Sun, 25 May 2014 10:45:59 GMT
Server: Apache/2.2.25 (Unix) mod_ssl/2.2.25 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4
X-Powered-By: PHP/5.4.21
Set-Cookie: c32b08af5ad5c16062381828f6a1b64e=6d66a9870a7067c444930d8deb190cd9; path=/
Location: login.umd.edu/cas/login?service=http%3A%2F%2Fgsa.ece.umd.edu%2F%3Fserver%3D1&gateway=true
Content-Type: text/html; charset=utf-8


The login page to which you are redirecting is blocked by robots.txt:

$ curl -s login.umd.edu/robots.txt
User-agent: *
Disallow: /cas/
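The two observations above can be combined into one check: walk the redirect chain and test each hop against the robots.txt of its own host. The following is a rough sketch using Python's urllib.robotparser. The function name first_blocked_hop is hypothetical, the https:// scheme on the login URL is an assumption (the Location header above shows no scheme), and the site's own robots.txt is abbreviated to a single rule.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# robots.txt content per host. The login host rules are the ones shown above;
# the site rules are an abbreviated stand-in for the Joomla default.
ROBOTS_BY_HOST: Dict[str, str] = {
    "gsa.ece.umd.edu": "User-agent: *\nDisallow: /administrator/\n",
    "login.umd.edu": "User-agent: *\nDisallow: /cas/\n",
}

# The redirect chain from the curl output: home page, then the CAS login page.
CHAIN: List[str] = [
    "http://gsa.ece.umd.edu/",
    # Scheme assumed; only the host and path matter for the robots.txt check.
    "https://login.umd.edu/cas/login"
    "?service=http%3A%2F%2Fgsa.ece.umd.edu%2F%3Fserver%3D1&gateway=true",
]


def first_blocked_hop(
    chain: List[str],
    robots_by_host: Dict[str, str],
    agent: str = "Googlebot",
) -> Optional[str]:
    """Return the first URL in `chain` that its host's robots.txt disallows.

    Returns None if every hop is allowed. Hosts without an entry are treated
    as having an empty robots.txt, which allows everything.
    """
    for url in chain:
        host = urlparse(url).netloc
        parser = RobotFileParser()
        parser.parse(robots_by_host.get(host, "").splitlines())
        if not parser.can_fetch(agent, url):
            return url
    return None


# Prints the login URL: the site's own page is allowed, but the redirect target is not.
print(first_blocked_hop(CHAIN, ROBOTS_BY_HOST))
```

This would explain why the "Blocked URLs" tool reports the site's own URLs as allowed while the search result still blames robots.txt: the block applies to the redirect target, not to the page that was requested.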
