
How to make Google index files retrieved from database?

@Murray155

Posted in: #Dynamic #Indexing #Links #Pdf #Seo

We use Joomla with Remository to store and manage publications (don't ask me why). Files (PDF) are stored in a database and can be accessed via dynamic, rewritten links of the form
domain.de/some/path/filename.html

Here is an example: some file

Current browsers reliably detect that they get a PDF. wget uses the .html filename, but after renaming I get a working PDF file. curl behaves similarly; piping its output into a (suitably named) file gives a working file. All this leads me to believe that -- against all odds, one might say -- the data our system provides is generally valid and understandable for clients.

However, Google does not seem to index PDF files referenced by such links. Our publication list is indexed, but the PDFs linked there are not (they don't show up in web and Scholar searches).

How can we tell search robots to retrieve our files and index them?


1 Comment


@Odierno851

You cannot force them, but you can give them a strong hint by providing a sitemap. Google may or may not index these files even with a sitemap, but it will tell you how many of the URLs in the sitemap were indexed. You need a Google Webmaster Tools account, and you need to register your website there. Once that is done, sitemap submissions and index status appear in the reports.
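As a rough illustration of the sitemap hint, here is a minimal Python sketch that builds a sitemap in the sitemaps.org XML format from a list of document URLs. The function name `build_sitemap` is made up for this example, and the URL is only a placeholder modelled on the pattern in the question (the `https://` scheme is an assumption). In practice the list would come from wherever Remository keeps its file records.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing the given URLs."""
    # Use the sitemap namespace as the default one, so tags are not prefixed.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = url  # ElementTree escapes characters such as '&' on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URL following the pattern from the question.
    print(build_sitemap(["https://domain.de/some/path/filename.html"]))
```

The output would be saved as something like sitemap.xml on the site and then submitted through the Webmaster Tools account mentioned above.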

From a search engine's perspective it really does not matter where the data comes from, only that it is accessible. You may be doing something fancy that Google does not like, but the problem is not the fact that your documents are in a database.

From the link you provided, I see something automatically trying to download when I click on your links. This may count as an unwanted drive-by download, so be careful; it is also a really poor user experience. If the link is meant to be a download, then there are too many pages. Check your MIME types too, as they may simply be confusing the Google crawler.
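To make the MIME type check concrete, here is a small Python sketch that inspects the response headers and the first bytes of the body for one of the document links. The helper names `fetch_start` and `diagnose_pdf_response` are invented for this example. The idea that a `Content-Disposition: attachment` header is behind the automatic download is only a guess, not something confirmed for this site.

```python
import urllib.request


def fetch_start(url, num_bytes=8):
    """Fetch a URL and return (headers as dict, first bytes of the body)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return dict(response.headers.items()), response.read(num_bytes)


def diagnose_pdf_response(headers, body_start):
    """Return a list of warnings for a response that is meant to be a PDF.

    headers: mapping of header names to values (names in any case).
    body_start: the first bytes of the response body.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    warnings = []

    # The media type is the part of Content-Type before any ';' parameters.
    content_type = normalized.get("content-type", "")
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/pdf":
        warnings.append(
            f"Content-Type is {media_type or 'missing'!r}, expected 'application/pdf'"
        )

    # Assumption: an 'attachment' disposition is what forces the download.
    disposition = normalized.get("content-disposition", "").strip().lower()
    if disposition.startswith("attachment"):
        warnings.append("Content-Disposition is 'attachment', which forces a download")

    # Real PDF files begin with the signature '%PDF-'.
    if not body_start.startswith(b"%PDF-"):
        warnings.append("body does not start with the PDF signature '%PDF-'")

    return warnings
```

Calling `diagnose_pdf_response(*fetch_start(url))` on one of the .html links would list anything that looks off. An empty list means the response at least declares itself as a PDF and starts like one.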
