
How to get search engines to properly index an AJAX-driven search page

@Sent6035632

Posted in: #Ajax #Html #RobotsTxt #SearchEngines #Seo

I have an ajax-driven search page that will allow users to search through a large collection of records. Each search result points to index.php?id=xyz (where xyz is the id of the record). The initial view does not have any records listed, and there is no interface that allows you to browse through all records. You can only conduct a search.

How do I build the page so that spiders can crawl each record? Or is there another way (outside of this specific search page) to point spiders to a list of all records?

FYI, the collection is rather large, so dumping links to every record in a single request is not a workable solution. Outputting the records must be done in multiple requests.

Each record can be viewed via a single page (e.g. "record.php?id=xyz"). I would like all the records indexed, but without anything being indexed from the sitemap-style pages that show where the records exist, for example:

<a href="/result.php?id=record1">Record 1</a>
<a href="/result.php?id=record2">Record 2</a>
<a href="/result.php?id=record3">Record 3</a>

<a href="/seo.php?page=2">next</a>


Assuming this is the correct approach, I have these questions:


How would the search engines find the crawl page?
Is it possible to prevent the search engines from indexing the link text ("Record 1", etc., and "next")? Can I output only the links? Or maybe something like:
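A conventional pattern covers both questions, though it is only a sketch and is not confirmed anywhere in this thread: list the crawl pages (or the record URLs themselves) in an XML sitemap so engines can find them, and add a robots meta tag so each crawl page's links are followed while its own text stays out of the index.

<!-- in the <head> of every seo.php crawl page -->
<meta name="robots" content="noindex, follow">

And in robots.txt, point crawlers at the sitemap (hypothetical URL):

Sitemap: https://www.example.com/sitemap.xml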


2 Comments


 

@Candy875

My current strategy is:

1. The initial view is a paged list of all items, starting with page 1.
2. The links to each page reload the entire page, specifying a page number.
3. Using JavaScript, I tweak the page list so that clicking a page number triggers an AJAX load instead of a full page refresh.

Items 1 and 2 will allow search engines to crawl the entire collection of records, and item 3 will only work for people who have JavaScript enabled (who doesn't, anymore?!).

But to do this right, I should be changing the document hash so that users can copy the URL and link directly to any page.
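A sketch of item 3, including the hash update just mentioned; the results container id, the link class, the data-page attribute, and the ajax flag are all invented names for illustration:

<script>
// Without JavaScript the pagination links work as normal page
// loads (items 1 and 2); with it, clicks fetch results in place.
document.addEventListener('click', function (e) {
    var link = e.target.closest('a.page-link');
    if (!link) return;
    e.preventDefault();

    var xhr = new XMLHttpRequest();
    xhr.open('GET', link.href + '&ajax=1');
    xhr.onload = function () {
        document.getElementById('results').innerHTML = xhr.responseText;
        // Update the hash so the current page is copyable/linkable.
        location.hash = 'page=' + link.getAttribute('data-page');
    };
    xhr.send();
});
</script>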



 

@Mendez628

You could use mod_rewrite to get better URLs, masquerading them as whatever you want.
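For instance, a rule like this in .htaccess (the /record/123 scheme is just an example) would map a clean path onto the existing script:

# serve /record/123 from record.php?id=123
RewriteEngine On
RewriteRule ^record/([0-9]+)/?$ record.php?id=$1 [L,QSA]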

But the main problem here is that, by your own description, there are no links and no way to discover your content except through the search box.

Even assuming that your users know exactly what to look for, that is still a poor way to provide content: there may be a lot of useful or interesting things that nobody will ever discover, because there is no way to get to them.

A crawler is like a user: it lands on the first page it finds, whichever that is, and follows links from there. The engines do have algorithms that check for alternatives, and they may pick up links from other sites that point to specific records in your collection and index those too, but it is still going to be hard for anybody, human or machine.

One thing you can do is provide lists or indexes, maybe not for all the content, but grouped by categories. There must be ways to group your content, and you can publish those categories/groups. You can also provide related links in those categories or on any result page; that will also help get your whole site and all your records indexed properly.

UPDATE

Considering your last comment, having related posts on each page will be the best way to go. You will have to work out an algorithm that links to genuinely useful subjects, but if they really are useful, then not just the bots but the users will follow them, and that will produce external links and references back to your site, improving each page's relevance and bringing more visits. If your related-links system also rotates those links, even better.

For example: page A covers a subject, and its related column shows ten results. When I visit that page today I get results 1-10, but if I visit tomorrow I get results 1, 2, 3 and 11-17. The idea is that some results are closer matches than others, so the closest ones should always be there, while the less related ones can rotate with others of similar relevance. That helps your visitors, and it helps you, because each time the bots come back they get different links to follow.
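A sketch of that rotation in PHP; the function name, the slot counts, and the assumption that $ranked is already sorted by relevance are all made up for illustration:

<?php
// Pin the strongest matches; fill the remaining slots from the
// next tier of related items, reshuffled on every request.
function relatedLinks(array $ranked, $slots = 10, $pinned = 3)
{
    $fixed = array_slice($ranked, 0, $pinned);  // results 1-3: always shown
    $pool  = array_slice($ranked, $pinned);     // results 4-N: rotation pool
    shuffle($pool);
    return array_merge($fixed, array_slice($pool, 0, $slots - $pinned));
}

Pinning the top matches keeps each page's strongest signals stable, while the shuffle means repeat crawls keep discovering different records.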


