
How do I get Google to prefer one branch/directory of my site over another?

@Looi9037786

Posted in: #Google #Googlebot #RobotsTxt #Search #Seo

I write a software library that maintains "latest" and "beta" branches:

gojs.net/latest

gojs.net/beta


Ideally all of this content would be available via Google search, so I'm reluctant to disallow indexing of the beta branch. However, many of the pages are the same or very similar, such as the API pages.

Searching Google for a query such as "site:gojs.net flowchart" returns results from the /beta directory for several pages, and few or none from /latest.

Is there a way to get Google and other web crawlers to prefer/recognize /latest as the canonical directory, but still index /beta?

Or, if I can't reasonably do that, should I disallow indexing of /beta to ensure that users from organic Google search land at a more appropriate place?


5 Comments


 

@Speyer207

You have a few options here.


Add a canonical URL to each page in the 'beta' version of the site whose content did not change, pointing to the corresponding page in the 'latest' version. This signals to Google which of the two to prefer when listing results.
In your XML sitemap, give a higher priority to the pages in the 'latest' version of the site.
Add an original source meta tag to every page in the 'beta' version of the site, naming the 'latest' version as the original source. The value can be an absolute URL or just part of the domain name, so you can simply use 'gojs.net/latest' as the original source on all pages under 'gojs.net/beta'.
And of course, your internal linking should prefer the 'latest' version of the site. Any backlinks you build in the future should also point to the 'latest' version.
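
As a rough illustration of the first option, the sketch below maps a /beta URL to its /latest counterpart and builds the canonical tag that would go in the page's head. The function names, the https scheme, and the example page path are assumptions made for illustration, not details taken from the site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def latest_url_for(beta_url: str) -> str:
    """Return the /latest URL that corresponds to a /beta URL."""
    parts = urlsplit(beta_url)
    if parts.path != "/beta" and not parts.path.startswith("/beta/"):
        raise ValueError(f"not a /beta URL: {beta_url}")
    # Swap only the leading directory; keep the rest of the path and the query.
    new_path = "/latest" + parts.path[len("/beta"):]
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, ""))


def canonical_tag(beta_url: str) -> str:
    """Build the <link rel="canonical"> element for a /beta page."""
    href = escape(latest_url_for(beta_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical page path, used only to show the shape of the output.
print(canonical_tag("https://gojs.net/beta/api/index.html"))
```

The tag only makes sense on /beta pages whose content matches the /latest page, so a real build step would presumably skip pages that differ between branches.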


All of the above will give Google a strong signal as to which URL to list. Do not put a 'noindex' header on the pages. That would prevent them from being indexed, and they would eventually disappear from the search results, which can backfire when the search terms are competitive. If the corresponding pages in the 'latest' version do not rank as well as the ones in 'beta' (for some reason), then by adding 'noindex' you are throwing away some search engine traffic.

When Google indexes a website, it groups all the pages with similar content and picks one URL from the group to add to the index. Which URL it picks depends on strong signals the site provides. I have listed some of them above.
googlewebmastercentral.blogspot.in/2008/09/demystifying-duplicate-content-penalty.html
Even after this, some users might end up in the beta version. You can point them to the latest version of the website by showing them an eye-catching link with a request to go to that page. Hope this answers some of your questions.



 

@Sue5673885

Is there a way to get Google and other web crawlers to prefer/recognize
/latest as the canonical directory, but still index /beta?


No. But you can set a canonical link in some of the /beta pages pointing to the corresponding /latest page. Google may or may not follow your recommendation and display the /latest page instead of the /beta page.

Or you could mark some of your pages in /beta as noindex, which would leave Google with the option of indexing the corresponding page in /latest. Unfortunately, the noindex meta tag cannot be set at a directory level; it has to be added to each page.
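
One possible way around the per-page limitation, not mentioned in this answer, is the X-Robots-Tag HTTP response header, which Google documents as an alternative to the robots meta tag. The sketch below assumes the site can attach response headers by request path; the function name is hypothetical.

```python
def robots_headers(path: str) -> dict[str, str]:
    """Return extra response headers for a request path."""
    # Match the /beta directory itself and everything beneath it,
    # but not unrelated paths that merely start with the same letters.
    if path == "/beta" or path.startswith("/beta/"):
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers("/beta/api/index.html"))
print(robots_headers("/latest/api/index.html"))
```

Whether to use noindex at all is a separate question; it removes the /beta pages from search results entirely.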

If some /beta pages are left without a canonical link, Google may treat them as separate pages.

Or If I can't reasonably do that, then should I disallow indexing of /beta
to ensure that users from organic google search land at a more appropriate
place?


If beta means content is not really ready or it is work in progress, then this would be the better choice. Google wants quality and it does not like duplicates or near duplicates.

From an SEO perspective, it is better to be patient and to expose only quality work. A lot of near-duplicate content is a bad signal.



 

@Moriarity557

Firstly, I'd make sure the pages are crawlable. Google will choose a URL as canonical if it regards both as the same, but will usually index both if there are any differences (pushing the non-canonical one down the listings).

You can test this by running a crawler (e.g. an XML sitemap generator) and making sure all the pages can be found without a sitemap.
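
The crawlability check amounts to a reachability test over the site's internal links. A minimal sketch, assuming the link graph has already been collected into a dictionary (the page paths here are hypothetical, and a real crawler would fetch and parse pages instead):

```python
from collections import deque


def reachable_pages(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk of internal links, starting from one page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: page -> pages it links to.
site_links = {
    "/latest/": ["/latest/api/", "/beta/"],
    "/latest/api/": ["/latest/"],
    "/beta/": ["/beta/api/"],
    "/beta/api/": [],
    "/beta/orphan/": [],
}

# Pages that exist but cannot be found by following links from the start page.
unreachable = set(site_links) - reachable_pages("/latest/", site_links)
print(sorted(unreachable))  # ['/beta/orphan/']
```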

Note that Google say the canonical tag is for "duplicate or similar content", and not just duplicate content. The canonical tag is to improve signals of which is the right one rather than it always being used. Anything else you can do is also just to improve such signals. I'm pretty sure if two URLs with completely different content said they are canonical, then both would be indexed and the tag would be ignored.

Personally, I would just use the canonical tag.

However, if you just want to improve relevance of one over the other without that, then link sculpting may be the way to go.

To do this, I'd make sure each /beta page linked to the equivalent /latest page, but not the other way round. If it needs to link the other way, use rel="nofollow" on the link.
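
A small sketch of that linking rule, assuming both branches share the same relative page paths. The function name and the link text are made up for illustration.

```python
from html import escape


def cross_link(from_branch: str, page: str) -> str:
    """Build a link from a page in one branch to its twin in the other."""
    if from_branch == "beta":
        # /beta -> /latest: an ordinary, followed link.
        href = escape("/latest/" + page, quote=True)
        return f'<a href="{href}">Stable version of this page</a>'
    if from_branch == "latest":
        # /latest -> /beta: only if needed, and marked nofollow.
        href = escape("/beta/" + page, quote=True)
        return f'<a href="{href}" rel="nofollow">Beta version of this page</a>'
    raise ValueError(f"unknown branch: {from_branch}")


print(cross_link("beta", "api/index.html"))
print(cross_link("latest", "api/index.html"))
```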

You may also want to create both xml and html sitemaps which mention the /latest pages, but leave Google to find the /beta ones by crawling.

Finally, you may want to try rel="next" and rel="prev" in order to tell Google that the /beta version is page 2 of the /latest version. Whilst not semantic, it should link the pages together in their index and say that the latest version is page 1.



 

@Sherry384

There is little you can do to control what search engines do, but here are some ideas.

As for canonical links, they are a way to tell Google that this page is like that one. Some argue that they are only for identical pages, while others recommend them for similar pages too. Either way, you are telling Google to take notice of one page and not the other because it is, in effect, a duplicate; whether it is similar or identical does not matter. I am not sure if this works for you. Only you can decide.

Outside of de-listing by blocking with robots.txt or using nofollow, it seems you are looking for a nuanced option that does not lose one entire set of pages from the search engine. Here are a few ideas. Used together, I think they will get you what you want.

Background: Some of these techniques are used in similar situations where newer and older versions of software are offered. It is a matter of de-tuning one set of pages so that they are still easily found by the user and remain valuable, but rank lower in importance for search engines. Ironically, this is easily done.

Here is a thought to begin with. If /beta was discovered by the search engines first, these pages may have de facto priority over /latest if the pages are similar. You will likely want to change the page structure so that corresponding pages in the two sets, /latest and /beta, are actually not that similar. You can de-tune the content of the /beta pages without disappointing your users. I am talking about how the content is formatted: make the /latest pages more content rich and the /beta pages less content rich. This trick is often used where the preferred page is made significantly more robust than the non-preferred page.

Two of the most important SEO clues to search engines are the title tag and the h1 tag, along with internal links and inbound backlinks, which I will talk about later in this answer.

The following are example title and h1 tags.

husqvarna motorcycles new
husqvarna motorcycles used
new husqvarna motorcycles
used husqvarna motorcycles
new motorcycles husqvarna
used motorcycles husqvarna

All of these are going to perform well for husqvarna and/or motorcycles. The keywords new and used will have little effect because they are not seen as important compared to husqvarna and motorcycles. Remember this concept. In this case the top two examples are optimized the best, while the last two are de-optimized for husqvarna. But that may not be enough for your needs.

I do not know what your site's keywords are; you can discover them using a log file analyzer. So I will give an example, but you will have to fill in the blanks.

Some background. Search engines rank search terms and terms found in tags in order of importance from left to right, simply because most of us were trained to think that way by learning to read left to right. So when people search, they tend to put the front-of-mind terms, the most important ones, first. For your /latest pages you want to tune your SEO efforts using this theory, ordering the most important keywords from left to right. For /beta, it may not be enough to de-tune your SEO efforts by reversing the order; after all, the pages would only perform differently for the same keywords. Instead, you will seek a new set of keywords that apply specifically to your /beta pages. Of course you will have some overlap; just de-tune the terms on /beta that are important to /latest. Below is an example, albeit not a great one!

JavaScript Library
Beta Trial Software Library JS

I replaced JavaScript with JS and front-loaded some /beta-specific keywords.

Of course, you can redirect all Google referrals for /beta to /latest. I have not tested the code example below, but it should be close. This would mean that if a user clicks a link on Google for /beta they would be redirected to /latest.

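RewriteEngine On
# Act only on requests whose referrer mentions Google
RewriteCond %{HTTP_REFERER} google [NC]
# Capture whatever follows /beta so it can be reused as %1
RewriteCond %{REQUEST_URI} ^/beta(.*)$ [NC]
RewriteRule ^ http://www.gojs.net/latest%1 [R=301,L]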
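
To make the intended logic of the rewrite rules easier to check, here is the same decision sketched in Python: redirect only when the referrer mentions Google and the path starts with /beta. This is an illustration of the intent, not a replacement for the server configuration; the function name is hypothetical, and the pattern is anchored to the start of the path as an assumption.

```python
import re
from typing import Optional

# Capture whatever follows /beta so it can be appended to /latest.
BETA_PATH = re.compile(r"^/beta(.*)$", re.IGNORECASE)


def redirect_target(referer: str, path: str) -> Optional[str]:
    """Return the /latest URL to redirect to, or None for no redirect."""
    if "google" not in referer.lower():
        return None
    match = BETA_PATH.match(path)
    if match is None:
        return None
    return "http://www.gojs.net/latest" + match.group(1)


print(redirect_target("https://www.google.com/", "/beta/api/index.html"))
print(redirect_target("", "/beta/api/index.html"))
```

Browsers do not always send a Referer header, so some visitors arriving from Google would still land on /beta.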


Now here is an idea. I do not think it will affect the SERPs, but there are some elements of truth to it and it should be considered. I will discuss this immediately after the sitemap section.

If you have not created a sitemap, perhaps, and I mean perhaps, you can affect how Google looks at pages using the last modification and priority values. Of course you can raise the priority for the /latest pages. I do not think this affects search; however, it would not hurt and would still be close to the truth. As for the last modification dates, I would assume your /latest pages are less recent than /beta, which seems contradictory. You can sort of fudge the truth and make /latest seem fresher than /beta. Here is a sitemap example.

<url>
<loc>http://www.gojs.net/latest</loc>
<lastmod>2014-04-22</lastmod>
<priority>1</priority>
</url>
<url>
<loc>http://www.gojs.net/latest/something</loc>
<lastmod>2014-04-22</lastmod>
<priority>1</priority>
</url>
<url>
<loc>http://www.gojs.net/beta</loc>
<lastmod>2012-03-11</lastmod>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.gojs.net/beta/something</loc>
<lastmod>2011-02-09</lastmod>
<priority>0.5</priority>
</url>
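
The entries above are a fragment; a complete sitemap file wraps them in a urlset element with the sitemaps.org namespace. A minimal sketch of generating such a file, reusing two of the example URLs and dates from above (the function name is an assumption):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str, str]]) -> str:
    """Serialize (loc, lastmod, priority) tuples as a sitemap document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")


# Values copied from the example above; the /latest page gets the
# higher priority and the more recent date.
entries = [
    ("http://www.gojs.net/latest", "2014-04-22", "1.0"),
    ("http://www.gojs.net/beta", "2012-03-11", "0.5"),
]
print(build_sitemap(entries))
```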


That may not be enough. Who knows for sure. But worth a try.

But here is an idea that will work. I do not know what configuration you have. If it is Linux/Apache, then this can be really easy. For Windows/IIS, this is still easy, but a bit trickier.

You can always manipulate the last modification dates of your files. I would not be a good SE participant if I did not include a link. This describes how to change the modification date of a Linux file.
askubuntu.com/questions/62492/how-can-i-change-the-date-modified-created-of-a-file
Windows protects certain file properties; I do not know why. But file dates can be changed programmatically. This link lists some tools toward the bottom for modifying the modification date.
www.techrepublic.com/article/build-your-skills-learn-to-manipulate-file-time-stamps-in-windows/
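
As one example of changing a date programmatically, Python's os.utime sets a file's timestamps on both Linux and Windows. A minimal sketch using a throwaway temporary file; the helper name and the date are chosen only for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def set_modified(path: str, when: datetime) -> None:
    """Set a file's modification time, leaving its access time unchanged."""
    atime = os.stat(path).st_atime
    os.utime(path, (atime, when.timestamp()))


# Demonstrate on a temporary file rather than a real page.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

set_modified(demo_path, datetime(2014, 4, 22, tzinfo=timezone.utc))
mtime = datetime.fromtimestamp(os.stat(demo_path).st_mtime, tz=timezone.utc)
print(mtime.date())  # 2014-04-22
os.remove(demo_path)
```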
One of the MOST important clues for search engines is links. There are internal links and inbound backlinks, but most people do not realize that they are treated almost exactly the same. So you want to use this fact in your favor.

First internal linking.

Any link in a navigational bar is highly important and any link higher in the content is more important than a link lower in the content. You will want to tune your links so that /latest is more important than /beta.

One way to do this is to make sure that /latest is in the navigation bar and /beta is not. Another way is to link /beta toward the bottom of the page and repeat a link to /latest at the top of the page. This may not be enough though. You can skip the link to /beta altogether except from the /latest pages made toward the bottom of the page. This is a really common trick.

How a link is made is just as important as where the link appears. Remember the keyword examples above for the title and h1 tags? You want to tune your links to /latest, /latest/something, /beta, and /beta/something the same way. Since your directory structure already exists, the only part of the link left to you is the link text. Tune links to /latest and de-tune links to /beta without affecting the user or on-page CTR, which is obviously not the same as SERP CTR.

For your inbound backlinks, make sure that you are link building to /latest, period. You cannot control how people link to your content, but you can outnumber any links to /beta with links to /latest. I cannot tell you what the ratio should be; at least 2:1 sounds like it can work fine, but 3:1 may be better and 4:1 outrageously awesomely fantastically way way better.

That is it. These tactics should allow your /latest pages to take priority in the SERPs over /beta, mostly using standard SEO but with the knowledge of how to make one set of pages perform better than another. The redirect is just a bonus!

Of course, search engines are notoriously slow so this will take a while, though I suspect the re-ordering can begin 30-60 days from modifications.



 

@Jessie594

Why not set canonical URLs to suggest to Google which content you want indexed?
support.google.com/webmasters/answer/139066?hl=en
