
Splitting a sitemap by content type

@Berryessa370

Posted in: #SearchEngines #Sitemap #XmlSitemap

I am currently tasked with submitting our website's sitemap to the search engines every week.

We have a module that offers sitemap generation, but it does not work very well: not all pages are included, and it does not split the sitemap by content type.

I've used various tools (online and offline) to generate the sitemaps, so generation itself is not the problem. The problem is that after every generation (which takes most of each Monday) I have to go through the sitemap manually and categorise the links into products, pages, categories and subcategories.

I've experimented successfully with XSL to split the sitemap, but it is still a labour-intensive process.

Does anyone know of a good method to split the sitemap?

Currently there are around 20,000 links (iirc) in total.


3 Comments


 

@BetL925

I would try to do it this way.

First, ensure that the URLs are "speaking" (descriptive), so that they look like, e.g., domain.com/category1/item1, domain.com/category2/item2.

Then export all existing URLs into separate sitemap files by sorting the URLs alphabetically, so that each sitemap file gets the URLs of a single category (category1.xml, category2.xml).

This can all be done by a very simple PHP script that reads all internal URLs and writes them into the corresponding files. You can even create these sitemap files manually, one for each category.

Look here: php.net/manual/en/domdocument.save.php.
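The grouping step could look roughly like the sketch below. It is written in Python rather than PHP and is only an illustration: the function names (group_by_first_segment, render_sitemap, write_sitemaps) are made up for this example, and it assumes the URLs are absolute (with a scheme such as https://) and that the first path segment is the category.

```python
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlparse
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_by_first_segment(urls):
    """Group URLs by the first segment of their path (assumed to be the category)."""
    groups = defaultdict(list)
    for url in sorted(urls):  # alphabetical order keeps each file stable between runs
        segments = [s for s in urlparse(url).path.split("/") if s]
        key = segments[0] if segments else "root"  # "root" is an arbitrary name for "/"
        groups[key].append(url)
    return dict(groups)


def render_sitemap(urls):
    """Return a minimal sitemap document (only <loc> entries) as a string."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url in urls:
        # escape() turns &, < and > into XML entities
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def write_sitemaps(urls, out_dir):
    """Write one <category>.xml file per group and return the file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key, group in group_by_first_segment(urls).items():
        target = out / f"{key}.xml"
        target.write_text(render_sitemap(group), encoding="utf-8")
        written.append(target.name)
    return sorted(written)
```

Calling write_sitemaps(all_urls, "sitemaps") with the full URL list would then produce files such as category1.xml and category2.xml. A category name that is not safe as a file name would need extra handling, which this sketch leaves out.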



 

@Goswami781

I can't think of anything a sitemap is used for that requires content split up by content type. Can you dig into that a bit and give us a more detailed picture of what is being created, how it is being used, and why it would want to be edited?

The big worry is the pages that aren't being included in the generated sitemap: that is content that Google and other search engines aren't seeing (or are seeing only to the extent that they follow links from pages in the site map). That is where I would spend time before anything else.

Related to this is the possibility that you are also using a variant of your site map to help people find things on your site. That is usually not useful with a site as large as yours, which would raise the question of how people navigate, what are they looking for, how do you convey a "scent of information," etc.

We often think of websites as places where people land on the home page and then wander about. In reality, they land on the page that a search engine or link suggested would answer a question or meet a need, and then do some minor exploring (or, most often, go on to another website). So I also raise the possibility that there is an architecture and/or SEO issue, but that takes us far afield from the question as it was asked, and gets into analytics telling you how people are using the site, and then using that as a guide for further development.

In short, a site map is a shortcut invented to help search engines find your content. We need to understand better why you aren't getting the whole site listed, and what you are hoping to accomplish by editing.



 

@Megan663

There are no off-the-shelf products that are going to do what you want. You could write a script to do it. It wouldn't be too hard in a language like Perl or Python.

If your sitemap entries were on single lines, it would be much easier. Since they usually aren't, you will need to hook up an XML parser instead of just reading the file line by line.

Once you can read the entries one by one in the script, you place them in various buckets using the same criteria that you use to do so by hand.
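As a rough sketch of that idea in Python, using the standard library's xml.etree.ElementTree as the parser: the URL patterns in RULES below are purely hypothetical, since the question doesn't say what the site's URLs look like, so you would swap in whatever criteria you apply by hand. The function names are also just for illustration.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical rules: (bucket name, pattern matched against the URL path).
# Order matters; the first matching rule wins.
RULES = [
    ("products", re.compile(r"^/products/[^/]+/?$")),
    ("subcategories", re.compile(r"^/category/[^/]+/[^/]+/?$")),
    ("categories", re.compile(r"^/category/[^/]+/?$")),
]
DEFAULT_BUCKET = "pages"  # anything that matches no rule


def read_locs(xml_text):
    """Return every <loc> value, however the entries are laid out across lines."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text and loc.text.strip()
    ]


def classify(url):
    """Pick the bucket for one URL based on its path."""
    path = urlparse(url).path
    for bucket, pattern in RULES:
        if pattern.match(path):
            return bucket
    return DEFAULT_BUCKET


def bucket_urls(xml_text):
    """Parse a sitemap and return a dict mapping bucket name to its URLs."""
    buckets = {}
    for url in read_locs(xml_text):
        buckets.setdefault(classify(url), []).append(url)
    return buckets
```

You would feed it the generated sitemap's contents, for example bucket_urls(open("sitemap.xml", encoding="utf-8").read()), and then write each bucket out as its own sitemap file. Keeping the rules in one list means the weekly manual sorting turns into a one-off job of getting the patterns right.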


