Is there an XML sitemap generator with command line interface for nginx on Linux?
I'm looking for an XML sitemap generator that can be triggered from the command line, supports nginx, and works on Linux (Debian). What can you recommend?
More posts by @Gretchen104
Did you try Googling this? This is the first result on the first page:
code.google.com/p/sitemap-generators/wiki/SitemapGenerators
Edit:
As per the comments, I tried out the following sitemap generator:
sitemap-generators.googlecode.com/svn/trunk/docs/en/sitemap-generator.html
The downloaded zip bundle contains a few files:
drwxr-xr-x 19 user group 646 Apr 10 05:22 .
drwxr-xr-x 3 user group 102 Apr 10 05:12 ..
-r--r-----@ 1 user group 23 Jun 16 2005 AUTHORS
-r--r-----@ 1 user group 1791 Jun 16 2005 COPYING
-r--r--r--@ 1 user group 2267 Dec 5 2005 ChangeLog
-rw-r--r--@ 1 user group 258 Dec 5 2005 PKG-INFO
-r--r--r--@ 1 user group 1111 Dec 5 2005 README
drwxr-xr-x 3 user group 102 Apr 10 05:16 build
-r--r--r--@ 1 user group 5662 Sep 7 2005 example_config.xml
-r--r-----@ 1 user group 996 Jun 16 2005 example_urllist.txt
-r-xr-xr-x@ 1 user group 317 Dec 5 2005 setup.py
-r-xr-xr-x@ 1 user group 73063 Dec 5 2005 sitemap_gen.py
-r-xr-xr-x@ 1 user group 28551 Sep 7 2005 test_sitemap_gen.py
I modified the provided example_config.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<site
base_url="http://YOURDOMAIN.com/"
store_into="/var/www/sitemap_gen-1.4/sitemap.xml"
verbose="1"
>
<url href="http://YOURDOMAIN.com/stats?q=name" />
<url
href="http://YOURDOMAIN.com/stats?q=age"
lastmod="2004-11-14T01:00:00-07:00"
changefreq="yearly"
priority="0.3"
/>
<urllist path="urllist.txt" encoding="UTF-8" />
<!-- Exclude URLs that end with a '~' (IE: emacs backup files) -->
<filter action="drop" type="wildcard" pattern="*~" />
<!-- Exclude URLs within UNIX-style hidden files or directories -->
<filter action="drop" type="regexp" pattern="/\.[^/]*" />
</site>
I think that serves as the template for generating the sitemap.xml. The generator supports pulling URLs either from Apache-style access logs or from a URL list file. I opted for the URL list file, since I was testing from my laptop.
To generate the url list, I employed 'wget' to spider the site:
wget -mk --spider -r -l2 YOURDOMAIN.COM/
or
wget -mk --spider -r -l2 YOURDOMAIN.COM/ -o urlinfolist.txt
-r: recursive; -l2: limit the recursion depth to 2 (without it, -m recurses to unlimited depth); -o: write the log to the named file. See the wget manual page.
Then I extracted the URLs from the wget-log that is generated:
cat wget-log | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt
or
cat urlinfolist.txt | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt
Note: Some of the exclusions I had in my line were not needed because the config file either already excluded them or could very easily exclude them.
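For anyone who wants to reuse the extraction step, here is a rough sketch of the same pipeline wrapped in a shell function, with each stage commented. The function name extract_urls is my own invention, and I use grep -E in place of egrep (same behavior; newer GNU grep prints a deprecation warning for egrep). Treat it as a starting point, not a tested tool.

```shell
# Sketch: pull unique page URLs out of a wget log file.
# extract_urls is a hypothetical helper name, not part of wget or sitemap_gen.
extract_urls() {
    log_file="$1"
    # 1. tr: put every space-separated token on its own line ('\012' = newline).
    # 2. grep: keep only tokens that start with "http".
    # 3. grep -Evi: drop URLs containing "?" or ending in ".jpg" (any case).
    # 4. sort -u: sort and remove duplicates.
    tr ' ' '\012' < "$log_file" \
        | grep '^http' \
        | grep -Evi '[?]|[.]jpg$' \
        | sort -u
}
```

Usage would be something like: extract_urls wget-log > urllist.txt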
Then I ran the generator:
python sitemap_gen.py --config=example_config.xml
This produced the sitemap.xml file.
The script appears to be designed to run unattended, and it also worked fine for my manual test run. The wget crawl can take a while to run. However, if you don't have any special rewrites etc., you can instead scan your site's static content path with 'find' and filter the results before writing them to the URL list file.
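A minimal sketch of the 'find' approach, assuming files on disk map one-to-one onto URLs (no rewrites). The function name build_urllist and the document root in the usage line are placeholders of mine, not something from the generator; adjust them to your setup.

```shell
# Sketch: build a URL list from static files under a document root.
# build_urllist is a hypothetical helper; it assumes each file path
# corresponds directly to a URL path below the base URL.
build_urllist() {
    docroot="${1%/}"    # document root, trailing slash stripped
    base_url="${2%/}"   # base URL, trailing slash stripped
    # List regular files relative to the docroot, skipping hidden files and
    # directories as well as editor backups ending in "~". Then replace the
    # leading "." of each path with the base URL and de-duplicate.
    # Note: the sed step assumes the base URL contains no "|" or "&".
    (cd "$docroot" && find . -type f ! -path '*/.*' ! -name '*~') \
        | sed "s|^\.|$base_url|" \
        | sort -u
}
```

Usage would be something like: build_urllist /var/www/html http://YOURDOMAIN.com/ > urllist.txt (here /var/www/html is only an assumed location). The hidden-file and backup exclusions mirror the two filters in the config above, so they could be left to the generator instead.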