Why Does the Number of URLs in the Sitemap Not Correspond?

Started by supertrooper, September 03, 2010, 04:22:25 PM

supertrooper

My site has approx. 4,000 pages, but Google Webmaster Tools only reports 2,659 URLs submitted in the sitemap. When creating the image sitemap in A1 Sitemap Generator, over 11,000 items were scanned. My questions are:

1) Is there something wrong with the sitemap?
2) Is it possible that A1 Sitemap Generator is dropping some of the URLs? Without counting them, it's difficult to know the actual number.

You can check out my sitemap at http://kenbillington.ch/sitemap.xml

:) :) :)

Webhelpforums

Hi,


Offhand, possible reasons a URL is not included after a website scan or when creating sitemaps could be:


URL marked "canonical" to another URL
URL filtered by robots.txt
URL has "noindex"
URL is only linked by "nofollow" links
URL has meta refresh redirect to another URL
http://www.microsystools.com/products/sitemap-generator/help/sitemap-robots-noindex-nofollow/
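To test the robots.txt reason from the list above for a single URL, something like the following Python sketch may help. The robots.txt rules and the example.com URLs are made up for illustration, and this says nothing about how A1 Sitemap Generator applies the filter internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste your own file's lines here.
rules = """User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch() returns False when the rules disallow the URL for that user agent.
blocked = not parser.can_fetch("*", "http://example.com/private/page.html")
allowed = parser.can_fetch("*", "http://example.com/photobank/index.html")

print(blocked)  # True
print(allowed)  # True
```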

URL has a response code that the default settings do not accept when creating sitemaps:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-website-scan-errors/

Other causes could be, for example, output filters:
http://www.microsystools.com/products/sitemap-generator/help/website-crawler-output-filters/


Are there any URLs you find missing after the website scan? If so, you can try disabling the option:
Crawler options | Apply webmaster and list filters after website scan.
You can also try disabling "Obey nofollow" etc. (see the link above).

If you now suddenly see new URLs that you would prefer to have included in the XML sitemap, you can check whether they have been detected as e.g. noindex or similar:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-robots-noindex-nofollow/


If you can mention any single URL that you believe A1 Sitemap Generator misses either after website scan or inside XML sitemap, please let me know.

I will add an easy way to count the URLs included in the XML sitemap. I will try to have that included in 2.3.4.
(Until then, you can count the <loc> entries in e.g. a text editor.)
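If you prefer a script over a text editor, here is a minimal Python sketch for counting the page <loc> entries. It assumes the sitemap uses the standard sitemaps.org namespace; the sample XML and the function name count_page_urls are only for illustration. Image entries (<image:loc>) live in a different namespace and are not counted.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def count_page_urls(xml_text: str) -> int:
    """Count the page-level <loc> elements in a sitemap document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(f"{{{SITEMAP_NS}}}loc"))

# Made-up sample; in practice read the text from your own sitemap.xml file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/a.html</loc></url>
</urlset>"""

print(count_page_urls(sample))  # 2
```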

I will *probably* also include sorting options for XML sitemaps soon. In 2.3.3, URLs are sorted by website URL structure/alphabet, but I may add e.g. importance/priority as a sorting option. It is something I have been preparing for and toying with, but I would rather not have too many possibly confusing or rarely used options.
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

supertrooper

An analysis of the sitemap shows that approx. 20% of the URLs do not show up in the sitemap. As an example, I took one of the categories where there should be 60 URLs, but only 46 are to be found in the sitemap. The missing 14 URLs are:

http://kenbillington.ch/photobank/Swans/Black Swan (Cygnus atratus)/slides/Black Swan (Cygnus atratus) (6).html
http://kenbillington.ch/photobank/Swans/Black Swan (Cygnus atratus)/slides/Black Swan (Cygnus atratus) (2).html
http://kenbillington.ch/photobank/Swans/Black Swan (Cygnus atratus)/slides/Black Swan (Cygnus atratus) (1).html
http://kenbillington.ch/photobank/Swans/Coscoroba Swan (Coscoroba coscoroba)/slides/Coscoroba Swan (Coscoroba coscoroba) (1).html
http://kenbillington.ch/photobank/Swans/Mute Swan (Cygnus olor)/slides/Mute Swan (Cygnus olor) (11).html
http://kenbillington.ch/photobank/Swans/Mute Swan (Cygnus olor)/slides/Mute Swan (Cygnus olor) (1).html
http://kenbillington.ch/photobank/Swans/Mute Swan (Cygnus olor)/slides/Mute Swan (Cygnus olor) (10).html
http://kenbillington.ch/photobank/Swans/Trumpeter Swan (Cygnus buccinator)/slides/Trumpeter Swan (Cygnus buccinator) (1).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (12).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (11).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (6).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (4).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (10).html
http://kenbillington.ch/photobank/Swans/Whooper Swan (Cygnus cygnus)/slides/Whooper Swan (Cygnus cygnus) (1).html

It seems that for some reason A1 Sitemap Generator is missing or skipping many of the URLs.

Please advise.

Webhelpforums

Hi, I just ran a test scan using default settings in version 2.3.3.

In my XML sitemap they were all there. Remember that, by default, a space gets encoded to %20. See:
http://www.microsystools.com/products/sitemap-generator/help/character-percentage-url-encoding/

The URLs you listed also all seem to be included in the generated XML sitemap and the website scan results:
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(1).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(2).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(3).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(5).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(6).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)/slides/Black%20Swan%20(Cygnus%20atratus)%20(7).html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>


Same with the rest I checked, e.g.
http://kenbillington.ch/photobank/Swans/Mute%20Swan%20(Cygnus%20olor)/slides/Mute%20Swan%20(Cygnus%20olor)%20(11).html


If you don't like URL encoding, you can disable it in A1 Sitemap Generator (see the help page linked above), but it is part of the XML sitemap specification that URLs get URL-encoded.
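When comparing a list of page URLs against the sitemap, it helps to encode them the same way first. Below is a rough Python sketch. The choice of characters left unencoded (":", "/", "(" and ")") is an assumption picked to match the <loc> entries quoted above, not a statement of the exact rules the program uses, and the function names are only for illustration.

```python
from urllib.parse import quote

def encode_like_sitemap(url: str) -> str:
    """Percent-encode a URL, leaving ':', '/', '(' and ')' untouched (assumed rule)."""
    return quote(url, safe=":/()")

def missing_from_sitemap(page_urls, sitemap_locs):
    """Return the page URLs whose encoded form is absent from the sitemap."""
    locs = set(sitemap_locs)
    return [u for u in page_urls if encode_like_sitemap(u) not in locs]

raw = ("http://kenbillington.ch/photobank/Swans/Black Swan (Cygnus atratus)"
       "/slides/Black Swan (Cygnus atratus) (1).html")
loc = ("http://kenbillington.ch/photobank/Swans/Black%20Swan%20(Cygnus%20atratus)"
       "/slides/Black%20Swan%20(Cygnus%20atratus)%20(1).html")

print(encode_like_sitemap(raw) == loc)     # True
print(missing_from_sitemap([raw], [loc]))  # []
```

Searching the sitemap for the unencoded form (with literal spaces) will find nothing, which can make URLs look missing when they are not.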

Webhelpforums

Looking at your sitemap.xml file there is something confusing:
http://kenbillington.ch/sitemap.xml

Normally sitemap.xml is used for standard XML sitemaps, but yours seems to be an image sitemap! Maybe the naming (for whatever reason) got mixed up? With default settings, A1 Sitemap Generator uses sitemap.xml for normal XML sitemaps and sitemap-image.xml for image sitemaps.

supertrooper

I also got 3,668 URLs using the latest version of A1 Sitemap Generator and the default XML sitemap.

However, when I change the preset to Google Image Sitemap, re-run the scan and then build an XML sitemap (Google Image), there are again only 2,659 URLs in the sitemap. It would seem that some URLs with images in them are being left out.

What do I need to do to get an image sitemap containing all of the URLs that contain images?

Please advise.

Webhelpforums

A1 Sitemap Generator lists each image only *once* in an image sitemap. This means that if two page URLs link to the same image, only one of the pages is selected as the "source" for the image in the generated image sitemap. In some cases, this leaves some page URLs with no unique images, so they are not included in the image sitemap. (If you just want to let Google know of all page URLs in your website, you can use the normal XML sitemap for that.)

Anyway, I guess I could make that behavior optional to satisfy everyone: those who want the above, and those who want an image listed for every URL where the given image is used.
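To illustrate why the page count drops, here is a simplified Python sketch of the list-each-image-once idea. It assumes the first page encountered becomes the "source" for an image; how the program actually picks the source page is not specified here, and the page and image names are made up.

```python
def pages_in_image_sitemap(page_images: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assign each image to the first page that uses it; drop pages left without images."""
    seen: set[str] = set()
    result: dict[str, list[str]] = {}
    for page, images in page_images.items():
        unique = []
        for img in images:
            if img not in seen:  # image not yet claimed by an earlier page
                seen.add(img)
                unique.append(img)
        if unique:               # pages with no unique images are left out
            result[page] = unique
    return result

# Hypothetical site: two slide pages reuse images already linked from the index.
site = {
    "index.html": ["a.jpg", "b.jpg"],
    "slide1.html": ["a.jpg"],
    "slide2.html": ["b.jpg", "c.jpg"],
}

print(pages_in_image_sitemap(site))
# {'index.html': ['a.jpg', 'b.jpg'], 'slide2.html': ['c.jpg']}
```

Here slide1.html disappears from the image sitemap because its only image was already listed under index.html, which is the kind of effect that can reduce 3,668 page URLs to 2,659.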

supertrooper

Such an option would be great. Please let me know when I can download the updated version of the software.

Webhelpforums

I can now confirm that 2.3.4 will support:

1)
importance sorting of URLs in sitemaps
structure/alphabet sorting of URLs in sitemaps (default)

2)
Text replace will now also report how many replacements it has made (making it easy to count words etc.)

3)
An option to choose whether image sitemaps list each image at most once or for all pages where it is used.

Webhelpforums

By the way, I just noticed something during regular scans for creating normal XML sitemaps of your website.


In
Scan website | Crawler options
you may want to enable:
Try search for links in Javascript and CSS

It results in more URLs being found.

Webhelpforums

2.3.4 released

You will find the new options in:
Create sitemap | XML sitemap extensions | Google image sitemaps
(bottom)

Create sitemap | Document options | Generated sitemap options
(bottom)
