Webmaster Forums - Website and SEO Help

Microsys Products and Webmaster Tools => A1 Sitemap Generator => Topic started by: ljs on June 10, 2015, 11:48:46 PM

Title: Scanning takes forever
Post by: ljs on June 10, 2015, 11:48:46 PM
Same as many others here. I let the scan run overnight and stopped it this morning after 14 hours to check the results so far. It looks like every possible URL was scanned.

The number of 'Jobs waiting in crawler engine' keeps growing and A1 wants to keep going and going. I'm not sure what to do.
Title: Re: Scanning takes forever
Post by: Webhelpforums on June 11, 2015, 06:01:13 AM
The best thing to do in such a situation is to stop the scan and inspect the URLs collected.

Remember that you can resume scans later:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/

Anyhow, most likely your problem is that your website generates an infinite or near-infinite number of URLs.

From the http://www.microsystools.com/products/sitemap-generator/help/creating-sitemaps-large-websites/ help page:

List of things to check:

A) Check if your website is generating an infinite number of unique URLs. If it does, the crawler will never stop, because new unique page URLs are found all the time. A good way to discover and solve these kinds of problems is to:

  • Start a website scan.
  • Stop the website scan after e.g. half an hour.
  • Inspect if everything appears correct, i.e. if most of the URLs found seem correct.

Example #1
A website returns 200 instead of 404 for broken page URLs. Example of infinite pattern: Original 1/broken.html links to 1/1/broken.html links to 1/1/1/broken.html etc.
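
The infinite pattern in Example #1 can often be spotted in a list of collected URLs by looking for a path segment that repeats several times in a row. Below is a minimal Python sketch of that check; it is not part of A1 Sitemap Generator, and the function name `has_repeated_segments` and the threshold of three repeats are assumptions chosen for illustration.

```python
from urllib.parse import urlparse


def has_repeated_segments(url, min_repeats=3):
    """Return True if any path segment occurs min_repeats times in a row.

    Hypothetical helper: the threshold is an assumption and may need tuning,
    since some legitimate URLs also repeat a segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    run = 1  # length of the current run of identical consecutive segments
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_repeats:
            return True
    return False


# Example URLs modeled on the pattern described above (example.com is a placeholder).
urls = [
    "http://example.com/1/broken.html",
    "http://example.com/1/1/1/broken.html",
]
suspicious = [u for u in urls if has_repeated_segments(u)]
```

Any URL this flags is worth tracing back through the internal linking data to find the page that links to it.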

Example #2
The website platform CMS generates a huge number of 100% duplicate URLs for each actual existing URL. To read more about duplicate URLs, see this help page. Remember that you can analyze and investigate internal website linking in case something looks wrong.

B) Check if your project configuration and website content will cause the crawler to download files that are hundreds of megabytes large.

Example #1: Your website contains many huge files (like hundreds of megabytes) the crawler must download. (While the memory is freed after the download has completed, it can still cause problems on computers with low memory.)

Remember you can see internal linking after you stop the scan. This will usually show how a URL was found:
http://www.microsystools.com/products/sitemap-generator/help/sitemaps-generator-analyze-links/

To help more, I will need to see the website. Feel free to post it here or email it: http://www.microsystools.com/home/contact.php
Title: Re: Scanning takes forever
Post by: ljs on June 11, 2015, 06:16:34 AM
Actually, it's a car parts web shop. So there are hundreds of car brands and models, which all have hundreds of matching parts. Every model/part combination generates a URL. There are now about 65000 internal URLs listed. Should I go on, or is there an alternative way to make a sitemap for this type of website?

Title: Re: Scanning takes forever
Post by: Webhelpforums on June 11, 2015, 08:37:47 AM
If there are URL patterns you know are duplicates of some sort, you can exclude them in analysis filters and output filters before starting the scan. If applied carefully, this can probably shorten the scan time quite drastically.
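
To get a feel for how much a set of exclusions would remove before entering them as filters, you could test them against the URLs collected so far. The Python sketch below only illustrates the idea: the regular expressions in `EXCLUDE_PATTERNS` are hypothetical examples, not patterns from this website, and this is not the filter syntax used by A1 Sitemap Generator.

```python
import re

# Hypothetical patterns for duplicate-style URLs; replace with ones that
# match what you actually see in your own scan results.
EXCLUDE_PATTERNS = [
    r"[?&]sort=",       # assumed: sorted variants of the same listing
    r"[?&]sessionid=",  # assumed: session IDs in the query string
    r"/print/",         # assumed: printer-friendly copies
]


def filter_urls(urls, patterns=EXCLUDE_PATTERNS):
    """Return the URLs that match none of the exclude patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in urls if not any(c.search(u) for c in compiled)]


collected = [
    "http://example.com/parts/brake-pad",
    "http://example.com/parts/brake-pad?sort=price",
    "http://example.com/print/parts/brake-pad",
]
kept = filter_urls(collected)
removed = len(collected) - len(kept)
```

Comparing `removed` with the total gives a rough estimate of how much smaller the next scan could be.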

The first step will be to stop the scan and inspect the results for URLs you do not want in the final XML sitemap.
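
With tens of thousands of URLs, it can help to summarize the results by path prefix to see which parts of the site dominate. Here is a minimal Python sketch, assuming the collected URLs have been saved as a plain list; the function name `count_by_prefix` and the `depth` parameter are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_prefix(urls, depth=1):
    """Count URLs per leading path prefix, most frequent first.

    depth is the number of leading path segments used as the grouping key.
    """
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts.most_common()


# Placeholder URLs; in practice this would be the list from the stopped scan.
sample = [
    "http://example.com/parts/a",
    "http://example.com/parts/b",
    "http://example.com/brands/x",
]
summary = count_by_prefix(sample)
```

A prefix with an unexpectedly large count is a good candidate for a closer look, and possibly for an exclude filter.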

If the website/server can handle it, you can also increase the number of threads and simultaneous connections (the default settings in A1SG are around 10% of the maximum).

Be sure to check this help page for more info and links to other relevant help pages:
http://www.microsystools.com/products/sitemap-generator/help/creating-sitemaps-large-websites/