Continuous Run ...

Started by drgeorgep, January 03, 2017, 04:07:43 PM


Hi ... some time ago, I reported that A1 Sitemap Generator ran continuously, reporting as many as 500,000 URLs for a site with fewer than 2100 URLs. At that time, I was advised to have a tech check my database. After many unrelated delays, I managed to find a competent database tech. Here's what he reported: "I ran A1 for over 6 hours today. I ran into the same situation as you, where it detected too many links. It also incorrectly formed the links -- some of them went on and on with "/article/1012/article/1030/article..." From my experience that would lead me to believe that their tool is crawling your website, following links, and continuing to find links back to articles it's already visited, causing the long, incorrect link chains." I should add that this problem originated in version 7.x and the tech tested it with 8.x. Can someone please help? Thanks so much. drgeorgep


From your description I can make a qualified guess at the problem:

Your website does not return a 404 error response for URLs that do not exist

Your website uses relative links

So suppose A1 finds a link to

but your website returns a normal "200 OK" response and the page links to "newfolder/"

A1 will then test

but your website returns a normal "200 OK" response and the page again links to "newfolder/"

A1 will then test


Please bear in mind that the above is a guess based on my experience with people reporting similar problems - the specifics of your case can be different. Please email your website address + a few example URLs
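To make the guess above concrete, here is a minimal Python sketch of how a relative link keeps growing when every URL answers with 200 and serves the same page. The host `example.com`, the start folder and the helper name `simulate_crawl` are made up for illustration; only the relative link "newfolder/" comes from the description above. This is not how A1 is implemented, just standard relative URL resolution.

```python
from urllib.parse import urljoin


def simulate_crawl(start_url, relative_href, depth):
    """Resolve the same relative link against each newly found URL.

    This mimics a crawler visiting a site where every URL returns 200
    and the returned page always contains the same relative link.
    """
    urls = [start_url]
    for _ in range(depth):
        # A relative href is resolved against the URL of the page it is on.
        urls.append(urljoin(urls[-1], relative_href))
    return urls


# Hypothetical start URL; "newfolder/" is the relative link from the post.
for url in simulate_crawl("http://example.com/folder/", "newfolder/", 3):
    print(url)
```

Each round adds one more "newfolder/" segment, and since the server never answers 404 the chain has no natural end.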

TechSEO360 |  | A1 Sitemap Generator, A1 Website Analyzer etc.


Hi ... website is Here are five URLs:;;;; I have asked my host to confirm that 404 error messages are generated for URLs that do not exist. I am sure such messages are generated, but checked to be sure. Will report as soon as I hear from host support. Thanks. dgp


Hi ... I confirmed, with the host, that, indeed, 404 error responses are sent when someone attempts to load a non-existent URL. This is to update my previous response. Thanks so much. dgp



Okay - I just tested

and created

The second URL should return 404 - but instead it returns response 200 OK - just verified with the Firefox Live HTTP Headers plugin

GET /articles/index/2327/as-i-hear-it-rainy-day-in-nyc/abdjgysdjgfsjg HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: da,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Cookie: PHPSESSID=dkr25hfdsra0gnbkjntegkpfv7; _ga=GA1.2.543169924.1483635724; _gat=1
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1

HTTP/1.1 200 OK
Date: Thu, 05 Jan 2017 17:03:44 GMT
Server: Apache/2.4.23 (cPanel) OpenSSL/1.0.1e-fips mod_bwlimited/1.4
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

You can test as well using e.g. or with the above mentioned URL
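If you prefer a script, the same check can be sketched in Python with only the standard library. The function names `fetch_status` and `is_soft_404` are my own for this example; the idea is simply to request a URL that should not exist and look at the status code that comes back.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the final HTTP status code for url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises an exception for 4xx/5xx responses;
        # the status code is still available on the exception.
        return err.code


def is_soft_404(missing_url, timeout=10.0):
    """True when a URL that should not exist is answered with 200."""
    return fetch_status(missing_url, timeout) == 200
```

Call `is_soft_404` with your own host followed by the nonsense path from the GET line above. A correctly configured site gives `False` (the server answered 404); the capture above corresponds to `True`.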

So my initial diagnosis, based on your description of the symptoms so far, seems correct :)

You can either

1. Fix the problem, so that the website returns 404 for URLs that do not exist

2. Exclude unwanted URL patterns in A1 Sitemap Generator output filters and analysis filters
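For option 2 it helps to know which pattern to exclude. As a rough sketch (this is plain Python, not A1 filter syntax), you could run a regular expression over an exported URL list to spot the runaway chains. The pattern below assumes the "/article/1012/article/1030/..." shape quoted in the first post; the host `example.com` and the name `is_runaway_url` are made up, and your real URLs may need a different pattern.

```python
import re
from urllib.parse import urlparse

# Assumed shape: "/article/<number>" occurring two or more times in a row.
REPEATED_ARTICLE = re.compile(r"(?:/article/\d+){2,}")


def is_runaway_url(url):
    """True when the URL path repeats the /article/<id> segment pair."""
    return REPEATED_ARTICLE.search(urlparse(url).path) is not None


# Hypothetical sample URLs for illustration only.
samples = [
    "http://example.com/article/1012",
    "http://example.com/article/1012/article/1030/article/1044",
]
for url in samples:
    print(url, is_runaway_url(url))
```

URLs flagged this way are the ones an exclude filter would need to cover.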

I consider solution #1 best since it will work with all crawlers and search engines, but otherwise see:

If you are ever in doubt about how A1 found a specific URL, check the "linked-by", "redirected-by" and "sourced-by" tabs. You can then follow the trail to see where a problem originates.