Continuous Run ...


drgeorgep

Continuous Run ...
« on: January 03, 2017, 04:07:43 PM »
Hi ... some time ago, I reported that A1 Sitemap Generator ran continuously, reporting as many as 500,000 URLs for a site with fewer than 2,100 URLs. At that time, I was advised to have a tech check my database. After many unrelated delays, I managed to find a competent database tech. Here's what he reported:

"I ran A1 for over 6 hours today. I ran into the same situation as you, where it detected too many links. It also incorrectly formed the links -- some of them went on and on with "/article/1012/article/1030/article..." From my experience, that would lead me to believe that their tool is crawling your website, following links, and continuing to find links back to articles it's already visited, causing the long, incorrect link chains."

I should add that this problem originated in the 7.x version and the tech tested it with 8.x. Can someone please help? Thanks so much. drgeorgep

*

Webhelpforums

Re: Continuous Run ...
« Reply #1 on: January 03, 2017, 04:51:46 PM »
From your description I can make a qualified guess at the problem:

1)
Your website does not return a 404 error response for URLs that do not exist

2)
Your website uses relative links


So suppose A1 finds a link to example.com/doesnotexist/

but your website returns a normal "200 OK" response and the page it serves links to "newfolder/"

A1 will then test example.com/doesnotexist/newfolder/

but your website again returns a normal "200 OK" response and the page again links to "newfolder/"

A1 will then test example.com/doesnotexist/newfolder/newfolder/

etc.
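
To make the loop concrete, here is a minimal Python sketch of how any crawler resolves a relative link against the URL of the page it was found on. It is not A1's actual code; the function name follow_relative_link is made up for illustration, and it assumes the server answers every request with 200 and the same relative link, as in the example above.

```python
from urllib.parse import urljoin

def follow_relative_link(start_url, relative_link, depth):
    """Resolve the same relative link against each newly found URL.

    Hypothetical helper: it only mimics a crawl of a site that answers
    every URL with 200 and a page linking to `relative_link`.
    """
    urls = [start_url]
    for _ in range(depth):
        # Standard relative-reference resolution: the link is appended
        # to the "folder" part of the URL of the page that contains it.
        urls.append(urljoin(urls[-1], relative_link))
    return urls

for url in follow_relative_link("http://example.com/doesnotexist/", "newfolder/", 3):
    print(url)
```

Each pass yields a URL one segment longer than the last, so the crawl only stops if the server eventually answers with an error such as 404.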

Please bear in mind that the above is a guess based on my experience with people reporting similar problems - the specifics of your case can be different. Please email your website address + a few example URLs



MicrosysTools.com | Website and SEO Software for webmasters | A1 Sitemap Generator, A1 Website Analyzer etc.

*

drgeorgep

Re: Continuous Run ...
« Reply #2 on: January 03, 2017, 05:36:30 PM »
Hi ... the website is grubstreet.ca. Here are five URLs:

http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc
http://grubstreet.ca/articles/index/1174/as-i-hear-it-summers-here
http://grubstreet.ca/articles/index/1896/as-i-hear-it-hot-cars-in-summer
http://grubstreet.ca/articles/index/1080/as-i-hear-it-the-dreaded-event
http://grubstreet.ca/articles/index/1258/as-i-hear-it-radio-reductions

I have asked my host to confirm that 404 error responses are generated for URLs that do not exist. I am sure they are, but I asked to be certain. Will report as soon as I hear from host support. Thanks. dgp

*

drgeorgep

Re: Continuous Run ...
« Reply #3 on: January 03, 2017, 06:43:15 PM »
Hi ... I confirmed with the grubstreet.ca host that 404 error responses are indeed sent when someone attempts to load a non-existent URL. This is to update my previous response. Thanks so much. dgp

*

Webhelpforums

Re: Continuous Run ...
« Reply #4 on: January 05, 2017, 12:02:52 PM »
Hi,

Okay - I just tested
http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc

and created
http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc/abdjgysdjgfsjg

The second URL should return 404 - but instead it returns response code 200 OK - I just verified this in the Firefox Live HTTP Headers plugin


Code:
http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc/abdjgysdjgfsjg

GET /articles/index/2327/as-i-hear-it-rainy-day-in-nyc/abdjgysdjgfsjg HTTP/1.1
Host: grubstreet.ca
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: da,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Cookie: PHPSESSID=dkr25hfdsra0gnbkjntegkpfv7; _ga=GA1.2.543169924.1483635724; _gat=1
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1

HTTP/1.1 200 OK
Date: Thu, 05 Jan 2017 17:03:44 GMT
Server: Apache/2.4.23 (cPanel) OpenSSL/1.0.1e-fips mod_bwlimited/1.4
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

You can test as well using e.g. https://httpstatus.io or http://web-sniffer.net with the above mentioned URL
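
If you would rather script the check, here is a rough Python sketch using only the standard library. The names fetch_status and returns_soft_404 are made up for this example, and the idea of appending a random segment is simply the same manual test as above, automated.

```python
import urllib.error
import urllib.request
import uuid

def fetch_status(url, timeout=10):
    """Return the final HTTP status code for a GET request to `url`."""
    request = urllib.request.Request(url, headers={"User-Agent": "soft-404-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is still available.
        return error.code

def returns_soft_404(page_url, fetch=fetch_status):
    """Append a random, surely non-existent segment to `page_url`.

    Returns True if the site still answers 200 for the bogus URL,
    which is the misbehaviour described in this thread.
    """
    bogus_url = page_url.rstrip("/") + "/" + uuid.uuid4().hex
    return fetch(bogus_url) == 200
```

Calling returns_soft_404("http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc") should give True for as long as the site behaves as in the headers above, and False once non-existent URLs return 404.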

So my initial diagnosis, based on your description of the symptoms so far, seems correct :)

You can either

1)
Fix the problem

2)
Exclude unwanted URL patterns in A1 Sitemap Generator output filters and analysis filters
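
As an illustration of what an "unwanted URL pattern" looks like here, the following Python sketch flags URLs whose path repeats a segment, such as the "/article/1012/article/1030/article..." chains from the first post. This is not A1 filter syntax; the function has_repeated_segments and the example.com URL are made up for the example, and the threshold is an assumption you would tune for your own site.

```python
from collections import Counter
from urllib.parse import urlsplit

def has_repeated_segments(url, max_repeats=1):
    """Return True if any path segment occurs more than `max_repeats` times."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    counts = Counter(segments)
    return any(count > max_repeats for count in counts.values())

# A runaway chain like the one reported in the first post (hypothetical host).
print(has_repeated_segments("http://example.com/article/1012/article/1030/article/1044"))
# One of the real article URLs from this thread.
print(has_repeated_segments("http://grubstreet.ca/articles/index/2327/as-i-hear-it-rainy-day-in-nyc"))
```

The first call prints True and the second prints False, so a rule of this kind would drop the runaway chains while keeping the real article URLs.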

I consider solution #1 best since it will work with all crawlers and search engines, but otherwise see:


If you are ever in doubt how A1 finds a specific URL you can see "linked-by" and "redirected-by" and "sourced-by" tabs. You can then follow the trail to see where a problem originates from:
http://www.microsystools.com/products/sitemap-generator/help/sitemaps-generator-analyze-links/
« Last Edit: February 25, 2017, 07:16:27 AM by Webhelpforums »
