Help reducing scan time

*

Laptop Plus

Help reducing scan time
« on: May 12, 2014, 10:21:29 PM »
I've gone through the Help index, but the suggestions haven't sped up the scan time all that much.

We've recently implemented some changes to the website which significantly increased the amount of interlinking. There are approximately 75K-100K unique pages, but A1 Sitemap Generator is picking up 50 million+ pages for "Init" found link (check if unique). So I assume this is why the scan now takes so long to complete: there are literally millions of new links which end up pointing to the same URLs.

Is there any way to reduce the time taken for unique link checking?

*

Webhelpforums

Re: Help reducing scan time
« Reply #1 on: May 13, 2014, 08:18:05 AM »
In recent versions, A1 Sitemap Generator tests new links before they get queued, and also tests all links in the existing queue at intervals. If it finds that a given link is the same as a URL already identified, the link is handled immediately (all necessary information regarding links-to / linked-by etc. is, of course, updated).
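
A rough way to picture the "test before queueing" behaviour is a crawl frontier that normalizes each discovered link and checks it against a set of already-identified URLs. This is a generic Python sketch, not A1 Sitemap Generator's actual code: the normalization rules (lowercase scheme and host, drop the fragment, sort query parameters) and the names `normalize` and `Frontier` are assumptions, and sorting parameters is only safe if the server ignores their order. It models only the check-before-queueing part, not the periodic re-check of the existing queue.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonical form used for duplicate detection (assumed rules:
    lowercase scheme/host, drop fragment, sort query parameters)."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", query, ""))


class Frontier:
    """Hypothetical crawl queue that rejects duplicates before queueing."""

    def __init__(self) -> None:
        self.known: set[str] = set()               # every URL already identified
        self.queue: deque[str] = deque()           # URLs waiting to be fetched
        self.linked_by: dict[str, set[str]] = {}   # URL -> pages linking to it

    def add_link(self, source: str, target: str) -> bool:
        """Record a link; return True only if the target was newly queued."""
        key = normalize(target)
        # "Linked-by" bookkeeping is updated even when the target is a duplicate.
        self.linked_by.setdefault(key, set()).add(source)
        if key in self.known:
            return False  # handled immediately, never enters the queue
        self.known.add(key)
        self.queue.append(key)
        return True
```

In this sketch, each of the millions of duplicate links still costs one normalization and one set lookup, so the queue (and memory use) stays small without the check itself becoming free.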

However, that only minimizes memory usage by keeping the number of queued "waiting" URLs down. For speed, I think the only thing you can do is to increase the number of worker threads to 100. But you will need a really powerful computer, with *lots* of memory as well, for it to make a difference.
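
Extra worker threads mainly help because a crawler spends most of its time waiting on network responses. A minimal sketch using Python's standard `concurrent.futures` module, assuming a hypothetical `fetch` function that only simulates a slow request (it does not reflect how A1 Sitemap Generator schedules its threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request: it just waits 20 ms."""
    time.sleep(0.02)
    return url


def crawl_batch(urls: list[str], workers: int) -> list[str]:
    """Fetch `urls` with a pool of `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    urls = [f"http://www.mywebsite.com/page{i}.aspx" for i in range(100)]
    for workers in (1, 100):
        start = time.perf_counter()
        crawl_batch(urls, workers)
        print(f"{workers:>3} workers: {time.perf_counter() - start:.2f}s")
```

In this toy setup the simulated requests overlap almost perfectly. Against a real server, each extra thread also adds memory use and server load, which is why the gain depends on the machine doing the crawl.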

Maybe you could split the website crawl into smaller sections (e.g. directories)?
MicrosysTools.com | Website and SEO Software for webmasters | A1 Sitemap Generator, A1 Website Analyzer etc.

*

Laptop Plus

Re: Help reducing scan time
« Reply #2 on: May 14, 2014, 03:00:49 AM »
Thanks for the reply. A combination of what you've suggested should eventually get the job done. However, I'm having a bit of trouble scanning individual directories due to the .NET nature of the site.

For example, for the Asus subdirectory, there'll be something of a doorway page such as:

www.mywebsite.com/Asus.aspx?manid=1&catid=1

The above page will then link to many pages within the Asus subdirectory, such as:

www.mywebsite.com/Asus/Series.aspx?seriesid=1&manid=1&catid=1

I guess the issue is that the Asus subdirectory and all of the pages contained in it don't actually exist without being linked to from the Asus doorway page. The pages are created dynamically and don't exist within a static directory structure on the server.

Also, is there a way of limiting the scan so that it will only scan descending subdirectory levels, to avoid it reaching the root directory again and trying to crawl the whole site?
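
One way to see why a "descend only" rule would not help on its own here: if the scope is derived from the start page's directory, the Asus doorway page sits in the root, so every page on the site still counts as "below" it. A small path-only sketch (the helper names are hypothetical, absolute URLs with a scheme are assumed, and the host is not checked):

```python
from urllib.parse import urlsplit


def start_directory(start_url: str) -> str:
    """Directory containing the start page, e.g. "/" for "/Asus.aspx"."""
    path = urlsplit(start_url).path or "/"
    return path[: path.rfind("/") + 1]


def descends_from(url: str, directory: str) -> bool:
    """True if the URL's path lies at or below `directory`."""
    prefix = directory if directory.endswith("/") else directory + "/"
    return urlsplit(url).path.startswith(prefix)
```

For the doorway page `start_directory` returns `"/"`, so a rule anchored on it allows the whole site; scoping on an explicit directory such as `/Asus/` is what keeps the crawl from climbing back to the root.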

« Last Edit: May 14, 2014, 03:06:29 AM by Laptop Plus »

*

Laptop Plus

Re: Help reducing scan time
« Reply #3 on: May 14, 2014, 03:38:20 AM »
There's a relatively small number of subdirectories (19) off the root directory, so I may be able to use analysis filters to scan only one subdirectory at a time.

*

Webhelpforums

Re: Help reducing scan time
« Reply #4 on: May 14, 2014, 03:41:07 AM »
1) Set root to your domain.

2) Add start search paths to some pages that link into the directories you want.

3) Configure "analysis filters" to only allow those URLs plus URLs found in the wanted directories.

4) Configure "output filters" to only allow URLs found in the wanted directories.
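
The four steps above can be modelled as two URL predicates: one deciding what gets crawled and one deciding what ends up in the sitemap. This sketches only the logic, not A1 Sitemap Generator's filter syntax; `ROOT`, `START_PATHS`, `WANTED_DIRS` and the function names are hypothetical, and the start path is the Asus doorway page mentioned earlier in this thread.

```python
from urllib.parse import urlsplit

ROOT = "http://www.mywebsite.com/"  # step 1: root set to the domain
START_PATHS = [                      # step 2: pages that link into the wanted directory
    "http://www.mywebsite.com/Asus.aspx?manid=1&catid=1",
]
WANTED_DIRS = ["/Asus/"]             # hypothetical choice of directory to crawl


def in_wanted_dirs(url: str) -> bool:
    """True if the URL is on the site and inside one of the wanted directories."""
    path = urlsplit(url).path
    return url.startswith(ROOT) and any(path.startswith(d) for d in WANTED_DIRS)


def analysis_allows(url: str) -> bool:
    """Step 3: crawl the start pages (exact match) plus the wanted directories."""
    return url in START_PATHS or in_wanted_dirs(url)


def output_allows(url: str) -> bool:
    """Step 4: only URLs inside the wanted directories go into the sitemap."""
    return in_wanted_dirs(url)
```

Here the doorway page passes the analysis filter, so its links get followed, but fails the output filter, so it is crawled without being listed.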

*

Laptop Plus

Re: Help reducing scan time
« Reply #5 on: May 15, 2014, 03:39:57 AM »
Cheers, I've got it sorted now. Awesome piece of software.

*

heiberlin

Re: Help reducing scan time
« Reply #6 on: May 22, 2014, 07:54:17 AM »
I have the same problem. On a small website (7 pages), the scan takes more than 7 hours!
Sitemap Generator 5.1.0: bought, downloaded and unlocked. No options changed.

And what about my sites with more than 100 pages?
I cancelled my first attempts after more than 20 hours.

The hints in this thread did not bring any improvement.

*

Webhelpforums

Re: Help reducing scan time
« Reply #7 on: May 22, 2014, 02:51:19 PM »
Hi heiberlin,

Your problem sounds entirely different. Some possible causes could be:

Please email the problematic website URL or post it here.
« Last Edit: May 22, 2014, 02:53:14 PM by Webhelpforums »