Resuming scan starts at the top again...

  • 16 Replies
  • 3293 Views
*

Walrus

  • Newbie
  • *
  • 9
  • +0/-0
    • View Profile
Resuming scan starts at the top again...
« on: December 05, 2015, 05:39:11 AM »
I created a project, saved it, scanned the site then stopped.

Next day, I open the project, choose Resume (full) and Recrawl.

However, the Jobs done starts at the top again.

When I stopped yesterday, "Init found link" read 175151. This morning, after running for almost 2 hours, it is at 115412.

Is this expected? What can I do to speed the process?

Thanks.

*

Webhelpforums

  • Administrator
  • Hero Member
  • *****
  • 1387
  • +6/-0
  • Shared between Microsys, WebHelpForums and helpers
    • View Profile
    • Webmaster and Website Help Forums
Re: Resuming scan starts at the top again...
« Reply #1 on: December 05, 2015, 03:42:00 PM »
Don't choose recrawl if you want to resume - that will solve your problem. See this:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/
MicrosysTools.com | Website and SEO Software for webmasters | A1 Sitemap Generator, A1 Website Analyzer etc.

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #2 on: December 06, 2015, 04:29:04 AM »
Thanks but I'd already read that page, and tried resuming with and without the Recrawl setting.

I just tried it again. With Recrawl set to OFF, it's been scanning for 20 minutes but still hasn't reached the numbers of last night.

When saved last night, it said Jobs done: 2711490, after 20 mins rescan it says Jobs done: 127418.

It seems faster, but I expected it to just pick up where it left off.

Is this delay normal?

Update: after 45 mins scanning, jobs done is 168201, still far off from where it left off last night.
« Last Edit: December 06, 2015, 04:48:07 AM by Walrus »

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #3 on: December 06, 2015, 07:03:51 AM »
Update: it's taken approximately 3 hours to arrive at the same jobs done as when I left off last night.

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #4 on: December 06, 2015, 11:29:22 AM »
From
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/

Quote
Resume (Full):

    Keeps all URLs from earlier scan.
    Will crawl and analyze all URLs that are not flagged as fully analyzed.
    If any new URLs are found, these will be crawled as well.



What this means: suppose page A links to B, C, D, E, F, G and H, and H has not been resolved when you stop the scan. (H could, for example, be a URL that redirects, which can affect how "link juice" scores flow, which in turn can affect priority scores.) In that case URL A will be processed again when you resume.
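
As a rough illustration of the rule quoted above, here is a minimal Python sketch. The function names and the `links` / `resolved` data structures are hypothetical and only model the described behaviour; they are not A1 Sitemap Generator internals.

```python
def is_fully_analyzed(url, links, resolved):
    """Assumption: a URL is fully analyzed only if it has been resolved
    itself and every URL it links to has been resolved as well."""
    return url in resolved and all(target in resolved for target in links.get(url, []))


def resume_queue(links, resolved):
    """Return the URLs a full resume would still have to process.

    links    -- dict mapping a page URL to the list of URLs it links to
    resolved -- set of URLs that were resolved before the scan was stopped
    """
    all_urls = set(links) | {t for targets in links.values() for t in targets}
    return sorted(u for u in all_urls if not is_fully_analyzed(u, links, resolved))


# Page A links to B..H; everything except H was resolved before stopping.
links = {"A": ["B", "C", "D", "E", "F", "G", "H"]}
print(resume_queue(links, {"A", "B", "C", "D", "E", "F", "G"}))  # ['A', 'H']
```

Under these assumptions a single unresolved link (H) is enough to put page A back in the queue, which is why a resumed scan of a large site can appear to start over.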


To minimize this problem, you can set this option as described:

Quote
To configure this, disable: Scan website > Crawler engine > Default to GET for page requests

This will vastly reduce the problem you describe

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #5 on: December 06, 2015, 12:00:30 PM »
Thanks...Will try the 'Default to Get for requests...' setting tomorrow at off peak time.

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #6 on: December 06, 2015, 12:22:10 PM »
Switch the option off (!) - it is on by default which is bad for resume :)

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #7 on: December 06, 2015, 12:58:46 PM »
That's bad news...It's off by default (see screenshot http://stockbite.uk/files/a10SG-crawler-engine.png).

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #8 on: December 06, 2015, 10:17:04 PM »
Hmmm... It appears you may have found an issue in the A1SG 7.0.0 beta (3) for OS X (I can confirm the behavior here, but not on the Windows build).

Please wait 36-48 hours for a conclusion/fix

« Last Edit: December 06, 2015, 10:18:39 PM by Webhelpforums »

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #9 on: December 07, 2015, 12:27:02 AM »
New betas will be released - the visual state of checkboxes was not represented correctly on OS X :)

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #10 on: December 07, 2015, 03:06:47 AM »
OK...So does that mean it's actually selected by default when appearance says it's deselected? IOW, do I need to select it?

BTW, I'm trying out version 7.0.0, not beta.

You might consider moving the About menu item to the A1SG menu (where it is expected on Mac OS).

One other minor thing I reported but have not had a reply on: every time I start A1SG, the tip of the day window shows, even though the DNSA checkbox is selected.

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #11 on: December 07, 2015, 03:29:02 AM »
New version released - download again from the website - it should fix the issue you reported. Sorry for the trouble :(

7.0 is the first version with support for OS X and has required rewriting a ton of code to make it portable between two different (but inspired by each other) tools and frameworks

When A1 reached beta3 without my (admittedly small) group of testers reporting problems, I decided to make the download button on product pages lead to it for visitors using Mac / OS X

(However, it does state "beta" in parentheses on the product page)

I think you will find the new version solves all issues/confusion related to checkboxes :)

I will look into moving the about menu per your suggestion - thanks!

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #12 on: December 07, 2015, 03:41:35 AM »
Understood...I will give it a spin tomorrow and give you feedback if needed. The current version is now churning away...It reached 300,000+ unique links yesterday, so it seems to be heading in the right direction.

I wish there was some way to force A1SG to just snag the friendly URL for each image, which is all that's really needed on a stock photo site, apart from a few CMS pages about site info.

Will the settings in 'Webmaster crawler' filters do that? Must confess, it needs to be a preset the user can just choose, as not all of us are Webmasters!  :(

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #13 on: December 07, 2015, 11:13:33 AM »
You can use analysis filters to control exactly which urls are analyzed/crawled
http://www.microsystools.com/products/website-analyzer/help/website-crawler-scanner-filters/

You can use output filters to control which files are included in output
http://www.microsystools.com/products/website-analyzer/help/website-crawler-output-filters/

Webmaster filters are more about whether the crawler should obey robots.txt, nofollow, noindex etc.
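
To show what "obeying robots.txt" means in practice, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the example.com URLs and the `allowed_urls` helper are made up for illustration; this is not how A1SG implements its webmaster filters, and it does not cover nofollow or noindex.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def allowed_urls(urls, robots_txt, user_agent="ExampleCrawler"):
    """Keep only the URLs that the given robots.txt permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


urls = [
    "http://example.com/photo/1.html",
    "http://example.com/admin/login.html",
]
print(allowed_urls(urls, ROBOTS_TXT))  # ['http://example.com/photo/1.html']
```

A crawler that ignores robots.txt would visit both URLs; one that obeys it skips everything under /admin/.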

If you can give some examples of what you mean by "friendly url for each image", I can probably help you better :)

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #14 on: December 07, 2015, 11:34:57 AM »
I looked at those settings  :o but without doing a lot of reading and head banging, don't know which ones to select.  :'(

The site is PHP/SQL-based, and contains just photos. Each page has an unfriendly URL by default, but I have the option to change it to a friendly URL, which I have done. I want the friendly URLs to be in the sitemap, so search engines can list the photo title.

Here's a sample of URLs waiting to be replaced by friendly URLs and one that has been found and replaced:

  <url>
    <loc>http://stockbite.uk/photo/933/photo-details.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://stockbite.uk/photo/934/City-Hall-headquarters-of-the-Greater-London-Assembly-GLA-Southwark.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://stockbite.uk/photo/934/photo-details.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
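
For illustration, the desired result could be sketched in Python like this: drop every entry whose URL ends in `/photo-details.html` and keep the rest. The `drop_unfriendly` helper and the `<urlset>` wrapper around the sample are assumptions made for this sketch; a real sitemap file also declares the sitemaps.org XML namespace, which is omitted here to match the sample above.

```python
import xml.etree.ElementTree as ET

# The three sample entries, wrapped in a root element so they parse as one document.
SAMPLE = """<urlset>
  <url><loc>http://stockbite.uk/photo/933/photo-details.html</loc></url>
  <url><loc>http://stockbite.uk/photo/934/City-Hall-headquarters-of-the-Greater-London-Assembly-GLA-Southwark.html</loc></url>
  <url><loc>http://stockbite.uk/photo/934/photo-details.html</loc></url>
</urlset>"""

UNFRIENDLY_SUFFIX = "/photo-details.html"


def drop_unfriendly(xml_text):
    """Remove <url> entries with an unfriendly URL and return the remaining locs."""
    root = ET.fromstring(xml_text)
    for url in list(root.findall("url")):
        loc = (url.findtext("loc") or "").strip()
        if loc.endswith(UNFRIENDLY_SUFFIX):
            root.remove(url)
    return [url.findtext("loc") for url in root.findall("url")]


print(drop_unfriendly(SAMPLE))
```

With the sample above only the City Hall URL for photo 934 remains. Photo 933 disappears entirely, because no friendly URL has been found for it yet.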


Thanks.

*

Webhelpforums

Re: Resuming scan starts at the top again...
« Reply #15 on: December 07, 2015, 04:49:23 PM »
Hi,

Feel free to email your project file to into [at] microsystools.com and I will email you back revised project files.

I will exclude bad-looking urls

Do you want both
- standard xml sitemap
- image sitemap
?

*

Walrus

Re: Resuming scan starts at the top again...
« Reply #16 on: December 08, 2015, 03:03:58 AM »
The checkboxes are selected in the new beta version. I deselected 'Default to get...' but left 'Default to persistent...' as selected.

In 'Webmaster filters' all options are now selected. I left them as is. Will see how I get on with speed and send you project file.

Thanks...Great support!  :)


 



