Resuming scan starts at the top again...

Started by Walrus, December 05, 2015, 05:39:11 AM

Walrus

I created a project, saved it, scanned the site then stopped.

Next day, I open the project, choose Resume (full) and Recrawl.

However, the Jobs done starts at the top again.

When I stopped yesterday, "Init found link" read 175151. This morning, after running for almost 2 hours, it is at 115412.

Is this expected? What can I do to speed the process?

Thanks.

Webhelpforums

TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

Walrus

Thanks but I'd already read that page, and tried resuming with and without the Recrawl setting.

I just tried it again. With Recrawl set to OFF, it's been scanning for 20 minutes but still hasn't reached the numbers of last night.

When saved last night, it said Jobs done: 2711490, after 20 mins rescan it says Jobs done: 127418.

It seems faster, but I expected it to just pick up where it left off.

Is this delay normal?

Update: after 45 mins scanning, jobs done is 168201, still far off from where it left off last night.

Walrus

Update: it's taken approximately 3 hours to arrive at the same jobs done as when I left off last night.

Webhelpforums

From
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/

Quote: Resume (Full):

    Keeps all URLs from earlier scan.
    Will crawl and analyze all URLs that are not flagged as fully analyzed.
    If any new URLs are found, these will be crawled as well.



What this means is that if page A links to B, C, D, E, F, G and H, and H has not been resolved when you stop the scan, URL A will be done again. (H could, for example, be a URL that redirects, which can affect how "link juice" scores flow, which in turn can affect priority scores.)
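
The rule above can be pictured with a small Python sketch. This is only an illustration of the idea, not A1SG's actual code: the function name `urls_to_recrawl` and the two data structures are hypothetical, and the real crawler tracks far more state than this.

```python
# Hypothetical model of "Resume (Full)": a page counts as fully analyzed
# only when every URL it links to has been resolved.
def urls_to_recrawl(links, resolved):
    """Return the pages that would be crawled again on resume.

    links:    dict mapping each crawled page to the URLs it links to
    resolved: set of URLs whose final state (e.g. after redirects) is known
    """
    return sorted(
        page
        for page, targets in links.items()
        if any(target not in resolved for target in targets)
    )


links = {
    "A": ["B", "C", "D", "E", "F", "G", "H"],
    "B": ["C"],
}
# H was still pending when the scan was stopped.
resolved = {"B", "C", "D", "E", "F", "G"}

print(urls_to_recrawl(links, resolved))  # ['A']
```

In this model a single unresolved link is enough to put the whole page back in the queue, which would explain why the counters climb from a low number again after a resume.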


To minimize this problem, you can set this option as described:

Quote: To configure this, disable: Scan website > Crawler engine > Default to GET for page requests

This will vastly reduce the problem you describe.

Walrus

Thanks... Will try the 'Default to GET for page requests' setting tomorrow at an off-peak time.

Webhelpforums

Switch the option off (!) - it is on by default which is bad for resume :)

Walrus

That's bad news...It's off by default (see screenshot http://stockbite.uk/files/a10SG-crawler-engine.png).

Webhelpforums

Hmmm... It appears you may have found an issue in the A1SG 7.0.0 beta (3) for OS X (I can confirm the behavior here, but not on the Windows build).

Please wait 36-48 hours for a conclusion/fix


Webhelpforums

New betas will be released - the visual state of checkboxes was not represented correctly on OS X :)

Walrus

OK...So does that mean it's actually selected by default when appearance says it's deselected? IOW, do I need to select it?

BTW, I'm trying out version 7.0.0, not beta.

You might consider moving the About menu to the A1SG menu (expected in Mac OS).

One other minor thing I reported but have not had a reply to: every time I start A1SG, the tip of the day window shows, even though the DNSA checkbox is selected.

Webhelpforums

New version released - download again from the website - it should fix the issue you reported. Sorry for the trouble :(

7.0 is the first version with support for OS X, and it required rewriting a ton of code to make it portable between two different (but inspired by each other) tools and frameworks.

When A1 reached beta3 without my (admittedly small) group of testers reporting problems, I decided to make the download button on product pages lead to that beta when visitors were using Mac / OS X.

(However, it does state "beta" in parentheses on the product page)

I think you will find the new version solves all issues/confusion related to checkboxes :)

I will look into moving the about menu per your suggestion - thanks!

Walrus

Understood... I will give it a spin tomorrow and give you feedback if needed. The current version is now churning away. It reached 300,000+ unique links yesterday, so it seems to be heading in the right direction.

I wish there was some way to force A1SG to just snag the friendly URL for each image, which is all that's really needed on a stock photo site, apart from a few CMS pages about site info.

Will the settings in the 'Webmaster crawler' filters do that? I must confess, it needs to be a preset a user can just choose, as not all of us are Webmasters!  :(

Webhelpforums

You can use analysis filters to control exactly which urls are analyzed/crawled
http://www.microsystools.com/products/website-analyzer/help/website-crawler-scanner-filters/

You can use output filters to control which files are included in output
http://www.microsystools.com/products/website-analyzer/help/website-crawler-output-filters/

Webmaster filters are more about whether the crawler should obey robots.txt, nofollow, noindex etc.

If you can give some examples of what you mean by "friendly url for each image", I can probably help you better :)

Walrus

I looked at those settings  :o but without doing a lot of reading and head banging, don't know which ones to select.  :'(

The site is PHP/SQL-based and contains just photos. Each page has an unfriendly URL by default, but I have the option to change it to a friendly URL, which I have done. I want the friendly URLs to be in the sitemap, so search engines can list the photo title.

Here's a sample of URLs waiting to be replaced by friendly URLs and one that has been found and replaced:

  <url>
    <loc>http://stockbite.uk/photo/933/photo-details.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://stockbite.uk/photo/934/City-Hall-headquarters-of-the-Greater-London-Assembly-GLA-Southwark.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://stockbite.uk/photo/934/photo-details.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>


Thanks.

Webhelpforums

Hi,

Feel free to email your project file to into [at] microsystools.com and I will email you back revised project files.

I will exclude bad-looking urls
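
As a rough illustration of what such an exclusion amounts to, here is a standalone Python sketch. It does not use A1SG's filter syntax. The regular expression and the `friendly_only` helper are assumptions based only on the sample URLs posted above, where the default URLs all end in `photo-details.html`.

```python
import re

# Assumption: default (unfriendly) URLs always look like
# /photo/<id>/photo-details.html, as in the sample posted above.
UNFRIENDLY = re.compile(r"/photo/\d+/photo-details\.html$")


def friendly_only(urls):
    """Keep only URLs that do not match the default photo-details pattern."""
    return [url for url in urls if not UNFRIENDLY.search(url)]


urls = [
    "http://stockbite.uk/photo/933/photo-details.html",
    "http://stockbite.uk/photo/934/City-Hall-headquarters-of-the-Greater-London-Assembly-GLA-Southwark.html",
    "http://stockbite.uk/photo/934/photo-details.html",
]

for url in friendly_only(urls):
    print(url)
```

With the three sample URLs, only the City Hall one is kept. Note that a photo whose friendly URL has not been found yet (933 in the sample) drops out of the list entirely under this approach.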

Do you want both
- standard xml sitemap
- image sitemap
?

Walrus

The checkboxes are selected in the new beta version. I deselected 'Default to get...' but left 'Default to persistent...' as selected.

In 'Webmaster filters' all options are now selected. I left them as is. Will see how I get on with speed and send you project file.

Thanks...Great support!  :)

