Large sites eat all available memory

seofreelance · August 19, 2013, 11:14:58 AM

Hi,
I am running A1SG (always the latest Pro version) on a second-generation i7 with 8 GB RAM and find that A1SG has historically eaten all available memory on larger projects, where "large" means more than 100K URLs.

(this snapshot comes from this problem happening while scanning france-voyage.com)

I know there are settings that make querying and memory usage lighter, but since I use this (great) tool for SEO purposes I need results as close to reality as possible, especially from misconfigured or tricky servers (the ones that may give problems if I enable the settings suggested in the thread http://webhelpforums.net/index.php?topic=4283.0).

This memory issue has a workaround: stop the crawler, let it finish all pending requests and save the project. Then tick the option to continue where it left off and run the next batch, and so on (mucho so on for a large site!)
The downside of this workaround: I can't leave it running while picking the kids up from school (for example). If I forget, or rely too much on how long it can safely run, I may come back to a stuck laptop and have to kill the A1SG process from Task Manager, and then all the processing time and collected data go to waste.
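
For readers who want to see the idea, the stop, save and resume cycle described above can be sketched generically as follows. This is only an illustration of crawling in batches with a checkpoint on disk, not how A1SG works internally; `crawl_in_batches`, `fetch_links` and the JSON state file are hypothetical names invented for the example.

```python
import json
from collections import deque
from pathlib import Path


def crawl_in_batches(start_urls, fetch_links, state_file, batch_size=1000):
    """Process up to batch_size URLs, then save progress so a later call resumes.

    fetch_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns True once the queue is empty (the crawl is finished).
    """
    state_path = Path(state_file)
    if state_path.exists():
        # Resume: reload what the previous batch left behind.
        state = json.loads(state_path.read_text())
        pending = deque(state["pending"])
        done = set(state["done"])
    else:
        pending = deque(start_urls)
        done = set()

    queued = set(pending)  # fast membership test for URLs already waiting
    processed = 0
    while pending and processed < batch_size:
        url = pending.popleft()
        queued.discard(url)
        done.add(url)
        processed += 1
        for link in fetch_links(url):
            if link not in done and link not in queued:
                pending.append(link)
                queued.add(link)

    # Checkpoint: data collected so far survives a crash or a forced kill.
    state_path.write_text(
        json.dumps({"pending": list(pending), "done": sorted(done)})
    )
    return not pending
```

Calling the function repeatedly with the same state file continues where the previous batch stopped, which is the manual routine above done automatically.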

Is it an A1SG bug? Is it anything you can fix in upcoming versions?

I'd be happy enough with serious alternatives such as:
- Autosave settings
- Memory limit triggers (say stop and save, save and prompt, or data batches like those ".part" compressed packs that get assembled at the final stage)
- Whatever you can think of

Thanks for your attention.

Webhelpforums · August 19, 2013, 05:27:35 PM

One of the reasons stopping the scan helps is that there are sometimes huge queues of found links that need to be checked. On *some* sites this can cause memory usage far beyond what is normal. I think I have a suggestion that will solve your problem:

Try unchecking Scan website > Crawler engine > Default to GET for page requests

This often works well because the crawler then uses HEAD requests, which clears the queue much faster. For details see:
http://www.microsystools.com/products/sitemap-generator/help/website-crawl-progress-status/

Incidentally, if you do start/resume scans, using HEAD requests will also be better since it can mark more page URLs as "scanned completely" faster. For details on that, see:

http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/

The reason HEAD is not the default is that some servers (and website CMS plugins) reject or return errors for such requests, and GET has proven to be slightly faster on most websites.
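
To illustrate the HEAD versus GET trade-off in general terms, here is a minimal Python sketch that tries HEAD first and falls back to GET when the server refuses it. It is not A1SG code; `check_url` is a hypothetical helper, and treating status codes 405 and 501 as "HEAD not supported" is an assumption, since misconfigured servers may answer with other codes.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (status_code, method_used), trying HEAD first, then GET."""
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                # HEAD returns headers only, so no page body is downloaded.
                return response.status, method
        except urllib.error.HTTPError as error:
            # Assumption: 405/501 mean the server refuses HEAD; retry with GET.
            if method == "HEAD" and error.code in (405, 501):
                continue
            return error.code, method
```

A HEAD response carries only headers, which is why it can settle a queued URL quickly, while the fallback covers servers that reject the method.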

seofreelance · August 19, 2013, 06:28:48 PM

Thanks, I will try these settings out shortly.
