HELP!!! Spent 3 DAYS analyzing website for sitemap it is REBOOTED!!!

Started by camaro92, February 15, 2011, 03:40:30 AM

camaro92

1. I tried to make a sitemap using the Sitemap generator. It's advertised as taking as little as 20 seconds to make a sitemap, which is NOT true. What is not told, and what I have now discovered, is that you have to analyze the website first. Which takes a bit over 20 seconds.. More like 3 DAYS!!

2. I ran the program to analyze the website and got through 3 DAYS, then it just popped up an error and, a second later, before I could read it, the computer just REBOOTED. Now I return to the program and see NO way to resume?!?!?!!!! Are you telling me that I just spent 3 DAYS (computer left on for 3 days straight) just to have your program crash and now I LOST IT ALL!?!?

Webhelpforums

I don't think I am "advertising" it as 20 seconds anywhere although it's certainly a good estimate/example for small/normal websites. And it's certainly no secret that A1 Sitemap Generator has to crawl your website... How else would it find your pages? (It does not have direct access to a database containing all URLs)

A1 Sitemap Generator has to crawl your website. So if your website is either buggy and creates endless new URLs, or you simply have e.g. a million page website, then it should be no surprise it will take some time...

E.g. scanning http://www.computergameplayer.com with 31 (default is 5) simultaneous connections took 24 seconds to *analyze content* of 422 URLs. That is about 17.5 pages / second. Without too much effort (configuring settings, no longer listening to online radio, etc.) I could probably tweak that speed up (or maybe set it to 100 simultaneous connections in the registered version, though that is not something I recommend). It also doesn't help that the website is hosted on the other side of the Atlantic. So overall that's a good example :)
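To put that rate in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (422 URLs in 24 seconds) and assumes the rate stays constant, which a real crawl will not guarantee. The function names are made up for illustration.

```python
# Figures from the example scan above: 422 URLs analyzed in 24 seconds.
SAMPLE_URLS = 422
SAMPLE_SECONDS = 24


def pages_per_second(urls: int = SAMPLE_URLS, seconds: float = SAMPLE_SECONDS) -> float:
    """Average crawl rate observed in the sample scan."""
    return urls / seconds


def estimated_hours(total_urls: int, rate: float) -> float:
    """Hours needed to crawl total_urls, assuming the rate stays constant."""
    return total_urls / rate / 3600


def urls_reachable(seconds: float, rate: float) -> int:
    """How many URLs a crawl covers in the given time at a constant rate."""
    return round(seconds * rate)


if __name__ == "__main__":
    rate = pages_per_second()
    print(f"{rate:.1f} pages/second")
    print(f"1,000,000 URLs: about {estimated_hours(1_000_000, rate):.1f} hours")
    three_days = 3 * 24 * 3600
    print(f"3 days at this rate: about {urls_reachable(three_days, rate):,} URLs")
```

At this rate even a million page website would take well under a day, so a crawl still running after 3 days points to either a much slower server or far more URLs than expected.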

If you actually had A1SG crawling for 3 days straight you either have:
1) a buggy website
2) a huge website

Possibly your computer crashed because it ran out of memory although it's pretty drastic :(

If you wish to use the resume functionality, you should read the documentation:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/

If you wish to increase the number of URLs A1SG can keep in memory while scanning:
http://www.microsystools.com/products/sitemap-generator/help/creating-sitemaps-large-websites/

But if your website is really THAT big... Then you probably need to get a custom solution for your website. (Something that reads your database directly, which is infinitely easier.) Maybe find a plugin?

If you believe your website is mid-size, say e.g. 10000-100000 URLs (or more for that matter), then normally there would not be any problem. A common reason for troubles like yours would be if your website is generating new URLs dynamically (possibly even returning a "200, All OK" response to those URLs). That leads to a crawl that continues forever. If you are interested in pursuing this, I will be glad to help you :)


Otherwise I wish you well in finding another solution more suitable for you! :)
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

camaro92

The website itself is fairly small but has a phpbb forum which consist of over 10,000 users and over 130,000 Posts.. I now set it for 31 connections but going on 5 hours and it's still going..

Webhelpforums

Forum websites are sometimes slow due to database load (since each page request executes the same SQL queries against the database backend).

However, I recommend you first check how to *best* use resume:
http://www.microsystools.com/products/sitemap-generator/help/sitemap-generator-resume-scan/
Then stop, save your project and then use resume later.

Consider dropping extended data:
http://www.microsystools.com/products/sitemap-generator/help/creating-sitemaps-large-websites/
(saves memory although with "just" e.g. 150000 URLs it *should* most often *not* be necessary)

...

Also, you may be able to cut down on URLs. E.g. if you don't need member pages, you could add URL filters for both *output* and *analysis*. Doing that will speed up your crawl and save memory. (Same goes if you can avoid e.g. duplicate URLs you don't really need to *analyze* and have *output* to the sitemap.)

http://www.microsystools.com/products/sitemap-generator/help/website-crawler-output-filters/
http://www.microsystools.com/products/sitemap-generator/help/website-crawler-scanner-filters/

(remember to add the URLs to both)

...

When I need to handle such big websites, which may also create multiple unwanted URLs leading to the same content etc., I first take a few test scans (e.g. 1000 URLs) and see if there is something I don't want. It avoids situations where a scan takes forever and forever for some unknown reason.
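One way to review such a test scan outside the program is to count which query parameters show up most often in the sample. Below is a minimal Python sketch of that idea. It is not a feature of A1SG; the function name and the example.com URLs are made up for illustration, and it assumes you have the sampled URLs available as a plain list.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def tally_query_parameters(urls):
    """Count how often each query parameter name appears in a list of URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so parameters like "highlight=" are counted too.
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample of URLs from a small test scan.
    sample = [
        "http://example.com/forum/viewtopic.php?t=1&sid=aaa",
        "http://example.com/forum/viewtopic.php?t=1&sid=bbb",
        "http://example.com/forum/viewtopic.php?t=1&view=print",
        "http://example.com/forum/index.php",
    ]
    for name, count in tally_query_parameters(sample).most_common():
        print(f"{name}: {count}")
```

Parameters that appear on many URLs but do not change the page content are good candidates for the exclude filters.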

...

I am considering adding some more presets for common websites, e.g. wordpress, phpbb etc. Maybe I should prioritize getting those done :)

camaro92

One of your competitors has a feature where you select if you are running phpbb and it automatically inserts the following to bypass the endless URLs that phpbb and other forums like to use.


Quote
Exclude URLs:
p=
mode=
mark=
order=
highlight=
profile.php
privmsg.php
posting.php
view=previous
view=next
search.php


Do not parse URLs: view=print

Can your program do this? Thanks


(Note: edited your quoted text down to avoid the forum infringing on any possible content/copyright. Just to be safe!)

Webhelpforums

Sure. You won't find anything more flexible for filtering than A1 Sitemap Generator. Well, my opinion anyways, but please do check out the output+analysis filters documentation links in my earlier post :)

Only thing is that I don't have a preset for phpbb :)
But really, the only thing you need to do in A1SG compared to default settings is to add those listed paths to the output + analysis filters. That's it :) There's just no preset for it at present.
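To illustrate what such an exclude list does, here is a minimal Python sketch that applies the quoted patterns as plain substring matches. This is an assumption made for illustration only: how A1SG actually matches filter entries is described in the filter documentation linked earlier. The function names and example.com URLs are hypothetical.

```python
# Exclude patterns taken from the list quoted above.
EXCLUDE_PATTERNS = [
    "p=", "mode=", "mark=", "order=", "highlight=",
    "profile.php", "privmsg.php", "posting.php",
    "view=previous", "view=next", "search.php",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL contains any exclude pattern (assumed substring match)."""
    return any(pattern in url for pattern in patterns)


def filter_urls(urls, patterns=EXCLUDE_PATTERNS):
    """Keep only the URLs that match none of the exclude patterns."""
    return [url for url in urls if not is_excluded(url, patterns)]


if __name__ == "__main__":
    # Hypothetical forum URLs.
    crawled = [
        "http://example.com/forum/viewtopic.php?f=2&t=45",
        "http://example.com/forum/viewtopic.php?p=123",
        "http://example.com/forum/memberlist.php?mode=viewprofile&u=2",
        "http://example.com/forum/search.php",
    ]
    for url in filter_urls(crawled):
        print(url)
```

Note that plain substring matching is blunt: a pattern like "p=" would also match a URL containing "top=1". That is one more reason to check a small test scan before running the full crawl.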


EDIT:
I will have something in 3.1.2 beta (!) later today :)

Webhelpforums

You can find new beta of 3.1.2 here:
http://www.microsystools.com/products/sitemap-generator/betas.php

There's now a "phpBB" preset in:
Scan website | Quick presets... button

It seems to work in a (very quick) test I made. If you have problems, feel free to write/PM/email your website address :)

camaro92

I tried the beta version but I don't see anything new when you select the phpbb quick-set. None of the toggles are changed, it just seems like an unlinked form button that doesn't do anything.

How do we know what was even selected? I don't see anything indicating what was selected or what changes it made, if any.

Webhelpforums

If you click the phpBB preset button, a lot of items get added to the exclude section in both the analysis and output filters.

That way, you don't need to add them yourself.


All I did was click the phpBB item in the Presets... menu.
After that, the Output filters have lots of new excluded items added.
I also tested that it actually works on a phpBB forum, but if you have any particular forum in mind, feel free to let me know the URL.
