Webmaster Forums - Website and SEO Help

Microsys Products and Webmaster Tools => A1 Website Analyzer => Topic started by: togfather on December 26, 2016, 06:46:34 PM

Title: only crawl and show robot allowed pages + show source of duplicated pages links.
Post by: togfather on December 26, 2016, 06:46:34 PM
Hello. 

Is there a tutorial for this software?

How do I get A1 Website Analyzer to stop crawling pages disallowed in the robots.txt file?

Also, when it lists pages with identical titles, can I get it to show me which pages link to those duplicated pages?

Many thanks

Tog
Title: Re: only crawl and show robot allowed pages + show source of duplicated pages links.
Post by: Webhelpforums on December 27, 2016, 09:04:01 AM
By default, A1 Website Analyzer does not crawl pages disallowed in robots.txt.

However, it still lists them after the scan because their URLs were discovered. (They carry a flag, so you can see that the URLs were disallowed in robots.txt.)

This behavior is configurable. If you want such URLs removed, enable this option before the scan:
Scan website | Crawler options | Apply "webmaster" and "output filters" after website scan stops.

See also:
http://www.microsystools.com/products/website-analyzer/help/crawl-robots-noindex-nofollow/
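
To illustrate the difference between flagging disallowed URLs and removing them after the scan, here is a minimal Python sketch using the standard library's urllib.robotparser. It is not how A1 Website Analyzer is implemented. The robots.txt content, the example.com URLs, the "ExampleBot" user agent, and the function names flag_urls and remove_disallowed are all hypothetical and exist only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def flag_urls(robots_txt, urls, user_agent="ExampleBot"):
    """Return (url, disallowed) pairs: every discovered URL is kept,
    but each one carries a flag saying whether robots.txt disallows it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(url, not parser.can_fetch(user_agent, url)) for url in urls]


def remove_disallowed(flagged):
    """Drop the flagged URLs, similar in spirit to applying the
    robots.txt filter after the scan has stopped."""
    return [url for url, disallowed in flagged if not disallowed]


# Hypothetical URLs found while following links.
discovered = [
    "http://example.com/",
    "http://example.com/private/report.html",
    "http://example.com/about.html",
]

flagged = flag_urls(ROBOTS_TXT, discovered)  # default-like view: listed, but flagged
kept = remove_disallowed(flagged)            # filtered view: disallowed URLs removed

for url, disallowed in flagged:
    print(url, "(disallowed in robots.txt)" if disallowed else "")
```

In this sketch, flagged corresponds to the default view (disallowed URLs are still listed but marked), while kept corresponds to the result with the filter option enabled.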

...

To get started with A1 Website Analyzer, see:
http://www.microsystools.com/products/website-analyzer/help/site-analysis-seo-audit/

Search Engine People has also written various tutorials, listed here among other guides written by users:
http://www.microsystools.com/products/website-analyzer/help/seo-analysis-guides/

Title: Re: only crawl and show robot allowed pages + show source of duplicated pages links.
Post by: togfather on December 28, 2016, 05:46:22 AM
Thank you very much.

That was very helpful, and I have now had the pages covered by robots.txt removed.

Best wishes

Tog