Thank-you pages unnecessarily exposed?

Started by eran_more, July 07, 2014, 01:08:40 AM

eran_more

Hi,
I have thank-you pages which should not be 'exposed' to crawlers.
But still, the Microsys crawler 'finds' them.
Can you please explain why?
Here is one example: " http://www.example.com/thank-you-payment-one-day "
By the way, it's a WordPress website - are there any 'hidden directories' or back doors?
Thanks,
Eran.

Webhelpforums

#1
Before you crawl your website, switch off "easy mode":
http://www.microsystools.com/products/sitemap-generator/help/easy-sitemap-generator-mode/

Then in "Scan website | Crawler options" uncheck
"Apply webmaster and output filters after website scan stops"

After the scan, you can then see that:

  • http://www.example.com/thank-you-payment-one-day

is used by:

  • http://www.example.com/thank-you-payment-mini
  • http://www.example.com/thank-you-payment-one-time

To investigate further, see the help page:
http://www.microsystools.com/products/sitemap-generator/help/sitemaps-generator-analyze-links/
and be sure to check "linked-by", "used-by" and "redirected-by" of each.
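
To illustrate what a "used-by" view represents, here is a minimal Python sketch that inverts a "uses" mapping into a "used-by" mapping. The URLs are the ones listed above; the function name build_used_by and the dictionary layout are assumptions made for illustration and are not how A1SG stores its data.

```python
from collections import defaultdict


def build_used_by(uses):
    """Invert a {source: [targets]} mapping into {target: [sources]}."""
    used_by = defaultdict(list)
    for source, targets in uses.items():
        for target in targets:
            used_by[target].append(source)
    return dict(used_by)


# Forward references as reported in this thread (hypothetical data layout).
uses = {
    "http://www.example.com/thank-you-payment-mini": [
        "http://www.example.com/thank-you-payment-one-day",
    ],
    "http://www.example.com/thank-you-payment-one-time": [
        "http://www.example.com/thank-you-payment-one-day",
    ],
}

used_by = build_used_by(uses)
for target, sources in used_by.items():
    print(target, "is used by:")
    for source in sources:
        print("  ", source)
```

A page that shows up in the scan although nothing visibly links to it will normally have at least one entry in such a reverse mapping.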

For reference, note that e.g. http://www.example.com/thank-you-payment-mini has code:
<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />

That is why A1SG, with default settings, removes the URL from the results after the scan ("noindex") but still follows its links and references ("follow").
Among other things, those URLs use prev/next link tags. A1SG treats a link tag as a kind of "use" rather than a "link", which is why you will find such references in the "uses" and "used-by" tabs when analyzing internal linking.
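
As a rough illustration of the above, the following Python sketch reads the robots meta tag of a page and keeps anchor links separate from prev/next link-tag references. It is not A1SG's implementation; the class name PageInspector, the helper inspect_page, and the sample page (including its rel="next" target) are assumptions made up for this example.

```python
from html.parser import HTMLParser


class PageInspector(HTMLParser):
    """Collects robots directives and separates anchor links from link-tag uses."""

    def __init__(self):
        super().__init__()
        self.robots = set()  # directives from <meta name="robots">
        self.links = []      # <a href="..."> targets ("links")
        self.uses = []       # <link rel="prev|next"> targets ("uses")

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.robots.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "link" and a.get("href"):
            rels = (a.get("rel") or "").lower().split()
            if "prev" in rels or "next" in rels:
                self.uses.append(a["href"])


def inspect_page(html):
    parser = PageInspector()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page: the meta tag is the one quoted above, the link tag is assumed.
SAMPLE = """
<html><head>
<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />
<link rel="next" href="http://www.example.com/thank-you-payment-one-day" />
</head><body><p>Thank you!</p></body></html>
"""

page = inspect_page(SAMPLE)
print("keep in results:", "noindex" not in page.robots)
print("follow references:", "nofollow" not in page.robots)
print("uses:", page.uses)
print("links:", page.links)
```

In this sketch the page itself would be dropped from the results because of "noindex", while the URL in its link tag is still discovered, which matches the behavior described in this reply.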

...

Alternatively, you can enable logging in "Scan website | Data collection"
and then reduce worker threads to one in "Scan website | Crawler engine"

That will slow the scan down a lot, but you can search the results afterwards in a text file.

...

And to answer your question, no, A1 Sitemap Generator does not utilize any "hidden" techniques to uncover URLs. It simply follows links and references to URLs :)

eran_more

Thank you for your quick reply!
It is probably the 'rel prev' and 'rel next' which linked those pages.
