thank-you pages unnecessarily exposed?

eran_more · July 07, 2014, 01:08:40 AM

Hi,
I have thank-you pages which should not be 'exposed' to crawlers.
But the Microsys crawler still 'finds' them.
Can you please explain why?
Here is one example: " http://www.example.com/thank-you-payment-one-day "
By the way, it's a WordPress website. Are there any 'hidden directories' or back doors?
Thanks,
Eran.

Webhelpforums · July 08, 2014, 04:44:36 AM

Before you crawl your website, switch off "easy mode":
http://www.microsystools.com/products/sitemap-generator/help/easy-sitemap-generator-mode/

Then in "Scan website | Crawler options" uncheck
"Apply webmaster and output filters after website scan stops"

After the scan, you can then see that:

http://www.example.com/thank-you-payment-one-day

is used by:

http://www.example.com/thank-you-payment-mini
http://www.example.com/thank-you-payment-one-time

To investigate further see help page:
http://www.microsystools.com/products/sitemap-generator/help/sitemaps-generator-analyze-links/
and be sure to check "linked-by", "used-by" and "redirected-by" of each.

For reference, note that e.g. http://www.example.com/thank-you-payment-mini has code:

<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />

This is why A1SG removes the URL after the scan when using default settings, but still follows its links and references.
Among other things, those URLs use prev/next link tags. A1SG considers a link tag a kind of "use" rather than a "link", which is why you will find such references in the "uses" and "used-by" tabs when analyzing internal linking.
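
To illustrate the behaviour described above, here is a rough Python sketch of how a crawler might treat such a page. It is not A1SG's actual code. The class name ReferenceParser, the inspect helper, the sample HTML, the /contact link and the direction of the rel="next" reference are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReferenceParser(HTMLParser):
    """Collects robots directives, <a> links and <link rel=prev/next> references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.robots = set()  # directives from <meta name="robots">
        self.links = []      # targets of <a href="...">
        self.uses = []       # targets of <link rel="prev|next" href="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "prev" in rel or "next" in rel:
                self.uses.append(urljoin(self.base_url, attrs["href"]))


def inspect(html, base_url):
    """Summarise whether a page is indexable and which URLs it references."""
    parser = ReferenceParser(base_url)
    parser.feed(html)
    return {
        "indexable": "noindex" not in parser.robots,
        "follow": "nofollow" not in parser.robots,
        "links": parser.links,
        "uses": parser.uses,
    }


# Hypothetical page content; only the meta tag is taken from the real page.
SAMPLE = """
<html><head>
<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />
<link rel="next" href="/thank-you-payment-one-day" />
</head><body><a href="/contact">Contact</a></body></html>
"""

result = inspect(SAMPLE, "http://www.example.com/thank-you-payment-mini")
print(result)
```

In this sketch the page is reported as not indexable, yet the URL in its rel="next" tag is still collected, which matches the idea that a noindex,follow page can be dropped from the results while its references are followed.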

...

Alternatively, you can enable logging in "Scan website | Data collection"
and then reduce worker threads to one in "Scan website | Crawler engine"

That will slow the scan down a lot, but you can search the results afterwards in a text file.
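
If you prefer a script over a text editor for that search, something like the Python sketch below might do. It only assumes the log is plain text with one entry per line. The find_in_log helper and the sample lines are made up for illustration and do not reflect the real A1SG log format.

```python
def find_in_log(lines, needle):
    """Return (line_number, text) pairs for every line containing needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


# Placeholder lines standing in for a log file; NOT the real log format.
SAMPLE_LINES = [
    "placeholder entry about http://www.example.com/thank-you-payment-mini\n",
    "placeholder entry mentioning http://www.example.com/thank-you-payment-one-day\n",
    "unrelated placeholder entry\n",
]

hits = find_in_log(SAMPLE_LINES, "thank-you-payment-one-day")
for number, text in hits:
    print(number, text)
```

For a real log you would pass an open file object instead of the sample list, since iterating over a text file also yields one line at a time.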

...

And to answer your question: no, A1 Sitemap Generator does not use any "hidden" techniques to uncover URLs. It simply follows links and references to URLs.

eran_more · July 08, 2014, 11:07:04 AM

Thank you for your quick reply!
It is probably the 'rel prev' and 'rel next' which linked those pages.
