thank-you pages unnecessarily exposed?

eran_more

thank-you pages unnecessarily exposed?
« on: July 07, 2014, 01:08:40 AM »
Hi,
I have thank-you pages which should not be 'exposed' to crawlers.
But the Microsys crawler still 'finds' them.
Can you please explain why?
Here is one example: "http://www.example.com/thank-you-payment-one-day"
By the way, it's a WordPress website. Are there any 'hidden directory' back doors that could expose them?
Thanks,
Eran.
« Last Edit: July 08, 2014, 11:38:14 AM by Webhelpforums »

Webhelpforums

  • Shared between Microsys, WebHelpForums and helpers
Re: thank-you pages unnecessarily exposed?
« Reply #1 on: July 08, 2014, 04:44:36 AM »
Before you crawl your website, switch off "easy mode":
http://www.microsystools.com/products/sitemap-generator/help/easy-sitemap-generator-mode/

Then in "Scan website | Crawler options" uncheck
"Apply webmaster and output filters after website scan stops"

After the scan, you can then see that:
  • http://www.example.com/thank-you-payment-one-day

is used by:
  • http://www.example.com/thank-you-payment-mini
  • http://www.example.com/thank-you-payment-one-time

To investigate further, see the help page:
http://www.microsystools.com/products/sitemap-generator/help/sitemaps-generator-analyze-links/
and be sure to check "linked-by", "used-by" and "redirected-by" for each URL.

For reference, note that e.g. http://www.example.com/thank-you-payment-mini has code:
<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />
That is why A1SG removes the URL after the scan when using default settings, but still follows its links/references.
Among other things, those URLs use prev/next link tags. A1SG treats a link tag as a kind of "use" rather than a "link", which is why you will find such references in the "uses" and "used-by" tabs when analyzing internal linking.
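To check a page for these signals yourself, something like the following Python sketch may help. It reads the robots meta tag and any rel prev/next link tags from a page's HTML using only the standard library. The sample HTML is an assumption made up for illustration (in particular the rel="next" tag and its target), and the class and function names are hypothetical. It is not how A1SG itself works internally.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects robots directives and rel prev/next references from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()    # directives such as "noindex" or "follow"
        self.references = []   # (rel, href) pairs found in <link> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower()
            if rel in ("prev", "next") and attrs.get("href"):
                self.references.append((rel, attrs["href"]))


# Assumed sample page: the meta tag is the one quoted above, the link tag
# is a made-up example of a prev/next reference.
SAMPLE = """
<html><head>
<meta name="robots" content="noindex,follow,noarchive,noodp,noydir" />
<link rel="next" href="http://www.example.com/thank-you-payment-one-day" />
</head><body></body></html>
"""


def read_signals(html):
    """Summarize what a crawler honoring these tags could do with the page."""
    parser = HeadSignals()
    parser.feed(html)
    return {
        "listed_in_output": "noindex" not in parser.robots,
        "references_followed": "nofollow" not in parser.robots,
        "references": parser.references,
    }


result = read_signals(SAMPLE)
print(result)
```

With a page like the sample, the page itself is kept out of the output (noindex), while the URL in the link tag is still followed and can therefore show up during the scan.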

...

Alternatively, you can enable logging in "Scan website | Data collection"
and then reduce worker threads to one in "Scan website | Crawler engine".

That will slow the scan a lot, but you can search the results afterwards in a text file.

...

And to answer your question, no, A1 Sitemap Generator does not utilize any "hidden" techniques to uncover URLs. It simply follows links and references to URLs :)
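As a rough illustration of "follows links and references", here is a minimal Python sketch of a crawl over a tiny in-memory site. The site data is hypothetical (how the first thank-you page is reached is not stated in this thread, so the homepage link is an assumption), as are the function names. It only mimics the general idea of discovering URLs by following links and "uses", recording who referenced them, and filtering noindex pages after the scan. It does not reproduce A1 Sitemap Generator's actual code.

```python
from collections import deque

# Hypothetical site: per page, its robots directives, its <a href> links,
# and its <link rel="prev/next"> references ("uses").
SITE = {
    "/": {"robots": set(), "links": ["/thank-you-payment-mini"], "uses": []},
    "/thank-you-payment-mini": {
        "robots": {"noindex", "follow"},
        "links": [],
        "uses": ["/thank-you-payment-one-day"],
    },
    "/thank-you-payment-one-day": {
        "robots": {"noindex", "follow"},
        "links": [],
        "uses": [],
    },
}


def crawl(site, start="/"):
    """Breadth-first scan that records which pages link to or use each URL."""
    found = {start: {"linked_by": [], "used_by": []}}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        page = site[url]
        if "nofollow" in page["robots"]:
            continue  # do not follow anything from this page
        for kind, key in (("links", "linked_by"), ("uses", "used_by")):
            for target in page[kind]:
                if target not in found:
                    found[target] = {"linked_by": [], "used_by": []}
                    queue.append(target)
                found[target][key].append(url)
    return found


def apply_output_filter(site, found):
    """After the scan, drop pages marked noindex from the final output."""
    return sorted(url for url in found if "noindex" not in site[url]["robots"])


found = crawl(SITE)
print(found["/thank-you-payment-one-day"])
print(apply_output_filter(SITE, found))
```

In this sketch the thank-you page is discovered only because another page references it, and it is visible in the scan data until the noindex filter is applied afterwards.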
« Last Edit: July 08, 2014, 11:37:58 AM by Webhelpforums »
MicrosysTools.com | Website and SEO Software for webmasters | A1 Sitemap Generator, A1 Website Analyzer etc.

eran_more

Re: thank-you pages unnecessarily exposed?
« Reply #2 on: July 08, 2014, 11:07:04 AM »
Thank you for your quick reply!
It is probably the 'rel prev' and 'rel next' tags that referenced those pages.