A1 Website download exclude pages


ericgourmet

« on: July 09, 2017, 06:47:12 AM »
Hi,
I am trying to exclude web pages whose names begin with the characters MS_
The pages are downloaded into different directories; each page name begins with MS_ and has an .html extension. How do I exclude these pages from both analysis and output?

Example:
dir1/MS_987.html should be excluded
dir2/MS_48732.html should be excluded
dir1/boxes.html should be included

Thanks!

Webhelpforums

Re: A1 Website download exclude pages
« Reply #1 on: July 09, 2017, 06:58:34 AM »
Please see:
https://www.microsystools.com/products/website-download/help/website-download-convert-links/

It sounds like those files (MS_ + .html) are the ones created when two different URLs would map to the same file name on disk (this happens because file names on disk do not allow the same characters as URLs do).

I have considered a different approach, but a problem I encountered with huge websites was that if I just removed illegal characters, or used a similarly "simple" method, I would get cases where URLs collided when saved to disk.
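
To illustrate the collision problem described above, here is a minimal Python sketch. It is not how A1 Website Download works internally; the function name naive_filename, the character set, and the example URLs are assumptions made only for illustration.

```python
import re

# Characters that are not allowed in Windows file names (assumed set for this sketch).
ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def naive_filename(url_path: str) -> str:
    """Hypothetical "simple" mapping: just drop characters illegal on disk."""
    return ILLEGAL.sub("", url_path)

# Two different (made-up) URLs ...
first = naive_filename("page?id=1:2")
second = naive_filename("page?id=12")

# ... end up with the same name on disk, so one file would overwrite the other.
print(first, second, first == second)
```

With a mapping like this, any site that has both URLs loses one of the two pages, which is why a naming scheme that keeps colliding URLs apart is needed.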



General way of excluding pages from crawl:
https://www.microsystools.com/products/website-download/help/website-crawler-scanner-filters/

General way of excluding pages from final output:
https://www.microsystools.com/products/website-download/help/website-crawler-output-filters/
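
If it helps to check which paths a pattern would catch before setting up filters, here is a rough Python sketch. It only shows the matching rule from the question (file name begins with MS_ and ends in .html); it is not the filter syntax used by A1 Website Download, so see the help pages above for that. The names MS_PAGE and is_excluded are made up for this example.

```python
import re

# Matches a path whose last segment starts with "MS_" and ends with ".html".
MS_PAGE = re.compile(r'(?:^|/)MS_[^/]*\.html$')

def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if the page should be left out."""
    return MS_PAGE.search(path) is not None

# The three paths from the original question.
paths = ["dir1/MS_987.html", "dir2/MS_48732.html", "dir1/boxes.html"]
kept = [p for p in paths if not is_excluded(p)]
print(kept)
```

The pattern is anchored to the last path segment, so a directory that merely contains MS_ in its name does not cause the pages inside it to be excluded.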