A1 Website download exclude pages

*

ericgourmet

  • Newbie
  • *
  • 2
  • +0/-0
    • View Profile
A1 Website download exclude pages
« on: July 09, 2017, 06:47:12 AM »
Hi,
I am trying to exclude web pages whose names begin with the characters MS_.
The pages are downloaded into different directories; each page name begins with MS_ and has an .html extension. How do I exclude these pages from both analysis and output?

Example:
dir1/MS_987.html should be excluded
dir2/MS_48732.html should be excluded
dir1/boxes.html should be included

Thanks!

*

Webhelpforums

  • Administrator
  • Hero Member
  • *****
  • 1329
  • +6/-0
  • Shared between Microsys, WebHelpForums and helpers
    • View Profile
    • Webmaster and Website Help Forums
Re: A1 Website download exclude pages
« Reply #1 on: July 09, 2017, 06:58:34 AM »
Please see:
https://www.microsystools.com/products/website-download/help/website-download-convert-links/

It sounds like those files (MS_ + .HTML) are the ones created when two different URLs would map to the same file name on disk (this happens because file names on disk do not allow the same characters as URLs do).

I have contemplated a different approach, but a problem I encountered with huge websites was that if I just removed illegal characters, or used a similar "simple" method, I would receive examples of URLs colliding when saved to disk.
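
To illustrate the collision problem described above, here is a minimal Python sketch. It is not how A1 Website Download generates its names; `naive_name`, `unique_name`, and the hash-based suffix are hypothetical and only show why stripping illegal characters alone is not enough.

```python
import hashlib
import re

# Characters that Windows does not allow in file names.
ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def naive_name(url_path: str) -> str:
    """Strip illegal characters only; distinct URLs can end up identical."""
    return ILLEGAL.sub("", url_path)

def unique_name(url_path: str) -> str:
    """Hypothetical scheme: fixed prefix plus a short hash of the full URL path."""
    digest = hashlib.sha1(url_path.encode("utf-8")).hexdigest()[:8]
    return f"MS_{digest}.html"

a = "page?id=1"
b = "page:id=1"

print(naive_name(a), naive_name(b))    # both become the same name on disk
print(unique_name(a), unique_name(b))  # two different names
```

The trade-off is the one discussed in this thread: generated names avoid collisions but no longer resemble the original URLs.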



General way of excluding pages from crawl:
https://www.microsystools.com/products/website-download/help/website-crawler-scanner-filters/

General way of excluding pages from final output:
https://www.microsystools.com/products/website-download/help/website-crawler-output-filters/
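
As a rough illustration of the kind of pattern the question asks about, the Python sketch below checks which downloaded paths have a file name starting with MS_ and ending in .html. The regular expression and `is_excluded` are assumptions for demonstration only; they are not the A1 Website Download filter syntax, which is described on the help pages linked above.

```python
import re

# Matches a file name that starts with "MS_" and ends with ".html",
# in any directory (or in none).
MS_PAGE = re.compile(r"(^|/)MS_[^/]*\.html$")

def is_excluded(path: str) -> bool:
    """Return True when the last path segment looks like MS_<something>.html."""
    return MS_PAGE.search(path) is not None

for p in ["dir1/MS_987.html", "dir2/MS_48732.html", "dir1/boxes.html"]:
    print(p, "excluded" if is_excluded(p) else "included")
```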
MicrosysTools.com | Website and SEO Software for webmasters | A1 Sitemap Generator, A1 Website Analyzer etc.

*

ericgourmet

Re: A1 Website download exclude pages
« Reply #2 on: July 29, 2017, 02:50:36 AM »
OK, that solved my problem. I had to go into the Crawler Options and check Cutout "?" (GET parameters) in internal links.
Thanks!

*

Webhelpforums

Re: A1 Website download exclude pages
« Reply #3 on: July 29, 2017, 04:50:31 AM »
That will not be a favourable solution for websites where ? parameters are an important part of URLs.

e.g. fetchpage?page=about and fetchpage?page=contact

The way A1WD handles it by default ensures both pages are downloaded (but renamed, because e.g. "?" cannot be part of a file name on Windows) and that internal linking works.
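
A small Python sketch of why cutting "?" parameters can lose pages: once the query string is removed, both example URLs become the same address. The `example.com` host and the `cut_query` helper are made up for illustration and are not part of A1WD.

```python
from urllib.parse import urlsplit, urlunsplit

def cut_query(url: str) -> str:
    """Drop the "?" (GET parameters) part of a URL, keeping everything else."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))

urls = [
    "http://example.com/fetchpage?page=about",
    "http://example.com/fetchpage?page=contact",
]

# Both URLs collapse to one address, so only one page would be fetched.
print({cut_query(u) for u in urls})
```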

It is possible one could keep more of the old names and simply append the "MS_xx" part to them. Would that be better? If so, I will add it to the wishlist / create an option for it.

*

RichardBaker

  • Newbie
  • *
  • 1
  • +0/-0
    • View Profile
    • royalediting.com
Re: A1 Website download exclude pages
« Reply #4 on: September 13, 2017, 09:14:28 AM »
Were you able to figure out how to exclude these pages from both analysis and output?

*

Webhelpforums

Re: A1 Website download exclude pages
« Reply #5 on: September 17, 2017, 11:48:53 AM »
Update: Option exists now and is found in "Scan website | Download options"