How do I restrict scanning to only the URLs imported from a text file?


mediterrano

I import the URLs to be scanned from a file.

The file contains the following URLs:
http://www.swissfirms.ch/details.aspx?ENOFI=01301085
http://www.swissfirms.ch/details.aspx?ENOFI=02802247


The resulting scraped.csv contains:
http://www.swissfirms.ch/details.aspx?enofi=01301085#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=en-us#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=en-us
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=fr-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=fr-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=it-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=it-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=zh-cn#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=zh-cn
http://www.swissfirms.ch/details.aspx?enofi=01301085
http://www.swissfirms.ch/details.aspx?enofi=02802247#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=de-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=de-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=en-us#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=en-us
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=fr-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=fr-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=it-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=it-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=zh-cn#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=zh-cn
http://www.swissfirms.ch/details.aspx?enofi=02802247


But I want the scraped.csv to contain only the specified URLs:
http://www.swissfirms.ch/details.aspx?ENOFI=01301085
http://www.swissfirms.ch/details.aspx?ENOFI=02802247


How can I achieve this?

For screenshots of all relevant settings pages, see the Dropbox link below:
https://www.dropbox.com/sh/397h1wr8bdp3ank/AABNU9zayaBfjeYxrFUC3_SBa?dl=0
« Last Edit: March 26, 2015, 06:18:29 PM by mediterrano »

Webhelpforums

If I understand you correctly, your problem is that A1 Website Scraper scrapes other URLs that you do not want scraped?

You have to limit the "analysis" and "output" to the wanted URLs.

See the help page for importing:
http://www.microsystools.com/products/website-scraper/help/scrape-content-pages-list/

Quote
If you only want the imported URLs checked/analyzed, tick the recrawl option.
If you do not use this option, A1 Website Scraper will perform a full crawl starting from the website root.
It is crucial to use this option in case:
  • You set limit include in analysis filters, e.g. by using the button as shown above.
  • You only want external URLs checked and/or analyzed.
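
If the output still contains extra variants, one possible fallback is to filter the scraped URL list afterwards, outside A1 Website Scraper. The following is only a minimal sketch in Python and not a feature of the program. The function names normalize and keep_imported are hypothetical, and it assumes that the scraped URLs differ from the imported ones only in the letter case of query keys (ENOFI vs. enofi) and a trailing "#", as in the listing in the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Return a comparison key: lower-case scheme, host and query keys, no fragment."""
    parts = urlsplit(url.strip())
    # Lower-case only the parameter names; values and the path are left untouched.
    query = sorted(
        (key.lower(), value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def keep_imported(scraped_urls, imported_urls):
    """Keep the first scraped URL that matches each imported URL (hypothetical helper)."""
    wanted = {normalize(url) for url in imported_urls}
    seen = set()
    kept = []
    for url in scraped_urls:
        key = normalize(url)
        if key in wanted and key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

URLs with an extra parameter such as &lang=de-ch produce a different key and are dropped. The same comparison could be applied to the URL column of scraped.csv, whose exact layout is not shown in this thread.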

If you continue to have problems, send an email with your project file:
http://www.microsystools.com/home/contact.php
« Last Edit: March 29, 2015, 04:51:26 PM by Webhelpforums »