How to restrict scanning to only the URLs imported from a text file?

Started by mediterrano, March 26, 2015, 06:16:23 PM

mediterrano

I import the URLs to be scanned from a file.

The file contains the following URLs:
http://www.example.com/details.aspx?ENOFI=01301085
http://www.example.com/details.aspx?ENOFI=02802247


The resulting scraped.csv contains:
http://www.example.com/details.aspx?enofi=01301085#
http://www.example.com/details.aspx?enofi=01301085&lang=de-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=de-ch
http://www.example.com/details.aspx?enofi=01301085&lang=en-us#
http://www.example.com/details.aspx?enofi=01301085&lang=en-us
http://www.example.com/details.aspx?enofi=01301085&lang=fr-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=fr-ch
http://www.example.com/details.aspx?enofi=01301085&lang=it-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=it-ch
http://www.example.com/details.aspx?enofi=01301085&lang=zh-cn#
http://www.example.com/details.aspx?enofi=01301085&lang=zh-cn
http://www.example.com/details.aspx?enofi=01301085
http://www.example.com/details.aspx?enofi=02802247#
http://www.example.com/details.aspx?enofi=02802247&lang=de-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=de-ch
http://www.example.com/details.aspx?enofi=02802247&lang=en-us#
http://www.example.com/details.aspx?enofi=02802247&lang=en-us
http://www.example.com/details.aspx?enofi=02802247&lang=fr-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=fr-ch
http://www.example.com/details.aspx?enofi=02802247&lang=it-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=it-ch
http://www.example.com/details.aspx?enofi=02802247&lang=zh-cn#
http://www.example.com/details.aspx?enofi=02802247&lang=zh-cn
http://www.example.com/details.aspx?enofi=02802247


But I want the scraped.csv to contain only the specified URLs:
http://www.example.com/details.aspx?ENOFI=01301085
http://www.example.com/details.aspx?ENOFI=02802247


How can I achieve this?

For screenshots of all relevant settings pages, see the Dropbox link below:
https://www.dropbox.com/sh/397h1wr8bdp3ank/AABNU9zayaBfjeYxrFUC3_SBa?dl=0

Webhelpforums

If I understand you correctly, your problem is that A1 Website Scraper scrapes other URLs that you do not want scraped?

You have to limit the "analysis" and "output" to the wanted URLs.

See the help page for importing:
http://www.microsystools.com/products/website-scraper/help/scrape-content-pages-list/

Quote:
If you only want the imported URLs checked/analyzed, tick the recrawl option.
If you do not use this option, A1 Website Scraper will perform a full crawl starting from the website root.
It is crucial to use this option in case:
- You set limit include in analysis filters, e.g. by using the button as shown above.
- You only want external URLs checked and/or analyzed.
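
If extra URLs still end up in the output, one possible workaround outside the tool is to filter scraped.csv afterwards. The following is only a minimal Python sketch, not a feature of A1 Website Scraper. It assumes the URL sits in the first column of scraped.csv, and the names urls.txt, scraped_filtered.csv, normalize, filter_rows and filter_csv are made up for illustration. Because the output above shows the parameter name lowercased (ENOFI becomes enofi), the comparison ignores the case of parameter names, and rows whose URL contains a "#" are skipped.

```python
import csv
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Build a comparison key: lowercase scheme, host and query parameter names."""
    parts = urlsplit(url.strip())
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode([(name.lower(), value) for name, value in pairs])
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, query, ""))


def filter_rows(rows, wanted_urls, url_column=0):
    """Keep only rows whose URL matches one of the imported URLs."""
    wanted = {normalize(u) for u in wanted_urls if u.strip()}
    kept = []
    for row in rows:
        if not row:
            continue  # skip empty lines
        url = row[url_column]
        if "#" in url:
            continue  # skip the "...#" variants listed by the crawler
        if normalize(url) in wanted:
            kept.append(row)
    return kept


def filter_csv(imported_path, scraped_path, out_path, url_column=0):
    """Read the imported URL list and the scraped CSV, write the filtered CSV."""
    with open(imported_path, encoding="utf-8") as f:
        wanted_urls = f.read().splitlines()
    with open(scraped_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(filter_rows(rows, wanted_urls, url_column))
```

A call such as filter_csv("urls.txt", "scraped.csv", "scraped_filtered.csv") would then write only the rows for the imported URLs. If scraped.csv has a header row or uses a different delimiter, the sketch would need adjusting.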

If you continue to have problems, send an email with your project file:
http://www.microsystools.com/home/contact.php
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.
