Webmaster Forums - Website and SEO Help

Microsys Products and Webmaster Tools => A1 Website Scraper => Topic started by: mediterrano on March 26, 2015, 06:16:23 PM

Title: How to restrict the scanning to only the URLs imported from text file?
Post by: mediterrano on March 26, 2015, 06:16:23 PM
I import the URLs to be scanned from a file.

The file contains the following URLs:
http://www.swissfirms.ch/details.aspx?ENOFI=01301085
http://www.swissfirms.ch/details.aspx?ENOFI=02802247


The resulting scraped.csv contains:
http://www.swissfirms.ch/details.aspx?enofi=01301085#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=en-us#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=en-us
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=fr-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=fr-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=it-ch#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=it-ch
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=zh-cn#
http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=zh-cn
http://www.swissfirms.ch/details.aspx?enofi=01301085
http://www.swissfirms.ch/details.aspx?enofi=02802247#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=de-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=de-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=en-us#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=en-us
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=fr-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=fr-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=it-ch#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=it-ch
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=zh-cn#
http://www.swissfirms.ch/details.aspx?enofi=02802247&lang=zh-cn
http://www.swissfirms.ch/details.aspx?enofi=02802247


But I want the scraped.csv to contain only the specified URLs:
http://www.swissfirms.ch/details.aspx?ENOFI=01301085
http://www.swissfirms.ch/details.aspx?ENOFI=02802247


How can I achieve this?
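Comparing the two lists: every scraped URL has the parameter name lowercased (enofi instead of ENOFI), and the extra rows also carry an empty # fragment, a &lang= parameter, or both. As a stopgap until the scan itself is restricted, the export could be trimmed afterwards. Below is a minimal Python 3.9+ sketch. It assumes the URL sits in the first column of scraped.csv (the real column layout may differ) and that the site treats the query case-insensitively. normalize and keep_imported are hypothetical helper names, not part of A1 Website Scraper.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase a URL and drop its fragment, including an empty trailing '#'.

    Assumption: the site treats the query case-insensitively, as the
    lowercased 'enofi' rows in scraped.csv suggest.
    """
    parts = urlsplit(url.strip().lower())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def keep_imported(scraped_csv: str, imported: list[str]) -> list[list[str]]:
    """Return one CSV row per imported URL found in the scraped export.

    Hypothetical helper: assumes the URL is in the first column and that
    scraped_csv holds the file contents as text.
    """
    # Map each normalized key back to the URL exactly as it was imported.
    wanted = {normalize(u): u for u in imported}
    seen: set[str] = set()
    kept: list[list[str]] = []
    for row in csv.reader(io.StringIO(scraped_csv)):
        if not row:
            continue  # skip blank lines
        key = normalize(row[0])
        # '#' variants collapse onto the same key; '&lang=' variants do not match.
        if key in wanted and key not in seen:
            seen.add(key)
            kept.append([wanted[key], *row[1:]])
    return kept
```

Each kept row gets the imported URL exactly as written in the text file, so the ENOFI spelling is preserved. A header row, if the export has one, is dropped because it matches no imported URL.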

For screenshots of all relevant settings pages, see the Dropbox link below:
https://www.dropbox.com/sh/397h1wr8bdp3ank/AABNU9zayaBfjeYxrFUC3_SBa?dl=0
Title: Re: How to restrict the scanning to only the URLs imported from text file?
Post by: Webhelpforums on March 29, 2015, 04:48:57 PM
If I understand you correctly, your problem is that A1 Website Scraper also scrapes other URLs you do not want scraped?

You have to limit both the "analysis" and the "output" to the URLs you want.
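Why the limit needs to be exact here: ignoring case, each unwanted &lang= row begins with the text of an imported URL. The following Python sketch contrasts two hypothetical filter rules, a prefix match and an exact case-insensitive match. It is an illustration only and assumes nothing about how A1 Website Scraper actually evaluates its filters; passes_prefix and passes_exact are made-up names.

```python
def passes_prefix(url: str, allowed: list[str]) -> bool:
    # Hypothetical loose rule: accept any URL that starts with an allowed URL.
    u = url.lower()
    return any(u.startswith(a.lower()) for a in allowed)


def passes_exact(url: str, allowed: list[str]) -> bool:
    # Hypothetical strict rule: accept only an exact, case-insensitive match.
    return url.lower() in {a.lower() for a in allowed}


allowed = ["http://www.swissfirms.ch/details.aspx?ENOFI=01301085"]
found = [
    "http://www.swissfirms.ch/details.aspx?enofi=01301085",
    "http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch",
    "http://www.swissfirms.ch/details.aspx?enofi=01301085&lang=de-ch#",
]

print([u for u in found if passes_prefix(u, allowed)])  # all three pass
print([u for u in found if passes_exact(u, allowed)])   # only the first passes
```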

See the help page for importing:
http://www.microsystools.com/products/website-scraper/help/scrape-content-pages-list/

Quote
If you only want the imported URLs checked/analyzed, tick the recrawl option.
If you do not use this option, A1 Website Scraper will perform a full crawl starting from the website root.
It is crucial to use this option if:
- You set limit include in analysis filters, e.g. by using the button shown on that help page.
- You only want external URLs checked and/or analyzed.
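To illustrate the difference the quoted paragraph describes, here is a toy Python model of which URLs end up analyzed with and without the recrawl option. It is a simplification, not the program's actual logic: the link graph, the root URL and the function name urls_to_analyze are all hypothetical, and a real scan also applies the configured filters.

```python
from collections import deque


def urls_to_analyze(imported: list[str], links: dict[str, list[str]],
                    root: str, recrawl: bool) -> list[str]:
    """Hypothetical model of which URLs get analyzed.

    imported: URLs read from the text file.
    links: maps a page URL to the URLs linked from that page.
    root: website root used when a full crawl is performed.
    recrawl: True models ticking the recrawl option.
    """
    if recrawl:
        # Only the imported URLs are checked/analyzed; no links are followed.
        return list(dict.fromkeys(imported))
    # Otherwise: breadth-first crawl from the root, following every link.
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical toy site: the root links to one details page,
# which in turn links to two language variants.
root = "http://www.swissfirms.ch/"
imported = ["http://www.swissfirms.ch/details.aspx?ENOFI=01301085"]
links = {
    root: imported[:],
    imported[0]: [imported[0] + "&lang=de-ch", imported[0] + "&lang=en-us"],
}
print(urls_to_analyze(imported, links, root, recrawl=True))   # only the imported URL
print(urls_to_analyze(imported, links, root, recrawl=False))  # root, details page, 2 lang pages
```

With this toy graph, ticking recrawl returns only the imported URL, while a full crawl from the root also reaches the &lang= pages, the same kind of extra rows shown in the first post.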

If you continue to have problems, send an email with your project file:
http://www.microsystools.com/home/contact.php