Webmaster Forums - Website and SEO Help
Microsys Products and Webmaster Tools => A1 Website Scraper => Topic started by: mediterrano on March 26, 2015, 06:16:23 PM
-
I import the URLs to be scanned from a file.
The file contains the following URLs:
http://www.example.com/details.aspx?ENOFI=01301085
http://www.example.com/details.aspx?ENOFI=02802247
The resulting scraped.csv contains:
http://www.example.com/details.aspx?enofi=01301085#
http://www.example.com/details.aspx?enofi=01301085&lang=de-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=de-ch
http://www.example.com/details.aspx?enofi=01301085&lang=en-us#
http://www.example.com/details.aspx?enofi=01301085&lang=en-us
http://www.example.com/details.aspx?enofi=01301085&lang=fr-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=fr-ch
http://www.example.com/details.aspx?enofi=01301085&lang=it-ch#
http://www.example.com/details.aspx?enofi=01301085&lang=it-ch
http://www.example.com/details.aspx?enofi=01301085&lang=zh-cn#
http://www.example.com/details.aspx?enofi=01301085&lang=zh-cn
http://www.example.com/details.aspx?enofi=01301085
http://www.example.com/details.aspx?enofi=02802247#
http://www.example.com/details.aspx?enofi=02802247&lang=de-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=de-ch
http://www.example.com/details.aspx?enofi=02802247&lang=en-us#
http://www.example.com/details.aspx?enofi=02802247&lang=en-us
http://www.example.com/details.aspx?enofi=02802247&lang=fr-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=fr-ch
http://www.example.com/details.aspx?enofi=02802247&lang=it-ch#
http://www.example.com/details.aspx?enofi=02802247&lang=it-ch
http://www.example.com/details.aspx?enofi=02802247&lang=zh-cn#
http://www.example.com/details.aspx?enofi=02802247&lang=zh-cn
http://www.example.com/details.aspx?enofi=02802247
But I want scraped.csv to contain only the URLs I specified:
http://www.example.com/details.aspx?ENOFI=01301085
http://www.example.com/details.aspx?ENOFI=02802247
How can I achieve this?
For screenshots of all relevant settings pages, use the Dropbox link below:
https://www.dropbox.com/sh/397h1wr8bdp3ank/AABNU9zayaBfjeYxrFUC3_SBa?dl=0
-
If I understand you correctly, your problem is that A1 Website Scraper also scrapes other URLs you do not want scraped?
You have to limit both "analysis" and "output" to the URLs you want.
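If you would rather clean up an existing scraped.csv afterwards, the effect of an output limit can be approximated with a small Python script. This is only a sketch and not part of A1 Website Scraper. It assumes the URL is in the first column of scraped.csv, that the site treats these URLs case-insensitively (your output shows ENOFI lower-cased to enofi), and that a trailing "#" can be ignored. The function names and the urls.txt/filtered.csv file names are hypothetical.

```python
import csv
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lower-case the URL and drop any '#fragment'.

    Assumption: the site treats these URLs case-insensitively, so
    ...?ENOFI=01301085 and ...?enofi=01301085# count as the same page.
    """
    parts = urlsplit(url.strip().lower())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def filter_rows(wanted_urls, rows):
    """Keep the first row whose URL (assumed to be column 0) matches a wanted URL."""
    wanted = {normalize(u) for u in wanted_urls if u.strip()}
    seen = set()
    kept = []
    for row in rows:
        if not row:
            continue  # skip blank CSV lines
        key = normalize(row[0])
        if key in wanted and key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def filter_csv(urls_path, in_path, out_path):
    """Hypothetical file-based wrapper, e.g. urls.txt + scraped.csv -> filtered.csv."""
    with open(urls_path, encoding="utf-8") as f:
        wanted_urls = f.read().splitlines()
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(filter_rows(wanted_urls, rows))


if __name__ == "__main__":
    imported = [
        "http://www.example.com/details.aspx?ENOFI=01301085",
        "http://www.example.com/details.aspx?ENOFI=02802247",
    ]
    scraped = [
        ["http://www.example.com/details.aspx?enofi=01301085#"],
        ["http://www.example.com/details.aspx?enofi=01301085&lang=de-ch"],
        ["http://www.example.com/details.aspx?enofi=01301085"],
        ["http://www.example.com/details.aspx?enofi=02802247&lang=en-us#"],
        ["http://www.example.com/details.aspx?enofi=02802247"],
    ]
    for row in filter_rows(imported, scraped):
        print(row[0])
```

Note that a header row in scraped.csv would also be dropped, since it does not match any wanted URL. Setting the limits in the program itself remains the cleaner fix.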
See the help page for importing:
http://www.microsystools.com/products/website-scraper/help/scrape-content-pages-list/
If you only want the imported URLs checked/analyzed, tick the recrawl option.
If you do not use this option, A1 Website Scraper will perform a full crawl starting from the website root.
It is crucial to use this option if:
You have set limit/include analysis filters, e.g. by using the button shown on that help page.
You only want external URLs checked and/or analyzed.
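To illustrate why the recrawl option matters, here is a toy model in Python. It is not how A1 Website Scraper is actually implemented: with follow_links=False only the listed URLs are visited (roughly what ticking the option does), while follow_links=True models a full crawl from the root that follows every link it finds, which is one plausible way variants such as &lang=de-ch end up in the results. The crawl function and the link graph are hypothetical, loosely inferred from the scraped output above.

```python
from collections import deque


def crawl(start_urls, links, follow_links):
    """Toy crawler: return URLs in the order they are visited.

    links maps each URL to the URLs it links to (hypothetical data).
    follow_links=False models "only check the imported list";
    follow_links=True models a full crawl that follows every link found.
    """
    visited = set()
    order = []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # each URL is checked only once
        visited.add(url)
        order.append(url)
        if follow_links:
            queue.extend(links.get(url, []))
    return order


if __name__ == "__main__":
    root = "http://www.example.com/"
    detail = "http://www.example.com/details.aspx?enofi=01301085"
    # Hypothetical link graph: the root links to a details page,
    # which in turn links to its language variants.
    links = {
        root: [detail],
        detail: [detail + "&lang=de-ch", detail + "&lang=en-us"],
    }
    print(crawl([detail], links, follow_links=False))  # list mode
    print(crawl([root], links, follow_links=True))     # full crawl from root
```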
If you continue to have problems, send an email with your project file:
http://www.microsystools.com/home/contact.php