How to include linked pages, same site but other dir

Started by spiweb, January 28, 2012, 02:01:35 PM

spiweb

Hi there,

I am trying the A1 Website Download 3.4.8 trial on Windows Vista, and I would like to download (for offline browsing) some pages from a website to which I contribute content.
Let's say the site is www.example.org and I am interested in www.example.org/users/spiweb/ , so I set this URL as the directory path in Scan website / Paths / Website ...

The scan works fine and correctly downloads a number of pages, let's say
www.example.org/users/spiweb/list_of_pages-A
www.example.org/users/spiweb/list_of_pages-B
www.example.org/users/spiweb/list_of_pages-C
and so on.

But I am also interested in the pages linked from list_of_pages-A, list_of_pages-B, list_of_pages-C, etc.
Let's say list_of_pages-A links to
www.example.org/content/page3124
www.example.org/content/page6349
www.example.org/content/page7420
www.example.org/content/page9875

list_of_pages-B links to
www.example.org/content/page1130
www.example.org/content/page5434

and so on...

So I would like A1 Website Download to retrieve those HTML pages too, along with the JPG images linked from them.

The problem is, they are on the same site BUT NOT in the /users/spiweb/ path.

Besides, I only want those specific pages, not the whole www.example.org/content/ directory, which is very large.


How do I do that?

Thank you!


Webhelpforums

If you know what URL "areas" you want before starting the scan, you can do what you need by:

1) Disable "Easy mode"
http://www.microsystools.com/products/website-download/help/easy-website-download-mode/

2) Set root to www.example.org/

3) Set analysis filters (which URLs to analyze for links)
http://www.microsystools.com/products/website-download/help/website-crawler-scanner-filters/

4) Possibly set up more Start search paths:
http://www.microsystools.com/products/website-download/help/root-aliases-start-paths/

5) Set output filters (pages downloaded to disk)
http://www.microsystools.com/products/website-download/help/website-crawler-output-filters/

If you have very advanced needs, you will benefit from learning the basics of regular expressions since both the "limit-to" and "exclude" filtering options for both "analysis" and "output" filters support regex.
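
As a rough illustration of how "analysis" and "output" filters could divide the work for the URLs in this thread, here is a minimal Python sketch. The pattern strings and the helper names should_analyze and should_save are hypothetical, and the sketch assumes that a URL found on an analyzed page can still be saved even if it is not analyzed itself; the filter syntax and behaviour of A1 Website Download may differ.

```python
import re

# Hypothetical "limit-to" patterns, written for the example URLs in this thread.
# Analysis: only pages in the personal area are parsed for further links.
ANALYSIS_LIMIT_TO = re.compile(r"^https?://www\.example\.org/users/spiweb/")

# Output: save the personal area plus individual content pages,
# but not the /content/ directory index itself.
OUTPUT_LIMIT_TO = re.compile(
    r"^https?://www\.example\.org/(users/spiweb/|content/page\d+$)"
)


def should_analyze(url: str) -> bool:
    """Return True if this page should be scanned for links."""
    return ANALYSIS_LIMIT_TO.search(url) is not None


def should_save(url: str) -> bool:
    """Return True if this page should be written to disk."""
    return OUTPUT_LIMIT_TO.search(url) is not None
```

Under that assumption, content pages are saved only when an analyzed page links to them, because no content page is itself scanned for links. Image URLs would need their own alternative in the output pattern.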
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

spiweb

Thanks a lot for your explanation (and sorry for my late reply), ...but I am a bit lost! I should probably study the help pages more than I did, but I haven't found a solution so far. Anyway, you say "Set root to www.example.org/", but that would mean thousands of pages to scan in my case. I only need to download the pages in my personal area, let's say www.example.org/users/spiweb/ (and that's easy), PLUS the pages in www.example.org that are directly linked from a page of mine (and that is the part I don't understand).
Yes, I know regular expressions, but I don't know how to apply them in my case. In Analysis Filters I can limit to or exclude URLs by using a regex, but apart from my own section there is no specific section of the site I want to download: I want any page on the site that is directly linked from my pages. How do I do that? :)
Thanks again!
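
To make the request concrete, here is a minimal Python sketch of the selection rule described above: keep every page in the personal area, plus any same-site page that one of those pages links to directly, and follow nothing further. The function name select_urls and the links_by_page mapping are hypothetical stand-ins for whatever a crawler has collected; this is not how A1 Website Download works internally.

```python
from urllib.parse import urlsplit


def select_urls(links_by_page: dict[str, list[str]], own_prefix: str) -> set[str]:
    """Own-area pages plus same-host pages linked directly from them."""
    host = urlsplit(own_prefix).netloc
    selected = set()
    for page, links in links_by_page.items():
        if not page.startswith(own_prefix):
            continue  # links on pages outside the own area are not followed
        selected.add(page)
        for link in links:
            if urlsplit(link).netloc == host:
                selected.add(link)  # one hop only; never expanded further
    return selected


# Hypothetical crawl data based on the URLs in this thread.
links_by_page = {
    "http://www.example.org/users/spiweb/list_of_pages-A": [
        "http://www.example.org/content/page3124",
        "http://other.example.com/elsewhere",
    ],
    "http://www.example.org/content/page3124": [
        "http://www.example.org/content/page0001",
    ],
}
result = select_urls(links_by_page, "http://www.example.org/users/spiweb/")
```

Here result holds list_of_pages-A and page3124 only: the external link is dropped because its host differs, and page0001 is dropped because it is linked only from a page outside the personal area.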

Webhelpforums

Can you email support with some exact URL examples?
http://www.microsystools.com/home/contact.php

...

Can't you use the "limit-to" and/or "exclude" filters with regular expressions?

Also remember that there are both "analysis" and "output" (a.k.a. download-to-disk) filters.




