
How to include linked pages, same site but other dir

Started by spiweb, January 28, 2012, 02:01:35 PM

spiweb

Hi there,

I am trying the A1 Website Download 3.4.8 trial on Windows Vista, and I would like to download (for offline browsing) some pages from a website where I contribute content.
Let's say the site is www.example.org and I am interested in www.example.org/users/spiweb/ , so I set this URL as the directory path in Scan website / Paths / Website ...

The scan works fine and correctly downloads a number of pages, let's say
www.example.org/users/spiweb/list_of_pages-A
www.example.org/users/spiweb/list_of_pages-B
www.example.org/users/spiweb/list_of_pages-C
and so on.

But I am also interested in the pages linked from list_of_pages-A, list_of_pages-B, list_of_pages-C, etc. (listA, listB, listC for short).
Let's say listA links to
www.example.org/content/page3124
www.example.org/content/page6349
www.example.org/content/page7420
www.example.org/content/page9875

listB links to
www.example.org/content/page1130
www.example.org/content/page5434

and so on...

So I would like A1 Website Download to retrieve those HTML pages too, along with the JPG images linked from them.

The problem is, they are on the same site BUT NOT in the /users/spiweb/ path.

Besides, I only want those specific pages, not the whole www.example.org/content/ directory, which is very large.


How do I do that?

Thank you!


Webhelpforums

If you know which URL "areas" you want before starting the scan, you can do what you need as follows:

1) Disable "Easy mode"
http://www.microsystools.com/products/website-download/help/easy-website-download-mode/

2) Set root to www.example.org/

3) Set analysis filters (which URLs to analyze for links)
http://www.microsystools.com/products/website-download/help/website-crawler-scanner-filters/

4) Possibly set up more Start search paths:
http://www.microsystools.com/products/website-download/help/root-aliases-start-paths/

5) Set output filters (pages downloaded to disk)
http://www.microsystools.com/products/website-download/help/website-crawler-output-filters/

If you have very advanced needs, you will benefit from learning the basics of regular expressions: the "limit-to" and "exclude" options support regex in both the "analysis" and the "output" filters.
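
To illustrate how steps 2 to 5 could fit together, here is a rough Python sketch of the idea, not of how the program works internally. It assumes that an "analysis" filter decides which pages are searched for links and an "output" filter decides which pages are saved. The link graph, the two regex patterns and the crawl function are all made up for the example.

```python
import re
from collections import deque

# Hypothetical link graph standing in for www.example.org (paths only).
LINKS = {
    "/users/spiweb/": ["/users/spiweb/list_of_pages-A",
                       "/users/spiweb/list_of_pages-B", "/"],
    "/users/spiweb/list_of_pages-A": ["/content/page3124", "/content/page6349"],
    "/users/spiweb/list_of_pages-B": ["/content/page1130"],
    "/content/page3124": ["/content/page0001"],
    "/content/page6349": [],
    "/content/page1130": ["/content/page0002"],
    "/": ["/content/page0003", "/about"],
}

# Assumed "analysis" filter: only pages in the personal area are searched for links.
ANALYSIS_LIMIT_TO = re.compile(r"^/users/spiweb/")
# Assumed "output" filter: pages in the personal area and in /content/ are saved.
OUTPUT_LIMIT_TO = re.compile(r"^/(users/spiweb|content)/")


def crawl(start):
    """Breadth-first crawl that returns the list of pages that would be saved."""
    seen = {start}
    queue = deque([start])
    downloaded = []
    while queue:
        url = queue.popleft()
        if OUTPUT_LIMIT_TO.search(url):
            downloaded.append(url)
        if not ANALYSIS_LIMIT_TO.search(url):
            continue  # page is not searched for links, so its links are never seen
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded


if __name__ == "__main__":
    for page in crawl("/users/spiweb/"):
        print(page)
```

In this model the /content/ pages linked from the personal area are saved, while /content/page0001 and the other pages only reachable through /content/ or the site root are never discovered, because their parent pages are not analyzed. Links found inside the /content/ pages (images, for example) are not followed here either, so they would need an extra rule.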
TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

spiweb

Thanks a lot for your explanation (and sorry for my late reply), but I am a bit lost! I should probably study the help pages more than I have, but I couldn't find a solution so far. You say "Set root to www.example.org/", but in my case that would mean thousands of pages to scan. I only need to download the pages in my personal area, let's say www.example.org/users/spiweb/ (that part is easy), PLUS the pages on www.example.org that are directly linked from one of my pages (that part I don't understand).
Yes, I know regular expressions, but I don't know how to apply them in my case. In Analysis Filters I can limit to or exclude URLs with a RegEx, but apart from my own section there is no specific section of the site I want to download, just any page on the site that is directly linked from my pages. How do I do that? :)
Thanks again!

Webhelpforums

Can you email support with some exact URL examples?
http://www.microsystools.com/home/contact.php

...

Can't you use regular expressions in the "limit-to" and/or "exclude" filters to select those URLs?

Also remember that there are both "analysis" and "output" (a.k.a. download-to-disk) filters.
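
As a rough illustration of combining "limit-to" and "exclude" patterns, here is a small Python sketch. It assumes a URL passes when it matches at least one limit-to pattern and no exclude pattern; that rule, the passes helper and the example patterns are assumptions made for the example, so check the filter help pages for the exact behavior.

```python
import re


def passes(url, limit_to=(), exclude=()):
    """Return True if url matches any limit-to pattern (when given) and no exclude pattern."""
    if limit_to and not any(re.search(p, url) for p in limit_to):
        return False
    return not any(re.search(p, url) for p in exclude)


# Hypothetical patterns: the personal area plus numbered content pages, minus PDFs.
limit_to = [r"^/users/spiweb/", r"^/content/page\d+$"]
exclude = [r"\.pdf$"]

if __name__ == "__main__":
    for url in ["/users/spiweb/list_of_pages-A", "/content/page3124",
                "/about", "/users/spiweb/notes.pdf"]:
        print(url, passes(url, limit_to, exclude))
```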



