Semicolons in URL killed by Sitemap Generator

Started by kuguy, June 29, 2010, 01:03:37 PM


Just started using A1 Sitemap Generator (2.2.0) on a trial basis. Seems complicated *and* powerful. But I'm having a problem with semicolons (;). The site I'm mapping has some URLs of the form;text;

I have unchecked the crawler option Cutout "?" but the above address is reported by the Generator as

That is, the semicolons have been removed. This causes much trouble and surprises me because I believe that the semicolon is a legal sub-delimiter. Have I missed something in the program?



Is it possible for you to email me your website address?

I will take a look at the problem either way, but if I can test on a website, it always helps :)

(Posting the answer here as well, in addition to the email conversation.)

Some code in my URL encoding routine (which works by splitting parameters and encoding them) is what converts;text;

I am not 100% sure whether A1SG is wrong about "fixing" the URL and will need to refresh my memory, but e.g. would be an extremely uncommon URL as well. But possibly semicolons support a different usage than ampersands; I will need to check some specs :)

That said, the URL encoding code should not change the URL unnecessarily, so I am thinking about reworking it and then moving the "URL fix" into separate code with its own option.

You can find the URL encoding option in:
Scan website | Crawler options | URL encode query params

This issue will be fixed in version 2.2.2, which will be released soon :)

Thanks for reporting!

I may later add an option that removes empty/superfluous & and ;
e.g. converts "?&" to "?" and so on.

But the URL encoding algorithm was never meant to do that! :) What happened was that the routine would split up the GET URL and check all values to see if they should be URL encoded. The rebuilding of the URL then ignored empty/superfluous parts, which is how the semicolons got lost.
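
For anyone curious, here is a rough Python sketch of the kind of behavior described above. It is not the actual A1SG code: the function names, the example.com URLs, and the choice to treat both & and ; as parameter separators are assumptions made purely for illustration. The first function drops empty pieces when rebuilding the URL, while the second keeps every separator exactly where it was and only encodes the values.

```python
import re
from urllib.parse import quote


def _encode_param(param):
    """Percent-encode one name=value piece; '%' is kept so existing escapes survive."""
    name, eq, value = param.partition("=")
    return quote(name, safe="%") + eq + quote(value, safe="%")


def encode_query_lossy(url):
    """Hypothetical lossy variant: empty pieces are skipped when rebuilding."""
    base, sep, query = url.partition("?")
    if not sep:
        return url
    # Splitting on & and ; and discarding empty strings loses the separators.
    parts = [p for p in re.split(r"[&;]", query) if p]
    return base + "?" + "&".join(_encode_param(p) for p in parts)


def encode_query_preserving(url):
    """Hypothetical fixed variant: separators and empty pieces are kept as-is."""
    base, sep, query = url.partition("?")
    if not sep:
        return url
    # The capturing group makes re.split return the separators as tokens too.
    tokens = re.split(r"([&;])", query)
    rebuilt = "".join(
        tok if tok in ("&", ";") else _encode_param(tok) for tok in tokens
    )
    return base + "?" + rebuilt


if __name__ == "__main__":
    example = "http://example.com/page?;text;"  # made-up URL for illustration
    print(encode_query_lossy(example))
    print(encode_query_preserving(example))
```

With the made-up URL above, the lossy variant strips both semicolons, while the preserving variant returns the URL unchanged. Any cleanup such as turning "?&" into "?" would then live in a separate, optional step.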
