Semicolons in URL killed by Sitemap Generator

Started by kuguy, June 29, 2010, 01:03:37 PM

kuguy

Just started using A1 Sitemap Generator (2.2.0) on a trial basis. Seems complicated *and* powerful. But I'm having a problem with semicolons (;). The site I'm mapping has some URLs of the form

  http://www.example.com/example.cgi/GLOS?;text;

I have unchecked the crawler option Cutout "?" but the above address is reported by the Generator as

  http://www.example.com/example.cgi/GLOS?text

That is, the semicolons have been removed. This causes much trouble and surprises me because I believe that the semicolon is a legal sub-delimiter. Have I missed something in the program?
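
For reference, RFC 3986 lists ";" among the sub-delimiters, and the query component may contain sub-delimiters literally. The following is a minimal sketch of that check in Python; the helper name `chars_needing_encoding` is made up for illustration, and the handling of "%" is simplified (it is assumed to always start a valid percent-escape).

```python
import string
from urllib.parse import urlsplit

# Character classes from RFC 3986.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
# query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_LITERALS = UNRESERVED | SUB_DELIMS | set(":@/?")


def chars_needing_encoding(url):
    """Return the query characters that may not appear literally.

    Hypothetical helper. Simplification: "%" is assumed to begin a
    valid percent-escape and is therefore never reported.
    """
    query = urlsplit(url).query
    return sorted({c for c in query if c not in QUERY_LITERALS and c != "%"})


url = "http://www.example.com/example.cgi/GLOS?;text;"
print(urlsplit(url).query)          # the query string as parsed
print(chars_needing_encoding(url))  # empty list: nothing here needs encoding
```

Under these assumptions, nothing in `?;text;` requires encoding, so a URL encoder has no reason to touch the semicolons.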

Webhelpforums

Hi,


Is it possible for you to email me your website address?

I will take a look at the problem either way, but if I can test on a website, it always helps :)

TechSEO360 | MicrosysTools.com  | A1 Sitemap Generator, A1 Website Analyzer etc.

Webhelpforums

(Posting the answer here as well, in addition to our email conversation.)

Some code in my URL encoding routine (which works by splitting parameters and encoding them) is what converts

http://www.example.com/example.cgi/GLOS?;text;
to
http://www.example.com/example.cgi/GLOS?text;

I am not 100% sure whether A1SG is wrong to "fix" the URL, and I will need to refresh my memory, but e.g.
http://www.example.com/example.cgi/GLOS?&text& would be an extremely uncommon URL as well. Semicolons may support different usage than ampersands, though, so I will need to check some specs :)


That said, URL encoding code should not change the URL unnecessarily, so I am thinking about reworking it and then moving the "URL fix" into separate code with its own option.

You can find the URL encoding option in:
Scan website | Crawler options | URL encode query params


Webhelpforums

This issue will be fixed in version 2.2.2, which will be released soon :)

Thanks for reporting!

I may later add an option that removes empty/superfluous "&" and ";" delimiters,
e.g. converts "?&" to "?" and so on.

But the URL encoding algorithm was never meant to do that! :) What happened was that the routine would split up the GET URL and check all values to see if they should be URL encoded. The rebuilding of the URL then ignored empty/superfluous parts.
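
To illustrate the split-and-rebuild behaviour described above, here is a minimal Python sketch. It is not the actual A1 Sitemap Generator code: the function `encode_query`, its `keep_empty` flag, and the assumption that a URL uses a single delimiter style (";" or "&") are all invented for this example.

```python
from urllib.parse import quote


def encode_query(url, keep_empty=True):
    """Percent-encode query values while rebuilding the query string.

    Hypothetical sketch. With keep_empty=False, empty pieces are dropped
    during the rebuild, which silently removes leading, trailing, and
    doubled delimiters.
    """
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to do
    # Assumption: one delimiter style per URL.
    delim = ";" if ";" in query else "&"
    pieces = query.split(delim)
    if not keep_empty:
        pieces = [p for p in pieces if p]  # empty pieces vanish here
    # "=" and "%" are left alone so key=value pairs and existing
    # percent-escapes are not re-encoded (a simplification).
    return base + sep + delim.join(quote(p, safe="=%") for p in pieces)


url = "http://www.example.com/example.cgi/GLOS?;text;"
print(encode_query(url, keep_empty=False))  # delimiters around "text" are lost
print(encode_query(url))                    # URL comes back unchanged
```

Keeping the empty pieces means the encoder only touches characters that need escaping, and any cleanup of superfluous delimiters can then live behind a separate option.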
