When running a web site with hundreds or thousands of links, it is essential to have automatic mechanisms for testing those links. Many tools are available for this; I am using the Perl library LWP. My page www.it-sky-consulting.com has only about 130 links, but of course I want to establish processes that scale.
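The core of such a check is small: fetch each link with a short timeout, record the HTTP status code, and flag anything outside the 2xx/3xx range. The following is a minimal sketch of that logic, written in Python's standard library purely for illustration (my actual scripts use Perl/LWP):

```python
import urllib.request
import urllib.error

def check_link(url, timeout=10):
    """Fetch a URL and return the HTTP status code (0 on network failure)."""
    req = urllib.request.Request(
        url,
        method="HEAD",  # cheaper than GET; fall back to GET if a server rejects HEAD
        headers={"User-Agent": "link-checker/0.1"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return 0       # DNS failure, timeout, refused connection

def is_broken(status):
    """Treat anything outside 2xx/3xx (or a network failure) as broken."""
    return not (200 <= status < 400)
```

Running this over a list of links gives a first rough report; the headers dictionary is also the place to set a browser-like User-Agent or an Accept-Language value when testing the cases described below.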
The problem is that naïve automatic tests sometimes do not work very well, mostly because web servers react differently to a test script than to a real browser. Some examples:
- The browser's language settings sometimes lead to language-specific pages, so it would be best to test with several language settings.
- Some pages result in an error code (usually 500) when accessed by a script, but work fine in a browser; presumably the server reacts differently to the request headers (for example the User-Agent) that a script sends.
- Some servers avoid returning the error code 404 (or maybe 403) for pages that no longer exist. Instead they redirect to a human-readable error page served with code 200, which looks fine to a script. The page redirected to contains a friendly description of the error, which is hard (but not totally impossible) for a script to recognize. Often the name of the error page contains "404".
- Some domains are simply given up. Usually another company grabs such a domain and puts its own content on it, hoping to capture part of the former site's traffic. This is often commercial content, but might even be x-rated.
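The last two cases, the "friendly" error page served with code 200 and the grabbed domain, can be attacked with simple heuristics after the fetch: look for "404" in the final URL, scan the page text for typical error phrases, and compare the final host with the requested one. A sketch of these heuristics, again in Python for illustration (the phrase list and function names are my own guesses, not a complete solution):

```python
from urllib.parse import urlparse

# Typical wording of "friendly" error pages; extend per language as needed.
ERROR_PHRASES = ("page not found", "seite nicht gefunden", "does not exist")

def looks_like_soft_404(final_url, body):
    """Heuristic for servers that redirect dead links to a '200 OK' error page:
    the target URL often contains '404', or the page text apologizes."""
    if "404" in urlparse(final_url).path:
        return True
    lowered = body.lower()
    return any(phrase in lowered for phrase in ERROR_PHRASES)

def redirected_off_site(original_url, final_url):
    """Flag redirects to a different host -- a hint that the domain
    was given up and re-registered by someone else."""
    return urlparse(original_url).hostname != urlparse(final_url).hostname
```

Links flagged by either function are not necessarily broken, but they are exactly the small set worth inspecting by hand.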
So automatic checking of web links remains difficult and still requires some manual work. 5% of the links cause about 95% of the work.
I am interested in improving these processes in order to increase the quality of the tests and to decrease the effort.