LWP (Library for WWW in Perl)
If you want to process web pages automatically to extract data, you have a number of tools available. You can download a web page to your computer with “curl” or “wget”:
curl http://aplawrence.com > mysite
If you don’t really want the HTML, use “lynx -dump http://whatever.com > /yourstorage/whatever.txt” to get a text representation of the page. Check the man page for options you might want, such as “-nolist”, and also see lynx alternatives.
You can also be selective and pull only the data you want from a page with simple Perl scripts that use LWP. For example, the get function from LWP::Simple fetches a whole page into a string:
use LWP::Simple;
my $url = 'http://aplawrence.com';
my $content = get($url);
die "Couldn't fetch $url\n" unless defined $content;
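As a sketch of the “pull only the data you want” step, the helpers below pick the page title and the link targets out of an HTML string with regular expressions. The names extract_title and extract_links are hypothetical, and regexes are only reasonable for quick scraping of pages whose markup you already know; for arbitrary HTML a parser module such as HTML::TreeBuilder is safer. A literal string stands in for $content so the example runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside the <title> element, or undef.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Hypothetical helper: return every quoted href value from <a> tags, in order.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{<a\s[^>]*?href\s*=\s*["']([^"']+)["']}gi) {
        push @links, $1;
    }
    return @links;
}

# In a real script this would be the $content returned by get($url).
my $content = <<'HTML';
<html><head><title> Example Page </title></head>
<body>
<a href="http://aplawrence.com/Words/2005_03_05.html">forms</a>
<a class="x" href='/Books/webc.html'>book</a>
</body></html>
HTML

my $title = extract_title($content);
my @links = extract_links($content);
print "Title: $title\n";
print "Link: $_\n" for @links;
```

The regexes tolerate extra attributes and either quote style, but they will miss unquoted href values and anything generated by JavaScript.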
Then, of course, you’d process $content as desired. It’s only a little more complex if you are dealing with forms; see http://aplawrence.com/Words/2005_03_05.html for a small example of that.
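For the forms case, here is a minimal sketch with LWP::UserAgent, assuming the form submits with method="post". The submit_form helper is hypothetical and is not the approach from the linked example: it sends a hash of field names and values, encoded the way a browser would encode them, and returns the response body. The URL and field names are whatever your target form uses.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: POST a hash of form fields to $url and return the
# decoded response body, dying with the HTTP status line on failure.
sub submit_form {
    my ($url, $fields) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    # post() with a hash reference sends the fields as
    # application/x-www-form-urlencoded, like a browser submitting a form.
    my $response = $ua->post($url, $fields);
    die "POST to $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage: perl form.pl URL name=value [name=value ...]
# Nothing runs (and no request is made) unless arguments are given.
if (@ARGV) {
    my ($url, @pairs) = @ARGV;
    my %fields = map { my ($k, $v) = split /=/, $_, 2; ($k, $v // '') } @pairs;
    print submit_form($url, \%fields);
}
```

For a form that uses method="get" instead, the fields go into the URL's query string, and a plain get of that URL is usually enough.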
A book that covers LWP is reviewed at http://aplawrence.com/Books/webc.html.
Originally published at APLawrence.com
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com