LWP (Library for WWW in Perl)

    May 27, 2005

If you want to automatically process web pages to extract data, you have a number of tools available. You can bring a web page down to your computer using “curl” or “wget”:

curl http://aplawrence.com > mysite

If you don’t really want the HTML, use “lynx -dump http://whatever.com > /yourstorage/whatever.txt” to get a text representation of the page. Check the man page for options you might want, like “-nolist”, and also see lynx alternatives.

You can also easily be selective and pull only the data you want from a page with simple Perl scripts.

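use LWP::Simple;
my $url = 'http://aplawrence.com';
my $content = get($url);    # get() returns undef if the fetch fails
die "Couldn't fetch $url\n" unless defined $content;
print $content;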

And then of course you’d process the $content as desired. It’s only a little more complex if you are dealing with forms; see http://aplawrence.com/Words/2005_03_05.html for a small example of that.
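As a rough illustration of what "processing the $content" might look like, here is a minimal sketch that pulls the page title and the link targets out of a page. The helper names page_title and page_links are made up for this example, and the HTML in the here-document is invented sample data standing in for whatever get() would return. The regular expressions are deliberately simple and will miss unusual markup.

```perl
use strict;
use warnings;

# Stand-in for the page that get($url) would return (made-up sample data).
my $content = <<'HTML';
<html><head><title>Sample Page</title></head>
<body>
<a href="http://aplawrence.com/Books/webc.html">Book review</a>
<a href='/Words/2005_03_05.html'>Forms example</a>
</body></html>
HTML

# Hypothetical helper: return the text of the first <title> element, or undef.
sub page_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Hypothetical helper: return every quoted href value found in <a> tags.
sub page_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1}gis) {
        push @links, $2;
    }
    return @links;
}

print "Title: ", page_title($content), "\n";
print "Link: $_\n" for page_links($content);
```

To try this against a live page, replace the here-document with the "use LWP::Simple" and "get" lines shown earlier. For anything beyond quick one-off jobs, a real parser such as HTML::LinkExtor or HTML::TreeBuilder is likely to be more dependable than regular expressions.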

A book that covers LWP is reviewed at http://aplawrence.com/Books/webc.html.

*Originally published at APLawrence.com

A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com