LWP (Library for WWW in Perl)

If you want to process web pages automatically to extract data, you have a number of tools available. You can bring a web page down to your computer using “curl” or “wget”:

curl http://aplawrence.com > mysite

If you don’t really want the html, use “lynx -dump http://whatever.com > /yourstorage/whatever.txt” to get a text representation of the page. Check the man page for options you might want, like “-nolist”, and also see lynx alternatives.

You can also easily be selective and pull only the data you want from a page with simple Perl scripts.

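#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports get()

my $url = 'http://aplawrence.com';
# get() returns the page body, or undef if the request fails
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;
print $content;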

And then of course you’d process the $content as desired. It’s only a little more complex if you are dealing with forms; see http://aplawrence.com/Words/2005_03_05.html for a small example of that.
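As a rough illustration of what “process the $content” might look like, here is a minimal sketch that pulls the page title and the link targets out of an HTML string. The helper names extract_title and extract_links are hypothetical (they are not part of LWP), and the sample HTML stands in for whatever get() would return. Regular expressions are only good enough for simple, predictable pages; for anything messier, a real HTML parser module is likely the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): return the text of the first
# <title> element in an HTML string, or undef if there is none.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Hypothetical helper: return the value of every quoted href attribute
# found on an <a> tag, in document order.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1}gis) {
        push @links, $2;
    }
    return @links;
}

# Stand-in for the $content that "get $url" would return.
my $content = <<'HTML';
<html><head><title> Sample Page </title></head>
<body>
<a href="http://aplawrence.com/Books/webc.html">Book review</a>
<a class="x" href='/Words/2005_03_05.html'>Forms example</a>
</body></html>
HTML

print "Title: ", extract_title($content) // '(none)', "\n";
print "Link: $_\n" for extract_links($content);
```

In a real script you would drop the sample HTML and pass in the $content fetched by LWP::Simple instead. Relative links such as the second one would still need to be resolved against the page URL before you could fetch them.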

A book that covers LWP is reviewed at http://aplawrence.com/Books/webc.html.

*Originally published at APLawrence.com

A.P. Lawrence provides SCO Unix and Linux consulting services: http://www.pcunix.com
