<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WebProNews &#187; MapReduce</title>
	<atom:link href="http://www.webpronews.com/tag/mapreduce/feed" rel="self" type="application/rss+xml" />
	<link>http://www.webpronews.com</link>
	<description>Breaking News in Tech, Search, Social, &#38; Business</description>
	<lastBuildDate>Mon, 13 Feb 2012 04:32:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Google Software Engineer Explains MapReduce Concept</title>
		<link>http://www.webpronews.com/google-software-engineer-explains-mapreduce-concept-2008-01</link>
		<comments>http://www.webpronews.com/google-software-engineer-explains-mapreduce-concept-2008-01#comments</comments>
		<pubDate>Thu, 24 Jan 2008 15:08:30 +0000</pubDate>
		<dc:creator>Philipp Lenssen</dc:creator>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[developers]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.webpronews.com/?p=43656</guid>
		<description><![CDATA[<p>Mark Chu-Carroll is a Google software engineer, and in a <a href="http://scienceblogs.com/goodmath/2008/01/databases_are_hammers_mapreduc.php">personal blog post</a> this week he explained the concept behind one of Google&#8217;s programming models: MapReduce, which splits a task across many computers in Google&#8217;s server farm (a server farm or a single supercomputer, depending on how you look at it) so that it can be crunched quickly.</p>]]></description>
			<content:encoded><![CDATA[<p>Mark Chu-Carroll is a Google software engineer, and in a <a href="http://scienceblogs.com/goodmath/2008/01/databases_are_hammers_mapreduc.php">personal blog post</a> this week he explained the concept behind one of Google&rsquo;s programming models: MapReduce, which splits a task across many computers in Google&rsquo;s server farm (a server farm or a single supercomputer, depending on how you look at it) so that it can be crunched quickly.</p>
<blockquote><p>What is MapReduce? What does it do?</p>
<p> Suppose you&rsquo;re at work, and you need to do something that&rsquo;s going to take a long time to run on your computer. You don&rsquo;t want to wait. But you don&rsquo;t want to go out and spend a couple of million dollars buying a supercomputer. How do you make it run faster? One way is to buy a whole bunch of cheap machines, and make it run on all of them at once. Another is to notice that your office has lots of computers &ndash; pretty much every office has a computer on the desk of every employee. And at any given moment, most of those computers aren&rsquo;t doing much. So why not take advantage of that? When your machine isn&rsquo;t doing much, you let your coworkers borrow the capability you&rsquo;re not using; when you need to do something, you can borrow their machines. So when you need to run something big, you can easily find a pool of a dozen machines.</p>
<p> The problem with that approach is that most programs aren&rsquo;t written to run on a dozen machines. They&rsquo;re written to run on one machine. To split a hard task among a lot of computers is hard.</p>
<p> MapReduce is a library that lets you adopt a particular, stylized way of programming that&rsquo;s easy to split among a bunch of machines. The basic idea is that you divide the job into two parts: a Map, and a Reduce. Map basically takes the problem, splits it into sub-parts, and sends the sub-parts to different machines &ndash; so all the pieces run at the same time. Reduce takes the results from the sub-parts and combines them back together to get a single answer.</p>
<p> The key to how MapReduce does things is to take input as, conceptually, a list of records. The records are split among the different machines by the map. The result of the map computation is a list of key/value pairs. Reduce takes each set of values that has the same key, and combines them into a single value. So Map takes a set of data chunks, and produces key/value pairs; reduce merges things, so that instead of a set of key/value pair sets, you get one result. You can&rsquo;t tell whether the job was split into 100 pieces or 2 pieces; the end result looks pretty much like the result of a single map.</p></blockquote>
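<p>The map and reduce steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google&rsquo;s MapReduce library: the word-count task, the function names (<code>map_chunk</code>, <code>reduce_values</code>, <code>split_records</code>, <code>map_reduce</code>) and the use of threads in place of separate machines are all hypothetical choices made for this example.</p>

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(records):
    """Map step: turn one chunk of records into (key, value) pairs."""
    pairs = []
    for line in records:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def reduce_values(key, values):
    """Reduce step: combine all values that share a key into one value."""
    return sum(values)


def split_records(records, pieces):
    """Deal the records into `pieces` chunks, one per hypothetical machine."""
    return [records[i::pieces] for i in range(pieces)]


def map_reduce(records, pieces):
    chunks = split_records(records, pieces)
    # Assumption: threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    # Group values by key so that each key is reduced exactly once.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}


records = ["the map splits the work", "the reduce combines the results"]
# The result should not depend on how many pieces the job was split into.
print(map_reduce(records, 2) == map_reduce(records, 100))
```

<p>Calling <code>map_reduce</code> with 2 pieces or with 100 pieces yields the same word counts, which is the property the quoted passage points to: the caller cannot tell how the job was split.</p>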
<p>Mark adds that &ldquo;The beauty of MapReduce is that it&rsquo;s easy to write&rdquo; and that MapReduce (or &ldquo;M/R&rdquo;) programs are &ldquo;really as easy as parallel programming ever gets.&rdquo; For a more in-depth look at MapReduce and some actual source code, see the <a href="http://labs.google.com/papers/mapreduce.html">Google research publication</a> on the subject.</p>
<p class="via">[Via <a href="http://friendfeed.com/">Friendfeed</a>. Mark&rsquo;s post is <a href="http://creativecommons.org/licenses/by-nc-nd/2.5/">Creative Commons-licensed</a>.]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.webpronews.com/google-software-engineer-explains-mapreduce-concept-2008-01/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

