Why Use MapReduce



Published: 2012-12-04 | Author: Guangzhou Website Construction (广州网站建设)

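MapReduce is popular for good reasons: it is simple, easy to implement, and scales well. It lets you write programs that run on many machines at once with little effort. The Map and Reduce functions do not have to be written in Java: Ruby, Python, PHP, C++ and other languages work too, through Hadoop Streaming (which runs any executable that reads from standard input and writes to standard output) or, for C++, Hadoop Pipes. The same program also runs unchanged on any cluster with Hadoop installed, no matter how many machines it has. MapReduce suits very large datasets because the data is split up and processed by many machines in parallel, which usually finishes much sooner than a single machine could.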
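To make the point about non-Java languages concrete, here is a minimal sketch of a WordCount job written for Hadoop Streaming in Python. It assumes Streaming's default contract: the mapper reads raw lines on stdin and prints `key<TAB>value` lines, the framework sorts those lines by key, and the reducer reads the sorted lines on stdin. The script name `wordcount.py` and its `map`/`reduce` command-line argument are hypothetical conventions chosen for this example.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce: sum the counts for each key.

    Assumes the input is already sorted by key, which Hadoop's shuffle/sort
    phase guarantees (locally, piping through `sort` does the same).
    """
    # Skip blank lines, then split each 'key<TAB>count' line into two fields.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run(role: str, stdin: Iterable[str], stdout: TextIO) -> None:
    """Run the map or reduce step over stdin, writing one record per line."""
    step = mapper if role == "map" else reducer
    for record in step(stdin):
        stdout.write(record + "\n")


if __name__ == "__main__":
    # Hypothetical convention: the mapper command is `wordcount.py map`
    # and the reducer command is `wordcount.py reduce`.
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        run(sys.argv[1], sys.stdin, sys.stdout)
    else:
        sys.stderr.write("usage: wordcount.py map|reduce\n")
```

You can simulate the whole job on one machine with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the same two commands go to the Streaming jar's `-mapper` and `-reducer` options, with the script shipped to the nodes (for example via `-files`); the jar's location depends on the Hadoop installation.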

Let's look at an example.

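Citation analysis is an important part of judging the quality of a paper. This example covers only its simplest piece: counting how many times each paper is cited. Suppose there are a very large number of papers (on the order of millions), and each paper's reference list looks like this: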

References
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
Samuel Brody and Noemie Elhadad. 2010. An unsupervised aspect-sentiment model for online reviews. In NAACL '10.
Jaime Carbonell and Jade Goldstein. 1998. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In SIGIR '98, pages 335–336.
Dennis Chong and James N. Druckman. 2010. Identifying frames in political news. In Erik P. Bucy and R. Lance Holbert, editors, Sourcebook for Political Communication Research: Methods, Measures, and Analytical Techniques. Routledge.
Cindy Chung and James W. Pennebaker. 2007. The psychological function of function words. Social Communication: Frontiers of Social Psychology, pages 343–359.
Günes Erkan and Dragomir R. Radev. 2004. Lexrank: graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res., 22(1):457–479.
Stephan Greene and Philip Resnik. 2009. More than words: syntactic packaging and implicit sentiment. In NAACL '09, pages 503–511.
Aria Haghighi and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. In NAACL '09, pages 362–370.
Sanda Harabagiu, Andrew Hickl, and Finley Lacatusu. 2006. Negation, contrast and contradiction in text processing.
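Counting citations needs a key that identifies each cited paper. The entries above mostly follow an "Authors. Year. Title. Venue" pattern, so a minimal sketch, assuming each entry has been joined onto a single line, can take the text between the year and the next period as the title. The regular expression and the `citation_key` helper are hypothetical illustrations, not part of the original article; titles that contain a period followed by a space, or that end in "?", would be cut incorrectly, so a real pipeline would need something more robust.

```python
import re
from typing import Optional

# Hypothetical pattern: a four-digit year (19xx or 20xx) followed by ". ",
# then the title up to the next period that is followed by whitespace or
# the end of the line.
_YEAR_TITLE = re.compile(r"\b((?:19|20)\d{2})\.\s+(.+?)\.(?:\s|$)")


def citation_key(reference: str) -> Optional[str]:
    """Return a normalized title for one reference line, or None if no match."""
    match = _YEAR_TITLE.search(reference)
    if match is None:
        return None
    # Lowercase and collapse whitespace so the same title written with
    # different capitalization or spacing maps to the same key.
    return " ".join(match.group(2).lower().split())
```

In a MapReduce job this function would run inside the mapper, turning each reference line into the key that gets counted.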

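On a single machine, you would first extract the title of every paper and store the titles in a hash table, then go through every paper's reference list and increment the count for each title it cites. With this many papers, the data does not fit in memory, so the program has to swap between memory and disk over and over, which slows it down considerably. In MapReduce, however, the task is just a WordCount: the map step emits (cited title, 1) for each reference, and the reduce step adds up the 1s for each title.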
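The following sketch simulates that WordCount-style job in plain Python so the three phases are visible: map, the shuffle that groups values by key, and reduce. It assumes, hypothetically, that each input record is a citing paper's ID together with the normalized titles it cites; on a real cluster Hadoop performs the shuffle itself and spreads the keys across many reducers, so no single machine ever holds the full table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (citing paper id, normalized titles it cites).
Record = Tuple[str, List[str]]


def map_citations(record: Record) -> Iterator[Tuple[str, int]]:
    """Map: emit (cited title, 1) for every reference in one paper."""
    _citing_paper, cited_titles = record
    for title in cited_titles:
        yield title, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_citations(title: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: total number of citations received by one title."""
    return title, sum(counts)


def count_citations(records: Iterable[Record]) -> Dict[str, int]:
    """Run map, shuffle, and reduce locally and return title -> citation count."""
    mapped = (pair for record in records for pair in map_citations(record))
    return dict(reduce_citations(title, counts)
                for title, counts in shuffle(mapped).items())
```

For example, `count_citations([("paper-A", ["latent dirichlet allocation", "lexrank"]), ("paper-B", ["latent dirichlet allocation"])])` returns `{"latent dirichlet allocation": 2, "lexrank": 1}`. Because summing is associative and commutative, the same reduce logic could also serve as a Hadoop combiner, pre-summing counts on each mapper before data crosses the network.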
