Unit Testing MapReduce in Hadoop

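Typically, we unit test the map and reduce functions we have written against a small data set. The Mockito framework is a common way to do this: it can mock the OutputCollector object (Hadoop versions earlier than 0.20.0) or the Context object (0.20.0 and later).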

Below is a simple WordCount example, written against the new API.

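Before starting, add the following libraries to the project:

1. All the JAR files in the Hadoop installation directory and in its lib directory.

2. JUnit 4

3. Mockito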

The map function:

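import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	// Reused across calls to avoid allocating a new object per record
	private static final IntWritable one = new IntWritable(1);
	private Text word = new Text();
	
	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		
		String line = value.toString();		// contents of the current line
		String[] words = line.split(";");	// words in this example are separated by ';'
		
		// Emit (word, 1) for every word on the line
		for (String w : words) {
			word.set(w);
			context.write(word, one);
		}
	}
}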
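Because the mapper tokenizes with String.split(";"), a few edge cases are worth knowing before writing test inputs for it. The sketch below is plain Java with no Hadoop dependency; SplitSketch and tokens are hypothetical names introduced only for illustration, and the class mirrors just the tokenizing step rather than the mapper itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post: it reproduces only the
// tokenizing step of WordCountMapper so it can be checked without Hadoop.
class SplitSketch {

    // Same call the mapper uses: String.split(";")
    static List<String> tokens(String line) {
        return Arrays.asList(line.split(";"));
    }

    public static void main(String[] args) {
        System.out.println(tokens("hello;world;hello")); // [hello, world, hello]
        System.out.println(tokens("a;;b").size());       // 3: the middle token is empty
        System.out.println(tokens("a;b;").size());       // 2: trailing empty tokens are dropped
        System.out.println(tokens("").size());           // 1: a single empty token
    }
}
```

With this tokenizing, an empty input line or a doubled separator would make the mapper emit an empty word with a count of 1, so a test case for such input may be worth adding if that matters for the job.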

The reduce function:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		int sum = 0;
		// values holds every count emitted for this key
		for (IntWritable value : values) {
			sum += value.get();
		}
		context.write(key, new IntWritable(sum));
	}

}

The test class:

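import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.junit.Test;

// map() and reduce() are protected, so this class must live in the same package
// as WordCountMapper and WordCountReducer.
@SuppressWarnings({ "rawtypes", "unchecked" })
public class WordCountMapperReducerTest {

	@Test
	public void processValidRecord() throws IOException, InterruptedException {
		WordCountMapper mapper = new WordCountMapper();
		Text value = new Text("hello");
		Mapper.Context context = mock(Mapper.Context.class);
		mapper.map(null, value, context);	// the key is not used by the mapper
		verify(context).write(new Text("hello"), new IntWritable(1));
	}

	@Test
	public void processResult() throws IOException, InterruptedException {
		WordCountReducer reducer = new WordCountReducer();
		Text key = new Text("hello");
		// {"hello",[1,1,2]}
		Iterable<IntWritable> values = Arrays.asList(new IntWritable(1), new IntWritable(1), new IntWritable(2));
		Reducer.Context context = mock(Reducer.Context.class);
		reducer.reduce(key, values, context);
		verify(context).write(key, new IntWritable(4));		// {"hello",4}
	}
}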

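Concretely, the first test passes a single line of input, "hello", to the map function.

The map function processes that line and emits {"hello",1}.

The reduce function takes data in the shape of the map output, sums the values that share a key, and emits the total. In the second test, the input {"hello",[1,1,2]} yields {"hello",4}.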
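The two tests exercise map and reduce separately, so it can help to see the whole flow on a tiny data set. The following is a minimal plain-Java sketch under stated assumptions: it uses java.util collections instead of Hadoop, the grouping step stands in for the shuffle that the framework would perform, and WordCountFlowSketch, mapAndGroup, reduce, and wordCount are hypothetical names that are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: it imitates the map -> group -> reduce flow
// of the WordCount example without using any Hadoop classes.
class WordCountFlowSketch {

    // Map phase plus grouping: emit (word, 1) for each ';'-separated token
    // and collect the emitted values under their key.
    static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String w : line.split(";")) {
                grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Reduce phase: sum all values that belong to one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run both phases and return the final word counts, sorted by word.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : mapAndGroup(lines).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello;world", "hello")));
    }
}
```

Calling reduce(Arrays.asList(1, 1, 2)) returns 4, which matches the value the reducer test verifies for the key "hello".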
