Table of Contents
- Preface
- Step 1: Compute the user-item score matrix from the known user behavior list
  - Input
  - Output
  - Code
    - Mapper implementation (Map1.java)
    - Reducer implementation (Red1.java)
    - Driver class implementation (Run1.java)
- Step 2: Compute the item-item similarity matrix from the user-item score matrix
  - Input
  - Cache
  - Output
  - Code
    - Mapper implementation (Map2.java)
    - Reducer implementation (Red2.java)
    - Driver class implementation (Run2.java)
- Step 3: Transpose the user-item score matrix
  - Input
  - Output
  - Code
    - Mapper implementation (Map3.java)
    - Reducer implementation (Red3.java)
    - Driver class implementation (Run3.java)
- Step 4: Item-item similarity matrix × user-item score matrix = pseudo recommendation list
  - Input
  - Cache
  - Output
  - Code
    - Mapper implementation (Map4.java)
    - Reducer implementation (Red4.java)
    - Driver class implementation (Run4.java)
- Step 5: Zero out the entries in the pseudo recommendation list for items the user has already interacted with
  - Input
  - Cache
  - Output
  - Code
    - Mapper implementation (Map5.java)
    - Reducer implementation (Red5.java)
    - Driver class implementation (Run5.java)
Preface
For an illustrated walkthrough of the item-based collaborative filtering algorithm itself, see this blog post: 推荐系统----基于物品的协同过滤. The MapReduce implementation is given below. Don't be put off by the amount of code: most of it is boilerplate that can be copied between steps. The core of the implementation is matrix transposition and matrix multiplication: step 2 is essentially a matrix multiplication (with cosine normalization), step 3 is a matrix transposition, and step 4 is another matrix multiplication. For the standalone MapReduce implementation of matrix transposition and multiplication, see this blog post: MapReduce实现矩阵乘法.
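In matrix terms, the five steps boil down to the following (R denotes the item × user score matrix built in step 1):
- step 2: S(i, j) = cos(R_i, R_j) = (R_i · R_j) / (|R_i| × |R_j|), the item-item similarity matrix;
- step 3: compute R^T, so that step 4 can stream one matrix as input and read the other from the distributed cache;
- step 4: P = S × R, the pseudo recommendation list;
- step 5: drop from P every entry whose item the user already has a score for in R, giving the final recommendation list.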
Step 1: Compute the user-item score matrix from the known user behavior list
Input
The user behavior list.
Each behavior is scored as follows: a click is worth 1 point, a search 3 points, a favorite 5 points, and a payment 10 points.
User | Item | Behavior score |
---|---|---|
A | 1 | 1 |
C | 3 | 5 |
B | 2 | 3 |
B | 5 | 3 |
B | 6 | 5 |
A | 2 | 10 |
C | 3 | 10 |
C | 4 | 5 |
C | 1 | 5 |
A | 1 | 1 |
A | 6 | 5 |
A | 4 | 3 |
The input file is simply the user behavior list above, stored as comma-separated lines.
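Judging from Map1, which splits each input line on commas as userID,itemID,score, the input file would contain something like the following (the exact file name is whatever Run1's inPath points to):
A,1,1
C,3,5
B,2,3
B,5,3
B,6,5
A,2,10
C,3,10
C,4,5
C,1,5
A,1,1
A,6,5
A,4,3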
Output
The user-item score matrix.
The goal of step 1 is to turn the user behavior list above into the following user-item score matrix (rows are items, columns are users; repeated behaviors by the same user on the same item are summed):
Item | A | B | C |
---|---|---|---|
1 | 2.0 | 0.0 | 5.0 |
2 | 10.0 | 3.0 | 0.0 |
3 | 0.0 | 0.0 | 15.0 |
4 | 3.0 | 0.0 | 5.0 |
5 | 0.0 | 3.0 | 0.0 |
6 | 5.0 | 5.0 | 0.0 |
The resulting output file is just this user-item score matrix, stored one item row per line.
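A sketch of what the part-r-00000 file produced by this step should contain, judging from Red1 (each line is itemID, a tab, then comma-separated userID_score pairs; the order of pairs within a line depends on HashMap iteration and may differ):
1	A_2,C_5
2	A_10,B_3
3	C_15
4	A_3,C_5
5	B_3
6	A_5,B_5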
Code
Mapper implementation (Map1.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map1 extends Mapper<LongWritable, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue =new Text();
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
//each input line has the form: userID,itemID,score
String[] values = value.toString().split(",");
String userID = values[0];
String itemID = values[1];
String score = values[2];
outKey.set(itemID);
outValue.set(userID + "_" + score);
context.write(outKey, outValue);
}
}
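Map1 keys every record by itemID, so each reduce call in Red1 sees all of one item's userID_score pairs at once and can sum repeated behaviors by the same user (for example the two A,1 records) into a single entry of that item's row.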
Reducer implementation (Red1.java)
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Red1 extends Reducer<Text, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue = new Text();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String itemID = key.toString();
Map<String,Integer> map = new HashMap<String,Integer>();
for(Text value:values){
String userID = value.toString().split("_")[0];
String score = value.toString().split("_")[1];
if(map.get(userID) == null){
map.put(userID, Integer.valueOf(score));
}else{
Integer preScore = map.get(userID);
map.put(userID, preScore + Integer.valueOf(score));
}
}
StringBuilder sBuilder = new StringBuilder();
for(Map.Entry<String,Integer> entry : map.entrySet()) {
String userID = entry.getKey();
String score = String.valueOf(entry.getValue());
sBuilder.append(userID + "_" + score + ",");
}
String line = null;
if(sBuilder.toString().endsWith(",")) {
line = sBuilder.substring(0,sBuilder.length()-1);
}
outKey.set(key);
outValue.set(line);
context.write(outKey, outValue);
}
}
Driver class implementation (Run1.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Run1 {
private static String inPath = "/user/hadoop/user_matrix.txt";
//write the step 1 output under /user/hadoop/output so that Run2, Run3 and Run5 can read it from there
private static String outPath = "/user/hadoop/output/Tuser_matrix.txt";
private static String hdfs ="hdfs://Master:9000";
public int run() {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfs);
Job job = Job.getInstance(conf,"step1");
job.setJarByClass(Run1.class);
job.setMapperClass(Map1.class);
job.setReducerClass(Red1.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path(inPath);
if(fs.exists(inputPath)) {
FileInputFormat.addInputPath(job, inputPath);
}
Path outputPath = new Path(outPath);
fs.delete(outputPath,true);
FileOutputFormat.setOutputPath(job, outputPath);
return job.waitForCompletion(true)?1:-1;
}catch(IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
return -1;
}
public static void main(String[] args) {
int result = -1;
result = new Run1().run();
if(result==1) {
System.out.println("step1 success...");
}else if(result==-1) {
System.out.println("step1 failed...");
}
}
}
Step 2: Compute the item-item similarity matrix from the user-item score matrix
Input
The user-item score matrix (i.e. the output of step 1).
See the output of step 1 above: the user-item score matrix.
Cache
The user-item score matrix.
The input and the cache are the same file: the step 1 output is both fed to the job as its input and registered in the distributed cache under the symlink itemsource1, which is why Map2's setup() can open it with new FileReader("itemsource1").
Output
The item-item similarity matrix.
Item | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1.0 | 0.36 | 0.93 | 0.99 | 0 | 0.26 |
2 | 0.36 | 1.0 | 0 | 0.49 | 0.29 | 0.88 |
3 | 0.93 | 0 | 1.0 | 0.86 | 0 | 0 |
4 | 0.99 | 0.49 | 0.86 | 1.0 | 0 | 0.36 |
5 | 0 | 0.29 | 0 | 0 | 1.0 | 0.71 |
6 | 0.26 | 0.88 | 0 | 0.36 | 0.71 | 1.0 |
Item-item similarity matrix: each entry is the cosine similarity between two rows of the user-item score matrix, i.e. it is (user-item score matrix) × (user-item score matrix)^T with every entry normalized by the lengths of the two rows: sim(i, j) = (R_i · R_j) / (|R_i| × |R_j|). For example, sim(1, 2) = (2×10 + 0×3 + 5×0) / (√29 × √109) ≈ 0.36, which matches the table above.
In HDFS, the output file of this step looks like this:
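A sketch of the part-r-00000 content this step should produce, judging from Map2/Red2 (one line per item; entries with similarity 0 are skipped by Map2, and the order of pairs within a line is not guaranteed):
1	1_1.00,2_0.36,3_0.93,4_0.99,6_0.26
2	1_0.36,2_1.00,4_0.49,5_0.29,6_0.88
3	1_0.93,3_1.00,4_0.86
4	1_0.99,2_0.49,3_0.86,4_1.00,6_0.36
5	2_0.29,5_1.00,6_0.71
6	1_0.26,2_0.88,4_0.36,5_0.71,6_1.00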
Code
Mapper implementation (Map2.java)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map2 extends Mapper<LongWritable, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue =new Text();
private DecimalFormat df = new DecimalFormat("0.00");
private List<String> cacheList = new ArrayList<String>();
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
FileReader fr = new FileReader("itemsource1");
BufferedReader br = new BufferedReader(fr);
String line = null;
while((line=br.readLine())!=null) {
cacheList.add(line);
}
fr.close();
br.close();
}
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
String row_matrix1 = value.toString().split("\t")[0];
String[] column_value_array_matrix1 = value.toString().split("\t")[1].split(",");
double denominator1 = 0;
for(String column_value:column_value_array_matrix1){
String score = column_value.split("_")[1];
denominator1 += Double.valueOf(score)*Double.valueOf(score);
}
denominator1 = Math.sqrt(denominator1);
for(String line:cacheList) {
String row_matrix2 = line.toString().split("\t")[0];
String[] column_value_array_matrix2 = line.toString().split("\t")[1].split(",");
double denominator2 = 0;
for(String column_value:column_value_array_matrix2){
String score = column_value.split("_")[1];
denominator2 += Double.valueOf(score)*Double.valueOf(score);
}
denominator2 = Math.sqrt(denominator2);
int numberator = 0;
for(String column_value_matrix1:column_value_array_matrix1) {
String column_matrix1 = column_value_matrix1.split("_")[0];
String value_matrix1 = column_value_matrix1.split("_")[1];
for(String column_value_matrix2:column_value_array_matrix2) {
if(column_value_matrix2.startsWith(column_matrix1 + "_")) {
String value_matrix2 = column_value_matrix2.split("_")[1];
numberator += Integer.valueOf(value_matrix1) *Integer.valueOf(value_matrix2);
}
}
}
double cos = numberator / (denominator1*denominator2);
if(cos == 0){
continue;
}
outKey.set(row_matrix1);
outValue.set(row_matrix2+"_"+df.format(cos));
context.write(outKey, outValue);
}
}
}
Reducer implementation (Red2.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Red2 extends Reducer<Text, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue = new Text();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
for(Text text:values) {
sb.append(text+",");
}
String line = null;
if(sb.toString().endsWith(",")) {
line = sb.substring(0,sb.length()-1);
}
outKey.set(key);
outValue.set(line);
context.write(outKey, outValue);
}
}
Driver class implementation (Run2.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Run2 {
private static String inPath = "/user/hadoop/output/Tuser_matrix.txt";
private static String outPath = "/user/hadoop/output/step2_output.txt";
private static String cache = "/user/hadoop/output/Tuser_matrix.txt/part-r-00000";
private static String hdfs ="hdfs://Master:9000";
public int run() throws URISyntaxException {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfs);
Job job = Job.getInstance(conf,"step2");
job.addCacheArchive(new URI(cache+"#itemsource1"));
job.setJarByClass(Run2.class);
job.setMapperClass(Map2.class);
job.setReducerClass(Red2.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path(inPath);
if(fs.exists(inputPath)) {
FileInputFormat.addInputPath(job, inputPath);
}
Path outputPath = new Path(outPath);
fs.delete(outputPath,true);
FileOutputFormat.setOutputPath(job, outputPath);
System.out.println("111111...");
return job.waitForCompletion(true)?1:-1;
} catch(IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch(URISyntaxException e) {
e.printStackTrace();
}
return -1;
}
public static void main(String[] args) {
try {
int result=-1;
result = new Run2().run();
if(result == 1) {
System.out.println("step2 success...");
}
else if(result == -1){
System.out.println("step2 failed...");
}
} catch (URISyntaxException e) {
e.printStackTrace();
}
}
}
Step 3: Transpose the user-item score matrix
Input
The output of step 1, i.e. the user-item score matrix.
Output
The transpose of step 1's output, i.e. the transposed user-item score matrix (rows are users, columns are items).
In HDFS, the output file of this step looks like this:
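A sketch of the part-r-00000 content this step should produce (one line per user, derived from the step 1 output above; the order of pairs within a line may differ):
A	1_2,2_10,4_3,6_5
B	2_3,5_3,6_5
C	1_5,3_15,4_5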
Code
Mapper implementation (Map3.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map3 extends Mapper<LongWritable, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue =new Text();
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
String[] rowAndLine = value.toString().split("\t");
String row = rowAndLine[0];
String[] lines = rowAndLine[1].split(",");
for(int i=0;i<lines.length;i++) {
String column = lines[i].split("_")[0];
String valueStr = lines[i].split("_")[1];
//key:column value:rownumber_value
outKey.set(column);
outValue.set(row+"_"+valueStr);
context.write(outKey, outValue);
}
}
}
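Map3 realizes the transpose simply by swapping row and column in the emitted key/value pair: it reads each item line and, for every userID_score entry, emits the userID as the key and itemID_score as the value; after the shuffle, Red3 just concatenates the values into one line per user.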
Reducer implementation (Red3.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Red3 extends Reducer<Text, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue = new Text();
protected void reduce(Text key, Iterable<Text> values,Context context)
throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
for(Text text:values) {
sb.append(text+",");
}
String line = null;
if(sb.toString().endsWith(",")) {
line = sb.substring(0,sb.length()-1);
}
outKey.set(key);
outValue.set(line);
context.write(outKey, outValue);
}
}
Driver class implementation (Run3.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Run3 {
private static String inPath = "/user/hadoop/output/Tuser_matrix.txt";
private static String outPath = "/user/hadoop/output/step3_output.txt";
private static String hdfs ="hdfs://Master:9000";
public int run() {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfs);
Job job = Job.getInstance(conf,"step3");
job.setJarByClass(Run3.class);
job.setMapperClass(Map3.class);
job.setReducerClass(Red3.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path(inPath);
if(fs.exists(inputPath)) {
FileInputFormat.addInputPath(job, inputPath);
}
Path outputPath = new Path(outPath);
fs.delete(outputPath,true);
FileOutputFormat.setOutputPath(job, outputPath);
return job.waitForCompletion(true)?1:-1;
}catch(IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
return -1;
}
public static void main(String[] args) {
int result = -1;
result = new Run3().run();
if(result==1) {
System.out.println("step3 success...");
}else if(result==-1) {
System.out.println("step3 failed...");
}
}
}
Step 4: Item-item similarity matrix × user-item score matrix = pseudo recommendation list
Input
The output of step 2, i.e. the item-item similarity matrix.
Cache
The output of step 3, i.e. the transposed user-item score matrix.
Output
The pseudo recommendation list.
Item | A | B | C |
---|---|---|---|
1 | 9.9 | 2.4 | 23.9 |
2 | 16.6 | 8.3 | 4.3 |
3 | 4.4 | 0 | 24.0 |
4 | 11.7 | 3.3 | 22.9 |
5 | 6.5 | 7.4 | 0 |
6 | 15.4 | 9.8 | 3.1 |
In HDFS, the output file of this step looks like this:
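A sketch of the part-r-00000 content of this step, recomputed from the two-decimal similarities and the score matrix above (the table above shows the same values rounded to one decimal; unlike Map2, Map4 also emits zero entries, and the order of pairs within a line is not guaranteed):
1	A_9.87,B_2.38,C_23.90
2	A_16.59,B_8.27,C_4.25
3	A_4.44,B_0.00,C_23.95
4	A_11.68,B_3.27,C_22.85
5	A_6.45,B_7.42,C_0.00
6	A_15.40,B_9.77,C_3.10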
It may look as though this step already gives the final recommendation list, but the program is not done yet: from the pseudo recommendation list alone we cannot decide which items to recommend to which user, because it still scores items the user has already interacted with. We therefore have to zero out the recommendation score of every item the user has already clicked, searched, favorited or bought (items the user has already acted on are not recommended). For example, user A has already acted on items 1, 2, 4 and 6, so only items 3 and 5 remain recommendable to A. How this is done is shown in step 5.
Code
Mapper implementation (Map4.java)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map4 extends Mapper<LongWritable, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue =new Text();
private DecimalFormat df = new DecimalFormat("0.00");
private List<String> cacheList = new ArrayList<String>();
protected void setup(Context context)
throws IOException, InterruptedException {
super.setup(context);
FileReader fr = new FileReader("itemsource2");
BufferedReader br = new BufferedReader(fr);
String line = null;
while((line=br.readLine())!=null) {
cacheList.add(line);
}
fr.close();
br.close();
}
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
String row_matrix1 = value.toString().split("\t")[0];
String[] column_value_array_matrix1 = value.toString().split("\t")[1].split(",");
for(String line:cacheList) {
String row_matrix2 = line.toString().split("\t")[0];
String[] column_value_array_matrix2 = line.toString().split("\t")[1].split(",");
double numberator = 0;
for(String column_value_matrix1:column_value_array_matrix1) {
String column_matrix1 = column_value_matrix1.split("_")[0];
String value_matrix1 = column_value_matrix1.split("_")[1];
for(String column_value_matrix2:column_value_array_matrix2) {
if(column_value_matrix2.startsWith(column_matrix1 + "_")) {
String value_matrix2 = column_value_matrix2.split("_")[1];
numberator += Double.valueOf(value_matrix1) *Integer.valueOf(value_matrix2);
}
}
}
outKey.set(row_matrix1);
outValue.set(row_matrix2+"_"+df.format(numberator));
context.write(outKey, outValue);
}
}
}
Reducer implementation (Red4.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Red4 extends Reducer<Text, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue = new Text();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
for(Text text:values) {
sb.append(text+",");
}
String line = null;
if(sb.toString().endsWith(",")) {
line = sb.substring(0,sb.length()-1);
}
outKey.set(key);
outValue.set(line);
context.write(outKey, outValue);
}
}
Driver class implementation (Run4.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Run4 {
private static String inPath = "/user/hadoop/output/step2_output.txt";
private static String outPath = "/user/hadoop/output/step4_output.txt";
private static String cache = "/user/hadoop/output/step3_output.txt/part-r-00000";
private static String hdfs ="hdfs://Master:9000";
public int run() throws URISyntaxException {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfs);
Job job = Job.getInstance(conf,"step4");
job.addCacheArchive(new URI(cache+"#itemsource2"));
job.setJarByClass(Run4.class);
job.setMapperClass(Map4.class);
job.setReducerClass(Red4.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path(inPath);
if(fs.exists(inputPath)) {
FileInputFormat.addInputPath(job, inputPath);
}
Path outputPath = new Path(outPath);
fs.delete(outputPath,true);
FileOutputFormat.setOutputPath(job, outputPath);
System.out.println("111111...");
return job.waitForCompletion(true)?1:-1;
} catch(IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch(URISyntaxException e) {
e.printStackTrace();
}
return -1;
}
public static void main(String[] args) {
try {
int result=-1;
result = new Run4().run();
if(result == 1) {
System.out.println("step4 success...");
}
else if(result == -1){
System.out.println("step4 failed...");
}
} catch (URISyntaxException e) {
e.printStackTrace();
}
}
}
Step 5: Zero out the entries in the pseudo recommendation list for items the user has already interacted with
Input
The output of step 4, i.e. the pseudo recommendation list.
Cache
The output of step 1, i.e. the user-item score matrix.
Output
The final recommendation list.
User | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
A | 0 | 0 | 4.44 | 0 | 6.45 | 0 |
B | 2.38 | 0 | 0 | 3.27 | 0 | 0 |
C | 0 | 4.25 | 0 | 0 | 0 | 3.10 |
In HDFS, the output file of this step looks like this:
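A sketch of the final part-r-00000 content, judging from Map5/Red5 (one line per user; note that the code simply omits items the user has already interacted with rather than writing an explicit 0, so those entries do not appear in the file at all; pair order within a line may differ):
A	3_4.44,5_6.45
B	1_2.38,3_0.00,4_3.27
C	2_4.25,5_0.00,6_3.10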
Code
Mapper implementation (Map5.java)
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
public class Map5 extends Mapper<LongWritable, Text, Text, Text> {
private Text outKey = new Text();
private Text outValue = new Text();
private List<String> cacheList = new ArrayList<String>();
private DecimalFormat df = new DecimalFormat("0.00");
@Override
protected void setup(Context context) throws IOException, InterruptedException {
super.setup(context);
FileReader fr = new FileReader("itemsource3");
BufferedReader br = new BufferedReader(fr);
String line = null;
while ((line = br.readLine()) != null) {
cacheList.add(line);
}
br.close();
fr.close();
}
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String item_matrix1 = value.toString().split("\t")[0];
String[] user_score_array_matrix1 = value.toString().split("\t")[1].split(",");
for (String line : cacheList) {
String item_matrix2 = line.toString().split("\t")[0];
String[] user_score_array_matrix2 = line.toString().split("\t")[1].split(",");
//same item ID: check which users already have a score for this item
if (item_matrix1.equals(item_matrix2)) {
for (String user_score_matrix1 : user_score_array_matrix1) {
boolean flag = false;
String user_matrix1 = user_score_matrix1.split("_")[0];
String score_matrix1 = user_score_matrix1.split("_")[1];
for (String user_score_matrix2 : user_score_array_matrix2) {
String user_matrix2 = user_score_matrix2.split("_")[0];
if (user_matrix1.equals(user_matrix2)) {
flag = true;
}
}
if (!flag) {
outKey.set(user_matrix1);
outValue.set(item_matrix1 + "_" + score_matrix1);
context.write(outKey, outValue);
}
}
}
}
}
}
Reducer implementation (Red5.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Red5 extends Reducer<Text, Text, Text, Text>{
private Text outKey = new Text();
private Text outValue = new Text();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
for(Text text:values) {
sb.append(text+",");
}
String line = null;
if(sb.toString().endsWith(",")) {
line = sb.substring(0,sb.length()-1);
}
outKey.set(key);
outValue.set(line);
context.write(outKey, outValue);
}
}
Driver class implementation (Run5.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Run5 {
private static String inPath = "/user/hadoop/output/step4_output.txt";
private static String outPath = "/user/hadoop/output/step5_output.txt";
private static String cache = "/user/hadoop/output/Tuser_matrix.txt/part-r-00000";
private static String hdfs ="hdfs://Master:9000";
public int run() throws URISyntaxException {
try {
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfs);
Job job = Job.getInstance(conf,"step5");
job.addCacheArchive(new URI(cache+"#itemsource3"));
job.setJarByClass(Run5.class);
job.setMapperClass(Map5.class);
job.setReducerClass(Red5.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path(inPath);
if(fs.exists(inputPath)) {
FileInputFormat.addInputPath(job, inputPath);
}
Path outputPath = new Path(outPath);
fs.delete(outputPath,true);
FileOutputFormat.setOutputPath(job, outputPath);
System.out.println("111111...");
return job.waitForCompletion(true)?1:-1;
} catch(IOException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch(URISyntaxException e) {
e.printStackTrace();
}
return -1;
}
public static void main(String[] args) {
try {
int result=-1;
result = new Run5().run();
if(result == 1) {
System.out.println("step5 success...");
}
else if(result == -1){
System.out.println("step5 failed...");
}
} catch (URISyntaxException e) {
e.printStackTrace();
}
}
}
At this point the runnable jar already contains 5 MapReduce jobs and 15 classes. If you are interested, you can try chaining them together and running them from a single class; a minimal sketch of one way to do this is given below.
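A hypothetical sketch of such a chained driver (the class name RunAll and its messages are my own; it assumes the five Run classes above are on the classpath in the same package and relies on each run() returning 1 on success):
import java.net.URISyntaxException;
//hypothetical driver that runs the five existing jobs in order and stops at the first failure
public class RunAll {
    public static void main(String[] args) throws URISyntaxException {
        if (new Run1().run() != 1) { System.out.println("step1 failed, aborting..."); return; }
        if (new Run2().run() != 1) { System.out.println("step2 failed, aborting..."); return; }
        if (new Run3().run() != 1) { System.out.println("step3 failed, aborting..."); return; }
        if (new Run4().run() != 1) { System.out.println("step4 failed, aborting..."); return; }
        if (new Run5().run() != 1) { System.out.println("step5 failed, aborting..."); return; }
        System.out.println("all 5 steps success...");
    }
}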