吾爱破解 - 52pojie.cn

[Solved] How to filter out empty fields in MapReduce

牵手丶若相惜 posted on 2019-9-30 18:39
Last edited by 牵手丶若相惜 on 2019-10-29 18:24

60.174.83.183|q11.cnzz.com|20170207161935|42.120.219.31|0
60.174.83.183|textlink.simba.taobao.com|20170207161935|140.205.62.20|0
60.174.83.184|www.taobao.com|20170207161935|124.112.127.48|0
60.174.83.226|p3.ssl.qhimg.com|20170207161935|0
60.174.83.238|show-s.mediav.com|20170207161935|180.163.255.159|0
60.174.83.181|q11.cnzz.com|20170207161935|42.120.219.31|0
60.174.83.182|textlink.simba.taobao.com|140.205.62.20|0
74.125.77.73|89.110.170.60.in-addr.arpa|20170207161935||3
60.174.83.195|www.taobao.com|20170207161935|124.112.127.48|0
60.174.83.200|p3.ssl.qhimg.com|20170207161935|101.227.5.22;101.227.5.23|0
60.174.83.238|show-s.mediav.com|20170207161935|180.163.255.159|0
60.174.83.156|q11.cnzz.com|20170207161935|42.120.219.31|0
74.125.80.70|144.107.102.114.in-addr.arpa|20170207161935||3
60.174.83.152|textlink.simba.taobao.com|20170207161935|140.205.62.20|0
60.174.83.182|www.taobao.com|201702071619350|0
60.174.83.226|p3.ssl.qhimg.com|201702071619350|0
60.174.83.231|show-s.mediav.com|20170207161935|180.163.255.159|0
61.220.10.193|242.26.103.114.in-addr.arpa|20170207161935||3
89.67.84.50|85.235.178.220.in-addr.arpa|20170207161935||3

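Requirement: filter out every record that does not have exactly 5 fields, replace the site address "www.taobao.com" (which marks a shopping site) with "ShoppingAction", and finally output all of the cleaned and filtered data.

Delimiter: |

I can only filter out the records that do not have 5 fields; I can't get the rest to work. Any help would be appreciated.
I tried looping over the fields with a for loop to check them, but the check still doesn't catch them.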
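The per-line rule in the requirement can be sketched in plain Java, without Hadoop, to pin down the logic before it goes into a Mapper. This is only a sketch: the class name `RecordCleaner` and the method `clean` are hypothetical, and it assumes that an empty field (as in the `in-addr.arpa` rows) still counts as a field.

```java
import java.util.Optional;

public class RecordCleaner {
    private static final int EXPECTED_FIELDS = 5;

    /** Returns the cleaned line, or Optional.empty() if the record must be dropped. */
    public static Optional<String> clean(String line) {
        // "|" is a regex metacharacter, so it is escaped; -1 keeps trailing empty fields.
        String[] fields = line.split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return Optional.empty();
        }
        // Only the second field holds the site address.
        if ("www.taobao.com".equals(fields[1])) {
            fields[1] = "ShoppingAction";
        }
        return Optional.of(String.join("|", fields));
    }

    public static void main(String[] args) {
        String[] samples = {
            "60.174.83.184|www.taobao.com|20170207161935|124.112.127.48|0",
            "60.174.83.226|p3.ssl.qhimg.com|20170207161935|0",
            "74.125.77.73|89.110.170.60.in-addr.arpa|20170207161935||3"
        };
        for (String sample : samples) {
            System.out.println(clean(sample).orElse("(dropped) " + sample));
        }
    }
}
```

The three sample lines are taken from the data above: the first is kept with the address replaced, the second has only 4 fields and is dropped, and the third is kept because its empty field still counts toward the 5.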


吸水雨衣 posted on 2019-10-3 00:17
Spark SQL: translate for the conversion
foreach
15212520947 posted on 2019-10-3 16:08
Last edited by 15212520947 on 2019-10-3 16:09

Mapper side

[Java]
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, MyBean, NullWritable> {
        // Reused across map() calls; the key is serialized when context.write() is called.
        MyBean bean = new MyBean();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                // "|" is a regex metacharacter, so it must be escaped for split().
                // The limit of -1 keeps trailing empty fields so that they are counted.
                String[] words = value.toString().split("\\|", -1);
                // Keep only records with exactly 5 fields.
                if (words.length == 5) {
                        // Mark the shopping site.
                        if (words[1].equals("www.taobao.com")) {
                                words[1] = "ShoppingAction";
                        }
                        bean.setBean(words[0], words[1], words[2], words[3], words[4]);
                        context.write(bean, NullWritable.get());
                }
        }

}


Reducer side
[Java]
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<MyBean, NullWritable, MyBean, NullWritable> {

        @Override
        protected void reduce(MyBean key, Iterable<NullWritable> value,
                        Reducer<MyBean, NullWritable, MyBean, NullWritable>.Context context)
                        throws IOException, InterruptedException {
                for (NullWritable values : value) {
                        context.write(key, values);
                }
        }

}


Custom class MyBean

[Java]
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyBean implements WritableComparable<MyBean> {
	String ip;
	String addres;
	String data;
	String ips;
	String zero;

	public MyBean() {

	}

	public void setBean(String ip, String addres, String data, String ips, String zero) {
		this.ip = ip;
		this.addres = addres;
		this.data = data;
		this.ips = ips;
		this.zero = zero;
	}

	public MyBean(String ip, String addres, String data, String ips, String zero) {
		this.ip = ip;
		this.addres = addres;
		this.data = data;
		this.ips = ips;
		this.zero = zero;
	}

	public String getIp() {
		return ip;
	}

	public void setIp(String ip) {
		this.ip = ip;
	}

	public String getAddres() {
		return addres;
	}

	public void setAddres(String addres) {
		this.addres = addres;
	}

	public String getData() {
		return data;
	}

	public void setData(String data) {
		this.data = data;
	}

	public String getIps() {
		return ips;
	}

	public void setIps(String ips) {
		this.ips = ips;
	}

	public String getZero() {
		return zero;
	}

	public void setZero(String zero) {
		this.zero = zero;
	}

	@Override
	public String toString() {
		return ip + "\t" + addres + "\t" + data + "\t" + ips + "\t" + zero;
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeUTF(ip);
		out.writeUTF(addres);
		out.writeUTF(data);
		out.writeUTF(ips);
		out.writeUTF(zero);

	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.ip = in.readUTF();
		this.addres = in.readUTF();
		this.data = in.readUTF();
		this.ips = in.readUTF();
		this.zero = in.readUTF();

	}

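	@Override
	public int compareTo(MyBean o) {
		// Compare field by field so that only identical records are treated as equal keys.
		int c = ip.compareTo(o.ip);
		if (c == 0) c = addres.compareTo(o.addres);
		if (c == 0) c = data.compareTo(o.data);
		if (c == 0) c = ips.compareTo(o.ips);
		if (c == 0) c = zero.compareTo(o.zero);
		return c;
	}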
}
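The `write` and `readFields` methods above only work as a pair if they handle the fields in the same order. A minimal sketch of that round trip, using only `java.io` so it runs without Hadoop: the class `WritableRoundTrip` and its two-field `Record` are hypothetical stand-ins for `MyBean`, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableRoundTrip {
    /** Hypothetical two-field stand-in for MyBean; no Hadoop dependency. */
    public static class Record {
        public String ip;
        public String addres;

        public void write(DataOutput out) throws IOException {
            out.writeUTF(ip);
            out.writeUTF(addres);
        }

        public void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order write() emitted them.
            this.ip = in.readUTF();
            this.addres = in.readUTF();
        }
    }

    /** Serializes a record to bytes and reads it back into a fresh instance. */
    public static Record roundTrip(Record original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            Record copy = new Record();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Record r = new Record();
        r.ip = "60.174.83.184";
        r.addres = "ShoppingAction";
        Record copy = roundTrip(r);
        System.out.println(copy.ip + "\t" + copy.addres);
    }
}
```

Note that `writeUTF` throws a `NullPointerException` when given `null`, so every field has to be set before the bean is written; an empty string is fine.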


OP | 牵手丶若相惜 posted on 2019-10-4 22:55
Quoting 15212520947, posted on 2019-10-3 16:08:
Mapper side

import java.io.IOException;

This problem is already solved.
The cause was that the "|" delimiter has to be escaped with \\.
Thanks all the same.
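The escaping issue can be reproduced outside Hadoop. `String.split` takes a regular expression, and an unescaped `|` is the alternation operator, so it matches the empty string between characters instead of the literal pipe. The sketch below compares the variants; the class name `PipeSplitDemo` and its method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {
    /** Unescaped: "|" is regex alternation and matches the empty string. */
    public static String[] splitUnescaped(String line) {
        return line.split("|");
    }

    /** Escaped: "\\|" in Java source is the regex \| and matches a literal pipe. */
    public static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    /** Alternative: let Pattern.quote build a literal pattern. */
    public static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "60.174.83.184|www.taobao.com|20170207161935|124.112.127.48|0";
        System.out.println("unescaped: " + splitUnescaped(line).length + " pieces");
        System.out.println("escaped:   " + splitEscaped(line).length + " fields");
        System.out.println("quoted:    " + splitQuoted(line).length + " fields");
    }
}
```

With the unescaped pattern the line is cut up roughly character by character, so a `length` check against 5 never passes and every record looks malformed.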
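15212520947 posted on 2019-10-5 16:52
Quoting 牵手丶若相惜, posted on 2019-10-4 22:55:
This problem is already solved.
The cause was that the "|" delimiter has to be escaped with \\.
Thanks all the same.

I only started learning this recently myself, and I happened to work through this problem along the way.
waowao posted on 2019-11-12 16:29
All of these requirements can actually be handled directly in the map phase.
[Java]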
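import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewTest {
    private static class MyMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        protected void map(Object k1, Text v1, Context context) throws IOException, InterruptedException {
            // Escape "|" (a regex metacharacter); -1 keeps trailing empty fields.
            String[] words = v1.toString().split("\\|", -1);
            // Drop records that do not have exactly 5 fields.
            if (words.length != 5) {
                return;
            }
            // Drop records that contain an empty field.
            for (String word : words) {
                if (word.isEmpty()) {
                    return;
                }
            }
            // Replace only the site-address field, not every occurrence in the line.
            if (words[1].equals("www.taobao.com")) {
                words[1] = "ShoppingAction";
            }
            context.write(new Text(String.join("|", words)), NullWritable.get());
        }
    }
}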
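One detail matters for both the field count and the empty-field check: without a limit argument, `String.split` discards trailing empty strings, so a record that ends in empty fields is miscounted. A small sketch of the difference, where the class name `TrailingFieldsDemo`, its methods, and the sample line (which is not in the original data) are all made up for illustration:

```java
public class TrailingFieldsDemo {
    /** Counts fields; keepTrailingEmpty selects split(regex, -1) over split(regex). */
    public static int countFields(String line, boolean keepTrailingEmpty) {
        return keepTrailingEmpty
                ? line.split("\\|", -1).length
                : line.split("\\|").length;
    }

    /** True if any field is empty, counting trailing empty fields as well. */
    public static boolean hasEmptyField(String line) {
        for (String field : line.split("\\|", -1)) {
            if (field.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical record whose last two fields are empty.
        String line = "60.174.83.182|www.taobao.com|20170207161935||";
        System.out.println("default split: " + countFields(line, false));
        System.out.println("limit -1:      " + countFields(line, true));
        System.out.println("has empty:     " + hasEmptyField(line));
    }
}
```

For the sample data in this thread the last field is always `0` or `3`, so both forms give the same counts; the limit only matters if the input can end with an empty field.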