【Flink】【Notes】Flink getting started: batch + streaming wordcount notes
Last edited by moocer on 2020-6-23 23:34
1 Set up the Maven project flink-2019
1.1 pom file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>Flink</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_2.11</artifactId>
<version>1.7.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.7.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- This plugin compiles Scala code into class files -->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.4.6</version>
<executions>
<execution>
<!-- Bound to Maven's compile phase -->
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
1.2 Add the Scala framework to the project and create a scala source folder
2 Batch wordcount
import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}

def main(args: Array[String]): Unit = {
  // env: build the batch execution environment
  val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  // source: read the input file
  val inputPath = "F:\\myProject\\sparkcode\\input\\word.txt"
  val ds: DataSet[String] = env.readTextFile(inputPath)
  // transform: flatMap and map need this implicit TypeInformation conversion
  import org.apache.flink.api.scala.createTypeInformation
  // group with groupBy, then aggregate with sum
  val aggDs = ds.flatMap(_.split(" ")).map((_, 1)).groupBy(0).sum(1)
  // sink: print the result
  aggDs.print()
}
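To see what the pipeline above computes without a Flink cluster, here is a plain-Scala sketch of the same flatMap/map/groupBy/sum chain on an ordinary collection. The input lines are hypothetical stand-ins for the contents of word.txt:

```scala
// Plain-Scala sketch (no Flink needed) of what the batch pipeline computes:
// split lines into words, pair each with 1, group by word, sum the counts.
object BatchWordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                           // like ds.flatMap(_.split(" "))
      .map((_, 1))                                     // pair each word with 1
      .groupBy(_._1)                                   // like groupBy(0)
      .map { case (w, ps) => (w, ps.map(_._2).sum) }   // like sum(1)

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello flink", "hello world") // hypothetical word.txt contents
    println(wordCount(lines)) // e.g. Map(hello -> 2, flink -> 1, world -> 1)
  }
}
```

The Flink job does the same thing, only distributed: groupBy(0)/sum(1) address tuple fields by position instead of by name.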
3 Streaming wordcount
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

def main(args: Array[String]): Unit = {
  // create the stream execution environment
  val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
  // receive a socket text stream
  val textDstream: DataStream[String] = env.socketTextStream("hadoop202", 9999)
  // flatMap and map need this implicit TypeInformation conversion
  import org.apache.flink.api.scala._
  // transform: split, drop empty tokens, key by word, and sum the counts
  val dStream: DataStream[(String, Int)] = textDstream.flatMap(_.split(" ")).filter(_.nonEmpty).map((_, 1)).keyBy(0).sum(1)
  // print the running counts
  dStream.print()
  env.execute()
}
On my VM hadoop202, start the netcat tool and send messages to port 9999:
nc -lk 9999
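Unlike the batch job, the streaming sum(1) emits an updated running count for its key every time a new word arrives, rather than one final total. The plain-Scala sketch below models that behavior with per-key state; the input words are hypothetical lines typed into nc:

```scala
// Plain-Scala sketch (no Flink needed) of how the streaming keyBy(0).sum(1)
// behaves: for every incoming word it emits the *updated running count* for
// that key, instead of a single final total like the batch job.
object StreamingSumSketch {
  def runningCounts(words: Seq[String]): Seq[(String, Int)] =
    words.scanLeft((Map.empty[String, Int], Option.empty[(String, Int)])) {
      case ((state, _), w) =>
        val n = state.getOrElse(w, 0) + 1    // keyed state: count so far for this word
        (state.updated(w, n), Some((w, n)))  // emit the updated (word, count)
    }.flatMap(_._2)

  def main(args: Array[String]): Unit = {
    // hypothetical words arriving from `nc -lk 9999`
    println(runningCounts(Seq("hello", "flink", "hello")))
    // -> List((hello,1), (flink,1), (hello,2))
  }
}
```

This is why, in the real job's console output, the same word appears repeatedly with an increasing count as you keep typing it into the netcat session.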