Spark example


Example of running a word-count job with Spark on the platform:

1. Counting words:

Upload a text file, mydatafile.txt, to a project directory (here /projets/test/sparktest/mydatafile.txt).

Launch spark-shell:
[xxxx@osirim-hadoop ~]$ spark-shell 

When you get the Scala prompt, enter the following commands:
scala> val file = sc.textFile("/projets/test/sparktest/mydatafile.txt")
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("/projets/test/sparktest/output")
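To see what each transformation computes, the same pipeline can be sketched on an ordinary Scala collection, without Spark. This is only a local approximation: groupBy followed by a sum stands in for reduceByKey, and the object name WordCountSketch and the sample lines are hypothetical, made up for illustration.

```scala
object WordCountSketch {
  // Local stand-in for the RDD pipeline: flatMap -> map -> reduceByKey.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // same tokenisation as in spark-shell
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // gather the pairs that share the same word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts, like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for mydatafile.txt
    val sample = Seq("to be or", "not to be")
    // Sorted by word so the printed order is stable
    wordCount(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that split(" ") splits on single spaces, so two consecutive spaces produce an empty string that is counted as a "word"; the spark-shell command behaves the same way.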

From the Scala prompt, display the result:
scala> counts.collect().foreach(println)

From HDFS (outside Spark), display the result after leaving spark-shell with Ctrl-D:
[xxxx@osirim-hadoop ~]$ hadoop fs -cat /projets/test/sparktest/output/part*

From the local shell (outside Spark and HDFS), display the result:
[xxxx@osirim-hadoop ~]$ cat /projets/test/sparktest/output/part*
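Each line of the part-* files is the string form of one (word, count) tuple, i.e. a line of the form (word,count). If the result needs further processing, a small parser such as the following sketch can turn those lines back into pairs. The names ParseOutput and parseLine are hypothetical; the sketch assumes the default tuple formatting and uses the last comma as separator because a word split on spaces may itself contain commas.

```scala
import scala.util.Try

object ParseOutput {
  // Parses one line written by saveAsTextFile for an RDD[(String, Int)], e.g. "(word,2)".
  // Returns None for lines that do not look like a (word,count) tuple.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    if (trimmed.length < 2 || !trimmed.startsWith("(") || !trimmed.endsWith(")")) None
    else {
      val body = trimmed.substring(1, trimmed.length - 1) // strip the parentheses
      val sep = body.lastIndexOf(',')                     // the count follows the last comma
      if (sep < 0) None
      else Try(body.substring(sep + 1).toInt).toOption.map(n => (body.substring(0, sep), n))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines in the format produced by counts.saveAsTextFile(...)
    val lines = Seq("(spark,3)", "(hello,,1)", "not a tuple")
    lines.flatMap(parseLine).foreach(println)
  }
}
```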