gcp / gcp-bigdataml
Commit 812e502e, authored Apr 03, 2019 by Federico Mestrone
Minor improvements to Spark process and code
Parent: b3465391
Changes: 1
adtech-spark/src/main/scala/gcp/cm/bigdata/adtech/spark/AdtechClean.scala
@@ -9,15 +9,18 @@ object AdtechClean extends App {
   conf.setAppName("Word Count")
   val sc = new SparkContext(conf)
-  val impressions = sc.textFile("gs://abucket-for-codemotion/adtech/test")
+  // GS can be used as a native distributed file system in place of HDFS
+  val impressions = sc.textFile("gs://abucket-for-codemotion/adtech/test.csv")
   val csv = impressions.map(line => line.split(","))
-  val cleaned = csv.map(rec => rec.take(2) ++ rec.drop(3).take(12))
+  // val cleaned = csv.map(rec => rec.take(2) ++ rec.drop(3).take(12))
+  val cleaned = csv.map(rec => rec.take(2) ++ rec.slice(3, 15))
   val textfile = cleaned.map(rec => rec.mkString(","))
-  textfile.saveAsTextFile("gs://abucket-for-codemotion/adtech/test_cleaned")
-  // textfile.coalesce(1, shuffle = true).saveAsTextFile("gs://abucket-for-codemotion/adtech/test_cleaned")
+  // GS can be used as a native distributed file system in place of HDFS
+  textfile.saveAsTextFile("gs://abucket-for-codemotion/adtech/test_cleaned.csv")
+  // textfile.coalesce(1, shuffle = true).saveAsTextFile("gs://abucket-for-codemotion/adtech/test_cleaned.csv")
}
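
The change from `rec.drop(3).take(12)` to `rec.slice(3, 15)` should not alter the output: both keep the fields at indices 3 through 14, so each cleaned record holds columns 0-1 and 3-14 and loses column 2 and anything from index 15 onward. The following is a minimal plain-Scala sketch of that per-record step, with no Spark involved. The object name `CleanRecordSketch`, the helper `cleanLine`, and the sample row are assumptions made for illustration and are not part of the commit.

```scala
// Plain-Scala sketch of the per-record cleaning step; no Spark required.
// All names here are hypothetical and chosen for illustration only.
object CleanRecordSketch {
  // Expression before the commit: skip 3 fields, then keep the next 12.
  def cleanOld(rec: Array[String]): Array[String] =
    rec.take(2) ++ rec.drop(3).take(12)

  // Expression after the commit: indices 3 (inclusive) to 15 (exclusive).
  def cleanNew(rec: Array[String]): Array[String] =
    rec.take(2) ++ rec.slice(3, 15)

  // Split a CSV line, select the columns, and join the fields back.
  def cleanLine(line: String): String =
    cleanNew(line.split(",")).mkString(",")

  def main(args: Array[String]): Unit = {
    // Made-up sample row with 17 fields named c0..c16.
    val line = (0 to 16).map(i => s"c$i").mkString(",")
    // Both variants select the same fields.
    assert(cleanOld(line.split(",")).sameElements(cleanNew(line.split(","))))
    println(cleanLine(line))
  }
}
```

Neither variant throws on rows with fewer than 15 fields; both simply return whatever fields exist in the requested range.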