Tuesday, 21 April 2015

Type mismatch with identical types in Spark-shell

I have built a scripting workflow around the spark-shell, but I'm often vexed by bizarre type mismatches (probably inherited from the Scala REPL) occurring with identical found and required types. The following example illustrates the problem. Executed in paste mode, there is no problem:

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.rdd.RDD
case class C(S:String)
def f(r:RDD[C]): String = "hello"
val in = sc.parallelize(List(C("hi")))
f(in)

// Exiting paste mode, now interpreting.

import org.apache.spark.rdd.RDD
defined class C
f: (r: org.apache.spark.rdd.RDD[C])String
in: org.apache.spark.rdd.RDD[C] = ParallelCollectionRDD[0] at parallelize at <console>:13
res0: String = hello


scala> f(in)
<console>:29: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[C]
 required: org.apache.spark.rdd.RDD[C]

There are related discussions about the Scala REPL and about the spark-shell, but the issues mentioned there seem unrelated (and already resolved) to me.

This problem makes it hard to write passable code for interactive execution in the REPL, and negates much of the advantage of working in a REPL to begin with. Is there a solution? (And/or is it a known issue?)
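One workaround often suggested for this class of REPL issues is to wrap the case class and every function that mentions it in a single named object, submitted in one :paste block, so that both the definition site and the use site resolve to the same compiled class rather than to two different REPL wrapper objects. This is a sketch under that assumption (the object name Work is arbitrary), not a verified fix for every Spark version:

```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.rdd.RDD

// Defining the case class and its consumer together in one object
// ensures later references see the same C, not a re-wrapped copy.
object Work {
  case class C(s: String)
  def f(r: RDD[C]): String = "hello"
}
import Work._

// Exiting paste mode, now interpreting.

scala> val in = sc.parallelize(List(C("hi")))

scala> f(in)
```

With the definitions pinned inside Work, calling f(in) in a later prompt should no longer trip the identical-types mismatch, since C is a stable member of Work rather than a per-line REPL synthetic.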
