Spark Cassandra Connector for Python

At I see that you
are extending the API that Spark provides for interacting with RDDs to
leverage some native Cassandra features. We are using Apache Cassandra
together with PySpark to do some analytics, and since we have the community
version, we use classic API calls like sc.newAPIHadoopRDD, which means
writing converters for the data in Scala. We would like to use calls such as
sc.cassandraTable, but I don't see these methods anywhere in PySpark, and
access from Python is not even mentioned.
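For context, here is a minimal sketch of the sc.newAPIHadoopRDD route described above. It is modeled on the Cassandra input-format example in the Spark 1.x Python examples and assumes a Thrift-enabled Cassandra node, the Cassandra Hadoop classes, and the Scala converter classes CassandraCQLKeyConverter/CassandraCQLValueConverter from the Spark examples jar on the classpath. The helper names build_cassandra_conf and load_cassandra_rdd, the Murmur3Partitioner setting, and the page size are illustrative assumptions, not part of any connector API.

```python
# Hypothetical helpers sketching the sc.newAPIHadoopRDD route from PySpark.
# Assumes the Cassandra Hadoop input format and the Scala converter classes
# from the Spark examples jar are on the driver and executor classpath.

INPUT_FORMAT = "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat"
KEY_CONVERTER = "org.apache.spark.examples.pythonconverters.CassandraCQLKeyConverter"
VALUE_CONVERTER = "org.apache.spark.examples.pythonconverters.CassandraCQLValueConverter"


def build_cassandra_conf(host, keyspace, table, port=9160, page_rows=1000):
    """Return the Hadoop configuration dict read by the Cassandra input format.

    Values are strings because PySpark copies them into a Hadoop Configuration.
    The partitioner is assumed to be Murmur3Partitioner; page_rows is an
    arbitrary illustrative default.
    """
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": str(port),
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": table,
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": str(page_rows),
    }


def load_cassandra_rdd(sc, host, keyspace, table):
    """Read a CQL table into an RDD of (key, value) pairs.

    `sc` is an existing pyspark.SparkContext. The Scala converters turn the
    java.util.Map keys and values into Python dicts on the way to Python.
    """
    return sc.newAPIHadoopRDD(
        INPUT_FORMAT,
        "java.util.Map",
        "java.util.Map",
        keyConverter=KEY_CONVERTER,
        valueConverter=VALUE_CONVERTER,
        conf=build_cassandra_conf(host, keyspace, table),
    )
```

With a running SparkContext this would be called as `load_cassandra_rdd(sc, "localhost", "my_keyspace", "my_table")` (hypothetical names). Any `rdd.filter(...)` applied afterwards runs in Spark on rows already fetched from Cassandra, which is the contrast with the server-side filtering asked about below.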

I see, however, that you are using these methods in PySpark. Does that mean
the Spark Cassandra Connector for Python is available only in DataStax
Enterprise, and that we have to buy it to use that API and features like
server-side filtering from PySpark?

Also, at
I see that there is some effort to expose CassandraSparkContext to
Python. Does that mean they are duplicating your work?

Marek Wiewiórski
Opera Software