Spark 2.3.1 #13

Open
wants to merge 1 commit into
base: master

polomarcus
Owner

Spark 2.2.0 to 2.3.1

Need to update Cassandra Sink

@cranberrysoft

I guess it will not work. I tried to upgrade Spark to 2.3.1 and it started returning this error:
org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;

By the way, there is already a sink for Cassandra in DSE 6. I am wondering when (or if) they will port that solution to the Cassandra driver. The sink I mentioned exists in CassandraSourceRelation in spark-connector-6.0.2.jar. I tested it in stand-alone DSE and it works with .outputMode("update").
It is a pity that the community cannot use that solution for free...

@polomarcus
Owner Author

polomarcus commented Aug 3, 2018 via email

@cranberrysoft

Hi Paul
I'd love to help you with the development of a sink for Cassandra, especially since it is not going to be included in the open-source driver, as you said. Please let me know how I can reach you if you need any help with this.

@polomarcus
Owner Author

I would have a look at the Elastic sink, which is open source, and take inspiration from their implementation.
Hopefully we just need to change an import (DatasourceV2 or something like that), but we may also have to rewrite the sink to be 2.3 compliant, which may take some time :/

We also have the foreach sink, which can be used with Cassandra. I refer to it as "unsafe" in the repo.

@cranberrysoft

cranberrysoft commented Aug 4, 2018

I also thought about the foreach sink, but it has two downsides. First of all, it does not support the stateful transformations which are the key thing in Structured Streaming. Secondly, I believe this solution is not really optimal, since it uses a low-level API to save data to Cassandra and you operate on one row at a time, so probably all the under-the-hood optimizations done by the driver are lost. I am pretty sure that you saw one of Russell's videos about the Cassandra driver: https://www.youtube.com/watch?v=cKIHRD6kUOc

I also tried to find inspiration in the DSE implementation. Unfortunately it is not open source, and it is Scala code, so you cannot easily decompile it ;) but I will also try to dig a little to understand how it should be implemented.
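
To make the second downside above concrete, here is a toy model in plain Python. It involves no Spark and no Cassandra: `FakeCassandraClient`, `write_row_by_row` and `write_in_batches` are hypothetical names invented for this illustration, and one call to `write` is simply assumed to stand for one round trip. It only shows why handling one row at a time can cost more requests than grouping rows; it does not reproduce how the real driver or connector batches writes.

```python
from typing import Dict, Iterable, List


class FakeCassandraClient:
    """Hypothetical stand-in for a database client; it only counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.rows_written = 0

    def write(self, rows: List[Dict[str, int]]) -> None:
        # Assumption: one call models one round trip, however many rows it carries.
        self.requests += 1
        self.rows_written += len(rows)


def write_row_by_row(client: FakeCassandraClient, rows: Iterable[Dict[str, int]]) -> None:
    # Mirrors a per-row sink: every row triggers its own request.
    for row in rows:
        client.write([row])


def write_in_batches(
    client: FakeCassandraClient, rows: Iterable[Dict[str, int]], batch_size: int
) -> None:
    # Groups rows before sending, so the request count drops.
    batch: List[Dict[str, int]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            client.write(batch)
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        client.write(batch)


rows = [{"id": i} for i in range(1000)]

per_row = FakeCassandraClient()
write_row_by_row(per_row, rows)

batched = FakeCassandraClient()
write_in_batches(batched, rows, batch_size=100)

print(per_row.requests, batched.requests)  # prints: 1000 10
```

Both clients end up with the same 1000 rows written; only the number of requests differs. The batch size of 100 is arbitrary and chosen just for the illustration.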

@redsk

redsk commented Aug 23, 2018

Hi guys, I'm also interested in this and I'd love to help you with development. Please let me know how I can contact you for this effort. Cheers

@snowch
Contributor

snowch commented Aug 28, 2018

See also: scylladb/scylla-code-samples#67 (comment)

@polomarcus
Owner Author

Thanks for all your messages 😄

If you feel like giving it a try, the official Elastic sink can be a great source of inspiration for the Cassandra sink.

Compared to what we have in the repo:

I might be able to spend some time on the issue next month.

@snowch
Contributor

snowch commented Sep 20, 2018

Looks like there is some useful stuff in here: scylladb/scylla-code-samples#68

@polomarcus
Owner Author

Thanks @snowch. Scylla does it the same way, using the DataStax connector: https://github.com/scylladb/scylla-code-samples/pull/68/files#diff-1e869081fec2d3c842a3b91688825a5eR71

I'm guessing it should be a small fix to get the project running with Spark 2.3.1 and the Cassandra sink.

@snowch
Contributor

snowch commented Oct 26, 2018

@polomarcus are you planning to implement the fix you suggested above?


4 participants