-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transition away from Java serialization for storing state on disk or in Zookeeper #497
Conversation
config and nimbus currently use the core Java serialization to store StormTopology instances. This commit will change this to use Thrift serialization instead. StormTopology is a Thrift struct so this basically involves finding all the places we call `Utils/serialize` and `Utils/deserialize` and replace them with a call to a method that serializes with Thrift instead.
config.clj and nimbus serialize Storm configuration using the default Java implementation. This commit will phase out Java default serialization in favor of Clojure default serialization. There are obviously many reasons to phase out Java serialization, but the main rationale is that Java serialization will complain about version mismatch even if the change is semantically backward compatible.
TSerializer is not threadsafe. In `Utils` we instantiate a static final Tserializer, but this can (and will) cause odd bugs if we start calling `serialize()` in different threads. Thus, every time we call `Utils/serializeTopology`, we create a new TSerializer. Another way to do this would be to lock it, which performance may or may not merit.
Serialization of configuration is handled in config, but it is not different from a generic method for Clojure form serialization. This commit will move this method to utils so that we can use it for other things, like serialization in cluster.
cluster.clj uses the stock Java serialization implementation. There are obviously many reason to not use standard Java serialization, but our main motivation is that Java will complain about serialized state when the versions don't match even if they're semantically backwards compatible.
Supervisor currently just uses stock jvm serialization to communicate LocalState. This is undesirable for many reasons, so this commit will introduce a serialization interface which makes code cleaner, letting us specify which type of serializer to use without populating Supervisor with unnecessary boxing/unboxing behavior, or LocalState with too much knowledge about what's happening in the Supervisor. Also this commit will introduce a basic implementation sketch for a serializer for LocalState (though it will just use the jvm serialization at this point).
To build the serialization interface in Java, we need to put these constants in a Java file. This commit will put them in Constants.java
Conflicts: src/clj/backtype/storm/daemon/worker.clj src/jvm/backtype/storm/Constants.java src/jvm/backtype/storm/utils/LocalState.java
Is there something I can do to speed this patch integration? |
Nathan, can you put backwards incompatible changes into 0.10 branch. 0.9.0-wip16 is super well tested and the only branch with metrics, rarely can you get this amount of testing , let's do compatible releases on it for while. |
i'm going to be handling the merge of this fyi. i'll read through the code next chance I get. |
(this will be going into 0.9.0 branch) |
did you sign storm contributors agreement? if not please do. |
Does this PR solve https://github.com/nathanmarz/storm/issues/525 e.g. can I add a new field to StormBase state for example, and the next time I start nimbus with this change all the previously created state that didn't have this field (but did have your PR) will still be readable. Is it forwards compatible as well? |
Great, looking forward to resolving the pr. I have signed the contributor's agreement. It is "forwards compatible" in that serialized objects that are semantically identical are always unserialized to the same thing, regardless of whether we have done something like upgraded Java. It is less forwards compatible in that if you change the semantics of the storm state (and in particular LocalState) to demand something crazier than vanilla clojure serialization, you will need to build an accompanying implementation of the As for adding fields to StormBase, my response is limited to what I know about Storm (I'm a noob). But my answer is, I think that yes it should. Our proposed serialization scheme amounts to using standard clojure serialization, so for example the stormbase serialization here: https://github.com/nathanmarz/storm/pull/497/files#L0R304 should just work. Also, if EDT is really a strict superset of clojure, then all serialization of LocalState should be compatible with anyone who wants to deserialize with implementation of EDT (I think this addresses questions you had in #525). |
What is the status of this PR? |
I think it's safe to close this. |
This pull request was closed because it was superseded by #625. |
Addresses issue #419. Specifically, transitions
Supervisor
to clojure serialization,StormTopology
to thrift, and maybe some other things I'm forgetting.Also my first go-around with clojure. Any code written in clojure should be considered suspect.