Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition away from Java serialization for storing state on disk or in Zookeeper #497

Closed
wants to merge 13 commits into from

Conversation

hausdorff
Copy link

Addresses issue #419. Specifically, transitions Supervisor to clojure serialization, StormTopology to thrift, and maybe some other things I'm forgetting.

Also my first go-around with clojure. Any code written in clojure should be considered suspect.

config and nimbus currently use the core Java serialization to store
StormTopology instances. This commit will change this to use Thrift
serialization instead. StormTopology is a Thrift struct so this
basically involves finding all the places we call `Utils/serialize` and
`Utils/deserialize` and replace them with a call to a method that
serializes with Thrift instead.
config.clj and nimbus serialize Storm configuration using the default
Java implementation. This commit will phase out Java default
serialization in favor of Clojure default serialization. There are
obviously many reasons to phase out Java serialization, but the main
rationale is that Java serialization will complain about version
mismatch even if the change is semantically backward compatible.
TSerializer is not threadsafe. In `Utils` we instantiate a static final
Tserializer, but this can (and will) cause odd bugs if we start calling
`serialize()` in different threads. Thus, every time we call
`Utils/serializeTopology`, we create a new TSerializer.

Another way to do this would be to lock it, which performance may or may
not merit.
Serialization of configuration is handled in config, but it is not
different from a generic method for Clojure form serialization. This
commit will move this method to utils so that we can use it for other
things, like serialization in cluster.
cluster.clj uses the stock Java serialization implementation. There are
obviously many reason to not use standard Java serialization, but our
main motivation is that Java will complain about serialized state
when the versions don't match even if they're semantically backwards
compatible.
Supervisor currently just uses stock jvm serialization to communicate
LocalState. This is undesirable for many reasons, so this commit
will introduce a serialization interface which makes code cleaner,
letting us specify which type of serializer to use without populating
Supervisor with unnecessary boxing/unboxing behavior, or LocalState
with too much knowledge about what's happening in the Supervisor.

Also this commit will introduce a basic implementation sketch for a
serializer for LocalState (though it will just use the jvm serialization
at this point).
To build the serialization interface in Java, we need to put these
constants in a Java file. This commit will put them in Constants.java
Conflicts:
	src/clj/backtype/storm/daemon/worker.clj
	src/jvm/backtype/storm/Constants.java
	src/jvm/backtype/storm/utils/LocalState.java
@hausdorff
Copy link
Author

Is there something I can do to speed this patch integration?

@jasonjckn
Copy link
Collaborator

Nathan, can you put backwards incompatible changes into 0.10 branch. 0.9.0-wip16 is super well tested and the only branch with metrics, rarely can you get this amount of testing , let's do compatible releases on it for while.

@jasonjckn
Copy link
Collaborator

i'm going to be handling the merge of this fyi. i'll read through the code next chance I get.

@jasonjckn
Copy link
Collaborator

(this will be going into 0.9.0 branch)

@jasonjckn
Copy link
Collaborator

did you sign storm contributors agreement? if not please do.

@jasonjckn
Copy link
Collaborator

Does this PR solve https://github.com/nathanmarz/storm/issues/525 e.g. can I add a new field to StormBase state for example, and the next time I start nimbus with this change all the previously created state that didn't have this field (but did have your PR) will still be readable.

Is it forwards compatible as well?

@hausdorff
Copy link
Author

Great, looking forward to resolving the pr. I have signed the contributor's agreement.

It is "forwards compatible" in that serialized objects that are semantically identical are always unserialized to the same thing, regardless of whether we have done something like upgraded Java. It is less forwards compatible in that if you change the semantics of the storm state (and in particular LocalState) to demand something crazier than vanilla clojure serialization, you will need to build an accompanying implementation of the backtype.storm.utils.StateSerializer interface.

As for adding fields to StormBase, my response is limited to what I know about Storm (I'm a noob). But my answer is, I think that yes it should. Our proposed serialization scheme amounts to using standard clojure serialization, so for example the stormbase serialization here: https://github.com/nathanmarz/storm/pull/497/files#L0R304 should just work.

Also, if EDT is really a strict superset of clojure, then all serialization of LocalState should be compatible with anyone who wants to deserialize with implementation of EDT (I think this addresses questions you had in #525).

@hausdorff
Copy link
Author

What is the status of this PR?

@ptgoetz
Copy link
Collaborator

ptgoetz commented Sep 27, 2013

I think it's safe to close this.

@ptgoetz ptgoetz closed this Sep 27, 2013
@d2r
Copy link
Contributor

d2r commented Sep 27, 2013

This pull request was closed because it was superseded by #625.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants