
Training LSTM model #2

Closed
shiffman opened this issue Oct 23, 2017 · 14 comments

Comments

@shiffman
Member

@nsthorat and @cvalenzuela have an e-mail thread about this, but adding here to track going forward.

At the moment the LSTM example uses a model trained with this script from deeplearn.js. Eventually we want to train the LSTM in the browser, but before that I thought it might be simpler to demonstrate training with a Keras model, using this example from my class last year.

@cvalenzuela attempted to use this script to convert the model from my example, but the output isn't working just yet.

Shall we do more work to make the Keras-trained example compatible, or point students toward the train.py that's in deeplearn.js if they want to train their own model?

Did I get this right?

@cvalenzuela
Member

The script you reference transforms the original Keras model (saved as a single .h5 file) into TensorFlow checkpoints, which deeplearn.js then uses to create a manifest.json. The issue is that when dumping the manifest file we also get a bunch of other variables in the manifest.json that we don't need.
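One way to work around the extra-variables problem, in principle, would be to post-process the dumped manifest and keep only the entries the demo actually loads. A minimal sketch, assuming the manifest is a JSON object mapping variable names to metadata; the variable names below are hypothetical, not the actual output of the dump script:

```javascript
// Sketch: strip unneeded entries (e.g. optimizer state) from a dumped
// manifest.json so only the LSTM weights the demo needs are shipped.
// The variable names here are made up for illustration.
function filterManifest(manifest, keepPrefixes) {
  const filtered = {};
  for (const [name, meta] of Object.entries(manifest)) {
    if (keepPrefixes.some((prefix) => name.startsWith(prefix))) {
      filtered[name] = meta;
    }
  }
  return filtered;
}

// Example: a dumped manifest that also contains optimizer variables.
const manifest = {
  'rnn/lstm_cell/kernel': { shape: [192, 256] },
  'rnn/lstm_cell/bias': { shape: [256] },
  'optimizer/Adam/beta1_power': { shape: [] },
};
const slim = filterManifest(manifest, ['rnn/', 'softmax/']);
```

The filtered object can then be written back out as the manifest.json that ships with the demo.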

I think for now the best approach is to point students to train.py.
Inside /training is a modified train.py that can be used to train an LSTM on a text file. That's the one I used to get the demo working.
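For context, a char-level training script like train.py starts by building a character vocabulary from the corpus and encoding the text as index sequences. A minimal sketch of that preprocessing step (not the actual script, which is Python and does this with numpy):

```javascript
// Sketch of char-level preprocessing: build a vocabulary of the distinct
// characters in the corpus and encode text as arrays of vocabulary indices.
function buildVocab(text) {
  const chars = Array.from(new Set(text)).sort();
  const charToIndex = new Map(chars.map((c, i) => [c, i]));
  return { chars, charToIndex };
}

function encode(text, charToIndex) {
  return Array.from(text, (c) => charToIndex.get(c));
}

const { chars, charToIndex } = buildVocab('to be or not to be');
const encoded = encode('to be', charToIndex);
```

The model is then trained to predict the next index in the sequence, and the same vocabulary has to ship with the ported model so generated indices can be decoded back to characters.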

@shiffman
Member Author

Makes sense, I'll try to train a slightly better model in the morning and show how to run train.py in class. It probably makes sense to adapt/simplify these Python environment instructions and include them here? Let me know if you have any specific suggestions.

https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/wiki/Python-Environment-Setup

@cvalenzuela
Member

If you run it with the complete text on your computer it will take forever; try with just part of it instead. I haven't played with the hidden layer size that much, but that should also influence the result.
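The "temperature" knob mentioned later in this thread is worth pinning down here: it divides the model's output logits before the softmax, so low temperature sharpens the distribution (more predictable text) and high temperature flattens it (more surprising text). A minimal sketch, assuming we already have a logits array from the model:

```javascript
// Temperature-scaled softmax: divide logits by the temperature, then
// normalize. Lower temperature -> sharper distribution, higher -> flatter.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draw an index from the resulting distribution. `rand` is injectable
// so the sampling can be made deterministic in tests.
function sample(probs, rand = Math.random) {
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

const logits = [2.0, 1.0, 0.1];
const cold = softmaxWithTemperature(logits, 0.5); // sharper
const hot = softmaxWithTemperature(logits, 2.0);  // flatter
```

Each generated character is drawn with `sample`, fed back into the LSTM, and the loop repeats until the requested length is reached.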

I'm also writing this: https://github.com/cvalenzuela/hpc
as a reference for training models on the NYU HPC cluster. Maybe we can add that to the setup instructions.

@nsthorat
Contributor

One thing you could do is write the Keras LSTM implementation in deeplearn.js using the RNN cell, and then modify our dump-checkpoint script so that it dumps exactly what you need.

Did you ever get that LSTM keras model working from before?

@cvalenzuela
Member

I couldn't get the Keras model to work. Here is what I was doing; I'm sure I'm doing something wrong with the encoding.

I'm now going to try with a model trained without Keras, just plain TensorFlow.

@cvalenzuela
Member

Update: the LSTM example is now working properly here; you can even play with the temperature and the output length. It's trained on a small corpus of Shakespeare.
I used this to train and then port the model.

@shiffman
Member Author

oh oops, I missed this thread when filing #12. I'll take a look and see about porting to the simpler "plainjs" examples! Should I discuss and show char-rnn-tensorflow with students and add it to the training instructions, or just leave the info on train.py?

@nsthorat
Contributor

Love the demo!

Let me know if there's any feedback you have for us about what we can make easier.

@nsthorat
Contributor

FYI: there's a thread in our repo asking about this, if you want to show off the demo!

https://github.com/PAIR-code/deeplearnjs/issues/237

@cvalenzuela
Member

thanks @nsthorat. We still need to clean up the code a bit and refactor, but I'll post it!

@shiffman, I will rewrite the instructions to use char-rnn-tensorflow for the training. I found it easier and more manageable than the script we were using before. Also, running it on tiny Shakespeare took about an hour on my computer.
I think the next step is to start wrapping that into a callable method. What do you think?

@shiffman
Member Author

Yes! It's a little odd to start building the library out with LSTM as the first stop, but since it's what we're using in A2Z right now, I guess it makes sense. I'm imagining a more OOP-style API, something like:

const lstm = new LSTMGenerator('/path/to/model/', 'path/to/variables.json');
let txt = lstm.generate(len, seed, temperature);

Does that make sense, do you think? We can move this to a separate thread. I'm not at all sure about the naming, but at least I tried to use const!

Maybe the variables.json should be integrated into the model files directory rather than treated separately?
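Putting those two ideas together, the proposed API could be sketched as a class that takes a single model directory. This is a shape sketch only: the model loading and the actual LSTM inference are stubbed out, and none of the names here are final.

```javascript
// Skeleton of the proposed LSTMGenerator API. Loading and inference are
// stubbed; all names are provisional, not an implemented ml5 API.
class LSTMGenerator {
  constructor(modelPath) {
    // One directory holding both the checkpoint files and variables.json,
    // rather than passing variables.json as a separate argument.
    this.modelPath = modelPath;
    this.ready = false;
  }

  load() {
    // The real version would fetch the manifest and weights from
    // this.modelPath (likely asynchronously); stubbed here.
    this.ready = true;
    return this;
  }

  generate(length, seed, temperature = 1.0) {
    if (!this.ready) throw new Error('call load() before generate()');
    // The real version would feed `seed` through the LSTM and sample
    // `length` characters with temperature-scaled softmax. Stubbed:
    // it just echoes its arguments.
    return { length, seed, temperature };
  }
}

const lstm = new LSTMGenerator('models/shakespeare').load();
const out = lstm.generate(100, 'ROMEO:', 0.7);
```

A separate load step (rather than loading in the constructor) keeps the constructor synchronous and gives a natural place to hook a "model ready" callback or promise later.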

@cvalenzuela
Member

yes, that makes sense!

let's move this to another thread to keep track of it

@shiffman
Member Author

Great, will leave this open, you can close once the char-rnn-tensorflow instructions are up. I'll repost my comment to a new thread.

@shiffman shiffman mentioned this issue Oct 30, 2017
@cvalenzuela
Member

The char-rnn-tensorflow instructions are up here.

joeyklee pushed a commit that referenced this issue Aug 26, 2019
bomanimc added a commit that referenced this issue Apr 28, 2020