Clustering / Unsupervised Learning #28
Any ideas for a good sample dataset to try with this? Eventually I'd like to try some interesting demos like @genekogan's amazing t-SNE Viewer, but I will start with something smaller, simpler, and more suited to kmeans. Maybe colors or a small corpus of word vectors? |
hi @shiffman! i often dig into the (now relatively old) CalTech-256, which is structured into 256 rather random categories. the tsne demo is just the subset of those categories which are animals. it's neatly organized so it's easy to make themed subsets. |
The iris dataset is popular for kmeans demos. There are only four dimensions, so it's relatively simple to visualize the clusters. If the intended audience is new to machine learning, the results can be visualized without additional concepts such as PCA. Also, I believe the example is now here: https://github.com/ml5js/ml5-examples/tree/master/p5js/WIP_clustering/kmeans |
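For anyone following along, here is a minimal sketch of what kmeans does on four-dimensional rows like the iris ones. It is plain JavaScript and is not the code in the linked example. The function names (`squaredDistance`, `nearestCentroid`, `kmeans`) are made up for illustration, and the sample rows are invented values shaped like iris measurements, not the real dataset. The starting centroids are passed in explicitly (an assumption, to keep the run deterministic) rather than picked at random.

```javascript
// Minimal k-means sketch in plain JavaScript (no ml5 or TensorFlow.js calls).
// The rows are made-up illustrative values shaped like iris measurements
// (sepal length, sepal width, petal length, petal width); not the real dataset.
const points = [
  [5.0, 3.4, 1.5, 0.2],
  [4.8, 3.1, 1.4, 0.2],
  [5.2, 3.6, 1.4, 0.3],
  [6.4, 2.9, 4.5, 1.4],
  [6.0, 2.8, 4.3, 1.3],
  [6.6, 3.0, 4.6, 1.5],
];

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  return a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0);
}

// Index of the centroid closest to the given point.
function nearestCentroid(point, centroids) {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
      best = c;
    }
  }
  return best;
}

// Lloyd's algorithm: alternate assignment and centroid update
// until the labels stop changing or maxIterations is reached.
function kmeans(data, initialCentroids, maxIterations = 100) {
  let centroids = initialCentroids.map((c) => c.slice());
  let labels = new Array(data.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    const newLabels = data.map((p) => nearestCentroid(p, centroids));
    const changed = newLabels.some((label, i) => label !== labels[i]);
    labels = newLabels;
    if (!changed) break;

    centroids = centroids.map((old, c) => {
      const members = data.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}

// Seed with the first and fourth rows so the result is reproducible.
const result = kmeans(points, [points[0], points[3]]);
console.log(result.labels);
```

With these seeds the first three rows end up in one cluster and the last three in the other. A real demo would likely pick the starting centroids randomly and run several restarts.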
Adding to this thread a good reference for t-SNE by @enjalot |
Amazing reference! Realtime tSNE Visualizations with TensorFlow.js by @Nicola17 https://twitter.com/nicolapezzotti/status/1004866454578257922?s=11 |
this looks really nice. great find! |
We could work on porting this: |
I had originally made a pull request for this, but development fell off and I closed the pull request. I'm leaving some breadcrumbs here for reference. |
I can work on some of these over the summer - is there a particular API I should strive for, or methods I should expose? |
Hi @jwilber - Hooray! Thanks so much for your interest. This would be super wonderful. I'm happy to chat more about the API structure anytime (I know @shiffman and @yining1023 will have great feedback here), but in general, the ml5 process goes something like:
So a very rough proposal might be something where you set a bunch of options and the kmeans just spits out a bunch of results:

    const options = {
      prop1: 1,
      prop2: 'something'
    }
    const dataUrl = 'rainbowData.csv'
    const kmeans = ml5.cluster('kmeans', dataUrl, options, modelWithDataLoadedCallback)

    function modelWithDataLoadedCallback(err, data) {
      if (err) {
        return err;
      }
      console.log(data)
    }

Maybe a different approach could be something like:

    const options = {
      prop1: 1,
      prop2: 'something'
    }
    const dataUrl = 'rainbowData.csv'
    const kmeans = ml5.cluster('kmeans', dataUrl, dataLoadedCallback)

    function dataLoadedCallback(err, data) {
      if (err) return err;
      kmeans.classify(data, options, dataCrunchedCallback)
    }

    function dataCrunchedCallback(err, data) {
      if (err) return err;
      // do something with the data
    }

I'm not sure yet what terminology or function naming would be best here, but in general, we try to use more approachable terms or helpful analogies. I also wonder if it makes sense if some of these functions become "helper" functions like:
turf.js has a kmeans implementation for geo operations. Maybe there's some helpful nuggets in there for us too? |
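To make the first proposal above a bit more concrete (options in, results out through an error-first callback), here is a rough sketch of that calling shape. Everything in it is hypothetical: `cluster`, `options.k`, and the result format are illustrative names and not ml5's API. The data is passed in memory instead of being loaded from a CSV URL, and the clustering step is a deliberately simple stand-in (one nearest-seed assignment pass), not a full kmeans.

```javascript
// Hypothetical sketch of the error-first callback shape discussed above.
// `cluster`, `options.k`, and the result format are assumptions for illustration.
function cluster(method, data, options, callback) {
  // Defer the work so the callback always fires asynchronously,
  // as it would after a real file load.
  setTimeout(() => {
    if (method !== 'kmeans') {
      callback(new Error(`Unsupported method: ${method}`));
      return;
    }
    if (!Array.isArray(data) || data.length < options.k) {
      callback(new Error('Need at least k data points'));
      return;
    }

    // Stand-in for the real algorithm: use the first k points as seeds
    // and assign every point to its nearest seed in a single pass.
    const seeds = data.slice(0, options.k);
    const squaredDistance = (a, b) =>
      a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0);

    const results = data.map((point) => {
      let best = 0;
      for (let c = 1; c < seeds.length; c++) {
        if (squaredDistance(point, seeds[c]) < squaredDistance(point, seeds[best])) {
          best = c;
        }
      }
      return { point, cluster: best };
    });

    callback(null, results);
  }, 0);
}

// Usage: errors arrive as the first argument, results as the second.
const sampleData = [[0, 0], [10, 10], [1, 1], [9, 9]];
cluster('kmeans', sampleData, { k: 2 }, (err, results) => {
  if (err) {
    console.error(err.message);
    return;
  }
  console.log(results.map((r) => r.cluster));
});
```

Keeping the error as the first callback argument matches the `(err, data)` signature used in the proposals, so a sketch can handle a bad method name or missing data in one place.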
@joeyklee thanks for the detailed response! I won't be able to get started until roughly the end of the month, so expect an update around then :) Also - I'm assuming I should implement each alg using tensorflowjs? Is there a particular version? Thanks! |
@jwilber - Sure thing! Thanks so much for your interest in ml5! This issue has been open a while and we'd love to open up these kinds of methods to the ml5 community. No pressure on timing! Also nice to enjoy the summer :)
p.s. Your pudding.cool piece on skate music is one of my favorite data viz/editorial pieces. I can still remember every trick on beat for all skate videos between ~2000 - 2010. I love the line _Classic rock is often used with a so-called “hesh” style of skateboarding_ 😂 |
Haha, thanks a lot! Skateboarders of the world unite ✊. (I actually watched Appleyard's part in Sorry recently and oh, man, the waves of nostalgia I felt when the Placebo song came on were insane). Re: implementation: As a first pass I'll just grab any relevant functions I need from tf.js, and if it only turns out to be a few we can discuss cleaning up the dependencies (i.e. rewriting them as class methods or whatever) in the PR. Looking forward to contributing :) |
@jwilber :
Re implementation:
Thanks! |
Sorry for the slow response (had heart surgery and it took longer to recover than I thought) - I'll start working on this over the weekend, so expect a first pass next week! |
@jwilber - Thanks for the update + no worries. I hope you have a speedy recovery! As many of the ml5 team have been on holidays (myself included) we're also getting caught up with issues and PRs etc. Thanks + take care! |
Just making a note that the |
Linking to: #563 |
Thanks for adding this class! Closing for now, since we have a separate issue for DBScan. |
I am working on some examples to cluster data sets. Here are some algorithms I imagine eventually having in this library:
I committed a kmeans example (uses random vectors) as a start.
https://github.com/ITPNYU/p5-deeplearn-js/tree/master/examples/clustering/kmeans
Some next steps are:
There are some interesting possibilities with combining clustering algorithms with word vectors.