Commit

Added starspace paper
shagunsodhani committed Jan 30, 2018
1 parent f0f1a29 commit 207f8a0
Showing 3 changed files with 44 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [StarSpace - Embed All The Things!](https://shagunsodhani.in/papers-I-read/StarSpace-Embed-All-The-Things)
* [Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory](https://shagunsodhani.in/papers-I-read/Emotional-Chatting-Machine-Emotional-Conversation-Generation-with-Internal-and-External-Memory)
* [How transferable are features in deep neural networks](https://shagunsodhani.in/papers-I-read/How-transferable-are-features-in-deep-neural-networks)
* [Distilling the Knowledge in a Neural Network](https://shagunsodhani.in/papers-I-read/Distilling-the-Knowledge-in-a-Neural-Network)
42 changes: 42 additions & 0 deletions site/_posts/2018-01-29-StarSpace - Embed All The Things.md
@@ -0,0 +1,42 @@
---
layout: post
title: StarSpace - Embed All The Things!
comments: True
excerpt: The paper describes a general-purpose neural embedding model where different types of entities (each described in terms of discrete features) are embedded in a common vector space.
tags: ['2017', 'Graph Representation', 'Multi Task', 'Network Representation', 'Network Embedding', 'Word Vectors', 'Representation Learning', 'Embedding', 'Graph', 'NLP']
---

## Introduction

* The paper describes a general-purpose neural embedding model where different types of entities (each described in terms of discrete features) are embedded in a common vector space.

* A similarity function is learnt to compare these entities and score how similar they are. The exact definition of the similarity function can depend on the downstream task where the embeddings are used.

* [Link to the paper](https://arxiv.org/abs/1709.03856)

* [Link to the implementation](https://github.com/facebookresearch/StarSpace)

## Approach

* Each entity is described as a set of discrete features. For example, for the recommendation use case, a user may be described as a bag-of-words of the movies they have liked. For the search use case, a document may be described as the bag of words it contains (see the sketches after this list).

* Given a dataset and a task at hand, generate a set of positive pairs *E<sup>+</sup>* of the form *(a, b)* such that *a* is the input to the task (from the dataset) and *b* is the expected label (answer/entity) for that input.

* Similarly, generate a set of negative pairs *E<sup>-</sup>* of the form *(a, b<sub>i</sub><sup>-</sup>)* such that *b<sub>i</sub><sup>-</sup>* is an incorrect label (answer/entity) for the given input. The incorrect entities can be sampled randomly from the set of candidate entities, and multiple negatives can be generated for each positive pair; these negatives are indexed by *i*.

* For example, in the case of a supervised learning problem like document classification, *a* would be one of the documents (described in terms of its words), *b* the correct label, and *b<sub>i</sub><sup>-</sup>* a label sampled randomly from the set of all labels (excluding the correct one).

* In the case of collaborative filtering, *a* would be the user (described either as a single discrete entity like a user id or in terms of the items purchased so far), *b* the next item the user purchases, and *b<sub>i</sub><sup>-</sup>* an item sampled randomly from the set of all items.

* A similarity function is chosen to compare the representations of entities of type *a* and *b*. The paper considers both cosine similarity and the inner product and observes that cosine similarity works better when the number of entities is large.

* A loss function compares the similarity of the positive pair *(a, b)* against that of the negative pairs *(a, b<sub>i</sub><sup>-</sup>)*. The paper considers the margin ranking loss and the negative log loss of softmax and reports that the margin ranking loss works better (a training-step sketch follows this list).

* The norm of embeddings is capped at 1.
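
To make the setup concrete, below is a minimal PyTorch sketch of embedding entities described as bags of discrete features and scoring a pair with either cosine similarity or the inner product. The dictionary size, embedding dimension, and example feature ids are made up, and summing feature embeddings into a single entity embedding follows the paper's bag-of-features formulation; this is an illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F

NUM_FEATURES, DIM = 10_000, 100            # hypothetical dictionary size / embedding dim
emb = torch.nn.Embedding(NUM_FEATURES, DIM)

def embed(feature_ids: torch.Tensor) -> torch.Tensor:
    """Embed an entity as the sum of the embeddings of its discrete features."""
    return emb(feature_ids).sum(dim=0)

def similarity(a: torch.Tensor, b: torch.Tensor, cosine: bool = True) -> torch.Tensor:
    """Score a pair of entity embeddings; cosine reportedly works better with many entities."""
    return F.cosine_similarity(a, b, dim=0) if cosine else torch.dot(a, b)

doc = embed(torch.tensor([1, 5, 7]))       # e.g. a document as a bag of word ids
label = embed(torch.tensor([42]))          # e.g. a candidate label id
score = similarity(doc, label)
```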
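
And a sketch of one training step on a positive pair *(a, b)* with randomly sampled negatives *b<sub>i</sub><sup>-</sup>*, using the margin ranking loss and the norm cap described above. It reuses `emb`, `embed`, and `similarity` from the previous sketch; the margin, learning rate, and number of negatives are arbitrary illustrative values.

```python
MARGIN, K_NEG, LR = 0.05, 10, 0.01
opt = torch.optim.SGD(emb.parameters(), lr=LR)

def margin_ranking_loss(a_ids, b_ids, neg_ids_list):
    """Average over negatives of max(0, margin - sim(a, b) + sim(a, b_i^-))."""
    a = embed(a_ids)
    pos_sim = similarity(a, embed(b_ids))
    losses = [torch.clamp(MARGIN - pos_sim + similarity(a, embed(n)), min=0.0)
              for n in neg_ids_list]
    return torch.stack(losses).mean()

a_ids = torch.tensor([1, 5, 7])                                       # input entity a
b_ids = torch.tensor([42])                                            # correct entity b
negs = [torch.randint(0, NUM_FEATURES, (1,)) for _ in range(K_NEG)]   # sampled b_i^-

opt.zero_grad()
margin_ranking_loss(a_ids, b_ids, negs).backward()
opt.step()

# Cap the norm of every feature embedding at 1 after the update.
with torch.no_grad():
    emb.weight.div_(emb.weight.norm(dim=1, keepdim=True).clamp(min=1.0))
```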

## Observations

* The same model architecture is applied to a variety of tasks including multi-class classification, multi-label classification, collaborative filtering, content-based recommendation, link prediction, information retrieval, word embeddings and sentence embeddings.

* The model provides a strong baseline on all the tasks and performs on par with much more complicated, task-specific networks.

2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from a676f5 to 638645
