
Commit: move all docs/readmes/etc facebookresearch -> flashlight (flashlight#540)
Summary:
Pull Request resolved: flashlight#540

migration from facebookresearch into flashlight

Reviewed By: jacobkahn

Differential Revision: D27802384

fbshipit-source-id: 884a468581fe1cc1b5e26af1aebefdffaf7b8b94
Tatiana Likhomanenko authored and facebook-github-bot committed Apr 15, 2021
1 parent 7e56c1f commit e9628da
Showing 13 changed files with 49 additions and 49 deletions.
4 changes: 2 additions & 2 deletions .docker/README.md
@@ -1,7 +1,7 @@
Flashlight and its dependencies can also be built with the provided Dockerfiles. Both CUDA and CPU backends are supported with Docker. The current Docker images are frozen at **Ubuntu 18.04** and **CUDA 10.0**; we update these periodically.

## Docker images on [Docker Hub](https://hub.docker.com/r/flml/flashlight/tags)

Docker images for the CUDA and CPU backends for each Flashlight commit are [available on Docker Hub](https://hub.docker.com/r/flml/flashlight/tags).

### Running Flashlight with Docker
@@ -27,7 +27,7 @@ cd /root/flashlight/build && make test

Using the Dockerfiles in this directory:
```shell
git clone --recursive https://github.com/facebookresearch/flashlight.git
git clone --recursive https://github.com/flashlight/flashlight.git
cd flashlight
# for CUDA backend
sudo docker build -f .docker/Dockerfile-CUDA -t flashlight .
4 changes: 2 additions & 2 deletions .github/workflows/docker_image_build.yml
@@ -5,7 +5,7 @@ on:
- master
jobs:
cuda_image_build:
if: github.repository_owner == 'facebookresearch'
if: github.repository_owner == 'flashlight'
name: CUDA image build
runs-on: ubuntu-latest
steps:
@@ -26,7 +26,7 @@ jobs:
- name: Docker logout
run: docker logout
cpu_image_build:
if: github.repository_owner == 'facebookresearch'
if: github.repository_owner == 'flashlight'
name: CPU image build
runs-on: ubuntu-latest
steps:
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -17,7 +17,7 @@ All contributors must sign the CLA for their pull requests to be eligible for me
You can find the CLA [here](https://code.facebook.com/cla).

## Issues
We use [GitHub issues](https://github.com/facebookresearch/flashlight/issues) to track public bugs. When filing a bug, please make sure your description is clear and includes sufficient instructions to reproduce the issue (for instance, your OS, compiler version, and selected backend).
We use [GitHub issues](https://github.com/flashlight/flashlight/issues) to track public bugs. When filing a bug, please make sure your description is clear and includes sufficient instructions to reproduce the issue (for instance, your OS, compiler version, and selected backend).

## License
By contributing to flashlight, you agree that your contributions will be licensed
14 changes: 7 additions & 7 deletions README.md
@@ -6,9 +6,9 @@
| [**Installation**](#building-and-installing)
| [**Documentation**](https://fl.readthedocs.io/en/latest/)

[![CircleCI](https://circleci.com/gh/facebookresearch/flashlight.svg?style=shield)](https://circleci.com/gh/facebookresearch/flashlight)
[![CircleCI](https://circleci.com/gh/flashlight/flashlight.svg?style=shield)](https://app.circleci.com/pipelines/github/flashlight/flashlight)
[![Documentation Status](https://img.shields.io/readthedocs/fl.svg)](https://fl.readthedocs.io/en/latest/)
[![Docker Image Build Status](https://img.shields.io/github/workflow/status/facebookresearch/flashlight/Publish%20Docker%20images?label=docker%20image%20build)](https://hub.docker.com/r/flml/flashlight/tags)
[![Docker Image Build Status](https://img.shields.io/github/workflow/status/flashlight/flashlight/Publish%20Docker%20images?label=docker%20image%20build)](https://hub.docker.com/r/flml/flashlight/tags)
[![Join the chat at https://gitter.im/flashlight-ml/community](https://img.shields.io/gitter/room/flashlight-ml/community)](https://gitter.im/flashlight-ml/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

[![Docker Image for CUDA backend](https://img.shields.io/docker/image-size/flml/flashlight/cuda-latest?label=docker%20%28cuda%29&logo=docker)](https://hub.docker.com/r/flml/flashlight/tags?page=1&ordering=last_updated&name=cuda-latest)
@@ -26,8 +26,8 @@ tensor library.
- CUDA and CPU backends for GPU and CPU training.
- An emphasis on efficiency and scale.

Native support in C++ and simple extensibility makes Flashlight a powerful research framework that's *hackable to its core* and enables fast iteration on new experimental setups and algorithms without sacrificing performance. In a single repository, Flashlight provides [apps](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app) for research across multiple domains:
- [Automatic speech recognition](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr) (the [wav2letter](https://github.com/facebookresearch/wav2letter/) project) — [Documentation](flashlight/app/asr) | [Tutorial](flashlight/app/asr/tutorial)
Native support in C++ and simple extensibility makes Flashlight a powerful research framework that's *hackable to its core* and enables fast iteration on new experimental setups and algorithms without sacrificing performance. In a single repository, Flashlight provides [apps](https://github.com/flashlight/flashlight/tree/master/flashlight/app) for research across multiple domains:
- [Automatic speech recognition](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr) (the [wav2letter](https://github.com/flashlight/wav2letter/) project) — [Documentation](flashlight/app/asr) | [Tutorial](flashlight/app/asr/tutorial)
- [Image classification](flashlight/app/imgclass)
- [Object detection](flashlight/app/objdet)
- [Language modeling](flashlight/app/lm)
@@ -188,7 +188,7 @@ To build the Flashlight CPU backend from source using dependencies installed wit
##### Build Using the `vcpkg` Toolchain File
To build Flashlight from source with these dependencies, clone the repository:
```shell
git clone https://github.com/facebookresearch/flashlight.git && cd flashlight
git clone https://github.com/flashlight/flashlight.git && cd flashlight
mkdir -p build && cd build
```
Then, build from source using `vcpkg`'s [CMake toolchain](https://github.com/microsoft/vcpkg/blob/master/docs/users/integration.md#cmake-toolchain-file-recommended-for-open-source-cmake-projects):
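A minimal invocation might look like the following sketch; the toolchain-file path assumes `vcpkg` was cloned to `~/vcpkg`, so adjust it to your checkout:

```shell
# Configure against vcpkg-installed dependencies (toolchain path is an assumption)
cmake .. -DCMAKE_TOOLCHAIN_FILE=~/vcpkg/scripts/buildsystems/vcpkg.cmake
make -j$(nproc)
```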
@@ -209,7 +209,7 @@ Some dependencies marked below are downloaded and installed automatically if not

**Once all dependencies are installed**, clone the repository:
```shell
git clone https://github.com/facebookresearch/flashlight.git && cd flashlight
git clone https://github.com/flashlight/flashlight.git && cd flashlight
mkdir -p build && cd build
```
Then build all Flashlight components with:
@@ -224,7 +224,7 @@ To build a smaller subset of Flashlight features/apps, see the [build options](#

To install Flashlight in a custom directory, use CMake's [`CMAKE_INSTALL_PREFIX`](https://cmake.org/cmake/help/v3.10/variable/CMAKE_INSTALL_PREFIX.html) argument. Flashlight libraries can be built as shared libraries using CMake's [`BUILD_SHARED_LIBS`](https://cmake.org/cmake/help/v3.10/variable/BUILD_SHARED_LIBS.html) argument.
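As a quick sketch (the install prefix below is purely illustrative):

```shell
# Install Flashlight into a custom prefix, building shared libraries
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/flashlight-install -DBUILD_SHARED_LIBS=ON
make -j$(nproc) && make install
```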

Flashlight uses modern CMake and `IMPORTED` targets for most dependencies. If a dependency isn't found, passing `-D<package>_DIR` to your `cmake` command or exporting `<package>_DIR` as an environment variable equal to the path to `<package>Config.cmake` can help locate dependencies on your system. See [the documentation](https://cmake.org/cmake/help/v3.10/command/find_package.html) for more details. If CMake is failing to locate a package, check to see if a corresponding [issue](https://github.com/facebookresearch/flashlight/issues) has already been created before creating your own.
Flashlight uses modern CMake and `IMPORTED` targets for most dependencies. If a dependency isn't found, passing `-D<package>_DIR` to your `cmake` command or exporting `<package>_DIR` as an environment variable equal to the path to `<package>Config.cmake` can help locate dependencies on your system. See [the documentation](https://cmake.org/cmake/help/v3.10/command/find_package.html) for more details. If CMake is failing to locate a package, check to see if a corresponding [issue](https://github.com/flashlight/flashlight/issues) has already been created before creating your own.
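For instance, if CMake fails to locate ArrayFire (one of Flashlight's dependencies), a hint like the following may help; the path shown is an assumption for illustration:

```shell
# Point CMake at the directory containing ArrayFireConfig.cmake (illustrative path)
cmake .. -DArrayFire_DIR=/opt/arrayfire/share/ArrayFire/cmake
# or, equivalently, via an environment variable
export ArrayFire_DIR=/opt/arrayfire/share/ArrayFire/cmake
cmake ..
```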

#### Dependencies

4 changes: 2 additions & 2 deletions bindings/python/README.md
@@ -150,7 +150,7 @@ where ``ntokens`` is the number of tokens predicted for each frame (number of cl
### Beam-search decoder
Currently, the lexicon-based and lexicon-free beam-search decoders are supported for CTC/ASG models only (seq2seq models are not supported). Only n-gram (KenLM) language models are supported by the Python bindings;
however, one can define a custom language model in Python and use it for decoding; see the details below.
For a better understanding of how this beam-search decoder works, please see the [Beam-search decoder section](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr#beam-search-decoders).
For a better understanding of how this beam-search decoder works, please see the [Beam-search decoder section](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr#beam-search-decoders).

To run the decoder, one should first define its options:
```python
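# The snippet below is a sketch of defining decoder options; the class and
# parameter names are assumptions based on the flashlight Python bindings
# and may differ slightly across versions.
from flashlight.lib.text.decoder import CriterionType, LexiconDecoderOptions

options = LexiconDecoderOptions(
    beam_size=100,            # number of hypotheses kept per frame
    beam_size_token=25,       # number of tokens considered per hypothesis
    beam_threshold=100.0,     # pruning threshold on hypothesis scores
    lm_weight=2.0,            # language model weight
    word_score=2.0,           # score added when a word is completed
    unk_score=float("-inf"),  # score for unknown words (disallow them)
    sil_score=0.0,            # score for the silence token
    log_add=False,            # merge hypotheses with max instead of logadd
    criterion_type=CriterionType.CTC,
)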
@@ -182,7 +182,7 @@ To run decoder one first should define its options:

Then we should prepare the tokens dictionary (the tokens for which the acoustic model
returns a probability for each frame) and the lexicon (the mapping between words and their spellings in the tokens set).
For details on the tokens and lexicon file formats, have a look at [Data Preparation](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr#data-preparation).
For details on the tokens and lexicon file formats, have a look at [Data Preparation](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr#data-preparation).

```python
from flashlight.lib.text.dictionary import Dictionary, load_words, create_word_dict
12 changes: 6 additions & 6 deletions flashlight/app/README.md
@@ -1,10 +1,10 @@
# Flashlight Applications

Flashlight application libraries are domain-specific libraries built on top of the [flashlight core](https://github.com/facebookresearch/flashlight/tree/master/flashlight/fl) and [flashlight lib](https://github.com/facebookresearch/flashlight/tree/master/flashlight/lib). They provide lightweight, unopinionated pipelines and tools that are easily modifiable for training or inference across tasks. Below are the supported applications; new applications are under active development.
Flashlight application libraries are domain-specific libraries built on top of the [flashlight core](https://github.com/flashlight/flashlight/tree/master/flashlight/fl) and [flashlight lib](https://github.com/flashlight/flashlight/tree/master/flashlight/lib). They provide lightweight, unopinionated pipelines and tools that are easily modifiable for training or inference across tasks. Below are the supported applications; new applications are under active development.

### [Automatic Speech Recognition](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr) — `asr` (the [wav2letter](https://github.com/facebookresearch/wav2letter/) Project)
### [Automatic Speech Recognition](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr) — `asr` (the [wav2letter](https://github.com/flashlight/wav2letter/) Project)

The `asr` application provides tools for audio processing/augmentation, acoustic model training, beam search decoding, and preprocessing/preparing audio data for use. Full documentation for usage and binaries [can be found here](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr).
The `asr` application provides tools for audio processing/augmentation, acoustic model training, beam search decoding, and preprocessing/preparing audio data for use. Full documentation for usage and binaries [can be found here](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr).

#### Provided Artifacts:
- Binaries:
@@ -15,16 +15,16 @@
- `fl_asr_tutorial_inference_ctc`
- `fl_asr_tutorial_finetune_ctc`

### [Language Modeling](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/lm) — `lm`
### [Language Modeling](https://github.com/flashlight/flashlight/tree/master/flashlight/app/lm) — `lm`

The `lm` application provides tools for text preprocessing and language model training for both auto-regressive and BERT-style models. Full documentation for usage and binaries [can be found here](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/lm).
The `lm` application provides tools for text preprocessing and language model training for both auto-regressive and BERT-style models. Full documentation for usage and binaries [can be found here](https://github.com/flashlight/flashlight/tree/master/flashlight/app/lm).

#### Provided Artifacts:
- Binaries:
- `fl_lm_dictionary_builder`
- `fl_lm_train`

### [Image Classification](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/imgclass) — `imgclass`
### [Image Classification](https://github.com/flashlight/flashlight/tree/master/flashlight/app/imgclass) — `imgclass`

The `imgclass` application is still in early, active development. It currently provides dataset abstractions for ImageNet and an example training pipeline for `Resnet34`, which can be easily extended to more complex setups.

10 changes: 5 additions & 5 deletions flashlight/app/asr/README.md
@@ -1,8 +1,8 @@
# Automatic Speech Recognition (ASR)

Flashlight's ASR application (formerly the [wav2letter](https://github.com/facebookresearch/wav2letter/) project) provides training and inference capabilities for end-to-end speech recognition systems. Outside of original research conducted with Flashlight and wav2letter, the codebase contains up-to-date implementations of recent architectures and developments in the speech domain.
Flashlight's ASR application (formerly the [wav2letter](https://github.com/flashlight/wav2letter/) project) provides training and inference capabilities for end-to-end speech recognition systems. Outside of original research conducted with Flashlight and wav2letter, the codebase contains up-to-date implementations of recent architectures and developments in the speech domain.

**To get started using the ASR library with existing/pre-trained models, see [tutorials](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr/tutorial).**
**To get started using the ASR library with existing/pre-trained models, see [tutorials](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr/tutorial).**

### Table of Contents

@@ -407,7 +407,7 @@ epoch: 6 | nupdates: 1000 | lr: 0.000469 | lrcriterion: 0.000000
where we report epochs, number of updates, learning rates, timing, loss and WER/LER for train and validation data.
#### Flags
We give a short description of some of the more important flags here. A complete list of the flag definitions and short descriptions of their meaning can be found [here](https://github.com/facebookresearch/flashlight/blob/master/flashlight/app/asr/common/Defines.cpp).
We give a short description of some of the more important flags here. A complete list of the flag definitions and short descriptions of their meaning can be found [here](https://github.com/flashlight/flashlight/blob/master/flashlight/app/asr/common/Defines.cpp).
The `datadir` flag is the base path to where all the `train` and `valid` dataset list files live. Every `train` path will be prefixed by `datadir`. Multiple datasets can be passed to `train` and `valid` as a comma-separated list.
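As an illustrative sketch (the dataset paths and list-file names below are assumptions), a training invocation with multiple comma-separated datasets might look like:

```shell
# Every list path in --train/--valid is prefixed by --datadir (file names are illustrative)
fl_asr_train train \
  --datadir=/data/librispeech \
  --train=lists/train-clean-100.lst,lists/train-other-500.lst \
  --valid=lists/dev-clean.lst
```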
@@ -504,7 +504,7 @@ root → w → o → r → l → d ([world])
root → p → i → n → e ([pine]) → a → p → p → l → e → ([pineapple])
```

**Note** At most 6 words with the same spelling are supported; any further words with that spelling will be ignored during inference. If your lexicon contains more than 6 words with the same spelling, you need to [update this constant](https://github.com/facebookresearch/flashlight/blob/master/flashlight/lib/text/decoder/Trie.h#L17).
**Note** At most 6 words with the same spelling are supported; any further words with that spelling will be ignored during inference. If your lexicon contains more than 6 words with the same spelling, you need to [update this constant](https://github.com/flashlight/flashlight/blob/master/flashlight/lib/text/decoder/Trie.h#L17).

##### 1.2 Lexicon-free beam-search decoder (`uselexicon=false`)
The lexicon-free beam-search decoder considers any possible token as a candidate, and there is no notion of words during decoding. In this case, a word separator should be set via `wordseparator` and included in the token set for AM training. The word separator is treated and predicted like all other tokens. After obtaining the transcription in tokens, the word separator is used to split the sequence into words. Usually, when word-pieces are used as target units, the word separator can be part of a token; to correctly handle this case, set `--usewordpiece=true`.
@@ -529,7 +529,7 @@ Currently we are supporting decoding with the following language models: ZeroLM,

A **KenLM** language model can be trained standalone with the [KenLM library](https://kheafield.com/code/kenlm/). The text data should be prepared consistently with the acoustic model data: for example, for a word-level LM, if your AM token set doesn't contain punctuation, remove all punctuation from the data. For token-level LM training, words should first be split into token sequences, and the LM then trained on this data so that it predicts probabilities for tokens (not words). Both `.arpa` and binarized `.bin` LMs can be used; however, it is recommended to convert arpa files to the [binary format](https://github.com/kpu/kenlm#querying) for faster loading.
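For example, a word-level 4-gram KenLM model can be trained and binarized roughly as follows (the file names are placeholders; `lmplz` and `build_binary` ship with KenLM):

```shell
# Train a 4-gram LM on preprocessed text, then binarize it for faster loading
lmplz -o 4 < lm_train_text.txt > lm.arpa
build_binary lm.arpa lm.bin
```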

**ConvLM** models are convolutional neural networks. They are currently trained with [fairseq](https://github.com/pytorch/fairseq) and then converted into flashlight-serializable models ([example](https://github.com/facebookresearch/wav2letter/blob/master/recipes/lexicon_free/librispeech/convert_convlm.sh) of how we do this) so that they can be loaded. `lm_vocab` should be specified, as it is the dictionary mapping tokens to indices used in ConvLM training. Note that this token set is usually different from the one used in AM training. Each line of this file is a single token (char, word, word-piece, etc.), and a token's index is exactly its line number.
**ConvLM** models are convolutional neural networks. They are currently trained with [fairseq](https://github.com/pytorch/fairseq) and then converted into flashlight-serializable models ([example](https://github.com/flashlight/wav2letter/blob/master/recipes/lexicon_free/librispeech/convert_convlm.sh) of how we do this) so that they can be loaded. `lm_vocab` should be specified, as it is the dictionary mapping tokens to indices used in ConvLM training. Note that this token set is usually different from the one used in AM training. Each line of this file is a single token (char, word, word-piece, etc.), and a token's index is exactly its line number.

To decode efficiently with ConvLM, whose forward pass is expensive, we use a dynamic cache that holds the probabilities over all tokens given the candidates generated from the previous frame. This way, when proposing new candidates, we can simply check the cache for pre-computed LM scores. In other words, we only need to run the ConvLM forward pass in batches at the end of decoding each frame, once all possible new candidates have been gathered. The batching and caching thus greatly reduce the total number of forward passes.
