Add ruby bindings #83

Open: wants to merge 99 commits into base: main

Commits
6c49a41
refactor
eisber Feb 8, 2023
09dada4
ported loading to Rust
eisber Feb 9, 2023
4c6afcb
fixed?
eisber Feb 9, 2023
502a6ec
Propagate errors from split whitespace in bpe_merges
jackgerrits Feb 9, 2023
c685d99
moving registry into rust
eisber Feb 9, 2023
6dcfa91
return ref
jackgerrits Feb 9, 2023
25306a5
split into multiple libs
eisber Feb 10, 2023
ba69bd9
fixed core
eisber Feb 10, 2023
c98f824
Fix python setup.py for package in workspace
jackgerrits Feb 10, 2023
f248694
java bindings...
eisber Feb 10, 2023
2f371f6
more TODO
eisber Feb 10, 2023
1d3f707
Initial VERY rough impl of JNI layer
jackgerrits Feb 11, 2023
95aef95
Add wasm-bindgen, inline ranks
dqbd Feb 19, 2023
799df7f
v0.2.1
dqbd Feb 19, 2023
5a32b88
Update README.md, polish API
dqbd Feb 19, 2023
97d8dea
improve error handling
eisber Feb 22, 2023
3139d04
chore: add README
dqbd Feb 23, 2023
02d132e
feat: add option to extend special tokens and to provide custom bfe
dqbd Feb 23, 2023
46668df
Merge pull request #2 from dqbd/extendability
dqbd Feb 23, 2023
3a39f24
Bump version, update README
dqbd Feb 23, 2023
d0dc32b
Improve error handling, add support for parameters
dqbd Feb 23, 2023
90ee9f6
Fix `any` in TS files, add core tests
dqbd Feb 23, 2023
f7fe717
Update README.md, add tests, fix disallowed special bug
dqbd Feb 24, 2023
d38c936
Validate the values properly
dqbd Feb 24, 2023
53628a4
Improve error handling in JNI functions
jackgerrits Feb 24, 2023
87603e9
Bump version to 0.4.0
dqbd Feb 24, 2023
42548d8
Merge pull request #1 from eisber/error_handling
eisber Feb 27, 2023
e3ab3f6
moved config into json
eisber Feb 27, 2023
22584d4
add github action
eisber Feb 27, 2023
42efde4
add jar build
eisber Feb 27, 2023
484f15a
fix rust build
eisber Feb 27, 2023
6c45bf6
build jar
eisber Feb 27, 2023
b883d0c
fix path
eisber Feb 27, 2023
98070f1
fix java
eisber Feb 27, 2023
3b39f6b
Merge remote-tracking branch 'upstream/main' into main
eisber Feb 27, 2023
4e70da0
cleanup
eisber Feb 28, 2023
381d32b
update groupid
eisber Feb 28, 2023
f1560bd
remove comments
eisber Feb 28, 2023
01d4f9e
Move to separate js folder
dqbd Mar 1, 2023
4d8f9af
Merge branch 'eisber/main'
dqbd Mar 1, 2023
ef77b1a
Make sure JS builds
dqbd Mar 1, 2023
b04f0cf
Attempt to fix sdist
dqbd Mar 2, 2023
bbcb591
Match sdist
dqbd Mar 2, 2023
d1c4af2
Remove the _js suffix
dqbd Mar 2, 2023
33207e6
Update to newer build wheels
dqbd Mar 2, 2023
2370284
Fix wrong result for None
dqbd Mar 5, 2023
98ac953
Add CI step to build and test
dqbd Mar 5, 2023
d989c22
CI: install initialize wasm-pack
dqbd Mar 5, 2023
01bf979
Fix Python CI
dqbd Mar 5, 2023
82cd413
Replace yum with apt
dqbd Mar 5, 2023
7db26cb
Try yum once again
dqbd Mar 5, 2023
08403a1
debug ci
dqbd Mar 5, 2023
12805e3
Try CI again
dqbd Mar 5, 2023
ae30c13
Fix CI again
dqbd Mar 5, 2023
dabb296
invalid `-y` command
dqbd Mar 5, 2023
8e40682
Revert "debug ci"
dqbd Mar 5, 2023
4866cf9
Optimize performace, enable/disable features of core
dqbd Mar 5, 2023
33bb13d
Add custom initialisation, bundler changes
dqbd Mar 7, 2023
03cdbb3
Flatten the structure
dqbd Mar 8, 2023
7c8cd78
Run WASM build sequentially in CI for now
dqbd Mar 8, 2023
fcb52cf
Rename TiktokenEmbedding to TiktokenEncoding
dqbd Mar 8, 2023
945d4f2
Reverse order of default
dqbd Mar 8, 2023
9ea421b
Merge pull request #12 from dqbd/async-init-bundler
dqbd Mar 8, 2023
680fbc5
Update README.md
dqbd Mar 8, 2023
7c75b04
Merge remote-tracking branch 'upstream/main'
dqbd Mar 8, 2023
c1d11fb
Add gpt-3.5-turbo support in types and matchers
dqbd Mar 8, 2023
268bc5c
Add README.md
dqbd Mar 8, 2023
9f9ad0d
Merge pull request #13 from dqbd/gpt-3.5-turbo
dqbd Mar 8, 2023
15dd0f2
Add caveats for CFW
dqbd Mar 10, 2023
2c7e0e4
Create a lite build which defers loading of weights to consumers
dqbd Mar 11, 2023
586d205
Fix README.md
dqbd Mar 11, 2023
6d8c1dc
Cleanup
dqbd Mar 11, 2023
64178e5
Expose loading script
dqbd Mar 11, 2023
17dd0ba
Add polyfill for Buffer.from
dqbd Mar 11, 2023
4f6745f
Merge pull request #15 from dqbd/lite-build
dqbd Mar 11, 2023
e6f0726
Fix exports for CJS (Node ESM)
dqbd Mar 11, 2023
1ee26c3
Remove bundler, as it is unnecessary
dqbd Mar 11, 2023
8984ea7
Add disclaimer for CFW
dqbd Mar 11, 2023
9c8caec
Add support for GPT-4
dqbd Mar 14, 2023
103d010
Expose model_to_encoding.json and registry.json
dqbd Mar 15, 2023
79ac036
Bump to 1.0.0-alpha.5
dqbd Mar 15, 2023
f748754
Fix lite crash, add README.md
dqbd Mar 15, 2023
bfe3817
Update README.md
dqbd Mar 15, 2023
9847f4d
Bump to 1.0.0-alpha.6
dqbd Mar 15, 2023
c01af19
Add JSON importable modules
dqbd Mar 15, 2023
23bb57d
Bump to 1.0.0-alpha.7
dqbd Mar 15, 2023
a3baa6c
Bump to 1.0.0-alpha.8
dqbd Mar 15, 2023
6ac1a1a
Compress ranks
dqbd Mar 15, 2023
7efba72
Use compressed version to make main WASM smaller
dqbd Mar 15, 2023
4d4b921
Bump to 1.0.0-alpha.10
dqbd Mar 15, 2023
efe3728
Update README.md
dqbd Mar 15, 2023
8476eca
Bump to 1.0.0
dqbd Mar 15, 2023
e1c4313
Fix issues with duplicate initialization
dqbd Mar 16, 2023
c4dfaad
Bump to 1.0.1
dqbd Mar 16, 2023
68efa86
Clarifies usage
dqbd Mar 16, 2023
481fb45
Bump to 1.0.2
dqbd Mar 16, 2023
02794bd
Seems like it works..
arjun810 Mar 20, 2023
f1d8f3c
Add workflow to build ruby gem, add missing lib folder.
arjun810 Mar 20, 2023
c16c310
Fix compilation issues by removing OpenSSL and using the right linker
arjun810 Mar 21, 2023
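Commits 09dada4 ("ported loading to Rust") and 502a6ec ("Propagate errors from split whitespace in bpe_merges") move parsing into the Rust core and replace panics with returned errors. The parsing code itself is not among the files shown here, so the following is only a sketch under assumptions: it reads a GPT-2 style vocab.bpe merges file (a #version header line, then one space-separated pair per line), and the names parse_bpe_merges and MergeParseError are hypothetical.

```rust
/// Hypothetical error type for a malformed merges line (1-based line numbers).
#[derive(Debug, PartialEq)]
pub enum MergeParseError {
    /// The line had only one token instead of a pair.
    MissingSecond { line: usize },
    /// The line had more than two whitespace-separated tokens.
    TrailingPart { line: usize },
}

/// Parses "first second" merge pairs, skipping the "#version" header line and
/// blank lines, and returns an error instead of panicking on malformed input.
pub fn parse_bpe_merges(contents: &str) -> Result<Vec<(String, String)>, MergeParseError> {
    let mut merges = Vec::new();
    // skip(1) drops the "#version: ..." header line.
    for (idx, line) in contents.lines().enumerate().skip(1) {
        let line_no = idx + 1;
        let mut parts = line.split_whitespace();
        // A blank line yields no parts at all; skip it.
        let first = match parts.next() {
            Some(p) => p,
            None => continue,
        };
        // Propagate a missing second half as an error rather than unwrapping.
        let second = parts
            .next()
            .ok_or(MergeParseError::MissingSecond { line: line_no })?;
        if parts.next().is_some() {
            return Err(MergeParseError::TrailingPart { line: line_no });
        }
        merges.push((first.to_string(), second.to_string()));
    }
    Ok(merges)
}
```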
107 changes: 107 additions & 0 deletions .github/workflows/build_ruby_gem.yml
@@ -0,0 +1,107 @@
name: Build ruby gem
on: workflow_dispatch
env:
  CACHE_VERSION: "v0"
  CARGO_CACHE_CLEAN: "true"
  RUBY_VERSIONS: "3.2"
jobs:
  release:
    strategy:
      fail-fast: false
      matrix:
        include:
          - platform: x86_64-linux
            target: x86_64-unknown-linux-gnu
          - platform: x86_64-linux-musl
            target: x86_64-unknown-linux-musl
          - platform: aarch64-linux
            target: aarch64-unknown-linux-gnu
          - platform: x86_64-darwin
            target: x86_64-apple-darwin
            # Rust uses external command to strip symbols and debuginfo on Mac
            # Do not do for arm64 since it interferes with code signing
            # and codesign binary is not present to re-sign
            setup: sudo ln -s /opt/osxcross/target/bin/x86_64-apple-darwin-strip /usr/local/bin/strip
          - platform: arm64-darwin
            target: aarch64-apple-darwin
          # - platform: x64-mingw-ucrt
          #   target: x86_64-pc-windows-gnu
          # - platform: x64-mingw32
          #   target: x86_64-pc-windows-gnu
    runs-on: ubuntu-latest
    name: ${{ matrix.platform }}
    steps:
      - uses: actions/checkout@v3
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: 3.2
      - name: generate rank
        run: bundle install && bundle exec rake rank
        working-directory: "ruby"

      # Didn't use https://github.com/oxidize-rb/actions/tree/main/cross-gem due to weird directory structure, but this code is adapted from there.
      - name: Configure environment
        run: |
          : Configure environment
          echo "RB_SYS_DOCK_UID=$(id -u)" >> $GITHUB_ENV
          echo "RB_SYS_DOCK_GID=$(id -g)" >> $GITHUB_ENV
          rb_sys_dock_cache_dir="$HOME/.cache/rb-sys-dock"
          mkdir -p "$rb_sys_dock_cache_dir"
          echo "RB_SYS_DOCK_CACHE_DIR=$rb_sys_dock_cache_dir" >> $GITHUB_ENV
      - name: Setup caching
        uses: actions/cache@v3
        with:
          path: |
            ${{ env.RB_SYS_DOCK_CACHE_DIR }}
            ${{ github.workspace }}/ruby/tmp/rb-sys-dock/${{ matrix.platform }}/target
          key: rb-sys-dock-${{ env.CACHE_VERSION }}-${{ matrix.platform }}-${{ hashFiles('**/Gemfile.lock', '**/Cargo.lock') }}
          restore-keys: |
            rb-sys-dock-${{ env.CACHE_VERSION }}-${{ matrix.platform }}-
      - name: Install cargo-cache
        uses: oxidize-rb/actions/cargo-binstall@v1
        id: install-cargo-cache
        if: env.CARGO_CACHE_CLEAN == 'true'
        with:
          crate: cargo-cache
          version: 0.8.3
          strategies: quick-install

      - name: Clean the cargo cache
        if: env.CARGO_CACHE_CLEAN == 'true'
        uses: oxidize-rb/actions/post-run@v1
        with:
          run: cargo-cache --autoclean
          cwd: ${{ github.workspace }}

      #- name: Start SSH session
      #  uses: luchihoratiu/debug-via-ssh@main
      #  with:
      #    NGROK_AUTH_TOKEN: ${{ secrets.NGROK_AUTH_TOKEN }}
      #    SSH_PASS: ${{ secrets.SSH_PASS }}

      - name: Build gem
        env:
          INPUT_RUBY_VERSIONS: "${{ env.RUBY_VERSIONS }}"
          INPUT_PLATFORM: "${{ matrix.platform }}"
        run: |
          : Compile gem
          set -x
          args=()
          args+=("--platform")
          args+=("$INPUT_PLATFORM")
          if [ "$INPUT_RUBY_VERSIONS" != "default" ]; then
            args+=("--ruby-versions")
            args+=("$INPUT_RUBY_VERSIONS")
          fi
          BUNDLE_GEMFILE=ruby/Gemfile bundle exec rb-sys-dock "${args[@]}" --build -- "cd ruby && bundle install && export CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc && env | grep CARGO"

      - name: Set outputs
        id: set-outputs
        run: |
          : Set output
          echo "gem-path=$(find ${{ github.workspace }}/ruby/pkg -name '*-${{ matrix.platform }}.gem')" >> $GITHUB_OUTPUT

      - uses: actions/upload-artifact@v3
        with:
          name: cross-gem
          path: ${{ steps.set-outputs.outputs.gem-path }}
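The matrix pairs each RubyGems platform name with a Rust target triple, and the Build gem step passes --ruby-versions to rb-sys-dock only when RUBY_VERSIONS is not the literal string default. As a sketch of that logic only (the helper names target_for_platform and rb_sys_dock_args are hypothetical and not part of rb-sys-dock; the commented-out Windows entries are left out):

```rust
/// Rust target triple for each RubyGems platform in the workflow matrix.
/// Returns None for platforms the matrix does not build (e.g. the commented-out Windows ones).
pub fn target_for_platform(platform: &str) -> Option<&'static str> {
    match platform {
        "x86_64-linux" => Some("x86_64-unknown-linux-gnu"),
        "x86_64-linux-musl" => Some("x86_64-unknown-linux-musl"),
        "aarch64-linux" => Some("aarch64-unknown-linux-gnu"),
        "x86_64-darwin" => Some("x86_64-apple-darwin"),
        "arm64-darwin" => Some("aarch64-apple-darwin"),
        _ => None,
    }
}

/// Mirrors the args array built in the "Build gem" step: the platform is
/// always passed, --ruby-versions only when the value is not "default".
pub fn rb_sys_dock_args(platform: &str, ruby_versions: &str) -> Vec<String> {
    let mut args = vec!["--platform".to_string(), platform.to_string()];
    if ruby_versions != "default" {
        args.push("--ruby-versions".to_string());
        args.push(ruby_versions.to_string());
    }
    args
}
```

Note that none of the steps shown read matrix.target or matrix.setup; only matrix.platform is passed through to rb-sys-dock and the artifact lookup.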
53 changes: 0 additions & 53 deletions .github/workflows/build_wheels.yml

This file was deleted.

4 changes: 4 additions & 0 deletions .gitignore
@@ -40,3 +40,7 @@ htmlcov
 
 Cargo.lock
 target/
+
+# WASM
+ranks/
+node_modules
30 changes: 13 additions & 17 deletions Cargo.toml
@@ -1,21 +1,17 @@
-[package]
-name = "tiktoken"
-version = "0.3.0"
-edition = "2021"
-rust-version = "1.57.0"
+[workspace]
 
-[lib]
-name = "_tiktoken"
-crate-type = ["cdylib"]
-
-[dependencies]
-pyo3 = { version = "0.17.3", features = ["extension-module"] }
-
-# tiktoken dependencies
-fancy-regex = "0.10.0"
-regex = "1.7.0"
-rustc-hash = "1.1.0"
-bstr = "1.0.1"
+members = [
+    "core",
+    "jni",
+    "js",
+    "python",
+    "ruby",
+]
 
 [profile.release]
 incremental = true
+opt-level = 's' # Optimize for size
+lto = true # Enable link-time optimization
+codegen-units = 1 # Reduce number of codegen units to increase optimizations
+panic = 'abort' # Abort on panic
+strip = true # Strip symbols from binary*
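The release profile sets panic = 'abort', so a panic inside the shared core aborts the host process (for example the Ruby VM or the JVM) instead of unwinding into the binding layer; commits such as "improve error handling" and "Improve error handling in JNI functions" move toward returning errors instead. A minimal sketch of that pattern, assuming a hypothetical core lookup modeled on tiktoken's model-to-encoding table (CoreError, encoding_for_model, and encoding_for_model_binding are illustrative names, not this PR's API):

```rust
use std::fmt;

/// Hypothetical error type returned by the shared core.
#[derive(Debug, PartialEq)]
pub enum CoreError {
    UnknownModel(String),
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CoreError::UnknownModel(name) => write!(f, "unknown model: {}", name),
        }
    }
}

impl std::error::Error for CoreError {}

/// Hypothetical core lookup; the entries follow tiktoken's model-to-encoding mapping.
pub fn encoding_for_model(model: &str) -> Result<&'static str, CoreError> {
    match model {
        "gpt-4" | "gpt-3.5-turbo" => Ok("cl100k_base"),
        "text-davinci-003" => Ok("p50k_base"),
        "gpt2" => Ok("gpt2"),
        other => Err(CoreError::UnknownModel(other.to_string())),
    }
}

/// Binding-layer shim: never unwraps, so an unknown model becomes an error
/// value the host language can raise as an exception, not a process abort.
pub fn encoding_for_model_binding(model: &str) -> Result<String, String> {
    encoding_for_model(model)
        .map(str::to_string)
        .map_err(|e| e.to_string())
}
```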
7 changes: 6 additions & 1 deletion MANIFEST.in
@@ -5,4 +5,9 @@ include Makefile
 global-include py.typed
 recursive-include scripts *.py
 recursive-include tests *.py
-recursive-include src *.rs
+recursive-include core *.rs *.toml
+recursive-include python *.rs *.toml
+recursive-exclude jni *
+recursive-exclude java *
+recursive-exclude js *
+include tiktoken *.json
3 changes: 1 addition & 2 deletions README.md
@@ -78,7 +78,7 @@ Layout your project like this, making sure to omit the `tiktoken_ext/__init__.py
```
my_tiktoken_extension
├── tiktoken_ext
│   └── my_encodings.py
└── my_encodings.py
└── setup.py
```

@@ -101,4 +101,3 @@ setup(

Then simply `pip install ./my_tiktoken_extension` and you should be able to use your
custom encodings! Make sure **not** to use an editable install.

26 changes: 26 additions & 0 deletions core/Cargo.toml
@@ -0,0 +1,26 @@
[package]
name = "tiktoken_core"
version = "0.3.0"
edition = "2021"
rust-version = "1.57.0"

[lib]
name = "_tiktoken_core"
crate-type = ["lib"]

[dependencies]
# tiktoken dependencies
fancy-regex = "0.10.0"
regex = "1.7.0"
rustc-hash = "1.1.0"
bstr = "1.0.1"
reqwest = { version = "0.11.14", features = ["rustls-tls", "blocking"], default-features = false }
sha1 = "0.10.5"
json = "0.12.4"
base64 = "0.21.0"
lazy_static = "1.4.0"

[features]
default = []
lazyload = []
multithreading = []
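The core crate declares lazyload and multithreading as empty Cargo features; the diff does not show what code they gate. Assuming lazyload is meant to defer building an encoding's rank table until first use, a minimal sketch could look like the following. It swaps std::sync::OnceLock in for the lazy_static dependency (OnceLock needs Rust 1.70, newer than the crate's declared rust-version of 1.57), and Encoding, load_ranks, and the placeholder rank data are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical rank table: token bytes -> rank.
pub type Ranks = HashMap<Vec<u8>, u32>;

/// Hypothetical encoding handle whose rank table is built at most once.
pub struct Encoding {
    name: &'static str,
    ranks: OnceLock<Ranks>,
}

impl Encoding {
    pub fn new(name: &'static str) -> Self {
        let enc = Encoding { name, ranks: OnceLock::new() };
        // cfg! is false unless built with `--features lazyload`; in that
        // default case the loading cost is paid up front.
        if !cfg!(feature = "lazyload") {
            enc.ranks();
        }
        enc
    }

    /// Whether the rank table has been built yet.
    pub fn is_loaded(&self) -> bool {
        self.ranks.get().is_some()
    }

    /// Returns the rank table, building it on first access only.
    pub fn ranks(&self) -> &Ranks {
        self.ranks.get_or_init(|| load_ranks(self.name))
    }
}

/// Placeholder loader: a real one would read (or download and verify) a rank file.
fn load_ranks(_name: &str) -> Ranks {
    // Placeholder data: one entry per single byte, ranked by byte value.
    (0u8..=255).map(|b| (vec![b], b as u32)).collect()
}
```

With lazyload enabled, Encoding::new returns immediately and the first call to ranks() builds the table; OnceLock guarantees that only one thread runs the loader even if several call ranks() concurrently.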