[Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） #158

PickledChair · 2022-06-11T16:11:43Z

内容

yukarin_sa_forward と decode_forward の実装を目指します。
~~現在、yukarin_sa_forward が推論に失敗するので、draft にしています。~~
現在、decode_forward の実装に取り組み中なので、draft にしています。

入力テンソルのうち、length のみが次元０であるため、引数は複数の次元のテンソルが混在する Vec となっています。そのためにテンソルの型を ArrayD<i64> にして混在できるようにしてみました~~が、それではうまくいかないようです~~。これでうまくいきました……！

関連 Issue

ref #128

…re/implements-yukarin_s_forward

…re/implements-yukarin_sa_forward-and-decode_forward

crates/voicevox_core/src/internal.rs

Co-authored-by: Hiroshiba <[email protected]>

PickledChair · 2022-06-12T11:23:20Z

yukarin_sa_forward については、speaker_id を tensor として渡し忘れるという凡ミスをしていただけでした。修正したところ、うまく機能するようになりました（~~後ほど修正を push します~~ 修正を反映しました！）。ビルドしたコアを用いて voicevox_engine を起動し、audio_query api を叩くと期待通りに JSON を取得できるようになりました。テストも後ほど追加しようと思います。

一方、decode_forward でも同様のミスをしていたので修正しようと思ったのですが、speaker_id だけ tensor の要素の型が i64 なので（f0 と phoneme の場合は f32）、yukarin_sa_forward の場合と同じ方法では Vec を作れませんでした。異なる要素の型が混在している tensor の Vec が作れるのかどうかについて rust-ndarray/ndarray#651 なども参考にしたりして情報を集めているのですが、どのように実装するべきなのか（今度こそ onnxruntime-rs に迂回的な API を追加するしかないのかどうかも含めて）まだ調査中です。もし実装にあたって有用な情報があれば共有していただけると本当に助かります……！

PickledChair · 2022-06-12T11:28:06Z

あるいは、本 PR は yukarin_sa_forward の実装のみにする方が良いのかもしれません（張り切りすぎて、decode_forward と合わせて２つとも実装しようとしてしまっていました🙇）

Hiroshiba · 2022-06-12T12:35:37Z

speaker_id だけ tensor の要素の型が i64 なので
Vec を作れませんでした

こちら、型が合わないからRustのVecが作れないということですよね。
･･･Vec作るの無理な気がしますね！

Rustよくわかってないのですが、何かしらのIterなら作れるかも？
そうしたとしてもonnxruntime-rsのrunがVecを受け取る形式だから、Iter受け取る形式にしないとダメかも･･･
https://github.com/nbigaouette/onnxruntime-rs/blob/cd5a6ebae8abc06080f4da9a248be590fc6ce2fe/onnxruntime/src/session.rs#L378-L381

本 PR は yukarin_sa_forward の実装のみにする方が良いのかもしれません

コード量的にはどちらも一緒で良さそうなのですが、たしかにdecodeはむずかしそうなので分けても良さそうに感じました！

PickledChair · 2022-06-12T13:02:10Z

こちら、型が合わないからRustのVecが作れないということですよね。
･･･Vec作るの無理な気がしますね！

Rustよくわかってないのですが、何かしらのIterなら作れるかも？
そうしたとしてもonnxruntime-rsのrunがVecを受け取る形式だから、Iter受け取る形式にしないとダメかも･･･

onnxruntime-rs の Session::run のコードをもう少し詳しく眺めてみているのですが、やはり今回の decode_forward の問題の解決のために onnxruntime-rs に新しい API を追加しなければならないように見えます。どのような形にすべきかはちょっとまだよく分かりません……（引数をイテレータにするだけでは Array の要素の型を混在させることはできない気がします。Vec の要素の型として、Array に enum をかぶせたものを要素にするか、Box<dyn Arrayのための何かのトレイト> みたいなものを要素にするか、あるいは何か他の解決策を考えるか……）。

コード量的にはどちらも一緒で良さそうなのですが、たしかにdecodeはむずかしそうなので分けても良さそうに感じました！

直ちに解決するのは難しそうです。onnxruntime-rs 側の対応の仕方について @qwerty2501 さんにお聞きしてから、本 PR の方針を決められたらと思いました……！

qwerty2501 · 2022-06-12T13:27:56Z

@PickledChair ~~voicevox_core側で onnxruntime-rsの TypeToTensorElementDataType を実装したenumを定義してやれば目的を果たせそうに見えます~~
onnxruntime-rsで対応の必要あるかもです

qwerty2501 · 2022-06-12T13:37:10Z

いやenumだとだめかも・・・？

PickledChair · 2022-06-12T14:30:30Z

@qwerty2501

いやenumだとだめかも・・・？

ちょっとやってみようとしたのですが、tensor_element_data_type が引数に self を含まないので、パターンマッチで適切な TensorElementDataType を返すということができなさそうでした。

/// Trait used to map Rust types (for example `f32`) to ONNX types (for example `Float`)
pub trait TypeToTensorElementDataType {
    /// Return the ONNX type for a Rust type
    fn tensor_element_data_type() -> TensorElementDataType;

    /// If the type is `String`, returns `Some` with utf8 contents, else `None`.
    fn try_utf8_bytes(&self) -> Option<&[u8]>;
}

qwerty2501 · 2022-06-12T15:31:07Z

@PickledChair そうですね。なのでonnxruntimeを修正する必要はありそうです

Hiroshiba · 2022-06-12T16:20:05Z

なるほどです！　作戦は３つほどありそうです。

onnxruntime-sysを使う
- C APIなのでコードがちょっと複雑になってしまうかも
このPRを一旦マージし、 @qwerty2501 さんのonnxruntimeラッパーを修正して対応
- リポジトリの所持者の関係で @qwerty2501 さんにお願いするのが一番手っ取り早そう（お手数おかけします）
- か、修正PRを送っていただく
このPRを一旦マージし、VOICEVOX側で @qwerty2501 さんのonnxruntime-rsをフォークして修正して対応
- VOICEVOXにPRを送っていただく形
- 更に修正がある場合はPRとマージが必要になって速度が落ちてしまう

VOICEVOX的には、onnxruntimeの追従が簡単そうなので、意外と1番の方法が最も良いのかなと思っています。

2か3は、 @qwerty2501 さんさえよければ2の方がスピーディだと思いますが、お手数おかけしちゃうので、その場合は3が良いのかなと思っています。

PickledChair · 2022-06-12T16:31:27Z

「とりあえず音声合成に使用できるコア」を作るためには、今回の問題が最後のハードルになっていると思うので、2 か 3 でなんとか押し切れないかと感じているところです……！

qwerty2501 · 2022-06-12T17:55:01Z

1はrunのとこだけonnxruntime-sysで実装するといびつになるのでちょっと避けたほうが良さそうですね
いま治そうとしてみて入るのですが、内部の型が結構直さないといけないところがあるのでちょっと時間かかりそうです。
最終的にそちらにbranch移管するつもりですし、急ぐなら3でも良いかなとは思いますが結果的にはあまりかかる時間は川ならさそうです

Hiroshiba · 2022-06-12T17:59:35Z

なるほどです、お二人ともありがとうございます！

2か3に関しては、Rustに強く、かつonnxruntime-rsの地理にも詳しい @qwerty2501 さんがにお任せするのが一番速いかなと感じています。
お忙しくなければぜひお任せし、2で行かせて頂きたいです。

2にしろ3にしろ、一旦マージしちゃえばチェックもしやすいのかなと思います！

qwerty2501 · 2022-06-12T18:58:39Z

@PickledChair @Hiroshiba ちょっとonnxruntime-rsのラッパーの実装が高レベル過ぎて良くないと感じています。
今回はとりあえず動くことを目的として実装して将来的にはonnxruntime-sys部分を使用し、onnxruntime-rsのソースコードは捨ててラッパーを新しく実装するのが良いと感じるようになりました。(voivcevox_coreに実装するか、onnxruntime-rsに新しいラッパーとして実装し直すかは要検討)
そういう意味では二転三転してもうしわけないですが1も選択肢に入るかもです。(早く実装できるのであれば)
とりあえず2で進めてはいますが時間かかりそうなら1でもよいかもです

PickledChair · 2022-06-13T06:48:44Z

@qwerty2501

ちょっとonnxruntime-rsのラッパーの実装が高レベル過ぎて良くないと感じています。

これは本当にそうですね……。

最終的な形をどうするかは置いておくとして、最短で実装できそうな案を考えてみました。onnxruntime-rs に unsafe な薄い wrap 関数を追加する方法です。根本的な考え方の間違いがなければ、大体以下の３ステップで実現できる気がしました。

1. `g_ort().Run()` を薄く wrap する関数を `Session` のメソッドとして追加する

Session::run メソッドの処理のうち、主に以下の部分のみを切り出した薄い wrapper 関数を Session のメソッドとして定義する

https://github.com/qwerty2501/onnxruntime-rs/blob/dec3bcdd880cf69a86207166f60cc986d0494d09/onnxruntime/src/session.rs#L472-L485

2. `CreateTensorWithDataAsOrtValue` 関数の薄い wrapper 関数を `Session` のメソッドとして追加する

1. の関数を使用するために OrtValue を voicevox_core 側で作成する必要が出てくるので、そのための関数を onnxruntime-rs 側に用意する。主な処理は以下の部分とする。ただし、Session が持っている MemoryInfo を参照できるようにしなければならないので、Session のメソッドとして実装するのが最も早いと思われる（理想的には MemoryInfo を Session の外に持つように設計変更した方が良いかもしれないのですが、それだともう少し工数がかかりそう）。

https://github.com/qwerty2501/onnxruntime-rs/blob/dec3bcdd880cf69a86207166f60cc986d0494d09/onnxruntime/src/tensor/ort_tensor.rs#L76-L89

追記：一連の処理が終了した後にOrtValue のメモリ解放をするための関数も API として公開した方が良さそう？（以下のコードを参照）

https://github.com/qwerty2501/onnxruntime-rs/blob/dec3bcdd880cf69a86207166f60cc986d0494d09/onnxruntime/src/tensor/ort_tensor.rs#L163-L180

3. 上記の追加の API を用いて `voicevox_core` 側の処理を実装する

qwerty2501 · 2022-06-13T07:19:58Z

@PickledChair とりあえず今日頑張ってだめそうだったらその案にしようと思います

qwerty2501 · 2022-06-13T11:35:21Z

@PickledChair 修正したので 0.0.24 にupgradeしてみてください

使い方は example 参照してください

PickledChair · 2022-06-13T15:47:47Z

@qwerty2501

修正したので 0.0.24 にupgradeしてみてください

0.0.24 に upgrade し、新しい API を使って yukarin_s_foward および yukarin_sa_forward の書き直し、および decode_forward の実装を行いました。そしてビルドしたコアを用いて voicevox_engine で音声合成を行うことに成功しました！　すごく使いやすい実装ありがとうございます……！（自分には思いつかないやり方でした）

この後 test を追加したいと思います。また、エラーの定義の仕方・扱い方が自分でも微妙に見えるので、意見をいただければと思います……！

qwerty2501

この機能のテストコードは正直私もどう書いていいかわからないですが最低限呼び出してOkかえることは確認したほうがいいと思います

crates/voicevox_core/src/internal.rs

crates/voicevox_core/src/status.rs

crates/voicevox_core/src/internal.rs

Hiroshiba

LGTM！！

処理を共通化できそうな箇所は結構ありますが、一旦マージしちゃって動かしてみるのが有線かなと思います！

PickledChair · 2022-06-13T22:04:52Z

この機能のテストコードは正直私もどう書いていいかわからないですが最低限呼び出してOkかえることは確認したほうがいいと思います

了解です！　とりあえず入力（エンジンの方で引数を観察して得たもの）を渡して Ok が返ることを確認するようなテストを書きました。

crates/voicevox_core/src/internal.rs

qwerty2501

LGTM

…含む） (VOICEVOX#158) * add yukarin_s_forward implementation * use replace method * use std::slice::from_raw_parts_mut * change argument type of Internal::yukarin_s_forward * fix to use proper speaker_id * use assert_eq * add yukarin_sa_forward implementation * add decode_forward implementation * Update crates/voicevox_core/src/internal.rs Co-authored-by: Hiroshiba <[email protected]> * add tensor of speaker_id in yukarin_sa_forward * remove allow unused_variables attribute * update onnxruntime-rs version to 0.0.24, and implement decode_forward * change return value of *-forward * change the way to convert from OrtOwnedTensor to Vec * add tests of yukarin_sa_forward and decode_forward * trivial change Co-authored-by: Hiroshiba <[email protected]>

PickledChair added 10 commits June 11, 2022 06:24

add yukarin_s_forward implementation

fc313ff

Merge remote-tracking branch 'VOICEVOX/voicevox_core/rust' into featu…

dd8ef3a

…re/implements-yukarin_s_forward

use replace method

e652737

use std::slice::from_raw_parts_mut

9394d08

change argument type of Internal::yukarin_s_forward

5ef9395

fix to use proper speaker_id

defac32

use assert_eq

649f9f7

add yukarin_sa_forward implementation

4ef5e49

add decode_forward implementation

c72e572

Merge remote-tracking branch 'VOICEVOX/voicevox_core/rust' into featu…

d592d4f

…re/implements-yukarin_sa_forward-and-decode_forward

PickledChair commented Jun 11, 2022

View reviewed changes

crates/voicevox_core/src/internal.rs Outdated Show resolved Hide resolved

Update crates/voicevox_core/src/internal.rs

fd7bfd7

Co-authored-by: Hiroshiba <[email protected]>

PickledChair added 2 commits June 12, 2022 20:40

add tensor of speaker_id in yukarin_sa_forward

cef8d1e

remove allow unused_variables attribute

5925337

PickledChair mentioned this pull request Jun 12, 2022

コアの実装言語を C++ から Rust へ移行する #128

Closed

43 tasks

update onnxruntime-rs version to 0.0.24, and implement decode_forward

9bbde26

PickledChair marked this pull request as ready for review June 13, 2022 15:49

PickledChair changed the title ~~[Rust] yukarin_sa_forward と decode_forward を実装~~ [Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） Jun 13, 2022

qwerty2501 reviewed Jun 13, 2022

View reviewed changes

crates/voicevox_core/src/internal.rs Outdated Show resolved Hide resolved

crates/voicevox_core/src/status.rs Outdated Show resolved Hide resolved

crates/voicevox_core/src/internal.rs Outdated Show resolved Hide resolved

Hiroshiba approved these changes Jun 13, 2022

View reviewed changes

PickledChair added 4 commits June 14, 2022 05:10

change return value of *-forward

421f4e5

change the way to convert from OrtOwnedTensor to Vec

dccf593

add tests of yukarin_sa_forward and decode_forward

466857f

trivial change

2b1ead7

qwerty2501 reviewed Jun 14, 2022

View reviewed changes

crates/voicevox_core/src/internal.rs Show resolved Hide resolved

qwerty2501 approved these changes Jun 14, 2022

View reviewed changes

Hiroshiba merged commit 0ca5f00 into VOICEVOX:rust Jun 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） #158

[Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） #158

PickledChair commented Jun 11, 2022 •

edited

Loading

PickledChair commented Jun 12, 2022 •

edited

Loading

PickledChair commented Jun 12, 2022

Hiroshiba commented Jun 12, 2022

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022 •

edited

Loading

qwerty2501 commented Jun 12, 2022

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022

Hiroshiba commented Jun 12, 2022 •

edited

Loading

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022 •

edited

Loading

Hiroshiba commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022

PickledChair commented Jun 13, 2022 •

edited

Loading

qwerty2501 commented Jun 13, 2022

qwerty2501 commented Jun 13, 2022

PickledChair commented Jun 13, 2022 •

edited

Loading

qwerty2501 left a comment

Hiroshiba left a comment

PickledChair commented Jun 13, 2022 •

edited

Loading

qwerty2501 left a comment

[Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） #158

[Rust] yukarin_sa_forward と decode_forward を実装（yukarin_s_forward の再実装含む） #158

Conversation

PickledChair commented Jun 11, 2022 • edited Loading

内容

関連 Issue

PickledChair commented Jun 12, 2022 • edited Loading

PickledChair commented Jun 12, 2022

Hiroshiba commented Jun 12, 2022

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022 • edited Loading

qwerty2501 commented Jun 12, 2022

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022

Hiroshiba commented Jun 12, 2022 • edited Loading

PickledChair commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022 • edited Loading

Hiroshiba commented Jun 12, 2022

qwerty2501 commented Jun 12, 2022

PickledChair commented Jun 13, 2022 • edited Loading

1. g_ort().Run() を薄く wrap する関数を Session のメソッドとして追加する

2. CreateTensorWithDataAsOrtValue 関数の薄い wrapper 関数を Session のメソッドとして追加する

3. 上記の追加の API を用いて voicevox_core 側の処理を実装する

qwerty2501 commented Jun 13, 2022

qwerty2501 commented Jun 13, 2022

PickledChair commented Jun 13, 2022 • edited Loading

qwerty2501 left a comment

Choose a reason for hiding this comment

Hiroshiba left a comment

Choose a reason for hiding this comment

PickledChair commented Jun 13, 2022 • edited Loading

qwerty2501 left a comment

Choose a reason for hiding this comment

PickledChair commented Jun 11, 2022 •

edited

Loading

PickledChair commented Jun 12, 2022 •

edited

Loading

qwerty2501 commented Jun 12, 2022 •

edited

Loading

Hiroshiba commented Jun 12, 2022 •

edited

Loading

qwerty2501 commented Jun 12, 2022 •

edited

Loading

PickledChair commented Jun 13, 2022 •

edited

Loading

1. `g_ort().Run()` を薄く wrap する関数を `Session` のメソッドとして追加する

2. `CreateTensorWithDataAsOrtValue` 関数の薄い wrapper 関数を `Session` のメソッドとして追加する

3. 上記の追加の API を用いて `voicevox_core` 側の処理を実装する

PickledChair commented Jun 13, 2022 •

edited

Loading

PickledChair commented Jun 13, 2022 •

edited

Loading