vectorstores: fix pgvector issues and add more test #617

Merged on Mar 8, 2024 (1 commit)

Contributor

@Abirdcfly Abirdcfly commented Feb 20, 2024

  1. Add pgvector to the tests; updates #623 (Run vector stores tests on CI).
  2. Add the OPENAI_API_KEY and GENAI_API_KEY environment variables to the tests, as langchain-python does. (I don't know how langchain provides the OpenAI key in its tests: whether it is a maintainer's personal account token or depends on sponsorship.)
  3. Deprecate the pgvector table name Sanitize function; fixes #605 (pgvector WithEmbeddingTableName sanitization conflicts with index creation).
  4. Update the pgvector Search SQL and make TestDeduplicater run again.
  5. Add TestWithAllOptions to test all options.
  6. Update some tests, because StuffDocuments.joinDocuments ignores the document's metadata.
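
For readers unfamiliar with item 3, here is a minimal sketch of how quoting a table name too early can conflict with index creation. It assumes the index name is derived from the table name by appending a suffix. The helpers quoteIdent and indexName and the suffix _collection_id_idx are hypothetical illustrations, not the langchaingo code.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent quotes a PostgreSQL identifier: it wraps the name in double
// quotes and doubles any embedded double quote. Hypothetical helper.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// indexName builds the index name from the RAW table name first and
// quotes the result once. The suffix is an assumption for illustration.
func indexName(rawTable string) string {
	return quoteIdent(rawTable + "_collection_id_idx")
}

func main() {
	table := "my embeddings"

	// Appending the suffix to an already-quoted name leaves the suffix
	// outside the quotes, which is not a valid single identifier.
	broken := quoteIdent(table) + "_collection_id_idx"
	fmt.Println(broken) // "my embeddings"_collection_id_idx

	// Deriving from the raw name and quoting once avoids the problem.
	fmt.Println(indexName(table)) // "my embeddings_collection_id_idx"
}
```

The point of the sketch is the ordering: keep the raw name around, derive dependent names from it, and quote only when building the SQL text.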

Then run go test:

OPENAI_API_KEY=xxx OPENAI_BASE_URL=xxx GENAI_API_KEY=xxx go test -v ./...
=== RUN   TestPgvectorStoreRest
=== PAUSE TestPgvectorStoreRest
=== RUN   TestPgvectorStoreRestWithScoreThreshold
=== PAUSE TestPgvectorStoreRestWithScoreThreshold
=== RUN   TestSimilaritySearchWithInvalidScoreThreshold
=== PAUSE TestSimilaritySearchWithInvalidScoreThreshold
=== RUN   TestSimilaritySearchWithDifferentDimensions
=== PAUSE TestSimilaritySearchWithDifferentDimensions
=== RUN   TestPgvectorAsRetriever
=== PAUSE TestPgvectorAsRetriever
=== RUN   TestPgvectorAsRetrieverWithScoreThreshold
=== PAUSE TestPgvectorAsRetrieverWithScoreThreshold
=== RUN   TestPgvectorAsRetrieverWithMetadataFilterNotSelected
=== PAUSE TestPgvectorAsRetrieverWithMetadataFilterNotSelected
=== RUN   TestPgvectorAsRetrieverWithMetadataFilters
=== PAUSE TestPgvectorAsRetrieverWithMetadataFilters
=== RUN   TestDeduplicater
=== PAUSE TestDeduplicater
=== RUN   TestWithAllOptions
=== PAUSE TestWithAllOptions
=== CONT  TestPgvectorStoreRest
=== CONT  TestPgvectorAsRetrieverWithScoreThreshold
=== CONT  TestWithAllOptions
=== CONT  TestSimilaritySearchWithInvalidScoreThreshold
=== CONT  TestPgvectorAsRetriever
=== CONT  TestDeduplicater
=== CONT  TestPgvectorStoreRestWithScoreThreshold
=== CONT  TestSimilaritySearchWithDifferentDimensions
2024/03/06 13:47:10 github.com/testcontainers/testcontainers-go - Connected to docker:
  Server Version: 24.0.6
  API Version: 1.43
  Operating System: Docker Desktop
  Total Memory: 7956 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 13f06365a06d4a6a415ed5a066154b5357883abdabe05c845864839042cd388f
  Test ProcessID: 444ab8b8-ad8b-44b8-b53b-df6422b19a7a
2024/03/06 13:47:10 🐳 Creating container for image testcontainers/ryuk:0.6.0
2024/03/06 13:47:10 ✅ Container created: dbeab3d2e715
2024/03/06 13:47:10 🐳 Starting container: dbeab3d2e715
2024/03/06 13:47:10 ✅ Container started: dbeab3d2e715
2024/03/06 13:47:10 🚧 Waiting for container id dbeab3d2e715 image: testcontainers/ryuk:0.6.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:10 ✅ Container created: 549ae2b9af04
2024/03/06 13:47:10 🐳 Starting container: 549ae2b9af04
2024/03/06 13:47:10 ✅ Container created: a0e8189535ad
2024/03/06 13:47:10 🐳 Starting container: a0e8189535ad
2024/03/06 13:47:10 ✅ Container created: 48cbb93d9877
2024/03/06 13:47:10 🐳 Starting container: 48cbb93d9877
2024/03/06 13:47:10 ✅ Container created: c7dbcb837980
2024/03/06 13:47:10 ✅ Container created: 421a16cb2e3f
2024/03/06 13:47:10 🐳 Starting container: c7dbcb837980
2024/03/06 13:47:10 🐳 Starting container: 421a16cb2e3f
2024/03/06 13:47:10 ✅ Container created: 60692e0bf221
2024/03/06 13:47:10 🐳 Starting container: 60692e0bf221
2024/03/06 13:47:10 ✅ Container created: 297936256e1c
2024/03/06 13:47:10 🐳 Starting container: 297936256e1c
2024/03/06 13:47:10 ✅ Container created: 4b413a6d9260
2024/03/06 13:47:10 🐳 Starting container: 4b413a6d9260
2024/03/06 13:47:11 ✅ Container started: 549ae2b9af04
2024/03/06 13:47:11 🚧 Waiting for container id 549ae2b9af04 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000603198 Strategies:[0xc0002c5830]}
2024/03/06 13:47:11 ✅ Container started: a0e8189535ad
2024/03/06 13:47:11 🚧 Waiting for container id a0e8189535ad image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000700038 Strategies:[0xc00051e150]}
2024/03/06 13:47:11 ✅ Container started: 48cbb93d9877
2024/03/06 13:47:11 🚧 Waiting for container id 48cbb93d9877 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000700028 Strategies:[0xc00051e030]}
2024/03/06 13:47:11 ✅ Container started: 421a16cb2e3f
2024/03/06 13:47:11 🚧 Waiting for container id 421a16cb2e3f image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000700048 Strategies:[0xc00051e330]}
2024/03/06 13:47:11 ✅ Container started: 4b413a6d9260
2024/03/06 13:47:11 🚧 Waiting for container id 4b413a6d9260 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000700058 Strategies:[0xc00051e510]}
2024/03/06 13:47:11 ✅ Container started: c7dbcb837980
2024/03/06 13:47:11 🚧 Waiting for container id c7dbcb837980 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc00011e218 Strategies:[0xc00038e210]}
2024/03/06 13:47:11 ✅ Container started: 60692e0bf221
2024/03/06 13:47:11 🚧 Waiting for container id 60692e0bf221 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc00011e208 Strategies:[0xc00038e030]}
2024/03/06 13:47:11 ✅ Container started: 297936256e1c
2024/03/06 13:47:11 🚧 Waiting for container id 297936256e1c image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000221698 Strategies:[0xc0004040f0]}
2024/03/06 13:47:16 🐳 Terminating container: 421a16cb2e3f
2024/03/06 13:47:17 🚫 Container terminated: 421a16cb2e3f
--- PASS: TestDeduplicater (7.33s)
=== CONT  TestPgvectorAsRetrieverWithMetadataFilterNotSelected
2024/03/06 13:47:17 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:17 ✅ Container created: fb7aa2ff9b5d
2024/03/06 13:47:17 🐳 Starting container: fb7aa2ff9b5d
2024/03/06 13:47:17 ✅ Container started: fb7aa2ff9b5d
2024/03/06 13:47:17 🚧 Waiting for container id fb7aa2ff9b5d image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000614348 Strategies:[0xc0004b0870]}
2024/03/06 13:47:18 🐳 Terminating container: 549ae2b9af04
2024/03/06 13:47:19 🚫 Container terminated: 549ae2b9af04
--- PASS: TestPgvectorStoreRest (8.89s)
=== CONT  TestPgvectorAsRetrieverWithMetadataFilters
2024/03/06 13:47:19 🐳 Creating container for image docker.io/pgvector/pgvector:pg16
2024/03/06 13:47:19 ✅ Container created: b9db5ab709e7
2024/03/06 13:47:19 🐳 Starting container: b9db5ab709e7
2024/03/06 13:47:19 ✅ Container started: b9db5ab709e7
2024/03/06 13:47:19 🚧 Waiting for container id b9db5ab709e7 image: docker.io/pgvector/pgvector:pg16. Waiting for: &{timeout:<nil> deadline:0xc000614350 Strategies:[0xc00051f410]}
2024/03/06 13:47:19 🐳 Terminating container: c7dbcb837980
2024/03/06 13:47:20 🚫 Container terminated: c7dbcb837980
--- PASS: TestPgvectorAsRetriever (10.21s)
2024/03/06 13:47:20 🐳 Terminating container: 48cbb93d9877
2024/03/06 13:47:20 🚫 Container terminated: 48cbb93d9877
--- PASS: TestPgvectorAsRetrieverWithScoreThreshold (10.83s)
2024/03/06 13:47:20 🐳 Terminating container: a0e8189535ad
2024/03/06 13:47:21 🚫 Container terminated: a0e8189535ad
--- PASS: TestSimilaritySearchWithInvalidScoreThreshold (11.06s)
2024/03/06 13:47:21 🐳 Terminating container: 60692e0bf221
2024/03/06 13:47:21 🚫 Container terminated: 60692e0bf221
--- PASS: TestWithAllOptions (11.45s)
2024/03/06 13:47:24 🐳 Terminating container: 4b413a6d9260
2024/03/06 13:47:24 🚫 Container terminated: 4b413a6d9260
--- PASS: TestSimilaritySearchWithDifferentDimensions (14.49s)
2024/03/06 13:47:25 🐳 Terminating container: fb7aa2ff9b5d
2024/03/06 13:47:25 🐳 Terminating container: b9db5ab709e7
2024/03/06 13:47:25 🐳 Terminating container: 297936256e1c
2024/03/06 13:47:25 🚫 Container terminated: b9db5ab709e7
--- PASS: TestPgvectorAsRetrieverWithMetadataFilters (6.58s)
2024/03/06 13:47:25 🚫 Container terminated: fb7aa2ff9b5d
--- PASS: TestPgvectorAsRetrieverWithMetadataFilterNotSelected (8.25s)
2024/03/06 13:47:25 🚫 Container terminated: 297936256e1c
--- PASS: TestPgvectorStoreRestWithScoreThreshold (15.63s)
PASS
ok      github.com/tmc/langchaingo/vectorstores/pgvector        15.673s


OPENAI_API_KEY=xxx OPENAI_BASE_URL=xxx GENAI_API_KEY=xxx go test ./...
ok      github.com/tmc/langchaingo/vectorstores/pgvector        12.104s
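
The commands above pass the API keys through the environment. Below is a minimal sketch of how a test could detect missing keys and skip instead of failing, assuming that skipping is the desired behavior. missingEnv is a hypothetical helper, not part of langchaingo.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names of the given environment variables that
// are unset or empty. Hypothetical helper for illustration.
func missingEnv(keys ...string) []string {
	var missing []string
	for _, k := range keys {
		if os.Getenv(k) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// The variable names are the ones used in the commands above.
	if m := missingEnv("OPENAI_API_KEY", "GENAI_API_KEY"); len(m) > 0 {
		fmt.Println("skipping: missing environment variables:", m)
		return
	}
	fmt.Println("all keys present")
}
```

Inside a real test the same check would typically call t.Skipf with the list of missing names before any container is started.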

@Abirdcfly Abirdcfly marked this pull request as draft February 26, 2024 03:40
@Abirdcfly Abirdcfly marked this pull request as ready for review February 26, 2024 07:46
@Abirdcfly Abirdcfly changed the title from "vectorstores: fix pgvector indexName with custom embedding table name" to "vectorstores: fix pgvector issues and add more test" Feb 26, 2024
@tmc tmc force-pushed the main branch 2 times, most recently from bf89d0c to 22159ce February 27, 2024 02:36
@@ -36,12 +36,31 @@ jobs:
run: make build-examples
build-test:
runs-on: ubuntu-latest
services:
Contributor

@mdelapenya mdelapenya Mar 5, 2024

Given we now have testcontainers-go for that (see #519), I'd encourage using the existing module for Postgres + pgvector.

I'm happy to assist with it. It will improve the dev experience, since we'll be able to run locally the very same tests that CI runs.

Contributor Author

Done. Please take another look.

Contributor

Thanks! In only 15s the code runs 10 containers! 👏👏👏

As a follow-up, one container could be used, and each test would use its own snapshot: https://golang.testcontainers.org/modules/postgres/#using-snapshots

@Abirdcfly Abirdcfly force-pushed the index branch 2 times, most recently from 5ee8944 to 2861cd0 March 6, 2024 06:00
Contributor

@mdelapenya mdelapenya left a comment

LGTM

@mdelapenya
Contributor

@Abirdcfly I had a branch with some similar changes regarding pgvector: #648

If this PR gets in first, I'm happy to rebase mine, and if the other one gets in first, I'm happy to help here 🙋

Owner

@tmc tmc left a comment

Going ahead and merging #648, so this should be rebased given that.

1. add pgvector into test
2. add OPENAI_API_KEY and GENAI_API_KEY into test
3. deprecate pgvector table name Sanitize function
4. reset pgvector Search sql and make TestDeduplicater rerun
5. add test TestWithAllOptions for test all option
6. because of StuffDocuments.joinDocuments ignore document's metadata, update some tests

Signed-off-by: Abirdcfly <[email protected]>
@Abirdcfly
Contributor Author

Going ahead and merging #648, so this should be rebased given that.

Rebase done.

Owner

@tmc tmc left a comment

Thanks, great improvements.

@tmc tmc merged commit 9986fd3 into tmc:main Mar 8, 2024
3 checks passed
Successfully merging this pull request may close these issues.

pgvector WithEmbeddingTableName sanitization conflicts with index creation
4 participants