Feature/redbox 200 move sources to a separate field for chat endpoint responses #298
Conversation
```diff
-models.ManyToManyField(
-    blank=True, related_name="chat_histories", to="redbox_core.file"
-),
+models.ManyToManyField(blank=True, related_name="chat_histories", to="redbox_core.file"),
```
This is Ruff reformatting some unrelated files.
```diff
-processing_status = models.CharField(
-    choices=ProcessingStatusEnum.choices, null=False, blank=False
-)
+processing_status = models.CharField(choices=ProcessingStatusEnum.choices, null=False, blank=False)
```
ruff again
```diff
@@ -112,7 +112,7 @@ def simple_chat(chat_request: ChatRequest) -> ChatResponse:
     return ChatResponse(response_message=ChatMessage(text=response.text, role="ai"))


-@chat_app.post("/rag", tags=["chat"], response_model=ChatResponse)
+@chat_app.post("/rag", tags=["chat"])
```
Does the response model need to be returned?
Not if it's in the return annotation (below). Of course it's not doing any harm either -- happy to put it back?
redbox/models/chat.py (Outdated)

```python
question: str = Field(
    description="original question",
    examples=["Who is the prime minister?"],
)
```
Why do you need this? I think it makes things more complicated -- the frontend already has this information because it submitted it.
The frontend knows all of the documents that could have been used in RAG, but not the documents actually used. The point of this change is that the document references come back in a helpful structured format rather than (or as well as) the `Sources: <Doc37518e8f-f6af-4f5f-bdcd-49e2f1b65fc1> <Docf36e683f-0f9e-4573-b122-1f90b65d5ad1>` text format.
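As an illustrative sketch of the difference (class and field names here are assumptions, not the PR's actual schema): without a structured field the frontend has to regex-scrape doc UUIDs out of the message text, whereas a dedicated field hands them over ready to use.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Today the frontend would have to scrape UUIDs out of the message text:
TEXT = "The answer is X. Sources: <Doc37518e8f-f6af-4f5f-bdcd-49e2f1b65fc1>"
scraped = re.findall(r"<Doc([0-9a-f-]{36})>", TEXT)


# With a structured field, the references arrive ready to use.
# (These names are hypothetical, not the repo's actual models.)
@dataclass
class SourceDocument:
    file_uuid: str
    page_content: str = ""


@dataclass
class ChatResponse:
    response_text: str
    source_documents: list[SourceDocument] = field(default_factory=list)


resp = ChatResponse(
    response_text="The answer is X.",
    source_documents=[SourceDocument(file_uuid=scraped[0])],
)
print(json.dumps(asdict(resp), indent=2))
```

The structured version also survives the LLM mangling the `Sources:` footer, which the regex approach does not.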
I think I highlighted too much -- this is a point about the `question` param. The frontend submitted that.
Ahh, good point. Yes, let's drop it; it's unhelpful.
redbox/models/chat.py (Outdated)

```python
    description="original question",
    examples=["Who is the prime minister?"],
)
input_documents: Optional[list[InputDocuments]] = Field(
```
I've been using LangChain's own `Document` class. Is there a reason to roll our own? Pydantic v1 stuff?

ALSO: "input" might be misleading -- down the line we might have lots of file inputs, but the LLM only chooses certain chunks as sources for this response.
We absolutely could use the LangChain `Document` class, but it is very underspecified, i.e. the `metadata` is just a dictionary. But... I'm in two minds here. The point of this PR is to return structured data, yet we can't be sure LangChain is going to do this 🤔. What do you think @lmwilkigov?

> ALSO: "input" might be misleading -- down the line we might have lots of file inputs, but the LLM only chooses certain chunks as sources for this response.

This is what I was trying to say, but put much better! We could rename the field to `input_documents_used`?
Fair play on the `Document` point! I think `metadata` is underspecified because it can contain things like what tools were called by an agent -- it's deliberately a bit open-ended and extendable. If we want to be stricter, cool; it's helpful for the frontend.

On the name, I'd been going with `sources` to conceptually align with what it is to the user. I'm not married to it, but `input*` as a word is very unaligned with the frontend, because it'll have passed a bunch of inputs and this attribute is quite loosely coupled to what they'd have been.
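The trade-off can be sketched like this (plain dataclasses standing in for the real classes, to keep the snippet dependency-free; `LooseDocument` mimics the shape of LangChain's `Document`, and all field names on the strict side are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Any


# Mimics LangChain's Document shape: metadata is an open-ended dict,
# so nothing guarantees a file UUID is actually present.
@dataclass
class LooseDocument:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


# A stricter alternative: required, typed fields the frontend can rely on.
@dataclass
class StrictSource:
    page_content: str
    file_uuid: str  # required -- omitting it is a TypeError


loose = LooseDocument(page_content="some chunk text")  # fine, metadata empty
strict = StrictSource(
    page_content="some chunk text",
    file_uuid="37518e8f-f6af-4f5f-bdcd-49e2f1b65fc1",
)

try:
    StrictSource(page_content="no uuid supplied")  # type: ignore[call-arg]
    missing_allowed = True
except TypeError:
    missing_allowed = False
```

The open-ended dict is flexible for agents and tools; the strict model is a contract the frontend can code against.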
`sources` sounds good -- it is what appears in the text, after all.
Context
As a frontend, I want the backend to return as much information as possible, e.g. the UUID of the original file.
Changes proposed in this pull request
`ChatResponse` to return the full langchain response
Guidance to review
Relevant links
https://technologyprogramme.atlassian.net/browse/REDBOX-200
Things to check