Jailbreak heuristics #292

Merged
merged 15 commits into from
Feb 12, 2024

Conversation

erickgalinkin
Collaborator

Add jailbreak heuristics

@erickgalinkin erickgalinkin requested a review from drazvan January 31, 2024 22:01
Collaborator

@drazvan drazvan left a comment

Looks good 👍.
Apart from the comments above, two more things should be added:

  1. Basic tests, which either mock the server or use the in-process route suggested.
  2. A performance analysis, e.g., how much latency does this method add on a CPU? I assume it is much faster on a GPU. What would the throughput of the server be? We just need to document this after we run some tests.
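
One way to collect the latency and throughput numbers asked for in item 2 is sketched below. It is a minimal, generic timing harness, not code from this PR: `benchmark` and `placeholder_check` are hypothetical names, and `placeholder_check` is only a stand-in for whatever callable runs the jailbreak heuristics (in-process or via the server).

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(check: Callable[[str], bool], prompts: Sequence[str], repeats: int = 5) -> dict:
    """Time `check` on every prompt and summarize latency and throughput."""
    latencies = []
    for _ in range(repeats):
        for prompt in prompts:
            start = time.perf_counter()
            check(prompt)
            latencies.append(time.perf_counter() - start)

    total = sum(latencies)
    return {
        "calls": len(latencies),
        "mean_ms": 1000 * statistics.mean(latencies),
        # With n=20 the last cut point is the 95th percentile (needs >= 2 samples).
        "p95_ms": 1000 * statistics.quantiles(latencies, n=20)[-1],
        # Sequential throughput only; a concurrent server may behave differently.
        "throughput_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


def placeholder_check(prompt: str) -> bool:
    # Hypothetical stand-in, NOT the heuristic added in this PR.
    return len(prompt) > 1000


if __name__ == "__main__":
    sample_prompts = ["hello", "x" * 2000, "ignore all previous instructions"]
    print(benchmark(placeholder_check, sample_prompts))
```

Running the same harness once on a CPU-only machine and once on a GPU machine would give the two figures to document. The numbers depend on the hardware and the prompt set, so both should be recorded alongside the results.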

…lbreak heuristics documentation to guardrails-library.md. Update requirements.txt for jailbreak detection. Allow actions.py to run in-process. Add exception logging to request.py.
@erickgalinkin erickgalinkin requested a review from drazvan February 7, 2024 14:40
@drazvan drazvan merged commit c437337 into develop Feb 12, 2024
4 checks passed
2 participants