Centralizing Logs and Meeting Transcriptions for Easier Searchability #21
Comments
I had a conversation with Valeh yesterday about the usage of Fireflies. I am awaiting access to our account for transcription services for Zoom/Google Meet. When you say a two-phase approach, can you elaborate on your thoughts? I would think storing any sensitive information in GitHub would be a bad idea, even in a private repo. I suppose we could encrypt files with AES-256 or something similar and distribute the secret key internally amongst the team, but this still seems insecure should the key get leaked or lost, though perhaps I am misunderstanding. Do we have a template or prior experience with this that we can borrow from and expand on? |
@waymobetta if Google Meet or Zoom have any issues, let me know and we can look into a separate speech-to-text AI model like Whisper. |
@MichaelFrazzy None of them caught on because, as I understood it, they weren't providing a good transcript / value |
@zivkovicmilos makes sense, it likely wouldn't make all that much of a difference unless there are specific issues we are trying to solve. Whisper is 36 cents/hour and would allow me to make a bit of a closed loop instance (I think), but even that is a stretch if there are no open issues with the current system. |
I was speaking with others (@Ticojohnny, @ValehTehranchi, @MichaelFrazzy) about our current process for this and discovered that @ccomben currently handles most of the summarization of these conversations and turns them into more coherent thoughts within "The More You Gno" newsletter. We were thinking that Signal may not be the best platform for developer conversation, given that its inherent privacy features make what we are trying to do (collecting conversation history in an automated fashion) a bit tricky. I looked into a bot but have come to the conclusion that the easiest way to store the entire history is a manual process: copying and pasting the history myself, posting it into a private repo, having a review period for redactions to take place, and then ultimately moving it to the public meetings repo. This is inefficient and error-prone, however. This issue may warrant a bit more discussion before we get into the actual technical work of implementing this, as the effort put in may not yield the best return on time invested. I am happy to discuss ways of working around it to make this a reality, though I would ask that we discuss our approach further so we don't waste too much time determining the best strategy. |
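As a rough illustration of the review-then-publish flow described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `redact` and `prepare_for_public`, the placeholder strings, and the two regular expressions, which only catch email addresses and international-format phone numbers. It would not replace the human review period, it only shows where an automated pass could sit before the move to the public repo.

```python
import re

# Hypothetical patterns (not from this thread): deliberately narrow,
# so a human reviewer is still expected to read the text afterwards.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+\d[\d\s().-]{7,}\d")  # only numbers starting with "+"


def redact(text: str) -> str:
    """Replace email addresses and '+'-prefixed phone numbers with placeholders."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    return text


def prepare_for_public(private_lines: list[str], approved: bool) -> list[str]:
    """Phase two: return redacted lines only once a reviewer has approved.

    Phase one (storing the raw history privately) is assumed to have
    happened already; unapproved history yields nothing to publish.
    """
    if not approved:
        return []
    return [redact(line) for line in private_lines]


if __name__ == "__main__":
    history = ["ping me at dev@example.com", "call +1 555 123 4567 now"]
    for line in prepare_for_public(history, approved=True):
        print(line)
```

The approval flag is kept separate from the redaction step on purpose, so that nothing reaches the public output unless a person has signed off.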
@waymobetta great to know, thank you. The Signal bot does seem tricky. I just checked, and without an official Signal API we'd need to build it 100% from scratch and hope Signal doesn't block the request (or worst case my data scrapers). Otherwise we'd have to use a community REST API like this https://github.com/bbernhard/signal-cli-rest-api, but when it lists receiving messages as a feature, I'm not sure if it'd let us easily move that off Signal to a document. If we do end up with some type of automatic conversation transfer as a result of this discussion, I have the bones of a localized summarization AI model ready to go at least. Currently it summarizes conversation memory to inform future responses and expand the AI's context beyond token limits, but I could modify it to turn the summarizations themselves into the output. That part should be quick, but overall it'd be a pretty large task to build everything for Signal, so it's definitely worth considering other alternatives/platforms to cut down on the manual interpretation/transcription. |
Maybe less hacky:
I have an old laptop that could do the job. |
@moul If it lets us just pull from the local message database that's even better! Wasn't sure how far they took the privacy claims haha |
Just found an article detailing how this can be done: https://vmois.dev/query-signal-desktop-messages-sqlite/. Apparently the decryption key is stored in plaintext in |
No way, that's hilarious! How very secure of them. 🤣 I mean it makes sense to store it locally... but plaintext within json definitely should make life easier haha |
@MichaelFrazzy just a note, in Signal's defense: storing the encryption key separately from the database makes sense. If you're storing it as anything other than plaintext, either it's because you have a user passcode or password, or the further encryption is probably just a gimmick (because the desktop client needs to be able to decrypt just as easily). It's a gimmick because the "encryption" you could do is probably at best an AES encryption with a hardcoded key in the program's source; but if you're an attacker with half a brain, you know how to decompile the source and get the key in half an hour :) Really the "security" solution here on Signal's side would just be to run in a sandbox (i.e. Windows Store, Snap, for mac possibly the ios app store is sandboxed? been a long time since I used one). And then again, other applications can't access Signal's data only if they themselves are run in a sandbox. The fact that a userland program can access any other program's data is an unfortunate consequence of the programming and security models of all desktop OSes, which we are only now addressing with sandboxes like Snap and Windows Store. For mobile, luckily the first iPhone tackled this from day one with the App Store. |
I wonder why they elected not to use Mac's Keychain for storing the key. I don't know much about desktop app development so perhaps this is not possible, though I thought this was one of the purposes of Keychain. |
Bump. Background: Challenges:
Solutions:
|
Sounds great, this could likely be handled similarly to the other data collection bots. For meeting notes, it makes a ton of sense to just have a single database file. For the contribution logs, on the other hand, would you like us to store all contributions in a single .md doc, as opposed to one .md doc per repo, before a separate script then creates a centralized summary/profile database based on them? Otherwise I'm sure we could keep things to a single database/.md file; I'd just have to create different sections within that one log to separate raw data from summarized user profile data. Also, if we ever happen to have issues with Fireflies, I've had a lot of luck with Whisper. We should be able to combine either one with the AI being worked on too, if there is ever a reason to use it outside of meetings. |
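To make the "single database file" idea for meeting notes more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `messages`, its columns, and the helper functions `open_archive`, `add_message`, and `search` are all hypothetical choices for illustration, not an agreed schema. Search is a plain substring match with `LIKE`, which is case-insensitive for ASCII text in SQLite by default; a real archive might want proper full-text indexing instead.

```python
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the archive; ':memory:' keeps this sketch self-contained."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per message or transcript segment.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               platform TEXT NOT NULL,
               author TEXT NOT NULL,
               sent_at TEXT NOT NULL,
               body TEXT NOT NULL
           )"""
    )
    return conn


def add_message(conn, platform: str, author: str, sent_at: str, body: str) -> None:
    """Store one message; sent_at is assumed to be an ISO 8601 string."""
    conn.execute(
        "INSERT INTO messages (platform, author, sent_at, body) VALUES (?, ?, ?, ?)",
        (platform, author, sent_at, body),
    )
    conn.commit()


def search(conn, term: str) -> list[tuple]:
    """Return (platform, author, sent_at, body) rows whose body contains term."""
    rows = conn.execute(
        "SELECT platform, author, sent_at, body FROM messages "
        "WHERE body LIKE ? ORDER BY sent_at",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_archive()
    add_message(conn, "Zoom", "alice", "2023-05-01T10:00:00", "Should we try Whisper?")
    add_message(conn, "Discord", "bob", "2023-05-02T09:30:00", "Fireflies works for now.")
    for row in search(conn, "whisper"):
        print(row)
```

Keeping the platform as a column rather than a separate file per source is one way to let a single query span Signal, Discord, Google Meet, and Zoom history at once.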
Our aim is to aggregate and archive our dialogues from multiple platforms like Signal, Discord, Google Meet, Zoom, enhancing their accessibility and searchability.
For private chats, I recommend a two-phase approach to ensure sensitive information remains confidential before making them publicly available.
Please share your suggestions here.