Caption Helper is a simple tool designed to assist in creating, managing, and enhancing captions for images, particularly for use in Stable Diffusion training.
- Image upload: Drag and drop or select multiple images
- Caption editing: Manually edit captions for each image
- AI-powered caption enhancement:
  - Enhance: Improve existing captions
  - Extend: Add more details to captions
  - Interrogate: Generate new captions based on image content
- Image management: Delete unwanted images
- Image cropping: Adjust image framing
- Export: Save captions (and optionally images) as a ZIP file
- Clone the project or use the live version at https://sd-caption-helper.vercel.app/.
- Click the "Settings" button to add your API keys:
  - Groq API key (for the enhance and extend features)
  - OpenAI API key (for the interrogate feature)
- Drag and drop images onto the upload area, or click to select files from your device.
- Uploaded images will appear in the sidebar on the left.
- Click on an image in the sidebar to select it.
- The selected image will appear in the main view along with its caption.
- Edit the caption directly in the text box provided.
For each image, you can use the following AI-powered features:
- Enhance: Click to improve the existing caption.
- Extend: Click to add more details to the current caption.
- Interrogate: Click to generate a new caption based on the image content.
- Delete: Remove the current image from your collection.
- Crop: Adjust the framing of the current image.
Use the navigation buttons to move between images in your collection.
Click the "GPT Options" button to set:
- Custom Token: A specific token to include in generated captions.
- Custom Instruction: Additional instructions for the AI when generating captions.
- Inherent Attributes: Attributes to avoid in generated captions.
- Click the "Export" button.
- Choose whether to include images in the export.
- Optionally, choose to rename images sequentially and set a prefix.
- Click "Export" to download a ZIP file containing your captions (and images if selected).
This document provides technical details about the Caption Helper project, including setup instructions, architecture overview, and implementation details of key features.
- Next.js 13+ (App Router)
- React
- TypeScript
- NextUI for UI components
- Tailwind CSS for styling
- react-dropzone for file uploads
- react-easy-crop for image cropping
- JSZip for creating ZIP files
- Groq API for caption enhancement and extension
- OpenAI API (GPT-4o) for image interrogation
```
caption-helper/
📦app
┣ 📂api
┃ ┣ 📂gpt-interrogate
┃ ┃ ┗ 📜route.ts
┃ ┣ 📂groq-enhance
┃ ┃ ┗ 📜route.ts
┃ ┗ 📂groq-extend
┃ ┃ ┗ 📜route.ts
┣ 📂blank-page
┃ ┣ 📜layout.tsx
┃ ┗ 📜page.tsx
┣ 📜error.tsx
┣ 📜layout.tsx
┣ 📜page.tsx
┗ 📜providers.tsx
📦components
┣ 📜CaptionEditor.tsx
┣ 📜ExportOptionsModal.tsx
┣ 📜GptOptionsModal.tsx
┣ 📜ImageViewer.tsx
┣ 📜Navigation.tsx
┣ 📜Settings.tsx
┣ 📜Sidebar.tsx
┣ 📜icons.tsx
┣ 📜navbar.tsx
┣ 📜primitives.ts
┗ 📜theme-switch.tsx
📦config
┣ 📜fonts.ts
┗ 📜site.ts
📦lib
┣ 📜types.ts
┗ 📜utils.ts
```
- Clone the repository:

  ```bash
  git clone https://github.com/markuryy/caption-helper.git
  cd caption-helper
  ```

- Install dependencies:

  ```bash
  bun i
  ```

- Create a `.env.local` file in the root directory and add your API keys:

  ```
  GROQ_API_KEY=your_groq_api_key
  OPENAI_API_KEY=your_openai_api_key
  ```

- Run the development server:

  ```bash
  bun dev
  ```
- Uses `react-dropzone` for handling file uploads.
- Implemented in `app/page.tsx` within the `onDrop` function.
- Processes uploaded files using the `processUploadedFiles` utility function.
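For orientation, a minimal sketch of how a `react-dropzone` upload area can be wired up. The component and prop names here are illustrative; the project's actual `onDrop` lives in `app/page.tsx` and hands the files to `processUploadedFiles`:

```tsx
import { useCallback } from "react";
import { useDropzone } from "react-dropzone";

// Illustrative upload area, not the project's actual component.
function UploadArea({ onFiles }: { onFiles: (files: File[]) => void }) {
  // react-dropzone passes the accepted File objects to onDrop
  const onDrop = useCallback(
    (acceptedFiles: File[]) => onFiles(acceptedFiles),
    [onFiles],
  );

  const { getRootProps, getInputProps } = useDropzone({
    onDrop,
    accept: { "image/*": [] }, // only accept image files
  });

  return (
    <div {...getRootProps()}>
      <input {...getInputProps()} />
      <p>Drag and drop images here, or click to select files</p>
    </div>
  );
}
```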
- Implemented in the `CaptionEditor` component.
- Uses a controlled input for real-time updates.
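"Controlled input" means the caption lives in React state owned by the parent and flows down as a prop. A minimal sketch, assuming a plain `textarea` (the real component uses NextUI and additional props):

```tsx
interface CaptionEditorProps {
  caption: string;
  onChange: (value: string) => void;
}

// The parent owns the caption state, so every keystroke updates the
// selected image's caption immediately.
export function CaptionEditor({ caption, onChange }: CaptionEditorProps) {
  return (
    <textarea
      value={caption}
      onChange={(e) => onChange(e.target.value)}
      placeholder="Caption for the selected image..."
    />
  );
}
```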
- Implemented in `app/page.tsx` within the `handleCaptionAction` function.
- Uses separate API routes for each action:
  - `app/api/groq-enhance/route.ts` for enhancing captions
  - `app/api/groq-extend/route.ts` for extending captions
  - `app/api/gpt-interrogate/route.ts` for generating captions from images
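A rough sketch of how the client side can dispatch to these routes. The helper name and the request/response shape (a JSON body with a `caption` field) are assumptions, not the project's exact contract:

```ts
// Hypothetical helper: POST to the matching API route and return the new caption.
async function runCaptionAction(
  action: "enhance" | "extend" | "interrogate",
  payload: { caption?: string; image?: string },
): Promise<string> {
  const route =
    action === "interrogate" ? "/api/gpt-interrogate" : `/api/groq-${action}`;

  const res = await fetch(route, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Caption action failed: ${res.status}`);

  const data = await res.json();
  return data.caption as string;
}
```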
- Uses the `react-easy-crop` library.
- Implemented in the `ImageViewer` component.
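A minimal `react-easy-crop` setup looks roughly like the following. The fixed 1:1 aspect ratio and the component name are just examples; the real `ImageViewer` integrates this with the app's image state and writes the cropped result back:

```tsx
import { useCallback, useState } from "react";
import Cropper from "react-easy-crop";

type Area = { x: number; y: number; width: number; height: number };

// Illustrative crop view; the selected pixel area would then be drawn to a
// canvas to produce the cropped image.
function CropView({ src }: { src: string }) {
  const [crop, setCrop] = useState({ x: 0, y: 0 });
  const [zoom, setZoom] = useState(1);

  const onCropComplete = useCallback((_area: Area, areaPixels: Area) => {
    // areaPixels describes the selected region of the source image in pixels
    console.log("cropped region:", areaPixels);
  }, []);

  return (
    <div style={{ position: "relative", width: "100%", height: 400 }}>
      <Cropper
        image={src}
        crop={crop}
        zoom={zoom}
        aspect={1}
        onCropChange={setCrop}
        onZoomChange={setZoom}
        onCropComplete={onCropComplete}
      />
    </div>
  );
}
```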
- Implemented in `app/page.tsx` within the `handleExport` function.
- Uses `JSZip` to create ZIP files containing captions and optionally images.
- Export options are managed through the `ExportOptionsModal` component.
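As an illustration of the JSZip flow (the item shape and file naming are assumptions, and options such as sequential renaming are omitted), each image gets a sibling `.txt` file with the same base name, which matches the caption format expected by most Stable Diffusion training scripts:

```ts
import JSZip from "jszip";

// Sketch of the export flow: write a caption .txt per image, optionally add
// the image itself, then offer the ZIP as a download.
async function exportToZip(
  items: { name: string; caption: string; blob?: Blob }[],
  includeImages: boolean,
) {
  const zip = new JSZip();

  for (const item of items) {
    const base = item.name.replace(/\.[^.]+$/, "");
    zip.file(`${base}.txt`, item.caption);
    if (includeImages && item.blob) {
      zip.file(item.name, item.blob);
    }
  }

  const blob = await zip.generateAsync({ type: "blob" });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = "captions.zip";
  a.click();
  URL.revokeObjectURL(url);
}
```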
- Uses React's `useState` hook for local state management.
- Global state (such as API keys) is stored in localStorage and managed through the `Settings` component.
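Persisting a key to localStorage might look like this; the storage key name is an assumption, not the component's actual key:

```ts
// Hypothetical helpers for persisting an API key in the browser.
const GROQ_KEY = "groq-api-key"; // assumed storage key name

export function saveGroqKey(value: string): void {
  localStorage.setItem(GROQ_KEY, value);
}

export function loadGroqKey(): string {
  return localStorage.getItem(GROQ_KEY) ?? "";
}
```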
- Used for caption enhancement and extension.
- API calls are made from the server-side API routes to protect API keys.
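A sketch of what such a route handler can look like with the App Router. The model name, prompt, and response shape are placeholders, not the project's exact implementation; Groq exposes an OpenAI-compatible chat completions endpoint, and the key stays on the server:

```ts
// app/api/groq-enhance/route.ts (illustrative sketch)
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const { caption } = await req.json();

  // The API key is read from the server environment and never sent to the client.
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama3-70b-8192", // placeholder model name
      messages: [
        {
          role: "system",
          content:
            "Improve this image caption for Stable Diffusion training. Reply with the caption only.",
        },
        { role: "user", content: caption },
      ],
    }),
  });

  const data = await res.json();
  return NextResponse.json({ caption: data.choices[0].message.content });
}
```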
- Used for image interrogation (generating captions from images).
- Implemented in `app/api/gpt-interrogate/route.ts`.
- Uses the GPT-4o model for omni-modal capabilities.
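A rough sketch of the interrogation call using the official `openai` Node SDK; the prompt wording and the assumption that the client sends the image as a base64 data URL are illustrative:

```ts
// app/api/gpt-interrogate/route.ts (illustrative sketch)
import OpenAI from "openai";
import { NextResponse } from "next/server";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: Request) {
  // The client sends the (already downscaled) image as a data URL.
  const { image } = await req.json();

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image as a concise training caption." },
          { type: "image_url", image_url: { url: image } },
        ],
      },
    ],
  });

  return NextResponse.json({
    caption: completion.choices[0].message.content ?? "",
  });
}
```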
- Client-side image downscaling is implemented in `lib/utils.ts` using the `downscaleImage` function.
- This ensures that large images are properly handled when sent to the GPT-4o API.
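A canvas-based downscale can be sketched as follows; the maximum dimension, output format, and quality are assumptions, and the project's actual `downscaleImage` may differ:

```ts
// Illustrative client-side downscale: draw the image onto a smaller canvas
// and re-encode it as a JPEG data URL before sending it to the API.
export async function downscaleImage(
  dataUrl: string,
  maxSize = 1024,
): Promise<string> {
  const img = new Image();
  img.src = dataUrl;
  await img.decode();

  const scale = Math.min(1, maxSize / Math.max(img.width, img.height));
  const canvas = document.createElement("canvas");
  canvas.width = Math.round(img.width * scale);
  canvas.height = Math.round(img.height * scale);

  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("Canvas 2D context unavailable");
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);

  return canvas.toDataURL("image/jpeg", 0.9);
}
```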
- Uses a combination of NextUI components and Tailwind CSS for styling.
- Global styles are defined in `styles/globals.css`.
- Consider implementing server-side session storage for better state management across page reloads.
- Explore options for batch processing of images for more efficient handling of large collections.
- Implement user authentication to allow for saved projects and user-specific settings.
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with descriptive commit messages.
- Push your changes to your fork.
- Submit a pull request to the main repository.
Please ensure that your code follows the existing style conventions and includes appropriate tests.
- Use clear, descriptive captions that accurately represent the image content.
- Include relevant details but avoid overly specific or unique identifiers.
- Experiment with the AI enhancement features to generate diverse captions.
- Use the custom token and instruction features to tailor captions to your specific training needs.
For issues, feature requests, or contributions, please visit the GitHub repository.