Add initial support for Recursive Chunking (`RecursiveChunker`) #107

bhavnicksm · 2024-12-27T10:39:17Z

This pull request introduces a new `RecursiveChunker` class and makes several related updates to the codebase. `RecursiveChunker` chunks text hierarchically, applying a set of customizable rules. The PR also updates README.md, the package imports, and the refinery modules (base.py and overlap.py).

New Feature:

`RecursiveChunker` class: a new class that splits text hierarchically according to customizable rules, producing semantically meaningful chunks. It includes methods for splitting text, merging splits, and applying the rules recursively. (`src/chonkie/chunker/recursive.py`)
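The split / merge / recurse pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the code in `recursive.py`: the rule hierarchy `LEVELS`, the helper names `split_keep`, `merge_splits` and `recursive_chunk`, and the use of character length instead of token count are all hypothetical assumptions made for the example.

```python
from typing import List, Sequence

# Hypothetical rule hierarchy, coarsest first: paragraphs, sentences, words.
# Illustrative assumption only, not chonkie's default rules.
LEVELS: List[List[str]] = [["\n\n"], [". ", "! ", "? "], [" "]]


def split_keep(text: str, delimiters: Sequence[str]) -> List[str]:
    """Split after every delimiter, keeping it attached so pieces rejoin losslessly."""
    pieces: List[str] = []
    start = i = 0
    while i < len(text):
        for d in delimiters:
            if text.startswith(d, i):
                i += len(d)
                pieces.append(text[start:i])
                start = i
                break
        else:
            i += 1  # no delimiter starts here; advance one character
    if start < len(text):
        pieces.append(text[start:])
    return pieces


def merge_splits(splits: List[str], chunk_size: int) -> List[str]:
    """Greedily join adjacent splits while the result stays within chunk_size."""
    merged: List[str] = []
    current = ""
    for s in splits:
        if current and len(current) + len(s) > chunk_size:
            merged.append(current)
            current = s
        else:
            current += s
    if current:
        merged.append(current)
    return merged


def recursive_chunk(text: str, chunk_size: int,
                    levels: List[List[str]] = LEVELS, depth: int = 0) -> List[str]:
    """Chunk text so every piece has at most chunk_size characters (sizes in chars, an assumption)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if depth >= len(levels):
        # Rules exhausted: fall back to fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    splits = split_keep(text, levels[depth])
    if len(splits) <= 1:
        # This level made no progress; go one level finer so the recursion always terminates.
        return recursive_chunk(text, chunk_size, levels, depth + 1)
    chunks: List[str] = []
    for piece in merge_splits(splits, chunk_size):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            # A single split was still too large; refine it with the next rule.
            chunks.extend(recursive_chunk(piece, chunk_size, levels, depth + 1))
    return chunks
```

For example, `recursive_chunk("First para. It has two sentences.\n\nSecond para is here.", 25)` first splits on the paragraph break, then splits the oversized first paragraph at the sentence boundary. Moving to a finer level whenever a rule yields no split is one simple way to rule out the kind of infinite loop the "[fix] Infinite loop issue" commit refers to, though the PR may guard against it differently.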

Documentation Updates:

README.md: Added `RecursiveChunker` to the list of available chunkers and changed the citation format to BibTeX.

Import and Export Adjustments:

`__init__.py` files: Updated the import statements to include `RecursiveChunker` and its associated types, so the new class is exported from the package. (`src/chonkie/__init__.py`, `src/chonkie/chunker/__init__.py`)

Refinery Enhancements:

base.py: Added a `refine_batch` method for refining batches of chunks, and updated `__call__` so it handles both a single set of chunks and a batch. (`src/chonkie/refinery/base.py`)
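One way to implement a `__call__` that accepts either a single set of chunks or a batch is to dispatch on the shape of the input. The sketch below assumes that a batch is a list of per-document chunk lists; the `Chunk` dataclass and the `StripRefinery` subclass are hypothetical stand-ins, and the actual checks in `base.py` may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import List, Union


@dataclass
class Chunk:
    """Hypothetical minimal stand-in for a chunk object."""
    text: str


class BaseRefinery(ABC):
    @abstractmethod
    def refine(self, chunks: List[Chunk]) -> List[Chunk]:
        """Refine the chunks of a single document."""

    def refine_batch(self, batch: List[List[Chunk]]) -> List[List[Chunk]]:
        """Refine many documents by applying refine() to each one."""
        return [self.refine(chunks) for chunks in batch]

    def __call__(self, chunks: Union[List[Chunk], List[List[Chunk]]]):
        # Assumption: a non-empty list whose first element is itself a list is a batch.
        if chunks and isinstance(chunks[0], list):
            return self.refine_batch(chunks)
        return self.refine(chunks)


class StripRefinery(BaseRefinery):
    """Hypothetical refinery that trims surrounding whitespace from each chunk."""

    def refine(self, chunks: List[Chunk]) -> List[Chunk]:
        return [replace(c, text=c.text.strip()) for c in chunks]
```

With this design, subclasses only implement `refine`; batch support comes for free from the base class.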

Other Refinements:

overlap.py: Improved token handling by introducing `_AVG_CHAR_PER_TOKEN` and updating methods to use this constant for more accurate token estimates. (`src/chonkie/refinery/overlap.py`)
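A constant such as `_AVG_CHAR_PER_TOKEN` is typically used to estimate token counts from character counts, or to decide how much trailing text to tokenize when building an overlap context. The sketch below is an assumption about that kind of usage, not the code in `overlap.py`. The value 4 is an illustrative rule of thumb for English with BPE-style tokenizers, and the helpers `estimate_tokens` and `context_window` are hypothetical.

```python
import math

# Illustrative assumption: about 4 characters per token.
# The constant defined in overlap.py may use a different value.
_AVG_CHAR_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Cheap token-count estimate that avoids running a tokenizer."""
    return math.ceil(len(text) / _AVG_CHAR_PER_TOKEN)


def context_window(text: str, context_tokens: int) -> str:
    """Return a trailing slice of text expected to hold about context_tokens tokens.

    Tokenizing only this slice, instead of the whole previous chunk, keeps
    overlap computation cheap; exact trimming can then use the real tokenizer.
    """
    n_chars = context_tokens * _AVG_CHAR_PER_TOKEN
    # Guard n_chars == 0: text[-0:] would return the whole string.
    return text[-n_chars:] if n_chars > 0 else ""
```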

bhavnicksm added 11 commits December 27, 2024 01:10

cd253f2 · Add initial implementation of RecursiveChunker
968dc11 · [fix] Infinite loop issue
d085c49 · [fix] Whitespace splitting + sub-sentence splitting
9f2a518 · Add __str__ and __repr__ to RecursiveChunker
9cb61a8 · [minor] Add values to the _recursive_chunk call
005416b · Shift recursive types to types.py
1f0b701 · Add test cases for recursive chunker
0afa633 · Add RecursiveChunker and associated dataclasses to __init__
701dd6f · Add better error messages for RecursiveLevel
7c2587e · Add an introduction statement about RecursiveChunker
933e7ff · [minor] Add bibtex label

bhavnicksm merged commit 3f9632a into development Dec 27, 2024
1 check passed

bhavnicksm deleted the add-recursive branch December 27, 2024 10:44
