Semantic Chunking
Semantic chunking is a method of splitting documents into smaller chunks by analyzing the semantic similarity between text segments using embeddings. It uses the chonkie library to identify natural breakpoints where the semantic meaning changes significantly, based on a configurable similarity threshold. This preserves context and meaning better than fixed-size chunking: semantically related content stays together in the same chunk, while splits occur at meaningful topic transitions.
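A minimal usage sketch is shown below. The import paths, class names, and the `chunk()` method are assumptions based on this page's parameters, not a verified API.

```python
# Sketch only: import paths and method names below are assumptions.
from agno.document import Document  # assumed path
from agno.document.chunking.semantic import SemanticChunking  # assumed path

# Uses the defaults listed under Params (OpenAIEmbedder, chunk_size=5000,
# similarity_threshold=0.5).
chunking = SemanticChunking()

doc = Document(content="Long text that covers several distinct topics ...")

# Split at points where the semantic similarity between segments drops
# below the configured threshold.
chunks = chunking.chunk(doc)

for chunk in chunks:
    print(len(chunk.content))
```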
Params
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `embedder` | `Embedder` | `OpenAIEmbedder` | The embedder to use for semantic chunking. |
| `chunk_size` | `int` | `5000` | The maximum size of each chunk. |
| `similarity_threshold` | `float` | `0.5` | The similarity threshold for determining chunk boundaries. |
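The defaults above can be overridden when constructing the strategy. The sketch below assumes the embedder import path shown; the parameter names come from the table.

```python
# Sketch only: import paths are assumptions.
from agno.document.chunking.semantic import SemanticChunking  # assumed path
from agno.embedder.openai import OpenAIEmbedder  # assumed path

chunking = SemanticChunking(
    embedder=OpenAIEmbedder(),   # embedder used to compare text segments
    chunk_size=2000,             # maximum size of each chunk
    similarity_threshold=0.6,    # threshold used to decide chunk boundaries
)
```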