Chunk

The Chunk model is closely related to the File model. It stores the contents of a file as a series of chunks, so that Large Language Models can process the data in smaller sections.

The embedding field is used to store the text embedding of the chunk, which is crucial to the vector search functionality.
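To illustrate why the embedding matters for vector search, here is a minimal sketch of ranking chunks by cosine similarity to a query embedding. It is not the redbox search implementation, which this page does not describe; the function names `cosine_similarity` and `rank_chunks`, and the use of plain Python lists for embeddings, are assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_embedding: list[float],
    chunk_embeddings: dict[str, list[float]],
) -> list[str]:
    """Return chunk ids ordered from most to least similar to the query.

    `chunk_embeddings` maps a (hypothetical) chunk id to its embedding.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), chunk_id)
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored]
```

A production system would normally delegate this ranking to a vector store rather than compare every embedding in Python.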

Each chunk references the File it belongs to using the parent_file_uuid field.
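The relationship between a file and its chunks can be sketched as follows. `SketchChunk` is a simplified standard-library stand-in that mirrors a few of the field names documented below; it is not the redbox class. The `split_into_chunks` helper and its fixed-size character windows are hypothetical and say nothing about how redbox actually splits files.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class SketchChunk:
    """Simplified stand-in for Chunk (assumption: not the redbox model)."""

    parent_file_uuid: UUID  # the file this text came from
    index: int  # relative position of the chunk in the file
    text: str
    creator_user_uuid: UUID
    embedding: Optional[list[float]] = None  # filled in later by an embedder
    uuid: UUID = field(default_factory=uuid4)


def split_into_chunks(
    text: str, file_uuid: UUID, user_uuid: UUID, max_chars: int = 200
) -> list[SketchChunk]:
    """Split text into fixed-size character windows (a naive strategy)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces = [text[i : i + max_chars] for i in range(0, len(text), max_chars)]
    return [
        SketchChunk(
            parent_file_uuid=file_uuid,
            index=position,
            text=piece,
            creator_user_uuid=user_uuid,
        )
        for position, piece in enumerate(pieces)
    ]
```

Every chunk produced this way carries the same parent_file_uuid, so the original text can be recovered by collecting the chunks for a file and joining them in index order.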

redbox.models.file.Chunk

Bases: PersistableModel

Chunk of a File

uuid class-attribute instance-attribute

uuid = Field(default_factory=uuid4)

created_datetime class-attribute instance-attribute

created_datetime = Field(default_factory=utcnow)

creator_user_uuid instance-attribute

creator_user_uuid

model_type property

model_type

Return the name of the model class.

Returns:

str: The name of the model class.

parent_file_uuid class-attribute instance-attribute

parent_file_uuid = Field(description='id of the original file which this text came from')

index class-attribute instance-attribute

index = Field(description='relative position of this chunk in the original file')

text class-attribute instance-attribute

text = Field(description='chunk of the original text')

metadata class-attribute instance-attribute

metadata = Field(description='subset of the unstructured Element.Metadata object', default=None)

embedding class-attribute instance-attribute

embedding = Field(description='the vector representation of the text', default=None)

text_hash property

text_hash

token_count property

token_count

ChunkStatus

The Chunk model has a companion ChunkStatus model that tracks the processing status of a chunk, including the state of the embedding process.

redbox.models.file.ChunkStatus

Bases: BaseModel

Status of a chunk of a file.

chunk_uuid instance-attribute

chunk_uuid

embedded instance-attribute

embedded
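As a rough illustration of how these status records might be used, the sketch below summarises embedding progress across the chunks of a file. `SketchChunkStatus` is a standard-library stand-in, not the redbox class; treating `embedded` as a boolean is an assumption, since this page does not state the field's type, and `embedding_progress` is a hypothetical helper.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class SketchChunkStatus:
    """Stand-in for ChunkStatus (assumption: `embedded` is a bool)."""

    chunk_uuid: UUID
    embedded: bool


def embedding_progress(statuses: list[SketchChunkStatus]) -> float:
    """Return the fraction of chunks that have been embedded (0.0 to 1.0)."""
    if not statuses:
        return 0.0  # nothing to embed yet
    done = sum(1 for status in statuses if status.embedded)
    return done / len(statuses)
```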