Chunk
The Chunk model is closely related to the File model. It stores the actual data of a file in chunks, so that Large Language Models can process the data in smaller sections.
The embedding field stores the text embedding of the chunk, which is crucial to the vector search functionality.
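To illustrate why a stored embedding matters for vector search, here is a minimal sketch that ranks chunks against a query vector by cosine similarity. It is an illustration only: this page does not say which similarity measure or vector store Redbox uses, so the functions cosine_similarity and rank_chunks, the dictionary layout, and the toy two-dimensional vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_chunks(query_embedding, chunks):
    """Hypothetical helper: return embedded chunks, most similar first."""
    # Chunks whose embedding is still None cannot take part in the search.
    embedded = [c for c in chunks if c["embedding"] is not None]
    return sorted(
        embedded,
        key=lambda c: cosine_similarity(query_embedding, c["embedding"]),
        reverse=True,
    )


# Toy data standing in for Chunk records (only the fields used here).
chunks = [
    {"text": "cats purr", "embedding": [1.0, 0.0]},
    {"text": "dogs bark", "embedding": [0.0, 1.0]},
    {"text": "not yet embedded", "embedding": None},
]

ranked = rank_chunks([0.9, 0.1], chunks)
```

A chunk whose embedding is still None is skipped in this sketch, which is one reason the field defaults to None until the embedding step has run.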
Each chunk references the File it belongs to using the parent_file_uuid field.
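The relationship between a file and its chunks can be sketched as follows. This is not the Redbox implementation: ChunkSketch is a standard-library dataclass standing in for the real model, and split_into_chunks with its fixed-width max_chars splitting is a hypothetical helper chosen only to show how parent_file_uuid and index tie each chunk back to its file.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class ChunkSketch:
    """Stand-in for Chunk, carrying only the fields discussed above."""

    parent_file_uuid: UUID  # id of the file this text came from
    index: int  # relative position of the chunk in the file
    text: str  # the chunk's share of the original text
    embedding: Optional[List[float]] = None  # filled in by a later step
    uuid: UUID = field(default_factory=uuid4)


def split_into_chunks(file_uuid: UUID, text: str, max_chars: int = 20) -> List[ChunkSketch]:
    """Hypothetical splitter: cut text into fixed-width pieces."""
    pieces = [text[i : i + max_chars] for i in range(0, len(text), max_chars)]
    return [
        ChunkSketch(parent_file_uuid=file_uuid, index=i, text=piece)
        for i, piece in enumerate(pieces)
    ]


file_uuid = uuid4()
original = "The quick brown fox jumps over the lazy dog."
chunks = split_into_chunks(file_uuid, original, max_chars=20)

# Sorting by index and joining the text recovers the original content.
rebuilt = "".join(c.text for c in sorted(chunks, key=lambda c: c.index))
```

Because every chunk carries both the parent file's id and its own position, the chunks of one file can be collected and put back in order without any extra bookkeeping.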
redbox.models.file.Chunk
Bases: PersistableModel
Chunk of a File
uuid class-attribute instance-attribute
uuid = Field(default_factory=uuid4)
created_datetime class-attribute instance-attribute
created_datetime = Field(default_factory=utcnow)
creator_user_uuid instance-attribute
creator_user_uuid
model_type property
model_type
Return the name of the model class.
RETURNS | DESCRIPTION
---|---
str | The name of the model class.
parent_file_uuid class-attribute instance-attribute
parent_file_uuid = Field(description='id of the original file which this text came from')
index class-attribute instance-attribute
index = Field(description='relative position of this chunk in the original file')
text class-attribute instance-attribute
text = Field(description='chunk of the original text')
metadata class-attribute instance-attribute
metadata = Field(description='subset of the unstructured Element.Metadata object', default=None)
embedding class-attribute instance-attribute
embedding = Field(description='the vector representation of the text', default=None)
text_hash property
text_hash
token_count property
token_count
ChunkStatus
The Chunk model also has a companion ChunkStatus model that tracks the status of chunk processing, including whether the chunk has been embedded.
redbox.models.file.ChunkStatus
Bases: BaseModel
Status of a chunk of a file.
chunk_uuid instance-attribute
chunk_uuid
embedded instance-attribute
embedded
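As a rough illustration of how a status record could be derived for a chunk, the sketch below reports a chunk as embedded once it has an embedding. The field types (UUID for chunk_uuid, bool for embedded) are not stated on this page and are assumed here. ChunkStatusSketch and status_for are hypothetical names, not part of Redbox.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class ChunkStatusSketch:
    """Stand-in for ChunkStatus; the field types are assumptions."""

    chunk_uuid: UUID
    embedded: bool


def status_for(chunk_uuid: UUID, embedding: Optional[List[float]]) -> ChunkStatusSketch:
    """Hypothetical helper: a chunk counts as embedded once it has a vector."""
    return ChunkStatusSketch(chunk_uuid=chunk_uuid, embedded=embedding is not None)


done = status_for(uuid4(), [0.1, 0.2])
pending = status_for(uuid4(), None)
```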