insert()

Insert new documents into the collection. Documents must have unique IDs and conform to the collection schema.

Signature

def insert(self, docs: Union[Doc, list[Doc]]) -> Union[Status, list[Status]]

Parameters

docs
Union[Doc, list[Doc]]
required
One or more documents to insert. Each document must:
  • Have a unique ID not already in the collection
  • Include all required fields defined in the schema
  • Have vectors matching the schema dimensions
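You can catch duplicate IDs within a single batch before calling insert(). The sketch below is plain Python and is not part of zvec. It only assumes that each Doc exposes its ID as doc.id, as the batch-failure example on this page does. It cannot see IDs that are already stored in the collection; insert() reports those through the returned Status.

```python
from collections import Counter


def duplicate_ids(ids: list[str]) -> list[str]:
    """Return each ID that occurs more than once in `ids`, in first-seen order."""
    counts = Counter(ids)
    reported = set()
    duplicates = []
    for doc_id in ids:
        # Report each repeated ID once, the first time it is encountered
        if counts[doc_id] > 1 and doc_id not in reported:
            reported.add(doc_id)
            duplicates.append(doc_id)
    return duplicates
```

For a batch, call it as duplicate_ids([doc.id for doc in docs]) and resolve any returned IDs before inserting.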

Returns

Status
Union[Status, list[Status]]
  • If a single Doc is provided: returns a single Status object
  • If a list is provided: returns a list[Status] with one status per document
Each Status indicates success or failure for that document.
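Because the return type depends on the argument, code that accepts either shape may want to normalize it. Here is a minimal helper that relies only on the behaviour described above. It is not a zvec function.

```python
from typing import TypeVar, Union

T = TypeVar("T")


def as_list(result: Union[T, list[T]]) -> list[T]:
    """Wrap a single result in a list; pass a list through unchanged."""
    if isinstance(result, list):
        return result
    return [result]
```

For example, statuses = as_list(collection.insert(doc_or_docs)). A simpler option is to always pass a list, even for a single document, so that insert() always returns list[Status].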

Basic Example

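import zvec
from zvec import Doc

# Assumes `collection` is an existing zvec collection whose schema has a
# 4-dimensional "embedding" vector and "title"/"category" fields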

# Insert a single document
doc = Doc(
    id="doc_001",
    vectors={"embedding": [0.1, 0.2, 0.3, 0.4]},
    fields={"title": "Getting Started", "category": "tutorial"}
)

status = collection.insert(doc)
if status.ok():
    print("Document inserted successfully")
else:
    print(f"Insert failed: {status.message()}")

Batch Insertion

For better performance, insert multiple documents at once:
docs = [
    Doc(
        id=f"doc_{i}",
        vectors={"embedding": [0.1 * i, 0.2 * i, 0.3 * i, 0.4 * i]},
        fields={"title": f"Document {i}", "score": i * 10}
    )
    for i in range(1000)
]

statuses = collection.insert(docs)

# Check results
success_count = sum(1 for s in statuses if s.ok())
print(f"Successfully inserted {success_count}/{len(docs)} documents")

Document Format Requirements

Documents must adhere to the collection’s schema:
  • Vector fields: Must match the defined dimension and data type
  • Scalar fields: Must match the defined data type (INT64, STRING, etc.)
  • Nullable fields: Can be omitted or set to None
  • Non-nullable fields: Must be provided
# Schema definition
schema = zvec.CollectionSchema(
    name="articles",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, dimension=128),
    fields=[
        zvec.FieldSchema("title", zvec.DataType.STRING, nullable=False),
        zvec.FieldSchema("views", zvec.DataType.INT64, nullable=True)
    ]
)

# Valid document
doc = Doc(
    id="1",
    vectors={"embedding": [0.1] * 128},
    fields={"title": "Required field", "views": 100}  # "views" is nullable, so it may be omitted
)
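You can check the rules above on the client before inserting. The following is a rough sketch in plain Python, not zvec's own validation. FieldSpec and schema_errors are hypothetical names. The Python type that stands in for each zvec DataType (str for STRING, int for INT64) is an assumption. Only vector dimensions are checked, not vector data types.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a scalar field definition (not a zvec class)."""
    name: str
    py_type: type  # assumed mapping, e.g. str for STRING, int for INT64
    nullable: bool


def schema_errors(vectors: dict, fields: dict,
                  vector_dims: dict[str, int], specs: list[FieldSpec]) -> list[str]:
    """Return a list of problems; an empty list means the document looks valid."""
    errors = []
    # Assumption: every vector in the schema must be present with the exact dimension
    for name, dim in vector_dims.items():
        vec = vectors.get(name)
        if vec is None:
            errors.append(f"missing vector '{name}'")
        elif len(vec) != dim:
            errors.append(f"vector '{name}' has dimension {len(vec)}, expected {dim}")
    for spec in specs:
        value = fields.get(spec.name)
        if value is None:
            # Omitted and explicit None are treated the same way
            if not spec.nullable:
                errors.append(f"non-nullable field '{spec.name}' is missing")
        elif not isinstance(value, spec.py_type) or (
            # bool is a subclass of int in Python, so reject it for non-bool fields
            isinstance(value, bool) and spec.py_type is not bool
        ):
            errors.append(
                f"field '{spec.name}' expects {spec.py_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors
```

To mirror the schema above, use specs = [FieldSpec("title", str, False), FieldSpec("views", int, True)] and vector_dims = {"embedding": 128}. The valid document above then produces an empty list.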

upsert()

Insert new documents or replace existing ones by ID. If a document with the given ID already exists, it is replaced in full; otherwise, it is inserted.

Signature

def upsert(self, docs: Union[Doc, list[Doc]]) -> Union[Status, list[Status]]

Parameters

docs
Union[Doc, list[Doc]]
required
One or more documents to upsert. Documents whose IDs already exist replace the stored versions; documents with new IDs are inserted.

Returns

Status
Union[Status, list[Status]]
  • Single Status if one document was provided
  • list[Status] if multiple documents were provided

Example

# First upsert: no document with this ID exists yet, so it is inserted
doc = Doc(id="user_123", vectors={"profile_emb": [0.1, 0.2]}, fields={"name": "Alice"})
collection.upsert(doc)

# Update the same document (overwrites all fields)
updated_doc = Doc(
    id="user_123",
    vectors={"profile_emb": [0.3, 0.4]},
    fields={"name": "Alice Smith"}
)
collection.upsert(updated_doc)
upsert() replaces the entire document. To update specific fields only, use update() instead.
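The toy in-memory model below makes the difference between insert(), upsert(), and update() concrete. It is not zvec code, and documents are reduced to plain field dicts. The behaviour of update() on a missing ID (returning False) is an assumption, because this page does not specify it.

```python
class ToyStore:
    """Toy model of the documented ID semantics; not a zvec class."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def insert(self, doc_id: str, fields: dict) -> bool:
        # insert() rejects an ID that already exists
        if doc_id in self.docs:
            return False
        self.docs[doc_id] = dict(fields)
        return True

    def upsert(self, doc_id: str, fields: dict) -> None:
        # upsert() stores the new document as-is, discarding any old fields
        self.docs[doc_id] = dict(fields)

    def update(self, doc_id: str, fields: dict) -> bool:
        # Assumption: update() needs an existing ID
        if doc_id not in self.docs:
            return False
        # Only the given fields change; the others are kept
        self.docs[doc_id].update(fields)
        return True
```

Upserting {"name": "Alice Smith"} over {"name": "Alice", "age": 30} drops "age", while update() with {"age": 31} keeps "name".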

Error Handling

Common Insert Errors

from zvec import StatusCode

status = collection.insert(doc)

if not status.ok():
    if status.code() == StatusCode.ALREADY_EXISTS:
        print("Document ID already exists")
    elif status.code() == StatusCode.INVALID_ARGUMENT:
        print("Invalid document format or schema mismatch")
    else:
        print(f"Error: {status.message()}")

Handling Batch Failures

docs = [...] # List of documents
statuses = collection.insert(docs)

# Find failed inserts
failed_docs = [
    (doc, status) for doc, status in zip(docs, statuses) if not status.ok()
]

if failed_docs:
    print(f"{len(failed_docs)} documents failed to insert:")
    for doc, status in failed_docs:
        print(f"  ID {doc.id}: {status.message()}")
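If some failures are transient, you can resubmit only the failed documents and skip the ones that already succeeded. The helper below is hypothetical and is not a zvec API. It takes the insert function and an is_ok predicate as parameters, so it can be tested without a real collection. It relies on statuses coming back in the same order as the documents, which the zip() above also assumes.

```python
from typing import Callable, Sequence


def insert_with_retry(insert_batch: Callable[[list], list],
                      docs: Sequence,
                      is_ok: Callable[[object], bool],
                      max_attempts: int = 3) -> list:
    """Resubmit only failed documents, up to max_attempts calls in total.

    Returns the documents that still failed after the last attempt.
    """
    pending = list(docs)
    for _ in range(max_attempts):
        if not pending:
            break
        statuses = insert_batch(pending)
        # statuses[i] belongs to pending[i]; keep only the failures
        pending = [doc for doc, status in zip(pending, statuses) if not is_ok(status)]
    return pending
```

With zvec, the call would be insert_with_retry(collection.insert, docs, lambda s: s.ok()). Retrying does not fix every error. A document rejected with StatusCode.ALREADY_EXISTS fails again on every attempt, so filter such documents out first, or use upsert() if you intend to replace them.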

Performance Tips

Batch your inserts: inserting 10,000 documents in a single call is much faster than making 10,000 individual calls.
# Good: Batch insertion
docs = [create_doc(i) for i in range(10000)]
collection.insert(docs)

# Bad: Individual insertions
for i in range(10000):
    collection.insert(create_doc(i))
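When documents come from a generator or a very large list, a small chunking helper keeps each insert() call at a fixed size without building every batch up front. This is a generic Python sketch. The name batched and the batch size of 1000 are illustrative choices, not zvec requirements.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted
    while batch := list(islice(iterator, size)):
        yield batch
```

For example: for batch in batched((create_doc(i) for i in range(10000)), 1000): collection.insert(batch). On Python 3.12 and later, itertools.batched does the same job but yields tuples, so wrap each batch in list() before passing it to insert().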
Flush periodically: Call collection.flush() after large batches to ensure durability.
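BATCH_SIZE = 1000
FLUSH_EVERY = 10_000

for batch_start in range(0, len(all_docs), BATCH_SIZE):
    batch = all_docs[batch_start:batch_start + BATCH_SIZE]
    collection.insert(batch)

    # Flush each time another 10k documents have been inserted
    if (batch_start + len(batch)) % FLUSH_EVERY == 0:
        collection.flush()

collection.flush()  # Flush whatever remains after the last periodic flush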

See Also