A schema defines the structure of a collection in Zvec. Every collection has a fixed schema that specifies:
  • The collection name
  • Scalar fields (e.g., ID, title, timestamp)
  • Vector fields for similarity search
  • Data types, dimensions, and constraints

Schema Components

Zvec schemas consist of three main classes:
  • CollectionSchema: Top-level container for the entire schema
  • FieldSchema: Defines scalar (non-vector) fields
  • VectorSchema: Defines vector fields for embeddings

CollectionSchema

The CollectionSchema class defines the overall structure of a collection:
from zvec import CollectionSchema, FieldSchema, VectorSchema, DataType

schema = CollectionSchema(
    name="my_collection",
    fields=[field1, field2, ...],    # Scalar fields
    vectors=[vector1, vector2, ...]   # Vector fields
)

Parameters

Parameter | Type | Description
--------- | ---- | -----------
name | str | Name of the collection (required)
fields | FieldSchema or list[FieldSchema] | One or more scalar field definitions
vectors | VectorSchema or list[VectorSchema] | One or more vector field definitions

Field names must be unique across both scalar and vector fields. Duplicate names will raise a ValueError.

Accessing Schema Information

# Get collection name
print(schema.name)  # "my_collection"

# List all scalar fields
for field in schema.fields:
    print(f"{field.name}: {field.data_type}")

# List all vector fields
for vector in schema.vectors:
    print(f"{vector.name}: {vector.dimension}D {vector.data_type}")

# Retrieve specific field by name
id_field = schema.field("id")
if id_field:
    print(f"ID field type: {id_field.data_type}")

# Retrieve specific vector by name
emb_field = schema.vector("embedding")
if emb_field:
    print(f"Embedding dimension: {emb_field.dimension}")

FieldSchema

The FieldSchema class defines scalar (non-vector) fields:
from zvec import FieldSchema, DataType, InvertIndexParam

# Simple field
id_field = FieldSchema(
    name="id",
    data_type=DataType.INT64,
    nullable=False
)

# Field with inverted index
category_field = FieldSchema(
    name="category",
    data_type=DataType.STRING,
    nullable=True,
    index_param=InvertIndexParam(enable_range_optimization=True)
)

Parameters

Parameter | Type | Default | Description
--------- | ---- | ------- | -----------
name | str | - | Field name (must be unique)
data_type | DataType | - | Data type (see below)
nullable | bool | False | Whether the field can contain null values
index_param | InvertIndexParam | None | Inverted index configuration for filtering

Supported Scalar Data Types

Zvec supports the following scalar data types:

Numeric Types

DataType.INT32      # 32-bit signed integer
DataType.INT64      # 64-bit signed integer
DataType.UINT32     # 32-bit unsigned integer
DataType.UINT64     # 64-bit unsigned integer
DataType.FLOAT      # 32-bit floating point
DataType.DOUBLE     # 64-bit floating point
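
The integer widths above determine which values each type can hold. The following is a minimal sketch in plain Python that makes those ranges explicit; the `INTEGER_RANGES` table and the `fits` helper are hypothetical names used for illustration and are not part of Zvec.

```python
# Value ranges implied by the bit widths above (two's complement for signed types).
# Keys mirror the DataType member names; this table is illustrative, not a Zvec API.
INTEGER_RANGES = {
    "INT32": (-(2**31), 2**31 - 1),
    "INT64": (-(2**63), 2**63 - 1),
    "UINT32": (0, 2**32 - 1),
    "UINT64": (0, 2**64 - 1),
}


def fits(type_name: str, value: int) -> bool:
    """Return True if value is representable in the named integer type."""
    low, high = INTEGER_RANGES[type_name]
    return low <= value <= high


print(fits("INT32", 2_147_483_647))  # True: largest INT32 value
print(fits("INT32", 2_147_483_648))  # False: one past the INT32 maximum
print(fits("UINT64", -1))            # False: unsigned types exclude negatives
```

A check like this can help when deciding between INT32 and INT64 for fields such as timestamps or counters.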

String and Boolean

DataType.STRING     # UTF-8 string
DataType.BOOL       # Boolean (true/false)

Array Types

DataType.ARRAY_INT32      # Array of 32-bit integers
DataType.ARRAY_INT64      # Array of 64-bit integers
DataType.ARRAY_UINT32     # Array of unsigned 32-bit integers
DataType.ARRAY_UINT64     # Array of unsigned 64-bit integers
DataType.ARRAY_FLOAT      # Array of floats
DataType.ARRAY_DOUBLE     # Array of doubles
DataType.ARRAY_STRING     # Array of strings
DataType.ARRAY_BOOL       # Array of booleans

Example: Multiple Scalar Fields

from zvec import FieldSchema, DataType

fields = [
    FieldSchema("id", DataType.INT64, nullable=False),
    FieldSchema("title", DataType.STRING, nullable=False),
    FieldSchema("timestamp", DataType.INT64, nullable=False),
    FieldSchema("price", DataType.FLOAT, nullable=True),
    FieldSchema("tags", DataType.ARRAY_STRING, nullable=True),
    FieldSchema("views", DataType.INT32, nullable=False)
]

VectorSchema

The VectorSchema class defines vector fields for similarity search:
from zvec import VectorSchema, DataType, HnswIndexParam, FlatIndexParam

# Dense vector with HNSW index
embedding = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(m=16, ef_construction=200)
)

# Sparse vector with default index
sparse_embedding = VectorSchema(
    name="sparse_embedding",
    data_type=DataType.SPARSE_VECTOR_FP32,
    dimension=0,  # Dimension not required for sparse vectors
    index_param=FlatIndexParam()
)

Parameters

Parameter | Type | Default | Description
--------- | ---- | ------- | -----------
name | str | - | Vector field name (must be unique)
data_type | DataType | - | Vector data type (see below)
dimension | int | 0 | Vector dimensionality (must be > 0 for dense vectors)
index_param | HnswIndexParam, IVFIndexParam, or FlatIndexParam | FlatIndexParam() | Index configuration

Supported Vector Data Types

Dense Vectors

DataType.VECTOR_FP16    # 16-bit float (half precision)
DataType.VECTOR_FP32    # 32-bit float (single precision)
DataType.VECTOR_FP64    # 64-bit float (double precision)
DataType.VECTOR_INT8    # 8-bit integer (quantized)

Sparse Vectors

DataType.SPARSE_VECTOR_FP16    # Sparse 16-bit float
DataType.SPARSE_VECTOR_FP32    # Sparse 32-bit float
Dense vectors are stored as fixed-length arrays. Sparse vectors are stored as dictionaries mapping indices to values (see Vectors).
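
To make the two representations concrete, here is a small sketch in plain Python that converts between a fixed-length list and an index-to-value dictionary. The helper names are hypothetical, and the exact input format Zvec expects is covered on the Vectors page rather than here.

```python
def dense_to_sparse(dense):
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_to_dense(sparse, dimension):
    """Expand an index-to-value mapping into a fixed-length list."""
    dense = [0.0] * dimension
    for index, value in sparse.items():
        dense[index] = value
    return dense


dense = [0.0, 0.5, 0.0, 0.0, 1.25]
sparse = dense_to_sparse(dense)

print(sparse)                               # {1: 0.5, 4: 1.25}
print(sparse_to_dense(sparse, 5) == dense)  # True
```

The dictionary form stores only the non-zero entries, which is why a sparse vector does not need a fixed dimension declared in the schema.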

Example: Multiple Vector Fields

from zvec import VectorSchema, DataType, HnswIndexParam, FlatIndexParam

vectors = [
    # Text embedding
    VectorSchema(
        name="text_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=768,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    # Image embedding
    VectorSchema(
        name="image_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=512,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    # Sparse keyword embedding
    VectorSchema(
        name="keyword_embedding",
        data_type=DataType.SPARSE_VECTOR_FP32,
        dimension=0,
        index_param=FlatIndexParam()
    )
]

Complete Schema Example

Here’s a complete example combining all schema components:
import zvec
from zvec import (
    CollectionSchema,
    FieldSchema,
    VectorSchema,
    DataType,
    HnswIndexParam,
    InvertIndexParam
)

# Initialize Zvec
zvec.init()

# Define scalar fields
fields = [
    FieldSchema(
        name="id",
        data_type=DataType.INT64,
        nullable=False
    ),
    FieldSchema(
        name="title",
        data_type=DataType.STRING,
        nullable=False
    ),
    FieldSchema(
        name="category",
        data_type=DataType.STRING,
        nullable=True,
        index_param=InvertIndexParam(enable_range_optimization=True)
    ),
    FieldSchema(
        name="price",
        data_type=DataType.FLOAT,
        nullable=True
    ),
    FieldSchema(
        name="tags",
        data_type=DataType.ARRAY_STRING,
        nullable=True
    )
]

# Define vector fields
vectors = [
    VectorSchema(
        name="text_embedding",
        data_type=DataType.VECTOR_FP32,
        dimension=768,
        index_param=HnswIndexParam(m=16, ef_construction=200)
    ),
    VectorSchema(
        name="sparse_embedding",
        data_type=DataType.SPARSE_VECTOR_FP32,
        dimension=0
    )
]

# Create collection schema
schema = CollectionSchema(
    name="product_catalog",
    fields=fields,
    vectors=vectors
)

# Create collection
collection = zvec.create_and_open(
    path="./data/product_catalog",
    schema=schema
)

print(f"Created collection: {collection.schema.name}")
print(f"Scalar fields: {[f.name for f in collection.schema.fields]}")
print(f"Vector fields: {[v.name for v in collection.schema.vectors]}")

Schema Validation

Zvec validates schemas automatically when they are created. The rules below are enforced.

Field names must be unique across both scalar and vector fields:
# This will raise ValueError
schema = CollectionSchema(
    name="test",
    fields=FieldSchema("embedding", DataType.STRING),
    vectors=VectorSchema("embedding", DataType.VECTOR_FP32, dimension=128)
)
# Error: duplicate field name 'embedding'

Only supported data types are allowed:
# FieldSchema only accepts scalar types
field = FieldSchema("vec", DataType.VECTOR_FP32)  # ValueError

# VectorSchema only accepts vector types
vector = VectorSchema("id", DataType.INT64, dimension=128)  # ValueError

Dense vectors require positive dimensions:
# This will raise ValueError
vector = VectorSchema("emb", DataType.VECTOR_FP32, dimension=0)
# Error: dimension must be > 0 for dense vectors

# Sparse vectors can have dimension=0
sparse = VectorSchema("sparse", DataType.SPARSE_VECTOR_FP32, dimension=0)  # OK
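
The three rules above can be mirrored in a few lines of plain Python. This is only a sketch of the checks as described on this page, not Zvec's actual implementation: the `check_schema` function, the tuple-based input format, and the error messages are all assumptions made for illustration.

```python
# Type names copied from the DataType listings on this page.
DENSE_TYPES = {"VECTOR_FP16", "VECTOR_FP32", "VECTOR_FP64", "VECTOR_INT8"}
SPARSE_TYPES = {"SPARSE_VECTOR_FP16", "SPARSE_VECTOR_FP32"}
VECTOR_TYPES = DENSE_TYPES | SPARSE_TYPES


def check_schema(fields, vectors):
    """Hypothetical validator.

    fields:  list of (name, type_name) tuples
    vectors: list of (name, type_name, dimension) tuples
    Raises ValueError on the first violated rule.
    """
    seen = set()
    for name, type_name in fields:
        if type_name in VECTOR_TYPES:
            raise ValueError(f"scalar field '{name}' cannot use {type_name}")
        if name in seen:
            raise ValueError(f"duplicate field name '{name}'")
        seen.add(name)
    for name, type_name, dimension in vectors:
        if type_name not in VECTOR_TYPES:
            raise ValueError(f"vector field '{name}' cannot use {type_name}")
        if type_name in DENSE_TYPES and dimension <= 0:
            raise ValueError(f"dense vector '{name}' needs dimension > 0")
        if name in seen:
            raise ValueError(f"duplicate field name '{name}'")
        seen.add(name)


# A valid layout passes silently.
check_schema(
    [("id", "INT64")],
    [("emb", "VECTOR_FP32", 128), ("sparse", "SPARSE_VECTOR_FP32", 0)],
)

# A name shared by a scalar and a vector field is rejected.
try:
    check_schema([("embedding", "STRING")], [("embedding", "VECTOR_FP32", 128)])
except ValueError as err:
    print(err)  # duplicate field name 'embedding'
```

Running a check of this kind on a planned layout can surface mistakes before any collection is created on disk.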

Schema Best Practices

1. Plan your schema in advance

Collection schemas are fixed at creation time. Choose appropriate data types and dimensions before creating the collection.
2. Use nullable fields for optional data

If a field may not always have a value, set nullable=True to avoid insertion errors.
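
One way to catch such problems early is to check each record against the schema's nullability before inserting it. The sketch below assumes records are plain dictionaries and uses a hypothetical `missing_required` helper; neither is taken from the Zvec API.

```python
def missing_required(record, nullable_by_field):
    """Return the non-nullable field names that are absent or None in record."""
    return [
        name
        for name, nullable in nullable_by_field.items()
        if not nullable and record.get(name) is None
    ]


# Nullability as it would be declared in the schema (illustrative values).
nullable_by_field = {"id": False, "title": False, "price": True}

record = {"id": 42, "price": None}
print(missing_required(record, nullable_by_field))  # ['title']
```

Here `price` may be None because it is nullable, while the missing `title` is reported.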
3. Choose appropriate vector data types

  • VECTOR_FP32: Most common, good balance of precision and performance
  • VECTOR_FP16: Reduced memory usage, slightly lower precision
  • VECTOR_INT8: Quantized vectors for extreme memory efficiency
  • SPARSE_VECTOR_FP32: For high-dimensional sparse data (e.g., BM25)
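
The memory trade-off between the dense types follows from their element widths. The sketch below estimates raw vector storage only, in plain Python; it ignores index structures and any other overhead, and the `raw_vector_bytes` helper is a hypothetical name, not a Zvec function.

```python
# Bytes per element, derived from the bit widths of each dense type.
BYTES_PER_ELEMENT = {
    "VECTOR_FP16": 2,
    "VECTOR_FP32": 4,
    "VECTOR_FP64": 8,
    "VECTOR_INT8": 1,
}


def raw_vector_bytes(type_name: str, dimension: int, count: int) -> int:
    """Raw payload size for count vectors, excluding index overhead."""
    return BYTES_PER_ELEMENT[type_name] * dimension * count


# Compare one million 768-dimensional vectors across the dense types.
for type_name in ("VECTOR_FP64", "VECTOR_FP32", "VECTOR_FP16", "VECTOR_INT8"):
    mib = raw_vector_bytes(type_name, 768, 1_000_000) / 2**20
    print(f"{type_name}: {mib:,.0f} MiB")
```

Each step down in precision halves the raw payload, so VECTOR_INT8 needs a quarter of the space of VECTOR_FP32 for the same dimension and count.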
4. Add inverted indexes to filtered fields

If you plan to filter by a field frequently, add an inverted index:
FieldSchema(
    name="category",
    data_type=DataType.STRING,
    index_param=InvertIndexParam()
)
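
As background on why this helps, an inverted index maps each field value to the set of documents that contain it, so an equality filter becomes a lookup instead of a scan. The sketch below shows the general idea in plain Python; it is a conceptual illustration only and makes no claim about how Zvec stores its inverted indexes internally.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each non-null value of field to the set of document IDs holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is not None:
            index[value].add(doc_id)
    return index


docs = {
    1: {"category": "books"},
    2: {"category": "toys"},
    3: {"category": "books"},
    4: {"category": None},  # null values are simply not indexed here
}

index = build_inverted_index(docs, "category")
print(sorted(index["books"]))  # [1, 3]
```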
5. Configure vector indexes at schema creation

Set index parameters during schema definition to avoid rebuilding indexes later:
VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(m=16, ef_construction=200)
)

Next Steps

  • Vectors: Learn about dense and sparse vector types
  • Indexing: Understand index types and performance tuning
  • Collections: Work with collections and data operations
  • Querying: Execute vector similarity searches