Installation Issues

Python Version Incompatibility

Problem: ERROR: Package 'zvec' requires a different Python: 3.9.x not in '>=3.10,<3.13'
Solution: Zvec requires Python 3.10, 3.11, or 3.12. Check your Python version:
python --version
If you’re on an older version, install a compatible Python:
# Using conda
conda create -n zvec_env python=3.11
conda activate zvec_env
pip install zvec

# Using pyenv
pyenv install 3.11.0
pyenv virtualenv 3.11.0 zvec_env
pyenv activate zvec_env
pip install zvec
Python 3.13+ is not yet supported. Stick to Python 3.10-3.12.
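
The same check can be scripted, for example at the top of a setup script. This is a minimal sketch that uses only the standard library; the range comes from the pip error message above, and the function name `is_supported_python` is illustrative, not part of Zvec.

```python
import sys

# Supported range taken from the pip error message above: >=3.10,<3.13
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is inside the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if is_supported_python() else "NOT supported"
    print(f"Python {current} is {verdict} by zvec")
```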

Platform Not Supported

Problem: ERROR: No matching distribution found for zvec
Solution: Zvec currently supports:
  • Linux (x86_64, ARM64)
  • macOS (ARM64 only - Apple Silicon)
Check your platform:
# Check OS
uname -s

# Check architecture
uname -m  # Supported: x86_64 or aarch64 on Linux, arm64 on macOS
If you’re on an unsupported platform (Windows, macOS Intel), you have three options:
  1. Use a supported platform (Linux VM, Docker, etc.)
  2. Wait for future platform support
  3. Build from source (advanced)
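
For a cross-platform check from Python, the sketch below compares the current machine against the platforms listed above. The table of supported pairs is an assumption derived from that list only (Linux ARM64 normally reports `aarch64`; `arm64` is included for safety), and `is_supported_platform` is an illustrative name.

```python
import platform

# (OS, architecture) pairs derived from the supported-platform list above.
# This table is an assumption for illustration, not an official Zvec manifest.
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Linux", "arm64"),
    ("Darwin", "arm64"),  # macOS on Apple Silicon
}


def is_supported_platform(system=None, machine=None):
    """Return True if the OS/architecture pair appears in the table above."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return (system, machine) in SUPPORTED_PLATFORMS


if __name__ == "__main__":
    pair = (platform.system(), platform.machine())
    status = "supported" if is_supported_platform() else "not supported"
    print(f"{pair[0]} / {pair[1]}: {status}")
```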

Import Errors After Installation

Problem: ImportError: cannot import name 'zvec' from 'zvec', or ModuleNotFoundError: No module named 'zvec'
Solutions:
  1. Verify installation:
    pip show zvec
    
  2. Check Python path:
    import sys
    print(sys.path)
    
  3. Reinstall with cache clear:
    pip uninstall zvec
    pip install --no-cache-dir zvec
    
  4. Check for naming conflicts:
    # Make sure you don't have a file named zvec.py in your working directory
    ls -la zvec.py
    
If using a virtual environment, ensure it’s activated before installing and running.
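
Steps 2 and 4 can be combined into one diagnostic. The sketch below reports where Python would load a module from without importing it, and lists local files or folders that would shadow an installed package. It uses only the standard library; the helper names are illustrative.

```python
import importlib.util
from pathlib import Path


def find_shadowing_paths(module_name, directory="."):
    """Return local files or folders that would shadow an installed package."""
    base = Path(directory)
    candidates = [base / f"{module_name}.py", base / module_name]
    return [path for path in candidates if path.exists()]


def describe_import(module_name):
    """Report where Python would load the module from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: not found on sys.path"
    return f"{module_name}: would load from {spec.origin}"


if __name__ == "__main__":
    print(describe_import("zvec"))
    for path in find_shadowing_paths("zvec"):
        print(f"Possible naming conflict: {path}")
```

If the reported location is inside your project rather than `site-packages`, a local file or folder is being imported instead of the installed package.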

Build Errors When Installing from Source

Problem: CMake Error or C++ compiler error when building
Solutions:
  1. Check CMake version (requires ≥ 3.26, < 4.0):
    cmake --version
    
    Install if needed:
    pip install cmake==3.27.0
    
  2. Check C++ compiler:
    g++ --version  # Should be 11+
    
    Install if needed:
    # Ubuntu/Debian
    sudo apt-get install g++-11
    
    # macOS
    xcode-select --install
    
  3. Initialize submodules:
    git submodule update --init --recursive
    
  4. Clean build:
    pip uninstall zvec
    rm -rf build/ dist/ *.egg-info
    pip install -e ".[dev]"
    
See the Building from Source guide for detailed build instructions.
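
The CMake range from step 1 is easy to misread by eye, so it can help to check it programmatically. This sketch parses the first line of `cmake --version` and tests it against the stated range (at least 3.26, below 4.0). The function names are illustrative.

```python
import re
import shutil
import subprocess


def parse_cmake_version(output):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def cmake_version_ok(version):
    """Check the range stated above: >= 3.26 and < 4.0."""
    return version is not None and (3, 26, 0) <= version < (4, 0, 0)


def installed_cmake_version():
    """Return the installed CMake version, or None if cmake is not on PATH."""
    if shutil.which("cmake") is None:
        return None
    result = subprocess.run(
        ["cmake", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cmake_version(result.stdout)


if __name__ == "__main__":
    version = installed_cmake_version()
    print(f"CMake {version}: {'ok' if cmake_version_ok(version) else 'unsupported or missing'}")
```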

Runtime Errors

Collection Creation Failed

Problem: Status error when creating collection, or Failed to create collection
Solutions:
  1. Check directory permissions:
    ls -la /path/to/collection
    
    Ensure you have write access.
  2. Verify the directory doesn’t already exist (for create operations):
    rm -rf ./my_collection  # Only if you want to recreate it; this permanently deletes existing data
    
  3. Check schema validity:
    # Ensure schema is properly defined
    schema = zvec.CollectionSchema(
        name="test",
        vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 128)
    )
    print(schema)  # Verify schema
    
  4. Check disk space:
    df -h /path/to/collection
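
The permission, existence, and disk-space checks above can be run together before calling into Zvec. The sketch below uses only the standard library. The 1 GiB free-space threshold is an arbitrary assumption, as is the requirement that the parent directory already exists; adjust both to your setup.

```python
import os
import shutil
from pathlib import Path


def preflight_collection_path(path, min_free_bytes=1 << 30):
    """Return a list of problems that could make collection creation fail."""
    target = Path(path).resolve()
    parent = target.parent
    problems = []
    if target.exists():
        problems.append(f"{target} already exists")
    if not parent.is_dir():
        problems.append(f"parent directory {parent} does not exist")
        return problems  # The remaining checks need an existing parent
    if not os.access(parent, os.W_OK | os.X_OK):
        problems.append(f"no write access to {parent}")
    free = shutil.disk_usage(parent).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the volume holding {parent}")
    return problems


if __name__ == "__main__":
    for problem in preflight_collection_path("./my_collection"):
        print(f"Warning: {problem}")
```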
    

Insert Operation Failed

Problem: Failed to insert documents, or Invalid vector dimension
Solutions:
  1. Verify vector dimensions match schema:
    # Schema specifies dimension 768
    schema = zvec.CollectionSchema(
        name="docs",
        vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 768)
    )
    
    # Vector must be exactly 768 dimensions
    doc = zvec.Doc(id="1", vectors={"embedding": [0.1] * 768})
    collection.insert([doc])
    
  2. Check vector data type:
    # Ensure vector is a list of floats, not numpy array
    import numpy as np
    vector = np.random.rand(768)
    doc = zvec.Doc(id="1", vectors={"embedding": vector.tolist()})  # Convert to list
    
  3. Verify document ID is unique:
    # Document IDs must be unique within a collection
    # Use update() if you want to modify an existing document
    
  4. Check field names and types:
    # Field names must match schema
    doc = zvec.Doc(
        id="1",
        vectors={"embedding": vector.tolist()},  # Convert the NumPy array from step 2 to a list
        fields={"title": "Text", "count": 42}  # Match schema field names
    )
    
Batch inserts are more efficient than single inserts. Insert multiple documents at once when possible.
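
Dimension, type, and ID problems are easier to find before a batch reaches the collection. The sketch below validates plain Python data without calling Zvec at all. The helper names `check_vector` and `find_duplicate_ids` are illustrative, and the checks mirror only the rules listed above.

```python
import math
from numbers import Real


def check_vector(vector, expected_dim):
    """Return a list of problems found in one vector; empty means it looks fine."""
    if not isinstance(vector, list):
        return [f"expected a list, got {type(vector).__name__}"]
    problems = []
    if len(vector) != expected_dim:
        problems.append(f"dimension {len(vector)} != schema dimension {expected_dim}")
    for index, value in enumerate(vector):
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"element {index} is not a number: {value!r}")
            break
        if not math.isfinite(value):
            problems.append(f"element {index} is not finite: {value!r}")
            break
    return problems


def find_duplicate_ids(ids):
    """Return the sorted IDs that occur more than once in a batch."""
    seen, duplicates = set(), set()
    for doc_id in ids:
        if doc_id in seen:
            duplicates.add(doc_id)
        seen.add(doc_id)
    return sorted(duplicates)


if __name__ == "__main__":
    print(check_vector([0.1] * 512, 768))
    print(find_duplicate_ids(["1", "2", "1"]))
```

Note that `find_duplicate_ids` only catches duplicates inside one batch; IDs already stored in the collection still need `update()` as described in step 3.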

Query Returns No Results

Problem: The query executes but returns empty results
Solutions:
  1. Verify data was inserted:
    stats = collection.stats()
    print(f"Document count: {stats.doc_count}")  # Attribute access, as in Check Collection Stats
    
  2. Check if optimize is needed:
    # Optimize after bulk inserts
    collection.optimize()
    
  3. Verify query vector dimensions:
    query_vector = [0.1] * 768  # Must match schema dimension
    results = collection.query(
        zvec.VectorQuery("embedding", vector=query_vector),
        topk=10
    )
    
  4. Increase topk or adjust parameters:
    # Try larger topk
    results = collection.query(
        zvec.VectorQuery("embedding", vector=query_vector),
        topk=100  # Increased from 10
    )
    
    # Or adjust HNSW ef_search
    params = zvec.HnswQueryParams(ef_search=100)
    results = collection.query(
        zvec.VectorQuery("embedding", vector=query_vector, params=params),
        topk=10
    )
    
  5. Check filters aren’t too restrictive:
    # Remove or relax filters temporarily
    results = collection.query(
        zvec.VectorQuery("embedding", vector=query_vector),
        # filter="category == 'test'",  # Comment out temporarily
        topk=10
    )
    

Memory Errors

Problem: MemoryError, std::bad_alloc, or process killed (OOM)
Solutions:
  1. Check memory usage:
    # Monitor memory while running
    htop  # or top
    
  2. Reduce batch size:
    # Instead of inserting 100K docs at once
    batch_size = 1000
    for i in range(0, len(docs), batch_size):
        collection.insert(docs[i:i+batch_size])
    
  3. Use lower precision vectors:
    # Use FP16 instead of FP32 to halve memory usage
    schema = zvec.CollectionSchema(
        name="docs",
        vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP16, 768)
    )
    
  4. Optimize collection regularly:
    # Consolidate segments to reduce memory overhead
    collection.optimize()
    
  5. Consider IVF index for large datasets:
    # IVF uses less memory than HNSW
    from zvec import IVFIndexParams, MetricType
    
    schema = zvec.CollectionSchema(
        name="large_collection",
        vectors=zvec.VectorSchema(
            "embedding",
            zvec.DataType.VECTOR_FP32,
            768,
            index_params=IVFIndexParams(metric_type=MetricType.L2)
        )
    )
    
Memory requirements: roughly N × D × bytes_per_element × 1.2 where N = vector count, D = dimension.
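
As a worked example of this estimate, the sketch below applies the formula to one million 768-dimensional vectors. The byte sizes follow from the data type names (FP32 = 4 bytes, FP16 = 2, INT8 = 1). The 1.2 factor is the rough overhead from the formula above, not a measured value, and index structures such as HNSW graphs may need more.

```python
# Bytes per vector element for each precision mentioned in this guide.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_memory_bytes(num_vectors, dimension, precision="fp32", overhead=1.2):
    """Apply N x D x bytes_per_element x overhead from the formula above."""
    return num_vectors * dimension * BYTES_PER_ELEMENT[precision] * overhead


def format_gib(num_bytes):
    """Format a byte count in GiB with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


if __name__ == "__main__":
    # 1,000,000 x 768 x 4 bytes x 1.2 = 3,686,400,000 bytes
    print(format_gib(estimate_memory_bytes(1_000_000, 768, "fp32")))  # 3.43 GiB
    for precision in ("fp16", "int8"):
        print(precision, format_gib(estimate_memory_bytes(1_000_000, 768, precision)))
```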

File Lock or Corruption Errors

Problem: Failed to open collection, Lock file exists, or Corrupted data
Solutions:
  1. Check for running processes:
    # Find processes using the collection
    lsof /path/to/collection
    
  2. Close collection properly:
    # Always close or use context manager
    collection.close()
    
    # Or use with statement
    with zvec.open("./data") as collection:
        # Operations here
        pass
    # Automatically closed
    
  3. Remove stale lock files (if no process is running):
    rm /path/to/collection/*.lock
    
  4. Restore from backup:
    # If data is corrupted, restore from backup
    rm -rf ./corrupted_collection
    cp -r ./backup/collection ./recovered_collection
    
Only remove lock files if you’re certain no other process is using the collection.

Performance Issues

Slow Query Performance

Problem: Queries take too long
Solutions:
  1. Optimize the collection:
    # Consolidate segments after bulk inserts
    collection.optimize()
    
  2. Tune HNSW ef_search (recall vs. speed tradeoff):
    # Lower ef_search = faster but lower recall
    params = zvec.HnswQueryParams(ef_search=50)  # Default is often 100+
    
    results = collection.query(
        zvec.VectorQuery("embedding", vector=query_vector, params=params),
        topk=10
    )
    
  3. Check index parameters (set during schema creation):
    # For faster queries, reduce M or increase ef_construction
    from zvec import HnswIndexParams, MetricType
    
    index_params = HnswIndexParams(
        metric_type=MetricType.IP,
        m=16,  # Reduce from default 32 for faster queries
        ef_construction=200
    )
    
  4. Use appropriate metric type:
    # IP (Inner Product) is fastest for normalized vectors
    # Normalize vectors before insertion:
    import numpy as np
    
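    def normalize(v):
        v = np.array(v, dtype=float)
        norm = np.linalg.norm(v)
        # Return zero vectors unchanged to avoid dividing by zero
        return (v / norm if norm > 0 else v).tolist()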
    
  5. Profile query patterns:
    import time
    
    start = time.perf_counter()  # Monotonic, high-resolution timer
    results = collection.query(...)
    print(f"Query took {time.perf_counter() - start:.3f}s")
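
A single timing is noisy, so it can be more informative to repeat the query and look at the median and tail latency. The sketch below times any zero-argument callable. The name `measure_latency` and the default repeat counts are illustrative choices, and the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics
import time


def measure_latency(run_query, repeats=20, warmup=3):
    """Time a zero-argument callable and summarize the samples in seconds."""
    for _ in range(warmup):
        run_query()  # Warm caches before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with your own query call.
    print(measure_latency(lambda: sum(range(10_000))))
```

To profile a real query, wrap the call from the steps above in a lambda, for example `measure_latency(lambda: collection.query(zvec.VectorQuery("embedding", vector=query_vector), topk=10))`.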
    

Slow Insert Performance

Problem: Insertions take too long
Solutions:
  1. Use batch inserts:
    # Bad: Insert one at a time
    for doc in docs:
        collection.insert([doc])  # Slow
    
    # Good: Batch insert
    collection.insert(docs)  # Much faster
    
  2. Optimize less frequently:
    # Don't optimize after every insert
    # Instead, optimize periodically
    batch_count = 0
    for batch in data_batches:
        collection.insert(batch)
        batch_count += 1
        if batch_count % 10 == 0:  # Every 10 batches
            collection.optimize()
    
  3. Adjust index construction parameters:
    # Lower ef_construction for faster indexing (but lower recall)
    from zvec import HnswIndexParams, MetricType
    
    index_params = HnswIndexParams(
        metric_type=MetricType.IP,
        m=16,
        ef_construction=100  # Lower = faster inserts
    )
    
  4. Consider using Flat index initially:
    # Build with Flat index, then convert to HNSW
    from zvec import FlatIndexParams
    
    # Start with Flat for fast ingestion
    schema = zvec.CollectionSchema(
        name="temp",
        vectors=zvec.VectorSchema(
            "embedding",
            zvec.DataType.VECTOR_FP32,
            768,
            index_params=FlatIndexParams()
        )
    )
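
When documents come from a generator or a file reader, a small batching helper gives you fixed-size batches without loading everything into memory first. This is a generic sketch using the standard library; `batched` is an illustrative helper, and the batch size of 1000 in the comment is only an example.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]

    # Usage with the insert and optimize calls shown above
    # (doc_stream is a hypothetical iterable of documents):
    # for count, batch in enumerate(batched(doc_stream, 1000), start=1):
    #     collection.insert(batch)
    #     if count % 10 == 0:
    #         collection.optimize()
```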
    

High Memory Usage

Problem: The process uses too much memory
Solutions:
  1. Switch to lower precision:
    # Use FP16 instead of FP32 (half the memory)
    zvec.DataType.VECTOR_FP16
    
    # Or use quantized INT8 (1/4 the memory)
    zvec.DataType.VECTOR_INT8
    
  2. Use IVF instead of HNSW:
    from zvec import IVFIndexParams, MetricType
    
    # IVF uses significantly less memory
    index_params = IVFIndexParams(
        metric_type=MetricType.L2,
        nlist=100  # Number of clusters
    )
    
  3. Enable memory-mapped storage:
    # Let OS manage memory
    collection_options = zvec.CollectionOptions(
        use_mmap=True  # Use memory-mapped files
    )
    
  4. Reduce HNSW M parameter:
    # Lower M = less memory, but lower recall
    from zvec import HnswIndexParams, MetricType
    
    index_params = HnswIndexParams(
        metric_type=MetricType.IP,
        m=8  # Default is often 16-32
    )
    

Debugging Tips

Enable Verbose Logging

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("zvec")
logger.setLevel(logging.DEBUG)

Check Collection Stats

stats = collection.stats()
print(f"Documents: {stats.doc_count}")
print(f"Segments: {stats.segment_count}")
print(f"Index type: {stats.index_type}")

Validate Schema

# Print schema to verify
print(schema)
print(f"Vector dimension: {schema.vectors[0].dimension}")
print(f"Fields: {[f.name for f in schema.fields]}")

Test with Minimal Example

import zvec

# Minimal test to isolate issues
schema = zvec.CollectionSchema(
    name="test",
    vectors=zvec.VectorSchema("vec", zvec.DataType.VECTOR_FP32, 4),
)

coll = zvec.create_and_open("./test_db", schema)
coll.insert([zvec.Doc(id="1", vectors={"vec": [0.1, 0.2, 0.3, 0.4]})])
results = coll.query(zvec.VectorQuery("vec", vector=[0.1, 0.2, 0.3, 0.4]), topk=1)
print(results)
coll.close()

Getting Help

If you’re still experiencing issues:

  • GitHub Issues: report bugs and get help from maintainers
  • Discord Community: get real-time help from the community

When Reporting Issues

Please include:
  1. Zvec version: pip show zvec
  2. Python version: python --version
  3. Operating system: uname -a
  4. Minimal reproducible example
  5. Error messages (full stack trace)
  6. Steps you’ve already tried
The more details you provide, the faster we can help resolve your issue!
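
To gather the first three items in one step, a short script like the following can be pasted into an issue. It is a sketch that uses only the standard library; it reads the installed package version from metadata, so it works even when `import zvec` itself fails.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(package="zvec"):
    """Build a plain-text summary suitable for pasting into a bug report."""
    lines = [
        f"{package} version: {package_version(package)}",
        f"Python version: {platform.python_version()}",
        f"Operating system: {platform.platform()}",
        f"Architecture: {platform.machine()}",
        f"Interpreter: {sys.executable}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```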