QuantizeType defines the quantization methods available for compressing vector embeddings. Quantization reduces memory usage and can improve search speed at the cost of some accuracy.
UNDEFINED
No quantization. Vectors are stored in their original precision.
Memory: 100% (baseline)
Accuracy: 100%
When to use: When accuracy is critical and memory is not a constraint.
```python
# No quantization specified
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768  # quantize_type not specified = UNDEFINED
)
```
FP16
16-bit floating-point quantization. Reduces precision from 32-bit to 16-bit floats.
Memory: ~50% of original (half precision)
Accuracy: ~99.5% (minimal loss for most use cases)
When to use: General-purpose compression with negligible quality loss.
```python
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=768,
    quantize_type=QuantizeType.FP16
)
```
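As a sanity check on the ~99.5% figure, the precision loss of a 32-to-16-bit round trip can be measured directly with NumPy. This is only an illustration of half-precision casting, not the engine's internal quantization code:

```python
import numpy as np

# Illustrative only: measures the error of casting FP32 -> FP16 and back,
# not the database's actual storage path.
rng = np.random.default_rng(0)
vec = rng.standard_normal(768).astype(np.float32)

vec_fp16 = vec.astype(np.float16)        # 2 bytes/dim instead of 4
restored = vec_fp16.astype(np.float32)

# Relative error of the round trip (on the order of 1e-4 for typical data)
rel_err = np.linalg.norm(vec - restored) / np.linalg.norm(vec)

# Cosine similarity between original and quantized vector stays ~1,
# which is why ranking quality is barely affected.
cos = np.dot(vec, restored) / (np.linalg.norm(vec) * np.linalg.norm(restored))
print(f"relative error: {rel_err:.2e}, cosine similarity: {cos:.6f}")
```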
INT8
8-bit integer quantization. Converts floating-point values to 8-bit signed integers.
Memory: ~25% of original (75% reduction)
Accuracy: ~95-98% (noticeable but acceptable loss)
When to use: When memory reduction is important and slight accuracy loss is acceptable.
```python
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=1536,
    quantize_type=QuantizeType.INT8
)
```
INT4
4-bit integer quantization. Converts floating-point values to 4-bit integers.
Memory: ~12.5% of original (87.5% reduction)
Accuracy: ~90-95% (significant loss, use with caution)
When to use: Extreme memory constraints, large-scale deployments, when recall drop is acceptable.
```python
field = Field(
    name="embedding",
    dtype=DataType.VECTOR_FP32,
    dim=2048,
    quantize_type=QuantizeType.INT4
)
```
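The INT8/INT4 accuracy figures come from mapping floats onto a small integer range. Below is a minimal sketch of symmetric INT8 quantization, assuming a single per-vector scale; a real engine may use per-dimension scales or an asymmetric scheme:

```python
import numpy as np

# Sketch of symmetric INT8 quantization (illustrative; the engine's actual
# scheme may differ). INT4 is analogous with a [-7, 7] range.
rng = np.random.default_rng(1)
vec = rng.standard_normal(1536).astype(np.float32)

scale = np.abs(vec).max() / 127.0                 # map [-max, max] -> [-127, 127]
q = np.clip(np.round(vec / scale), -127, 127).astype(np.int8)  # 1 byte/dim
restored = q.astype(np.float32) * scale           # dequantize for comparison

cos = np.dot(vec, restored) / (np.linalg.norm(vec) * np.linalg.norm(restored))
print(f"int8 cosine similarity: {cos:.4f}")
print(f"memory: {q.nbytes} bytes vs {vec.nbytes} bytes")  # 25% of FP32
```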
FP16
Best balance for most use cases
✅ 50% memory savings
✅ ~99.5% accuracy retained
✅ Minimal quality loss
✅ Good for production
Use for: Text embeddings, semantic search, general applications
INT8
Good compression with acceptable quality
✅ 75% memory savings
⚠️ ~95-98% accuracy
⚠️ Noticeable but acceptable loss
✅ Faster search
Use for: Large-scale systems, cost-sensitive deployments, when quality drop is acceptable
INT4
Extreme compression for specific needs
✅ 87.5% memory savings
❌ ~90-95% accuracy
❌ Significant quality loss
✅ Very fast search
Use for: Massive scale (billions of vectors), memory-critical environments, when recall drop is acceptable
UNDEFINED (No Quantization)
Maximum quality, baseline
✅ 100% accuracy
❌ 100% memory usage
❌ Slower search
Use for: Critical accuracy requirements, small datasets, benchmarking
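The memory columns above follow from simple per-dimension arithmetic. For a 768-dimensional FP32 embedding:

```python
# Back-of-the-envelope memory per vector for a 768-dim FP32 embedding.
dim = 768
fp32_bytes = dim * 4                 # baseline: 4 bytes per dimension

footprints = {
    "UNDEFINED": fp32_bytes,         # 100%
    "FP16": dim * 2,                 # 50%
    "INT8": dim * 1,                 # 25%
    "INT4": dim // 2,                # 12.5% (two dimensions per byte)
}

for name, b in footprints.items():
    print(f"{name:10s} {b:5d} bytes  ({100 * b / fp32_bytes:.1f}%)")
```

At a billion vectors, the same arithmetic scales to roughly 3 TB for FP32 versus ~384 GB for INT4, which is why the aggressive options matter at massive scale.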
Compatibility: Quantization is typically applied to VECTOR_FP32 fields. Using VECTOR_FP16 already provides 16-bit storage, so additional quantization may not be beneficial.
```python
# ❌ Redundant: FP16 vector with FP16 quantization
Field(
    dtype=DataType.VECTOR_FP16,
    quantize_type=QuantizeType.FP16
)

# ✅ Correct: FP32 vector with FP16 quantization
Field(
    dtype=DataType.VECTOR_FP32,
    quantize_type=QuantizeType.FP16
)
```
For critical applications, use quantized vectors for initial retrieval, then re-rank with full precision:
```python
# Retrieve more candidates with quantized vectors
results = collection.query(
    vectors={"quantized_vec": query_embedding},
    topn=100  # Over-retrieve
)

# Re-rank top results with full precision or external model
final_results = rerank(results, topn=10)
```
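The `collection.query` / `rerank` calls above are schematic. Below is a self-contained NumPy sketch of the same two-stage pattern; the `db`, `db_int8`, and `scales` arrays are hypothetical stand-ins for what an index might store, not a real client API:

```python
import numpy as np

# Hypothetical toy index: 1000 vectors, symmetric INT8 storage + FP32 copy.
rng = np.random.default_rng(2)
db = rng.standard_normal((1000, 128)).astype(np.float32)

scales = np.abs(db).max(axis=1, keepdims=True) / 127.0
db_int8 = np.clip(np.round(db / scales), -127, 127).astype(np.int8)

query = rng.standard_normal(128).astype(np.float32)

# Stage 1: approximate inner-product scores on quantized vectors,
# over-retrieve 100 candidates.
approx = (db_int8.astype(np.float32) * scales) @ query
candidates = np.argsort(-approx)[:100]

# Stage 2: exact scores on full-precision vectors, keep the top 10.
exact = db[candidates] @ query
top10 = candidates[np.argsort(-exact)[:10]]
print(top10)
```

Over-retrieving in stage 1 gives the cheap quantized pass enough slack that the true best matches are still in the candidate set when the expensive full-precision pass makes the final cut.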