Vector Similarity Search API
The Similarity Search API enables you to find content that is semantically similar to a specific media item across your collections. This search methodology is ideal for building recommendation systems, "more like this" functionality, and discovering related content across different media types.
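Endpoint
```text
POST https://api.wickson.ai/v1/search/similar
```
Authentication
Include your API key in the X-Api-Key header of every request:
```text
X-Api-Key: YOUR_API_KEY
```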
Request Structure
The similarity search endpoint accepts a JSON object with the following structure. The comments are explanatory only. JSON does not allow comments, so remove them from a real request body.
```jsonc
{
  "media_id": "vec-a1b2c3d4",        // ID of the reference media item (required)

  // Collection targeting: use exactly ONE of these forms
  //   "collections": "default"            -> a single collection, as a string
  //   "collections": ["docs", "images"]   -> multiple collections, as an array
  //   "collections": "all"                -> all of your collections
  "collections": "default",

  // Common parameters:
  "max_results": 10,                 // Maximum number of similar items to return
  "min_score": 0.65,                 // Minimum similarity threshold (0.0-1.0)
  "include_vectors": false,          // Include vector embeddings in results
  "include_reference": false,        // Include the reference item in results
  "modality_filter": "document",     // Filter by media type

  // Presentation options:
  "cluster": false,                  // Group results into similarity clusters
  "include_explanation": true,       // Include similarity explanations

  // Additional filtering:
  "filters": {
    "created_after": "2023-01-01T00:00:00Z",
    "topics": ["machine learning", "AI"]
  }
}
```
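The request body can also be assembled in code. The sketch below is a hypothetical Python helper (build_similarity_request is not part of any official client) that checks the documented ranges before sending: max_results between 1 and 100, min_score between 0.0 and 1.0, and one of the five documented modality values. The server remains the authority on validation and may enforce rules this sketch does not know about.

```python
from typing import Any, Optional, Sequence, Union

# Media types listed for modality_filter in the parameter tables.
MODALITIES = {"document", "image", "video", "audio", "model"}


def build_similarity_request(
    media_id: str,
    collections: Union[str, Sequence[str], None] = None,
    max_results: int = 10,
    min_score: float = 0.65,
    modality_filter: Optional[str] = None,
    filters: Optional[dict] = None,
    **options: Any,
) -> dict:
    """Assemble a request body, checking the documented parameter ranges.

    Hypothetical client-side helper: the API may apply additional rules.
    """
    if not media_id:
        raise ValueError("media_id is required")
    if not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    if modality_filter is not None and modality_filter not in MODALITIES:
        raise ValueError(f"unknown modality_filter: {modality_filter!r}")

    body: dict = {"media_id": media_id, "max_results": max_results, "min_score": min_score}
    if collections is not None:
        # Exactly one form: a single name or "all" (string), or a list of names.
        body["collections"] = collections if isinstance(collections, str) else list(collections)
    if modality_filter is not None:
        body["modality_filter"] = modality_filter
    if filters:
        body["filters"] = filters
    # Boolean options such as cluster or include_vectors are passed through unchanged.
    body.update(options)
    return body
```

For example, build_similarity_request("vec-a1b2c3d4", ["docs", "images"], cluster=True) produces a body that can be passed as json= to requests.post. Omitting collections leaves the key out, so the API falls back to the default collection.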
Core Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| media_id | string | The ID of the reference media item to find similar content for (required) |
Collection Targeting
The collections parameter offers flexible ways to specify which collections to search:
| Value Format | Description | Example |
|---|---|---|
| String (single collection) | Search within a single specified collection | "collections": "research" |
| Array of strings | Search across multiple specific collections | "collections": ["research", "reports"] |
| Special value "all" | Search across all your collections | "collections": "all" |
If not specified, the system searches the default collection.
Common Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_results | integer | 10 | Maximum number of similar items to return (1-100) |
| min_score | float | 0.65 | Minimum similarity threshold (0.0-1.0) for including results |
| include_vectors | boolean | false | Whether to include vector embeddings in results |
| include_reference | boolean | false | Whether to include the reference item in results |
| modality_filter | string | null | Filter results by media type: "document", "image", "video", "audio", or "model" |
Presentation Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| cluster | boolean | false | Group results into semantic similarity clusters |
| include_explanation | boolean | true | Include human-readable similarity explanations |
Filtering Options
The filters object lets you narrow down results based on metadata:
```json
"filters": {
  "created_after": "2023-01-01T00:00:00Z",
  "topics": ["machine learning", "AI"],
  "entities.people": ["John Smith"]
}
```
Common filter fields include:
| Field | Description | Example |
|---|---|---|
| media_type | Type of media | "document", "image", "video", "audio", "model" |
| created_after | Items created after this date | "2023-01-01T00:00:00Z" |
| created_before | Items created before this date | "2023-12-31T23:59:59Z" |
| topics | Semantic topics | ["AI", "machine learning"] |
| entities.people | People mentioned or depicted | ["Albert Einstein"] |
| entities.organizations | Organizations mentioned | ["Research Lab"] |
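Date filters use the UTC form shown in the table (2023-01-01T00:00:00Z). Python's datetime.isoformat() emits +00:00 rather than a trailing Z, so a small formatter helps. This is a sketch with hypothetical helper names (iso_utc, build_filters); it only emits field names from the table above, and it treats naive datetimes as UTC, which is an assumption you may want to change.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def iso_utc(dt: datetime) -> str:
    """Format a datetime like the documented examples, e.g. 2023-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_filters(
    created_after: Optional[datetime] = None,
    created_before: Optional[datetime] = None,
    topics: Optional[Sequence[str]] = None,
    people: Optional[Sequence[str]] = None,
    organizations: Optional[Sequence[str]] = None,
) -> dict:
    """Build a filters object, omitting any field that was not supplied."""
    filters: dict = {}
    if created_after is not None:
        filters["created_after"] = iso_utc(created_after)
    if created_before is not None:
        filters["created_before"] = iso_utc(created_before)
    if topics:
        filters["topics"] = list(topics)
    if people:
        filters["entities.people"] = list(people)
    if organizations:
        filters["entities.organizations"] = list(organizations)
    return filters
```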
Response Structure
Success Response
For successful similarity searches, you'll receive a JSON response with the following structure:
```jsonc
{
  "success": true,
  "message": "Similarity search completed successfully",
  "data": {
    "meta": {
      "reference_item": "vec-a1b2c3d4",
      "reference_collection": "research",
      "version": "v1",
      "cost": {
        "amount": 0.01,
        "rate_limit_remaining": 978
      },
      "stats": {
        "total_results": 7,
        "time_ms": 142,
        "found_types": ["📄 Document (5)", "🖼 Image (2)"],
        "includes_vectors": false,
        "collection_count": 2
      },
      "collections": {
        "searched": ["research", "projects"],
        "result_distribution": {"research": 5, "projects": 2},
        "scope_type": "multiple"
      }
    },
    "results": [
      {
        "media_id": "vec-e5f6g7h8",
        "score": 0.91,
        "similarity_percentage": "91.0%",
        "metadata": {
          "media_type": "document",
          "file_info": {
            "filename": "ml-frameworks-comparison.pdf",
            "size_bytes": 1240000,
            "mime_type": "application/pdf"
          },
          "search_metadata": {
            "summary": "Comparison of machine learning frameworks",
            "description": "Detailed analysis of popular machine learning frameworks...",
            "semantic_markers": {
              "topics": ["machine learning", "frameworks", "comparison"],
              "keywords": ["pytorch", "tensorflow", "performance"]
            },
            "entities": {
              "organizations": ["Google", "Meta"]
            },
            "quality_metrics": {
              "clarity": 0.92,
              "completeness": 0.89
            }
          }
        },
        "collection": "research",
        "relevance_explanation": "Very similar (91.0%) | Same type: document | Shared topics: machine learning",
        "semantic_cluster": "machine_learning",
        "is_reference": false
      },
      // Additional results...
    ],
    "relationships": {
      "clusters": {
        "clusters": {
          "machine_learning": ["vec-e5f6g7h8", "vec-i9j0k1l2"],
          "applications": ["vec-m3n4o5p6", "vec-q7r8s9t0"]
        },
        "metadata": {
          "machine_learning": {
            "modality": "document",
            "avg_score": 0.88,
            "member_count": 2,
            "depth_distribution": {"0": 2}
          },
          "applications": {
            "modality": "mixed",
            "avg_score": 0.82,
            "member_count": 2,
            "depth_distribution": {"0": 2}
          }
        }
      }
    }
  },
  "metadata": {
    "search_type": "similarity",
    "api_version": "v1",
    "reference_item": "vec-a1b2c3d4"
  }
}
```
Understanding Result Fields
Each entry in results contains the following core fields:
- media_id: Unique identifier for the media item
- score: Similarity score (0.0-1.0)
- similarity_percentage: Human-friendly similarity percentage
- metadata: Detailed information about the media item, including:
  - media_type: Type of media (document, image, video, audio, model)
  - file_info: File metadata (name, size, type)
  - search_metadata: Rich semantic metadata, including summary, topics, and entities
- collection: The collection this result belongs to
- relevance_explanation: Human-readable explanation of why this result is similar
- semantic_cluster: The semantic cluster this result belongs to (when clustering is enabled)
- is_reference: Whether this is the reference item
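When consuming results in Python, it can help to flatten these nested fields into one object. The dataclass below is a hypothetical client-side type, not something the API returns. It uses .get() defensively because the documentation shows these nested fields but does not say which are always present (for example, for non-document media).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimilarItem:
    """Flattened view of one entry in data.results (hypothetical client type)."""
    media_id: str
    score: float
    collection: str
    media_type: Optional[str]
    filename: Optional[str]
    summary: Optional[str]
    cluster: Optional[str]
    is_reference: bool

    @classmethod
    def from_result(cls, result: dict) -> "SimilarItem":
        meta = result.get("metadata", {})
        # Defensive lookups: missing nested objects become None rather than KeyError.
        file_info = meta.get("file_info", {})
        search_meta = meta.get("search_metadata", {})
        return cls(
            media_id=result["media_id"],
            score=float(result["score"]),
            collection=result.get("collection", ""),
            media_type=meta.get("media_type"),
            filename=file_info.get("filename"),
            summary=search_meta.get("summary"),
            cluster=result.get("semantic_cluster"),  # populated when cluster is true
            is_reference=bool(result.get("is_reference", False)),
        )
```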
The relevance_explanation field gives a short, human-readable summary of why the result is similar to the reference item. For example:
Very similar (91.0%) | Same type: document | Shared topics: machine learning
This tells you that the result:
- Is very similar with a 91.0% similarity
- Has the same media type as the reference item
- Shares common topics with the reference item
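Because relevance_explanation is a display string, the simplest use is to show it as-is. If you want to render each reason on its own line, a sketch like the following splits on the | separator seen in the example. This assumes the separator is used consistently, which the documentation does not guarantee, so treat the parts as display text rather than structured data. explanation_parts is a hypothetical helper.

```python
from typing import List


def explanation_parts(explanation: str) -> List[str]:
    """Split a relevance_explanation string into its individual reasons."""
    # Assumption: reasons are separated by "|", as in the documented example.
    return [part.strip() for part in explanation.split("|") if part.strip()]


for reason in explanation_parts(
    "Very similar (91.0%) | Same type: document | Shared topics: machine learning"
):
    print(f"- {reason}")
```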
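Understanding Clusters
When cluster: true is specified, the response includes a relationships.clusters object that groups similar items together. Inside it, clusters maps each cluster name to the media_id values of its members, and metadata gives per-cluster statistics such as the cluster's modality (or "mixed"), average score, and member count. This helps identify distinct content themes within the results: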
```json
"relationships": {
  "clusters": {
    "clusters": {
      "machine_learning": ["vec-e5f6g7h8", "vec-i9j0k1l2"],
      "applications": ["vec-m3n4o5p6", "vec-q7r8s9t0"]
    },
    "metadata": {
      "machine_learning": {
        "modality": "document",
        "avg_score": 0.88,
        "member_count": 2,
        "depth_distribution": {"0": 2}
      },
      "applications": {
        "modality": "mixed",
        "avg_score": 0.82,
        "member_count": 2,
        "depth_distribution": {"0": 2}
      }
    }
  }
}
```
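This structure maps naturally onto recommendation panels. The sketch below looks up each cluster's members in results, keeps the highest-scoring few, and orders the panels by each cluster's avg_score. build_cluster_panels is a hypothetical helper; IDs listed in a cluster but absent from results are skipped rather than treated as errors.

```python
from typing import List, Tuple


def build_cluster_panels(data: dict, per_panel: int = 3) -> List[Tuple[str, List[dict]]]:
    """Group data.results by cluster into (cluster_name, [result, ...]) panels."""
    cluster_info = data.get("relationships", {}).get("clusters", {})
    members = cluster_info.get("clusters", {})
    stats = cluster_info.get("metadata", {})
    by_id = {r["media_id"]: r for r in data.get("results", [])}

    panels = []
    for name, ids in members.items():
        # Keep only members that were actually returned in results.
        items = [by_id[i] for i in ids if i in by_id]
        items.sort(key=lambda r: r["score"], reverse=True)
        if items:
            panels.append((name, items[:per_panel]))

    # Show the most closely related theme first.
    panels.sort(key=lambda p: stats.get(p[0], {}).get("avg_score", 0.0), reverse=True)
    return panels
```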
Error Handling
In case of errors, you'll receive a response like:
```json
{
  "success": false,
  "message": "Reference media item not found",
  "code": "RESOURCE_NOT_FOUND",
  "details": {
    "media_id": "vec-a1b2c3d4",
    "searched_collections": ["research", "projects"]
  },
  "suggestion": "Verify the media ID exists in the specified collections"
}
```
Common error codes:
| Code | Description |
|---|---|
| MISSING_PARAMETER | A required parameter, such as media_id, is missing |
| RESOURCE_NOT_FOUND | The reference media item was not found |
| INVALID_PARAMETER | A parameter has an invalid value |
| UNAUTHORIZED | Invalid or missing API key |
| INSUFFICIENT_BALANCE | Account balance too low for the operation |
| RATE_LIMIT_EXCEEDED | Too many requests in the current time period |
| SEARCH_ERROR | Error during search execution |
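A client can turn the error body into an exception that carries code, details, and suggestion. In this sketch, which codes count as retryable is a client-side assumption (waiting out RATE_LIMIT_EXCEEDED, and retrying SEARCH_ERROR a limited number of times); the API does not document retry semantics. SimilaritySearchError and check_response are hypothetical names.

```python
from typing import Optional

# Assumption: a client-side judgment about which errors may succeed on retry.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED", "SEARCH_ERROR"}


class SimilaritySearchError(Exception):
    """Raised when a response has "success": false."""

    def __init__(self, body: dict):
        self.code: Optional[str] = body.get("code")
        self.details: dict = body.get("details", {})
        self.suggestion: Optional[str] = body.get("suggestion")
        super().__init__(f"{self.code}: {body.get('message', 'unknown error')}")

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def check_response(body: dict) -> dict:
    """Return body["data"] on success, otherwise raise SimilaritySearchError."""
    if body.get("success"):
        return body["data"]
    raise SimilaritySearchError(body)
```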
Code Examples
Find Similar Documents (Python)
```python
import requests

headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "media_id": "vec-a1b2c3d4",        # Reference document ID
    "collections": "research-papers",  # Search within this collection
    "modality_filter": "document",     # Only find similar documents
    "min_score": 0.7,                  # Only return closer matches
    "max_results": 10                  # Return up to 10 results
}

response = requests.post(
    "https://api.wickson.ai/v1/search/similar",
    headers=headers,
    json=data,
    timeout=30  # Avoid hanging indefinitely on network problems
)

if response.status_code == 200:
    results = response.json()["data"]["results"]
    print(f"Found {len(results)} similar documents")
    for result in results:
        print(f"\nScore: {result['score']} ({result['similarity_percentage']})")
        print(f"File: {result['metadata']['file_info']['filename']}")
        print(f"Summary: {result['metadata']['search_metadata']['summary']}")
        print(f"Similarity: {result['relevance_explanation']}")
        print("-" * 40)
else:
    print(f"Error: {response.status_code} - {response.text}")
```
Build a Recommendation System (JavaScript)
```javascript
const getRecommendations = async (mediaId) => {
  const headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
  };

  const data = {
    "media_id": mediaId,
    "collections": "all",       // Search all collections
    "max_results": 6,           // Get 6 recommendations
    "min_score": 0.75,          // High similarity threshold
    "cluster": true,            // Group by similarity themes
    "include_reference": false  // Exclude the reference item
  };

  try {
    const response = await fetch("https://api.wickson.ai/v1/search/similar", {
      method: "POST",
      headers: headers,
      body: JSON.stringify(data)
    });

    if (!response.ok) {
      throw new Error(`Error: ${response.status} - ${response.statusText}`);
    }

    const result = await response.json();
    const recommendations = result.data.results;
    console.log(`Found ${recommendations.length} recommendations`);

    // Get cluster information
    const clusters = result.data.relationships?.clusters?.clusters || {};

    // Group recommendations by cluster
    const recommendationsByCluster = {};
    for (const [clusterName, itemIds] of Object.entries(clusters)) {
      recommendationsByCluster[clusterName] = recommendations.filter(
        item => itemIds.includes(item.media_id)
      );
    }

    // Display recommendations by cluster
    for (const [cluster, items] of Object.entries(recommendationsByCluster)) {
      console.log(`\n${cluster.toUpperCase()} (${items.length} items)`);
      items.forEach(item => {
        console.log(`- ${item.metadata.file_info.filename}: ${item.similarity_percentage}`);
        console.log(`  ${item.metadata.search_metadata.summary}`);
      });
    }

    return recommendations;
  } catch (error) {
    console.error("Failed to get recommendations:", error);
    return [];
  }
};

// Example usage:
getRecommendations("vec-a1b2c3d4");
```
Cross-Collection Similarity Search (Python)
```python
import requests


def find_similar_across_collections(media_id, collections=None):
    """Find similar content across multiple collections."""
    headers = {
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    # Build request
    data = {
        "media_id": media_id,
        "max_results": 20
    }

    # Set collections parameter if provided
    if collections:
        data["collections"] = collections
    else:
        # This helper defaults to all collections (the API itself defaults to the default collection)
        data["collections"] = "all"

    # Execute search
    response = requests.post(
        "https://api.wickson.ai/v1/search/similar",
        headers=headers,
        json=data,
        timeout=30
    )

    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    result = response.json()["data"]

    # Print summary statistics
    meta = result["meta"]
    print(f"Found {meta['stats']['total_results']} similar items across {meta['stats']['collection_count']} collections")
    media_types = [t.split("(")[0].strip() for t in meta["stats"]["found_types"]]
    print(f"Media types found: {', '.join(media_types)}")

    # Organize results by collection
    results_by_collection = {}
    for item in result["results"]:
        results_by_collection.setdefault(item["collection"], []).append(item)

    # Display results by collection
    for collection, items in results_by_collection.items():
        print(f"\nCollection: {collection} ({len(items)} items)")
        for item in items:
            print(f"- {item['metadata']['file_info']['filename']} ({item['similarity_percentage']})")
            print(f"  {item['relevance_explanation']}")

    return result["results"]


# Example usage
find_similar_across_collections("vec-a1b2c3d4", ["research", "projects"])
```
Rate Limits & Costs
Rate Limits
Rate limits are applied based on your API tier:
| Tier | Limit | Reset Period |
|---|---|---|
| Basic/Standard | 1,000 requests | Per hour |
Rate limit information is included in the response headers:
- X-RateLimit-Limit: Maximum requests allowed per hour
- X-RateLimit-Remaining: Remaining requests in the current window
- X-RateLimit-Reset: UTC timestamp at which the limit resets
Successful responses also report the remaining count in the body as data.meta.cost.rate_limit_remaining.
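These headers can drive a simple back-off. This sketch assumes X-RateLimit-Reset holds a Unix timestamp in seconds; the documentation only says "UTC timestamp", so check a real response before relying on it. seconds_until_reset and should_pause are hypothetical helpers. They accept any mapping, including the case-insensitive headers object on a requests response.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before the rate-limit window resets (0 if unknown or past).

    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)


def should_pause(headers: Mapping[str, str], reserve: int = 0) -> bool:
    """True when X-RateLimit-Remaining has dropped to `reserve` or below."""
    remaining = headers.get("X-RateLimit-Remaining")
    return remaining is not None and int(remaining) <= reserve
```

For example, after a RATE_LIMIT_EXCEEDED error you might call time.sleep(seconds_until_reset(response.headers)) before retrying.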
Cost Structure
Similarity search is priced identically to advanced search:
| Operation | Cost |
|---|---|
| Similarity Search | $0.01 per request |
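For budgeting, the per-request price multiplies directly. For example, a client that uses its full Basic/Standard allowance of 1,000 requests in an hour would spend 1,000 × $0.01 = $10.00. The sketch uses Decimal to avoid floating-point rounding in currency arithmetic; estimate_cost is a hypothetical helper, and the constants are copied from the cost and rate-limit tables.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.01")  # dollars, from the cost table
HOURLY_LIMIT = 1_000                 # Basic/Standard tier, from the rate-limit table


def estimate_cost(num_requests: int) -> Decimal:
    """Cost in dollars for a number of similarity search requests."""
    return PRICE_PER_REQUEST * num_requests


# Worst case for one hour at the Basic/Standard limit.
max_hourly_spend = estimate_cost(HOURLY_LIMIT)
print(f"${max_hourly_spend}")
```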
Best Practices
Use Cases for Similarity Search
The similarity search endpoint is ideal for:
- Building Recommendation Systems - "People who viewed this also viewed..."
- Related Content Discovery - Finding content similar to what a user is currently viewing
- Content Organization - Automatically grouping related content
- Duplicate Detection - Finding near-duplicate content across collections
- Content Gap Analysis - Identifying where similar content exists or is missing
Optimizing Similarity Results
Set Appropriate Similarity Thresholds
- Higher min_score (0.8+): Only very similar items
- Medium min_score (0.65-0.8): Moderately similar items
- Lower min_score (0.5-0.65): More loosely related items
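These bands can also be applied client-side. Because min_score filters on the server, you can raise the threshold locally on results you already have without issuing (and paying for) another request; lowering it requires a new request. The listed ranges share their endpoints, so this hypothetical sketch assigns 0.8 and 0.65 to the higher band.

```python
from typing import List


def similarity_band(score: float) -> str:
    """Label a score using the threshold ranges listed above."""
    if score >= 0.8:
        return "very similar"
    if score >= 0.65:
        return "moderately similar"
    if score >= 0.5:
        return "loosely related"
    return "below suggested thresholds"


def tighten(results: List[dict], min_score: float) -> List[dict]:
    """Apply a stricter threshold locally to results already fetched."""
    return [r for r in results if r["score"] >= min_score]
```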
Leverage Clustering
- Enable cluster: true to organize results into semantic themes
- Useful for diverse recommendation systems
- Helps understand different aspects of similarity
Filter by Modality
- Use modality_filter to focus on specific media types
- Compare documents to documents, images to images, etc.
- Or leave unspecified to find cross-modal similarities
Control Reference Item Inclusion
- For UI recommendations, set include_reference: false
- For similarity analysis, set include_reference: true
Consider Collection Scope
- Search within the same collection for closely related content
- Search across collections for discovering broader connections
- Use "all" to find unexpected relationships throughout your data
Similarity Search vs. Regular Search
| Feature | Regular Search | Similarity Search |
|---|---|---|
| Input | Text query | Media item ID |
| Best for | Finding content that matches specific criteria | Finding content similar to an existing item |
| Use case | "Find all documents about machine learning" | "Find content similar to this specific document" |
| Primary mechanism | Query-to-vector matching | Vector-to-vector comparison |
Enhancing User Experiences
- Combine with Regular Search
Use regular search for initial content discovery, then offer similarity search for "more like this" functionality.
- Implement Smart Filters
Combine similarity search with filters to create targeted recommendations:
```json
{
  "media_id": "vec-a1b2c3d4",
  "modality_filter": "image",
  "filters": {
    "created_after": "2023-01-01T00:00:00Z"
  }
}
```
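- Create Diverse Recommendation Panels
Use clustering to show diverse recommendation categories, with one panel per cluster. The JavaScript recommendation example above groups results by cluster in this way.
- Implement Progressive Loading
Start with a small max_results value and request more results only when the user asks for them.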
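The API as documented here has no offset or page parameter, so one way to implement progressive loading is to repeat the request with a larger max_results (capped at 100) and show only the items not already displayed. The sketch below takes a hypothetical fetch function (for example, one wrapping the POST request from the code examples) so the paging logic stays separate from HTTP. It assumes a larger request includes the items returned by a smaller one, as you would expect from score-ranked results, though this is not documented. Each call is a separate billed request.

```python
from typing import Callable, List


class ProgressiveLoader:
    """Load similar items in growing batches (hypothetical helper)."""

    def __init__(self, fetch: Callable[[int], List[dict]], first: int = 5, step: int = 5):
        self.fetch = fetch          # called with the max_results value to request
        self.next_size = first
        self.step = step
        self.seen: set = set()
        self.exhausted = False

    def load_more(self) -> List[dict]:
        """Return only the items not returned by earlier calls."""
        if self.exhausted:
            return []
        size = min(self.next_size, 100)  # documented max_results upper bound
        results = self.fetch(size)
        new_items = [r for r in results if r["media_id"] not in self.seen]
        self.seen.update(r["media_id"] for r in new_items)
        # Fewer results than requested, or hitting the cap, means nothing more to fetch.
        self.exhausted = len(results) < size or size == 100
        self.next_size = size + self.step
        return new_items
```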
Troubleshooting
| Issue | Potential Solution |
|---|---|
| Reference media item not found | Verify that the media_id is correct and exists in the specified collections |
| No results returned | Lower the min_score threshold, search more collections, or remove filters |
| Too many unrelated results | Increase the min_score threshold (e.g., from 0.65 to 0.8) |
| Missing expected similar items | Try searching across more collections using "collections": "all" |
| Results all of wrong media type | Use modality_filter to specify the desired media type |
| Slow response time | Reduce max_results, search fewer collections, or add more specific filters |
| Unexpected content in results | Check whether your reference item has multiple themes that are triggering different results |
| INSUFFICIENT_BALANCE errors | Check your account balance and ensure you have funds for similarity searches |
If problems persist, you can contact our support team at sales@firespawnstudios.net with your API key and request details.