
Vector Similarity Search API

The Similarity Search API finds content that is semantically similar to a specific media item across your collections. It is well suited to building recommendation systems, "more like this" features, and discovering related content across different media types.

Endpoint

POST https://api.wickson.ai/v1/search/similar

Authentication

Include your API key in the X-Api-Key header of every request:

X-Api-Key: YOUR_API_KEY
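The following minimal sketch uses only the Python standard library to show where the key goes. WICKSON_API_KEY is a hypothetical environment variable name, used here to keep the key out of source code. The function builds the request object but does not send it.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.wickson.ai/v1/search/similar"


def build_similar_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST request for the similarity endpoint.

    If api_key is not passed, the key is read from WICKSON_API_KEY,
    a hypothetical environment variable name chosen for this example.
    """
    key = api_key or os.environ.get("WICKSON_API_KEY")
    if not key:
        raise ValueError("No API key supplied")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_similar_request({"media_id": "vec-a1b2c3d4"}, api_key="YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

urllib stores header names using str.capitalize(), so you read the key back with req.get_header("X-api-key"). HTTP header names are case-insensitive, so the server still receives the same header.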

Request Structure

The similarity search endpoint accepts a JSON object with the following structure:

{
  "media_id": "vec-a1b2c3d4",       // ID of the reference media item

  // Collection targeting options (choose ONE approach):
  "collections": "default",          // Option 1: Single collection as string
  "collections": ["docs", "images"], // Option 2: Multiple collections as array
  "collections": "all",              // Option 3: Search across all collections

  // Common parameters:
  "max_results": 10,                 // Maximum number of similar items to return
  "min_score": 0.65,                 // Minimum similarity threshold (0.0-1.0)
  "include_vectors": false,          // Include vector embeddings in results
  "include_reference": false,        // Include the reference item in results

  // Presentation options:
  "cluster": false,                  // Group results into similarity clusters
  "include_explanation": true,       // Include similarity explanations

  // Additional filtering:
  "modality_filter": "document",     // Filter by media type
  "filters": {
    "created_after": "2023-01-01T00:00:00Z",
    "topics": ["machine learning", "AI"]
  }
}

Core Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| media_id | string | The ID of the reference media item to find similar content for (required) |

Collection Targeting

The collections parameter offers flexible ways to specify which collections to search:

| Value format | Description | Example |
|---|---|---|
| String (single collection) | Search within a single specified collection | "collections": "research" |
| Array of strings | Search across multiple specific collections | "collections": ["research", "reports"] |
| Special value "all" | Search across all your collections | "collections": "all" |

If not specified, the system searches the default collection.
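The three accepted shapes, plus the omit-to-use-default behavior, fit in a small payload helper. build_payload is a hypothetical client-side function. It rejects an empty list; that is our own choice, because this page does not say how the API treats one.

```python
from typing import Sequence, Union

# A single collection name, the special value "all", a list of names, or None.
CollectionsArg = Union[str, Sequence[str], None]


def build_payload(media_id: str, collections: CollectionsArg = None, **options) -> dict:
    """Assemble a request body for POST /v1/search/similar.

    When collections is None the key is left out, so the API falls back
    to the default collection.
    """
    if not media_id:
        raise ValueError("media_id is required")
    payload = {"media_id": media_id, **options}
    if collections is None:
        return payload
    if isinstance(collections, str):
        payload["collections"] = collections  # one name, or the special value "all"
    else:
        names = [str(name) for name in collections]
        if not names:
            raise ValueError("pass None instead of an empty collection list")
        payload["collections"] = names
    return payload
```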

Common Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_results | integer | 10 | Maximum number of similar items to return (1-100) |
| min_score | float | 0.65 | Minimum similarity threshold (0.0-1.0) for including results |
| include_vectors | boolean | false | Whether to include vector embeddings in results |
| include_reference | boolean | false | Whether to include the reference item in results |
| modality_filter | string | null | Filter results by media type: "document", "image", "video", "audio", or "model" |

Presentation Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| cluster | boolean | false | Group results into semantic similarity clusters |
| include_explanation | boolean | true | Include human-readable similarity explanations |
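A client can check a request body against the documented ranges before spending a billed request. The function below is a hypothetical pre-flight check that mirrors the limits in the tables above. The server's own validation, reported as MISSING_PARAMETER or INVALID_PARAMETER, remains authoritative.

```python
from typing import List

VALID_MODALITIES = {"document", "image", "video", "audio", "model"}
BOOLEAN_FIELDS = ("include_vectors", "include_reference", "cluster", "include_explanation")


def check_parameters(payload: dict) -> List[str]:
    """Return human-readable problems found in a request body (empty list if none)."""
    problems = []
    if not isinstance(payload.get("media_id"), str) or not payload["media_id"]:
        problems.append("media_id is required and must be a non-empty string")
    max_results = payload.get("max_results", 10)
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(max_results, int) or isinstance(max_results, bool) or not 1 <= max_results <= 100:
        problems.append("max_results must be an integer from 1 to 100")
    min_score = payload.get("min_score", 0.65)
    if not isinstance(min_score, (int, float)) or isinstance(min_score, bool) or not 0.0 <= min_score <= 1.0:
        problems.append("min_score must be a number from 0.0 to 1.0")
    modality = payload.get("modality_filter")
    if modality is not None and modality not in VALID_MODALITIES:
        problems.append(f"modality_filter must be one of {sorted(VALID_MODALITIES)}")
    for field in BOOLEAN_FIELDS:
        if field in payload and not isinstance(payload[field], bool):
            problems.append(f"{field} must be true or false")
    return problems
```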

Filtering Options

The filters object allows you to narrow down results based on metadata:

"filters": {
  "created_after": "2023-01-01T00:00:00Z",
  "topics": ["machine learning", "AI"],
  "entities.people": ["John Smith"]
}

Common filter fields include:

| Field | Description | Example |
|---|---|---|
| media_type | Type of media | "document", "image", "video", "audio", "model" |
| created_after | Items created after the given date | "2023-01-01T00:00:00Z" |
| created_before | Items created before the given date | "2023-12-31T23:59:59Z" |
| topics | Semantic topics | ["AI", "machine learning"] |
| entities.people | People mentioned or depicted | ["Albert Einstein"] |
| entities.organizations | Organizations mentioned | ["Research Lab"] |
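The filter timestamps in the examples use UTC ISO 8601 with a trailing Z. The sketch below builds a filters object from Python values; build_filters and to_api_timestamp are hypothetical helpers. Naive datetimes are rejected as a safeguard we chose, because this page does not say how the API interprets timestamps without a zone.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def to_api_timestamp(moment: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601 with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguous local times")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_filters(
    created_after: Optional[datetime] = None,
    created_before: Optional[datetime] = None,
    topics: Iterable[str] = (),
    people: Iterable[str] = (),
    organizations: Iterable[str] = (),
) -> dict:
    """Build a filters object using the field names from the table above."""
    filters = {}
    if created_after is not None:
        filters["created_after"] = to_api_timestamp(created_after)
    if created_before is not None:
        filters["created_before"] = to_api_timestamp(created_before)
    for key, values in (
        ("topics", topics),
        ("entities.people", people),
        ("entities.organizations", organizations),
    ):
        values = list(values)  # materialize so generators are handled too
        if values:
            filters[key] = values
    return filters
```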

Response Structure

Success Response

For successful similarity searches, you'll receive a JSON response with the following structure:

{
  "success": true,
  "message": "Similarity search completed successfully",
  "data": {
    "meta": {
      "reference_item": "vec-a1b2c3d4",
      "reference_collection": "research",
      "version": "v1",
      "cost": {
        "amount": 0.01,
        "rate_limit_remaining": 978
      },
      "stats": {
        "total_results": 7,
        "time_ms": 142,
        "found_types": ["📄 Document (5)", "🖼 Image (2)"],
        "includes_vectors": false,
        "collection_count": 2
      },
      "collections": {
        "searched": ["research", "projects"],
        "result_distribution": {"research": 5, "projects": 2},
        "scope_type": "multiple"
      }
    },
    "results": [
      {
        "media_id": "vec-e5f6g7h8",
        "score": 0.91,
        "similarity_percentage": "91.0%",
        "metadata": {
          "media_type": "document",
          "file_info": {
            "filename": "ml-frameworks-comparison.pdf",
            "size_bytes": 1240000,
            "mime_type": "application/pdf"
          },
          "search_metadata": {
            "summary": "Comparison of machine learning frameworks",
            "description": "Detailed analysis of popular machine learning frameworks...",
            "semantic_markers": {
              "topics": ["machine learning", "frameworks", "comparison"],
              "keywords": ["pytorch", "tensorflow", "performance"]
            },
            "entities": {
              "organizations": ["Google", "Meta"]
            },
            "quality_metrics": {
              "clarity": 0.92,
              "completeness": 0.89
            }
          }
        },
        "collection": "research",
        "relevance_explanation": "Very similar (91.0%) | Same type: document | Shared topics: machine learning",
        "semantic_cluster": "machine_learning",
        "is_reference": false
      },
      // Additional results...
    ],
    "relationships": {
      "clusters": {
        "clusters": {
          "machine_learning": ["vec-e5f6g7h8", "vec-i9j0k1l2"],
          "applications": ["vec-m3n4o5p6", "vec-q7r8s9t0"]
        },
        "metadata": {
          "machine_learning": {
            "modality": "document",
            "avg_score": 0.88,
            "member_count": 2,
            "depth_distribution": {"0": 2}
          },
          "applications": {
            "modality": "mixed",
            "avg_score": 0.82,
            "member_count": 2,
            "depth_distribution": {"0": 2}
          }
        }
      }
    }
  },
  "metadata": {
    "search_type": "similarity",
    "api_version": "v1",
    "reference_item": "vec-a1b2c3d4"
  }
}
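The found_types entries are display strings made of an icon, a label, and a count, so counting results per media type means parsing them. The pattern below is inferred from the example response rather than from a specification, and it skips entries it does not recognize.

```python
import re
from typing import Dict, List

# Assumes each entry looks like "<icon> <Label> (<count>)", as in the example response.
_FOUND_TYPE = re.compile(r"(\w[\w ]*?)\s*\((\d+)\)\s*$")


def parse_found_types(found_types: List[str]) -> Dict[str, int]:
    """Turn display strings such as "📄 Document (5)" into {"Document": 5}."""
    counts: Dict[str, int] = {}
    for entry in found_types:
        match = _FOUND_TYPE.search(entry)
        if match:  # entries in any other format are skipped
            counts[match.group(1)] = int(match.group(2))
    return counts


def summarize_meta(response: dict) -> str:
    """One-line summary built from data.meta (stats and cost)."""
    meta = response["data"]["meta"]
    stats = meta["stats"]
    return (
        f"{stats['total_results']} results in {stats['time_ms']} ms "
        f"from {stats['collection_count']} collection(s), "
        f"cost ${meta['cost']['amount']:.2f}"
    )
```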

Understanding Result Fields

Each result contains:

Core Fields:

  • media_id: Unique identifier for the media item
  • score: Similarity score (0.0-1.0)
  • similarity_percentage: Human-friendly similarity percentage
  • metadata: Detailed information about the media item, including:
      ◦ media_type: Type of media (document, image, video, audio, model)
      ◦ file_info: File metadata (name, size, type)
      ◦ search_metadata: Rich semantic metadata including summary, topics, entities, etc.
  • collection: The collection this result belongs to
  • relevance_explanation: Human-readable explanation of why this result is similar
  • semantic_cluster: The semantic cluster this result belongs to (when clustering is enabled)
  • is_reference: Whether this is the reference item
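Some fields are conditional: semantic_cluster appears only when clustering is enabled. This page also does not say which metadata fields are always present. A defensive way to consume results is therefore to flatten each one into a small record with None defaults. SimilarItem is a hypothetical client-side type whose field names follow the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimilarItem:
    """Flattened view of one entry in data.results."""
    media_id: str
    score: float
    collection: Optional[str] = None
    media_type: Optional[str] = None
    filename: Optional[str] = None
    summary: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    cluster: Optional[str] = None
    is_reference: bool = False

    @classmethod
    def from_result(cls, result: dict) -> "SimilarItem":
        # Missing nested objects are treated as empty rather than raising KeyError
        metadata = result.get("metadata") or {}
        search_meta = metadata.get("search_metadata") or {}
        markers = search_meta.get("semantic_markers") or {}
        return cls(
            media_id=result["media_id"],
            score=float(result["score"]),
            collection=result.get("collection"),
            media_type=metadata.get("media_type"),
            filename=(metadata.get("file_info") or {}).get("filename"),
            summary=search_meta.get("summary"),
            topics=list(markers.get("topics") or []),
            cluster=result.get("semantic_cluster"),  # only present when clustering is enabled
            is_reference=bool(result.get("is_reference", False)),
        )
```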

The relevance_explanation field summarizes why the result matched the reference item, as a short string of segments separated by " | ". For example:

Very similar (91.0%) | Same type: document | Shared topics: machine learning, frameworks

This tells you that the result:

  • Is very similar, with 91.0% similarity to the reference item
  • Has the same media type as the reference item
  • Shares common topics with the reference item
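To reuse the explanation in a UI, you can split it into its parts. This sketch assumes the " | " separator and the "Label: value" segments seen in the example. The exact format is not specified, so treat the parser as best-effort.

```python
from typing import Dict


def parse_explanation(text: str) -> Dict[str, str]:
    """Split a relevance_explanation string into a summary plus labeled segments."""
    head, *rest = [segment.strip() for segment in text.split("|")]
    parsed = {"summary": head}  # e.g. "Very similar (91.0%)"
    for segment in rest:
        label, sep, value = segment.partition(":")
        if sep:  # keep only "Label: value" segments
            parsed[label.strip()] = value.strip()
    return parsed
```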

Understanding Clusters

When cluster: true is specified, the response includes a relationships.clusters object that groups similar items together. This helps identify distinct content themes within the results:

"relationships": {
  "clusters": {
    "clusters": {
      "machine_learning": ["vec-e5f6g7h8", "vec-i9j0k1l2"],
      "applications": ["vec-m3n4o5p6", "vec-q7r8s9t0"]
    },
    "metadata": {
      "machine_learning": {
        "modality": "document",
        "avg_score": 0.88,
        "member_count": 2,
        "depth_distribution": {"0": 2}
      },
      "applications": {
        "modality": "mixed",
        "avg_score": 0.82,
        "member_count": 2,
        "depth_distribution": {"0": 2}
      }
    }
  }
}
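To render clusters, you have to join the member IDs back to the full result objects. The sketch below takes the response's data object and orders clusters by avg_score. This page does not say whether every result is assigned to a cluster, so any unassigned result goes under a hypothetical "unclustered" key.

```python
from typing import Dict, List


def group_by_cluster(data: dict) -> Dict[str, List[dict]]:
    """Map cluster name to result objects, highest avg_score first."""
    clusters_obj = (data.get("relationships") or {}).get("clusters") or {}
    membership = clusters_obj.get("clusters") or {}
    cluster_meta = clusters_obj.get("metadata") or {}
    by_id = {item["media_id"]: item for item in data.get("results", [])}

    ordered = sorted(
        membership,
        key=lambda name: cluster_meta.get(name, {}).get("avg_score", 0.0),
        reverse=True,
    )
    grouped: Dict[str, List[dict]] = {}
    seen = set()
    for name in ordered:
        members = [by_id[mid] for mid in membership[name] if mid in by_id]
        grouped[name] = members
        seen.update(item["media_id"] for item in members)
    leftovers = [item for mid, item in by_id.items() if mid not in seen]
    if leftovers:
        grouped["unclustered"] = leftovers
    return grouped
```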

Error Handling

In case of errors, you'll receive a response like:

{
  "success": false,
  "message": "Reference media item not found",
  "code": "RESOURCE_NOT_FOUND",
  "details": {
    "media_id": "vec-a1b2c3d4",
    "searched_collections": ["research", "projects"]
  },
  "suggestion": "Verify the media ID exists in the specified collections"
}

Common error codes:

| Code | Description |
|---|---|
| MISSING_PARAMETER | A required parameter, such as media_id, is missing |
| RESOURCE_NOT_FOUND | The reference media item was not found |
| INVALID_PARAMETER | A parameter has an invalid value |
| UNAUTHORIZED | Invalid or missing API key |
| INSUFFICIENT_BALANCE | Account balance too low for the operation |
| RATE_LIMIT_EXCEEDED | Too many requests in the current time period |
| SEARCH_ERROR | Error during search execution |
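This page does not state which errors are safe to retry. One reasonable client-side policy, assumed here rather than taken from the API, is to back off and retry on RATE_LIMIT_EXCEEDED and SEARCH_ERROR and to stop on everything else. Resending an unchanged request cannot fix a bad parameter or an empty balance.

```python
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "SEARCH_ERROR"}
FIX_REQUEST = {"MISSING_PARAMETER", "INVALID_PARAMETER", "RESOURCE_NOT_FOUND"}
FIX_ACCOUNT = {"UNAUTHORIZED", "INSUFFICIENT_BALANCE"}


def next_action(body: dict) -> str:
    """Return "ok", "retry", "fix_request", "fix_account", or "unknown"."""
    if body.get("success"):
        return "ok"
    code = body.get("code")
    if code in RETRYABLE:
        return "retry"
    if code in FIX_REQUEST:
        return "fix_request"
    if code in FIX_ACCOUNT:
        return "fix_account"
    return "unknown"


def describe_error(body: dict) -> str:
    """One-line summary combining code, message, and the optional suggestion."""
    text = f"{body.get('code', 'UNKNOWN')}: {body.get('message', 'no message')}"
    if body.get("suggestion"):
        text += f" (suggestion: {body['suggestion']})"
    return text
```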

Code Examples

Find Similar Documents (Python)

import requests

headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "media_id": "vec-a1b2c3d4",          # Reference document ID
    "collections": "research-papers",     # Search within this collection
    "modality_filter": "document",        # Only find similar documents
    "min_score": 0.7,                     # Only return high-quality matches
    "max_results": 10                     # Return up to 10 results
}

response = requests.post(
    "https://api.wickson.ai/v1/search/similar",
    headers=headers,
    json=data,
    timeout=30  # Fail instead of hanging indefinitely on network problems
)

if response.status_code == 200:
    results = response.json()["data"]["results"]
    print(f"Found {len(results)} similar documents")

    for result in results:
        metadata = result.get("metadata", {})
        print(f"\nScore: {result['score']} ({result['similarity_percentage']})")
        print(f"File: {metadata.get('file_info', {}).get('filename', 'n/a')}")
        print(f"Summary: {metadata.get('search_metadata', {}).get('summary', 'n/a')}")
        print(f"Why similar: {result.get('relevance_explanation', 'n/a')}")
        print("-" * 40)
else:
    print(f"Error: {response.status_code} - {response.text}")

Build a Recommendation System (JavaScript)

const getRecommendations = async (mediaId) => {
  const headers = {
    "X-Api-Key": "YOUR_API_KEY",
    "Content-Type": "application/json"
  };

  const data = {
    "media_id": mediaId,
    "collections": "all",              // Search all collections
    "max_results": 6,                  // Get 6 recommendations
    "min_score": 0.75,                 // High similarity threshold
    "cluster": true,                   // Group by similarity themes
    "include_reference": false         // Exclude the reference item
  };

  try {
    const response = await fetch("https://api.wickson.ai/v1/search/similar", {
      method: "POST",
      headers: headers,
      body: JSON.stringify(data)
    });

    if (!response.ok) {
      throw new Error(`Error: ${response.status} - ${response.statusText}`);
    }

    const result = await response.json();
    const recommendations = result.data.results;

    console.log(`Found ${recommendations.length} recommendations`);

    // Get cluster information
    const clusters = result.data.relationships?.clusters?.clusters || {};

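    // Group recommendations by cluster, keeping items that fall outside every cluster
    const recommendationsByCluster = {};
    const clusteredIds = new Set();
    for (const [clusterName, itemIds] of Object.entries(clusters)) {
      recommendationsByCluster[clusterName] = recommendations.filter(
        item => itemIds.includes(item.media_id)
      );
      itemIds.forEach(id => clusteredIds.add(id));
    }
    const unclustered = recommendations.filter(item => !clusteredIds.has(item.media_id));
    if (unclustered.length > 0) {
      recommendationsByCluster.unclustered = unclustered;
    }

    // Display recommendations by cluster (metadata fields may be absent)
    for (const [cluster, items] of Object.entries(recommendationsByCluster)) {
      console.log(`\n${cluster.toUpperCase()} (${items.length} items)`);
      items.forEach(item => {
        console.log(`- ${item.metadata?.file_info?.filename ?? item.media_id}: ${item.similarity_percentage}`);
        console.log(`  ${item.metadata?.search_metadata?.summary ?? ""}`);
      });
    }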

    return recommendations;
  } catch (error) {
    console.error("Failed to get recommendations:", error);
    return [];
  }
};

// Example usage:
getRecommendations("vec-a1b2c3d4");

Cross-Collection Similarity Search (Python)

import requests

def find_similar_across_collections(media_id, collections=None):
    """Find similar content across multiple collections."""
    headers = {
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    # Build request
    data = {
        "media_id": media_id,
        "max_results": 20
    }

    # Use the caller's collections if provided; otherwise search every collection.
    # (Omitting "collections" entirely would search only the default collection.)
    data["collections"] = collections if collections else "all"

    # Execute search
    response = requests.post(
        "https://api.wickson.ai/v1/search/similar",
        headers=headers,
        json=data,
        timeout=30  # Fail instead of hanging indefinitely on network problems
    )

    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    result = response.json()["data"]

    # Print summary statistics
    meta = result["meta"]
    print(f"Found {meta['stats']['total_results']} similar items across {meta['stats']['collection_count']} collections")
    print(f"Media types found: {', '.join([t.split('(')[0].strip() for t in meta['stats']['found_types']])}")

    # Organize results by collection
    results_by_collection = {}
    for item in result["results"]:
        results_by_collection.setdefault(item["collection"], []).append(item)

    # Display results by collection
    for collection, items in results_by_collection.items():
        print(f"\nCollection: {collection} ({len(items)} items)")
        for item in items:
            print(f"- {item['metadata']['file_info']['filename']} ({item['similarity_percentage']})")
            print(f"  {item['relevance_explanation']}")

    return result["results"]

# Example usage
find_similar_across_collections("vec-a1b2c3d4", ["research", "projects"])

Rate Limits & Costs

Rate Limits

Rate limits are applied based on your API tier:

| Tier | Limit | Reset period |
|---|---|---|
| Basic/Standard | 1,000 requests | Per hour |

Rate limit information is included in response headers:

  • X-RateLimit-Limit: Maximum requests allowed per hour
  • X-RateLimit-Remaining: Remaining requests in current window
  • X-RateLimit-Reset: UTC timestamp for limit reset
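The header list describes X-RateLimit-Reset only as a UTC timestamp. The helper below assumes Unix epoch seconds and falls back to an ISO 8601 string; check a live response before relying on either. With the requests library, response.headers lookups are case-insensitive, whereas lookups in a plain dict are not.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds until the rate-limit window resets, or None if the header is missing.

    Assumption: X-RateLimit-Reset holds Unix epoch seconds; ISO 8601 is a fallback.
    """
    raw = headers.get("X-RateLimit-Reset")
    if raw is None:
        return None
    now = now or datetime.now(timezone.utc)
    try:
        reset = datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        reset = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if reset.tzinfo is None:
            reset = reset.replace(tzinfo=timezone.utc)
    return max(0.0, (reset - now).total_seconds())


def should_pause(headers: Mapping[str, str], reserve: int = 5) -> bool:
    """True when X-RateLimit-Remaining is at or below `reserve` requests."""
    remaining = headers.get("X-RateLimit-Remaining")
    return remaining is not None and int(remaining) <= reserve
```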

Cost Structure

Similarity search is priced identically to advanced search:

| Operation | Cost |
|---|---|
| Similarity Search | $0.01 per request |
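At the Basic/Standard limit, the worst-case hourly spend is 1,000 requests × $0.01 = $10.00. The sketch below uses Decimal so that per-request cents add up exactly. It assumes every request is billed, since this page does not say whether failed requests are charged.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.01")  # similarity search price from the table above
HOURLY_LIMIT = 1_000                 # Basic/Standard requests per hour


def estimate_cost(request_count: int) -> Decimal:
    """Dollar cost of a number of similarity-search requests."""
    if request_count < 0:
        raise ValueError("request_count must be non-negative")
    return PRICE_PER_REQUEST * request_count


def max_hourly_spend() -> Decimal:
    """Upper bound per hour if every request in the window is a similarity search."""
    return estimate_cost(HOURLY_LIMIT)
```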

Best Practices

The similarity search endpoint is ideal for:

  1. Building Recommendation Systems - "People who viewed this also viewed..."
  2. Related Content Discovery - Finding content similar to what a user is currently viewing
  3. Content Organization - Automatically grouping related content
  4. Duplicate Detection - Finding near-duplicate content across collections
  5. Content Gap Analysis - Identifying where similar content exists or is missing
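For duplicate detection, one approach is to run a similarity search with a high min_score and treat anything above a strict cutoff as a likely near-duplicate. The 0.95 cutoff below is an arbitrary assumption, not a documented value, so tune it on your own data.

```python
from typing import List, Tuple

NEAR_DUPLICATE_CUTOFF = 0.95  # assumption, not a documented value


def duplicate_search_payload(media_id: str) -> dict:
    """Request body aimed at near-duplicates across every collection."""
    return {
        "media_id": media_id,
        "collections": "all",
        "min_score": NEAR_DUPLICATE_CUTOFF,
        "max_results": 100,          # documented maximum
        "include_reference": False,  # the reference item would trivially match itself
    }


def near_duplicates(results: List[dict], cutoff: float = NEAR_DUPLICATE_CUTOFF) -> List[Tuple[str, str, float]]:
    """(media_id, collection, score) for non-reference results at or above cutoff, best first."""
    hits = [
        (item["media_id"], item.get("collection", ""), item["score"])
        for item in results
        if item["score"] >= cutoff and not item.get("is_reference", False)
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```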

Optimizing Similarity Results

Set Appropriate Similarity Thresholds

  • Higher min_score (0.8+): Only very similar items
  • Medium min_score (0.65-0.8): Moderately similar items
  • Lower min_score (0.5-0.65): More loosely related items
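The bands above share their endpoints (0.8 and 0.65 each appear twice), so a labeling helper has to pick a side. This sketch assigns each endpoint to the higher band. The labels are illustrative and are not values returned by the API.

```python
def similarity_band(score: float) -> str:
    """Label a score using the threshold bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "very similar"
    if score >= 0.65:
        return "moderately similar"
    if score >= 0.5:
        return "loosely related"
    return "below suggested range"
```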

Leverage Clustering

  • Enable cluster: true to organize results into semantic themes
  • Useful for diverse recommendation systems
  • Helps understand different aspects of similarity

Filter by Modality

  • Use modality_filter to focus on specific media types
  • Compare documents to documents, images to images, etc.
  • Or leave unspecified to find cross-modal similarities

Control Reference Item Inclusion

  • For UI recommendations, set include_reference: false
  • For similarity analysis, set include_reference: true

Consider Collection Scope

  • Search within the same collection for closely related content
  • Search across collections for discovering broader connections
  • Use "all" to find unexpected relationships throughout your data

Similarity Search vs. Regular Search

| Feature | Regular search | Similarity search |
|---|---|---|
| Input | Text query | Media item ID |
| Best for | Finding content matching specific criteria | Finding content similar to an existing item |
| Use case | "Find all documents about machine learning" | "Find content similar to this specific document" |
| Primary mechanism | Query-to-vector matching | Vector-to-vector comparison |

Enhancing User Experiences

  1. Combine with Regular Search

Use regular search for initial content discovery, then offer similarity search for "more like this" functionality.

  2. Implement Smart Filters

Combine similarity search with filters to create targeted recommendations:

{
  "media_id": "vec-a1b2c3d4",
  "modality_filter": "image",
  "filters": {
    "created_after": "2023-01-01T00:00:00Z"
  }
}

  3. Create Diverse Recommendation Panels

Use clustering to show diverse recommendation categories:

{
  "media_id": "vec-a1b2c3d4",
  "cluster": true,
  "max_results": 12
}

  4. Implement Progressive Loading

Start with a small max_results value and a strict min_score, then request more results only when the user asks for them:

{
  "media_id": "vec-a1b2c3d4",
  "max_results": 4,
  "min_score": 0.8
}
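This page does not document an offset or cursor parameter. Under that assumption, progressive loading can re-issue the request with a larger max_results (capped at the documented 100) and show only the items the user has not seen yet. next_request and unseen are hypothetical helpers. Each request costs $0.01, so larger steps mean fewer billed calls.

```python
from typing import Iterable, List, Set

MAX_RESULTS_CAP = 100  # documented upper bound for max_results


def next_request(previous: dict, step: int = 4) -> dict:
    """Copy a request body with max_results grown by `step`, capped at 100."""
    grown = dict(previous)  # leave the caller's dict unchanged
    grown["max_results"] = min(previous.get("max_results", 10) + step, MAX_RESULTS_CAP)
    return grown


def unseen(results: Iterable[dict], shown_ids: Set[str]) -> List[dict]:
    """Return results not shown yet and record their IDs in shown_ids."""
    fresh = [item for item in results if item["media_id"] not in shown_ids]
    shown_ids.update(item["media_id"] for item in fresh)
    return fresh
```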

Troubleshooting

| Issue | Potential solution |
|---|---|
| Reference media item not found | Verify the media_id is correct and exists in the specified collections |
| No results returned | Lower the min_score threshold, search more collections, or remove filters |
| Too many unrelated results | Increase the min_score threshold (e.g., from 0.65 to 0.8) |
| Missing expected similar items | Try searching across more collections using "collections": "all" |
| Results all of the wrong media type | Use modality_filter to specify the desired media type |
| Slow response time | Reduce max_results, search fewer collections, or add more specific filters |
| Unexpected content in results | Check whether your reference item covers multiple themes that pull in different kinds of results |
| INSUFFICIENT_BALANCE errors | Check your account balance and ensure you have funds for similarity searches |
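The first fix for "No results returned", lowering min_score, can be automated. This hypothetical helper accepts any search callable, which keeps it testable. It lowers the threshold in 0.05 steps down to 0.5, the bottom of the "loosely related" band. Every attempt is a separate billed request, so keep the number of steps small.

```python
from typing import Callable, List, Tuple

# Any callable that sends a request body and returns data.results
SearchFn = Callable[[dict], List[dict]]


def search_with_relaxed_threshold(
    search: SearchFn, payload: dict, floor: float = 0.5, step: float = 0.05
) -> Tuple[List[dict], float]:
    """Lower min_score step by step until results appear or the floor is reached.

    Returns the results and the min_score that produced them.
    """
    score = payload.get("min_score", 0.65)
    while True:
        results = search({**payload, "min_score": score})
        if results or score <= floor:
            return results, score
        # round() avoids float drift such as 0.6000000000000001
        score = max(floor, round(score - step, 2))
```

With the requests library, the callable could be `lambda body: requests.post("https://api.wickson.ai/v1/search/similar", headers=headers, json=body, timeout=30).json()["data"]["results"]`, where headers holds your X-Api-Key.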

If problems persist, you can contact our support team at sales@firespawnstudios.net with your API key and request details.
