Search Guide
Introduction to Vector Search
The Wickson API's vector search lets you search and explore your stored media (documents, images, videos, audio files, and 3D models) using natural language.
Unlike traditional keyword search, our hybrid vector search captures conceptual relationships, themes, and emotional context, so you can find content based on what it means rather than on exact word matches.
When you upload media through our API, we convert it into vectors (numerical representations of content meaning) and store them in our specialized vector database.
Search then uses these vectors to find semantically similar content, even when the exact terms aren't present.
Key Concepts
Vector Embeddings and Semantic Search
Vector embeddings represent content in a multidimensional space where similar concepts are positioned close together. This allows the search engine to find content that's conceptually related to your query, not just content that contains matching keywords.
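To make "positioned close together" concrete, here is a minimal sketch that compares toy embeddings with cosine similarity. The three-dimensional vectors are invented for illustration, and cosine similarity is an assumed metric: this guide does not state which distance measure or embedding dimensionality the Wickson API uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (real embeddings have many more dimensions)
solar_report = [0.9, 0.1, 0.2]
wind_report = [0.8, 0.2, 0.3]
cake_recipe = [0.1, 0.9, 0.1]

print(f"solar vs wind:   {cosine_similarity(solar_report, wind_report):.2f}")
print(f"solar vs recipe: {cosine_similarity(solar_report, cake_recipe):.2f}")
```

The two energy reports score much closer to 1.0 than the report and the recipe do, which is the property semantic search relies on.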
Search Types
The Wickson API offers two distinct search approaches:
Basic Search
- Hybrid vector search optimized for speed and direct relevance
- Ideal for straightforward queries with clear intent
- Free to use
- Returns direct matches in order of semantic similarity
Advanced Search (R3F)
- Deeper, context-aware exploration using our R3F search technology
- Discovers related content through contextual connections
- Identifies cross-modal relationships between different media types
- More computationally intensive: $0.01 base + $0.01 per depth level
- Suited to research, content exploration, and discovering unexpected connections
Understanding Hybrid Vector Search
Our system combines the best of two search paradigms:
- Semantic (Vector) Search: Finds content based on meaning and conceptual similarity
- Lexical (Keyword) Search: Finds content with exact matching terms
The lexical_weight parameter controls the balance between these two approaches:
- Higher values (closer to 1.0) prioritize exact keyword matches
- Lower values (closer to 0.0) prioritize semantic meaning matches
- The default value of 0.35 provides a balanced approach for most use cases
This hybrid approach delivers more relevant results than either method alone by:
- Finding relevant content even when exact terms aren't present (semantic search)
- Ensuring precise matches aren't overlooked (lexical search)
- Allowing you to tune the balance based on your specific search needs
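One way to picture how lexical_weight shifts the ranking is a simple linear blend of the two scores. This is only an illustrative model: the function name hybrid_score, the sample scores, and the linear formula are assumptions, since this guide does not document how the API actually combines semantic and lexical relevance.

```python
def hybrid_score(semantic, lexical, lexical_weight=0.35):
    """Blend two relevance scores in [0, 1] (assumed linear interpolation)."""
    if not 0.0 <= lexical_weight <= 1.0:
        raise ValueError("lexical_weight must be between 0.0 and 1.0")
    return lexical_weight * lexical + (1.0 - lexical_weight) * semantic


# Hypothetical documents: one is conceptually close, one repeats the query terms
concept_doc = {"semantic": 0.90, "lexical": 0.20}
keyword_doc = {"semantic": 0.50, "lexical": 0.95}

for weight in (0.1, 0.35, 0.8):
    concept = hybrid_score(concept_doc["semantic"], concept_doc["lexical"], weight)
    keyword = hybrid_score(keyword_doc["semantic"], keyword_doc["lexical"], weight)
    winner = "concept_doc" if concept > keyword else "keyword_doc"
    print(f"lexical_weight={weight}: concept={concept:.3f} keyword={keyword:.3f} -> {winner}")
```

Under this model the conceptually related document ranks first at low weights and the keyword-heavy document takes over as the weight approaches 1.0.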
Collection Organization
Collections are logical groupings that organize your media. How you structure them affects both search efficiency and relevance. Common strategies:
- Content-Type Collections: Organize by media type (documents, images, videos)
- Project-Based Collections: Group by project or content purpose
- Topic-Based Collections: Organize by subject matter
- Temporal Collections: Organize by time period (quarterly reports, annual data)
Collection Targeting Approaches
The API offers three ways to target collections when searching:
Single Collection (collections: "research")
- Fastest performance
- Ideal when all relevant content is in one collection
- If collections is omitted, the "default" collection is searched
Multiple Collections (collections: ["research", "reports"])
- Searches only specified collections
- Balances performance and coverage
All Collections (collections: "all")
- Searches across your entire content library
- Most comprehensive but potentially slower
- Great for global exploration
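The three targeting forms can be produced by one small helper. This is a sketch rather than part of any Wickson SDK: build_search_request is a hypothetical name, and collapsing a one-element list to the string form is an assumption made to match the formats shown above.

```python
def build_search_request(query, collections=None, search_type="basic"):
    """Build a search payload using the collection formats described above."""
    if collections is None:
        target = "default"  # Documented default when no collection is given
    elif isinstance(collections, str):
        target = collections  # A single name, or the special value "all"
    else:
        names = list(collections)
        if not names:
            raise ValueError("collections must not be empty")
        # Assumption: send a lone name as a string, several names as an array
        target = names[0] if len(names) == 1 else names
    return {"query": query, "type": search_type, "collections": target}


print(build_search_request("solar storage"))
print(build_search_request("solar storage", ["research", "reports"]))
print(build_search_request("solar storage", "all", search_type="advanced"))
```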
Working with Search
Basic Search Example
Python
import requests

# Configuration
api_key = "YOUR_API_KEY"

# Create search request for a single collection
search_data = {
    "query": "renewable energy solutions for developing countries",
    "type": "basic",
    "collections": "research-papers",
    "max_results": 10,
    "min_score": 0.7
}

# Execute search
response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": api_key,
        "Content-Type": "application/json"
    },
    json=search_data
)

# Process response
if response.status_code == 200:
    data = response.json()["data"]

    # Print search information
    print(f"Search for: '{data['meta']['query']}'")
    print(f"Found {data['meta']['stats']['total_results']} results in {data['meta']['stats']['query_time_ms']}ms")
    print(f"Collection: {data['meta']['collections']['searched'][0]}")
    print(f"Cost: ${data['cost']}")

    # Display results
    for i, result in enumerate(data["results"], 1):
        print(f"\n{i}. {result['metadata']['file_info']['filename']} (Score: {result['score']:.2f})")
        print(f"   {result['relevance_explanation']}")
        if "summary" in result['metadata']['search_metadata']:
            print(f"   Summary: {result['metadata']['search_metadata']['summary']}")
else:
    print(f"Error {response.status_code}: {response.text}")
cURL
curl -X POST https://api.wickson.ai/v1/search \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "renewable energy solutions for developing countries",
    "type": "basic",
    "collections": "research-papers",
    "max_results": 10,
    "min_score": 0.7
  }'
Advanced Search Example with Lexical Weight Configuration
Python
import requests

# Configuration
api_key = "YOUR_API_KEY"

# Advanced search with lexical weight configuration for hybrid search
search_request = {
    "query": "renewable energy solutions for developing countries",
    "type": "advanced",
    # Target specific collections with array format
    "collections": ["research-papers", "case-studies", "policy-documents"],
    # Top-level parameters
    "max_results": 15,
    "min_score": 0.65,
    # Advanced configuration options
    "config": {
        "context_depth": 2,
        "expansion_factor": 0.7,
        "lexical_weight": 0.5  # Higher value (0.5) gives more weight to keyword matching
    },
    # Optional filtering
    "filters": {
        "media_type": ["document", "image"],
        "created_after": "2024-01-01T00:00:00Z"
    }
}

# Execute search
response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": api_key,
        "Content-Type": "application/json"
    },
    json=search_request
)

# Process response
if response.status_code == 200:
    data = response.json()["data"]
    meta = data["meta"]

    # Print search information
    print(f"Advanced search for: '{meta['query']}'")
    print(f"Found {meta['stats']['total_results']} results in {meta['stats']['time_ms']}ms")
    print(f"Depth reached: {meta['cost']['depth_reached']}")
    print(f"Cost: ${meta['cost']['amount']}")
    print(f"Lexical weight (keyword vs semantic): {search_request['config']['lexical_weight']}")

    # Display collections searched
    collections = meta['collections']['searched']
    print(f"Collections searched: {', '.join(collections)}")
    distribution = meta['collections']['result_distribution']
    print("Results by collection: " + ", ".join(f"{coll}: {count}" for coll, count in distribution.items()))

    # Display results
    print("\nRESULTS:")
    for i, result in enumerate(data["results"], 1):
        print(f"\n{i}. {result['metadata']['file_info']['filename']} (Score: {result['score']:.2f})")
        print(f"   Collection: {result['collection']}")
        print(f"   Context depth: {result['context_depth']}")
        print(f"   {result['relevance_explanation']}")

        # Show cross-modal connections if any (.get avoids a KeyError when the field is absent)
        if result.get("cross_modal_connections"):
            print("   Connected to:")
            for conn in result["cross_modal_connections"]:
                print(f"   - {conn['target_id']} ({conn['relation_type']}, {conn['strength']:.2f})")
else:
    # Handle errors with helpful information
    try:
        error = response.json()
        print(f"Error {response.status_code}: {error.get('message', 'Unknown error')}")
        if "details" in error:
            print(f"Details: {error['details']}")
        if "suggestion" in error:
            print(f"Suggestion: {error['suggestion']}")
    except ValueError:
        # The error body was not valid JSON; fall back to the raw text
        print(f"Error {response.status_code}: {response.text}")
cURL
curl -X POST https://api.wickson.ai/v1/search \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "renewable energy solutions for developing countries",
    "type": "advanced",
    "collections": ["research-papers", "case-studies", "policy-documents"],
    "max_results": 15,
    "min_score": 0.65,
    "config": {
      "context_depth": 2,
      "expansion_factor": 0.7,
      "lexical_weight": 0.5
    },
    "filters": {
      "media_type": ["document", "image"],
      "created_after": "2024-01-01T00:00:00Z"
    }
  }'
Tuning Semantic vs. Keyword Search Balance
The lexical_weight parameter lets you control the balance between semantic search (meaning-based) and lexical search (keyword-based). Let's see how different settings might affect search results:
import requests

# Configure search to emphasize semantic understanding
semantic_search = {
    "query": "artificial intelligence benefits in healthcare",
    "type": "basic",
    "collections": "medical-research",
    "config": {
        "lexical_weight": 0.1  # Low value (0.1) emphasizes semantic understanding
    },
    "max_results": 10
}

# Configure search to emphasize exact keyword matching
keyword_search = {
    "query": "artificial intelligence benefits in healthcare",
    "type": "basic",
    "collections": "medical-research",
    "config": {
        "lexical_weight": 0.8  # High value (0.8) emphasizes keyword matching
    },
    "max_results": 10
}

# Execute both searches
semantic_response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json=semantic_search
)
keyword_response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json=keyword_search
)

# Compare results
semantic_results = semantic_response.json()["data"]["results"]
keyword_results = keyword_response.json()["data"]["results"]

print("SEMANTIC SEARCH RESULTS (lexical_weight: 0.1):")
for i, result in enumerate(semantic_results[:5], 1):
    print(f"{i}. {result['metadata']['file_info']['filename']} (Score: {result['score']:.2f})")
    print(f"   {result['relevance_explanation']}")

print("\nKEYWORD SEARCH RESULTS (lexical_weight: 0.8):")
for i, result in enumerate(keyword_results[:5], 1):
    print(f"{i}. {result['metadata']['file_info']['filename']} (Score: {result['score']:.2f})")
    print(f"   {result['relevance_explanation']}")
Typical differences you'll see:
| lexical_weight | Behavior | Best for |
|---|---|---|
| 0.1-0.2 | • Returns conceptually related content • May include results that don't contain exact terms • Finds related concepts and synonyms | Research, idea exploration, discovering related content |
| 0.3-0.6 | • Balanced approach • Prioritizes semantic matches but values exact terms • Default setting (0.35) works well for most cases | General search, balancing relevance with precision |
| 0.7-0.9 | • Strongly favors exact keyword matches • Works more like traditional search • Useful for technical or precise queries | Technical documentation, exact fact-finding, specialized terminology |
Collection Targeting Examples
Example 1: Searching a Single Collection
import requests

# Single collection search (using string format)
single_collection_search = {
    "query": "renewable energy technologies",
    "type": "basic",
    "collections": "research-papers",  # Single collection as string
    "max_results": 10
}

response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={"X-Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json=single_collection_search
)
# Process results...
Example 2: Searching Multiple Specific Collections
import requests

# Multiple collections search (using array format)
multi_collection_search = {
    "query": "renewable energy technologies",
    "type": "advanced",
    "collections": ["research-papers", "case-studies", "news"],  # Multiple collections as array
    "config": {
        "context_depth": 2,
        "lexical_weight": 0.4
    },
    "max_results": 15
}

response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={"X-Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json=multi_collection_search
)
# Results will contain items from all specified collections
Example 3: Searching All Collections
import requests

# All collections search (using special "all" value)
all_collections_search = {
    "query": "renewable energy technologies",
    "type": "advanced",
    "collections": "all",  # Special value to search all your collections
    "config": {
        "context_depth": 2,
        "lexical_weight": 0.4
    },
    "max_results": 20
}

response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={"X-Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json=all_collections_search
)
# Results will include items from all your collections
Understanding Search Results
Results include rich information to help you understand why content matched your query:
- Score: Relevance score from 0.0 to 1.0
- Metadata: Detailed information about the media item
  - Media Type: Document, image, video, audio, or 3D model
  - File Info: Name, size, type
  - Search Metadata: Rich semantic information
    - Summary: Concise content summary
    - Description: Detailed content description
    - Semantic Markers: Topics, keywords, categories
    - Entities: People, organizations, locations detected
    - Quality Metrics: Content quality scores
- Collection: Source collection for the result
- Relevance Explanation: Human-readable explanation of match reason
The relevance_explanation field provides a concise explanation of why a result matched your query. For example:
Very strong match (84.5% confidence) | Matches topics: renewable energy, sustainability | Quality scores - clarity: 90%, completeness: 85%
This tells you:
- The result is a very strong match (84.5% confidence)
- It matches the topics "renewable energy" and "sustainability"
- It has high clarity (90%) and completeness (85%) quality scores
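If you want to use parts of the explanation programmatically, a small parser can split it up. The sketch below assumes the pipe-separated layout and the "(NN% confidence)" wording of the single example above; the field is described as human-readable, so treat the format as unstable and the helper name parse_relevance_explanation as hypothetical.

```python
import re


def parse_relevance_explanation(text):
    """Split an explanation into its segments and pull out the confidence, if present."""
    parts = [part.strip() for part in text.split("|")]
    match = re.search(r"\((\d+(?:\.\d+)?)% confidence\)", text)
    confidence = float(match.group(1)) if match else None
    return {"parts": parts, "confidence_pct": confidence}


example = (
    "Very strong match (84.5% confidence) | "
    "Matches topics: renewable energy, sustainability | "
    "Quality scores - clarity: 90%, completeness: 85%"
)
parsed = parse_relevance_explanation(example)
print(parsed["confidence_pct"])
for part in parsed["parts"]:
    print("-", part)
```

For ranking or filtering, prefer the numeric score field; this parser is only useful for display or logging.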
In advanced search, results also include:
- Context Depth: Exploration level (0 = direct match, higher numbers = found through connections)
- Expansion Path: Chain of connections leading to result
- Cross-Modal Connections: Links to related content in different modalities
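To separate direct matches from results reached through connections, you can group advanced search results by their context_depth. This is a minimal sketch over hand-written sample data; the ids and scores are invented, and group_by_depth is a hypothetical helper.

```python
from collections import defaultdict


def group_by_depth(results):
    """Group result dicts by context_depth (0 = direct match), lowest depth first."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.get("context_depth", 0)].append(result)
    return dict(sorted(grouped.items()))


# Hypothetical results trimmed to the fields used here
sample_results = [
    {"id": "vec-1", "score": 0.91, "context_depth": 0},
    {"id": "vec-2", "score": 0.74, "context_depth": 1},
    {"id": "vec-3", "score": 0.88, "context_depth": 0},
]

by_depth = group_by_depth(sample_results)
for depth, items in by_depth.items():
    label = "direct matches" if depth == 0 else "found through connections"
    print(f"Depth {depth} ({label}): {[item['id'] for item in items]}")
```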
Understanding Cross-Modal Connections
Cross-modal connections represent relationships between different media types. For example, a document about climate change might be connected to an image through an entry like this:
"cross_modal_connections": [
  {
    "target_id": "vec-abc123",
    "relation_type": "semantic_expansion",
    "strength": 0.82,
    "source_modality": "document",
    "target_modality": "image",
    "symbol": "━━",
    "relationship_type": "→"
  }
]
This shows:
- A strong connection (strength: 0.82) to another item
- The connection is from a document to an image
- The relationship type "→" and symbol "━━" indicate a strong connection
- It was found through "semantic_expansion" (meaning-based analysis)
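When a result carries many connections, you may want only the strong ones, or only those pointing at a particular media type. The following sketch filters on the strength and target_modality fields shown above; the 0.75 cutoff, the second sample connection, and the helper name strong_connections are illustrative assumptions.

```python
def strong_connections(result, min_strength=0.75, target_modality=None):
    """Return a result's connections at or above min_strength, strongest first."""
    matches = []
    for conn in result.get("cross_modal_connections") or []:
        if conn["strength"] < min_strength:
            continue
        if target_modality is not None and conn.get("target_modality") != target_modality:
            continue
        matches.append(conn)
    return sorted(matches, key=lambda conn: conn["strength"], reverse=True)


# Hypothetical result: the first connection mirrors the example above
sample_result = {
    "cross_modal_connections": [
        {"target_id": "vec-abc123", "strength": 0.82, "target_modality": "image"},
        {"target_id": "vec-def456", "strength": 0.41, "target_modality": "video"},
    ]
}

for conn in strong_connections(sample_result):
    print(conn["target_id"], conn["target_modality"], conn["strength"])
```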
Searchable Content vs. Original Files
When you search with the Wickson API, keep the following in mind:
- Your searches query vector embeddings and metadata derived from your original files
- Search results contain rich metadata, summaries, and content extracts
- The original files you uploaded are not stored or retrievable through the Wickson API
Important for Workflow Planning:
- Maintain your own repository of source files alongside Wickson
- When search results identify relevant content, retrieve the original files from your own storage
- This separation keeps your files more secure and gives you complete control over source material
- Because Wickson focuses on vector search rather than file storage, you get powerful search without managing your media files in another cloud storage system
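Since originals stay in your own storage, a typical workflow maps each search result back to a local file. The sketch below assumes your repository is a single directory that keeps files under the same filename returned in metadata.file_info.filename; that layout and the helper name locate_source_file are assumptions, so adapt the lookup to however you actually store files.

```python
import tempfile
from pathlib import Path


def locate_source_file(result, source_root):
    """Return the local path for a search result's original file, or None if absent."""
    filename = result["metadata"]["file_info"]["filename"]
    # Use only the final path component so a result cannot point outside source_root
    candidate = Path(source_root) / Path(filename).name
    return candidate if candidate.is_file() else None


# Demonstration with a temporary directory standing in for your file repository
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "solar-report.pdf").write_bytes(b"%PDF-placeholder")

    hit = {"metadata": {"file_info": {"filename": "solar-report.pdf"}}}
    miss = {"metadata": {"file_info": {"filename": "not-stored.pdf"}}}

    found = locate_source_file(hit, repo)
    missing = locate_source_file(miss, repo)
    print(found.name if found else "not found")
    print(missing)
```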
Search with Filtering
You can narrow down search results using optional filters:
import requests

# Configuration
api_key = "YOUR_API_KEY"

# Set up advanced filters
search_request = {
    "query": "climate change impact",
    "type": "basic",
    "collections": "research",
    # Apply multiple filters
    "filters": {
        # Filter by media type
        "media_type": "document",
        # Filter by date range
        "created_after": "2023-01-01T00:00:00Z",
        "created_before": "2023-12-31T23:59:59Z",
        # Filter by topics
        "topics": ["climate", "environment", "policy"],
        # Filter by entities
        "entities": {
            "organizations": ["IPCC", "United Nations"]
        }
    },
    # Configure hybrid search to favor semantic understanding
    "config": {
        "lexical_weight": 0.25  # Lower value favors semantic meaning over exact keyword matches
    }
}

# Execute search
response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": api_key,
        "Content-Type": "application/json"
    },
    json=search_request
)

# Process response
if response.status_code == 200:
    data = response.json()["data"]

    # Display results with filter information
    results_count = len(data["results"])
    print(f"Found {results_count} results matching filters")

    # Show applied filters
    if "filters" in search_request:
        print("\nApplied filters:")
        for filter_name, filter_value in search_request["filters"].items():
            print(f"- {filter_name}: {filter_value}")

    # Display results
    for i, result in enumerate(data["results"], 1):
        print(f"\n{i}. {result['metadata']['file_info']['filename']}")
        print(f"   Score: {result['score']:.2f}")
        # Show relevance explanation
        if "relevance_explanation" in result:
            print(f"   {result['relevance_explanation']}")
else:
    print(f"Error {response.status_code}: {response.text}")
Modality-Specific Search
Document-Focused Search
Optimize search for document content with document-specific parameters:
document_search = {
    "query": "corporate governance best practices",
    "type": "basic",
    "collections": "corporate-policies",
    # Add document-specific modality configuration
    "modality": {
        "type": "document",
        "weight": 0.8,  # High weight prioritizes documents
        "features": ["content", "structure", "quality"]
    },
    "max_results": 10,
    "config": {
        "lexical_weight": 0.6  # Higher weight favors exact term matching for technical content
    }
}
# Document-specific results emphasize document quality and structure
Image-Focused Search
Optimize search for visual content:
image_search = {
    "query": "coastal landscape sunset photography",
    "type": "basic",
    "collections": "photography",
    # Add image-specific modality configuration
    "modality": {
        "type": "image",
        "weight": 0.9,  # Very high weight prioritizes images
        "features": ["visual", "scene", "quality"]
    },
    "max_results": 15,
    "config": {
        "lexical_weight": 0.2  # Lower weight favors semantic/visual understanding
    }
}
# Image-specific results emphasize visual attributes and scene recognition
Video-Focused Search
Optimize search for video content:
video_search = {
    "query": "product demonstration manufacturing",
    "type": "basic",
    "collections": "training-videos",
    # Add video-specific modality configuration
    "modality": {
        "type": "video",
        "weight": 0.8,
        "features": ["visual", "audio", "temporal"]
    },
    "max_results": 10,
    "config": {
        "lexical_weight": 0.3  # Balanced approach for video content
    }
}
# Video-specific results emphasize visual, audio, and temporal aspects
Audio-Focused Search
Optimize search for audio content:
audio_search = {
    "query": "piano jazz performances",
    "type": "basic",
    "collections": "music",
    # Add audio-specific modality configuration
    "modality": {
        "type": "audio",
        "weight": 0.9,
        "features": ["acoustic", "quality", "speech"]
    },
    "max_results": 10,
    "config": {
        "lexical_weight": 0.2  # Lower weight for conceptual audio matching
    }
}
# Audio-specific results emphasize acoustic properties and sound quality
Best Practices
Crafting Effective Queries
- Be Specific: "Renewable energy solutions for residential buildings" is better than "renewable energy"
- Include Key Concepts: Use domain-specific terminology relevant to your search
- Conversational Queries: Our system understands natural language, so ask as you would ask a human
- Use Context: Include relevant context for disambiguation
Optimizing Collection Strategy
- Logical Grouping: Create collections based on how you'll search
- Avoid Overfragmentation: Too many small collections can reduce search effectiveness
- Balanced Size: Aim for collections with 100 to 10,000 items for optimal performance
- Clear Naming: Use descriptive names that reflect content
Performance Optimization
- Target Specific Collections: Search fewer collections for faster results
- Filter Appropriately: Use filters to narrow results by media type, date, etc.
- Control Search Depth: Lower depth values in advanced search mean faster responses
The min_score Threshold
Adjust min_score based on your needs:
- Higher (0.8+): Very relevant but fewer results
- Medium (0.5-0.7): Good balance of relevance and recall
- Lower (0.3-0.4): More results, potentially less relevant
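A quick way to choose a threshold is to run one search with a low min_score and count how many results would survive stricter cutoffs. The sketch below does that counting locally on sample data; the scores are invented and count_at_thresholds is a hypothetical helper, not an API feature.

```python
def count_at_thresholds(results, thresholds=(0.3, 0.5, 0.7, 0.8)):
    """Count how many results score at or above each candidate min_score."""
    return {
        threshold: sum(1 for result in results if result["score"] >= threshold)
        for threshold in thresholds
    }


# Hypothetical scores from a search run with a low min_score
sample_results = [{"score": s} for s in (0.92, 0.81, 0.66, 0.52, 0.41, 0.35)]

counts = count_at_thresholds(sample_results)
for threshold, count in counts.items():
    print(f"min_score {threshold}: {count} results")
```

If the count drops sharply between two thresholds, a value just below the drop is often a reasonable starting point.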
Hybrid Search Optimization Examples
Technical Documentation Search
import requests

# Configure search for technical documentation
tech_doc_search = {
    "query": "kubernetes pod security policies implementation",
    "type": "basic",
    "collections": "technical-docs",
    "config": {
        "lexical_weight": 0.7  # High value prioritizes exact technical terms
    },
    "max_results": 15
}

# Execute search
response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json=tech_doc_search
)
# Process results...
Conceptual Research Search
import requests

# Configure search for conceptual research exploration
concept_search = {
    "query": "relationship between climate factors and migration patterns",
    "type": "advanced",
    "collections": ["research", "academic-papers"],
    "config": {
        "lexical_weight": 0.15,  # Low value prioritizes conceptual relationships
        "context_depth": 3
    },
    "max_results": 20
}

# Execute search
response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json=concept_search
)
# Process results...
Cross-Modal Discovery
Advanced search can discover relationships between different media types. For example, a search for "coastal erosion" might connect:
- A research document on climate change
- Satellite images showing coastline changes
- Video footage of storms and wave action
- 3D models of coastal defenses
import requests

# Example query to find cross-modal connections
search_request = {
    "query": "coastal erosion case studies",
    "type": "advanced",
    "collections": "all",
    "max_results": 20,
    "config": {
        "context_depth": 3,  # Higher depth finds more connections
        "lexical_weight": 0.3  # Slightly favor semantic understanding for cross-modal connections
    }
}

response = requests.post(
    "https://api.wickson.ai/v1/search",
    headers={"X-Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json=search_request
)

# Analyzing cross-modal connections in results
if response.status_code == 200:
    data = response.json()["data"]

    # Print cross-modal connection information
    for result in data["results"]:
        if result.get("cross_modal_connections"):
            print(f"\nItem: {result['metadata']['file_info']['filename']} ({result['metadata']['media_type']})")
            print("Connected to:")
            for conn in result["cross_modal_connections"]:
                # Find the connected item's metadata in results;
                # fall back to a placeholder if the target is not in this result set
                connected_item = next(
                    (r for r in data["results"] if r["id"] == conn["target_id"]),
                    {"metadata": {"file_info": {"filename": "Unknown"}, "media_type": "unknown"}}
                )
                # Show the connection with useful details
                print(f"  - {connected_item['metadata']['file_info']['filename']} " +
                      f"({connected_item['metadata']['media_type']}) " +
                      f"Connection: {conn['relation_type']}, " +
                      f"Strength: {conn['strength']:.2f}, " +
                      f"Symbol: {conn['symbol']}")
Error Handling
import random
import time

import requests


def search_with_retry(query, search_type="basic", collections="default", max_retries=3, lexical_weight=0.35):
    """Run a search, retrying on rate limits and transient failures.

    Returns the response "data" object, or None if every attempt fails.
    """
    url = "https://api.wickson.ai/v1/search"
    headers = {
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    search_request = {
        "query": query,
        "type": search_type,
        "collections": collections,
        "max_results": 10,
        "config": {
            # Raise toward 1.0 to favor keyword matches, lower toward 0.0 to favor semantic matches
            "lexical_weight": lexical_weight
        }
    }

    for attempt in range(max_retries):
        try:
            response = requests.post(
                url,
                headers=headers,
                json=search_request,
                timeout=30  # Reasonable timeout
            )
            # Check for HTTP errors
            response.raise_for_status()

            # Check for API-level errors
            result = response.json()
            if not result.get("success", False):
                error_msg = result.get("message", "Unknown error")
                error_code = result.get("code", "unknown")
                raise Exception(f"API error ({error_code}): {error_msg}")
            return result["data"]

        except requests.exceptions.RequestException as e:
            # Handle network errors, timeouts, HTTP errors
            if attempt == max_retries - 1:
                print(f"Failed after {max_retries} attempts: {str(e)}")
                return None

            # Rate limit handling
            if hasattr(e, 'response') and e.response is not None and e.response.status_code == 429:
                try:
                    retry_after = int(e.response.headers.get('Retry-After', 5))
                except ValueError:
                    retry_after = 5  # Retry-After can be an HTTP date; fall back to a fixed delay
                print(f"Rate limited. Retrying after {retry_after} seconds...")
                time.sleep(retry_after)
            else:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + (random.random() * 0.1)
                print(f"Request failed. Retrying in {wait_time:.1f} seconds...")
                time.sleep(wait_time)

        except Exception as e:
            # Handle other errors
            print(f"Error: {str(e)}")
            if attempt == max_retries - 1:
                return None
            time.sleep(2 ** attempt)  # Exponential backoff

    return None
Cost Considerations
| Operation | Cost | Notes |
|---|---|---|
| Basic Search | FREE | Fast, direct matching |
| Advanced Search | $0.01 + ($0.01 x depth) | Depth 2 = $0.03 total |
Costs are per search operation, regardless of how many collections you search.
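The pricing table translates into a small estimator you can use for budgeting. This sketch uses Decimal to avoid floating-point rounding on currency; estimate_search_cost is a hypothetical helper, and it assumes the per-depth charge applies to the context_depth you request, which may differ from the depth the API actually reaches and bills.

```python
from decimal import Decimal

BASE_FEE = Decimal("0.01")       # Advanced search base cost from the table above
PER_DEPTH_FEE = Decimal("0.01")  # Additional cost per depth level


def estimate_search_cost(search_type, context_depth=0):
    """Estimate the cost in USD of one search operation."""
    if search_type == "basic":
        return Decimal("0.00")
    if search_type != "advanced":
        raise ValueError(f"unknown search type: {search_type!r}")
    if context_depth < 0:
        raise ValueError("context_depth must not be negative")
    return BASE_FEE + PER_DEPTH_FEE * context_depth


# Budget for a batch: 500 advanced searches at depth 2
per_search = estimate_search_cost("advanced", context_depth=2)
print(f"Per search: ${per_search}")
print(f"Batch of 500: ${per_search * 500}")
```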
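Troubleshooting
| Issue | Solutions |
|---|---|
| No results | Lower min_score; confirm that collections names the collection holding the content, or search "all"; loosen or remove filters |
| Irrelevant results | Raise min_score; make the query more specific; target fewer collections or add filters |
| Missing exact term matches | Raise lexical_weight (0.7-0.9 strongly favors exact keyword matches) |
| Missing conceptually related content | Lower lexical_weight (0.1-0.2 favors conceptual matches); try advanced search with a higher context_depth |
| Slow performance | Target specific collections instead of "all"; add filters; lower context_depth in advanced search |
| Missing specific content | Search "all" collections; check that filters and min_score are not excluding the item |
| HTTP 400 errors | Check that the request body is valid JSON and that parameter names and values match the examples in this guide |
| HTTP 401/403 errors | Check that the X-Api-Key header is present and the key is valid |
| HTTP 429 errors | You are being rate limited; wait for the Retry-After interval, then retry with backoff (see Error Handling) |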