Batch Processing API¶
Create Batch Processing Job¶
Creates and initiates an asynchronous batch processing job for multiple media files. The API automatically detects file types, processes each file in the batch, and generates vector embeddings optimized for search and analysis.
Endpoint¶
POST https://api.wickson.ai/v1/batches
Content Type¶
multipart/form-data
Authentication¶
All requests must include your API key in the X-Api-Key header.
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | The file(s) to process. Can be specified multiple times for multiple files. Maximum 25 files total or up to 100 MB of total data per batch. |
| force_overwrite | boolean | No | When true, existing files are reprocessed. When false, duplicates are skipped. Default: false |
| collection_id | string | No | Group all batch media files into this collection for organization and retrieval. Default: "default" |
| batch_options | JSON string | No | Additional batch processing options as a JSON-encoded string (see below). |
Batch Options Format¶
The batch_options parameter allows you to specify multiple options in a single JSON-formatted string:
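For example, passing both supported options as one JSON string (these are the two keys documented on this page; `batch_options` may accept additional keys not listed here):

```json
{
  "force_overwrite": true,
  "collection_id": "quarterly_reports"
}
```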
Note: Individual parameters (like force_overwrite and collection_id) take precedence over values in batch_options if both are provided.
Supported File Types¶
| Category | Supported Formats | Max Size | Additional Limits |
|---|---|---|---|
| Documents | .pdf, .doc, .docx, .rtf, .odt, .txt, .md | 20MB | 75K tokens |
| Web/Data Formats | .html, .htm, .xml, .json | 20MB | - |
| E-book/Email | .epub, .eml | 20MB | - |
| Presentation | .pptx, .odp | 20MB | - |
| Spreadsheet | .csv, .tsv, .xlsx, .xls, .ods | 20MB | - |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .tif, .webp, .svg, .heic, .heif | 20MB | - |
| Video | .mp4, .mov, .avi, .webm, .mpeg, .mpg, .wmv | 100MB | 10 minutes |
| Audio | .mp3, .wav, .m4a, .flac, .ogg, .aac, .aiff, .wma | 50MB | 10 minutes |
| 3D Models | .stl, .obj, .fbx, .glb, .3mf, .ply, .dae | 50MB | - |
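These limits can be checked client-side before uploading to avoid avoidable invalid_media errors. A minimal sketch; the extension-to-size mapping below is transcribed from the table above and is not an official SDK (the duration and token limits are not checked here):

```python
import os

# Per-file size limits in MB, keyed by extension (transcribed from the table above)
SIZE_LIMITS_MB = {
    **{ext: 20 for ext in (".pdf", ".doc", ".docx", ".rtf", ".odt", ".txt", ".md",
                           ".html", ".htm", ".xml", ".json", ".epub", ".eml",
                           ".pptx", ".odp", ".csv", ".tsv", ".xlsx", ".xls", ".ods",
                           ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif",
                           ".webp", ".svg", ".heic", ".heif")},
    **{ext: 100 for ext in (".mp4", ".mov", ".avi", ".webm", ".mpeg", ".mpg", ".wmv")},
    **{ext: 50 for ext in (".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".aiff", ".wma",
                           ".stl", ".obj", ".fbx", ".glb", ".3mf", ".ply", ".dae")},
}

def validate_file(path):
    """Return None if the file looks acceptable, else a reason string.

    Only checks extension and size; video/audio duration and document
    token limits can only be enforced server-side.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SIZE_LIMITS_MB:
        return f"unsupported format: {ext}"
    if os.path.getsize(path) > SIZE_LIMITS_MB[ext] * 1024 * 1024:
        return f"exceeds {SIZE_LIMITS_MB[ext]}MB limit for {ext}"
    return None
```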
Response Format¶
{
  "success": true,
  "data": {
    "batch_id": "batch_1713726085_15_20250309_122805",
    "status": "created",
    "total_items": 3,
    "created_at": "2025-03-09T12:28:05.321Z",
    "estimated_completion_seconds": 126,
    "estimated_completion": "2025-03-09T12:30:11.321Z",
    "collection": "quarterly_reports",
    "cost_estimate": {
      "total": 0.12,
      "breakdown": {
        "processing": {
          "document": 0.09,
          "image": 0.0,
          "video": 0.0,
          "audio": 0.0,
          "model": 0.0
        },
        "storage": 0.03
      }
    },
    "settings": {
      "force_overwrite": true,
      "collection_id": "quarterly_reports",
      "media_count": 3
    }
  },
  "metadata": {
    "force_overwrite": true,
    "media_types": {
      "document": 3
    },
    "collection_id": "quarterly_reports"
  }
}
Code Examples¶
cURL Example¶
curl -X POST 'https://api.wickson.ai/v1/batches' \
  -H 'X-Api-Key: your_api_key' \
  -F 'file=@q1_report.pdf' \
  -F 'file=@q2_report.pdf' \
  -F 'collection_id=quarterly_reports' \
  -F 'force_overwrite=true'
Python Example¶
import requests

# Configuration
api_key = "your_api_key"
api_url = "https://api.wickson.ai/v1/batches"

# List of files to process
files_to_upload = [
    "q1_report.pdf",
    "q2_report.pdf",
    "q3_report.pdf"
]

# Prepare request
headers = {"X-Api-Key": api_key}
data = {
    "collection_id": "quarterly_reports",
    "force_overwrite": "true"
}

# Prepare files for multipart upload
file_handles = [open(filename, "rb") for filename in files_to_upload]
files = [("file", (filename, fh)) for filename, fh in zip(files_to_upload, file_handles)]

# Submit batch, closing the file handles afterwards
try:
    response = requests.post(api_url, headers=headers, data=data, files=files)
finally:
    for fh in file_handles:
        fh.close()

# Process response
if response.status_code == 200:
    result = response.json()
    batch_id = result["data"]["batch_id"]
    print(f"Batch created: {batch_id}")
    print(f"Status: {result['data']['status']}")
    print(f"Files: {result['data']['total_items']}")
    print(f"Estimated completion: {result['data']['estimated_completion']}")
    print(f"Cost: ${result['data']['cost_estimate']['total']}")
else:
    print(f"Error {response.status_code}: {response.text}")
Error Responses¶
The API uses HTTP status codes and structured error responses. All errors follow this format:
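The envelope looks like this (field names taken from the detailed examples later on this page; the keys inside `details` vary by error code):

```json
{
  "success": false,
  "error": {
    "code": "machine_readable_code",
    "message": "Human-readable description",
    "details": {}
  }
}
```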
Common Errors¶
| Status | Code | Description | Resolution |
|---|---|---|---|
| 400 | validation_error | Request validation failed | Check request format and parameters |
| 400 | invalid_media | File validation failed | Verify file format and size limits |
| 403 | insufficient_funds | Account balance too low | Add funds to continue |
| 404 | file_not_found | File path not accessible | Verify file exists and is readable |
| 500 | processing_error | Internal processing error | Contact support with batch_id |
Detailed Error Examples¶
Validation Error¶
{
  "success": false,
  "error": {
    "code": "validation_error",
    "message": "Batch size exceeds maximum allowed",
    "details": {
      "max_allowed": 25,
      "requested": 30,
      "suggestion": "Please split your batch into smaller chunks of 25 or fewer items"
    }
  }
}
Resource Exhaustion¶
{
  "success": false,
  "error": {
    "code": "insufficient_funds",
    "message": "Insufficient balance for operation",
    "details": {
      "required_amount": 2.50,
      "current_balance": 1.00,
      "shortfall": 1.50
    }
  }
}
Understanding Batch Processing¶
How Batch Processing Works¶
When you submit a batch processing job, your request flows through these key stages:
Preprocessing
- We validate your request format and ensure all parameters are valid
- Files are examined for compatibility and optimization
- Intelligent duplicate detection prevents unnecessary processing
- Files are prepared for efficient processing
Vector Processing
- Previous data is handled according to your settings
- Embeddings are generated for your content
- Results are stored in your specified collection
Completion
- Processing summary is generated
- Your account is updated accordingly
The vector processing occurs asynchronously after your submission is accepted, allowing you to continue other work while your batch is being processed.
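A typical client therefore polls the batch status endpoint (`GET /v1/batches/{batch_id}`, mentioned under Important Considerations) until processing finishes. A minimal sketch; the terminal status values `completed` and `failed` are assumptions, since this page only documents the initial `created` status:

```python
import time
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.wickson.ai/v1/batches"

def wait_for_batch(batch_id, poll_interval=10, timeout=600):
    """Poll the batch status endpoint until the batch reaches a terminal state.

    The terminal values ("completed", "failed") are assumed here; consult the
    batch status documentation for the authoritative list.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE_URL}/{batch_id}", headers={"X-Api-Key": API_KEY})
        resp.raise_for_status()
        status = resp.json()["data"]["status"]
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"batch {batch_id} did not finish within {timeout}s")
```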
Optimizing Your Batch Processing¶
Organizing Your Media¶
Collections are the foundation for organizing your media effectively. Think of collections as folders or categories that group related content together.
Best practices for collections:
- Create logical groupings like "quarterly_reports", "product_images", or "training_videos"
- Use consistent naming conventions that will scale with your needs
- Consider hierarchical naming like department/project/type for complex organizations
- Use collections to simplify search operations later
For optimal processing efficiency, we recommend:
- Group similar media types together in batch submissions
- Keep batch sizes comfortably within the 25-file / 100 MB limits for the best balance of efficiency and manageability
- Use descriptive collection names that make sense to your team
- Leverage collections for access control and filtering in your applications
Handling Errors and Resources¶
When implementing batch processing in your application, build in resilient error handling with exponential backoff for retries. Our system processes files asynchronously, so monitoring is essential.
Key error handling strategies:
- Implement exponential backoff for failed requests
- Monitor batch status endpoint for completion
- Store batch IDs in your application for future reference
- Build alerting for failed processing
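The retry strategy above can be sketched as follows; this is illustrative, not an official client, and it assumes that HTTP 5xx responses and connection errors are the transient failures worth retrying:

```python
import random
import time
import requests

def post_with_backoff(url, *, headers, data, open_files, max_retries=5):
    """Submit a batch with exponential backoff and jitter.

    open_files is a zero-argument callable returning a fresh multipart
    `files` list, so each retry re-reads the files from the start.
    Client errors (4xx) are returned immediately for inspection.
    """
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, headers=headers, data=data, files=open_files())
            if resp.status_code < 500:
                return resp  # success, or a non-retryable client error
        except requests.ConnectionError:
            pass  # transient network failure: retry
        time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
    raise RuntimeError(f"giving up after {max_retries} attempts")
```

Reusing the names from the Python example above, usage would look like `post_with_backoff(api_url, headers=headers, data=data, open_files=lambda: [("file", (p, open(p, "rb"))) for p in files_to_upload])`.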
Resource management tips:
- Verify files exist and are readable before submission
- Ensure your account has sufficient balance
- Use force_overwrite sparingly to avoid unnecessary costs
- Consider submitting large batches during off-peak hours
- Process similar file types together for optimal performance
Understanding Billing¶
Our straightforward billing model consists of two charges per file:
Processing Fee ($0.03/file)
- Covers computational cost of analyzing and processing your media
- Applies to all media types
- Charged upon successful processing
Storage/IO Fee ($0.01/file)
- One-time fee for storing vector embeddings
- No recurring storage charges or hidden fees
Total cost breakdown:
- Each successfully processed file: $0.04 total
- Duplicate files: No charge (unless force_overwrite=true)
- Failed items: No charge (automatic refund if pre-charged)
We calculate and deduct costs upfront when you submit a batch, but issue automatic refunds for any files that fail to process. You can track all transactions via our API and use collections to organize cost tracking for different projects or departments.
Billing workflow:
- Costs calculated during submission
- Balance checked before processing begins
- Charges deducted upfront
- Refunds issued for any failed items
- Full transaction history available via API
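Putting the two fees together, a pre-submission estimate is simple arithmetic; a sketch that assumes every file in the batch is new and processes successfully:

```python
PROCESSING_FEE = 0.03  # per successfully processed file
STORAGE_FEE = 0.01     # one-time per-file storage/IO fee

def estimate_batch_cost(new_file_count):
    """Upper-bound cost for a batch of previously unseen files."""
    processing = new_file_count * PROCESSING_FEE
    storage = new_file_count * STORAGE_FEE
    return {"processing": round(processing, 2),
            "storage": round(storage, 2),
            "total": round(processing + storage, 2)}
```

For the three-document batch in the response example above, `estimate_batch_cost(3)` yields a $0.12 total, matching the `cost_estimate` field.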
Important Considerations¶
Our asynchronous processing model means files are processed in the background after your submission is accepted. This allows us to handle large batches efficiently.
Key things to remember:
- File processing happens asynchronously
- Monitor progress via GET /v1/batches/{batch_id}
- Failed items don't affect processing of other files
- Checksums identify files with duplicate content, even under different filenames
- The API will process a different version or revision of a file if its contents differ from the original
- When using force_overwrite=true, existing vectors for previously processed files are permanently deleted, so use with caution!
- Uploaded files are temporarily retained until processing completes, then deleted
- Vector embeddings remain until explicitly deleted
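Because duplicate detection is checksum-based, you can also skip already-submitted content client-side by tracking your own content hashes. A sketch using SHA-256 (the service's actual checksum algorithm is not documented here, so this only deduplicates among your own submissions):

```python
import hashlib

def content_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def deduplicate(paths, seen_hashes):
    """Return paths whose content has not been seen before; updates seen_hashes."""
    fresh = []
    for path in paths:
        h = content_checksum(path)
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append(path)
    return fresh
```

Two files with identical bytes but different names hash the same, mirroring the server-side behavior described above.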
The power of batch processing comes from its ability to handle multiple files efficiently. By organizing your media into thoughtful collections and following these best practices, you can maximize the effectiveness of our vector embedding and storage system while maintaining control over your costs.