Batch Processing API¶
Create Batch Processing Job¶
Creates and initiates an asynchronous batch processing job for multiple media files. The API automatically detects file types, processes each file in the batch, and generates vector embeddings optimized for search and analysis.
Endpoint¶
POST https://api.wickson.ai/v1/batches
Content Type¶
multipart/form-data
Authentication¶
All requests must include your API key in the X-Api-Key header.
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | The file(s) to process. Can be specified multiple times for multiple files. Maximum 25 files total or up to 100 MB of total data per batch. |
| force_overwrite | boolean | No | When true, existing files are reprocessed. When false, duplicates are skipped. Default: false |
| collection_id | string | No | Group all batch media files into this collection for organization and retrieval. Default: "default" |
| batch_options | JSON string | No | Additional batch processing options as a JSON-encoded string (see below). |
Batch Options Format¶
The batch_options parameter allows you to specify multiple options in a single JSON-formatted string:
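For example, passing both supported options as one JSON string (these are the two keys documented on this page; `batch_options` may accept additional keys not listed here):

```json
{
  "force_overwrite": true,
  "collection_id": "quarterly_reports"
}
```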
Note: Individual parameters (like force_overwrite and collection_id) take precedence over values in batch_options if both are provided.
Supported File Types¶
| Category | Supported Formats | Max Size | Additional Limits |
|---|---|---|---|
| Documents | .pdf, .doc, .docx, .rtf, .odt, .txt, .md | 20MB | 75K tokens |
| Web/Data Formats | .html, .htm, .xml, .json | 20MB | - |
| E-book/Email | .epub, .eml | 20MB | - |
| Presentation | .pptx, .odp | 20MB | - |
| Spreadsheet | .csv, .tsv, .xlsx, .xls, .ods | 20MB | - |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .tif, .webp, .svg, .heic, .heif | 20MB | - |
| Video | .mp4, .mov, .avi, .webm, .mpeg, .mpg, .wmv | 100MB | 10 minutes |
| Audio | .mp3, .wav, .m4a, .flac, .ogg, .aac, .aiff, .wma | 50MB | 10 minutes |
| 3D Models | .stl, .obj, .fbx, .glb, .3mf, .ply, .dae | 50MB | - |
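These limits can be checked client-side before uploading to avoid avoidable invalid_media errors. A minimal sketch; the extension-to-size mapping below is transcribed from the table above and is not an official SDK (the duration and token limits are not checked here):

```python
import os

# Per-file size limits in MB, keyed by extension (transcribed from the table above)
SIZE_LIMITS_MB = {
    **{ext: 20 for ext in (".pdf", ".doc", ".docx", ".rtf", ".odt", ".txt", ".md",
                           ".html", ".htm", ".xml", ".json", ".epub", ".eml",
                           ".pptx", ".odp", ".csv", ".tsv", ".xlsx", ".xls", ".ods",
                           ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif",
                           ".webp", ".svg", ".heic", ".heif")},
    **{ext: 100 for ext in (".mp4", ".mov", ".avi", ".webm", ".mpeg", ".mpg", ".wmv")},
    **{ext: 50 for ext in (".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".aiff", ".wma",
                           ".stl", ".obj", ".fbx", ".glb", ".3mf", ".ply", ".dae")},
}

def validate_file(path):
    """Return None if the file looks acceptable, else a reason string.

    Only checks extension and size; video/audio duration and document
    token limits can only be enforced server-side.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SIZE_LIMITS_MB:
        return f"unsupported format: {ext}"
    if os.path.getsize(path) > SIZE_LIMITS_MB[ext] * 1024 * 1024:
        return f"exceeds {SIZE_LIMITS_MB[ext]}MB limit for {ext}"
    return None
```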
Response Format¶
{
  "success": true,
  "data": {
    "batch_id": "batch_1713726085_15_20250309_122805",
    "status": "created",
    "total_items": 3,
    "created_at": "2025-03-09T12:28:05.321Z",
    "estimated_completion_seconds": 126,
    "estimated_completion": "2025-03-09T12:30:11.321Z",
    "collection": "quarterly_reports",
    "cost_estimate": {
      "total": 0.12,
      "breakdown": {
        "processing": {
          "document": 0.09,
          "image": 0.0,
          "video": 0.0,
          "audio": 0.0,
          "model": 0.0
        },
        "storage": 0.03
      }
    },
    "settings": {
      "force_overwrite": true,
      "collection_id": "quarterly_reports",
      "media_count": 3
    }
  },
  "metadata": {
    "force_overwrite": true,
    "media_types": {
      "document": 3
    },
    "collection_id": "quarterly_reports"
  }
}
Code Examples¶
cURL Example¶
curl -X POST 'https://api.wickson.ai/v1/batches' \
  -H 'X-Api-Key: your_api_key' \
  -F 'file=@q1_report.pdf' \
  -F 'file=@q2_report.pdf' \
  -F 'collection_id=quarterly_reports' \
  -F 'force_overwrite=true'
Python Example¶
import requests

# Configuration
api_key = "your_api_key"
api_url = "https://api.wickson.ai/v1/batches"

# List of files to process
files_to_upload = [
    "q1_report.pdf",
    "q2_report.pdf",
    "q3_report.pdf"
]

# Prepare request
headers = {"X-Api-Key": api_key}
data = {
    "collection_id": "quarterly_reports",
    "force_overwrite": "true"
}

# Prepare files for multipart upload
file_handles = [open(filename, "rb") for filename in files_to_upload]
files = [("file", (filename, fh)) for filename, fh in zip(files_to_upload, file_handles)]

# Submit batch, closing the file handles afterwards
try:
    response = requests.post(api_url, headers=headers, data=data, files=files)
finally:
    for fh in file_handles:
        fh.close()

# Process response
if response.status_code == 200:
    result = response.json()
    batch_id = result["data"]["batch_id"]
    print(f"Batch created: {batch_id}")
    print(f"Status: {result['data']['status']}")
    print(f"Files: {result['data']['total_items']}")
    print(f"Estimated completion: {result['data']['estimated_completion']}")
    print(f"Cost: ${result['data']['cost_estimate']['total']}")
else:
    print(f"Error {response.status_code}: {response.text}")
Error Responses¶
The API uses HTTP status codes and structured error responses. All errors follow this format:
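The envelope looks like this (field names taken from the detailed examples later on this page; the keys inside `details` vary by error code):

```json
{
  "success": false,
  "error": {
    "code": "machine_readable_code",
    "message": "Human-readable description",
    "details": {}
  }
}
```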
Common Errors¶
| Status | Code | Description | Resolution |
|---|---|---|---|
| 400 | validation_error | Request validation failed | Check request format and parameters |
| 400 | invalid_media | File validation failed | Verify file format and size limits |
| 403 | insufficient_funds | Account balance too low | Add funds to continue |
| 404 | file_not_found | File path not accessible | Verify file exists and is readable |
| 500 | processing_error | Internal processing error | Contact support with batch_id |
Detailed Error Examples¶
Validation Error¶
{
  "success": false,
  "error": {
    "code": "validation_error",
    "message": "Batch size exceeds maximum allowed",
    "details": {
      "max_allowed": 25,
      "requested": 30,
      "suggestion": "Please split your batch into smaller chunks of 25 or fewer items"
    }
  }
}
Resource Exhaustion¶
{
  "success": false,
  "error": {
    "code": "insufficient_funds",
    "message": "Insufficient balance for operation",
    "details": {
      "required_amount": 2.50,
      "current_balance": 1.00,
      "shortfall": 1.50
    }
  }
}
Understanding Batch Processing¶
How Batch Processing Works¶
When you submit a batch processing job, your request flows through these key stages:
Preprocessing
- We validate your request format and ensure all parameters are valid
- Files are examined for compatibility and optimization
- Intelligent duplicate detection prevents unnecessary processing
- Files are prepared for efficient processing
Vector Processing
- Previous data is handled according to your settings
- Embeddings are generated for your content
- Results are stored in your specified collection
Completion
- Processing summary is generated
- Your account is updated accordingly
The vector processing occurs asynchronously after your submission is accepted, allowing you to continue other work while your batch is being processed.
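A typical client therefore polls the batch status endpoint (`GET /v1/batches/{batch_id}`, mentioned under Important Considerations) until processing finishes. A minimal sketch; the terminal status values `completed` and `failed` are assumptions, since this page only documents the initial `created` status:

```python
import time
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.wickson.ai/v1/batches"

def wait_for_batch(batch_id, poll_interval=10, timeout=600):
    """Poll the batch status endpoint until the batch reaches a terminal state.

    The terminal values ("completed", "failed") are assumed here; consult the
    batch status documentation for the authoritative list.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE_URL}/{batch_id}", headers={"X-Api-Key": API_KEY})
        resp.raise_for_status()
        status = resp.json()["data"]["status"]
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"batch {batch_id} did not finish within {timeout}s")
```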
Optimizing Your Batch Processing¶
Organizing Your Media¶
Collections are the foundation for organizing your media effectively. Think of collections as folders or categories that group related content together.
Best practices for collections:
- Create logical groupings like "quarterly_reports", "product_images", or "training_videos"
- Use consistent naming conventions that will scale with your needs
- Consider hierarchical naming like department/project/type for complex organizations
- Use collections to simplify search operations later
For optimal processing efficiency, we recommend:
- Group similar media types together in batch submissions
- Keep batch sizes comfortably within the 25-file / 100 MB limits for the best balance of efficiency and manageability
- Use descriptive collection names that make sense to your team
- Leverage collections for access control and filtering in your applications
Handling Errors and Resources¶
When implementing batch processing in your application, build in resilient error handling with exponential backoff for retries. Our system processes files asynchronously, so monitoring is essential.
Key error handling strategies:
- Implement exponential backoff for failed requests
- Monitor batch status endpoint for completion
- Store batch IDs in your application for future reference
- Build alerting for failed processing
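The retry strategy above can be sketched as follows; this is illustrative, not an official client, and it assumes that HTTP 5xx responses and connection errors are the transient failures worth retrying:

```python
import random
import time
import requests

def post_with_backoff(url, *, headers, data, open_files, max_retries=5):
    """Submit a batch with exponential backoff and jitter.

    open_files is a zero-argument callable returning a fresh multipart
    `files` list, so each retry re-reads the files from the start.
    Client errors (4xx) are returned immediately for inspection.
    """
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, headers=headers, data=data, files=open_files())
            if resp.status_code < 500:
                return resp  # success, or a non-retryable client error
        except requests.ConnectionError:
            pass  # transient network failure: retry
        time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
    raise RuntimeError(f"giving up after {max_retries} attempts")
```

Reusing the names from the Python example above, usage would look like `post_with_backoff(api_url, headers=headers, data=data, open_files=lambda: [("file", (p, open(p, "rb"))) for p in files_to_upload])`.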
Resource management tips:
- Verify files exist and are readable before submission
- Ensure your account has sufficient balance
- Use force_overwrite sparingly to avoid unnecessary costs
- Consider submitting large batches during off-peak hours
- Process similar file types together for optimal performance
Understanding Billing¶
Our straightforward billing model consists of two charges per file:
Processing Fee ($0.03/file)
- Covers computational cost of analyzing and processing your media
- Applies to all media types
- Charged upon successful processing
Storage/IO Fee ($0.01/file)
- One-time fee for storing vector embeddings
- No recurring storage charges or hidden fees
Total cost breakdown:
- Each successfully processed file: $0.04 total
- Duplicate files: No charge (unless force_overwrite=true)
- Failed items: No charge (automatic refund if pre-charged)
We calculate and deduct costs upfront when you submit a batch, but issue automatic refunds for any files that fail to process. You can track all transactions via our API and use collections to organize cost tracking for different projects or departments.
Billing workflow:
- Costs calculated during submission
- Balance checked before processing begins
- Charges deducted upfront
- Refunds issued for any failed items
- Full transaction history available via API
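Putting the two fees together, a pre-submission estimate is simple arithmetic; a sketch that assumes every file in the batch is new and processes successfully:

```python
PROCESSING_FEE = 0.03  # per successfully processed file
STORAGE_FEE = 0.01     # one-time per-file storage/IO fee

def estimate_batch_cost(new_file_count):
    """Upper-bound cost for a batch of previously unseen files."""
    processing = new_file_count * PROCESSING_FEE
    storage = new_file_count * STORAGE_FEE
    return {"processing": round(processing, 2),
            "storage": round(storage, 2),
            "total": round(processing + storage, 2)}
```

For the three-document batch in the response example above, `estimate_batch_cost(3)` yields a $0.12 total, matching the `cost_estimate` field.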
Important Considerations¶
Our asynchronous processing model means files are processed in the background after your submission is accepted. This allows us to handle large batches efficiently.
Key things to remember:
- File processing happens asynchronously
- Monitor progress via GET /v1/batches/{batch_id}
- Failed items don't affect processing of other files
- Checksums identify files with duplicate content, even under different filenames
- The API will process a different version or revision of a file if its contents differ from the original
- When using force_overwrite=true, existing vectors for previously processed files are permanently deleted, so use with caution!
- Uploaded files are temporarily retained until processing completes, then deleted
- Vector embeddings remain until explicitly deleted
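Because duplicate detection is checksum-based, you can also skip already-submitted content client-side by tracking your own content hashes. A sketch using SHA-256 (the service's actual checksum algorithm is not documented here, so this only deduplicates among your own submissions):

```python
import hashlib

def content_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def deduplicate(paths, seen_hashes):
    """Return paths whose content has not been seen before; updates seen_hashes."""
    fresh = []
    for path in paths:
        h = content_checksum(path)
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append(path)
    return fresh
```

Two files with identical bytes but different names hash the same, mirroring the server-side behavior described above.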
The power of batch processing comes from its ability to handle multiple files efficiently. By organizing your media into thoughtful collections and following these best practices, you can maximize the effectiveness of our vector embedding and storage system while maintaining control over your costs.