Batch Processing API

Create Batch Processing Job

Creates and initiates an asynchronous batch processing job for multiple media files. The API automatically detects file types, processes each file in the batch, and generates vector embeddings optimized for search and analysis.

Endpoint

POST https://api.wickson.ai/v1/batches

Content Type

multipart/form-data

Authentication

X-Api-Key: <your_api_key>

All requests must include your API key in the X-Api-Key header.

Parameters

Parameter Type Required Description
file file Yes The file(s) to process. Can be specified multiple times for multiple files. Maximum 25 files total or up to 100 MB of total data per batch.
force_overwrite boolean No When true, existing files are reprocessed. When false, duplicates are skipped. Default: false
collection_id string No Group all batch media files into this collection for organization and retrieval. Default: "default"
batch_options JSON string No Additional batch processing options as a JSON-encoded string (see below).

Batch Options Format

The batch_options parameter allows you to specify multiple options in a single JSON-formatted string:

{
  "force_overwrite": true,
  "collection_id": "my_collection"
}

Note: Individual parameters (like force_overwrite and collection_id) take precedence over values in batch_options if both are provided.
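
For example, the same options can be encoded in Python before being attached as a form field. A minimal sketch (field names follow the parameter table above):

```python
import json

# batch_options travels as a single JSON-encoded string in the form data.
batch_options = json.dumps({
    "force_overwrite": True,
    "collection_id": "my_collection",
})

data = {
    "batch_options": batch_options,
    # If the individual parameter is also set, it takes precedence
    # over the value inside batch_options:
    "collection_id": "quarterly_reports",
}
```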

Supported File Types

Category Supported Formats Max Size Additional Limits
Documents .pdf, .doc, .docx, .rtf, .odt, .txt, .md 20MB 75K tokens
Web/Data Formats .html, .htm, .xml, .json 20MB -
E-book/Email .epub, .eml 20MB -
Presentation .pptx, .odp 20MB -
Spreadsheet .csv, .tsv, .xlsx, .xls, .ods 20MB -
Images .jpg, .jpeg, .png, .gif, .bmp, .tiff, .tif, .webp, .svg, .heic, .heif 20MB -
Video .mp4, .mov, .avi, .webm, .mpeg, .mpg, .wmv 100MB 10 minutes
Audio .mp3, .wav, .m4a, .flac, .ogg, .aac, .aiff, .wma 50MB 10 minutes
3D Models .stl, .obj, .fbx, .glb, .3mf, .ply, .dae 50MB -

Response Format

{
    "success": true,
    "data": {
        "batch_id": "batch_1713726085_15_20250309_122805",
        "status": "created",
        "total_items": 3,
        "created_at": "2025-03-09T12:28:05.321Z",
        "estimated_completion_seconds": 126,
        "estimated_completion": "2025-03-09T12:30:11.321Z",
        "collection": "quarterly_reports",
        "cost_estimate": {
            "total": 0.12,
            "breakdown": {
                "processing": {
                    "document": 0.09,
                    "image": 0.0,
                    "video": 0.0,
                    "audio": 0.0,
                    "model": 0.0
                },
                "storage": 0.03
            }
        },
        "settings": {
            "force_overwrite": true,
            "collection_id": "quarterly_reports",
            "media_count": 3
        }
    },
    "metadata": {
        "force_overwrite": true,
        "media_types": {
            "document": 3
        },
        "collection_id": "quarterly_reports"
    }
}

Code Examples

cURL Example

curl -X POST 'https://api.wickson.ai/v1/batches' \
    -H 'X-Api-Key: your_api_key' \
    -F 'file=@q1_report.pdf' \
    -F 'file=@q2_report.pdf' \
    -F 'collection_id=quarterly_reports' \
    -F 'force_overwrite=true'

Python Example

import requests

# Configuration
api_key = "your_api_key"
api_url = "https://api.wickson.ai/v1/batches"

# List of files to process
files_to_upload = [
    "q1_report.pdf",
    "q2_report.pdf",
    "q3_report.pdf"
]

# Prepare request
headers = {"X-Api-Key": api_key}
data = {
    "collection_id": "quarterly_reports",
    "force_overwrite": "true"
}

# Prepare files for multipart upload
# (read into memory so the handles are closed before the request)
files = []
for filename in files_to_upload:
    with open(filename, "rb") as f:
        files.append(("file", (filename, f.read())))

# Submit batch
response = requests.post(api_url, headers=headers, data=data, files=files)

# Process response
if response.status_code == 200:
    result = response.json()
    batch_id = result["data"]["batch_id"]

    print(f"Batch created: {batch_id}")
    print(f"Status: {result['data']['status']}")
    print(f"Files: {result['data']['total_items']}")
    print(f"Estimated completion: {result['data']['estimated_completion']}")
    print(f"Cost: ${result['data']['cost_estimate']['total']}")
else:
    print(f"Error {response.status_code}: {response.text}")

Error Responses

The API uses HTTP status codes and structured error responses. All errors follow this format:

{
    "success": false,
    "error": {
        "code": string,
        "message": string,
        "details": object
    }
}

Common Errors

Status Error Code Description Resolution
400 validation_error Request validation failed Check request format and parameters
400 invalid_media File validation failed Verify file format and size limits
403 insufficient_funds Account balance too low Add funds to continue
404 file_not_found File path not accessible Verify file exists and is readable
500 processing_error Internal processing error Contact support with batch_id

Detailed Error Examples

Validation Error

{
    "success": false,
    "error": {
        "code": "validation_error",
        "message": "Batch size exceeds maximum allowed",
        "details": {
            "max_allowed": 100,
            "requested": 150,
            "suggestion": "Please split your batch into smaller chunks of 100 or fewer items"
        }
    }
}

Resource Exhaustion

{
    "success": false,
    "error": {
        "code": "insufficient_funds",
        "message": "Insufficient balance for operation",
        "details": {
            "required_amount": 2.50,
            "current_balance": 1.00,
            "shortfall": 1.50
        }
    }
}
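
In client code, the error `code` field is the stable key to branch on. A sketch mapping the errors above to follow-up actions (the action strings are illustrative, not part of the API):

```python
def next_action(resp_json):
    """Map a structured error response to a follow-up action (illustrative)."""
    err = resp_json.get("error", {})
    code = err.get("code")
    if code in ("validation_error", "invalid_media"):
        return "fix-request"          # check parameters, formats, size limits
    if code == "insufficient_funds":
        shortfall = err.get("details", {}).get("shortfall")
        return f"top-up:{shortfall}"  # add at least this much balance
    if code == "processing_error":
        return "contact-support"      # include the batch_id
    return "retry"
```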

Understanding Batch Processing

How Batch Processing Works

When you submit a batch processing job, your request flows through these key stages:

Preprocessing

  • We validate your request format and ensure all parameters are valid
  • Files are examined for compatibility and optimization
  • Intelligent duplicate detection prevents unnecessary processing
  • Files are prepared for efficient processing

Vector Processing

  • Previous data is handled according to your settings
  • Embeddings are generated for your content
  • Results are stored in your specified collection

Completion

  • Processing summary is generated
  • Your account is updated accordingly

The vector processing occurs asynchronously after your submission is accepted, allowing you to continue other work while your batch is being processed.
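
Since processing is asynchronous, clients typically poll the batch status endpoint (GET /v1/batches/{batch_id}, mentioned under Important Considerations below) until the batch leaves its in-flight states. A sketch, where fetch_status would wrap the HTTP call; treating "processing" as an intermediate state is an assumption, since this page only shows "created":

```python
import time

def wait_for_batch(fetch_status, timeout=600, interval=10):
    """Poll until the batch leaves its in-flight states, then return the final one.

    fetch_status: a zero-argument callable returning data.status, e.g. a thin
    wrapper around GET /v1/batches/{batch_id}.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = fetch_status()
        if status not in ("created", "processing"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"batch still in flight after {timeout}s")
```

The estimated_completion field in the creation response is a reasonable starting point for choosing the polling interval.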

Optimizing Your Batch Processing

Organizing Your Media

Collections are the foundation for organizing your media effectively. Think of collections as folders or categories that group related content together.

Best practices for collections:

  • Create logical groupings like "quarterly_reports", "product_images", or "training_videos"
  • Use consistent naming conventions that will scale with your needs
  • Consider hierarchical naming like department/project/type for complex organizations
  • Use collections to simplify search operations later

For optimal processing efficiency, we recommend:

  • Group similar media types together in batch submissions
  • Keep batch sizes between 20 and 50 items for the best balance of efficiency and manageability
  • Use descriptive collection names that make sense to your team
  • Leverage collections for access control and filtering in your applications

Handling Errors and Resources

When implementing batch processing in your application, build in resilient error handling with exponential backoff for retries. Our system processes files asynchronously, so monitoring is essential.

Key error handling strategies:

  • Implement exponential backoff for failed requests
  • Monitor batch status endpoint for completion
  • Store batch IDs in your application for future reference
  • Build alerting for failed processing
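
The backoff strategy above can be sketched as a small wrapper; the attempt count and delays here are illustrative defaults, not API requirements:

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on any exception, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # 1s, 2s, 4s, ... scaled by up to 2x jitter to spread out retries
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

In practice you would only retry transient failures (e.g. network errors or 5xx responses); validation errors will fail the same way on every attempt.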

Resource management tips:

  • Verify files exist and are readable before submission
  • Ensure your account has sufficient balance
  • Use force_overwrite sparingly to avoid unnecessary costs
  • Consider submitting large batches during off-peak hours
  • Process similar file types together for optimal performance

Understanding Billing

Our straightforward billing model consists of two charges per file:

Processing Fee ($0.03/file)

  • Covers computational cost of analyzing and processing your media
  • Applies to all media types
  • Charged upon successful processing

Storage/IO Fee ($0.01/file)

  • One-time fee for storing vector embeddings
  • No recurring storage charges or hidden fees

Total cost breakdown:

  • Each successfully processed file: $0.04 total
  • Duplicate files: No charge (unless force_overwrite=true)
  • Failed items: No charge (automatic refund if pre-charged)

We calculate and deduct costs upfront when you submit a batch, but issue automatic refunds for any files that fail to process. You can track all transactions via our API and use collections to organize cost tracking for different projects or departments.

Billing workflow:

  1. Costs calculated during submission
  2. Balance checked before processing begins
  3. Charges deducted upfront
  4. Refunds issued for any failed items
  5. Full transaction history available via API
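
The flat per-file fees make upfront costs easy to predict. A sketch of the arithmetic (fees per the billing model above; the function name is ours):

```python
PROCESSING_FEE = 0.03  # per successfully processed file
STORAGE_FEE = 0.01     # one-time, per file

def estimate_batch_cost(new_files, duplicates=0, force_overwrite=False):
    """Estimate the upfront charge; duplicates are free unless reprocessed."""
    charged = new_files + (duplicates if force_overwrite else 0)
    return round(charged * (PROCESSING_FEE + STORAGE_FEE), 2)
```

For the three-document batch in the response example above: 3 × $0.04 = $0.12, matching cost_estimate.total.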

Important Considerations

Our asynchronous processing model means files are processed in the background after your submission is accepted. This allows us to handle large batches efficiently.

Key things to remember:

  • File processing happens asynchronously
  • Monitor progress via GET /v1/batches/{batch_id}
  • Failed items don't affect processing of other files
  • Checksums identify files with duplicate content, even under different filenames
  • The API will process a different version or revision of a file if its contents differ from the original
  • When using force_overwrite=true, existing vectors for previously processed files are permanently deleted, so use with caution!
  • Uploaded files are temporarily retained until processing completes, then deleted
  • Vector embeddings remain until explicitly deleted

The power of batch processing comes from its ability to handle multiple files efficiently. By organizing your media into thoughtful collections and following these best practices, you can maximize the effectiveness of our vector embedding and storage system while maintaining control over your costs.
