Introduction
The Wickson API supports a wide range of media types, enabling comprehensive semantic understanding and vector-based search across diverse content formats. This guide details the supported file types, processing capabilities, size limitations, and optimization strategies for each media category.
The API processes five primary media categories:
| Media Type | Description | Use Cases |
| --- | --- | --- |
| Documents | Text-based files including papers, reports, and web content | Knowledge bases, research archives, content management |
| Images | Visual media ranging from photos to diagrams | Product catalogs, visual archives, design libraries |
| Video | Motion picture content with both visual and audio components | Training materials, presentations, archival footage |
| Audio | Sound recordings including speech, music, and ambient audio | Podcasts, interviews, meeting recordings |
| 3D Models | Three-dimensional object representations | Product design, architecture, engineering assets |

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Preferred for complex layouts and multi-page documents |
| Word Documents | .doc, .docx | Fully supported with formatting preservation |
| Rich Text Format | .rtf | Well-supported with basic formatting |
| OpenDocument Text | .odt | Full support for open document standard |
| Plain Text | .txt | Ideal for simple text content |
| Markdown | .md | Preserves basic formatting elements |

| Format | Extensions | Notes |
| --- | --- | --- |
| HTML | .html, .htm | Extracts content while preserving basic structure |
| XML | .xml | Extracts text with semantic structure awareness |
| JSON | .json | Processes structured data with context preservation |

| Format | Extensions | Notes |
| --- | --- | --- |
| EPUB | .epub | Full support for e-book content and structure |
| Email | .eml | Extracts message content and metadata |

| Format | Extensions | Notes |
| --- | --- | --- |
| PowerPoint | .pptx | Processes slides with text and visual elements |
| OpenDocument Presentation | .odp | Alternative to PowerPoint, fully supported |
| CSV | .csv | Handles tabular data effectively |
| TSV | .tsv | Alternative delimiter-based tabular format |
| Excel | .xlsx, .xls | Processes complex spreadsheets with formulas |
| OpenDocument Spreadsheet | .ods | Open standard spreadsheet format |

| Format | Extensions | Notes |
| --- | --- | --- |
| JPEG | .jpg, .jpeg | Excellent for photographs and complex images |
| PNG | .png | Ideal for screenshots, diagrams, and graphics |
| GIF | .gif | Supports static and animated images |
| BMP | .bmp | Uncompressed format with lossless quality |
| TIFF | .tiff, .tif | High-quality format often used for scanning |
| WebP | .webp | Modern format with good compression |
| SVG | .svg | Vector format ideal for diagrams and icons |
| HEIC | .heic, .heif | High-efficiency image format used by default on Apple devices |

| Format | Extensions | Notes |
| --- | --- | --- |
| MP4 | .mp4 | Recommended format with excellent compatibility |
| QuickTime | .mov | Full support for Apple's video format |
| AVI | .avi | Windows-native video container format |
| WebM | .webm | Open web video format |
| MPEG | .mpeg, .mpg | Standard video format with wide support |
| Windows Media Video | .wmv | Microsoft's proprietary video format |

| Format | Extensions | Notes |
| --- | --- | --- |
| MP3 | .mp3 | Excellent for speech and general audio |
| WAV | .wav | Lossless audio with high quality |
| M4A | .m4a | Good quality with efficient compression |
| FLAC | .flac | Lossless compression with excellent quality |
| OGG | .ogg | Open format with good compression |
| AAC | .aac | High-quality compressed audio |
| AIFF | .aiff, .aif | Apple's uncompressed audio format |
| Windows Media Audio | .wma | Microsoft's audio format |

| Format | Extensions | Notes |
| --- | --- | --- |
| STL | .stl | Common format for 3D printing |
| OBJ | .obj | Widely supported 3D format with textures |
| FBX | .fbx | Industry standard with limited support |
| GLB | .glb | Binary glTF format, highly recommended |
| 3MF | .3mf | Modern 3D manufacturing format |
| PLY | .ply | Polygon file format for 3D scanning |
| COLLADA | .dae | XML-based format for 3D assets |
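
The format tables above can be turned into a simple client-side lookup that maps a filename to one of the five media categories before anything is uploaded. The following is a minimal sketch, not part of the Wickson API: the function name `media_category` and the category keys are hypothetical, and it assumes that presentation and spreadsheet formats count as documents, which the tables imply but do not state.

```python
from pathlib import Path
from typing import Optional

# Extension-to-category map transcribed from the format tables in this guide.
# Assumption: presentations and spreadsheets are treated as documents.
MEDIA_EXTENSIONS = {
    "document": {".pdf", ".doc", ".docx", ".rtf", ".odt", ".txt", ".md",
                 ".html", ".htm", ".xml", ".json", ".epub", ".eml",
                 ".pptx", ".odp", ".csv", ".tsv", ".xlsx", ".xls", ".ods"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif",
              ".webp", ".svg", ".heic", ".heif"},
    "video": {".mp4", ".mov", ".avi", ".webm", ".mpeg", ".mpg", ".wmv"},
    "audio": {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".aiff",
              ".aif", ".wma"},
    "3d_model": {".stl", ".obj", ".fbx", ".glb", ".3mf", ".ply", ".dae"},
}


def media_category(filename: str) -> Optional[str]:
    """Return the media category for a filename, or None if unsupported."""
    ext = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in MEDIA_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None
```

A lookup like this only inspects the extension, so it cannot detect a mislabeled file; it is meant as a cheap first filter.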
File Size and Content Limits
The following limits apply to files processed by the Wickson API:
| Media Type | Maximum Size | Content Limitations |
| --- | --- | --- |
| Documents | 20 MB | 128,000 tokens maximum |
| Images | 20 MB | Extremely large resolutions may be downsampled |
| Video | 100 MB | 10 minutes maximum duration |
| Audio | 50 MB | 20 minutes maximum duration |
| 3D Models | 50 MB | Very complex models may be simplified |
Note: These limits may be adjusted during the beta phase as we optimize our processing pipelines. For the most current information, check the API status endpoint.
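
Checking these limits locally can save a failed upload. The sketch below hard-codes the values from the table above and reports which limits a file would exceed. The names `check_limits`, `MAX_SIZE_BYTES`, and `MAX_DURATION_SECONDS` are hypothetical, and it assumes 1 MB means 1,000,000 bytes (the guide does not say, so the smaller interpretation is used to err on the safe side). Because the limits may change during the beta, treat the constants as a snapshot.

```python
from typing import List, Optional

MB = 1_000_000  # assumption: decimal megabytes; the guide does not specify

# Values copied from the limits table; they may change during the beta.
MAX_SIZE_BYTES = {
    "document": 20 * MB,
    "image": 20 * MB,
    "video": 100 * MB,
    "audio": 50 * MB,
    "3d_model": 50 * MB,
}
MAX_DURATION_SECONDS = {"video": 10 * 60, "audio": 20 * 60}


def check_limits(category: str, size_bytes: int,
                 duration_seconds: Optional[float] = None) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    max_size = MAX_SIZE_BYTES[category]
    if size_bytes > max_size:
        problems.append(f"size {size_bytes} bytes exceeds {max_size} bytes")
    max_duration = MAX_DURATION_SECONDS.get(category)
    if (max_duration is not None and duration_seconds is not None
            and duration_seconds > max_duration):
        problems.append(
            f"duration {duration_seconds}s exceeds {max_duration}s")
    return problems
```

The document token limit is not checked here because the guide does not describe how tokens are counted.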
Document Processing
Documents undergo comprehensive processing to extract:
- Full text content: extraction that preserves document structure
- Semantic understanding: topics, themes, and concepts
- Entity recognition: people, organizations, locations, and more
- Quality analysis: evaluating clarity, completeness, and relevance
- Structural analysis: identifying sections, headings, and functional parts
- Relationship mapping: connecting concepts within the document
Particularly effective for:
- Research papers
- Technical documentation
- Legal documents
- Reports and analyses
- Articles and publications
Image Processing
Images are analyzed to understand:
- Visual content identification: objects, scenes, and elements
- Text extraction: embedded text in images
- Scene comprehension: understanding context and relationships
- Compositional analysis: examining layout and design elements
- Quality assessment: evaluating clarity and visual characteristics
- Aesthetic evaluation: assessing visual impact and style
Particularly effective for:
- Photographs
- Diagrams and charts
- Screenshots
- Design assets
- Product imagery
Video Processing
Video content undergoes multi-modal analysis:
- Visual scene analysis: frame-by-frame understanding
- Audio transcription: conversion of speech to searchable text
- Timeline comprehension: understanding sequence and narrative
- Key frame identification: detecting significant visual moments
- Entity and action recognition: identifying who and what appears
- Spoken content analysis: understanding discussion topics
Particularly effective for:
- Presentations and talks
- Instructional content
- Interviews
- Product demonstrations
- Informational videos
Audio Processing
Audio files receive specialized processing:
- Speech recognition: accurate transcription of spoken content
- Speaker identification: distinguishing between different voices
- Non-speech audio analysis: understanding music, sounds, and effects
- Topic recognition: identifying discussion subjects
- Sentiment analysis: detecting emotional tone
- Quality assessment: evaluating clarity and listenability
Particularly effective for:
- Podcasts
- Interviews
- Lectures and speeches
- Meeting recordings
- Voice notes
3D Model Processing
3D models undergo spatial and structural analysis:
- Component identification: recognizing parts and structures
- Spatial relationship mapping: understanding positioning and scale
- Structural analysis: evaluating design and engineering aspects
- Material classification: identifying surface properties
- Purpose and function recognition: understanding intended use
- Quality assessment: evaluating model integrity and detail
Particularly effective for:
- Product designs
- Architectural models
- Engineering components
- Character models
- Scientific visualizations
Best Practices for File Preparation
Document Optimization
- Use PDF for complex documents to preserve formatting and structure
- Enable text layers in scanned documents for better text extraction
- Include proper headings and structure for improved comprehension
- Ensure reasonable font sizes (10-12pt for body text)
- Minimize embedded binary content that adds size without value
- Use standard fonts to ensure proper character recognition
- Balance image quality and file size for embedded graphics
Image Optimization
- Provide adequate resolution (minimum 800px on longest side recommended)
- Balance compression and quality (a JPEG quality setting of 80-90 is recommended)
- Use PNG for text-heavy images like screenshots and diagrams
- Consider aspect ratio (extremely wide or tall images may process suboptimally)
- Remove unnecessary metadata to reduce file size
- Avoid heavy filters or effects that might obscure content
- Ensure proper lighting and contrast for better object recognition
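
Two of the recommendations above (minimum resolution and aspect ratio) can be checked from an image's pixel dimensions alone. The following sketch is illustrative only: `image_warnings` is a hypothetical helper, the 800 px minimum comes from this guide, and the 4:1 aspect-ratio threshold is an assumption, since the guide only says that "extremely wide or tall" images may process suboptimally.

```python
from typing import List

MIN_LONGEST_SIDE = 800   # from the guide: minimum 800px on the longest side
MAX_ASPECT_RATIO = 4.0   # assumption: the guide gives no numeric threshold


def image_warnings(width: int, height: int) -> List[str]:
    """Return warnings for an image of the given pixel dimensions."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    warnings = []
    longest, shortest = max(width, height), min(width, height)
    if longest < MIN_LONGEST_SIDE:
        warnings.append(
            f"longest side is {longest}px, below {MIN_LONGEST_SIDE}px")
    if longest / shortest > MAX_ASPECT_RATIO:
        warnings.append(
            f"aspect ratio {longest / shortest:.1f}:1 is very elongated")
    return warnings
```

The dimensions themselves would come from whatever imaging library is already in use; this helper deliberately takes plain integers so it has no dependencies.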
Video Optimization
- Use MP4 with H.264 encoding for best compatibility
- Balance resolution and file size (720p or 1080p recommended)
- Ensure clear audio track for better transcription
- Keep videos under 5 minutes for optimal processing (the hard limit is 10 minutes)
- Use steady camera work for better visual analysis
- Reduce background noise in audio track
- Consider good lighting conditions for better visual recognition
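
One common way to produce an H.264 MP4 at 720p or 1080p is to re-encode with ffmpeg. The sketch below only builds the command line and does not run it. It assumes ffmpeg is installed and on the PATH; `build_ffmpeg_command` is a hypothetical helper, and the choice of AAC for the audio track is an assumption (the guide does not name an audio codec).

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, height: int = 720) -> List[str]:
    """Build an ffmpeg command that re-encodes src to an H.264 MP4."""
    if height not in (720, 1080):
        raise ValueError("the guide recommends 720p or 1080p")
    return [
        "ffmpeg",
        "-i", src,                     # input file
        "-c:v", "libx264",             # H.264 video encoder
        "-vf", f"scale=-2:{height}",   # set height, keep aspect ratio, even width
        "-c:a", "aac",                 # assumption: AAC audio track
        dst,                           # output file; use a .mp4 extension
    ]


# To actually run the conversion:
#   import subprocess
#   subprocess.run(build_ffmpeg_command("talk.mov", "talk.mp4"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces.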
Audio Optimization
- Use MP3 (128 kbps or higher) or WAV for good quality and compatibility
- Ensure clear speech without excessive background noise
- Record at appropriate levels without clipping or distortion
- Use external microphones when possible for better quality
- Consider acoustic environment to minimize echo
- Normalize audio levels for consistent volume
- Split long recordings into logical segments
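
When a recording has no obvious topic or speaker boundaries, splitting it into equal parts that each fit under the 20-minute audio limit is a reasonable fallback. The sketch below computes the cut points only; `segment_bounds` is a hypothetical helper, and cutting at natural pauses would be preferable to the even split shown here.

```python
import math
from typing import List, Tuple


def segment_bounds(total_seconds: float,
                   max_segment_seconds: float = 20 * 60
                   ) -> List[Tuple[float, float]]:
    """Split a duration into equal segments no longer than the maximum."""
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    # Smallest number of segments that keeps each one within the limit.
    count = math.ceil(total_seconds / max_segment_seconds)
    length = total_seconds / count
    return [(i * length, min((i + 1) * length, total_seconds))
            for i in range(count)]
```

For example, a 50-minute recording (3,000 seconds) yields three segments of 1,000 seconds each rather than two full segments and a short remainder.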
3D Model Optimization
- Use GLB format when possible for best compatibility
- Optimize polygon count while preserving important details
- Include proper materials and textures for better analysis
- Check for mesh errors like non-manifold geometry
- Use reasonable scale relative to real-world dimensions
- Include logical component naming for better identification
- Reduce unnecessary complexity in areas of low importance
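
Before optimizing polygon count it helps to know what the count is. For binary STL files this can be read directly from the file: the format consists of an 80-byte header, a 4-byte little-endian triangle count, and 50 bytes per triangle. The sketch below relies only on that layout; `binary_stl_triangle_count` is a hypothetical helper, and it does not handle ASCII STL or any of the other supported 3D formats.

```python
import struct

STL_HEADER_BYTES = 80    # fixed-size header in binary STL
STL_TRIANGLE_BYTES = 50  # normal + 3 vertices (12 floats) + 2-byte attribute


def binary_stl_triangle_count(data: bytes) -> int:
    """Return the triangle count declared in a binary STL file's contents."""
    if len(data) < STL_HEADER_BYTES + 4:
        raise ValueError("too short to be a binary STL file")
    (count,) = struct.unpack_from("<I", data, STL_HEADER_BYTES)
    expected = STL_HEADER_BYTES + 4 + STL_TRIANGLE_BYTES * count
    if len(data) != expected:
        # Size mismatch usually means ASCII STL or a truncated file.
        raise ValueError("size does not match a binary STL layout")
    return count
```

Because every triangle takes exactly 50 bytes, a binary STL at the 50 MB limit holds roughly one million triangles.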
Common Issues and Solutions
Documents
| Issue | Solution |
| --- | --- |
| Poor text extraction | Ensure document has text layer (not just images of text) |
| Missing structure | Use proper headings and formatting in source document |
| Size limit exceeded | Compress PDF or split into multiple documents |
| Processing errors with specific pages | Check for unusual formatting or embedded objects |
Images
| Issue | Solution |
| --- | --- |
| Poor object recognition | Improve image clarity, lighting, and contrast |
| Text not recognized | Ensure text is clearly visible with good contrast |
| Scene misinterpretation | Provide clearer images with less ambiguity |
| Size limit exceeded | Reduce resolution or increase compression (lower the quality setting) |
Video
| Issue | Solution |
| --- | --- |
| Poor transcription | Improve audio quality and reduce background noise |
| Missed visual elements | Ensure adequate lighting and camera stability |
| Processing timeout | Reduce video length or split into segments |
| Size limit exceeded | Reduce resolution or use more efficient encoding |
Audio
| Issue | Solution |
| --- | --- |
| Transcription errors | Improve recording quality and reduce background noise |
| Speaker misidentification | Ensure clear transitions between speakers |
| Content misinterpretation | Speak clearly and provide context |
| Size limit exceeded | Use more efficient encoding or split recording |
3D Models
| Issue | Solution |
| --- | --- |
| Component misidentification | Use logical component organization |
| Processing timeout | Reduce model complexity |
| Missing details | Ensure important features have adequate resolution |
| Size limit exceeded | Optimize mesh and texture resolution |
Batch Processing Considerations
- Group similar media types for more efficient processing
- Use consistent file formats within batches
- Balance batch size (5-20 items recommended)
- Consider processing order for interdependent content
- Monitor batch status for large operations
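
The grouping and sizing advice above can be applied mechanically before submitting anything. The sketch below groups files by extension, so each batch uses a single format, and caps every batch at 20 items. `plan_batches` is a hypothetical helper that only plans the batches; how a batch is submitted to the API is outside its scope. It does not enforce the lower bound of 5, so small leftover groups become small batches.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def plan_batches(filenames: Iterable[str],
                 max_batch_size: int = 20) -> List[List[str]]:
    """Group files by extension, then split each group into batches."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[Path(name).suffix.lower()].append(name)
    batches = []
    for ext in sorted(groups):  # deterministic order across runs
        files = groups[ext]
        for start in range(0, len(files), max_batch_size):
            batches.append(files[start:start + max_batch_size])
    return batches
```

With 25 PDFs and 2 PNGs as input, this produces batches of 20 and 5 PDFs followed by one batch of 2 PNGs.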
Multi-Modal Content Strategies
- Leverage cross-modal connections by uploading related content
- Create logical collections for content that belongs together
- Use consistent naming conventions across media types
- Consider processing order (process reference documents before derived content)
- Balance media type distribution for comprehensive coverage
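
When some content is derived from other content, a processing order that puts reference material first can be computed with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). `processing_order` and the example filenames are hypothetical, and the dependency map is something you would have to supply yourself; nothing in this guide suggests the API infers it.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def processing_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order items so each one comes after everything it depends on.

    depends_on maps an item to the set of items that should be
    processed before it. Raises graphlib.CycleError on circular input.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Example: both derived files depend on the reference specification.
order = processing_order({
    "overview.mp4": {"spec.pdf"},
    "slides.pptx": {"spec.pdf"},
})
# "spec.pdf" is first; the two derived files follow in either order.
```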
Special Use Cases
Academic Research Archives
- Prioritize document quality for research papers
- Include diagrams and figures as separate image files
- Use consistent collection organization by research area
- Consider hierarchical naming for easy navigation
Product Catalogs
- Include multiple image angles for each product
- Pair products with detailed specification documents
- Organize by product category in separate collections
- Consider 3D models for complex products
Training Materials
- Include videos with clear narration
- Pair with supplementary documents
- Organize in logical learning sequences
- Use consistent formatting across materials