
Media Types Guide

Introduction

The Wickson API accepts documents, images, video, audio, and 3D models, and makes each available for semantic analysis and vector-based search. This guide lists the supported file types, processing capabilities, size limits, and optimization strategies for each media category.

Supported Media Categories

The API processes five primary media categories:

Media Type | Description | Use Cases
Documents | Text-based files including papers, reports, and web content | Knowledge bases, research archives, content management
Images | Visual media ranging from photos to diagrams | Product catalogs, visual archives, design libraries
Video | Motion picture content with both visual and audio components | Training materials, presentations, archival footage
Audio | Sound recordings including speech, music, and ambient audio | Podcasts, interviews, meeting recordings
3D Models | Three-dimensional object representations | Product design, architecture, engineering assets

File Format Support

Document Formats

Format | Extensions | Notes
PDF | .pdf | Preferred for complex layouts and multi-page documents
Word Documents | .doc, .docx | Fully supported with formatting preservation
Rich Text Format | .rtf | Well-supported with basic formatting
OpenDocument Text | .odt | Full support for open document standard
Plain Text | .txt | Ideal for simple text content
Markdown | .md | Preserves basic formatting elements

Web & Data Formats

Format | Extensions | Notes
HTML | .html, .htm | Extracts content while preserving basic structure
XML | .xml | Extracts text with semantic structure awareness
JSON | .json | Processes structured data with context preservation

E-Book & Email Formats

Format | Extensions | Notes
EPUB | .epub | Full support for e-book content and structure
Email | .eml | Extracts message content and metadata

Presentation & Spreadsheet Formats

Format | Extensions | Notes
PowerPoint | .pptx | Processes slides with text and visual elements
OpenDocument Presentation | .odp | Alternative to PowerPoint, fully supported
CSV | .csv | Handles tabular data effectively
TSV | .tsv | Alternative delimiter-based tabular format
Excel | .xlsx, .xls | Processes complex spreadsheets with formulas
OpenDocument Spreadsheet | .ods | Open standard spreadsheet format

Image Formats

Format | Extensions | Notes
JPEG | .jpg, .jpeg | Excellent for photographs and complex images
PNG | .png | Ideal for screenshots, diagrams, and graphics
GIF | .gif | Supports static and animated images
BMP | .bmp | Uncompressed format with lossless quality
TIFF | .tiff, .tif | High-quality format often used for scanning
WebP | .webp | Modern format with good compression
SVG | .svg | Vector format ideal for diagrams and icons
HEIC | .heic, .heif | Apple's high-efficiency image format

Video Formats

Format | Extensions | Notes
MP4 | .mp4 | Recommended format with excellent compatibility
QuickTime | .mov | Full support for Apple's video format
AVI | .avi | Windows-native video container format
WebM | .webm | Open web video format
MPEG | .mpeg, .mpg | Standard video format with wide support
Windows Media Video | .wmv | Microsoft's proprietary video format

Audio Formats

Format | Extensions | Notes
MP3 | .mp3 | Excellent for speech and general audio
WAV | .wav | Lossless audio with high quality
M4A | .m4a | Good quality with efficient compression
FLAC | .flac | Lossless compression with excellent quality
OGG | .ogg | Open format with good compression
AAC | .aac | High-quality compressed audio
AIFF | .aiff, .aif | Apple's uncompressed audio format
Windows Media Audio | .wma | Microsoft's audio format

3D Model Formats

Format | Extensions | Notes
STL | .stl | Common format for 3D printing
OBJ | .obj | Widely supported 3D format with textures
FBX | .fbx | Industry standard with limited support
GLB | .glb | Binary glTF format, highly recommended
3MF | .3mf | Modern 3D manufacturing format
PLY | .ply | Polygon file format for 3D scanning
COLLADA | .dae | XML-based format for 3D assets
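
The format tables above can be turned into a local lookup that classifies a file before upload. The following is a minimal sketch, not part of the Wickson API: the names `EXTENSIONS` and `media_category` are hypothetical, and it assumes that the web, data, e-book, email, presentation, and spreadsheet formats all count toward the Documents category, which the guide implies but does not state outright.

```python
from pathlib import Path

# Extension sets transcribed from the format tables in this guide.
# Assumption: web, data, e-book, email, presentation, and spreadsheet
# formats are all treated as "document" here.
EXTENSIONS = {
    "document": {
        ".pdf", ".doc", ".docx", ".rtf", ".odt", ".txt", ".md",
        ".html", ".htm", ".xml", ".json", ".epub", ".eml",
        ".pptx", ".odp", ".csv", ".tsv", ".xlsx", ".xls", ".ods",
    },
    "image": {
        ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif",
        ".webp", ".svg", ".heic", ".heif",
    },
    "video": {".mp4", ".mov", ".avi", ".webm", ".mpeg", ".mpg", ".wmv"},
    "audio": {
        ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac",
        ".aiff", ".aif", ".wma",
    },
    "3d_model": {".stl", ".obj", ".fbx", ".glb", ".3mf", ".ply", ".dae"},
}


def media_category(filename):
    """Return the media category for a filename, or None if the extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last extension
    for category, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

A check like this only inspects the file name, so it cannot tell whether the file contents actually match the extension.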

File Size and Content Limits

The following limits apply to files processed by the Wickson API:

Media Type | Maximum Size | Content Limitations
Documents | 20 MB | 128,000 tokens maximum
Images | 20 MB | Extremely large resolutions may be downsampled
Video | 100 MB | 10 minutes maximum duration
Audio | 50 MB | 20 minutes maximum duration
3D Models | 50 MB | Very complex models may be simplified

Note: These limits may be adjusted during the beta phase as we optimize our processing pipelines. For the most current information, check the API status endpoint.
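
A client can check files against these limits before uploading. The sketch below is illustrative only: `LIMITS` and `limit_violations` are hypothetical names, the values are copied from the table above and may change during the beta, and it assumes "MB" means 2**20 bytes, which the guide does not specify. The 128,000-token document limit is not checked, because counting tokens depends on a tokenizer this guide does not describe.

```python
# Assumption: "MB" is interpreted as 2**20 bytes; the guide does not say.
MB = 1024 * 1024

# Limits transcribed from the table in this guide (subject to change in beta).
LIMITS = {
    "document": {"max_bytes": 20 * MB},
    "image": {"max_bytes": 20 * MB},
    "video": {"max_bytes": 100 * MB, "max_seconds": 10 * 60},
    "audio": {"max_bytes": 50 * MB, "max_seconds": 20 * 60},
    "3d_model": {"max_bytes": 50 * MB},
}


def limit_violations(category, size_bytes, duration_seconds=None):
    """Return a list of human-readable limit violations (empty if none)."""
    limits = LIMITS[category]
    problems = []
    if size_bytes > limits["max_bytes"]:
        problems.append(
            f"size {size_bytes} bytes exceeds {limits['max_bytes']} bytes"
        )
    max_seconds = limits.get("max_seconds")
    # Duration is only checked when the category has a limit and a duration is known.
    if (
        max_seconds is not None
        and duration_seconds is not None
        and duration_seconds > max_seconds
    ):
        problems.append(
            f"duration {duration_seconds}s exceeds {max_seconds}s"
        )
    return problems
```

For example, `limit_violations("video", 150 * MB, 700)` reports both a size and a duration problem, while a 5 MB image returns an empty list.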

Media Type Processing Capabilities

Document Processing

Documents undergo comprehensive processing to extract:

  • Full text content: extracted with structure preserved
  • Semantic understanding: topics, themes, and concepts
  • Entity recognition: people, organizations, locations, and more
  • Quality analysis: evaluating clarity, completeness, and relevance
  • Structural analysis: identifying sections, headings, and functional parts
  • Relationship mapping: connecting concepts within the document

Particularly effective for:

  • Research papers
  • Technical documentation
  • Legal documents
  • Reports and analyses
  • Articles and publications

Image Processing

Images are analyzed to understand:

  • Visual content: identification of objects, scenes, and elements
  • Text extraction: recovery of text embedded in images
  • Scene comprehension: understanding context and relationships
  • Compositional analysis: examining layout and design elements
  • Quality assessment: evaluating clarity and visual characteristics
  • Aesthetic evaluation: assessing visual impact and style

Particularly effective for:

  • Photographs
  • Diagrams and charts
  • Screenshots
  • Design assets
  • Product imagery

Video Processing

Video content undergoes multi-modal analysis:

  • Visual scene analysis: frame-by-frame understanding
  • Audio transcription: conversion of speech to searchable text
  • Timeline comprehension: understanding sequence and narrative
  • Key frame identification: detecting significant visual moments
  • Entity and action recognition: identifying who and what appears
  • Spoken content analysis: understanding discussion topics

Particularly effective for:

  • Presentations and talks
  • Instructional content
  • Interviews
  • Product demonstrations
  • Informational videos

Audio Processing

Audio files receive specialized processing:

  • Speech recognition: accurate transcription of spoken content
  • Speaker identification: distinguishing between different voices
  • Non-speech audio analysis: understanding music, sounds, and effects
  • Topic recognition: identifying discussion subjects
  • Sentiment analysis: detecting emotional tone
  • Quality assessment: evaluating clarity and listenability

Particularly effective for:

  • Podcasts
  • Interviews
  • Lectures and speeches
  • Meeting recordings
  • Voice notes

3D Model Processing

3D models undergo spatial and structural analysis:

  • Component identification: recognizing parts and structures
  • Spatial relationship mapping: understanding positioning and scale
  • Structural analysis: evaluating design and engineering aspects
  • Material classification: identifying surface properties
  • Purpose and function recognition: understanding intended use
  • Quality assessment: evaluating model integrity and detail

Particularly effective for:

  • Product designs
  • Architectural models
  • Engineering components
  • Character models
  • Scientific visualizations

Best Practices for File Preparation

Document Optimization

  • Use PDF for complex documents to preserve formatting and structure
  • Enable text layers in scanned documents for better text extraction
  • Include proper headings and structure for improved comprehension
  • Ensure reasonable font sizes (10-12pt for body text)
  • Minimize embedded binary content that adds size without value
  • Use standard fonts to ensure proper character recognition
  • Balance image quality and file size for embedded graphics

Image Optimization

  • Provide adequate resolution (minimum 800px on longest side recommended)
  • Balance compression and quality (JPEG quality 80-90% recommended)
  • Use PNG for text-heavy images like screenshots and diagrams
  • Consider aspect ratio (extremely wide or tall images may process suboptimally)
  • Remove unnecessary metadata to reduce file size
  • Avoid heavy filters or effects that might obscure content
  • Ensure proper lighting and contrast for better object recognition
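
The resolution and aspect-ratio recommendations above can be checked from an image's pixel dimensions alone. The following sketch is one possible pre-upload check, with `image_warnings` as a hypothetical helper name. The 800px minimum comes from this guide; the aspect-ratio threshold of 4:1 is purely an assumption, since the guide says only "extremely wide or tall" without giving a number.

```python
def image_warnings(width, height, min_longest_side=800, max_aspect_ratio=4.0):
    """Return warnings for an image of the given pixel dimensions.

    min_longest_side follows the guide's recommendation.
    max_aspect_ratio is an assumed cutoff, not a documented value.
    """
    warnings = []
    longest, shortest = max(width, height), min(width, height)
    if longest < min_longest_side:
        warnings.append(
            f"longest side is {longest}px, below the recommended {min_longest_side}px"
        )
    if shortest > 0 and longest / shortest > max_aspect_ratio:
        warnings.append(
            f"aspect ratio {longest / shortest:.1f}:1 may process suboptimally"
        )
    return warnings
```

The width and height would typically come from an imaging library or the file's header; reading them is left out to keep the sketch dependency-free.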

Video Optimization

  • Use MP4 with H.264 encoding for best compatibility
  • Balance resolution and file size (720p or 1080p recommended)
  • Ensure clear audio track for better transcription
  • Keep videos under 5 minutes for optimal processing
  • Use steady camera work for better visual analysis
  • Reduce background noise in audio track
  • Ensure good lighting conditions for better visual recognition
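
As one way to follow the encoding and resolution recommendations above, a source video can be re-encoded with FFmpeg before upload. The sketch below only builds the command line and does not run it. The helper name `ffmpeg_args` is hypothetical, and it assumes an FFmpeg build that includes the `libx264` encoder. Choosing AAC for the audio track is also an assumption; the guide names H.264 but no audio codec.

```python
import shlex


def ffmpeg_args(source, target, height=720):
    """Build an FFmpeg command that re-encodes a video as H.264 in an MP4 container."""
    return [
        "ffmpeg",
        "-i", source,                 # input file
        "-c:v", "libx264",            # H.264 video encoder
        "-vf", f"scale=-2:{height}",  # set height, keep aspect ratio, even width
        "-c:a", "aac",                # assumed audio codec (not specified in the guide)
        target,                       # output path, e.g. ending in .mp4
    ]


# Print a shell-safe version of the command for inspection.
print(shlex.join(ffmpeg_args("talk.mov", "talk.mp4")))
```

The list form can be passed directly to `subprocess.run` once the command has been reviewed.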

Audio Optimization

  • Use MP3 (128kbps+) or WAV for good quality and compatibility
  • Ensure clear speech without excessive background noise
  • Record at appropriate levels without clipping or distortion
  • Use external microphones when possible for better quality
  • Choose a recording environment that minimizes echo
  • Normalize audio levels for consistent volume
  • Split long recordings into logical segments
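
When a recording exceeds the 20-minute audio limit, it has to be split before upload. The sketch below computes fixed-length segment boundaries as a simple starting point; `segment_bounds` is a hypothetical helper. It cuts purely by time, so the "logical segments" the guide recommends (for example, splitting at topic or speaker changes) would still need manual adjustment of the cut points.

```python
def segment_bounds(total_seconds, max_seconds=20 * 60):
    """Split a duration into consecutive (start, end) pairs no longer than max_seconds."""
    if total_seconds <= 0 or max_seconds <= 0:
        raise ValueError("durations must be positive")
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

A 50-minute recording (3,000 seconds) yields three segments: two of 20 minutes and one of 10 minutes.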

3D Model Optimization

  • Use GLB format when possible for best compatibility
  • Optimize polygon count while preserving important details
  • Include proper materials and textures for better analysis
  • Check for mesh errors like non-manifold geometry
  • Use reasonable scale relative to real-world dimensions
  • Include logical component naming for better identification
  • Reduce unnecessary complexity in areas of low importance

Troubleshooting Media Processing

Common Issues and Solutions

Documents

Issue | Solution
Poor text extraction | Ensure document has text layer (not just images of text)
Missing structure | Use proper headings and formatting in source document
Size limit exceeded | Compress PDF or split into multiple documents
Processing errors with specific pages | Check for unusual formatting or embedded objects

Images

Issue | Solution
Poor object recognition | Improve image clarity, lighting, and contrast
Text not recognized | Ensure text is clearly visible with good contrast
Scene misinterpretation | Provide clearer images with less ambiguity
Size limit exceeded | Reduce resolution or lower the quality setting (stronger compression)

Video

Issue | Solution
Poor transcription | Improve audio quality and reduce background noise
Missed visual elements | Ensure adequate lighting and camera stability
Processing timeout | Reduce video length or split into segments
Size limit exceeded | Reduce resolution or use more efficient encoding

Audio

Issue | Solution
Transcription errors | Improve recording quality and reduce background noise
Speaker misidentification | Ensure clear transitions between speakers
Content misinterpretation | Speak clearly and provide context
Size limit exceeded | Use more efficient encoding or split recording

3D Models

Issue | Solution
Component misidentification | Use logical component organization
Processing timeout | Reduce model complexity
Missing details | Ensure important features have adequate resolution
Size limit exceeded | Optimize mesh and texture resolution

Advanced Media Processing Tips

Batch Processing Considerations

  • Group similar media types for more efficient processing
  • Use consistent file formats within batches
  • Balance batch size (5-20 items recommended)
  • Consider processing order for interdependent content
  • Monitor batch status for large operations
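
The grouping and batch-size recommendations above can be sketched as a small planning step that runs before any upload. This is an illustration under stated assumptions, not a Wickson client feature: `plan_batches` is a hypothetical helper, files are grouped by extension as a stand-in for "consistent file formats", and the cap of 20 comes from the recommended range. The last batch of a group may hold fewer than 5 items.

```python
from collections import defaultdict
from pathlib import Path


def plan_batches(filenames, max_batch_size=20):
    """Group filenames by extension, then chunk each group into batches."""
    by_format = defaultdict(list)
    for name in filenames:
        by_format[Path(name).suffix.lower()].append(name)

    batches = []
    for extension in sorted(by_format):  # deterministic order across runs
        names = by_format[extension]
        for i in range(0, len(names), max_batch_size):
            batches.append(names[i:i + max_batch_size])
    return batches
```

With 25 PDFs and 2 PNGs, this yields batches of 20 and 5 PDFs followed by one batch of 2 PNGs.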

Multi-Modal Content Strategies

  • Leverage cross-modal connections by uploading related content
  • Create logical collections for content that belongs together
  • Use consistent naming conventions across media types
  • Consider processing order (process reference documents before derived content)
  • Balance media type distribution for comprehensive coverage
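
Where content is interdependent, the "reference documents before derived content" advice amounts to a topological ordering. The sketch below uses the standard library's `graphlib` (Python 3.9+). The `processing_order` helper and the file names are hypothetical, and how dependencies between files are determined is left to the caller.

```python
from graphlib import TopologicalSorter


def processing_order(depends_on):
    """Return files ordered so that every file comes after the files it depends on.

    depends_on maps each file to the set of files that should be processed first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical example: a summary derived from a report, slides derived from both.
order = processing_order({
    "summary.pdf": {"report.pdf"},
    "slides.pptx": {"report.pdf", "summary.pdf"},
})
print(order)
```

Files with no dependencies on each other may appear in any relative order, so the result is one valid ordering rather than the only one.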

Special Use Cases

Academic Research Archives

  • Prioritize document quality for research papers
  • Include diagrams and figures as separate image files
  • Use consistent collection organization by research area
  • Consider hierarchical naming for easy navigation

Product Catalogs

  • Include multiple image angles for each product
  • Pair products with detailed specification documents
  • Organize by product category in separate collections
  • Consider 3D models for complex products

Training Materials

  • Include videos with clear narration
  • Pair with supplementary documents
  • Organize in logical learning sequences
  • Use consistent formatting across materials
