Media Types Guide¶

Introduction¶

The Wickson API supports a wide range of media types, enabling comprehensive semantic understanding and vector-based search across diverse content formats. This guide details the supported file types, processing capabilities, size limitations, and optimization strategies for each media category.

Supported Media Categories¶

The API processes five primary media categories:

Media Type	Description	Use Cases
Documents	Text-based files including papers, reports, and web content	Knowledge bases, research archives, content management
Images	Visual media ranging from photos to diagrams	Product catalogs, visual archives, design libraries
Video	Motion picture content with both visual and audio components	Training materials, presentations, archival footage
Audio	Sound recordings including speech, music, and ambient audio	Podcasts, interviews, meeting recordings
3D Models	Three-dimensional object representations	Product design, architecture, engineering assets

File Format Support¶

Document Formats¶

Format	Extensions	Notes
PDF	.pdf	Preferred for complex layouts and multi-page documents
Word Documents	.doc, .docx	Fully supported with formatting preservation
Rich Text Format	.rtf	Well-supported with basic formatting
OpenDocument Text	.odt	Full support for open document standard
Plain Text	.txt	Ideal for simple text content
Markdown	.md	Preserves basic formatting elements

Web & Data Formats¶

Format	Extensions	Notes
HTML	.html, .htm	Extracts content while preserving basic structure
XML	.xml	Extracts text with semantic structure awareness
JSON	.json	Processes structured data with context preservation

E-Book & Email Formats¶

Format	Extensions	Notes
EPUB	.epub	Full support for e-book content and structure
Email	.eml	Extracts message content and metadata

Presentation & Spreadsheet Formats¶

Format	Extensions	Notes
PowerPoint	.pptx	Processes slides with text and visual elements
OpenDocument Presentation	.odp	Alternative to PowerPoint, fully supported
CSV	.csv	Handles tabular data effectively
TSV	.tsv	Alternative delimiter-based tabular format
Excel	.xlsx, .xls	Processes complex spreadsheets with formulas
OpenDocument Spreadsheet	.ods	Open standard spreadsheet format

Image Formats¶

Format	Extensions	Notes
JPEG	.jpg, .jpeg	Excellent for photographs and complex images
PNG	.png	Ideal for screenshots, diagrams, and graphics
GIF	.gif	Supports static and animated images
BMP	.bmp	Uncompressed format with lossless quality
TIFF	.tiff, .tif	High-quality format often used for scanning
WebP	.webp	Modern format with good compression
SVG	.svg	Vector format ideal for diagrams and icons
HEIC	.heic, .heif	High-efficiency image format, the default on recent Apple devices

Video Formats¶

Format	Extensions	Notes
MP4	.mp4	Recommended format with excellent compatibility
QuickTime	.mov	Full support for Apple's video format
AVI	.avi	Windows-native video container format
WebM	.webm	Open web video format
MPEG	.mpeg, .mpg	Standard video format with wide support
Windows Media Video	.wmv	Microsoft's proprietary video format

Audio Formats¶

Format	Extensions	Notes
MP3	.mp3	Excellent for speech and general audio
WAV	.wav	Lossless audio with high quality
M4A	.m4a	Good quality with efficient compression
FLAC	.flac	Lossless compression with excellent quality
OGG	.ogg	Open format with good compression
AAC	.aac	High-quality compressed audio
AIFF	.aiff, .aif	Apple's uncompressed audio format
Windows Media Audio	.wma	Microsoft's audio format

3D Model Formats¶

Format	Extensions	Notes
STL	.stl	Common format for 3D printing
OBJ	.obj	Widely supported 3D format with textures
FBX	.fbx	Industry-standard format; support is currently limited
GLB	.glb	Binary glTF format, highly recommended
3MF	.3mf	Modern 3D manufacturing format
PLY	.ply	Polygon file format for 3D scanning
COLLADA	.dae	XML-based format for 3D assets
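
The format tables above can be collapsed into a simple client-side lookup, which is handy for rejecting unsupported files before upload. The following is a minimal sketch, not part of the Wickson API: the names `EXTENSIONS_BY_CATEGORY` and `media_category` are hypothetical, and grouping the web, e-book, presentation, and spreadsheet formats under "document" is an assumption based on the five categories listed earlier.

```python
from pathlib import Path
from typing import Optional

# Extensions transcribed from the format tables in this guide.
# Assumption: web/data, e-book, email, presentation, and spreadsheet
# formats are treated as part of the "document" category.
EXTENSIONS_BY_CATEGORY = {
    "document": {
        ".pdf", ".doc", ".docx", ".rtf", ".odt", ".txt", ".md",
        ".html", ".htm", ".xml", ".json", ".epub", ".eml",
        ".pptx", ".odp", ".csv", ".tsv", ".xlsx", ".xls", ".ods",
    },
    "image": {
        ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif",
        ".webp", ".svg", ".heic", ".heif",
    },
    "video": {".mp4", ".mov", ".avi", ".webm", ".mpeg", ".mpg", ".wmv"},
    "audio": {
        ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac",
        ".aiff", ".aif", ".wma",
    },
    "3d_model": {".stl", ".obj", ".fbx", ".glb", ".3mf", ".ply", ".dae"},
}


def media_category(filename: str) -> Optional[str]:
    """Return the media category for a filename, or None if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in EXTENSIONS_BY_CATEGORY.items():
        if suffix in extensions:
            return category
    return None
```

For example, `media_category("report.PDF")` returns `"document"`, while a file with an unlisted extension returns `None`.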

File Size and Content Limits¶

The following limits apply to files processed by the Wickson API:

Media Type	Maximum Size	Content Limitations
Documents	20MB	128,000 tokens maximum
Images	20MB	Extremely large resolutions may be downsampled
Video	100MB	10 minutes maximum duration
Audio	50MB	20 minutes maximum duration
3D Models	50MB	Very complex models may be simplified

Note: These limits may be adjusted during the beta phase as we optimize our processing pipelines. For the most current information, check the API status endpoint.
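
A pre-upload check against these limits might look like the sketch below. It is illustrative only: `limit_violations` and the category keys are hypothetical names, and the guide does not say whether "MB" means 1,000,000 or 1,048,576 bytes, so the value of `BYTES_PER_MB` is an assumption. The document token limit is not checked here.

```python
# Assumption: 1 MB = 1024 * 1024 bytes. The guide does not specify this.
BYTES_PER_MB = 1024 * 1024

# Values taken from the limits table above.
MAX_SIZE_MB = {
    "document": 20,
    "image": 20,
    "video": 100,
    "audio": 50,
    "3d_model": 50,
}
MAX_DURATION_MINUTES = {"video": 10, "audio": 20}


def limit_violations(category, size_bytes, duration_seconds=None):
    """Return a list of human-readable limit violations (empty if none)."""
    if category not in MAX_SIZE_MB:
        raise ValueError(f"unknown media category: {category!r}")

    problems = []
    max_mb = MAX_SIZE_MB[category]
    if size_bytes > max_mb * BYTES_PER_MB:
        problems.append(f"file exceeds the {max_mb}MB size limit")

    # Duration limits only apply to video and audio.
    max_minutes = MAX_DURATION_MINUTES.get(category)
    if (
        max_minutes is not None
        and duration_seconds is not None
        and duration_seconds > max_minutes * 60
    ):
        problems.append(f"duration exceeds the {max_minutes} minute limit")
    return problems
```

Because the limits may change during the beta phase, treat the constants as a snapshot of the table rather than fixed values.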

Media Type Processing Capabilities¶

Document Processing¶

Documents undergo comprehensive processing to extract:

Full text content with preservation of structure
Semantic understanding of topics, themes, and concepts
Entity recognition for people, organizations, locations, and more
Quality analysis: evaluating clarity, completeness, and relevance
Structural analysis: identifying sections, headings, and functional parts
Relationship mapping: connecting concepts within the document

Particularly effective for:

Research papers
Technical documentation
Legal documents
Reports and analyses
Articles and publications

Image Processing¶

Images are analyzed to understand:

Visual content identification of objects, scenes, and elements
Text extraction from embedded text in images
Scene comprehension: understanding context and relationships
Compositional analysis: examining layout and design elements
Quality assessment: evaluating clarity and visual characteristics
Aesthetic evaluation: assessing visual impact and style

Particularly effective for:

Photographs
Diagrams and charts
Screenshots
Design assets
Product imagery

Video Processing¶

Video content undergoes multi-modal analysis:

Visual scene analysis: frame-by-frame understanding
Audio transcription: conversion of speech to searchable text
Timeline comprehension: understanding sequence and narrative
Key frame identification: detecting significant visual moments
Entity and action recognition: identifying who and what appears
Spoken content analysis: understanding discussion topics

Particularly effective for:

Presentations and talks
Instructional content
Interviews
Product demonstrations
Informational videos

Audio Processing¶

Audio files receive specialized processing:

Speech recognition: accurate transcription of spoken content
Speaker identification: distinguishing between different voices
Non-speech audio analysis: understanding music, sounds, and effects
Topic recognition: identifying discussion subjects
Sentiment analysis: detecting emotional tone
Quality assessment: evaluating clarity and listenability

Particularly effective for:

Podcasts
Interviews
Lectures and speeches
Meeting recordings
Voice notes

3D Model Processing¶

3D models undergo spatial and structural analysis:

Component identification: recognizing parts and structures
Spatial relationship mapping: understanding positioning and scale
Structural analysis: evaluating design and engineering aspects
Material classification: identifying surface properties
Purpose and function recognition: understanding intended use
Quality assessment: evaluating model integrity and detail

Particularly effective for:

Product designs
Architectural models
Engineering components
Character models
Scientific visualizations

Best Practices for File Preparation¶

Document Optimization¶

Use PDF for complex documents to preserve formatting and structure
Enable text layers in scanned documents for better text extraction
Include proper headings and structure for improved comprehension
Ensure reasonable font sizes (10-12pt for body text)
Minimize embedded binary content that adds size without value
Use standard fonts to ensure proper character recognition
Balance image quality and file size for embedded graphics

Image Optimization¶

Provide adequate resolution (minimum 800px on longest side recommended)
Balance compression and quality (JPEG quality 80-90% recommended)
Use PNG for text-heavy images like screenshots and diagrams
Consider aspect ratio (extremely wide or tall images may process suboptimally)
Remove unnecessary metadata to reduce file size
Avoid heavy filters or effects that might obscure content
Ensure proper lighting and contrast for better object recognition
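
The resolution and aspect-ratio advice above can be checked before upload once the pixel dimensions are known. This is a minimal sketch with a hypothetical function name. The 800px minimum comes from this guide, but the guide gives no number for "extremely wide or tall", so the `max_aspect_ratio` default of 4.0 is an assumed placeholder.

```python
def image_dimension_warnings(width, height,
                             min_longest_side=800,
                             max_aspect_ratio=4.0):
    """Return warnings for an image of the given pixel dimensions.

    min_longest_side follows the recommendation in this guide.
    max_aspect_ratio is an assumed threshold, not a documented limit.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    warnings = []
    longest, shortest = max(width, height), min(width, height)
    if longest < min_longest_side:
        warnings.append(
            f"longest side is {longest}px, below the recommended "
            f"{min_longest_side}px"
        )
    if longest / shortest > max_aspect_ratio:
        warnings.append(
            f"aspect ratio {longest / shortest:.1f}:1 is unusually extreme"
        )
    return warnings
```

A 1200x800 photo produces no warnings, while a 4000x500 banner is flagged for its aspect ratio only.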

Video Optimization¶

Use MP4 with H.264 encoding for best compatibility
Balance resolution and file size (720p or 1080p recommended)
Ensure clear audio track for better transcription
Keep videos under 5 minutes for optimal processing (the hard limit is 10 minutes)
Use steady camera work for better visual analysis
Reduce background noise in audio track
Record in good lighting conditions for better visual recognition
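
One common way to produce an H.264 MP4 at 720p is FFmpeg. The sketch below only builds the command line and does not run it. It assumes `ffmpeg` is installed with the `libx264` encoder available. The function name and file names are hypothetical, and FFmpeg is not something the Wickson API requires.

```python
import shlex


def h264_mp4_command(source, target, height=720):
    """Build an ffmpeg argument list for an H.264/AAC MP4 at the given height."""
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "libx264",             # H.264 video encoder
        "-vf", f"scale=-2:{height}",   # set height; width keeps aspect, stays even
        "-c:a", "aac",                 # AAC audio track
        "-movflags", "+faststart",     # move index to the file start
        target,
    ]


command = h264_mp4_command("talk.mov", "talk.mp4")
print(shlex.join(command))  # shell-safe string for inspection or logging
```

The list can be passed to `subprocess.run(command, check=True)` to perform the conversion. Note that the scale filter also upscales sources smaller than the target height, so skip it for low-resolution input.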

Audio Optimization¶

Use MP3 (128kbps+) or WAV for good quality and compatibility
Ensure clear speech without excessive background noise
Record at appropriate levels without clipping or distortion
Use external microphones when possible for better quality
Choose a recording environment that minimizes echo
Normalize audio levels for consistent volume
Split long recordings into logical segments
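
When a recording exceeds the 20-minute audio limit, segment boundaries can be planned ahead of cutting. The sketch below divides a recording into the fewest equal-length segments that fit under a maximum. Equal-length cuts are a simplification: they ignore topic or speaker boundaries, so adjust the cut points by hand where "logical segments" matter. The function name is hypothetical.

```python
import math


def plan_segments(total_seconds, max_segment_seconds=20 * 60):
    """Return (start, end) pairs in seconds covering the whole recording.

    Uses the fewest equal-length segments such that none exceeds
    max_segment_seconds (default: the 20-minute audio limit).
    """
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")

    count = math.ceil(total_seconds / max_segment_seconds)
    length = total_seconds / count
    return [
        (i * length, min((i + 1) * length, total_seconds))
        for i in range(count)
    ]
```

A 50-minute recording (3000 seconds) is planned as three segments of 1000 seconds each, while a 10-minute recording stays in one piece.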

3D Model Optimization¶

Use GLB format when possible for best compatibility
Optimize polygon count while preserving important details
Include proper materials and textures for better analysis
Check for mesh errors like non-manifold geometry
Use reasonable scale relative to real-world dimensions
Include logical component naming for better identification
Reduce unnecessary complexity in areas of low importance
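
The mesh-error check mentioned above can be approximated by counting how many triangles share each edge. In a closed manifold triangle mesh every edge belongs to exactly two faces; an edge used once lies on an open boundary, and an edge used more than twice is non-manifold. The following is a minimal sketch with hypothetical function names. It does not detect non-manifold vertices or self-intersections, and it says nothing about how the API itself validates models.

```python
from collections import Counter


def edge_face_counts(triangles):
    """Count how many triangles use each undirected edge.

    triangles: iterable of (a, b, c) vertex-index tuples.
    """
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1  # normalize edge direction
    return counts


def problem_edges(triangles):
    """Return (boundary_edges, non_manifold_edges) as sorted lists."""
    counts = edge_face_counts(triangles)
    boundary = sorted(edge for edge, n in counts.items() if n == 1)
    non_manifold = sorted(edge for edge, n in counts.items() if n > 2)
    return boundary, non_manifold


# A tetrahedron is closed and manifold: every edge is shared by two faces.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(problem_edges(tetrahedron))
```

Dedicated mesh tools perform more thorough checks; this only illustrates the edge-sharing rule.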

Troubleshooting Media Processing¶

Common Issues and Solutions¶

Documents¶

Issue	Solution
Poor text extraction	Ensure document has text layer (not just images of text)
Missing structure	Use proper headings and formatting in source document
Size limit exceeded	Compress PDF or split into multiple documents
Processing errors with specific pages	Check for unusual formatting or embedded objects
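
For plain-text content, splitting a document that exceeds the 128,000-token limit can be sketched as follows. The token count is estimated with a rough heuristic of about four characters per token, which is an assumption: the guide does not document the tokenizer, so leave a safety margin. The function name is hypothetical, and the approach does not apply directly to binary formats such as PDF.

```python
def split_by_token_budget(text, max_tokens=128_000, chars_per_token=4):
    """Split text into chunks that fit an estimated token budget.

    Splits on blank lines (paragraph boundaries) where possible.
    chars_per_token is a rough assumed ratio, not a documented value.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is over budget on its own.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate      # still fits: keep accumulating
            else:
                chunks.append(current)   # flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be uploaded as its own document.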

Images¶

Issue	Solution
Poor object recognition	Improve image clarity, lighting, and contrast
Text not recognized	Ensure text is clearly visible with good contrast
Scene misinterpretation	Provide clearer images with less ambiguity
Size limit exceeded	Reduce resolution or lower the quality setting (increase compression)

Video¶

Issue	Solution
Poor transcription	Improve audio quality and reduce background noise
Missed visual elements	Ensure adequate lighting and camera stability
Processing timeout	Reduce video length or split into segments
Size limit exceeded	Reduce resolution or use more efficient encoding

Audio¶

Issue	Solution
Transcription errors	Improve recording quality and reduce background noise
Speaker misidentification	Ensure clear transitions between speakers
Content misinterpretation	Speak clearly and provide context
Size limit exceeded	Use more efficient encoding or split recording

3D Models¶

Issue	Solution
Component misidentification	Use logical component organization
Processing timeout	Reduce model complexity
Missing details	Ensure important features have adequate resolution
Size limit exceeded	Optimize mesh and texture resolution

Advanced Media Processing Tips¶

Batch Processing Considerations¶

Group similar media types for more efficient processing
Use consistent file formats within batches
Balance batch size (5-20 items recommended)
Consider processing order for interdependent content
Monitor batch status for large operations
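
The grouping and sizing advice above can be sketched as a small helper that groups files by media type and caps each batch at the recommended upper bound of 20 items. The function name and the `(filename, category)` input shape are assumptions made for illustration, and submitting the batches to the API is out of scope here.

```python
from collections import defaultdict


def make_batches(files, batch_size=20):
    """Group (filename, category) pairs into per-category batches.

    Returns a list of (category, filenames) tuples, each holding at most
    batch_size filenames, with categories in first-seen order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    grouped = defaultdict(list)
    for name, category in files:
        grouped[category].append(name)

    batches = []
    for category, names in grouped.items():
        for start in range(0, len(names), batch_size):
            batches.append((category, names[start:start + batch_size]))
    return batches


files = [(f"img{i}.png", "image") for i in range(25)] + [("a.pdf", "document")]
batches = make_batches(files)
```

Here the 25 images become one batch of 20 and one of 5, and the single PDF forms its own batch. The final batch of a category can fall below the recommended minimum of 5, which may be worth merging by hand.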

Multi-Modal Content Strategies¶

Leverage cross-modal connections by uploading related content
Create logical collections for content that belongs together
Use consistent naming conventions across media types
Consider processing order (process reference documents before derived content)
Balance media type distribution for comprehensive coverage

Special Use Cases¶

Academic Research Archives¶

Prioritize document quality for research papers
Include diagrams and figures as separate image files
Use consistent collection organization by research area
Consider hierarchical naming for easy navigation

Product Catalogs¶

Include multiple image angles for each product
Pair products with detailed specification documents
Organize by product category in separate collections
Consider 3D models for complex products

Training Materials¶

Include videos with clear narration
Pair with supplementary documents
Organize in logical learning sequences
Use consistent formatting across materials
