Upload and Extract Documents
This guide walks you through uploading documents and extracting structured data from them using Sterndesk’s Upload Collector.
Prerequisites
Before you begin, ensure you have:
- An API key with appropriate permissions (see Authentication)
- An existing organization and project (see the guide)
This guide uses proj_xyz789 as an example.
Step 1: Create an Extraction Schema
An extraction schema defines the structure of the data you want to extract from your documents. Create a schema that matches the information you’re looking for. For this example, we’ll create a simple schema to extract contact information.
The json_schema field must be a JSON-encoded string, not a nested object. See Extraction Schemas for details on schema encoding.
Save the returned schema ID (e.g., exsc_abc123) for the next step.
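The encoding requirement for json_schema is easy to get wrong, so here is a minimal sketch of building such a request body. Only the json_schema field name comes from this guide; the "name" field, the helper function, and the contact properties are assumptions made for illustration, not the documented Sterndesk API.

```python
import json

# Hypothetical contact schema; the property names are illustrative only.
contact_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"},
    },
    "required": ["name"],
}


def build_schema_payload(name: str, schema: dict) -> dict:
    """Return a request body whose json_schema value is a JSON string."""
    return {
        "name": name,  # assumed field name, not confirmed by this guide
        "json_schema": json.dumps(schema),  # encoded string, not a nested dict
    }


payload = build_schema_payload("contact-info", contact_schema)

# Serializing the whole body encodes the schema a second time, which is
# what "JSON-encoded string" implies on the wire.
body = json.dumps(payload)
```

Passing the dictionary itself as json_schema would produce a nested object, which is the form the note above says is not accepted.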
Step 2: Create an Upload Collector
An Upload Collector is a collector that accepts file uploads. When you attach an extraction schema to it, documents are automatically extracted upon upload (Direct Extraction mode).
Save the returned collector ID (e.g., upl_coll_def456) for creating uploads.
Step 3: Create an Upload
To upload files, first create an upload request specifying the files you want to upload. You must declare the exact size of each file in bytes.
The upload_expiration field specifies how long the upload URLs remain valid (minimum 1 ms, maximum 1 hour).
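Because the declared size must match the file exactly, it helps to read it from disk rather than type it by hand. The sketch below assembles an upload request that way and checks the expiration against the stated limits. The files and upload_expiration names come from this guide; the per-file field names (file_name, size_bytes) and the seconds-with-suffix encoding of the expiration are assumptions.

```python
import os
import tempfile
from datetime import timedelta

MIN_EXPIRATION = timedelta(milliseconds=1)
MAX_EXPIRATION = timedelta(hours=1)


def file_spec(path: str) -> dict:
    """Describe one file with its exact on-disk size in bytes."""
    return {
        "file_name": os.path.basename(path),  # assumed field name
        "size_bytes": os.path.getsize(path),  # assumed field name
    }


def build_upload_request(paths: list, expiration: timedelta) -> dict:
    """Build a request body, rejecting expirations outside 1 ms to 1 hour."""
    if not MIN_EXPIRATION <= expiration <= MAX_EXPIRATION:
        raise ValueError("upload_expiration must be between 1 ms and 1 hour")
    return {
        "files": [file_spec(p) for p in paths],
        # Assumed encoding: a number of seconds followed by "s".
        "upload_expiration": f"{expiration.total_seconds():g}s",
    }


# Usage: write a small sample file and describe it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "contact.txt")
    with open(sample_path, "wb") as fh:
        fh.write(b"hello world")  # 11 bytes
    request = build_upload_request([sample_path], timedelta(minutes=15))
```

Validating the expiration locally is a design choice that surfaces range errors before any request is sent.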
Response:
Step 4: Upload Files Using Pre-signed URLs
Use the returned pre-signed URL to upload your file directly to storage. For details on how pre-signed URLs work and why Sterndesk uses direct-to-storage uploads, see Upload URLs. Upload your file using an HTTP PUT request.
The pre-signed URLs are returned in the same order as the files array: the first pre-signed URL corresponds to the first file specification.
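The positional pairing of URLs and files can be sketched with the Python standard library as follows. The URLs are placeholders rather than real pre-signed URLs, and the helper names are hypothetical. The sketch only prepares the PUT requests; the send function shows how one would be dispatched but is not called here.

```python
import urllib.request


def build_put_request(url: str, data: bytes) -> urllib.request.Request:
    """Prepare an HTTP PUT whose body is the raw file content."""
    return urllib.request.Request(url, data=data, method="PUT")


def pair_uploads(presigned_urls: list, file_contents: list) -> list:
    """Match URLs to files by position: first URL with first file."""
    if len(presigned_urls) != len(file_contents):
        raise ValueError("expected one pre-signed URL per declared file")
    return [build_put_request(u, d) for u, d in zip(presigned_urls, file_contents)]


def send(request: urllib.request.Request) -> int:
    """Perform the upload and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder URLs standing in for the values returned in Step 3.
urls = [
    "https://storage.example.com/upload-1?sig=placeholder",
    "https://storage.example.com/upload-2?sig=placeholder",
]
contents = [b"hello world", b"second file"]
put_requests = pair_uploads(urls, contents)
```

Checking that the two lists have equal length guards against silently dropping a file, since zip would otherwise stop at the shorter list.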
Step 5: Poll for Extraction Results
Once files are uploaded, Sterndesk automatically processes them if an extraction schema is attached to the collector. Poll the extractions endpoint to check the status and retrieve results. First, list uploads to get the upload ID.
Once the upload status is UPLOAD_STATUS_DIRECTLY_EXTRACTED, retrieve the extraction results.
When the extraction status is DIRECT_UPLOAD_EXTRACTION_STATUS_STRUCTURED, the extraction_output field contains your structured data.
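A polling loop for this step might look like the sketch below. It is deliberately independent of any HTTP client: fetch is a caller-supplied function that returns the latest response as a dictionary. The two status constants and the extraction_output field come from this guide; the "status" key, the pending status value, and the sample output are placeholders assumed for the demonstration.

```python
import time

UPLOAD_DONE = "UPLOAD_STATUS_DIRECTLY_EXTRACTED"
EXTRACTION_DONE = "DIRECT_UPLOAD_EXTRACTION_STATUS_STRUCTURED"


def poll_until(fetch, is_ready, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_ready(result) is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction did not finish before the timeout")
        sleep(interval)


# Usage with canned responses in place of real API calls.
fake_responses = iter([
    {"status": "PLACEHOLDER_PENDING"},  # not a real status value
    {"status": EXTRACTION_DONE, "extraction_output": {"name": "Ada"}},
])

result = poll_until(
    fetch=lambda: next(fake_responses),
    is_ready=lambda r: r.get("status") == EXTRACTION_DONE,  # "status" key is assumed
    interval=0,
    sleep=lambda _seconds: None,  # skip real waiting in the demonstration
)
structured = result["extraction_output"]
```

The same helper can wait for the upload status first by passing a predicate that compares against UPLOAD_DONE.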
Deleting an Upload Collector
When you no longer need an upload collector, you can delete it.
Next Steps
Extraction Schemas
Learn more about designing extraction schemas
Crawl URLs
Extract data from web pages instead of uploaded documents