An extraction schema defines the structure of data you want to extract from your documents. Create a schema that matches the information you’re looking for.

For this example, we’ll create a simple schema to extract contact information:
curl -X POST https://api.eu.sterndesk.com/r/extraction-schemas \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_xyz789",
    "name": "Contact Extraction",
    "json_schema": "{\"type\":\"object\",\"properties\":{\"full_name\":{\"type\":\"string\",\"description\":\"The full name of the person\"},\"email\":{\"type\":\"string\",\"description\":\"Email address\"},\"phone\":{\"type\":\"string\",\"description\":\"Phone number\"},\"company\":{\"type\":\"string\",\"description\":\"Company or organization name\"}},\"required\":[\"full_name\"]}"
  }'
The json_schema field must be a JSON-encoded string, not a nested object. See Extraction Schemas for details on schema encoding.
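Escaping the schema by hand is error-prone, so it can help to build the request body programmatically. The following is a minimal Python sketch, not an official client: it encodes the schema once to produce the json_schema string, then encodes the whole payload a second time. The helper name build_schema_request is hypothetical; the field names and values are taken from the curl example above.

```python
import json

# The contact schema from the curl example, written as a plain dict.
schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string", "description": "The full name of the person"},
        "email": {"type": "string", "description": "Email address"},
        "phone": {"type": "string", "description": "Phone number"},
        "company": {"type": "string", "description": "Company or organization name"},
    },
    "required": ["full_name"],
}


def build_schema_request(project_id: str, name: str, schema: dict) -> str:
    """Return a JSON request body whose json_schema value is a string.

    Hypothetical helper for illustration; it only builds the body and
    does not send anything.
    """
    payload = {
        "project_id": project_id,
        "name": name,
        # First encoding: the schema becomes a JSON string, not a nested object.
        "json_schema": json.dumps(schema, separators=(",", ":")),
    }
    # Second encoding: the whole payload becomes the request body.
    return json.dumps(payload)


body = build_schema_request("proj_xyz789", "Contact Extraction", schema)
```

The resulting body can be sent as the -d argument of the request above, or passed to the HTTP client of your choice.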
An Upload Collector is a collector that accepts file uploads. When you attach an extraction schema to it, documents are automatically extracted upon upload (Direct Extraction mode).
To enable long-term storage of uploaded files, add "bundle_enabled": true to your request. This stores the uploaded data in S3 as a bundle that can be downloaded later. See Bundled Extraction for details.
Use the returned pre-signed URL to upload your file directly to storage. For details on how pre-signed URLs work and why Sterndesk uses direct-to-storage uploads, see Upload URLs.

Upload your file using an HTTP PUT request:
curl -X PUT "PRE_SIGNED_URL" \
  --data-binary @/path/to/your/document.pdf
The file size must exactly match the size_bytes value you declared when creating the upload. A size mismatch will cause the upload to fail with a 403 error.
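Because a mismatch only surfaces as a 403 at upload time, it may be worth comparing the file on disk with the declared size before sending it. A minimal sketch in Python, assuming you still have the size_bytes value you declared; the function name check_declared_size is hypothetical and not part of any Sterndesk client:

```python
import os


def check_declared_size(path: str, size_bytes: int) -> None:
    """Raise ValueError if the file's size differs from the declared size_bytes."""
    actual = os.path.getsize(path)  # size in bytes of the file on disk
    if actual != size_bytes:
        raise ValueError(
            f"{path}: declared size_bytes={size_bytes}, actual size is {actual}"
        )
```

The same os.path.getsize call can be used to compute size_bytes in the first place, which avoids the mismatch unless the file changes between declaring and uploading.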
If you’re uploading multiple files, upload them in the same order as they were specified in the files array: the first pre-signed URL corresponds to the first file specification, the second URL to the second, and so on.
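The ordering rule amounts to pairing files and URLs by position. A small Python sketch of that pairing, assuming you keep the local paths in the same order as the files array and have the returned URLs as a list; the function name and both parameter names are hypothetical:

```python
from typing import List, Tuple


def pair_files_with_urls(paths: List[str], urls: List[str]) -> List[Tuple[str, str]]:
    """Pair each local file with the pre-signed URL at the same position.

    paths must be in the same order as the files array of the request;
    urls must be in the order they were returned.
    """
    if len(paths) != len(urls):
        raise ValueError(
            f"expected one URL per file, got {len(paths)} files and {len(urls)} URLs"
        )
    return list(zip(paths, urls))
```

Each resulting pair can then be uploaded with one PUT request, as in the single-file example.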
Once files are uploaded, Sterndesk automatically processes them if an extraction schema is attached to the collector. Poll the extractions endpoint to check the status and retrieve results.

First, list uploads to get the upload ID: