Document Workflow
HappyPathology processes images and extracts structured medical data from them.
The following file types may be uploaded:
- PDF: application/pdf
- JPEG: image/jpeg
- PNG: image/png
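Because Step 1 of the workflow asks for a content_type per file, a small lookup over the three supported types can reject unsupported files before any request is made. The TypeScript sketch below is illustrative only: `SUPPORTED_TYPES` and `contentTypeFor` are hypothetical names, and inferring the type from the file extension is an assumption, not something the API requires.

```typescript
// Supported upload types from the list above, keyed by lowercase file extension.
const SUPPORTED_TYPES: Record<string, string> = {
  pdf: "application/pdf",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
};

// Hypothetical helper: infer the MIME type from the file name, or throw if unsupported.
function contentTypeFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_TYPES, ext)) {
    throw new Error(`Unsupported file type: ${fileName}`);
  }
  return SUPPORTED_TYPES[ext];
}
```

For instance, `contentTypeFor("PatientCases.pdf")` returns `"application/pdf"`, while a `.txt` file raises an error.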
(PDF only) HappyPathology can detect whether a PDF document contains multiple cases. This feature requires the PDF to include a specific HappyPathology separator page between consecutive cases. Please contact us to get a printable copy of the separator page.
Summary
The workflow has four steps:
- Create a Source: register the files you intend to upload. This creates a source, which represents one or more files to be processed, and returns a list of signed URLs.
- Upload the Files: PUT each file directly to Google Cloud Storage using the source’s signed URLs from Step 1.
- Poll the Source: wait for the source’s contents to be ready for processing. When ready, the source returns a list of case IDs.
- Retrieve Case Data: poll each case ID until extraction is complete, then read the structured results.
Step 1: Create a Source
Make a POST request to /v1/source with the list of files you want to upload.
The request body should look like this:
    {
      "files": [
        {
          "content_type": "application/pdf",
          "file_name": "PatientCases.pdf"
        }
      ]
    }

Example usage using curl:
    RESPONSE=$(curl -s -X POST https://api.happypathology.com/v1/source \
      -H "Authorization: Bearer $YOUR_SIGNED_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"files":[{"content_type":"application/pdf","file_name":"PatientCases.pdf"}]}')

    # Extract the source ID and first upload URL
    SOURCE_ID=$(echo "$RESPONSE" | jq -r '.results.id')
    UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.results.upload_urls[0].url')

    echo "Source ID: $SOURCE_ID"
    echo "Upload URL: $UPLOAD_URL"

The response contains two things you need to hold on to:
- results.id: the source ID, used in Steps 3 and 4
- results.upload_urls: one signed URL per file, used in Step 2
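The same request can be sketched in TypeScript. The names `createSource` and `HttpPost` and the interfaces are hypothetical, the response types are inferred from the example response in this step rather than from a published schema, and the HTTP call is injected so the function does not depend on a particular client.

```typescript
// Shapes inferred from the example request and response in this step (assumptions).
interface SourceFile {
  content_type: string;
  file_name: string;
}
interface UploadUrl {
  url: string;
  object_name: string;
  file_name: string;
}
interface CreateSourceResponse {
  status: number;
  results: { id: string; upload_urls: UploadUrl[] };
}

// Injected transport: POSTs a JSON body and resolves to the parsed JSON response.
type HttpPost = (url: string, token: string, body: unknown) => Promise<unknown>;

// Hypothetical helper: registers the files and returns the two values needed later.
async function createSource(
  post: HttpPost,
  token: string,
  files: SourceFile[]
): Promise<{ sourceId: string; uploadUrls: UploadUrl[] }> {
  const raw = (await post("https://api.happypathology.com/v1/source", token, {
    files,
  })) as CreateSourceResponse;
  // Fail early if the response does not carry the two fields described above.
  if (!raw || !raw.results || !raw.results.id || !Array.isArray(raw.results.upload_urls)) {
    throw new Error("Unexpected response from POST /v1/source");
  }
  return { sourceId: raw.results.id, uploadUrls: raw.results.upload_urls };
}
```

In practice, `post` could wrap `fetch` with the `Authorization: Bearer` and `Content-Type: application/json` headers used in the curl example. An example of the full response body follows.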
    {
      "status": 201,
      "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "upload_urls": [
          {
            "url": "https://storage.googleapis.com/...",
            "object_name": "images/01KJDHXSC5B768KG1Q7BM54K4E/SOURCE_FILES/01KJDHXSC5B768KG1Q7BM54K4E_1",
            "file_name": "PatientCases.pdf"
          }
        ]
      },
      "debug_info": {
        "delta": "102.707189ms",
        "version": "happy_api.549.main.02ec7cd"
      }
    }

Step 2: Upload the files
For each entry in upload_urls, PUT the corresponding file directly to Google Cloud Storage (GCS) using the signed URL. This request goes to GCS, not to the HappyPathology API, so no Authorization header is needed.
Example usage using curl, if you captured $UPLOAD_URL in Step 1:
    curl -X PUT "$UPLOAD_URL" \
      -H "Content-Type: application/pdf" \
      --data-binary @PatientCases.pdf

Example usage using TypeScript:
    // Uploads one file to GCS with a PUT to its signed URL.
    const uploadFileToGCS = async (
      signedUrl: string,
      file: File,
      contentType: string
    ): Promise<void> => {
      const response = await fetch(signedUrl, {
        method: "PUT",
        headers: { "Content-Type": contentType },
        body: file,
      });
      // Surface any non-success status to the caller so it can retry.
      if (!response.ok) {
        throw new Error(`GCS upload failed for ${file.name}: ${response.status}`);
      }
    };

A 200 response means the upload succeeded. If you receive any other status, retry the upload. You can reuse the same signed URL as long as it has not expired.
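The retry advice can be sketched as a bounded loop around a single PUT. This is a minimal illustration, not a documented client: `uploadWithRetry` and `PutOnce` are hypothetical names, the attempt count and delay are arbitrary defaults, and the signed URL lifetime is not stated here, so the retry budget is kept small.

```typescript
// Injected transport: performs one PUT to the signed URL and resolves to the HTTP status.
type PutOnce = (signedUrl: string) => Promise<number>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries the same signed URL until a 200 or until attempts run out.
// maxAttempts and delayMs are illustrative defaults, not documented values.
async function uploadWithRetry(
  putOnce: PutOnce,
  signedUrl: string,
  maxAttempts = 3,
  delayMs = 1000
): Promise<number> {
  let lastStatus = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected promise (for example a network error) propagates to the caller.
    lastStatus = await putOnce(signedUrl);
    if (lastStatus === 200) {
      return attempt; // number of attempts it took
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs * attempt); // wait a little longer after each failure
    }
  }
  throw new Error(
    `Upload failed after ${maxAttempts} attempts (last status ${lastStatus})`
  );
}
```

Here `putOnce` could be a thin wrapper that issues the same `fetch` call as `uploadFileToGCS` and returns `response.status`.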
Step 3: Poll the source for case IDs
After uploading, poll GET /v1/source/{SOURCE_ID} periodically.
The response body will have a field named results.status.
Keep polling until results.status is complete or failed.
The status field moves through these values:
| Status | Meaning |
|---|---|
| pending_upload | Waiting for files to arrive in GCS |
| processing | Files received, extraction in progress |
| complete | All cases extracted, case_ids is populated |
| failed | Processing failed, you need to start over |
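The polling rule (continue while the status is pending_upload or processing, stop on complete or failed) can be sketched in TypeScript as follows. `waitForCaseIds` and `GetSource` are hypothetical names, the `SourceResults` shape is inferred from the example responses in this step, and the interval and poll limit are illustrative defaults rather than documented values.

```typescript
// Status values from the table above.
type SourceStatus = "pending_upload" | "processing" | "complete" | "failed";

// Shape inferred from the example responses in this step (an assumption).
interface SourceResults {
  id: string;
  status: SourceStatus;
  expected_file_count: number;
  uploaded_file_count: number;
  case_ids: string[] | null;
}

// Injected transport: performs one GET /v1/source/{id} and resolves to `results`.
type GetSource = (sourceId: string) => Promise<SourceResults>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls until the status is terminal, then returns the case IDs.
async function waitForCaseIds(
  getSource: GetSource,
  sourceId: string,
  intervalMs = 2000,
  maxPolls = 150
): Promise<string[]> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const results = await getSource(sourceId);
    if (results.status === "complete") {
      return results.case_ids ?? [];
    }
    if (results.status === "failed") {
      throw new Error(`Source ${sourceId} failed; start again from Step 1`);
    }
    await sleep(intervalMs); // still pending_upload or processing
  }
  throw new Error(`Source ${sourceId} not ready after ${maxPolls} polls`);
}
```

Capping the number of polls keeps a stuck source from being polled indefinitely; the cap itself is a design choice of this sketch.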
Example usage using curl:
    curl https://api.happypathology.com/v1/source/$SOURCE_ID \
      -H "Authorization: Bearer $YOUR_SIGNED_TOKEN"

Example responses
pending_upload (no files have been received yet):
    {
      "status": 200,
      "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "pending_upload",
        "expected_file_count": 1,
        "uploaded_file_count": 0,
        "case_ids": null
      }
    }

processing (file received, cases being extracted):
    {
      "status": 200,
      "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "processing",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": null
      }
    }

complete (extraction finished, case_ids is ready):
    {
      "status": 200,
      "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "complete",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": [
          "01KJDHYF3GR99Y7CDGC27K1EGP",
          "01KJDHYF3WD12JXYC26BYQKNY8",
          "01KJDHYF48C7F1QKS4YE3Y2QV0"
        ]
      }
    }

failed (there was an internal error and processing failed). In this case, you need to start again from Step 1:
    {
      "status": 200,
      "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "failed",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": null
      }
    }

Once status is "complete", store the case IDs and move on to Step 4.
Example usage using curl:
    SOURCE_RESPONSE=$(curl -s https://api.happypathology.com/v1/source/$SOURCE_ID \
      -H "Authorization: Bearer $YOUR_SIGNED_TOKEN")

    # Extract all case IDs as a JSON array
    CASE_IDS=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids')

    # Or extract a single case ID by index
    CASE_ID=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids[0]')

Step 4: Retrieve extracted case data
For each case ID, poll GET /v1/patient_case/{CASE_ID}/extract.
While the case is being processed, the API returns HTTP status 204 No Content with an empty body.
Keep polling until you receive HTTP status 200 OK.
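The 204 versus 200 distinction can be sketched as a small loop. As before, this is an illustration under assumptions: `waitForExtract` and `GetExtract` are hypothetical names, the transport is injected, and the interval and poll limit are arbitrary defaults.

```typescript
// Injected transport: performs one GET /v1/patient_case/{id}/extract and resolves to
// the HTTP status plus the parsed body (undefined when the body is empty).
type GetExtract = (caseId: string) => Promise<{ status: number; body?: unknown }>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: 204 means "not ready yet", 200 means the extract is available.
// Any other status (including 429) is surfaced to the caller as an error.
async function waitForExtract(
  getExtract: GetExtract,
  caseId: string,
  intervalMs = 2000,
  maxPolls = 150
): Promise<unknown> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const response = await getExtract(caseId);
    if (response.status === 200) {
      return response.body;
    }
    if (response.status !== 204) {
      throw new Error(`Unexpected status ${response.status} for case ${caseId}`);
    }
    await sleep(intervalMs); // case is still being processed
  }
  throw new Error(`Case ${caseId} not ready after ${maxPolls} polls`);
}
```

Because a 204 response has an empty body, the transport should not attempt to parse JSON unless the status is 200.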
Example usage using curl:
    curl https://api.happypathology.com/v1/patient_case/$CASE_ID/extract \
      -H "Authorization: Bearer $YOUR_SIGNED_TOKEN"

When ready, the response contains the structured medical data under results.medical_data:
    {
      "status": 200,
      "results": {
        "status": "complete",
        "id": "01KJDHYF3GR99Y7CDGC27K1EGP",
        "source_id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "case_name": "Patient 8239534 Ali Moeeny DOB:7/20/1978",
        "medical_data": {
          "01KJE278B2856XBQTF2ED1TSH2": {
            "patient_first_name": "Ali",
            "patient_last_name": "Moeeny",
            "patient_mrn": "123456",
            "patient_dob": "1/2/2026",
            "wbc_count": {
              "value": 10.14,
              "measurement_unit": "K/uL",
              "range": { "min": 4, "max": 11 }
            },
            "hemoglobin": {
              "value": 14.6,
              "measurement_unit": "g/dL",
              "range": { "min": 13.5, "max": 17.5 }
            }
          },
          "01KJE2A53GH8T2B9KW43NR4E4V": {
            "patient_first_name": "Ali",
            "patient_last_name": "Moeeny",
            "patient_mrn": "123456",
            "patient_dob": "3/4/2026",
            "wbc_count": {
              "value": 7.12,
              "measurement_unit": "K/uL",
              "range": { "min": 4, "max": 11 }
            }
          }
        }
      },
      "debug_info": {
        "delta": "1.007220824s",
        "version": "happy_api.549.main.02ec7cd"
      }
    }

Extracted Data (medical_data)
When HappyPathology processes a case’s files, it organizes the pages into distinct documents. For example, a document can be:
- a multipage CBC lab report from Dec 1, 2021.
- a multipage CBC lab report from Feb 2, 2026.
- a packet that contains the patient’s medical history.
- an order form sent to the lab.
Each document is processed and HappyPathology returns the structured data under medical_data.
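One way to consume medical_data is to walk its entries and collect the fields that look like measurements. The sketch below assumes that each key under medical_data identifies one document and that measurements carry value, measurement_unit, and range, as the example response in Step 4 suggests; no formal schema is given here, and `listMeasurements`, `isMeasurement`, and the type names are hypothetical.

```typescript
// Shapes inferred from the example response in Step 4 (assumptions, not a schema).
interface Measurement {
  value: number;
  measurement_unit: string;
  range: { min: number; max: number };
}
type DocumentData = Record<string, unknown>;
type MedicalData = Record<string, DocumentData>;

interface MeasurementRow {
  documentKey: string;
  name: string;
  value: number;
  unit: string;
  inRange: boolean;
}

// Type guard: true when a field looks like a measurement with a reference range.
function isMeasurement(field: unknown): field is Measurement {
  if (typeof field !== "object" || field === null) return false;
  const f = field as Partial<Measurement>;
  return (
    typeof f.value === "number" &&
    typeof f.measurement_unit === "string" &&
    typeof f.range === "object" &&
    f.range !== null &&
    typeof f.range.min === "number" &&
    typeof f.range.max === "number"
  );
}

// Hypothetical helper: flattens every measurement in every document into one list.
function listMeasurements(medicalData: MedicalData): MeasurementRow[] {
  const rows: MeasurementRow[] = [];
  Object.keys(medicalData).forEach((documentKey) => {
    const doc = medicalData[documentKey];
    Object.keys(doc).forEach((name) => {
      const field = doc[name];
      if (isMeasurement(field)) {
        rows.push({
          documentKey,
          name,
          value: field.value,
          unit: field.measurement_unit,
          inRange: field.value >= field.range.min && field.value <= field.range.max,
        });
      }
    });
  });
  return rows;
}
```

Applied to the example response in Step 4, this yields three rows (two wbc_count values and one hemoglobin value), all with inRange set to true. String fields such as patient_first_name are skipped by the type guard.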
Best Practices
All API calls are subject to rate limits, so polling loops should use an appropriate backoff strategy.
The API returns a 429 Too Many Requests status code if you exceed the rate limit.
We recommend limiting the number of API calls to fewer than 10 per second.
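One common backoff strategy is exponential backoff: wait longer after each 429 before retrying. The sketch below is an assumption-laden illustration, not a prescribed client. `backoffDelayMs`, `sendWithBackoff`, and `SendOnce` are hypothetical names, and the base delay, cap, and retry count are arbitrary defaults.

```typescript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Injected transport: performs one request and resolves to its HTTP status and body.
type SendOnce<T> = () => Promise<{ status: number; body: T }>;

// Hypothetical helper: retries on 429 with exponential backoff and returns the first
// non-429 response, or the last 429 once maxRetries is exhausted.
async function sendWithBackoff<T>(
  send: SendOnce<T>,
  maxRetries = 5,
  baseMs = 500
): Promise<{ status: number; body: T }> {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    await sleep(backoffDelayMs(attempt, baseMs));
  }
}
```

With the default base of 500 ms, the delays are 500, 1000, 2000, 4000 ms and so on, which keeps a single retry loop well under the recommended 10 calls per second. Several loops running in parallel would still need a shared limit.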