Document Workflow

HappyPathology processes images and extracts structured medical data from them.

The following file types may be uploaded:

  • PDF: application/pdf
  • JPEG: image/jpeg
  • PNG: image/png
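
As a minimal sketch, a client might map file extensions to these content types before registering files. The helper name `contentTypeFor` and the extension keys (including `.jpg` as an alias for JPEG) are illustrative assumptions, not part of the HappyPathology API.

```typescript
// Supported upload types, taken from the list above.
// The extension keys (including "jpg") are an assumption for illustration.
const SUPPORTED_CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["jpeg", "image/jpeg"],
  ["jpg", "image/jpeg"],
  ["png", "image/png"],
]);

// Hypothetical helper: returns the content type for a file name,
// or throws if the extension is not one of the supported types.
function contentTypeFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const contentType = SUPPORTED_CONTENT_TYPES.get(extension);
  if (contentType === undefined) {
    throw new Error(`Unsupported file type: ${fileName}`);
  }
  return contentType;
}
```

Rejecting unsupported files on the client avoids registering a source that cannot be processed.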

(PDF only) HappyPathology can detect whether a PDF document contains multiple cases. This feature requires the PDF to include a specific HappyPathology separator page between consecutive cases. Please contact us for a printable copy of the separator page.

The workflow has four steps:

  1. Create a Source: register the files you intend to upload. This creates a source, which represents one or more files to be processed, and returns a list of signed URLs.
  2. Upload the Files: PUT each file directly to Google Cloud Storage using the source’s signed URLs from Step 1.
  3. Poll the Source: wait for the source’s contents to be ready for processing. When ready, the response includes a list of case IDs.
  4. Retrieve Case Data: poll each case ID until extraction is complete, then read the structured results.
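
The four steps can be sketched end to end as follows. This is only an outline under stated assumptions: the `WorkflowSteps` interface and its method names are hypothetical placeholders for your own implementation of each step, not part of the HappyPathology API.

```typescript
// Hypothetical interface: one method per workflow step. None of these names
// come from the HappyPathology API; they stand in for your own code.
interface WorkflowSteps {
  // Step 1: register files, get back the source ID and one signed URL per file.
  createSource(fileNames: string[]): Promise<{
    sourceId: string;
    uploads: { fileName: string; url: string }[];
  }>;
  // Step 2: PUT one file to its signed URL.
  uploadFile(url: string, fileName: string): Promise<void>;
  // Step 3: poll the source until it yields case IDs.
  waitForCaseIds(sourceId: string): Promise<string[]>;
  // Step 4: poll one case until its structured results are available.
  waitForCaseData(caseId: string): Promise<unknown>;
}

// Runs the steps in order and returns the results keyed by case ID.
async function runWorkflow(
  steps: WorkflowSteps,
  fileNames: string[]
): Promise<Map<string, unknown>> {
  const { sourceId, uploads } = await steps.createSource(fileNames);
  for (const upload of uploads) {
    await steps.uploadFile(upload.url, upload.fileName);
  }
  const caseIds = await steps.waitForCaseIds(sourceId);
  const resultsByCase = new Map<string, unknown>();
  for (const caseId of caseIds) {
    resultsByCase.set(caseId, await steps.waitForCaseData(caseId));
  }
  return resultsByCase;
}
```

Keeping each step behind its own method makes it easy to substitute fakes in tests and to add retries to one step without touching the others.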

Make a POST request to /v1/source with the list of files you want to upload.

The request body should look like this:

{
"files": [
{
"content_type": "application/pdf",
"file_name": "PatientCases.pdf"
}
]
}

Example usage using curl:

RESPONSE=$(curl -s -X POST https://api.happypathology.com/v1/source \
-H "Authorization: Bearer $YOUR_SIGNED_TOKEN" \
-H "Content-Type: application/json" \
-d '{"files":[{"content_type":"application/pdf","file_name":"PatientCases.pdf"}]}')
# Extract the source ID and first upload URL
SOURCE_ID=$(echo "$RESPONSE" | jq -r '.results.id')
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.results.upload_urls[0].url')
echo "Source ID: $SOURCE_ID"
echo "Upload URL: $UPLOAD_URL"

The response contains two things you need to hold on to:

  • results.id: the source ID, used in Step 3
  • results.upload_urls: one signed URL per file, used in Step 2

{
"status": 201,
"results": {
"id": "01KJDHXSC5B768KG1Q7BM54K4E",
"upload_urls": [
{
"url": "https://storage.googleapis.com/...",
"object_name": "images/01KJDHXSC5B768KG1Q7BM54K4E/SOURCE_FILES/01KJDHXSC5B768KG1Q7BM54K4E_1",
"file_name": "PatientCases.pdf"
}
]
},
"debug_info": {
"delta": "102.707189ms",
"version": "happy_api.549.main.02ec7cd"
}
}
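
The same Step 1 call can be sketched in TypeScript. The types below cover only the fields shown in the example response; the function names are illustrative, and keying the signed URLs by `file_name` assumes file names are unique within a source.

```typescript
// Shape of the Step 1 response, limited to the fields in the example above.
interface UploadUrl {
  url: string;
  object_name: string;
  file_name: string;
}

interface CreateSourceResponse {
  status: number;
  results: { id: string; upload_urls: UploadUrl[] };
}

// Hypothetical helper: extracts the source ID and a file_name -> signed URL map.
// Assumes file names are unique within one source.
function readCreateSourceResponse(body: CreateSourceResponse): {
  sourceId: string;
  urlByFileName: Map<string, string>;
} {
  const urlByFileName = new Map<string, string>();
  for (const entry of body.results.upload_urls) {
    urlByFileName.set(entry.file_name, entry.url);
  }
  return { sourceId: body.results.id, urlByFileName };
}

// Sketch of the POST /v1/source call, mirroring the curl example.
async function createSource(
  token: string,
  files: { content_type: string; file_name: string }[]
): Promise<CreateSourceResponse> {
  const response = await fetch("https://api.happypathology.com/v1/source", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ files }),
  });
  if (!response.ok) {
    throw new Error(`Create source failed: ${response.status}`);
  }
  return (await response.json()) as CreateSourceResponse;
}
```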

For each entry in upload_urls, PUT the corresponding file directly to Google Cloud Storage using the signed URL. This request goes to GCS — not to the HappyPathology API — so no Authorization header is needed.

Example usage using curl, if you captured $UPLOAD_URL in Step 1:

curl -X PUT "$UPLOAD_URL" \
-H "Content-Type: application/pdf" \
--data-binary @PatientCases.pdf

Example usage using TypeScript:

const uploadFileToGCS = async (
  signedUrl: string,
  file: File,
  contentType: string
): Promise<void> => {
  const response = await fetch(signedUrl, {
    method: "PUT",
    headers: { "Content-Type": contentType },
    body: file,
  });
  if (!response.ok) {
    throw new Error(`GCS upload failed for ${file.name}: ${response.status}`);
  }
};

A 200 response means the upload succeeded. If you receive any other status, retry the upload — use the same signed URL as long as it has not expired.
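
The retry advice above could be implemented along these lines. This is a sketch, not a prescribed approach: `uploadWithRetry`, the attempt limit, and the fixed delay are all assumptions, and the caller supplies a `putOnce` callback that performs one PUT and resolves to the HTTP status.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: repeats a single-PUT callback until it reports 200.
// Resolves to the number of attempts used; throws once attempts run out.
// The defaults (3 attempts, 1 second apart) are illustrative only.
async function uploadWithRetry(
  putOnce: () => Promise<number>,
  maxAttempts = 3,
  delayMs = 1000
): Promise<number> {
  let lastStatus = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastStatus = await putOnce();
    if (lastStatus === 200) {
      return attempt;
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs);
    }
  }
  throw new Error(
    `Upload failed after ${maxAttempts} attempts (last status ${lastStatus})`
  );
}

// Example wiring, reusing the same signed URL on every attempt:
//   await uploadWithRetry(() =>
//     fetch(signedUrl, {
//       method: "PUT",
//       headers: { "Content-Type": "application/pdf" },
//       body: file,
//     }).then((response) => response.status)
//   );
```

If the signed URL has expired, retrying will not help; in that case a new source has to be created to obtain fresh URLs.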


After uploading, poll GET /v1/source/{SOURCE_ID} periodically.

The response body will have a field named results.status. Keep polling until results.status is complete or failed.

The status field moves through these values:

Status           Meaning
pending_upload   Waiting for files to arrive in GCS
processing       Files received, extraction in progress
complete         All cases extracted, case_ids is populated
failed           Processing failed, you need to start over

Example usage using curl:

curl https://api.happypathology.com/v1/source/$SOURCE_ID \
-H "Authorization: Bearer $YOUR_SIGNED_TOKEN"

pending_upload — no files have been received yet:

{
"status": 200,
"results": {
"id": "01KJDHXSC5B768KG1Q7BM54K4E",
"status": "pending_upload",
"expected_file_count": 1,
"uploaded_file_count": 0,
"case_ids": null
}
}

processing — file received, cases being extracted:

{
"status": 200,
"results": {
"id": "01KJDHXSC5B768KG1Q7BM54K4E",
"status": "processing",
"expected_file_count": 1,
"uploaded_file_count": 1,
"case_ids": null
}
}

complete — extraction finished, case_ids is ready:

{
"status": 200,
"results": {
"id": "01KJDHXSC5B768KG1Q7BM54K4E",
"status": "complete",
"expected_file_count": 1,
"uploaded_file_count": 1,
"case_ids": [
"01KJDHYF3GR99Y7CDGC27K1EGP",
"01KJDHYF3WD12JXYC26BYQKNY8",
"01KJDHYF48C7F1QKS4YE3Y2QV0"
]
}
}

failed - an internal error occurred and processing failed. In this case, start again from Step 1:

{
"status": 200,
"results": {
"id": "01KJDHXSC5B768KG1Q7BM54K4E",
"status": "failed",
"expected_file_count": 1,
"uploaded_file_count": 1,
"case_ids": null
}
}

Once status is "complete", store the case IDs and move on to Step 4.

Example usage using curl:

SOURCE_RESPONSE=$(curl -s https://api.happypathology.com/v1/source/$SOURCE_ID \
-H "Authorization: Bearer $YOUR_SIGNED_TOKEN")
# Extract all case IDs as a JSON array
CASE_IDS=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids')
# Or extract a single case ID by index
CASE_ID=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids[0]')
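
The polling in this step can be sketched in TypeScript as a loop that stops on `complete` or `failed`. The function name, the poll interval, and the poll limit are assumptions; `getSource` is a caller-supplied callback that performs one GET /v1/source/{SOURCE_ID} request and resolves to the `results` object.

```typescript
type SourceStatus = "pending_upload" | "processing" | "complete" | "failed";

// The results object, as shown in the example responses above.
interface SourceResults {
  id: string;
  status: SourceStatus;
  expected_file_count: number;
  uploaded_file_count: number;
  case_ids: string[] | null;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls until the source is complete and returns its
// case IDs. Throws if the source fails or the poll limit is reached.
// The defaults (2 seconds between polls, 150 polls) are illustrative only.
async function pollSourceForCaseIds(
  getSource: () => Promise<SourceResults>,
  intervalMs = 2000,
  maxPolls = 150
): Promise<string[]> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const results = await getSource();
    if (results.status === "complete") {
      return results.case_ids ?? [];
    }
    if (results.status === "failed") {
      throw new Error(`Source ${results.id} failed; start again from Step 1`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Source not ready after ${maxPolls} polls`);
}
```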

For each case ID, poll GET /v1/patient_case/{CASE_ID}/extract until the HTTP response status is 200.

While the case is being processed, the API returns HTTP status 204 No Content with an empty body. Keep polling until you receive HTTP status 200 OK.

Example usage using curl:

curl https://api.happypathology.com/v1/patient_case/$CASE_ID/extract \
-H "Authorization: Bearer $YOUR_SIGNED_TOKEN"
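
A single poll of this endpoint might look like the following in TypeScript. The names `classifyExtractStatus` and `fetchCaseExtractOnce` are illustrative. Treating every status other than 200 and 204 as an error is a simplifying assumption of this sketch; in particular, a 429 would be surfaced as an error for the caller to handle.

```typescript
type ExtractState = "pending" | "ready" | "error";

// Maps the HTTP status of the extract endpoint to a polling decision:
// 204 means still processing, 200 means the results are in the body.
function classifyExtractStatus(httpStatus: number): ExtractState {
  if (httpStatus === 204) return "pending";
  if (httpStatus === 200) return "ready";
  return "error";
}

type ExtractPoll = { ready: false } | { ready: true; body: unknown };

// Hypothetical helper: performs one GET and reports whether data is ready.
// Call it repeatedly, with a delay between calls, until ready is true.
async function fetchCaseExtractOnce(
  caseId: string,
  token: string
): Promise<ExtractPoll> {
  const response = await fetch(
    `https://api.happypathology.com/v1/patient_case/${caseId}/extract`,
    { headers: { Authorization: `Bearer ${token}` } }
  );
  const state = classifyExtractStatus(response.status);
  if (state === "pending") {
    return { ready: false }; // 204 has an empty body, so do not parse it
  }
  if (state === "error") {
    throw new Error(`Unexpected status ${response.status} for case ${caseId}`);
  }
  return { ready: true, body: await response.json() };
}
```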

When ready, the response contains the structured medical data under results.medical_data:

{
"status": 200,
"results": {
"status": "complete",
"id": "01KJDHYF3GR99Y7CDGC27K1EGP",
"source_id": "01KJDHXSC5B768KG1Q7BM54K4E",
"case_name": "Patient 8239534 Ali Moeeny DOB:7/20/1978",
"medical_data": {
"01KJE278B2856XBQTF2ED1TSH2": {
"patient_first_name": "Ali",
"patient_last_name": "Moeeny",
"patient_mrn": "123456",
"patient_dob": "1/2/2026",
"wbc_count": {
"value": 10.14,
"measurement_unit": "K/uL",
"range": { "min": 4, "max": 11 }
},
"hemoglobin": {
"value": 14.6,
"measurement_unit": "g/dL",
"range": { "min": 13.5, "max": 17.5 }
}
},
"01KJE2A53GH8T2B9KW43NR4E4V": {
"patient_first_name": "Ali",
"patient_last_name": "Moeeny",
"patient_mrn": "123456",
"patient_dob": "3/4/2026",
"wbc_count": {
"value": 7.12,
"measurement_unit": "K/uL",
"range": { "min": 4, "max": 11 }
}
}
}
},
"debug_info": {
"delta": "1.007220824s",
"version": "happy_api.549.main.02ec7cd"
}
}

When HappyPathology processes a case’s files, it organizes the pages into distinct documents. For example, a document can be:

  • a multipage CBC lab report from Dec 1, 2021.
  • a multipage CBC lab report from Feb 2, 2026.
  • a packet that contains patient medical history.
  • an order form sent to the lab.

Each document is processed and HappyPathology returns the structured data under medical_data.
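
As an illustration of consuming medical_data, the sketch below walks every document and lists measurements that fall outside their reported range. It assumes, as the example response suggests, that each key under medical_data identifies one document, that measurement fields have the shape `{ value, measurement_unit, range: { min, max } }`, and that the range bounds are inclusive. The names `isMeasurement` and `findOutOfRange` are hypothetical.

```typescript
interface Measurement {
  value: number;
  measurement_unit: string;
  range: { min: number; max: number };
}

interface RangeFinding {
  documentId: string;
  field: string;
  value: number;
  unit: string;
}

// Type guard for fields shaped like wbc_count or hemoglobin in the example.
function isMeasurement(field: unknown): field is Measurement {
  if (typeof field !== "object" || field === null) return false;
  const candidate = field as {
    value?: unknown;
    measurement_unit?: unknown;
    range?: unknown;
  };
  if (typeof candidate.value !== "number") return false;
  if (typeof candidate.measurement_unit !== "string") return false;
  if (typeof candidate.range !== "object" || candidate.range === null) return false;
  const range = candidate.range as { min?: unknown; max?: unknown };
  return typeof range.min === "number" && typeof range.max === "number";
}

// Collects every measurement whose value lies outside [min, max].
function findOutOfRange(
  medicalData: Record<string, Record<string, unknown>>
): RangeFinding[] {
  const findings: RangeFinding[] = [];
  for (const [documentId, fields] of Object.entries(medicalData)) {
    for (const [field, data] of Object.entries(fields)) {
      if (!isMeasurement(data)) continue; // skip plain string fields
      if (data.value < data.range.min || data.value > data.range.max) {
        findings.push({
          documentId,
          field,
          value: data.value,
          unit: data.measurement_unit,
        });
      }
    }
  }
  return findings;
}
```

Applied to the example response above, this returns an empty list, since every value shown lies within its range.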

All API calls are subject to rate limits, so polling loops should use an appropriate backoff strategy. The API returns a 429 Too Many Requests status code if you exceed the rate limit.

We recommend limiting the number of API calls to fewer than 10 per second.
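
One common way to back off is to double the wait after each 429, up to a cap. The sketch below shows that approach; exponential backoff is a general technique rather than something the API prescribes, and the function names, base delay, cap, and retry limit are all assumptions.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// limited to capMs. The defaults are illustrative only.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: repeats `call` while it reports 429, waiting longer
// each time. Returns the last response once it is not a 429 or once
// maxRetries is reached, so the caller still sees the final status.
async function callWithBackoff<T extends { status: number }>(
  call: () => Promise<T>,
  maxRetries = 5,
  sleepFn: (ms: number) => Promise<void> = sleep
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await call();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    await sleepFn(backoffDelayMs(attempt));
  }
}
```

A single polling loop that waits a second or more between requests stays well under 10 calls per second; the limit matters most when many sources or cases are polled concurrently.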