Document Workflow

HappyPathology processes uploaded documents and images and extracts structured medical data from them.

The following file types may be uploaded:

PDF: application/pdf
JPEG: image/jpeg
PNG: image/png
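
As a small illustration, the content type for a file could be derived from its extension before the file is registered. This is only a sketch: the helper name `contentTypeFor` and the extension mapping (including treating `.jpg` and `.jpeg` alike) are assumptions, not part of the HappyPathology API.

```typescript
// Supported upload types, taken from the list above.
// The extension keys are an assumption made for this sketch.
const CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
]);

// Hypothetical helper: returns the content type for a file name,
// or undefined when the extension is not one of the supported types.
function contentTypeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return CONTENT_TYPES.get(extension);
}
```

Rejecting unsupported files on the client side avoids registering a file that cannot be processed.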

(PDF only) HappyPathology can detect whether a PDF document contains multiple cases. This feature requires the PDF to include a specific HappyPathology separator page between consecutive cases. Please contact us to get a printable copy of the separator page.

Summary

The workflow has four steps:

Create a Source: register the files you intend to upload. This creates a source, which represents one or more files to be processed, and returns a list of signed URLs.
Upload the Files: PUT each file directly to Google Cloud Storage using the signed URLs from Step 1.
Poll the Source: wait for the source’s contents to be ready for processing. When ready, the response includes a list of case IDs.
Retrieve Case Data: poll each case ID until extraction is complete, then read the structured results.

Step 1: Create a Source

Make a POST request to /v1/source with the list of files you want to upload.

The request body should look like this:

{
  "files": [
    {
      "content_type": "application/pdf",
      "file_name": "PatientCases.pdf"
    }
  ]
}

Example using curl:

RESPONSE=$(curl -s -X POST https://api.happypathology.com/v1/source \
  -H "Authorization: Bearer $YOUR_SIGNED_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"files":[{"content_type":"application/pdf","file_name":"PatientCases.pdf"}]}')

# Extract the source ID and first upload URL
SOURCE_ID=$(echo "$RESPONSE" | jq -r '.results.id')
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.results.upload_urls[0].url')

echo "Source ID: $SOURCE_ID"
echo "Upload URL: $UPLOAD_URL"

The response contains two things you need to hold on to:

results.id: the source ID, used in Step 3
results.upload_urls: one signed URL per file, used in Step 2

{
    "status": 201,
    "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "upload_urls": [
            {
                "url": "https://storage.googleapis.com/...",
                "object_name": "images/01KJDHXSC5B768KG1Q7BM54K4E/SOURCE_FILES/01KJDHXSC5B768KG1Q7BM54K4E_1",
                "file_name": "PatientCases.pdf"
            }
        ]
    },
    "debug_info": {
        "delta": "102.707189ms",
        "version": "happy_api.549.main.02ec7cd"
    }
}
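
For callers not using curl, the same request could be sketched in TypeScript. The endpoint, headers, and response fields come from the example above; the function name `createSource`, the `HttpFetch` type, and the error handling are assumptions. A fetch-compatible function is passed in so the sketch does not depend on a particular runtime.

```typescript
// Minimal fetch-like types (an assumption, so the sketch is runtime-agnostic).
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<HttpResponse>;

// Shapes taken from the request and response examples above.
interface FileSpec {
  content_type: string;
  file_name: string;
}
interface UploadUrl {
  url: string;
  object_name: string;
  file_name: string;
}
interface CreateSourceResult {
  id: string;
  upload_urls: UploadUrl[];
}

// Hypothetical helper: registers the files and returns the source ID
// together with one signed upload URL per file.
async function createSource(
  httpFetch: HttpFetch,
  token: string,
  files: FileSpec[]
): Promise<CreateSourceResult> {
  const response = await httpFetch("https://api.happypathology.com/v1/source", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ files }),
  });
  if (!response.ok) {
    throw new Error(`Create source failed: ${response.status}`);
  }
  const body = (await response.json()) as { results: CreateSourceResult };
  return body.results;
}
```

In a browser or in Node 18 and later, the global `fetch` can be passed as `httpFetch`.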

Step 2: Upload the files

For each entry in upload_urls, PUT the corresponding file directly to Google Cloud Storage using the signed URL. This request goes to GCS — not to the HappyPathology API — so no Authorization header is needed.

Example using curl, assuming you captured $UPLOAD_URL in Step 1:

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @PatientCases.pdf

Example using TypeScript:

const uploadFileToGCS = async (
  signedUrl: string,
  file: File,
  contentType: string
): Promise<void> => {
  const response = await fetch(signedUrl, {
    method: "PUT",
    headers: { "Content-Type": contentType },
    body: file,
  });
  if (!response.ok) {
    throw new Error(`GCS upload failed for ${file.name}: ${response.status}`);
  }
};

A 200 response means the upload succeeded. If you receive any other status, retry the upload. You can reuse the same signed URL as long as it has not expired.
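
The retry advice above could be sketched as a small wrapper. The helper name `retryUpload`, the attempt count, and the fixed delay are assumptions. The `upload` callback stands for any function that performs the PUT and rejects on a non-200 status, such as the `uploadFileToGCS` example above.

```typescript
// Hypothetical helper: retries an upload a few times, reusing the same signed URL.
// `upload` should reject when the PUT does not return 200.
async function retryUpload(
  upload: () => Promise<void>,
  maxAttempts = 3, // assumed default, not an API requirement
  delayMs = 1000 // assumed fixed pause between attempts
): Promise<number> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await upload();
      return attempt; // number of attempts the upload took
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

A call might look like `await retryUpload(() => uploadFileToGCS(signedUrl, file, "application/pdf"))`.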

Step 3: Poll the source for case IDs

After uploading, poll GET /v1/source/{SOURCE_ID} periodically.

The response body will have a field named results.status. Keep polling until results.status is complete or failed.

The status field moves through these values:

Status	Meaning
`pending_upload`	Waiting for files to arrive in GCS
`processing`	Files received, extraction in progress
`complete`	All cases extracted, `case_ids` is populated
`failed`	Processing failed; start over from Step 1

Example using curl:

curl https://api.happypathology.com/v1/source/$SOURCE_ID \
  -H "Authorization: Bearer $YOUR_SIGNED_TOKEN"

Example responses

pending_upload — no files have been received yet:

{
    "status": 200,
    "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "pending_upload",
        "expected_file_count": 1,
        "uploaded_file_count": 0,
        "case_ids": null
    }
}

processing — file received, cases being extracted:

{
    "status": 200,
    "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "processing",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": null
    }
}

complete — extraction finished, case_ids is ready:

{
    "status": 200,
    "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "complete",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": [
            "01KJDHYF3GR99Y7CDGC27K1EGP",
            "01KJDHYF3WD12JXYC26BYQKNY8",
            "01KJDHYF48C7F1QKS4YE3Y2QV0"
        ]
    }
}

failed - an internal error occurred during processing. In this case, start again from Step 1:

{
    "status": 200,
    "results": {
        "id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "status": "failed",
        "expected_file_count": 1,
        "uploaded_file_count": 1,
        "case_ids": null
    }
}

Once status is "complete", store the case IDs and move on to Step 4.

Example using curl:

SOURCE_RESPONSE=$(curl -s https://api.happypathology.com/v1/source/$SOURCE_ID \
  -H "Authorization: Bearer $YOUR_SIGNED_TOKEN")

# Extract all case IDs as a JSON array
CASE_IDS=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids')

# Or extract a single case ID by index
CASE_ID=$(echo "$SOURCE_RESPONSE" | jq -r '.results.case_ids[0]')
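
The polling loop described in this step could be sketched in TypeScript as follows. The endpoint and status values come from the text above; the function name `pollSource`, the `HttpFetch` type, the 2-second interval, and the poll limit are assumptions chosen for illustration.

```typescript
// Minimal fetch-like types (an assumption, so the sketch is runtime-agnostic).
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<HttpResponse>;

// Shape taken from the example responses above.
interface SourceResult {
  id: string;
  status: "pending_upload" | "processing" | "complete" | "failed";
  expected_file_count: number;
  uploaded_file_count: number;
  case_ids: string[] | null;
}

// Hypothetical helper: polls the source until it is complete or failed.
// Resolves with the case IDs; rejects on failure or after maxPolls polls.
async function pollSource(
  httpFetch: HttpFetch,
  token: string,
  sourceId: string,
  intervalMs = 2000, // assumed interval
  maxPolls = 150 // assumed upper bound
): Promise<string[]> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const response = await httpFetch(
      `https://api.happypathology.com/v1/source/${sourceId}`,
      { method: "GET", headers: { Authorization: `Bearer ${token}` } }
    );
    if (!response.ok) {
      throw new Error(`Polling source failed: ${response.status}`);
    }
    const { results } = (await response.json()) as { results: SourceResult };
    if (results.status === "complete") return results.case_ids ?? [];
    if (results.status === "failed") {
      throw new Error(`Source ${sourceId} failed; start again from Step 1`);
    }
    // pending_upload or processing: wait, then poll again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Source ${sourceId} was not ready after ${maxPolls} polls`);
}
```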

Step 4: Retrieve extracted case data

For each case ID, poll GET /v1/patient_case/{CASE_ID}/extract until the HTTP response status is 200 OK.

While the case is being processed, the API returns HTTP status 204 No Content with an empty body.

Example using curl:

curl https://api.happypathology.com/v1/patient_case/$CASE_ID/extract \
  -H "Authorization: Bearer $YOUR_SIGNED_TOKEN"

When ready, the response contains the structured medical data under results.medical_data:

{
    "status": 200,
    "results": {
        "status": "complete",
        "id": "01KJDHYF3GR99Y7CDGC27K1EGP",
        "source_id": "01KJDHXSC5B768KG1Q7BM54K4E",
        "case_name": "Patient 8239534 Ali Moeeny DOB:7/20/1978",
        "medical_data": {
            "01KJE278B2856XBQTF2ED1TSH2": {
                "patient_first_name": "Ali",
                "patient_last_name": "Moeeny",
                "patient_mrn": "123456",
                "patient_dob": "1/2/2026",
                "wbc_count": {
                    "value": 10.14,
                    "measurement_unit": "K/uL",
                    "range": { "min": 4, "max": 11 }
                },
                "hemoglobin": {
                    "value": 14.6,
                    "measurement_unit": "g/dL",
                    "range": { "min": 13.5, "max": 17.5 }
                }
            },
            "01KJE2A53GH8T2B9KW43NR4E4V":  {
                "patient_first_name": "Ali",
                "patient_last_name": "Moeeny",
                "patient_mrn": "123456",
                "patient_dob": "3/4/2026",
                "wbc_count": {
                    "value": 7.12,
                    "measurement_unit": "K/uL",
                    "range": { "min": 4, "max": 11 }
                }
            }
        }
    },
    "debug_info": {
        "delta": "1.007220824s",
        "version": "happy_api.549.main.02ec7cd"
    }
}
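
The 204-then-200 polling described in this step could be sketched in TypeScript. The endpoint, status codes, and response fields come from the text and example above; the function name `pollCaseExtract`, the `HttpFetch` type, the interval, and the poll limit are assumptions.

```typescript
// Minimal fetch-like types (an assumption, so the sketch is runtime-agnostic).
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<HttpResponse>;

// Shape taken from the example response above; medical_data fields vary by document.
interface CaseExtract {
  status: string;
  id: string;
  source_id: string;
  case_name: string;
  medical_data: Record<string, Record<string, unknown>>;
}

// Hypothetical helper: polls one case until the API answers 200 with data.
async function pollCaseExtract(
  httpFetch: HttpFetch,
  token: string,
  caseId: string,
  intervalMs = 2000, // assumed interval
  maxPolls = 150 // assumed upper bound
): Promise<CaseExtract> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const response = await httpFetch(
      `https://api.happypathology.com/v1/patient_case/${caseId}/extract`,
      { method: "GET", headers: { Authorization: `Bearer ${token}` } }
    );
    if (response.status === 200) {
      const body = (await response.json()) as { results: CaseExtract };
      return body.results;
    }
    if (response.status !== 204) {
      throw new Error(`Polling case ${caseId} failed: ${response.status}`);
    }
    // 204 No Content: the case is still being processed, so wait and retry.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Case ${caseId} was not ready after ${maxPolls} polls`);
}
```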

Extracted Data (medical_data)

When HappyPathology processes a case’s files, it organizes the pages into distinct documents. For example, a document can be:

a multipage CBC lab report from Dec 1, 2021.
a multipage CBC lab report from Feb 2, 2026.
a packet that contains patient medical history.
an order form sent to the lab.

HappyPathology processes each document and returns its structured data under medical_data.
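
As an illustration of consuming medical_data, the sketch below walks every document and lists measurements that fall outside their reported range. The field shapes are inferred from the example response in Step 4 and may not cover everything the API returns. The names `Measurement`, `Finding`, and `findOutOfRange` are hypothetical, and treating the range bounds as inclusive is an assumption.

```typescript
// Measurement shape inferred from the Step 4 example (an assumption).
interface Measurement {
  value: number;
  measurement_unit: string;
  range: { min: number; max: number };
}
interface Finding {
  documentId: string;
  field: string;
  value: number;
  unit: string;
}

// Type guard: true only for fields that look like the measurements in the example.
function isMeasurement(field: unknown): field is Measurement {
  if (typeof field !== "object" || field === null) return false;
  const candidate = field as {
    value?: unknown;
    measurement_unit?: unknown;
    range?: { min?: unknown; max?: unknown } | null;
  };
  return (
    typeof candidate.value === "number" &&
    typeof candidate.measurement_unit === "string" &&
    typeof candidate.range === "object" &&
    candidate.range !== null &&
    typeof candidate.range.min === "number" &&
    typeof candidate.range.max === "number"
  );
}

// Hypothetical helper: lists measurements whose value lies outside [min, max].
function findOutOfRange(
  medicalData: Record<string, Record<string, unknown>>
): Finding[] {
  const findings: Finding[] = [];
  for (const documentId of Object.keys(medicalData)) {
    const doc = medicalData[documentId];
    for (const field of Object.keys(doc)) {
      const entry = doc[field];
      if (
        isMeasurement(entry) &&
        (entry.value < entry.range.min || entry.value > entry.range.max)
      ) {
        findings.push({ documentId, field, value: entry.value, unit: entry.measurement_unit });
      }
    }
  }
  return findings;
}
```

Plain string fields such as patient_first_name are skipped because they do not match the measurement shape.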

Best Practices

All API calls are subject to rate limits. When polling, use an appropriate backoff strategy. The API returns a 429 Too Many Requests status code if you exceed the rate limit.

We recommend limiting the number of API calls to fewer than 10 per second.
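
One possible backoff strategy is exponential backoff on 429 responses, sketched below. The helper names `backoffDelayMs` and `callWithBackoff`, the 500 ms base delay, the 30-second cap, and the retry limit are all assumptions; the API does not prescribe specific values beyond the recommendation above.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical helper: repeats `call` while it returns 429, waiting longer each time.
// After maxRetries retries, the last response is returned as-is.
async function callWithBackoff<T extends { status: number }>(
  call: () => Promise<T>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await call();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(backoffDelayMs(attempt));
  }
}
```

The `sleep` parameter is injectable so the wait can be replaced in tests.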