Azure Functions

The three Azure Functions that form the core processing backbone of the Digital Archive pipeline.

Overview

The pipeline's processing logic is implemented as three independently deployed Azure Functions (written in .NET). Each function is queue-triggered, stateless, and scales automatically based on the volume of messages in its queue.
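The chain of queues and handlers described above can be sketched as a small in-memory model. This is an illustration in Python, not the .NET implementation: the queue names come from this page, while the handler names, the message shape, and the `drain` helper are hypothetical.

```python
from collections import deque

# Queue names are taken from this page; the in-memory deques are stand-ins
# for Azure Queue Storage queues.
QUEUES = {
    "document-split-classification-queue": deque(),
    "document-extraction-queue": deque(),
    "document-cleansing-queue": deque(),
}


def split_and_classify(message):
    # Stand-in for the File Splitter & Classifier: hands work to extraction.
    QUEUES["document-extraction-queue"].append({**message, "stage": "classified"})


def extract(message):
    # Stand-in for the File Data Extraction Service: hands work to cleansing.
    QUEUES["document-cleansing-queue"].append({**message, "stage": "extracted"})


def cleanse(message):
    # Stand-in for the Data Cleansing Service: last stage, returns the result.
    return {**message, "stage": "cleansed"}


# Each handler is stateless: it depends only on the message it receives.
HANDLERS = {
    "document-split-classification-queue": split_and_classify,
    "document-extraction-queue": extract,
    "document-cleansing-queue": cleanse,
}


def drain(queue_name):
    """Process every message currently waiting on one queue."""
    results = []
    queue = QUEUES[queue_name]
    while queue:
        results.append(HANDLERS[queue_name](queue.popleft()))
    return results


def run_pipeline(blob_name):
    """Push one document through all three stages in order."""
    QUEUES["document-split-classification-queue"].append({"blob": blob_name})
    drain("document-split-classification-queue")
    drain("document-extraction-queue")
    return drain("document-cleansing-queue")
```

Calling `run_pipeline("scan-001.pdf")` moves a single message through the three queues in order. In the deployed pipeline each stage is triggered by its own queue rather than by an explicit `drain` call.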

Functions

File Splitter & Classifier

Trigger: document-split-classification-queue

Reads unprocessed documents from Blob Storage, splits multi-page PDFs into individual units, and runs each unit through the AI classification model to determine its document type. Writes each classified unit to the container for its document type, then pushes a message to document-extraction-queue.
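A minimal sketch of the split, classify, and route steps, written in Python for illustration only. It assumes one unit per page, uses a toy keyword classifier in place of the AI classification model, and invents the document types, the container naming scheme, and the message fields; none of these are confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    source: str  # name of the original blob
    index: int   # position of this unit within the source document
    text: str    # page content (a real implementation would hold PDF bytes)


def split_document(source, pages):
    """Split a multi-page document into one unit per page (assumed granularity)."""
    return [Unit(source, i, text) for i, text in enumerate(pages, start=1)]


def classify(unit):
    """Toy keyword classifier standing in for the AI classification model."""
    lowered = unit.text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "passport" in lowered:
        return "passport"
    return "unknown"


def route(units):
    """Return (container, blob path, queue message) for each classified unit."""
    routed = []
    for unit in units:
        doc_type = classify(unit)
        container = f"{doc_type}-documents"  # hypothetical container naming
        blob_path = f"{unit.source}/unit-{unit.index}.pdf"
        # Hypothetical message for document-extraction-queue.
        message = {"container": container, "blob": blob_path, "documentType": doc_type}
        routed.append((container, blob_path, message))
    return routed
```

For example, `route(split_document("batch-7.pdf", ["Invoice #12", "Passport of J. Doe"]))` yields one entry per page, each tagged with the container it would be written to and the message that would be queued for extraction.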

File Data Extraction Service

Trigger: document-extraction-queue

Fetches classified documents from their type-specific Blob containers and sends them to the corresponding Azure Document Intelligence model. Persists the extracted structured data to Azure SQL and pushes a message to document-cleansing-queue.
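The extraction step can be sketched as follows, again in Python and under stated assumptions: the model IDs, the table layout, and the queue message shape are hypothetical, `analyze` is a placeholder for the Azure Document Intelligence call, and an in-memory SQLite database stands in for Azure SQL.

```python
import json
import sqlite3

# Hypothetical mapping from document type to a Document Intelligence model ID.
MODEL_BY_TYPE = {"invoice": "invoice-model", "passport": "passport-model"}


def make_database():
    """Create an in-memory SQLite database as a stand-in for Azure SQL."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE extracted_documents ("
        "id INTEGER PRIMARY KEY, blob TEXT, document_type TEXT, fields TEXT)"
    )
    return db


def analyze(model_id, blob_path):
    """Placeholder for the Document Intelligence call; returns canned fields."""
    return {"model": model_id, "source": blob_path, "country": "DE"}


def handle_extraction(message, db, cleansing_queue):
    """Pick the model for the document type, persist the result, notify cleansing."""
    model_id = MODEL_BY_TYPE[message["documentType"]]
    fields = analyze(model_id, message["blob"])
    cursor = db.execute(
        "INSERT INTO extracted_documents (blob, document_type, fields) VALUES (?, ?, ?)",
        (message["blob"], message["documentType"], json.dumps(fields)),
    )
    db.commit()
    # Assumption: the next stage receives only a reference to the stored row.
    cleansing_queue.append({"documentId": cursor.lastrowid})
    return cursor.lastrowid
```

Passing a row reference rather than the extracted fields is a design choice made for this sketch: it keeps the queue message small and leaves the database as the single source of the extracted data.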

Data Cleansing Service

Trigger: document-cleansing-queue

Applies business rules to the extracted data stored in Azure SQL. Executes stored procedures to normalize and clean each field (e.g. country code expansion, special character removal), then writes the cleaned values back to the database.
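The two example rules named above, country code expansion and special character removal, might look like this when expressed in Python. This is only an illustration of the rules: in the pipeline they run as stored procedures in Azure SQL, and the country table, the set of characters kept, and the field names here are assumptions.

```python
import re

# Illustrative subset of country codes; the real mapping is not given on this page.
COUNTRY_NAMES = {"DE": "Germany", "FR": "France", "US": "United States"}


def expand_country_code(value):
    """Replace a two-letter code with the full country name when it is known."""
    code = value.strip().upper()
    return COUNTRY_NAMES.get(code, value.strip())


def remove_special_characters(value):
    """Keep word characters, whitespace, and basic punctuation; collapse spaces."""
    kept = re.sub(r"[^\w\s.,'-]", "", value)
    return re.sub(r"\s+", " ", kept).strip()


def cleanse_record(record):
    """Apply character cleanup to every field, then expand the country field."""
    cleaned = {key: remove_special_characters(val) for key, val in record.items()}
    if "country" in cleaned:
        cleaned["country"] = expand_country_code(cleaned["country"])
    return cleaned
```

For instance, `cleanse_record({"name": "  ACME #Corp*  ", "country": "de"})` strips the stray symbols and surrounding whitespace from the name and expands the country code to its full name. Unknown codes are left unchanged.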

Why Azure Functions

Each function runs independently from the others. This means:

A spike in classification volume does not affect extraction throughput
Each function can be deployed, scaled, and monitored separately
Failures in one stage are isolated and do not cascade to the other stages
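The isolation described in the list above can be illustrated with a simplified Python model. It assumes the default Azure Functions behaviour for queue triggers, where a message that keeps failing is moved to a poison queue after five attempts; the handlers, message shape, and immediate retry loop are simplifications made for this sketch.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 5  # mirrors the default maxDequeueCount for queue triggers


def process_queue(queue, handler, poison):
    """Process each message; park it on the poison queue after repeated failures."""
    processed = []
    while queue:
        message = queue.popleft()
        for attempt in range(1, MAX_DEQUEUE_COUNT + 1):
            try:
                processed.append(handler(message))
                break
            except Exception:
                # Simplification: retry immediately instead of waiting for the
                # message to become visible on the queue again.
                if attempt == MAX_DEQUEUE_COUNT:
                    poison.append(message)
    return processed


def classify(message):
    # Hypothetical classification stage that always succeeds.
    return {**message, "documentType": "invoice"}


def extract(message):
    # Hypothetical extraction stage that fails on one unreadable document.
    if message["blob"] == "corrupt.pdf":
        raise ValueError("unreadable document")
    return {**message, "extracted": True}


classification_queue = deque([{"blob": "a.pdf"}, {"blob": "corrupt.pdf"}])
extraction_poison = deque()

# Classification handles both documents regardless of what happens downstream.
classified = process_queue(classification_queue, classify, deque())
# Extraction parks the failing message and still completes the healthy one.
extracted = process_queue(deque(classified), extract, extraction_poison)
```

In this model the extraction failure affects only the one message that caused it: classification has already finished its work, and the healthy document continues through extraction.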

See also

Azure Queue Storage: the queues that trigger each function
Document Classification
Data Extraction
Data Cleansing