Azure Functions
The three Azure Functions that form the core processing backbone of the Digital Archive pipeline.
Overview
The pipeline's processing logic is implemented as three independently deployed Azure Functions (written in .NET). Each function is queue-triggered, stateless, and scales automatically based on the volume of messages in its queue.
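The sketch below shows how the three stages chain through their queues. It is written in Python for illustration and is not the .NET implementation. In-memory deques stand in for Azure Queue Storage, and a plain list stands in for Azure SQL. Each handler sees only its input message and returns the message for the next queue, so it keeps no state between calls; that is what lets the platform run more instances as a queue grows. The handler bodies and message fields are hypothetical placeholders.

```python
from collections import deque
from typing import Callable, Optional

# In-memory deques stand in for Azure Queue Storage; queue names match the pipeline.
QUEUES: dict[str, deque] = {
    "document-split-classification-queue": deque(),
    "document-extraction-queue": deque(),
    "document-cleansing-queue": deque(),
}
DATABASE: list[dict] = []  # stand-in for the Azure SQL tables written by the last stage

# A handler sees only its input message and returns (next_queue, next_message),
# or None when nothing is pushed onward. No state survives between calls.
Handler = Callable[[dict], Optional[tuple[str, dict]]]

def split_and_classify(msg: dict) -> Optional[tuple[str, dict]]:
    # Hypothetical result: a real classifier would inspect the document.
    return "document-extraction-queue", {**msg, "doc_type": "invoice"}

def extract(msg: dict) -> Optional[tuple[str, dict]]:
    # Hypothetical: real extraction would call Document Intelligence and write to SQL.
    return "document-cleansing-queue", {**msg, "extracted": True}

def cleanse(msg: dict) -> Optional[tuple[str, dict]]:
    # Last stage: writes its result to storage and pushes nothing onward.
    DATABASE.append({**msg, "cleansed": True})
    return None

HANDLERS: dict[str, Handler] = {
    "document-split-classification-queue": split_and_classify,
    "document-extraction-queue": extract,
    "document-cleansing-queue": cleanse,
}

def drain() -> None:
    # Keep dispatching until every queue is empty, mimicking the queue triggers.
    while any(QUEUES.values()):
        for name, queue in QUEUES.items():
            while queue:
                result = HANDLERS[name](queue.popleft())
                if result is not None:
                    next_queue, next_msg = result
                    QUEUES[next_queue].append(next_msg)

if __name__ == "__main__":
    QUEUES["document-split-classification-queue"].append({"blob": "scans/batch-001.pdf"})
    drain()
    print(DATABASE)
```

Each stage is coupled to the others only through a queue name, so in this sketch any handler can be swapped out without touching the rest.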
Functions
File Splitter & Classifier
Trigger: document-split-classification-queue
Reads unprocessed documents from Blob Storage, splits multi-page PDFs into individual units, and runs each unit through the AI classification model to determine its document type. Writes each classified unit to the appropriate type-specific Blob container and pushes a message to document-extraction-queue.
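The split, classify and route steps can be sketched as follows. This is Python rather than the .NET code, and several details are assumptions. The page text stands in for the PDF itself. The sketch produces one unit per page, although the real splitter may group pages differently. A keyword rule stands in for the AI classification model. The `<type>-documents` container names, the `-p<n>` blob suffix and the JSON message fields are all hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass
class Unit:
    source_blob: str
    page_number: int
    text: str

def split_pdf(source_blob: str, pages: list[str]) -> list[Unit]:
    # Assumption: one unit per page. The real splitter may group pages into multi-page units.
    return [Unit(source_blob, i, text) for i, text in enumerate(pages, start=1)]

def classify(unit: Unit) -> str:
    # Placeholder for the AI classification model: a keyword rule, purely illustrative.
    text = unit.text.lower()
    if "invoice" in text:
        return "invoice"
    if "passport" in text:
        return "passport"
    return "unclassified"

def handle_split_classification(source_blob: str, pages: list[str]) -> list[str]:
    messages = []
    for unit in split_pdf(source_blob, pages):
        doc_type = classify(unit)
        # Hypothetical naming scheme for type-specific containers and split blobs.
        container = f"{doc_type}-documents"
        blob_name = f"{source_blob.rsplit('.', 1)[0]}-p{unit.page_number}.pdf"
        # Uploading the unit to container/blob_name would happen here.
        messages.append(json.dumps({"container": container, "blob": blob_name, "doc_type": doc_type}))
    return messages  # each string is pushed to document-extraction-queue
```

The outgoing message carries both the container and the document type. The next stage can then locate the blob and choose an extraction model without repeating the classification.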
File Data Extraction Service
Trigger: document-extraction-queue
Fetches classified documents from their type-specific Blob containers and sends them to the corresponding Azure Document Intelligence model. Persists the extracted structured data to Azure SQL and pushes a message to document-cleansing-queue.
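A Python sketch of the extraction step follows. It is not the .NET code. `sqlite3` stands in for Azure SQL, and `analyze` is a stub in place of a real Document Intelligence call. The type-to-model mapping is hypothetical. `prebuilt-invoice` and `prebuilt-idDocument` are Document Intelligence prebuilt model IDs chosen for illustration, and the pipeline may use custom models instead. The `extracted_fields` table layout is also an assumption.

```python
import sqlite3

# Hypothetical mapping from document type to Document Intelligence model ID.
MODEL_FOR_TYPE = {
    "invoice": "prebuilt-invoice",
    "passport": "prebuilt-idDocument",
}

# Hypothetical table layout: one row per extracted field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS extracted_fields (
    document_id TEXT NOT NULL,
    doc_type    TEXT NOT NULL,
    model_id    TEXT NOT NULL,
    field_name  TEXT NOT NULL,
    field_value TEXT,
    PRIMARY KEY (document_id, field_name)
)
"""

def analyze(model_id: str, blob_name: str) -> dict[str, str]:
    # Stub for the Document Intelligence analyze call; returns canned fields.
    # A real call would send the blob to the service using model_id.
    return {"VendorName": "Acme GmbH", "CountryRegion": "DE"}

def handle_extraction(conn: sqlite3.Connection, message: dict) -> dict:
    doc_type = message["doc_type"]
    model_id = MODEL_FOR_TYPE.get(doc_type)
    if model_id is None:
        raise ValueError(f"no extraction model configured for {doc_type!r}")
    fields = analyze(model_id, message["blob"])
    rows = [(message["blob"], doc_type, model_id, name, value) for name, value in fields.items()]
    with conn:  # commits on success, rolls back if the insert fails
        conn.executemany("INSERT OR REPLACE INTO extracted_fields VALUES (?, ?, ?, ?, ?)", rows)
    return {"document_id": message["blob"]}  # pushed to document-cleansing-queue
```

Queue messages can be delivered more than once, so the sketch keys rows on `(document_id, field_name)` and uses `INSERT OR REPLACE`. Reprocessing the same message then overwrites rows instead of duplicating them.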
Data Cleansing Service
Trigger: document-cleansing-queue
Applies business rules to the extracted data stored in Azure SQL. Executes stored procedures to normalize and clean each field (e.g. country code expansion, special character removal), then writes the cleaned values back to the database.
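The two example rules can be expressed in Python to show their intent, though the pipeline itself runs them as stored procedures in Azure SQL. The details here are assumptions. The country lookup is a small hypothetical subset. "Special characters" is taken to mean anything other than letters, digits, whitespace and `. , ' -`. The binding of fields to rules is also invented for illustration.

```python
import re

# Hypothetical subset of a code-to-name lookup; the real reference data would live in Azure SQL.
COUNTRY_NAMES = {"DE": "Germany", "FR": "France", "GB": "United Kingdom"}

# Assumption: keep letters, digits, whitespace and a few punctuation marks common in names.
SPECIAL_CHARS = re.compile(r"[^\w\s.,'-]")

def expand_country(value: str) -> str:
    # Unknown codes are passed through unchanged rather than rejected.
    return COUNTRY_NAMES.get(value.strip().upper(), value.strip())

def remove_special_chars(value: str) -> str:
    cleaned = SPECIAL_CHARS.sub("", value)
    return " ".join(cleaned.split())  # also collapse runs of whitespace

# Hypothetical binding of extracted fields to the rules applied to them, in order.
RULES = {
    "CountryRegion": [expand_country],
    "Name": [remove_special_chars],
}

def cleanse(fields: dict[str, str]) -> dict[str, str]:
    cleaned = {}
    for name, value in fields.items():
        for rule in RULES.get(name, []):
            value = rule(value)
        cleaned[name] = value  # fields without rules are kept as-is
    return cleaned
```

Each rule is a small per-field function, so a rule can be added or reordered without changing the others. This loosely mirrors how individual stored procedures would handle individual fields.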
Why Azure Functions
Each function runs independently of the others. This means:
- A spike in classification volume does not affect extraction throughput
- Each function can be deployed, scaled, and monitored separately
- Failures in one stage are isolated and do not cascade upstream
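Failure isolation can be sketched in Python as follows. By default, an Azure Functions queue trigger retries a failing message up to `maxDequeueCount` times (5). It then moves the message to a queue named `<queue-name>-poison`, so one bad document does not block the other messages or the other stages. The sketch re-appends the message to mimic it becoming visible again, which simplifies the real visibility-timeout behaviour. The message shape is hypothetical.

```python
from collections import deque
from typing import Callable

MAX_DEQUEUE_COUNT = 5  # default maxDequeueCount for Azure Functions queue triggers

def process_queue(name: str, queues: dict[str, deque], handler: Callable[[str], None]) -> None:
    queue = queues[name]
    while queue:
        msg = queue.popleft()
        msg["dequeue_count"] = msg.get("dequeue_count", 0) + 1
        try:
            handler(msg["body"])
        except Exception:
            if msg["dequeue_count"] >= MAX_DEQUEUE_COUNT:
                # Park the message; other messages and other stages keep flowing.
                queues.setdefault(f"{name}-poison", deque()).append(msg)
            else:
                queue.append(msg)  # stand-in for the message becoming visible again
```

If the extraction handler keeps failing on one document, that document ends up in `document-extraction-queue-poison` after five attempts. Healthy documents in the same queue are still processed, and `document-split-classification-queue` is never touched.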
Related
- Azure Queue Storage — the queues that trigger each function
- Document Classification
- Data Extraction
- Data Cleansing