Azure Document Intelligence
How Azure Document Intelligence powers document classification and field extraction across the Digital Archive pipeline.
Overview
Azure Document Intelligence is the AI backbone of the pipeline. It is responsible for two distinct tasks: classifying documents by type, and extracting structured data from classified documents. Each task uses a separately trained custom model.
Models
Classification Model
Used by the File Splitter & Classifier function. Given a document image, the model identifies the document type:
- Master Card
- Index Card (coming soon)
- Red Cross Card
- Unknown / Unclassified
The classification result determines which Blob container the document is routed to.
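The routing step can be sketched as a simple lookup from the predicted document type to a container name. This is a minimal illustration, not the pipeline's actual code: the label strings, the container names, the `route_container` helper, and the `min_confidence` threshold are all hypothetical, since this page does not specify them.

```python
from typing import Optional

# Hypothetical container names; the real names are not given on this page.
# Index Card is omitted because its classification is not yet available.
CONTAINER_BY_DOC_TYPE = {
    "master-card": "master-cards",
    "red-cross-card": "red-cross-cards",
}

# Hypothetical fallback container for Unknown / Unclassified documents.
UNCLASSIFIED_CONTAINER = "unclassified"


def route_container(
    doc_type: Optional[str],
    confidence: float,
    min_confidence: float = 0.8,  # assumed threshold, not from the source
) -> str:
    """Return the Blob container name for a classification result.

    Documents with no predicted type, an unrecognised type, or a
    confidence below the threshold go to the unclassified container.
    """
    if doc_type is None or confidence < min_confidence:
        return UNCLASSIFIED_CONTAINER
    return CONTAINER_BY_DOC_TYPE.get(doc_type, UNCLASSIFIED_CONTAINER)
```

Keeping the mapping in one dictionary means adding Index Card later would only require one new entry, with unrecognised types falling through to the unclassified container in the meantime.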
Extraction Models
Used by the File Data Extraction Service. Each document type has its own trained extraction model to handle the distinct layout and fields of that document:
| Document Type | Extracted Fields (examples) |
|---|---|
| Master Card | Ex-code, registration code, family members |
| Red Cross Card | Ex-code, registration code, family members |
| Index Card | (Not yet implemented) |
Each model is trained on labelled samples specific to its document type, which helps keep field extraction accurate despite variation in scan quality and document age.
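The one-model-per-document-type arrangement can be sketched as a lookup that picks the extraction model before analysis. The model IDs, label strings, and the `select_extraction_model` helper below are hypothetical placeholders for illustration; the actual identifiers used by the File Data Extraction Service are not stated here.

```python
from typing import Dict, Optional

# Hypothetical model IDs; None marks a document type without a trained model.
EXTRACTION_MODEL_BY_DOC_TYPE: Dict[str, Optional[str]] = {
    "master-card": "master-card-extractor",
    "red-cross-card": "red-cross-card-extractor",
    "index-card": None,  # not yet implemented
}


def select_extraction_model(doc_type: str) -> str:
    """Return the extraction model ID for a classified document type."""
    if doc_type not in EXTRACTION_MODEL_BY_DOC_TYPE:
        raise ValueError(f"Unknown document type: {doc_type!r}")
    model_id = EXTRACTION_MODEL_BY_DOC_TYPE[doc_type]
    if model_id is None:
        raise NotImplementedError(
            f"No extraction model is available yet for {doc_type!r}"
        )
    return model_id
```

In a setup like this, the returned ID would be passed as the model ID to the Document Intelligence analyze call, so supporting a new document type only means training a model and registering its ID in the mapping.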