Azure Document Intelligence

How Azure Document Intelligence powers document classification and field extraction across the Digital Archive pipeline.

Overview

Azure Document Intelligence is the AI backbone of the pipeline. It is responsible for two distinct tasks: classifying documents by type, and extracting structured data from classified documents. Each task uses a separately trained custom model.

Models

Classification Model

Used by the File Splitter & Classifier function. Given a document image, the model identifies the document type:

Master Card
Index Card (Coming soon)
Red Cross Card
Unknown / Unclassified

The classification result determines which Blob container the document is routed to.
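
The routing step can be sketched as a simple lookup from document type to container name. This is a minimal illustration, not the pipeline's actual code: the type labels, the container names, and the confidence threshold are all assumptions, since this page does not specify them. The sketch also assumes that anything the classifier cannot place confidently, including Index Cards until they are supported, falls back to an unclassified container.

```python
from typing import Optional

# Hypothetical labels and container names; the real values are not given on this page.
CONTAINER_BY_DOC_TYPE = {
    "master-card": "master-cards",
    "red-cross-card": "red-cross-cards",
}
UNCLASSIFIED_CONTAINER = "unclassified"


def route_container(
    doc_type: Optional[str],
    confidence: float,
    min_confidence: float = 0.8,  # assumed threshold, tune against real data
) -> str:
    """Return the Blob container name for a classification result."""
    # No label, or a low-confidence label, is treated as unclassified.
    if doc_type is None or confidence < min_confidence:
        return UNCLASSIFIED_CONTAINER
    # Types without a dedicated container (e.g. Index Card) also fall back.
    return CONTAINER_BY_DOC_TYPE.get(doc_type, UNCLASSIFIED_CONTAINER)
```

Keeping the mapping in one dictionary means that adding a new document type later only requires a new entry rather than a change to the routing logic.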

Extraction Models

Used by the File Data Extraction Service. Each document type has its own trained extraction model, tailored to the layout and fields of that type:

Document Type	Extracted Fields (examples)
Master Card	Ex-code, registration code, family members
Red Cross Card	Ex-code, registration code, family members
Index Card	(Not yet implemented)

Each model is trained on labelled samples specific to its document type, which helps keep field extraction accurate despite variation in scan quality and document age.
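
One way to picture the per-type model selection is a small registry keyed by document type. The sketch below is illustrative only: the model IDs, the type labels, and the `ExtractionNotAvailable` exception are hypothetical names introduced here, and the actual call to Azure Document Intelligence is deliberately left out.

```python
from typing import Dict, Optional

# Hypothetical model IDs; None marks a type whose extraction model is not yet implemented.
EXTRACTION_MODEL_IDS: Dict[str, Optional[str]] = {
    "master-card": "master-card-extractor-v1",
    "red-cross-card": "red-cross-card-extractor-v1",
    "index-card": None,
}


class ExtractionNotAvailable(Exception):
    """Raised when a document type has no trained extraction model."""


def extraction_model_for(doc_type: str) -> str:
    """Return the custom extraction model ID for a classified document type."""
    model_id = EXTRACTION_MODEL_IDS.get(doc_type)
    if model_id is None:
        # Covers both unknown types and types that are listed but not implemented.
        raise ExtractionNotAvailable(f"No extraction model for {doc_type!r}")
    return model_id
```

Raising an explicit error for unsupported types, rather than silently skipping them, makes it easier to notice when documents reach the extraction stage before their model exists.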
