UNRWA Digital archive

Data Extraction

How classified documents are processed by Azure Document Intelligence to extract structured data into the database.

Overview

Data extraction is the third stage of the pipeline. It is handled by the File Data Extraction Service — an Azure Function that picks up classified documents, runs them through an AI model to extract structured fields, persists the results to the database, and hands off to the next stage.

Steps

1. Queue Trigger — document-extraction-queue

The function is triggered by a message on the document-extraction-queue, pushed at the end of the classification stage once a document has been classified and stored in its type-specific container.
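As an illustration, the trigger step might parse the queue message roughly as follows. The message schema is not described on this page, so the JSON field names (`documentId`, `documentType`, `blobName`) and the `ExtractionRequest` type are assumptions made for this sketch.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRequest:
    """Hypothetical shape of a document-extraction-queue message."""
    document_id: str
    document_type: str
    blob_name: str


# Assumed field names; the real message schema may differ.
REQUIRED_FIELDS = ("documentId", "documentType", "blobName")


def parse_extraction_message(body: str) -> ExtractionRequest:
    """Parse a JSON queue message and reject it if a field is missing."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"queue message is missing fields: {missing}")
    return ExtractionRequest(
        document_id=payload["documentId"],
        document_type=payload["documentType"],
        blob_name=payload["blobName"],
    )
```

Rejecting malformed messages at this point keeps bad input from reaching the fetch and extraction steps.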

2. Fetch — Type-Specific Blob Container

The service reads the queue message to identify the document, then fetches the file from the appropriate Blob Storage container (e.g. master-cards, index-cards, red-cross-cards).
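A minimal sketch of the container lookup, assuming document types are identified by short string keys. The container names come from the examples above; the keys and the injected `download` callable (standing in for whatever Blob Storage client the service uses) are hypothetical.

```python
from typing import Callable

# Container names are taken from the examples above; the keys are assumed.
CONTAINER_BY_TYPE = {
    "master-card": "master-cards",
    "index-card": "index-cards",
    "red-cross-card": "red-cross-cards",
}


def resolve_container(document_type: str) -> str:
    """Return the type-specific container name for a document type."""
    try:
        return CONTAINER_BY_TYPE[document_type]
    except KeyError:
        raise ValueError(
            f"no container configured for document type {document_type!r}"
        ) from None


def fetch_document(
    document_type: str,
    blob_name: str,
    download: Callable[[str, str], bytes],
) -> bytes:
    """Fetch the file bytes; `download(container, blob_name)` is injected."""
    return download(resolve_container(document_type), blob_name)
```

Passing the downloader in as a parameter keeps the lookup logic testable without a storage account.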

3. Extract — Azure Document Intelligence

The document is sent to the corresponding Azure Document Intelligence model. Each document type has its own trained model to ensure accurate field extraction. The model returns structured data: the ex-code, registration code, family members, and other fields specific to the document type.
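The per-type model selection could be sketched like this. The model IDs and document-type keys below are placeholders, and `analyze` is an injected callable standing in for the actual Document Intelligence call, whose exact usage in this service is not shown on this page.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical model IDs; the real trained-model names are not documented here.
MODEL_ID_BY_TYPE = {
    "red-cross-card": "red-cross-card-model",
    "master-card": "master-card-model",
    "index-card": "index-card-model",
}

# Stand-in for the Document Intelligence call: (model_id, file bytes) -> fields.
Analyze = Callable[[str, bytes], Mapping[str, Any]]


def extract_fields(
    document_type: str, content: bytes, analyze: Analyze
) -> Dict[str, Any]:
    """Run the document through the model trained for its type."""
    model_id = MODEL_ID_BY_TYPE.get(document_type)
    if model_id is None:
        raise ValueError(f"no model configured for document type {document_type!r}")
    # Copy into a plain dict so later stages do not depend on the client type.
    return dict(analyze(model_id, content))
```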

4. Persist — Azure SQL Database

The extracted data is written to Azure SQL. Each document type maps to its own dedicated tables: a main table for the card and a companion table for its family members.

| Document Type | Tables |
| --- | --- |
| Red Cross Card | red-cross-card, red-cross-family-members |
| Master Card (Front) | front-master-card, front-master-card-family-members |

Each record is linked to its source document for traceability.
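To illustrate the two-table layout and the link back to the source document, here is a sketch that uses Python's built-in `sqlite3` as a local stand-in for Azure SQL. The table names come from the table above; every column name (`document_id`, `registration_code`, `card_id`, `name`) is an assumption, as the real schema is not shown on this page.

```python
import sqlite3
from typing import Any, Mapping


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the two Red Cross Card tables (column names are assumed)."""
    conn.executescript(
        """
        CREATE TABLE "red-cross-card" (
            id INTEGER PRIMARY KEY,
            document_id TEXT NOT NULL,   -- link to the source document
            registration_code TEXT
        );
        CREATE TABLE "red-cross-family-members" (
            id INTEGER PRIMARY KEY,
            card_id INTEGER NOT NULL REFERENCES "red-cross-card"(id),
            name TEXT NOT NULL
        );
        """
    )


def persist_red_cross_card(
    conn: sqlite3.Connection, document_id: str, fields: Mapping[str, Any]
) -> int:
    """Insert the card and its family members in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        cur = conn.execute(
            'INSERT INTO "red-cross-card" (document_id, registration_code) '
            "VALUES (?, ?)",
            (document_id, fields.get("registration_code")),
        )
        card_id = cur.lastrowid
        conn.executemany(
            'INSERT INTO "red-cross-family-members" (card_id, name) VALUES (?, ?)',
            [(card_id, name) for name in fields.get("family_members", [])],
        )
    return card_id
```

Writing both tables in one transaction means a card is never stored without its family members, or the other way round.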

5. Advance Queue — document-cleansing-queue

Once the data is persisted, a new event is pushed to the document-cleansing-queue to trigger the next stage — data cleansing and validation.
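The hand-off message might be built along these lines. The payload fields are assumptions, and `send` is an injected callable standing in for the queue client.

```python
import json
from typing import Callable


def advance_to_cleansing(
    document_id: str, document_type: str, send: Callable[[str], None]
) -> str:
    """Build the document-cleansing-queue message and hand it to `send`."""
    # Assumed payload: just enough for the next stage to find the record.
    message = json.dumps(
        {"documentId": document_id, "documentType": document_type}
    )
    send(message)
    return message
```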

Flow Summary

document-extraction-queue
↓
Fetch document from Blob Storage
↓
Send to Azure Document Intelligence (type-specific model)
↓
Persist extracted data → Azure SQL
↓
Push to document-cleansing-queue
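The flow above can be sketched as one handler with each stage injected as a callable. All four callables, the function name, and the message fields are hypothetical; the sketch only demonstrates the ordering, in particular that the cleansing queue is not notified unless persistence succeeds.

```python
import json
from typing import Any, Callable, Mapping


def process_extraction_message(
    body: str,
    fetch: Callable[[str, str], bytes],
    analyze: Callable[[str, bytes], Mapping[str, Any]],
    persist: Callable[[str, str, Mapping[str, Any]], None],
    enqueue: Callable[[str], None],
) -> None:
    """Run fetch -> extract -> persist -> advance for one queue message."""
    msg = json.loads(body)  # assumed fields: documentId, documentType, blobName
    document_id = msg["documentId"]
    document_type = msg["documentType"]

    content = fetch(document_type, msg["blobName"])
    fields = analyze(document_type, content)
    persist(document_id, document_type, fields)

    # Reached only if every earlier stage returned without raising.
    enqueue(document_id)
```

If any stage raises, the exception propagates and nothing is pushed to the next queue, leaving retry handling to the queue trigger.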
