Digital Archive System - Overview
The business purpose, challenges, and objectives that led to the creation of the UNRWA Digital Archive.
What It Is
The Digital Archive System automates the processing of scanned UNRWA refugee documents: splitting, classifying, extracting data from, and indexing them. Instead of manually searching through paper files, UNRWA staff can now search for documents and extracted data through a Power Apps interface.
The system works by automatically reading documents from SharePoint, storing them in Azure Blob Storage, splitting them, classifying them and extracting key information with Azure Document Intelligence, storing the results in Azure SQL, and indexing everything in Azure AI Search so it can be searched.
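Between extraction and storage in Azure SQL, the extracted values are cleaned and normalized. The sketch below illustrates one way such a step could look, assuming the Document Intelligence output has already been flattened into a dict of field name to raw string. The function names and the sample field names and values are hypothetical, not the system's actual schema.

```python
"""Minimal sketch of a clean-and-normalize step between extraction and storage.

Assumption: extracted fields arrive as a flat dict of field name -> raw string
(or None when a field was not found). Field names are assumed to be ASCII.
"""
from __future__ import annotations

import re
import unicodedata


def normalize_key(name: str) -> str:
    """Turn a display name such as 'Family Members' into 'family_members'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def normalize_value(value: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    value = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", value).strip()


def clean_fields(raw: dict[str, str | None]) -> dict[str, str]:
    """Normalize keys and values, dropping fields that are missing or empty."""
    cleaned: dict[str, str] = {}
    for name, value in raw.items():
        if value is None:
            continue
        value = normalize_value(value)
        if value:
            cleaned[normalize_key(name)] = value
    return cleaned
```

For example, with made-up input, `clean_fields({"Ex Code": "  AB 123 ", "Origin": "Jaffa\n", "Notes": ""})` returns `{'ex_code': 'AB 123', 'origin': 'Jaffa'}`: whitespace is collapsed and the empty "Notes" field is dropped before the record would be written to the database.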
The Problem It Solves
UNRWA holds millions of refugee documents across its fields of operation. Today, finding a specific document or piece of information requires a slow manual search, data extraction is done by hand, and there's no good way to cross-reference records or verify information quickly.
Users
Management, employees, decision makers, operations, operators, and field users
How It Works
Documents are scanned and saved in SharePoint, organized by field. Power Automate reads them in 1K chunks and uploads them to Azure Blob Storage. An Azure Storage queue manages the processing order: each queue message triggers an Azure Function that splits the document, saves the result in Azure Blob Storage, and then queues a classification event. Azure Document Intelligence classifies each document and extracts key fields such as ex-codes, family members, and origin. The system cleans and normalizes the extracted data, stores it in Azure SQL, and makes it searchable using Azure AI Search. If something goes wrong during processing, the system logs the error for manual review and moves the message to a dead-letter queue named with the -poison suffix convention (<queue-name>-poison).
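The retry and dead-letter behaviour can be illustrated without any Azure dependency. The following is an in-memory simulation, not the system's code. It assumes the standard Azure Functions queue-trigger behaviour, where a failed message is retried until its dequeue count reaches maxDequeueCount (5 by default) and is then moved to a queue named <queue-name>-poison. The Message class, the drain() function, and the "split-queue" name are hypothetical.

```python
"""In-memory simulation of queue-triggered processing with a poison queue.

No Azure SDK is used. A failing message is retried until its dequeue count
reaches max_dequeue_count (Azure Functions defaults to 5), then it is moved
to "<queue-name>-poison". All names here are hypothetical.
"""
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    body: str
    dequeue_count: int = 0


def poison_queue_name(queue_name: str) -> str:
    """Dead-letter naming convention: original queue name plus '-poison'."""
    return f"{queue_name}-poison"


def drain(
    queues: dict[str, deque[Message]],
    queue_name: str,
    handler: Callable[[str], None],
    max_dequeue_count: int = 5,
    log: Callable[[str], None] = print,
) -> None:
    """Process every message in queue_name, retrying failures and moving
    messages that exhaust their attempts to the poison queue."""
    queue = queues[queue_name]
    while queue:
        msg = queue.popleft()
        msg.dequeue_count += 1
        try:
            handler(msg.body)
        except Exception as exc:
            if msg.dequeue_count >= max_dequeue_count:
                # Log for manual review, then dead-letter the message.
                log(f"poisoned {msg.body!r} after {msg.dequeue_count} attempts: {exc}")
                queues.setdefault(poison_queue_name(queue_name), deque()).append(msg)
            else:
                queue.append(msg)  # requeue for another attempt
```

For example, draining `{"split-queue": deque([Message("a.pdf"), Message("b.corrupt")])}` with a handler that fails on ".corrupt" files processes "a.pdf" once and leaves "b.corrupt" in "split-queue-poison" after five attempts. In the real system the Functions host performs these retries itself; the poison queue still needs to be monitored so that dead-lettered messages actually reach manual review.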
Technology Stack
The system runs on Azure and Power Platform. Documents are stored in Azure Blob Storage, queuing is handled by Azure Queue Storage, and all structured data lives in Azure SQL. Azure Document Intelligence does the document classification and data extraction. Azure AI Search powers the search functionality. Power Automate orchestrates the entire document ingestion workflow, and Power Apps provides the user interface for searching and managing documents.
Document Lifecycle
When a document enters the system, it goes through these stages: uploaded, queued for processing, split, classified, queued, data extracted, queued, data cleaned, stored in the database, indexed for search, and finally searchable and accessible in the Power Apps interface. If something fails at any stage, the document is marked as an exception for manual review.
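The lifecycle above behaves like a linear state machine with a single failure state. The sketch below models it that way; the enum, the advance() helper, and the labels for the second and third "queued" stages (queued for extraction, queued for cleaning) are illustrative assumptions rather than the system's actual status values.

```python
"""Sketch of the document lifecycle as a linear state machine.

Stage order follows the documented lifecycle. The names of the two
intermediate "queued" stages are assumptions made for illustration.
"""
from enum import Enum


class Stage(Enum):
    UPLOADED = "uploaded"
    QUEUED = "queued for processing"
    SPLIT = "split"
    CLASSIFIED = "classified"
    QUEUED_FOR_EXTRACTION = "queued for extraction"
    EXTRACTED = "data extracted"
    QUEUED_FOR_CLEANING = "queued for cleaning"
    CLEANED = "data cleaned"
    STORED = "stored in database"
    INDEXED = "indexed for search"
    SEARCHABLE = "searchable"
    EXCEPTION = "exception"  # terminal: needs manual review


# Happy path in definition order (Enum iteration preserves it).
PIPELINE = [s for s in Stage if s is not Stage.EXCEPTION]


def advance(current: Stage, succeeded: bool = True) -> Stage:
    """Return the next stage, or EXCEPTION if the current step failed."""
    if current in (Stage.SEARCHABLE, Stage.EXCEPTION):
        raise ValueError(f"{current.name} is a terminal stage")
    if not succeeded:
        return Stage.EXCEPTION
    return PIPELINE[PIPELINE.index(current) + 1]
```

Each stage deliberately has a distinct value: Python's Enum treats members with equal values as aliases, so three members all valued "queued" would collapse into one and drop out of the pipeline order.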
System Architecture
Users access the system through Power Apps. Power Automate reads documents from SharePoint and sends them to Azure Blob Storage. A queue manages the processing order. Azure Document Intelligence processes the documents, then the system stores results in Azure SQL and indexes them in Azure AI Search.
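Power Automate pulls documents from SharePoint in 1K chunks before sending them to Blob Storage. The flow itself is configured in Power Automate rather than written as code, so the Python below only illustrates the batching idea, assuming a chunk means 1,000 files. The chunked() helper is hypothetical.

```python
"""Illustration of reading items in fixed-size chunks (assumed 1,000 files)."""
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items until input runs out."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

With 2,500 items, this yields chunks of 1,000, 1,000, and 500. On Python 3.12 and later, itertools.batched provides the same behaviour, yielding tuples instead of lists.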
What Comes Next
The detailed documentation will cover system architecture, each component in detail, how data flows through the system, the database schema, deployment and operations, and known gaps or risks that need attention.