UNRWA Digital Archive
System overview

System Components

The components of the system, with the purpose and responsibilities of each.

SharePoint

Purpose : Serves as the source repository for scanned refugee documents awaiting processing.

Responsibilities :

  • Store scanned refugee documents
  • Organize documents into configured folders
  • Maintain document metadata and processing status

Dependencies :

  • None

Related Components :

  • Power Automate

Power Automate

Purpose : Coordinates document ingestion from SharePoint into the Digital Archive processing pipeline.

Responsibilities :

  • Monitor configured SharePoint folders
  • Process documents in batches
  • Upload files to Azure Blob Storage
  • Update document processing status in SharePoint
  • Publish messages to Azure Queue Storage

Inputs :

  • SharePoint folder location

Outputs :

  • Document splitting and classification events published to Azure Queue Storage
  • Migrated files marked as Processed in SharePoint
  • Documents uploaded to Azure Blob Storage

Dependencies :

  • SharePoint
  • Azure Blob Storage
  • Azure Queue Storage

Related Components :

  • FileSplitterAndClassifierFunction
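
The message that Power Automate publishes for each uploaded document is not specified on this page. As a minimal sketch only, the payload could carry the blob location plus a back-reference to the SharePoint item. Every field name below is hypothetical, and the Base64 step is an assumption: whether it is needed depends on the message encoding configured for the consuming queue trigger.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class SplitClassificationMessage:
    """Hypothetical payload; the real message schema is not documented here."""
    blob_container: str      # container the raw document was uploaded to
    blob_name: str           # path of the document inside that container
    sharepoint_item_id: str  # assumed back-reference to the SharePoint item


def encode_message(message: SplitClassificationMessage) -> str:
    """Serialize the payload to JSON and wrap it in Base64 (assumed encoding)."""
    raw = json.dumps(asdict(message)).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_message(encoded: str) -> SplitClassificationMessage:
    """Reverse of encode_message, as a consumer would do on its side."""
    raw = base64.b64decode(encoded).decode("utf-8")
    return SplitClassificationMessage(**json.loads(raw))
```

Keeping the payload down to a pointer at the blob, rather than the file content itself, keeps each message small; queue messages have a size limit, so large scans would not fit in one anyway.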

Azure Blob Storage

Purpose : Provides centralized storage for documents throughout their processing lifecycle.

Responsibilities :

  • Store raw uploaded documents
  • Store split documents
  • Store classified documents
  • Maintain document state across processing phases

Inputs :

  • Documents from Power Automate
  • Split files from Azure Functions

Outputs :

  • Documents for downstream processing

Dependencies :

  • None

Related Components :

  • Power Automate
  • FileSplitterAndClassifierFunction
  • CardExtractionFunction
  • Power Apps

Azure Queue Storage

Purpose : Enables asynchronous communication between processing stages.

Responsibilities :

  • Decouple services
  • Trigger Azure Functions
  • Handle retries and poison messages
  • Enable asynchronous processing

Inputs :

  • Messages from producers

Outputs :

  • Events consumed by Azure Functions

Dependencies :

  • None

Related Components :

  • Power Automate
  • FileSplitterAndClassifierFunction
  • CardExtractionFunction
  • DataCleansingFunction
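
The retry and poison-message behaviour listed above can be illustrated with a small in-memory simulation. This is a sketch, not the pipeline's code: it assumes the usual queue-trigger convention for Azure Functions, where a message that keeps failing is moved to a queue named after the original with a "-poison" suffix once a maximum dequeue count is reached (5 by default). The `drain` helper and its handler are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

MAX_DEQUEUE_COUNT = 5  # assumed threshold; check the project's host.json


def poison_queue_name(queue_name: str) -> str:
    """Name of the queue that receives messages which exhausted their retries."""
    return f"{queue_name}-poison"


def drain(
    queue_name: str,
    messages: List[str],
    handler: Callable[[str], None],
    max_dequeue_count: int = MAX_DEQUEUE_COUNT,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Run handler over every message, retrying failures up to the limit.

    Returns the successfully handled messages and a mapping from the poison
    queue name to the messages moved there. A real queue would also wait for
    a visibility timeout between attempts; this simulation simply re-queues.
    """
    pending = deque((body, 0) for body in messages)
    succeeded: List[str] = []
    poisoned: List[str] = []
    while pending:
        body, dequeue_count = pending.popleft()
        dequeue_count += 1
        try:
            handler(body)
            succeeded.append(body)
        except Exception:
            if dequeue_count >= max_dequeue_count:
                poisoned.append(body)  # retries exhausted: set aside
            else:
                pending.append((body, dequeue_count))  # retry later
    return succeeded, {poison_queue_name(queue_name): poisoned}
```

A message that can never be processed, such as a corrupt PDF, ends up in the poison queue instead of blocking the documents behind it.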

FileSplitterAndClassifierFunction

Purpose : Processes uploaded documents by splitting, classifying, and routing them for data extraction.

Responsibilities :

  • Read uploaded documents
  • Split PDF files
  • Classify documents
  • Route files to appropriate Azure Storage containers
  • Persist file metadata
  • Publish extraction events

Inputs :

  • Messages from document-split-classification-queue
  • Documents from unprocessed-raw-files container

Outputs :

  • Split files in Blob Storage
  • Classified documents
  • Database metadata records
  • Messages in document-extraction-queue

Dependencies :

  • Azure Blob Storage
  • Azure Queue Storage
  • Azure Document Intelligence
  • Azure SQL Database

Related Components :

  • CardExtractionFunction
  • Azure Document Intelligence
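
The routing step can be pictured as a lookup from classification result to target container. The sketch below is illustrative only. The container names redcrosscard and redmastercard are taken from the CardExtractionFunction inputs; the label strings, the confidence threshold, and the fallback container are assumptions.

```python
from typing import Optional

# Classifier label -> target container. The labels are hypothetical.
CONTAINER_BY_LABEL = {
    "red_cross_card": "redcrosscard",
    "red_master_card": "redmastercard",
}

FALLBACK_CONTAINER = "unclassified"  # hypothetical container for manual triage
MIN_CONFIDENCE = 0.8                 # hypothetical acceptance threshold


def choose_container(label: Optional[str], confidence: float) -> str:
    """Pick the container a split file should be routed to.

    Files with no label, an unknown label, or a low-confidence label go to
    the fallback container instead of being guessed into a card container.
    """
    if label is None or confidence < MIN_CONFIDENCE:
        return FALLBACK_CONTAINER
    return CONTAINER_BY_LABEL.get(label, FALLBACK_CONTAINER)
```

For example, `choose_container("red_cross_card", 0.95)` selects redcrosscard, while the same label at confidence 0.40 falls back to the triage container.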

CardExtractionFunction

Purpose : Extracts structured data from classified documents using AI models.

Responsibilities :

  • Retrieve classified documents
  • Select extraction workflow based on card type
  • Extract structured data
  • Persist extracted data to Azure SQL Database
  • Publish cleansing events

Inputs :

  • Messages from document-extraction-queue
  • Classified documents from the redcrosscard / redmastercard Blob Storage containers

Outputs :

  • Document data persisted in Azure SQL Database
  • Messages in document-data-cleansing-queue

Dependencies :

  • Azure Blob Storage
  • Azure Queue Storage
  • Azure Document Intelligence
  • Azure SQL Database

Related Components :

  • DataCleansingFunction
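
Selecting an extraction workflow by card type is commonly done with a dispatch table. The following is a sketch under assumptions: it keys the table on the container names redcrosscard and redmastercard, and the extractor functions are hypothetical placeholders for the calls to the AI models, whose details are not described on this page.

```python
from typing import Callable, Dict

Extractor = Callable[[bytes], Dict[str, str]]


def extract_red_cross_card(document: bytes) -> Dict[str, str]:
    """Placeholder: a real workflow would call the model for this card type."""
    return {"card_type": "redcrosscard"}


def extract_red_master_card(document: bytes) -> Dict[str, str]:
    """Placeholder: a real workflow would call the model for this card type."""
    return {"card_type": "redmastercard"}


# One entry per supported card type, keyed by source container.
EXTRACTORS: Dict[str, Extractor] = {
    "redcrosscard": extract_red_cross_card,
    "redmastercard": extract_red_master_card,
}


def select_workflow(container: str) -> Extractor:
    """Return the extractor registered for a container, or fail loudly."""
    try:
        return EXTRACTORS[container]
    except KeyError:
        raise ValueError(
            f"No extraction workflow registered for {container!r}"
        ) from None
```

Raising on an unknown card type, rather than silently skipping it, lets the queue's retry and poison handling surface the problem.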

DataCleansingFunction

Purpose : Applies business rules to normalize and validate extracted data.

Responsibilities :

  • Execute SQL stored procedures
  • Normalize extracted values
  • Apply business validation rules
  • Prepare data for search indexing and human review

Inputs :

  • Messages from document-data-cleansing-queue
  • Extracted records in Azure SQL Database

Outputs :

  • Cleansed database records
  • Data ready for indexing
  • Data ready for manual review

Dependencies :

  • Azure SQL Database
  • Business rules documentation

Related Components :

Power Apps

Purpose : Provides the user interface for document search, review, and correction workflows.

Responsibilities :

  • Search processed records
  • Review extracted data
  • Correct extraction errors
  • Display document details
  • Consume Azure AI Search results

Outputs :

  • Search results
  • User corrections
  • Manual review actions

Dependencies :

  • Azure AI Search
  • Azure SQL Database (TODO: confirm whether the app connects directly to the database)
  • Azure Blob Storage

TODO: add Azure AI Search, Azure Document Intelligence, and Azure SQL Database
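
Read together, the queue names in the function sections above suggest a three-stage chain. The sketch below records that chain as data and walks it. It assumes that CardExtractionFunction consumes the document-extraction-queue that FileSplitterAndClassifierFunction publishes to; the `Stage` type and `trace` helper are illustrative and not part of the system.

```python
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    function: str
    input_queue: str
    output_queue: Optional[str]  # None when the stage publishes no queue message


STAGES = [
    Stage("FileSplitterAndClassifierFunction",
          "document-split-classification-queue",
          "document-extraction-queue"),
    Stage("CardExtractionFunction",
          "document-extraction-queue",
          "document-data-cleansing-queue"),
    Stage("DataCleansingFunction",
          "document-data-cleansing-queue",
          None),
]


def trace(start_queue: str) -> List[str]:
    """List the functions a message passes through, starting at a queue."""
    by_input = {stage.input_queue: stage for stage in STAGES}
    path: List[str] = []
    queue: Optional[str] = start_queue
    while queue is not None and queue in by_input:
        stage = by_input[queue]
        path.append(stage.function)
        queue = stage.output_queue
    return path
```

Calling `trace("document-split-classification-queue")` lists the three functions in processing order, which can serve as a quick consistency check when queue names change.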
