Document Ingestion

How refugee documents are sourced from SharePoint and ingested into the Digital Archive pipeline.

Overview

Document ingestion is the entry point of the pipeline. It detects new documents in SharePoint, transfers them into Azure Blob Storage, and triggers downstream processing.

Steps

1. Source — SharePoint

UNRWA stores refugee documents (Master Cards, Index Cards, Fact Sheets) in SharePoint. This is the authoritative source from which all documents originate.

2. Trigger — Power Automate

A Power Automate flow monitors SharePoint for new or unprocessed files. When a qualifying file is detected, the automation reads the file and initiates the ingestion process.

3. Upload — Azure Blob Storage

The document is uploaded to Azure Blob Storage (Unprocessed-Files container), where it is persisted and made available to the rest of the processing pipeline.

4. Queue — AI Classification Trigger

Once uploaded, a message is pushed to an Azure Queue. This event triggers the next stage in the pipeline — AI-based document classification.
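As an illustration of what such a queue message might carry, here is a minimal sketch. The message schema is not specified on this page, so the field names (`container`, `blobName`, `uploadedAt`), the helper function names, and the choice of Base64-encoded JSON are all assumptions made for the example. Base64 is used only because some queue consumers expect it; the actual pipeline may send plain text.

```python
import base64
import json
from datetime import datetime, timezone


def build_classification_message(container: str, blob_name: str) -> str:
    """Return a Base64-encoded JSON body that points at one uploaded blob.

    All field names here are hypothetical; the real schema is not documented.
    """
    payload = {
        "container": container,
        "blobName": blob_name,
        "uploadedAt": datetime.now(timezone.utc).isoformat(),
    }
    raw = json.dumps(payload).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def read_classification_message(body: str) -> dict:
    """Decode a message produced by build_classification_message."""
    return json.loads(base64.b64decode(body).decode("utf-8"))
```

A consumer such as the classification stage would decode the body and use the container and blob name to fetch the document. Carrying a reference rather than the file itself keeps the message small.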

5. Status Update — SharePoint

After the upload completes, Power Automate writes a status flag back to the SharePoint record marking the file as ingested. This prevents the same document from being picked up again in subsequent automation runs.

Flow Summary

SharePoint → Power Automate → Azure Blob Storage → Azure Queue → AI Classification
                     ↓
             Update ingestion status in SharePoint
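
The flow above can be modelled with in-memory stand-ins to show why the status flag matters. This is a sketch only: the real flow runs in Power Automate, and the class names, the status values ("new", "ingested"), and the exact ordering of the queue push relative to the status update are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceFile:
    """Stand-in for a SharePoint record (hypothetical shape)."""
    name: str
    content: bytes
    status: str = "new"  # assumed values: "new" or "ingested"


@dataclass
class IngestionRun:
    """Stand-in for one automation run against blob storage and a queue."""
    blobs: Dict[str, bytes] = field(default_factory=dict)  # plays the blob container
    queue: List[str] = field(default_factory=list)         # plays the Azure Queue

    def run(self, library: List[SourceFile]) -> int:
        """Ingest every file not yet flagged; return how many were ingested."""
        ingested = 0
        for item in library:
            if item.status == "ingested":
                continue  # already handled in an earlier run
            self.blobs[item.name] = item.content  # upload
            self.queue.append(item.name)          # notify classification
            item.status = "ingested"              # write the flag back
            ingested += 1
        return ingested
```

Running `run` twice over the same library ingests each file once: the first call uploads and enqueues every file, and the second call finds only flagged records and does nothing.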
