Document Ingestion
How refugee documents are sourced from SharePoint and ingested into the Digital Archive pipeline.
Overview
Document ingestion is the entry point of the entire pipeline. It is responsible for detecting new documents in SharePoint, transferring them into Azure Blob Storage, and triggering downstream processing.
Steps
1. Source — SharePoint
UNRWA stores refugee documents (Master Cards, Index Cards, Fact Sheets) in SharePoint. This is the authoritative source from which all documents originate.
2. Trigger — Power Automate
A Power Automate flow monitors SharePoint for new or unprocessed files. When a qualifying file is detected, the automation reads the file and initiates the ingestion process.
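The detection logic lives in the Power Automate flow itself, so the following Python is only an illustrative sketch of the "new or unprocessed" filter. The `SharePointFile` type, the `ingestion_status` field, and the `"Ingested"` value are hypothetical stand-ins for whatever status column the SharePoint library actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical flag value; the real SharePoint column and its values may differ.
INGESTED = "Ingested"


@dataclass
class SharePointFile:
    name: str
    ingestion_status: Optional[str]  # None means the flag has never been set


def qualifying_files(files: List[SharePointFile]) -> List[SharePointFile]:
    """Return the files that have not yet been marked as ingested."""
    return [f for f in files if f.ingestion_status != INGESTED]


files = [
    SharePointFile("master_card_001.pdf", None),
    SharePointFile("index_card_002.pdf", INGESTED),
]
pending = qualifying_files(files)
print([f.name for f in pending])
```

Only the file whose flag is unset is selected; the file already marked as ingested is skipped.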
3. Upload — Azure Blob Storage
The document is uploaded to Azure Blob Storage (Unprocessed-Files container), where it is persisted and made available to the rest of the processing pipeline.
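As a rough sketch of the upload step, the function below writes one document into a container. It is written against the `upload_blob(name, data, overwrite=...)` method that `ContainerClient` exposes in the `azure-storage-blob` v12 SDK, but the pipeline described here performs the upload from Power Automate, so this is an assumption about how equivalent code might look. `InMemoryContainer` is a hypothetical stand-in that lets the example run without an Azure account.

```python
def upload_document(container_client, blob_name: str, data: bytes) -> str:
    """Upload one document and return the blob name it was stored under."""
    # overwrite=False makes an accidental re-upload fail instead of silently
    # replacing an existing blob (an assumed policy, not confirmed by the source).
    container_client.upload_blob(name=blob_name, data=data, overwrite=False)
    return blob_name


class InMemoryContainer:
    """Hypothetical stand-in for a ContainerClient, for local illustration only."""

    def __init__(self):
        self.blobs = {}

    def upload_blob(self, name, data, overwrite=False):
        if name in self.blobs and not overwrite:
            raise FileExistsError(name)
        self.blobs[name] = data


container = InMemoryContainer()
stored_as = upload_document(container, "master_card_001.pdf", b"example bytes")
print(stored_as, len(container.blobs))
```

With the real SDK, the same function would receive a `ContainerClient` bound to the unprocessed-files container in place of the in-memory stand-in.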
4. Queue — AI Classification Trigger
Once the upload completes, a message is pushed to an Azure Queue. The arrival of this message triggers the next stage in the pipeline: AI-based document classification.
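The source does not specify the message format, so the sketch below assumes a small JSON payload that names the container and blob to classify. The field names `container` and `blob` are hypothetical. Base64 encoding is included because some queue consumers expect it; whether this pipeline does is also an assumption.

```python
import base64
import json


def build_queue_message(container: str, blob_name: str) -> str:
    """Serialize a pointer to the uploaded blob as Base64-encoded JSON."""
    payload = {"container": container, "blob": blob_name}  # hypothetical schema
    raw = json.dumps(payload).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_queue_message(message: str) -> dict:
    """Reverse build_queue_message, as the classification stage might."""
    return json.loads(base64.b64decode(message).decode("utf-8"))


message = build_queue_message("unprocessed-files", "master_card_001.pdf")
decoded = decode_queue_message(message)
print(decoded["blob"])
```

Sending a pointer to the blob rather than the document bytes keeps the message small, which matters because queue services typically cap message size.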
5. Status Update — SharePoint
After the upload completes, Power Automate writes a status flag back to the SharePoint record marking the file as ingested. This prevents the same document from being picked up again in subsequent automation runs.
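Writing the flag only after the upload succeeds means a failed upload is retried on the next run instead of being lost. The sketch below illustrates that ordering; the record shape, the `"Ingested"` value, and the `ingest` callback are hypothetical, and the real write-back is performed by Power Automate.

```python
INGESTED = "Ingested"  # hypothetical flag value


def ingest_and_flag(record: dict, ingest) -> bool:
    """Run the ingest step, then set the status flag only if it succeeded."""
    try:
        ingest(record["name"])
    except Exception:
        # Leave the flag untouched so the next automation run picks the file up again.
        return False
    record["status"] = INGESTED
    return True


def failing_upload(name):
    raise ConnectionError("simulated upload failure")


record = {"name": "fact_sheet_003.pdf", "status": None}

first_attempt = ingest_and_flag(record, failing_upload)
status_after_failure = record["status"]

uploaded = []
second_attempt = ingest_and_flag(record, uploaded.append)
print(first_attempt, second_attempt, record["status"])
```

The first attempt fails and leaves the record unflagged; the second attempt succeeds and marks it as ingested, after which a filter on the status flag would skip it.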
Flow Summary
SharePoint → Power Automate → Azure Blob Storage → Azure Queue → AI Classification
                                      ↓
                    Update ingestion status in SharePoint