Challenge
The German Army’s First World War casualty lists comprise more than 31,000 printed pages recording wounded and deceased soldiers. The dense formatting and blackletter typeface made OCR extremely difficult.
Solution
We developed a pipeline that:
- Segmented printed layouts
- Performed OCR tuned for Fraktur type, reaching roughly 98% accuracy
- Matched extracted names to structured GenWiki genealogical datasets
- Aligned OCR results with external data to support error correction and fine-tuning of the OCR model
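
The name-matching and alignment steps can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes a small in-memory list of reference names standing in for the GenWiki data, uses Python's standard `difflib` for fuzzy matching, and the names `REFERENCE_NAMES`, `match_name`, `build_alignments`, and the 0.8 similarity cutoff are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical reference names standing in for a structured genealogical dataset.
REFERENCE_NAMES = [
    "Müller, Johann",
    "Schmidt, Karl",
    "Schneider, Wilhelm",
    "Fischer, Friedrich",
]


def match_name(ocr_name, reference, cutoff=0.8):
    """Return (best_match, similarity) for an OCR'd name, or (None, 0.0)."""
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops anything scoring below the cutoff.
    candidates = get_close_matches(ocr_name, reference, n=1, cutoff=cutoff)
    if not candidates:
        return None, 0.0
    best = candidates[0]
    return best, SequenceMatcher(None, ocr_name, best).ratio()


def build_alignments(ocr_names, reference, cutoff=0.8):
    """Collect (ocr_text, reference_text, similarity) for inexact matches.

    Pairs where the OCR output differs from the matched reference are the
    ones that could serve as correction candidates or training examples.
    """
    pairs = []
    for name in ocr_names:
        best, score = match_name(name, reference, cutoff)
        if best is not None and best != name:
            pairs.append((name, best, score))
    return pairs


if __name__ == "__main__":
    # "Mül1er" and "Fifcher" imitate plausible OCR confusions (l/1, long s/f).
    ocr_output = ["Mül1er, Johann", "Schmidt, Karl", "Fifcher, Friedrich"]
    for ocr_text, ref_text, score in build_alignments(ocr_output, REFERENCE_NAMES):
        print(f"{ocr_text!r} -> {ref_text!r} (similarity {score:.2f})")
```

In this sketch, exact matches are treated as already validated and only inexact matches are kept as alignment pairs. A production pipeline would likely also compare additional fields (such as place of origin or unit) before accepting a match, since names alone can be ambiguous.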
Impact
Tens of thousands of individual records were recovered and validated, supporting family history researchers and scholars of the First World War.