El Debate and El Sol Newspaper Archives (1910–1936)
With Charles III University of Madrid (UC3M)

Challenge

Historic Spanish newspapers printed over more than a quarter of a century (1910–1936) posed problems for traditional OCR tools, including nonstandard fonts, dense column formatting, and page degradation.

Solution

Osiris-AI built a scalable workflow that:

  • Enhanced scans and detected page regions accurately
  • Applied OCR tailored to historical Spanish typefaces with ~98% accuracy
  • Parsed multicolumn layouts and rejoined fragmented content, such as words hyphenated across line breaks
  • Enabled advanced search functions and reported OCR accuracy metrics
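
As an illustration of two of the steps above, the following is a minimal sketch of how hyphenated line breaks might be rejoined and how character-level OCR accuracy might be measured against a hand-corrected transcription. It is not the Osiris-AI implementation: the function names (join_hyphenated, edit_distance, char_accuracy) and the choice of a Levenshtein-based accuracy measure are assumptions made for this example.

```python
import re

# Assumption: a letter, a hyphen, a newline, then another letter marks a
# word split across lines. [^\W\d_] matches any Unicode letter, so
# accented Spanish characters are handled.
_LINE_BREAK_HYPHEN = re.compile(r"([^\W\d_])-\s*\n\s*([^\W\d_])")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were hyphenated across a line break."""
    return _LINE_BREAK_HYPHEN.sub(r"\1\2", text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Character accuracy as 1 - (edit distance / ground-truth length), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - edit_distance(ocr_text, ground_truth) / len(ground_truth))


if __name__ == "__main__":
    print(join_hyphenated("El gobier-\nno anunció"))
    print(char_accuracy("abxd", "abcd"))
```

A rule this simple also removes the hyphen from genuine compounds that happen to break at a line end, so a dictionary lookup on the joined word would likely be needed in practice.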

Impact

Over 100,000 pages of searchable newspaper text were produced, supporting large-scale historical research and word frequency analysis across decades of printed material.
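
The kind of word frequency analysis mentioned above could be sketched as follows, assuming the OCR output is available as (year, text) pairs. The input shape and the function names (word_frequencies, relative_frequency) are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Runs of Unicode letters; digits, punctuation, and underscores split tokens.
_WORD = re.compile(r"[^\W\d_]+")


def word_frequencies(pages):
    """Count lowercase word tokens per year from (year, text) pairs."""
    by_year = {}
    for year, text in pages:
        tokens = _WORD.findall(text.lower())
        by_year.setdefault(year, Counter()).update(tokens)
    return by_year


def relative_frequency(by_year, word):
    """Share of all tokens in each year that are the given word."""
    word = word.lower()
    result = {}
    for year, counts in by_year.items():
        total = sum(counts.values())
        result[year] = counts[word] / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the archives.
    sample = [
        (1910, "La huelga general. La huelga"),
        (1931, "La República"),
    ]
    freqs = word_frequencies(sample)
    print(relative_frequency(freqs, "huelga"))
```

Normalizing by the total token count per year keeps years with more surviving pages from dominating the comparison.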