Talk:Technical challenges, opportunities, goals, strategies
Petr Sojka's panel comments (from experience with the DML-CZ and EuDML projects):
Illustrated by Alice and Bob communicating via scientific papers, in the fashion Aristotle designed.
Alice's head -> linearization in natural language -> typesetting in paper form -> paper delivery to Bob -> character recognition -> word recognition -> sentence recognition -> semantics/meaning -> Bob's head
And now we have thousands of A's and B's all around the world communicating via scientific papers.
- challenges:
- moving from reference databases (MR, Zbl) to full-text handling (as in PubMed Central, with 2.4 million full texts)
- conversion from a low-level representation (bitmaps, born-digital PDFs) or LaTeX to a semantic one (richly marked-up full texts), e.g. at the level of morphology, syntax, semantics or even pragmatics
- services using this representation
- math OCR
- acquisition of tagged data from mathematicians :-)
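To make the "LaTeX to richly marked fulltext" challenge above a bit more concrete, here is a minimal sketch of its very first step: splitting a LaTeX sentence into tagged text and math segments. This is only an illustration under strong assumptions (inline math delimited by single `$` signs, no nesting or escaped dollars); the names `INLINE_MATH`, `tag_segments` and the sample sentence are hypothetical and not taken from DML-CZ or EuDML.

```python
import re

# Assumption: inline math is delimited by single dollar signs and is not nested.
INLINE_MATH = re.compile(r"\$(.+?)\$")


def tag_segments(latex):
    """Return a list of (kind, content) pairs, where kind is 'text' or 'math'."""
    segments = []
    pos = 0
    for match in INLINE_MATH.finditer(latex):
        # Plain text between the previous math span and this one.
        if match.start() > pos:
            segments.append(("text", latex[pos:match.start()]))
        # The math content itself, without the dollar delimiters.
        segments.append(("math", match.group(1)))
        pos = match.end()
    # Trailing text after the last math span.
    if pos < len(latex):
        segments.append(("text", latex[pos:]))
    return segments


# Hypothetical sample sentence, for illustration only.
sample = r"Let $x^2 + 1$ be irreducible over $\mathbb{Q}$."
tagged = tag_segments(sample)
for kind, content in tagged:
    print(kind, repr(content))
```

Real sources are far messier (display math, macros, bitmaps needing math OCR), so this covers only the lowest layer; the morphology, syntax and semantics levels would have to be built on top of such segments.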