Talk:Technical challenges, opportunities, goals, strategies


Petr Sojka's panel comments (from experience on DML-CZ and EuDML projects):

Illustrated by Alice and Bob's communication via scientific papers, in an Aristotle-designed form.

Alice's head -> linearization in natural language -> typesetting into paper form -> paper delivery to Bob -> character recognition -> word recognition -> sentence recognition -> semantics/meaning -> Bob's head
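
The receiving half of this chain can be pictured as a sequence of stages, each consuming the output of the previous one. The following is only a toy sketch under strong assumptions: the characters are taken as already recognized, and the names recognize_words, recognize_sentences and run_pipeline are hypothetical, chosen for illustration. Real character recognition and the final semantic step are far harder and are not modeled here.

```python
from functools import reduce


def recognize_words(chars):
    """Group already-recognized characters into words (split on whitespace)."""
    return "".join(chars).split()


def recognize_sentences(words):
    """Group words into sentences, closing one at '.', '!' or '?'."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word[-1] in ".!?":
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(" ".join(current))
    return sentences


def run_pipeline(data, stages):
    """Feed data through each stage in order (hypothetical helper)."""
    return reduce(lambda value, stage: stage(value), stages, data)


# Assumed input: the characters Bob has recognized on the delivered paper.
chars = list("Alice writes. Bob reads.")
sentences = run_pipeline(chars, [recognize_words, recognize_sentences])
print(sentences)
```

Each stage here is lossless, which is exactly what the real chain is not: every arrow in the chain is a place where information can be lost or misread.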

And now we have thousands of A's and B's all around the world communicating via scientific papers.

  • challenges:
    • moving from reference databases (MR, Zbl) to full-text handling (as in PubMed Central, with 2.4 million full texts)
    • conversion from a low-level representation (bitmaps, born-digital PDFs) or LaTeX to a semantic one (richly marked-up full texts), e.g. at the level of morphology, syntax, semantics, or even pragmatics
    • services using this representation
    • math OCR
    • acquisition of tagged data from mathematicians :-)