Thierry Bouche: Position Statement

In the EuDML project, we support the vision of a Digital Mathematics Library that should assemble as much as possible of the digital mathematical corpus in order to

  • help preserve it over the long term,
  • make it available online, possibly after some embargo period (eventual open access),
  • in the form of an authoritative and enduring digital collection,
  • growing continuously with new publisher-supplied content,
  • augmented with sophisticated search interfaces and interoperability services,
  • and developed and curated by a network of institutions.

This means that we want to build a network of library services acting as memory institutions, where the digital items would be physically archived. Each local institution would take care of selecting, acquiring (through digitization or simply by copying born-digital files), developing, maintaining, cataloguing, indexing, and preserving its own collections according to clear scientific and technical policies.

The network of institutions as a whole makes it possible to assemble a global, virtual reference library providing the trusted background of mathematical knowledge needed in any scientific and technological development through a one-stop gateway to the distributed content.

This vision has been implemented to some extent by the EuDML project, which brings together 10 local libraries and a few publishers (one medium-sized commercial publisher is directly involved in the project) with the help of technology partners.

The last international effort before EuDML was the NSF planning grant awarded to Cornell University Library, which concluded in 2004. Six working groups were formed and a report was published, addressing the following issues: economic model, archiving, metadata, content, rights, and technical standards. Of those, economic model, archiving, metadata, content, and rights (5 out of 6!) have proven to be serious impediments to the development of any DML effort since. I claim that we can simply forget 3 of them and that the remaining two are only benign inhibitors nowadays.

  1. Discussing economic models has been a dead end for a decade, so I think we should go ahead without an economic model. This is only partly a joke: some research organizations have invested a lot in building high-quality local DMLs, and I think they would support any further work that helps integrate these into a worldwide facility, since that would give their investment the highest impact. The extra work to achieve this is relatively small, and it can moreover be a good sandbox for computer science research, so these scientific interest groups should be able to support and sustain a global system for their own sake.
  2. Archiving (long-term preservation and access) is probably much better understood now than it was then, and should probably be solved locally, at rather minimal cost.
  3. (Copy)rights are critical inhibiting factors for many digital library operations. However, the landscape has changed, for example with new laws about orphan works. Let's build a good and convincing system on top of the corpus that is now freely available (this is quite large already). We'll talk later on to those copyright owners who are not willing to join the effort, once we have built the infrastructure needed for doing science and scientists have become addicted to it.
  4. Content selection is also a dead end, as it is meaningless from a memory institution's point of view to discard content, for instance the non-math papers from a half-math journal. A collection will be called mathematical, and thus eligible for the DML, if it contains enough material that obviously belongs to the DML. Details of content selection are left to local institutions.
  5. Metadata formats, as well as technical standards, are mature enough now. The EuDML project can be considered successful in the area of metadata. Moreover, it didn't reinvent the wheel but tried to build on the experience of PubMed Central, JSTOR and Portico.

So, in turn, it seems the DML is waiting for a triggering event to go ahead and take advantage of the work already done at numerous places in digitization and content aggregation, and by EuDML in standardization and interoperability. Of course, everybody thinks that the most effective way of getting people to cooperate is to fund cooperation. I'd say it is better to have many fundable cooperation challenges, so what we should build is a very open system that would interoperate with all the content eligible now (i.e. content whose rights, economic model, technological standards, etc. are compatible with such a system), and make it searchable and linkable right away so that the main basic feature is already there. Moreover, the system should be open enough that any content or technology provider could easily submit new contributions, and test new and alternative services very easily.

My bet is that making the DML a testbed for new MKM (mathematical knowledge management) and DL (digital library) technology would make it self-sustaining, as each technology and content team investing some resources in enhancing the system would be bound to invest the extra percent needed to maintain the system and its global usefulness. Some MKM and DL challenges relevant to the DML are:

  • Design and implement a very open, scriptable, pluggable, interoperable, yet secure enough system,
  • Make it ready for some kind of community management,
  • Design a service platform based on the aggregated content that allows unexpected services to run in order to upgrade the content and its metadata, create links, and enable new search paths and technology, while preserving the rights of content owners, if any.
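
To make the last challenge a little more concrete, here is a minimal sketch of what a pluggable service layer might look like. Everything in it is an assumption made for illustration: the names `Record`, `ServiceRegistry` and `title_length` are hypothetical, and the rights rule (services only run on records flagged as open) is just one simple policy among many possible ones, not something EuDML prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Record:
    """A hypothetical DML item: identifier, metadata, and a rights flag."""
    identifier: str
    metadata: Dict[str, str]
    open_access: bool


# A service is any callable that inspects a record and returns metadata additions.
Service = Callable[[Record], Dict[str, str]]


class ServiceRegistry:
    """Holds third-party enrichment services and applies them to records."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Any content or technology provider can plug in a new service by name.
        self._services[name] = service

    def enrich(self, record: Record) -> Record:
        # Assumed rights policy: restricted records are returned untouched.
        if not record.open_access:
            return record
        merged = dict(record.metadata)
        for name, service in self._services.items():
            for key, value in service(record).items():
                # Keys are namespaced by service so curated metadata is never overwritten.
                merged[f"{name}:{key}"] = value
        return Record(record.identifier, merged, record.open_access)


def title_length(record: Record) -> Dict[str, str]:
    """A toy service: reports the length of the title, if there is one."""
    return {"title_length": str(len(record.metadata.get("title", "")))}
```

A provider would call `registry.register("stats", title_length)` and then `registry.enrich(record)`; the original record is left unchanged, and each service's output sits under its own prefix, so an experimental service can be added or withdrawn without touching the curated metadata.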