Showing posts with label data publishing. Show all posts

Sunday, 17 August 2014

The problem of data discovery for invasive species

At a recent conference on invasive species someone from the audience requested that there should be fewer databases, to which there was a general muttering of agreement. Can you imagine if someone had said the same thing about books? Surely only an anti-intellectual would want fewer books. However, I perfectly understand the frustrations that lead to this request. New databases and websites on alien species are launched all the time. They never have precisely the same remit, but they always have overlapping interests with other databases. For a time-pressed user, who is scouring the internet for a simple answer to a simple question, there is a bewildering array of data sources. Some are highly visible, others are hidden; some are nice looking, but superficial, while others are mines of information but hard to navigate. The same thing could be said of books, but people expect more of the internet, where they were promised a bright future of connectedness and interoperability.

Yet we are not going to get fewer databases any time soon, because current funding models and the missions of providers restrict us. Funders want to see clear results from their investment and providers want to create something new for their effort. It is currently hard to achieve those aims if they are tied to a single product. There are other problems, too. Single products don't handle differences of opinion well, and local issues related to culture and language are not well suited to a monolithic approach. Furthermore, data is best managed by the people with most interest in it. They have the most incentive to gather new data and they are the experts in their specialisms.

We all want discoverable, accurate, up-to-date and sustainably managed information on alien species, but for all the reasons I've mentioned we can't yet have fewer databases. Nevertheless, there are many things we can do to improve the situation, which include federation, openness and standardisation.

Federation

There is duplication of effort in our invasive species databases. Taxonomic names, common names, references, observation data and specimen data are repeatedly entered and curated by each database independently. This does not have to be the case. We can federate out some of our work to providers who specialise in those data. For example, most modern scientific publications have a Digital Object Identifier (DOI). This simple but unique identifier is the key to all the bibliographic information on a publication. DOIs are maintained by the publishing industry, which will do a better job of looking after this domain of data than we will. In our databases it is only necessary to store the DOI and derive the other information from the DOI resolvers. ORCIDs are another example; they are an open, self-administered system for uniquely identifying a scientist. The records are maintained by the scientists themselves, who are best placed to do the job, and we potentially only need to store the ORCID in our database.
Federation has the potential to reduce costs, while at the same time improving standards, improving sustainability and helping us to concentrate on our core interest of invasion biology.
Nevertheless, if we federate some services we need to trust those services to provide the information we need, reliably, for the long term and at a price we can afford. DOIs and ORCIDs are supported by the publishing industry and by large academic institutions. Other infrastructures to which we could federate responsibilities might be the Global Names Architecture and GBIF. These infrastructures need communities such as ours to justify their existence, but similarly we might benefit considerably from their domain expertise and investment.
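As a rough illustration of storing only the identifier, here is a minimal Python sketch. It assumes the public resolver at https://doi.org/ and the ISO 7064 MOD 11-2 check character that ORCID iDs carry as their last character. The function names are my own and are not part of any library, and the DOI and ORCID iDs in the usage note are only sample values.

```python
import re
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver (assumed here)


def doi_url(doi: str) -> str:
    """Build a resolver URL from a bare DOI; this is all a database needs to keep."""
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; percent-encode anything unsafe.
    return DOI_RESOLVER + quote(doi, safe="/")


def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for an ORCID iD."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check character of an ORCID iD before storing it."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]
```

For example, doi_url("10.1000/182") gives a link that the resolver redirects to the publication, and is_valid_orcid("0000-0002-1825-0097") accepts a well-formed iD while rejecting one with a mistyped final character. The check only catches typing errors; it does not confirm that the iD belongs to a real person, which is exactly the part we would leave to the ORCID registry.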

Standards

Working within a framework of standards for data quality can be frustrating, particularly in an emerging discipline where standards often seem to be unnecessarily constraining. There is a temptation for everyone to invent their own ‘standard’, yet the advantages of standardisation are numerous. We should always be looking to other disciplines to reuse and build upon their standards. Standards that are extensible can provide a flexible approach.
The ability to combine digital resources is a fine goal of standardisation, but to do it we need to understand each other’s data. Wherever possible, we need to explain and annotate our data. Using common standards is a good first step, but we also need to ensure that the metadata is kept up-to-date and accurate. Many people will have noticed the problem with data aggregators, where the meaning of data can subtly change as it is transferred from one database to another. The creation of domain ontologies can clarify the meaning of terms without necessarily constraining the development of new data sources. This is a comparatively new field within computer science, but one that should be explored for invasion biology.

Openness

Even if we don't want it, we get copyright automatically and are stuck with it for many years after our death, unless we ensure that each of our works is openly licensed. You can't deny the usefulness of open resources such as Wikipedia. And yet, it is perhaps most remarkable for its success in mobilising data providers. Scientists are often afraid of openness, thinking that others will ‘steal’ their work and not give them sufficient credit. However, often the reverse is true. Open licensing promotes data discovery, and experts can use it to promote themselves through their expertise rather than through the data they hold. Work still needs to be done on providing traceable citations for data, but scientists already have mechanisms for doing this, such as so-called data publications. Scientists also need to become more educated about copyright, as data per se generally can’t be copyrighted.

To conclude, making data more accessible and discoverable is not an easy task, yet the tools and practices to do it are available to us. What is needed is a change in culture, not necessarily towards having monolithic databases, but towards sharing, openness and connectivity. It will take some investment of resources and progress might seem slow at first, but eventually we can build a global infrastructure for invasive species that satisfies our needs, and we don’t necessarily need to have fewer databases to do it.

This work by Quentin Groom is licensed under a Creative Commons Attribution 3.0 Unported License.

Saturday, 14 June 2014

The Bouchout Declaration for Open Biodiversity Knowledge Management



On the 12th June 2014 the Bouchout Declaration was launched at Bouchout Castle in the grounds of the Botanical Garden Meise, Belgium. The declaration aims to promote openness of biodiversity data and encourage digital access to those data. The original signatories included more than 50 institutions from all over the world. Many were influential institutions such as Kew Gardens in the UK; Berlin Botanic Garden in Germany; Naturalis in the Netherlands and the Natural History Museum, Paris.

I encourage you to sign up to the declaration and support its values, either as an institution or an individual.

Below I've given five reasons why you should sign the declaration and five Dos and Don’ts of data openness… 


Five reasons to sign the Bouchout Declaration

  1. Good scientists show the evidence for their assertions
  2. Modelling and protecting the biosphere is impossible without large amounts of high quality data
  3. We need evidence-based, not opinion-based, policies
  4. Small amounts of data have little value, but large amounts of pooled data are priceless
  5. These data should not be lost; they will have just as much value in the future

Five DOs of digital openness

  1. Publish your data, so that people can cite you
  2. Ensure your data is available in an agreed standard
  3. Make sure your data is well described so that it can be discovered and is useable
  4. Deposit your data in a long-term repository
  5. Promote the use of your data to others, who might not know how useful it is

Five DON'Ts of digital openness

  1. Don’t sit on your data for years because you think you might make use of it one day
  2. Don’t display your data, but make it difficult for people to download
  3. Don’t hold on to it because you think it has commercial value, unless you actually have a business plan for its exploitation
  4. Don’t make your data accessible only to the IT literate
  5. Don’t think your data is insignificant

Delegates of the pro-iBiosphere Final Event at Bouchout Castle


This work by Quentin Groom is licensed under a Creative Commons Attribution 3.0 Unported License.