Wednesday, 21 August 2013

GBIF should store gridded data and stop converting it to point-radius

Why are modellers still coming to me for data, rather than taking it straight from the Global Biodiversity Information Facility (GBIF)? I believe GBIF and the creators of the Darwin Core standard (Wieczorek et al., 2012) made an error of judgment by adopting the point-radius method of location description above all other forms.

Biodiversity locality data is collected in four forms...
  1. Point data, sometimes with an indication of the accuracy.
  2. Gridded data, where the observation is located to within a predefined grid of a geographic coordinate system.
  3. Area data, where the occurrence is located to a defined area such as a country, province, state, county, nature reserve etc.
  4. Site description data. Ill-defined locality descriptions (e.g. 5 km west of Newcastle), which are often supported by one or more of the former forms of geographic locator.
On the face of it, point data is superior to all other forms of data as it accurately locates the individual and can seemingly be converted to any of the other three forms. This is perhaps why, when you download data from GBIF, all the data is in this format. It is stored in the Darwin Core standard in the fields decimalLatitude, decimalLongitude, verbatimLatitude, verbatimLongitude and coordinatePrecision.
The problem is that the vast majority of biodiversity observations are collected and analysed using gridded data. You couldn’t collect point data for every single individual. For the vast majority of organisms, if there is one individual, it's fairly certain another one will be nearby [1]. Apart from a few rare species and individual specimens, there is little point, and certainly no time, to collect point data. Furthermore, almost all the environmental data for modelling, such as climate, pollution, soil and land-cover, are stored as gridded data. Indeed, even when they are not, they are interpolated and converted to gridded data for species distribution modelling.
Unfortunately, GBIF takes gridded observation data and converts them to point data using the centre of the grid square and an error radius that encloses the grid square (i.e. a circle with an area about 57% larger than the square) [2]. So if you want to convert these data back to gridded data, you either have to choose a larger grid than the original to ensure that the point and the error radius are contained within the grid square [3], or recalculate the edges of the square by treating the radius of the circle as half the diagonal of the grid square it encloses. You can only do this, though, if you know that the data were gridded in the first place; otherwise you end up with squares that overlap each other.
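To make the geometry concrete, here is a minimal sketch of the round trip. It assumes square cells in a planar, projected coordinate system measured in metres (a national grid, say). The function names and the 10 km example cell are hypothetical, chosen only for illustration, and are not part of GBIF or Darwin Core. It shows that the enclosing circle has π/2 times the area of the square: the circle is about 57% larger than the square or, put the other way round, the square is about 36% smaller than the circle.

```python
import math

def grid_to_point_radius(sw_x, sw_y, size):
    """Return the centre of a square cell and the radius of the circle enclosing it."""
    centre = (sw_x + size / 2, sw_y + size / 2)
    radius = size * math.sqrt(2) / 2  # half the diagonal of the square
    return centre, radius

def point_radius_to_grid(centre, radius):
    """Recover the south-west corner and cell size.

    Only valid if the point-radius record really was derived from a grid square.
    """
    size = radius * math.sqrt(2)  # diagonal divided by sqrt(2)
    return centre[0] - size / 2, centre[1] - size / 2, size

# Hypothetical 10 km cell with its south-west corner at (400000, 300000).
centre, radius = grid_to_point_radius(400000, 300000, 10000)
area_ratio = (math.pi * radius ** 2) / (10000 ** 2)  # circle area / square area
recovered = point_radius_to_grid(centre, radius)
```

Nothing in the point-radius record itself tells you whether this inverse is legitimate. Applied to a genuine point observation with an uncertainty radius, it would invent a grid square that never existed.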
This approach stems from the requirements of one community, the museums and herbaria, who geo-reference their collections using the point-radius method (Chapman & Wieczorek, 2006). It ignores the vast majority of data collectors and data users: the ecologists, conservationists and modellers. The Darwin Core standard can handle gridded data, but only rather clumsily, using the footprintWKT field [4]. Most databases containing gridded data hold the position of the south-west corner of the grid square, the size of the grid square and a description of the spatial reference system.
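As an illustration of what the footprintWKT route involves, the sketch below turns the usual database representation (south-west corner plus cell size) into a WKT polygon. The function name is hypothetical and the coordinates are made up. It assumes the cell is square in the units of its own spatial reference system, which would have to be recorded separately, for example in Darwin Core's footprintSRS term.

```python
def grid_cell_to_wkt(sw_x, sw_y, size):
    """Build a WKT POLYGON for a square cell from its south-west corner and size."""
    x0, y0 = sw_x, sw_y
    x1, y1 = sw_x + size, sw_y + size
    # Closed ring, counter-clockwise: SW, SE, NE, NW, back to SW.
    ring = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"

# Hypothetical half-degree cell with its south-west corner at 10 E, 50 N.
wkt = grid_cell_to_wkt(10, 50, 0.5)
print(wkt)  # POLYGON ((10 50, 10.5 50, 10.5 50.5, 10 50.5, 10 50))
```

Three numbers and a reference system become a five-vertex polygon string, which is why I call the approach clumsy.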
GBIF surely wants to be a one-stop shop for biodiversity modelling data, so it should stop converting most of the data to a format that can’t be used without converting it back to the original format, and then only if you are lucky enough to know which format that was in the first place.


1. This is Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things." Tobler, W. (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
2. coordinatePrecision for gridded data in GBIF seems to have been interpreted in various ways and you have to study it from each data provider to understand what it means.
3. If you use a larger grid for modelling you are not looking at data at a different scale as is sometimes suggested, you are just losing definition. If your computer monitor had bigger pixels you wouldn’t say you’re looking at the image at a different scale.
4. The footprintWKT field is not available in GBIF downloads.


Chapman, A.D. and J. Wieczorek (eds). 2006. Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
