Biodiversity locality data is collected in four forms:
- Point data, sometimes with an indication of the accuracy.
- Gridded data, where the observation is located to within a predefined grid of a geographic coordinate system.
- Area data, where the occurrence is located to a defined area such as a country, province, state, county, nature reserve etc.
- Site description data, i.e. ill-defined locality descriptions (e.g. "5 km west of Newcastle"), which are often supported by one or more of the preceding forms of geographic locator.
The problem is that the vast majority of biodiversity observations are collected and analysed using gridded data. You couldn’t collect point data for every single individual. For the vast majority of organisms, if there is one individual it is fairly certain that another will be nearby [1]. Except for a few rare species and for individual specimens, there is little point in collecting point data, and certainly no time to do so. Furthermore, almost all the environmental data used for modelling, such as climate, pollution, soil and land cover, are stored as gridded data. Even when they are not, they are interpolated and converted to gridded data for species distribution modelling.
Unfortunately, GBIF takes gridded observation data and converts them to point data, using the centre of the grid square and an error radius that encloses the grid square (i.e. a circle covering about 57% more area than the square) [2]. So if you want to convert these data back to gridded data, you either have to choose a larger grid than the original to ensure that the point and the error radius are contained within the grid square [3], or reconstruct the edges of the original square by treating the error radius as half the diagonal of the grid square that the circle encloses. You can only do this, though, if you know that the data was gridded in the first place; otherwise you end up with squares that overlap each other.
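The geometry of this round trip can be sketched in a few lines of Python. This is only an illustration: it assumes a projected coordinate system with units of metres, it assumes the record really did start life as a grid square, and the function names are made up for the example rather than taken from any GBIF tool. Note that the enclosing circle has π/2 ≈ 1.57 times the area of the square; equivalently, the square covers about 64% of the circle.

```python
import math

def grid_to_point_radius(sw_x, sw_y, size):
    """Centre point and enclosing radius for a grid square.

    sw_x, sw_y: south-west corner; size: length of one side (same units).
    """
    centre_x = sw_x + size / 2
    centre_y = sw_y + size / 2
    radius = size * math.sqrt(2) / 2  # half the diagonal of the square
    return centre_x, centre_y, radius

def point_radius_to_grid(centre_x, centre_y, radius):
    """Recover the south-west corner and side length of the square.

    Only valid if the radius really is half the diagonal of a grid square.
    """
    size = radius * math.sqrt(2)
    return centre_x - size / 2, centre_y - size / 2, size

def circle_to_square_area_ratio(size):
    """Area of the enclosing circle divided by the area of the square."""
    _, _, radius = grid_to_point_radius(0, 0, size)
    return math.pi * radius ** 2 / size ** 2
```

For a 10 km square with its south-west corner at (430000, 560000), the first function gives a centre of (435000, 565000) and a radius of roughly 7071 m; feeding those values to the second function returns the original corner and size, up to floating-point rounding.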
This approach stems from the requirements of one community, the museums and herbaria, who geo-reference their collections using the point-radius method (Chapman & Wieczorek, 2006). It ignores the vast majority of data collectors and data users: the ecologists, conservationists and modellers. The Darwin Core standard can handle gridded data, but rather clumsily, using the footprintWKT field [4]. Most databases containing gridded data hold the position of the south-west corner of the grid square, the size of the grid square and a description of the spatial reference system.
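To show what the footprintWKT route involves, here is a minimal sketch that turns the usual database representation (south-west corner plus grid size) into a WKT polygon. The function name is hypothetical, and the coordinates are assumed to be in whatever spatial reference system the database records alongside them; nothing here converts between reference systems.

```python
def grid_square_to_wkt(sw_x, sw_y, size):
    """Build a WKT POLYGON for a grid square from its south-west corner and size."""
    x0, y0 = sw_x, sw_y
    x1, y1 = sw_x + size, sw_y + size
    # Ring runs SW -> SE -> NE -> NW and closes back on SW.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON (({ring}))"
```

With integer inputs, `grid_square_to_wkt(430000, 560000, 10000)` returns `POLYGON ((430000 560000, 440000 560000, 440000 570000, 430000 570000, 430000 560000))`. Five coordinate pairs to describe what the source database holds as three numbers gives a sense of why this feels clumsy.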
GBIF surely wants to be a one-stop shop for biodiversity modelling data, so it should stop converting most of the data to a format that can’t be used without converting it back to the original format, assuming you are lucky enough to know what that was in the first place.
Footnotes
1. This is Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things." Tobler, W. (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
2. The coordinatePrecision field for gridded data in GBIF seems to have been interpreted in various ways, so you have to check how each data provider uses it to understand what it means.
3. If you use a larger grid for modelling you are not looking at the data at a different scale, as is sometimes suggested; you are just losing definition. If your computer monitor had bigger pixels you wouldn’t say you were looking at the image at a different scale.
4. The footprintWKT field is not available in GBIF downloads.
Chapman, A.D. and J. Wieczorek (eds). 2006. Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715