Digital Symposium Collection 2000  

LITCHI: Knowledge Integrity Testing for Taxonomic Databases

I. Sutherland, S. Embury, A. Jones, W. Gray, R. White, J. Robinson, F. Bisby, and S. Brandt


Return to Session 7: Demonstrations

Abstract

A taxonomic checklist is a list of the names of species (and other taxa) used within a particular biological database. Since species names are typically used to gain access to data within biological databases, checklists provide a concise representation of the data values that can act as keys when querying such databases. More importantly, species names are also typically used as the join attribute when integrating several biological databases. However, naming species is a subjective activity, and different scientific communities have different ideas about the names that should be used for particular species. These conflicts of opinion arise from the subjective nature of the classification process and from geographical or historical differences in background knowledge. Some communities may use different names for the same species, while others may use the same name to refer to different species. Often there is no single right naming scheme, but some consistent set of names must be used if biological databases are to be integrated. There is therefore a real need for a tool that assists biologists in integrating checklists, prior to the integration of species databases, so that these differences of opinion can be resolved.

The goal of the LITCHI project is to provide a supportive environment in which the integrator of biological databases can detect and resolve naming conflicts. To create such an environment, we have constructed a formal model (a set of constraints) of the way scientific names are used to denote taxa in common taxonomic practice. This model is then used to derive sets of Prolog rules that detect conflicts in taxonomic checklists stored in a relational DBMS. The LITCHI system allows a biologist to import one or more checklists into the central database for analysis. The Prolog rules are executed against the DBMS, using the Prodata interface system, and details of the conflicts found are stored in the DBMS for examination by the biologist. The conflicts can then be resolved, typically by merging or deleting sets of names, so that the checklist reflects the biologist's current preferences. Once all conflicts have been removed, the new, consistent checklist can be exported (in the ALICE format) for use in other systems.
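
As a rough illustration of the kind of check such a constraint model implies, the sketch below flags any name that resolves to more than one accepted name once several checklists are pooled. It is not the LITCHI rule set, which is written in Prolog and is not given in this abstract. The `Entry` record, the `find_conflicts` function, the single rule it applies, and the placeholder names "Aus bus" and "Cus dus" are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of a checklist (hypothetical layout, not the LITCHI schema)."""
    checklist: str  # identifier of the source checklist
    name: str       # scientific name as written in that checklist
    accepted: str   # accepted name this row points to (itself if accepted)


def find_conflicts(entries):
    """Return each name that resolves to more than one accepted name."""
    targets = defaultdict(set)
    for entry in entries:
        targets[entry.name].add(entry.accepted)
    # A name with exactly one target is consistent across the pooled lists.
    return {name: sorted(acc) for name, acc in targets.items() if len(acc) > 1}


# Made-up data: checklist A treats "Cus dus" as a synonym of "Aus bus",
# while checklist B treats "Cus dus" as an accepted name in its own right.
entries = [
    Entry("A", "Aus bus", "Aus bus"),
    Entry("A", "Cus dus", "Aus bus"),
    Entry("B", "Cus dus", "Cus dus"),
]

conflicts = find_conflicts(entries)
for name, accepted in conflicts.items():
    print(f"{name!r} resolves to several accepted names: {accepted}")
```

In this sketch a biologist would resolve the reported conflict by choosing one treatment of the name, for example by merging the two entries, which mirrors the merge-or-delete step described above.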

Copyright(C) 2000 ACM