The genesis of Colandr
The impetus to develop Colandr emerged after a Science for People and Nature Working Group on Evidence-Based Conservation finally wrapped up a systematic map project on the evidence of links between conservation and human well-being. After nearly 2.5 years, the team had screened 35,000 citations, read over 3,000 full-text articles, and extracted data from over 1,000 articles. Systematic evidence syntheses are conducted according to an established, peer-reviewed a priori protocol, which ensures full transparency of methodology and provides a template for updating syntheses so they can serve as dynamic resources. However, faced with the daunting task of updating a nearly three-year-old synthesis with potentially just as many citations to go through, the team balked and thought: can computers make this task easier?
And so grew the idea for Colandr - how can we apply machine learning and natural language processing algorithms to make this process of synthesis faster, more efficient, and more affordable?
In a collaboration between the SNAPP Evidence-Based Conservation group, DataKind, and Conservation International, a team of data and computer science volunteers spent the better part of 18 months building Colandr. A lot of research went into the development of Colandr, and we relied heavily on two DataKind data experts, Burton and Sam, to construct the ranking, screening, and tagging processes. We succeeded thanks to the project management skills of another volunteer, Bob Minnich. These four papers form the backbone of the processes that the DataKind team built:
- Distributed Representations of Words and Phrases and their Compositionality
- GloVe: Global Vectors for Word Representation
- Using text mining for study identification in systematic reviews: a systematic review of current approaches
- Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews
Colandr was launched in July 2017 at the International Congress for Conservation Biology in Cartagena, Colombia.
If you use Colandr in your work, please cite our paper in Conservation Biology.
As Colandr is built by and continues to be run by volunteers, we ask for your patience as we respond to your questions as quickly as we can. We recognize that Colandr has many areas for improvement. Fixes will remain a task for the future, but we encourage you to submit suggestions and bugs with the form below.
Code behind Colandr
Colandr relies on natural language processing in general and, specifically, on word2vec and GloVe word vectors combined with logistic regression models.
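To illustrate how word vectors and logistic regression can work together for citation screening, here is a minimal sketch in Python. It is not Colandr's actual code: the tiny three-dimensional `WORD_VECTORS` table, the example citations, and all function names are hypothetical stand-ins, and a real system would load pretrained word2vec or GloVe embeddings and use a tested library for the model. Each citation is represented by the average of its word vectors, a logistic regression model is fit on citations a reviewer has already screened, and the remaining citations are ranked by predicted relevance.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for pretrained
# word2vec or GloVe embeddings (real ones have hundreds of dimensions).
WORD_VECTORS = {
    "forest": [0.9, 0.1, 0.0],
    "conservation": [0.8, 0.2, 0.1],
    "livelihoods": [0.7, 0.3, 0.2],
    "protected": [0.9, 0.0, 0.1],
    "income": [0.6, 0.4, 0.1],
    "protein": [0.0, 0.9, 0.2],
    "enzyme": [0.1, 0.8, 0.3],
    "crystal": [0.0, 0.7, 0.5],
    "structure": [0.1, 0.6, 0.6],
}
DIM = 3


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias = [0.0] * DIM, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(DIM):
                grad_w[i] += error * x[i]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(text, weights, bias):
    """Return the modelled probability that a citation should be included."""
    x = embed(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Citations a reviewer has already screened (1 = include, 0 = exclude).
screened = [
    ("forest conservation livelihoods", 1),
    ("protected forest income", 1),
    ("protein crystal structure", 0),
    ("enzyme structure", 0),
]
weights, bias = train_logistic_regression(
    [embed(text) for text, _ in screened],
    [label for _, label in screened],
)

# Rank unscreened citations so that likely includes surface first.
unscreened = ["enzyme crystal", "conservation income livelihoods"]
ranked = sorted(unscreened, key=lambda t: predict(t, weights, bias), reverse=True)
```

In this sketch, `ranked` places the conservation-related citation ahead of the biochemistry one, which is the basic idea behind relevance ranking: reviewers see the most promising citations first, and the model can be refit as more screening decisions accumulate.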
Colandr sits on an AWS server hosted by Conservation International.
The front-end can be viewed and cloned in this GitHub repository.
The back-end can be viewed and cloned in this GitHub repository.
Extensions and useful community tools
Colandr is an open source, open access tool, meaning that you, the user, have free and transparent access to the inner workings of the platform. We specifically chose to develop open source so that others would have the opportunity to add their vast expertise and creativity to further evolve and develop Colandr. As we developed this tool with an awesome team of DataKind volunteers, we were limited in time and resources, so we could not build all the functionality we wanted or fix all the bugs we would have liked to.
However, we welcome you to continue to report bugs that you find here (but please, do not use this form to report problems like forgetting your password) and we will add these to the running list for future development.
Other entrepreneurial users have developed add-ons for Colandr to help streamline the process.
- A script to run in R to batch download PDFs from included citations - available on GitHub. Author: Stephen Wood (SNAPP/TNC)