About

Knowledge is being generated at an exponentially increasing rate.

How can we comb through this information rapidly and efficiently to make evidence-informed decisions?

Colandr: the platform

Colandr is a web-based, open access platform for conducting evidence reviews. Colandr can be used by collaborative teams of any size and provides an organizational structure to manage information throughout the entire evidence review process.

Colandr's design is structured around the steps of a systematic evidence synthesis, however, it can be used for all types of reviews/syntheses of documents. Colandr is a two-system platform (see diagram below). System 1 focuses on smart-sorting citations by relevance for inclusion at subsequent stages of the review. System 2 semi-automates classification of included documents for user-defined categories. At each stage, machine learning and natural language processing algorithms work in the background to learn what is relevant to each review and suggest more relevant citations and likely classifications, respectively. At both of these stages, the process is semi-automated - so ultimate decisions for inclusion/exclusion and classification must be made by the individual user. We retain this level of user oversight to ensure transparency in the decision process.

Features

Collaborative team working
Cross-checking between members of a review team
Assistance in structuring search strings
Citation upload in common bibliographic formats (e.g. BibTex and RIS)
De-duplication of citations
Citation screening at title and abstract powered by machine learning (sorting by relevance for each review)
PDF uploading
Data extraction from full texts powered by natural language processing
Export of screening decisions and extracted data in comma-separated value (CSV) format

You can read more about Colandr in our publication in Conservation Biology

Citing Colandr

Please cite Colandr if you use it in your work:

Cheng, SH., Augustin, C., Bethel, A., Gill, D., Anzaroot, S., Brun, J., DeWilde, B., Minnich, R., Garside, R., Masuda, Y., Miller, DC., Wilkie, D., Wongbusarakum, S. and McKinnon, MC. (2018), Using machine learning to advance synthesis and use of conservation and environmental evidence. Conservation Biology, 32: 762-764. doi:10.1111/cobi.1311

The genesis of Colandr

Initially created as a DataKind DataCorps project, The impetus to develop Colandr emerged after a Science for People and Nature Partnership (SNAPP) Working Group on Evidence-Based Conservation led by Madeleine McKinnon, finally wrapped up a systematic map project on the evidence of links between conservation and human well-being. After nearly 2.5 years, the team had screened 35,000 citations, read over 3,000 full-text articles, and extracted data from over 1,000 articles. Systematic evidence syntheses are conducted using an established on a peer-reviewed a priori protocol - ensuring full transparency of methodology and providing a template for updating syntheses so they can serve as dynamic resources. However, faced with the daunting task of updating a nearly 3 year-old synthesis with potentially just as many citations to go through, the team balked and thought - can computers make this task easier?

And so grew the idea for Colandr - how can we apply machine learning and natural language processing algorithms to make this process of synthesis faster, more efficient, and more affordable?

In a collaboration between the SNAPP Evidence-Based Conservation group, DataKind, and Conservation International - a team of data and computer science volunteers spent the better part of 18 months building Colandr as a DataKind DataCorps project, A lot of research went in to the development of Colandr, and we relied heavily on two DataKind data experts, Burton DeWilde and Sam Anzaroot to construct the ranking, screening, and tagging processes. We succeeded due to the project management skills of Bob Minnich and the front-end expertise of Stan Sagalovsky.

These four papers form the backbone of the processes that the DataKind team built: Distributed Representations of Words and Phrases and their Compositionality, GloVe: Global Vectors for Word Representation, Using text mining for study identification in systematic reviews: a systematic review of current approaches, Pinpointing needles in giant haystacks:use of text mining to reduce impractical screening workload in extremely large scoping reviews.

Colandr was launched in July 2017 at the International Congress for Conservation Biology in Cartagena, Colombia.

Colandr is supported by SNAPP, DataKind, Conservation International, and the Center for Biodiversity and Conservation at the American Museum of Natural History. Sam Cheng is the Director for Colandr and runs day-to-day operations and strategic planning. She is a Biodiversity Scientist at the Center for Biodiversity and Conservation at the American Museum of Natural History. Caitlin Augustin, the Senior Director of Product at DataKind, is a key strategic partner for Colandr.

Getting started

Colandr is intended to be user-friendly and community-driven. If you need help getting started with Colandr, please see the how-to & guidance section on this page. In addition, specific queries and issues can be directly to our help line @ [email protected]. In addition, you can also direct questions to the community of users through the Google+ community.

As Colandr is built by and continues to be run by volunteers, we ask your patience as we respond to your questions as quickly as we can. We recognize there are lots of areas of improvement for Colandr - and we are endeavoring to make these fixes and developments in the near future. Please report any issues/bugs using the Bug Reporting form on this page.

Code behind Colandr

The techniques used are natural language processing (generally) and word2vec and GloVe vectors (specifically) with logistic regression models.

Colandr sits on a AWS server hosted by Conservation International.

The front-end can be viewed and cloned in this GitHub repository.

The back-end can be viewed and cloned in this GitHub repository.