iDigBio API info
R packages we like
Essential R packages
- tidyverse: must-have suite of packages for data wrangling and analysis; its member packages include:
  - dplyr for data management
  - tidyr for tidying your data
  - ggplot2 for data visualization
  - stringr for string filtering and manipulation (recommended by @frog_phylo)
- bdchecks: performs data checks on biodiversity data
- bdclean: cleans biodiversity data
- bddwc: standardizes data to Darwin Core (DwC) format
- bdverse: suite of packages to facilitate biodiversity science for “inexperienced R users”
- BIEN: access to the Botanical Information and Ecology Network database; see tutorial
- CoordinateCleaner: spatial and temporal data quality
- rgbif: access to data from GBIF via the GBIF API
- ridigbio: access to data from the iDigBio Portal via the iDigBio API
- scrubr: biodiversity occurrence data quality
- spocc: access to multiple biodiversity occurrence sources, including the iDigBio Portal and GBIF, via APIs
- spThin: tools for reducing spatial sampling bias of occurrence records (recommended by @maverickpandion)
- taxize: access to multiple taxonomic data sources via APIs
- taxizedb: access to taxonomic data sources via local database
- wallace: “R-based platform for reproducible modeling of species niches and distributions”; note that this requires rJava
Other important R packages
- biogeo: spatial data quality
- checkpoint: managing packages for reproducibility
- dismo: species distribution modeling
- ggmap: spatial visualization with ggplot2
- knitr: report generation from R Markdown documents
- lubridate: cleaning and converting dates
- packrat: managing packages
- rOpenSci: “Our packages are carefully vetted, staff- and community-contributed R software tools that lower barriers to working with scientific data sources and data that support research applications on the web”
Links for more learning
- OpenRefine: can be used to clean data (e.g., Baldwin et al. 2017)
- Biodiverse: “tool for the spatial analysis of diversity using indices based on taxonomic, phylogenetic, trait and matrix-based (e.g. genetic distance) relationships, as well as related environmental and temporal variations”
- Glosario: a glossary of terms related to programming, designed for beginners to reference