Biodiversity Specimen Data

a collection of specific use cases of biodiversity specimen data, documented conceptually and, where possible, linked to technical solutions

Code here written by [Erica Krimmel](https://orcid.org/0000-0003-3192-0080). Please see **Use Case: [Find tissue samples](https://biodiversity-specimen-data.github.io/specimen-data-use-case/use-case/identify-data-contacts)** for context.

```{r message=FALSE}
# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)

# Load library for making nice HTML output
library(kableExtra)
```

We need to start with a data frame returned by the `idig_search_records` function that includes the field `recordset` (it is included by default). For simplicity's sake, you can rename your own data frame `records` to most easily reuse the code in this example.

```{r}
# Get data frame to use as example
records <- idig_search_records(rq = list(family = "veneridae",
                                         county = "los angeles county"))
```

Our example `records` data frame looks like this:

```{r echo = FALSE, results = 'asis'}
knitr::kable(head(records)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  scroll_box(width = "100%")
```

We will use attributes attached to our `records` data frame to figure out contact information for each of the recordsets providing data here. For background reading on what we mean by attributes, see [Hadley Wickham's explanation in Advanced R](http://adv-r.had.co.nz/Data-structures.html#attributes). We can use attributes here because the `ridigbio` package structures the results of the `idig_search_records` function in a specific way. The code in this example will not work as expected with a data frame that did not originate from the `idig_search_records` function.
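To make the idea of attributes more concrete, here is a minimal sketch that does not query iDigBio at all. It builds a small mock data frame by hand and attaches a nested list under the name `attribution`, loosely imitating the shape that the code in this example relies on. The object `mock_records` and every value in it (uuids, names, email address) are invented for illustration, and the real attribute returned by `ridigbio` may contain more fields than are shown here.

```{r}
# Hypothetical stand-in for a data frame returned by `idig_search_records`;
# all values below are invented for illustration only
mock_records <- data.frame(
  uuid = c("rec-1", "rec-2", "rec-3"),
  recordset = c("rs-a", "rs-a", "rs-b"),
  stringsAsFactors = FALSE
)

# Attach metadata as an attribute: a list with one element per recordset
attr(mock_records, "attribution") <- list(
  list(uuid = "rs-a",
       name = "Mock Collection A",
       contacts = list(list(first_name = "Ada",
                            last_name = "Example",
                            role = "collection manager",
                            email = "ada@example.org"))),
  list(uuid = "rs-b",
       name = "Mock Collection B",
       contacts = list())
)

# The attribute is not a column of the data frame
names(mock_records)

# It is retrieved by name with `attr()`
mock_attribution <- attr(mock_records, "attribution")

# Pull one field out of each nested list element
recordset_names <- vapply(mock_attribution, function(x) x$name, character(1))
recordset_names
```

Because the metadata travels alongside the data frame rather than inside it, you will not see it when you print or view `records`; you have to ask for it explicitly with `attr()`.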
```{r}
# Count how many records in the data were contributed by each recordset
recordtally <- records %>%
  group_by(recordset) %>%
  tally()

# Get metadata from the attributes of the `records` data frame
collections <- tibble(collection = attr(records, "attribution")) %>%
  # Expand information captured in nested lists
  hoist(collection,
        recordset_uuid = "uuid",
        recordset_name = "name",
        recordset_url = "url",
        contacts = "contacts") %>%
  # Get rid of extraneous attribution metadata
  select(-collection) %>%
  # Expand the nested list of contacts so that there is one row per contact
  unnest_longer(contacts) %>%
  # Expand each contact so that there is one column per contact field
  unnest_wider(contacts) %>%
  # Remove any contacts without an email address listed
  filter(!is.na(email)) %>%
  # Get rid of duplicate contacts within the same recordset
  distinct() %>%
  # Rename some columns
  rename(contact_role = role,
         contact_email = email) %>%
  # Group first and last names together in the same column
  unite(col = "contact_name",
        first_name, last_name,
        sep = " ",
        na.rm = TRUE) %>%
  # Restructure data frame so that there is one row per recordset
  group_by(recordset_uuid) %>%
  mutate(contact_index = row_number()) %>%
  pivot_wider(names_from = contact_index,
              values_from = c(contact_name, contact_role, contact_email)) %>%
  # Include how many records in the data were contributed by each recordset
  left_join(recordtally, by = c("recordset_uuid" = "recordset")) %>%
  # Rearrange columns so that contact information is grouped by person
  select(starts_with("recordset"),
         "recordset_recordtally" = n,
         contains("1"), contains("2"), contains("3"), contains("4"),
         contains("5"), contains("6"), contains("7"), contains("8"),
         contains("9"), contains("10"),
         everything()) %>%
  # Get rid of any rows which don't actually contribute data to `records`;
  # necessary because the attribute metadata by default includes all recordsets
  # in iDigBio that match the `idig_search_records` query, even if you filter
  # or limit those results in your own code
  filter(recordset_uuid %in% records$recordset)
```

Our newly constructed `collections` data frame contains contact information for each of the collections (i.e. recordsets) providing data, and looks like this:

```{r echo = FALSE, results = 'asis'}
knitr::kable(collections) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  scroll_box(height = "400px")
```

We can contact each collection by looking for the most appropriate person listed in each row, often someone with the role of "collection manager" or "curator." Because each collection publishes this kind of metadata separately, the contacts listed sometimes include people who are not directly responsible for managing physical specimens and who may not be able to help you. These people often have roles such as "information architect," "programmer," or "database manager." All contacts listed per recordset have been included here, and it is up to you to decide whom to contact.

It is frequently helpful to provide your collection contact with a spreadsheet listing the specimen records you are interested in. We can generate these spreadsheets automatically, as shown in the code below.

```{r}
# Generate a spreadsheet for each recordset containing only the rows provided by
# that recordset, and named according to the recordset uuid.
for (i in seq_along(collections$recordset_uuid)) {
  # Build the file name for this recordset
  filename <- paste0("records_", collections$recordset_uuid[i], ".csv")
  # Keep only the records contributed by this recordset
  records_subset <- records %>%
    filter(recordset == collections$recordset_uuid[i])
  # Save files to your working directory
  write_csv(records_subset, filename)
}
```

For specific research requests there are many ways you could modify the code demonstrated here to be more helpful, e.g. by including additional fields available through `idig_search_records`. See also the ridigbio function `idig_build_attrib` for a summary of recordsets used by records in the data frame, minus contact information.
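As noted above, the most appropriate contact is often someone whose role is "collection manager" or "curator." One possible way to shortlist such contacts is sketched below. It works on a small, hand-made table with one row per contact rather than on the wide `collections` data frame; the table `mock_contacts`, its values, and the pattern stored in `specimen_roles` are all assumptions made for illustration. Roles are free text supplied by each collection, so a pattern like this will likely need adjusting for your own results.

```{r}
library(dplyr)

# Hypothetical long-format contact table (one row per contact);
# all values are invented for illustration only
mock_contacts <- tibble(
  recordset_uuid = c("rs-a", "rs-a", "rs-b", "rs-b"),
  contact_name = c("Ada Example", "Ben Sample", "Cy Placeholder", "Di Mock"),
  contact_role = c("Programmer", "Collection Manager", "Curator", "Database Manager"),
  contact_email = c("ada@example.org", "ben@example.org",
                    "cy@example.org", "di@example.org")
)

# Assumed pattern for roles likely to be responsible for physical specimens
specimen_roles <- "collection manager|curator"

shortlist <- mock_contacts %>%
  # Flag roles that match the pattern, ignoring case
  mutate(likely_specimen_contact = grepl(specimen_roles, contact_role,
                                         ignore.case = TRUE)) %>%
  # Put flagged contacts first within each recordset
  arrange(recordset_uuid, desc(likely_specimen_contact)) %>%
  # Keep the first-listed contact per recordset
  group_by(recordset_uuid) %>%
  slice_head(n = 1) %>%
  ungroup()

shortlist
```

If no contact in a recordset matches the pattern, this sketch still returns one contact for that recordset, with `likely_specimen_contact` set to `FALSE`, so it is worth checking that column before sending any email.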