Code here written by Erica Krimmel. Please see Use Case: Find tissue samples for context.
# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)
# Load library for making nice HTML output
library(kableExtra)
We need to start with a data frame returned by the `idig_search_records` function that includes the field `recordset` (it is included by default). For simplicity's sake, you can rename your own data frame `records` to reuse the code in this example most easily.
# Get data frame to use as example
records <- idig_search_records(rq = list(family = "veneridae",
                                         county = "los angeles county"))
Our example `records` data frame looks like this:
uuid | occurrenceid | catalognumber | family | genus | scientificname | country | stateprovince | geopoint.lon | geopoint.lat | datecollected | collector | recordset |
---|---|---|---|---|---|---|---|---|---|---|---|---|
00ea8cd3-68ee-48f3-b0e4-fa556bccd576 | urn:catalog:ucmp:i:237778 | 237778 | veneridae | saxidomus | saxidomus nuttalli | united states | california | NA | NA | NA | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 |
01f20e87-ba23-4edd-8a98-1c5bd47146e6 | urn:catalog:ucmp:i:233957 | 233957 | veneridae | amiantis | amiantis callosa | united states | california | NA | NA | NA | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 |
02210ee1-adb2-4657-b665-5e1120d5344c | urn:catalog:ucmp:i:237218 | 237218 | veneridae | globivenus | globivenus fordii | united states | california | NA | NA | NA | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 |
027eb9f9-80c2-4c53-9438-ae52487ebbbc | urn:catalog:ucmp:i:245128 | 245128 | veneridae | saxidomus | saxidomus nuttalli | united states | california | NA | NA | NA | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 |
02877ef3-7948-48f7-b579-1acaaaab38d5 | urn:catalog:ucmp:i:231776 | 231776 | veneridae | amiantis | amiantis callosa | united states | california | NA | NA | NA | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 |
03e33d03-6170-4093-8c8f-22c13c232048 | http://arctos.database.museum/guid/dmns:inv:14809?seid=2048855 | dmns:inv:14809 | veneridae | leukoma | leukoma laciniata | united states | california | -118.1336 | 33.77104 | NA | collector(s): james e. steadman | 1e86442f-35a5-4e7b-9a38-4599e4d3b510 |
We will use attributes attached to our `records` data frame to find contact information for each of the recordsets providing data here. For background reading on what we mean by attributes, see Hadley Wickham's explanation in Advanced R. We can use attributes here because the `ridigbio` package structures the results of the `idig_search_records` function in a specific way. The code below will not work as expected with a data frame that did not originate from the `idig_search_records` function.
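As a minimal illustration of what an attribute is, the sketch below attaches a made-up `attribution` attribute to a small data frame and reads it back with `attr()`. The data frame `toy_records` and the contents of the list are hypothetical stand-ins; only the attribute name `attribution` is taken from the code in this example, and the real attribute carries more fields than shown here.

```r
# Hypothetical stand-in for a data frame returned by idig_search_records()
toy_records <- data.frame(uuid = c("a", "b"),
                          recordset = c("rs-1", "rs-1"))

# Attach metadata to the data frame without adding a column
attr(toy_records, "attribution") <- list(
  list(uuid = "rs-1", name = "Example Collection")
)

# Read the metadata back; the rows and columns are unchanged
toy_attribution <- attr(toy_records, "attribution")
toy_attribution[[1]]$name

# A data frame that never had this attribute returns NULL instead
missing_attribution <- attr(data.frame(x = 1), "attribution")
is.null(missing_attribution)
```

The `NULL` result in the last step is why the code in this example only works on a data frame that came from `idig_search_records`.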
# Count how many records in the data were contributed by each recordset
recordtally <- records %>%
  group_by(recordset) %>%
  tally()
# Get metadata from the attributes of the `records` data frame
collections <- tibble(collection = attr(records, "attribution")) %>%
  # Pull recordset-level fields out of the nested lists into columns
  hoist(collection,
        recordset_uuid = "uuid",
        recordset_name = "name",
        recordset_url = "url",
        contacts = "contacts") %>%
  # Get rid of extraneous attribution metadata
  select(-collection) %>%
  # Expand the nested contact lists so that there is one row per contact
  unnest_longer(contacts) %>%
  # Expand each contact so that there is one column per contact field
  unnest_wider(contacts) %>%
  # Remove any contacts without an email address listed
  filter(!is.na(email)) %>%
  # Get rid of duplicate contacts within the same recordset
  distinct() %>%
  # Rename some columns
  rename(contact_role = role, contact_email = email) %>%
  # Group first and last names together in the same column
  unite(col = "contact_name",
        first_name, last_name,
        sep = " ",
        na.rm = TRUE) %>%
  # Restructure data frame so that there is one row per recordset
  group_by(recordset_uuid) %>%
  mutate(contact_index = row_number()) %>%
  pivot_wider(names_from = contact_index,
              values_from = c(contact_name, contact_role, contact_email)) %>%
  # Include how many records in the data were contributed by each recordset
  left_join(recordtally, by = c("recordset_uuid" = "recordset")) %>%
  # Rearrange columns so that contact information is grouped by person;
  # `ends_with("_1")` avoids also matching `_10`, as `contains("1")` would
  select(starts_with("recordset"),
         "recordset_recordtally" = n,
         ends_with("_1"),
         ends_with("_2"),
         ends_with("_3"),
         ends_with("_4"),
         ends_with("_5"),
         ends_with("_6"),
         ends_with("_7"),
         ends_with("_8"),
         ends_with("_9"),
         ends_with("_10"),
         everything()) %>%
  # Get rid of any rows which don't actually contribute data to `records`;
  # necessary because the attribute metadata by default includes all recordsets
  # in iDigBio that match the `idig_search_records` query, even if you filter
  # or limit those results in your own code
  filter(recordset_uuid %in% records$recordset)
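To see what the `hoist`, `unnest_longer`, and `unnest_wider` steps do in isolation, here is a small sketch on a hand-built nested list. The list `toy_attribution`, its field values, and the extra `description` field are all hypothetical. The list only imitates the general shape assumed by the code above (one list per recordset, with a nested list of contacts) and is not the actual structure returned by iDigBio.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical attribution list: one recordset with two contacts
toy_attribution <- list(
  list(uuid = "rs-1",
       name = "Example Collection",
       description = "Not needed downstream",
       contacts = list(
         list(first_name = "Ada", last_name = "Lee",
              role = "Curator", email = "ada@example.org"),
         list(first_name = "Sam", last_name = "Roe",
              role = "Programmer", email = "sam@example.org")
       ))
)

toy_contacts <- tibble(collection = toy_attribution) %>%
  # Pull named elements out of each list into their own columns
  hoist(collection,
        recordset_uuid = "uuid",
        recordset_name = "name",
        contacts = "contacts") %>%
  # Drop whatever is left of the original list column, if it still exists
  select(-any_of("collection")) %>%
  # One row per contact
  unnest_longer(contacts) %>%
  # One column per contact field
  unnest_wider(contacts)

toy_contacts
```

The result has one row per contact, with `first_name`, `last_name`, `role`, and `email` as ordinary columns next to the recordset fields. This is the shape the later `rename`, `unite`, and `pivot_wider` steps operate on.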
Our newly constructed `collections` data frame contains contact information for each of the collections (i.e., recordsets) providing data, and looks like this:
recordset_uuid | recordset_name | recordset_url | recordset_recordtally | contact_name_1 | contact_role_1 | contact_email_1 | contact_name_2 | contact_role_2 | contact_email_2 | contact_name_3 | contact_role_3 | contact_email_3 | contact_name_4 | contact_role_4 | contact_email_4 | contact_name_5 | contact_role_5 | contact_email_5 | contact_name_6 | contact_role_6 | contact_email_6 | contact_name_7 | contact_role_7 | contact_email_7 | contact_name_8 | contact_role_8 | contact_email_8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5ab348ab-439a-4697-925c-d6abe0c09b92 | University of California Museum of Paleontology |  | 625 | Joyce Gross | Programmer | jdeck@berkeley.edu | Patricia Holroyd | Museum Scientist | pholroyd@berkeley.edu | Diane Erwin | Senior Museum Scientist for Paleobotany | dmerwin@berkeley.edu | Erica Clites | Museum Scientist for Invertebrate Paleontology | eclites@berkeley.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
5082e6c8-8f5b-4bf6-a930-e3e6de7bf6fb | LACM Invertebrate Paleontology | https://nhm.org/site/research-collections/invertebrate-paleontology | 63 | Austin Hendy | Collection Manager | ahendy@nhm.org | William Mertz | Database Manager | wmertz@nhm.org | Kevin Love | NA | klove@flmnh.ufl.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
6bb853ab-e8ea-43b1-bd83-47318fc4c345 | UF Invertebrate Zoology |  | 54 | Gustav Paulay | Curator of Invertebrate Zoology | paulay@flmnh.ufl.edu | Warren Brown | IT Director | netadmin@flmnh.ufl.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
bd61c458-b865-4b05-9f1f-735c49066e55 | CAS Invertebrate Zoology (IZ) | http://www.calacademy.org/scientists/izg-collections | 26 | Stanley Blum | Research Information Manager | sblum@calacademy.org | Jon Fong | Programmer | jfong@calacademy.org | Christina Piotrowski | IZ Collections Manager, Department of Invertebrate Zoology & Geology | CPiotrowski@calacademy.org | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
41b119de-f745-482d-be42-a0155bc76e5d | CMC Cincinnati Museum Center Invertebrate Paleontology |  | 16 | Brenda Hunda | Curator of Invertebrate Paleontology | BHunda@cincymuseum.org | Anne Kling | Manager, Collection Databases and Websites | AKling@cincymuseum.org | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
e8a10a16-86af-42b2-be40-9d6a1b21859a | CHAS Malacology Collection (Arctos) | http://www.naturemuseum.org/the-museum/collections/invertebrates | 13 | Dawn Roberts | Director of Collections | droberts@naturemuseum.org | Erica Krimmel | Assistant Collections Manager | ekrimmel@naturemuseum.org | David Bloom | Coordinator | dbloom@vertnet.org | John Wieczorek | Information Architect | tuco@berkeley.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
774a153b-e556-47f6-95d1-bab49e61cc58 | ANSP Malacology |  | 7 | Collections Management | Biodiversity Informatics Manager | bdim@ansp.org | Biodiversity Informatics Manager | NA | bdim@ansp.org | Collection Management | Biodiversity Informatics Manager | bdim@ansp.org | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1ba0bbad-28a7-4c50-8992-a028f79d1dc5 | University of Florida Invertebrate Paleontology |  | 6 | Roger Portell | Collection Manager | portell@flmnh.ufl.edu | Office of Museum Technology OMT | OMT | netadmin@flmnh.ufl.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1e86442f-35a5-4e7b-9a38-4599e4d3b510 | DMNS Marine Invertebrate Collection (Arctos) | http://www.dmns.org/science/collections/dmns-zoology-collections | 5 | Paula Cushing | Curator of Invertebrate Zoology | paula.cushing@dmns.org | Laura Russell | VertNet Programmer | larussell@vertnet.org | David Bloom | VertNet Coordinator | dbloom@vertnet.org | John Wieczorek | Information Architect | tuco@berkeley.edu | Dusty McDonald | Arctos Database Programmer | dlmcdonald@alaska.edu | Phyllis Sharp | Departmental Associate | sharpphyl@gmail.com | Bryan Johnson | Departmental Associate | spiralsofthenautilus@gmail.com | NA | NA | NA |
137ed4cd-5172-45a5-acdb-8e1de9a64e32 | Invertebrate Paleontology Division, Yale Peabody Museum |  | 3 | Larry Gall | Head, Computer Systems Office | lawrence.gall@yale.edu | Susan Butts | Division of Invertebrate Paleontology | susan.butts@yale.edu | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
97058091-eb35-401b-b286-18465761f832 | Delaware Museum of Natural History – Mollusks | http://www.delmnh.org/mollusks/ | 1 |  | NA | invertadmin@asu.edu | Elizabeth Shea | NA | eshea@delmnh.org | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
a6eee223-cf3b-4079-8bb2-b77dad8cae9d | NMNH Extant Specimen Records | http://collections.nmnh.si.edu | 1 | Thomas Orrell | NMNH Informatics | orrellt@si.edu | Chris Tuccinardi | Information Management | tuccinar@si.edu | Karen Reed | Data Manager | reedk@si.edu | Jessica Bird | Collections Information Manager | birdj@si.edu | Jeff Williams | Collection Manager | williamsjt@si.edu | Kenneth Tighe | Database Coordinator | tighek@si.edu | Brian Schmidt | Museum Specialist | schmidtb@si.edu | Ingrid Rochon | Scientific Data Manager | rochoni@si.edu |
We can contact each collection by looking for the most appropriate person listed in each row, often someone with the role of “collection manager” or “curator.” Because each collection publishes this kind of metadata separately, sometimes the contacts listed also include people who are not directly responsible for managing physical specimens, and who may not be able to help you. These people often have roles such as “information architect,” “programmer,” or “database manager.” All contacts listed per recordset have been included here, and it is up to you to decide who to reach out to.
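If you would rather shortlist contacts by role than scan each row by eye, one possible approach is to filter a one-row-per-contact table with a pattern match on the role. The sketch below is illustrative only: the table `toy_contact_rows` is hypothetical, and the pattern in `specimen_roles` is an assumption based on the role names mentioned above, not an iDigBio vocabulary.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical contacts in long format (one row per contact)
toy_contact_rows <- tibble(
  recordset_uuid = c("rs-1", "rs-1", "rs-2"),
  contact_name   = c("Ada Lee", "Sam Roe", "Kim Park"),
  contact_role   = c("Curator of Mollusks", "Programmer", NA),
  contact_email  = c("ada@example.org", "sam@example.org", "kim@example.org")
)

# Assumed keywords for people likely to manage physical specimens
specimen_roles <- regex("collections? manager|curator", ignore_case = TRUE)

# Keep contacts whose role matches; rows with a missing role are dropped
likely_contacts <- toy_contact_rows %>%
  filter(str_detect(contact_role, specimen_roles))

likely_contacts
```

Roles are free text and often missing, so a recordset can end up with no match at all (`rs-2` here). Treat the filtered table as a shortlist and fall back on the full list of contacts when a recordset drops out.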
It is frequently helpful to provide your collection contact with a spreadsheet listing the specimen records you are interested in. We can generate these spreadsheets automatically, as shown in the code below.
# Generate a spreadsheet for each recordset containing only the rows provided by
# that recordset, and named according to the recordset uuid.
for (i in seq_along(collections$recordset_uuid)) {
  # Build the file name for the current recordset only
  filename <- paste0("records_", collections$recordset_uuid[i], ".csv")
  recordset_rows <- records %>%
    filter(recordset == collections$recordset_uuid[i])
  # Save files to your working directory
  write_csv(recordset_rows, filename)
}
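An alternative to an explicit loop is to let `dplyr` split the data by recordset and write one file per group. The sketch below is one possible variation and not the approach used above. It runs on a hypothetical `toy_records` data frame and writes to a temporary directory (`out_dir`) rather than your working directory.

```r
library(dplyr)
library(readr)
library(tibble)

# Hypothetical records from two recordsets
toy_records <- tibble(
  uuid = c("a", "b", "c"),
  recordset = c("rs-1", "rs-1", "rs-2")
)

# Write to a temporary directory so the example leaves no files behind
out_dir <- tempdir()

toy_records %>%
  group_by(recordset) %>%
  # .x holds the rows for one recordset, .y holds that group's key;
  # .keep = TRUE retains the `recordset` column in each written file
  group_walk(
    ~ write_csv(.x, file.path(out_dir, paste0("records_", .y$recordset, ".csv"))),
    .keep = TRUE
  )

# List the files that were created
list.files(out_dir, pattern = "^records_.*\\.csv$")
```

Grouping on the `recordset` column of the records themselves means a file is written for every recordset present in the data, whether or not it has a row in `collections`.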
For specific research requests, there are many ways you could modify the code demonstrated here to be more helpful, e.g., by including additional fields available through `idig_search_records`. See also the `ridigbio` function `idig_build_attrib` for a summary of recordsets used by records in the data frame, minus contact information.