Identify people to contact based on data provided, using R and iDigBio

Code here written by Erica Krimmel. Please see Use Case: Find tissue samples for context.

# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)

# Load library for making nice HTML output
library(kableExtra)

We need to start with a data frame returned by the idig_search_records function that includes the field recordset (it is included by default). For simplicity's sake, you can rename your own data frame to records so that you can reuse the code in this example with minimal changes.

# Get data frame to use as example
records <- idig_search_records(rq = list(family = "veneridae", 
                                         county = "los angeles county"))

Our example records data frame looks like this:

uuid	occurrenceid	catalognumber	family	genus	scientificname	country	stateprovince	geopoint.lon	geopoint.lat	datecollected	collector	recordset
00ea8cd3-68ee-48f3-b0e4-fa556bccd576	urn:catalog:ucmp:i:237778	237778	veneridae	saxidomus	saxidomus nuttalli	united states	california	NA	NA	NA	NA	5ab348ab-439a-4697-925c-d6abe0c09b92
01f20e87-ba23-4edd-8a98-1c5bd47146e6	urn:catalog:ucmp:i:233957	233957	veneridae	amiantis	amiantis callosa	united states	california	NA	NA	NA	NA	5ab348ab-439a-4697-925c-d6abe0c09b92
02210ee1-adb2-4657-b665-5e1120d5344c	urn:catalog:ucmp:i:237218	237218	veneridae	globivenus	globivenus fordii	united states	california	NA	NA	NA	NA	5ab348ab-439a-4697-925c-d6abe0c09b92
027eb9f9-80c2-4c53-9438-ae52487ebbbc	urn:catalog:ucmp:i:245128	245128	veneridae	saxidomus	saxidomus nuttalli	united states	california	NA	NA	NA	NA	5ab348ab-439a-4697-925c-d6abe0c09b92
02877ef3-7948-48f7-b579-1acaaaab38d5	urn:catalog:ucmp:i:231776	231776	veneridae	amiantis	amiantis callosa	united states	california	NA	NA	NA	NA	5ab348ab-439a-4697-925c-d6abe0c09b92
03e33d03-6170-4093-8c8f-22c13c232048	http://arctos.database.museum/guid/dmns:inv:14809?seid=2048855	dmns:inv:14809	veneridae	leukoma	leukoma laciniata	united states	california	-118.1336	33.77104	NA	collector(s): james e. steadman	1e86442f-35a5-4e7b-9a38-4599e4d3b510

We will use attributes attached to our records data frame to figure out contact information for each of the recordsets providing data here. For background reading on what we mean by attributes, see Hadley Wickham’s explanation in Advanced R. We can use attributes here because the ridigbio package has structured the results of the idig_search_records function in a specific way. The code below will not work as expected with a data frame that did not originate from the idig_search_records function.
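To make the idea of attributes concrete, here is a minimal sketch in base R. Everything in it is made up for illustration: toy_records stands in for a real query result, and the nested list only mimics the fields that the code below reads from the "attribution" attribute (uuid, name, contacts). It is not a description of the full structure ridigbio returns.

```r
# Toy stand-in for a data frame returned by idig_search_records();
# all values below are made up for illustration
toy_records <- data.frame(
  uuid = c("rec-1", "rec-2", "rec-3"),
  recordset = c("rs-a", "rs-a", "rs-b"),
  stringsAsFactors = FALSE
)

# Attach a made-up "attribution" attribute: one list element per recordset,
# each with a nested list of contacts
attr(toy_records, "attribution") <- list(
  list(uuid = "rs-a", name = "Example Collection A",
       contacts = list(list(first_name = "Ada", last_name = "Example",
                            role = "Curator", email = "ada@example.org"))),
  list(uuid = "rs-b", name = "Example Collection B",
       contacts = list())
)

# The attribute is metadata on the object, not a column
has_column <- "attribution" %in% names(toy_records)
has_attribute <- !is.null(attr(toy_records, "attribution"))

# Pull one field out of each nested list element
recordset_names <- vapply(attr(toy_records, "attribution"),
                          function(x) x$name, character(1))
```

Here has_column is FALSE and has_attribute is TRUE. A similar check on your own records data frame, before running the code below, is a quick way to confirm that the attribute is still attached.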

# Count how many records in the data were contributed by each recordset
recordtally <- records %>% 
  group_by(recordset) %>% 
  tally()

# Get metadata from the attributes of the `records` data frame
collections <- tibble(collection = attr(records, "attribution")) %>% 
  # Expand information captured in nested lists
  hoist(collection, 
        recordset_uuid = "uuid",
        recordset_name = "name",
        recordset_url = "url",
        contacts = "contacts") %>% 
  # Get rid of extraneous attribution metadata
  select(-collection) %>% 
  # Expand information captured in nested lists
  unnest_longer(contacts) %>% 
  # Expand information captured in nested lists
  unnest_wider(contacts) %>% 
  # Remove any contacts without an email address listed
  filter(!is.na(email)) %>% 
  # Get rid of duplicate contacts within the same recordset
  distinct() %>% 
  # Rename some columns
  rename(contact_role = role, contact_email = email) %>% 
  # Group first and last names together in the same column
  unite(col = "contact_name", 
        first_name, last_name, 
        sep = " ", 
        na.rm = TRUE) %>% 
  # Restructure data frame so that there is one row per recordset
  group_by(recordset_uuid) %>% 
  mutate(contact_index = row_number()) %>% 
  pivot_wider(names_from = contact_index,
              values_from = c(contact_name, contact_role, contact_email)) %>%
  # Include how many records in the data were contributed by each recordset
  left_join(recordtally, by = c("recordset_uuid"="recordset")) %>% 
  # Rearrange columns so that contact information is grouped by person;
  # `ends_with("_1")` matches only contact 1, whereas `contains("1")` would
  # also match contact 10 and above
  select(starts_with("recordset"),
         recordset_recordtally = n,
         ends_with("_1"),
         ends_with("_2"),
         ends_with("_3"),
         ends_with("_4"),
         ends_with("_5"),
         ends_with("_6"),
         ends_with("_7"),
         ends_with("_8"),
         ends_with("_9"),
         ends_with("_10"),
         everything()) %>% 
  # Get rid of any rows which don't actually contribute data to `records`;
  # necessary because the attribute metadata by default includes all recordsets
  # in iDigBio that match the `idig_search_records` query, even if you filter
  # or limit those results in your own code
  filter(recordset_uuid %in% records$recordset)
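The reshaping steps are the least obvious part of the code above, so here is a minimal sketch of the same idea on made-up data. The toy_* names and all values are hypothetical. The sketch uses count(), which is shorthand for the group_by() plus tally() used above, and it calls ungroup() before pivoting to keep the toy result ungrouped.

```r
library(dplyr)
library(tidyr)

# Made-up long-format contacts: one row per contact per recordset
toy_contacts <- tibble(
  recordset_uuid = c("rs-a", "rs-a", "rs-b"),
  contact_name   = c("Ada Example", "Ben Example", "Cy Example"),
  contact_email  = c("ada@example.org", "ben@example.org", "cy@example.org")
)

# Made-up records: rs-a contributes two records, rs-b contributes one
toy_records <- tibble(recordset = c("rs-a", "rs-a", "rs-b"))
toy_tally <- count(toy_records, recordset)

toy_wide <- toy_contacts %>%
  # Number the contacts within each recordset: 1, 2, ...
  group_by(recordset_uuid) %>%
  mutate(contact_index = row_number()) %>%
  ungroup() %>%
  # Spread each numbered contact into its own set of columns
  pivot_wider(names_from = contact_index,
              values_from = c(contact_name, contact_email)) %>%
  # Attach the number of records contributed by each recordset
  left_join(toy_tally, by = c("recordset_uuid" = "recordset"))

names(toy_wide)
```

The result has one row per recordset, with columns recordset_uuid, contact_name_1, contact_name_2, contact_email_1, contact_email_2, and n. Recordsets with fewer contacts than the maximum get NA in the unused columns, which is why the real output contains so many NA values.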

Our newly constructed collections data frame contains contact information for each of the collections (i.e. recordsets) providing data, and looks like this:

recordset_uuid	recordset_name	recordset_url	recordset_recordtally	contact_name_1	contact_role_1	contact_email_1	contact_name_2	contact_role_2	contact_email_2	contact_name_3	contact_role_3	contact_email_3	contact_name_4	contact_role_4	contact_email_4	contact_name_5	contact_role_5	contact_email_5	contact_name_6	contact_role_6	contact_email_6	contact_name_7	contact_role_7	contact_email_7	contact_name_8	contact_role_8	contact_email_8
5ab348ab-439a-4697-925c-d6abe0c09b92	University of California Museum of Paleontology		625	Joyce Gross	Programmer	jdeck@berkeley.edu	Patricia Holroyd	Museum Scientist	pholroyd@berkeley.edu	Diane Erwin	Senior Museum Scientist for Paleobotany	dmerwin@berkeley.edu	Erica Clites	Museum Scientist for Invertebrate Paleontology	eclites@berkeley.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
5082e6c8-8f5b-4bf6-a930-e3e6de7bf6fb	LACM Invertebrate Paleontology	https://nhm.org/site/research-collections/invertebrate-paleontology	63	Austin Hendy	Collection Manager	ahendy@nhm.org	William Mertz	Database Manager	wmertz@nhm.org	Kevin Love	NA	klove@flmnh.ufl.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
6bb853ab-e8ea-43b1-bd83-47318fc4c345	UF Invertebrate Zoology		54	Gustav Paulay	Curator of Invertebrate Zoology	paulay@flmnh.ufl.edu	Warren Brown	IT Director	netadmin@flmnh.ufl.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
bd61c458-b865-4b05-9f1f-735c49066e55	CAS Invertebrate Zoology (IZ)	http://www.calacademy.org/scientists/izg-collections	26	Stanley Blum	Research Information Manager	sblum@calacademy.org	Jon Fong	Programmer	jfong@calacademy.org	Christina Piotrowski	IZ Collections Manager, Department of Invertebrate Zoology & Geology	CPiotrowski@calacademy.org	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
41b119de-f745-482d-be42-a0155bc76e5d	CMC Cincinnati Museum Center Invertebrate Paleontology		16	Brenda Hunda	Curator of Invertebrate Paleontology	BHunda@cincymuseum.org	Anne Kling	Manager, Collection Databases and Websites	AKling@cincymuseum.org	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
e8a10a16-86af-42b2-be40-9d6a1b21859a	CHAS Malacology Collection (Arctos)	http://www.naturemuseum.org/the-museum/collections/invertebrates	13	Dawn Roberts	Director of Collections	droberts@naturemuseum.org	Erica Krimmel	Assistant Collections Manager	ekrimmel@naturemuseum.org	David Bloom	Coordinator	dbloom@vertnet.org	John Wieczorek	Information Architect	tuco@berkeley.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
774a153b-e556-47f6-95d1-bab49e61cc58	ANSP Malacology		7	Collections Management	Biodiversity Informatics Manager	bdim@ansp.org	Biodiversity Informatics Manager	NA	bdim@ansp.org	Collection Management	Biodiversity Informatics Manager	bdim@ansp.org	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
1ba0bbad-28a7-4c50-8992-a028f79d1dc5	University of Florida Invertebrate Paleontology		6	Roger Portell	Collection Manager	portell@flmnh.ufl.edu	Office of Museum Technology OMT	OMT	netadmin@flmnh.ufl.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
1e86442f-35a5-4e7b-9a38-4599e4d3b510	DMNS Marine Invertebrate Collection (Arctos)	http://www.dmns.org/science/collections/dmns-zoology-collections	5	Paula Cushing	Curator of Invertebrate Zoology	paula.cushing@dmns.org	Laura Russell	VertNet Programmer	larussell@vertnet.org	David Bloom	VertNet Coordinator	dbloom@vertnet.org	John Wieczorek	Information Architect	tuco@berkeley.edu	Dusty McDonald	Arctos Database Programmer	dlmcdonald@alaska.edu	Phyllis Sharp	Departmental Associate	sharpphyl@gmail.com	Bryan Johnson	Departmental Associate	spiralsofthenautilus@gmail.com	NA	NA	NA
137ed4cd-5172-45a5-acdb-8e1de9a64e32	Invertebrate Paleontology Division, Yale Peabody Museum		3	Larry Gall	Head, Computer Systems Office	lawrence.gall@yale.edu	Susan Butts	Division of Invertebrate Paleontology	susan.butts@yale.edu	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
97058091-eb35-401b-b286-18465761f832	Delaware Museum of Natural History – Mollusks	http://www.delmnh.org/mollusks/	1		NA	invertadmin@asu.edu	Elizabeth Shea	NA	eshea@delmnh.org	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
a6eee223-cf3b-4079-8bb2-b77dad8cae9d	NMNH Extant Specimen Records	http://collections.nmnh.si.edu	1	Thomas Orrell	NMNH Informatics	orrellt@si.edu	Chris Tuccinardi	Information Management	tuccinar@si.edu	Karen Reed	Data Manager	reedk@si.edu	Jessica Bird	Collections Information Manager	birdj@si.edu	Jeff Williams	Collection Manager	williamsjt@si.edu	Kenneth Tighe	Database Coordinator	tighek@si.edu	Brian Schmidt	Museum Specialist	schmidtb@si.edu	Ingrid Rochon	Scientific Data Manager	rochoni@si.edu

We can contact each collection by looking for the most appropriate person listed in each row, often someone with the role of “collection manager” or “curator.” Because each collection publishes this kind of metadata separately, sometimes the contacts listed also include people who are not directly responsible for managing physical specimens, and who may not be able to help you. These people often have roles such as “information architect,” “programmer,” or “database manager.” All contacts listed per recordset have been included here, and it is up to you to decide who to reach out to.
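If you have many recordsets, one way to narrow down who to contact is to flag roles that suggest responsibility for physical specimens. The sketch below is only an illustration on made-up data in long format (one row per contact, as the contacts look before the reshaping step). The pattern in specimen_roles is an assumption, not an iDigBio convention, and roles are free text, so review the flagged rows by eye.

```r
# Made-up contacts in long format (one row per contact)
toy_contacts <- data.frame(
  recordset_name = c("Collection A", "Collection A", "Collection B"),
  contact_name = c("Ada Example", "Ben Example", "Cy Example"),
  contact_role = c("Collection Manager", "Programmer", NA),
  stringsAsFactors = FALSE
)

# Assumed pattern for roles likely to manage physical specimens;
# adjust it to the roles that appear in your own data
specimen_roles <- "collections? manager|curator"

# grepl() returns FALSE for NA roles, so contacts with no role are not flagged
toy_contacts$likely_specimen_contact <- grepl(specimen_roles,
                                              toy_contacts$contact_role,
                                              ignore.case = TRUE)

toy_contacts
```

Only the first contact is flagged here. A recordset with no flagged contacts still needs a manual decision about who to reach out to.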

It is frequently helpful to provide your collection contact with a spreadsheet listing the specimen records you are interested in. We can generate these spreadsheets automatically, as shown in the code below.

# Generate a spreadsheet for each recordset containing only the rows provided by
# that recordset, and named according to the recordset uuid.
for (i in seq_along(collections$recordset_uuid)) {
  
  # Build the file name for this recordset only
  filename <- paste0("records_", collections$recordset_uuid[i], ".csv")
  
  # Keep only the records contributed by this recordset
  recordset_records <- records %>% 
    filter(recordset == collections$recordset_uuid[i])
  
  # Save files to your working directory
  write_csv(recordset_records, filename)
}
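As an alternative sketch, the same per-recordset files can be produced with base R by splitting the data frame on the recordset column. The data here are made up, and the files go to a temporary directory rather than your working directory. Note one difference in behavior: this version writes a file for every recordset present in the records, whether or not a contact was found for it.

```r
# Made-up records standing in for the `records` data frame
toy_records <- data.frame(
  uuid = c("rec-1", "rec-2", "rec-3"),
  recordset = c("rs-a", "rs-a", "rs-b"),
  stringsAsFactors = FALSE
)

# Write into a temporary directory to keep files out of the working directory
out_dir <- tempfile("recordset_csv_")
dir.create(out_dir)

# One data frame per recordset, named by recordset uuid
by_recordset <- split(toy_records, toy_records$recordset)

for (recordset_uuid in names(by_recordset)) {
  out_file <- file.path(out_dir, paste0("records_", recordset_uuid, ".csv"))
  write.csv(by_recordset[[recordset_uuid]], out_file, row.names = FALSE)
}

list.files(out_dir)
```

This writes two files, records_rs-a.csv and records_rs-b.csv, each containing only the rows for that recordset.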

For specific research requests, there are many ways you could modify the code demonstrated here to be more helpful, e.g. by including additional fields available through idig_search_records. See also the ridigbio function idig_build_attrib, which summarizes the recordsets used by records in the data frame but does not include contact information.