A common use of Plants of the World Online (POWO) is building a checklist of native species for a country.
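In this demonstration, we will search POWO for the accepted species recorded in Iceland, look up the full record for each one, and use the distribution data to narrow the list down to native species and then to endemics.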
In addition to kewr, we’ll load dplyr, purrr, stringr, and progress, which supply the helper functions used below.
We’ll get our list of accepted species for Iceland, using the POWO search API.
query <- list(distribution="Iceland")
filters <- c("accepted", "species")
iceland_species <- search_powo(query, filters=filters, limit=1000)
In total, we have this many accepted species in Iceland:
iceland_species$total
#> [1] 889
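The search above asks for up to 1000 results and POWO reports 889 matches, so the limit is high enough here. For a more species-rich country it may not be. Below is a minimal sketch of a guard that compares the two numbers. It runs on a mocked object that only imitates the `total` and `results` fields used in this article; the IDs in it are made up.

```r
# Hypothetical stand-in for the object returned by search_powo():
# only the `total` and `results` fields are mocked, and the IDs are invented.
mock_search <- list(
  total=3,
  results=list(
    list(fqId="urn:lsid:ipni.org:names:1-1"),
    list(fqId="urn:lsid:ipni.org:names:2-1"),
    list(fqId="urn:lsid:ipni.org:names:3-1")
  )
)

# TRUE when every matching record was actually returned
all_retrieved <- length(mock_search$results) == mock_search$total
if (!all_retrieved) {
  warning("Fewer results returned than matched; consider raising `limit`.")
}
all_retrieved
```

The same comparison could be run on iceland_species before making any lookup requests.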
To get the native distribution for all our species, we need to use POWO’s lookup API for every single one.
First we’ll extract a list of IDs from our results, using the map function from purrr.
ids <- map(iceland_species$results, ~str_extract(.x$fqId, "[\\d\\-]+$"))
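To see what the regular expression keeps, here is a small sketch on a single identifier. It uses base R instead of str_extract so that it runs without extra packages, and the identifier is only an illustrative value in the urn:lsid:ipni.org:names form.

```r
# Illustrative identifier; the value is an example, not taken from the results above.
fq_id <- "urn:lsid:ipni.org:names:320035-2"

# Base R equivalent of str_extract(fq_id, "[\\d\\-]+$"):
# keep the trailing run of digits and hyphens.
short_id <- regmatches(fq_id, regexpr("[0-9-]+$", fq_id))
short_id
#> [1] "320035-2"
```

Everything up to the last colon is dropped, leaving the short ID that is passed to the lookup function.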
Then we need to make all of our requests. To make things easier, we’ll define a simple function that just accepts a species ID, and makes use of a progress bar to track our requests!
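# Progress bar with one tick per species ID
pb <- progress_bar$new(
  format=" requesting [:bar] :current/:total (:percent)",
  total=length(ids)
)
# Advance the bar, then fetch the full record, including distribution data
fcn <- function(id) {
  pb$tick()
  lookup_powo(id, distribution=TRUE)
}
iceland_records <- map(ids, fcn)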
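With several hundred requests, a single failed lookup would stop the whole map call and discard the records fetched so far. Below is a minimal sketch of one way to guard against that with base R's tryCatch. Here fake_lookup is a hypothetical stand-in for lookup_powo so the example runs offline, and the IDs are made up.

```r
# Hypothetical stand-in for lookup_powo(); it fails for one ID to mimic a dropped request.
fake_lookup <- function(id) {
  if (id == "bad-id") stop("request failed")
  list(id=id)
}

# Return NULL instead of raising an error, so the remaining requests still run
safe_lookup <- function(id) {
  tryCatch(fake_lookup(id), error=function(e) NULL)
}

ids <- c("1-1", "bad-id", "2-1")
records <- lapply(ids, safe_lookup)

# Note which IDs failed, then drop the empty slots
failed <- ids[vapply(records, is.null, logical(1))]
records <- Filter(Negate(is.null), records)
```

The failed vector can then be retried separately once the rest of the records are safely stored.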
Now that we have all the records for our species, we can tidy them into a data frame to make subsequent analysis a bit easier.
iceland_checklist <- map_dfr(iceland_records, tidy)
iceland_checklist
#> # A tibble: 889 × 28
#> modified bibliographicCitation genus taxonomicStatus kingdom phylum family
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2021-12-0… IPNI 2021. Published … Ante… Accepted Plantae Magno… Aster…
#> 2 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 3 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 4 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 5 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 6 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 7 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 8 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 9 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> 10 2021-12-0… IPNI 2021. Published … Hier… Accepted Plantae Magno… Aster…
#> # … with 879 more rows, and 21 more variables: nomenclaturalCode <chr>,
#> # source <chr>, namePublishedInYear <int>, taxonRemarks <chr>,
#> # nomenclaturalStatus <chr>, synonym <lgl>, plantae <lgl>, fungi <lgl>,
#> # fqId <chr>, name <chr>, authors <chr>, species <chr>, rank <chr>,
#> # reference <chr>, classification <list>, distribution <list>,
#> # distributionEnvelope <list>, synonyms <list>, basionym <list>,
#> # childNameUsages <list>, basionymOf <list>
To narrow our species down, we’ll add an extra column to indicate if a species is native to Iceland or not. This will let us filter our data using that column.
I’ve done this below in a single, chained command using the pipe operator (%>%), which dplyr re-exports from magrittr. I’ve also taken advantage of the rowwise feature in the newer versions of dplyr.
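# TRUE if `country` appears among the native regions of a distribution record
check_native <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  country %in% native_dist$name
}
# Flag each species row by row, then keep only the natives
iceland_checklist <-
  iceland_checklist %>%
  rowwise() %>%
  mutate(is_native=check_native(distribution)) %>%
  ungroup() %>%
  filter(is_native)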
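To make the native check concrete, here is a small sketch that runs check_native on two hand-made distribution records. The shape of the records (a natives element holding a table with a name column) is an assumption inferred from how the function indexes its input, the introduced element is likewise assumed, and the region names are illustrative, not real POWO data.

```r
check_native <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  country %in% native_dist$name
}

# Hypothetical record: native to Iceland and one other region
native_here <- list(natives=list(data.frame(name=c("Iceland", "Norway"))))

# Hypothetical record: present in Iceland, but only as an introduction
introduced_here <- list(
  natives=list(data.frame(name="Norway")),
  introduced=list(data.frame(name="Iceland"))
)

check_native(native_here)
#> [1] TRUE
check_native(introduced_here)
#> [1] FALSE
```

Only the native regions are consulted, so a species that merely occurs in Iceland as an introduction is filtered out.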
Now all we have to do is tidy up our data frame by dropping the columns we no longer need.
iceland_checklist <-
  iceland_checklist %>%
  select(fqId, name, authors, taxonomicStatus, plantae, fungi,
         kingdom, phylum, family, genus, species)
iceland_checklist
#> # A tibble: 783 × 11
#> fqId name authors taxonomicStatus plantae fungi kingdom phylum family genus
#> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr> <chr> <chr>
#> 1 urn:… Ante… A.E.Po… Accepted TRUE FALSE Plantae Magno… Aster… Ante…
#> 2 urn:… Hier… Omang Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 3 urn:… Hier… Ósk. Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 4 urn:… Hier… (Dahls… Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 5 urn:… Hier… (Omang… Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 6 urn:… Hier… Omang Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 7 urn:… Hier… Ósk. Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 8 urn:… Hier… Omang Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 9 urn:… Hier… Ósk. Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> 10 urn:… Hier… Omang Accepted TRUE FALSE Plantae Magno… Aster… Hier…
#> # … with 773 more rows, and 1 more variable: species <chr>
We can use our results from before to narrow the list down further to just the species that are endemic to Iceland. Here a species counts as endemic if Iceland is the only region in its native distribution.
# TRUE only if `country` is the sole region in the native distribution
check_endemic <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  native <- country %in% native_dist$name
  endemic <- length(native_dist$name) == 1
  native & endemic
}
# Rebuild from the full records, because iceland_checklist no longer has the distribution column
iceland_endemics <- map_dfr(iceland_records, tidy)
iceland_endemics <-
  iceland_endemics %>%
  rowwise() %>%
  mutate(is_endemic=check_endemic(distribution)) %>%
  ungroup() %>%
  filter(is_endemic) %>%
  select(fqId, name, authors, taxonomicStatus, plantae, fungi,
         kingdom, phylum, family, genus, species)
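The endemic check needs both conditions, and a short sketch on hand-made records shows why. As before, the record shape is an assumption inferred from how the function indexes its input, and the region names are illustrative only.

```r
check_endemic <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  native <- country %in% native_dist$name
  endemic <- length(native_dist$name) == 1
  native & endemic
}

# Hypothetical records covering the three possible cases
only_iceland <- list(natives=list(data.frame(name="Iceland")))
widespread <- list(natives=list(data.frame(name=c("Iceland", "Norway"))))
only_norway <- list(natives=list(data.frame(name="Norway")))

check_endemic(only_iceland)
#> [1] TRUE
check_endemic(widespread)
#> [1] FALSE
check_endemic(only_norway)
#> [1] FALSE
```

The last case is the reason for testing native as well: a species with a single native region passes the length test even when that region is not Iceland.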
How does the number of species in each list compare?