Building a species checklist

A common use of Plants of the World Online (POWO) is building a checklist of native species for a country.

In this demonstration, we will:

Request a list of all accepted species that occur in a country.
Get the native distribution of all those species.
Narrow our checklist down to native species.
Build another checklist for endemic species.

Setup

In addition to kewr, we’ll load:

dplyr to manipulate the data
tidyr to reshape data frames
purrr to map functions across items in a list
progress to add a progress bar
stringr to extract some data from strings

library(kewr)
library(dplyr)
library(tidyr)
library(purrr)
library(progress)
library(stringr)

1. Requesting a list of accepted species

We’ll get our list of accepted species for Iceland, using the POWO search API.

query <- list(distribution="Iceland")
filters <- c("accepted", "species")

iceland_species <- search_powo(query, filters=filters, limit=1000)

The total field of the response tells us how many accepted species are recorded in Iceland:

iceland_species$total
#> [1] 889
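Because the search is capped by limit, it can be worth confirming that the response holds every match before moving on. The sketch below uses a hand-built stand-in for the response instead of a live request: mock_response is hypothetical, and only the results and total fields are taken from the code above.

```r
# Hypothetical stand-in for a search response; a real one comes from search_powo().
mock_response <- list(
  total = 3,
  results = list(
    list(name = "Species a"),
    list(name = "Species b"),
    list(name = "Species c")
  )
)

# TRUE when the number of returned records matches the reported total,
# i.e. the limit did not cut the result set short.
all_retrieved <- length(mock_response$results) == mock_response$total
all_retrieved
```

With the real object, the same comparison would be length(iceland_species$results) == iceland_species$total. A FALSE would suggest that limit needs to be raised.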

2. Get the native distribution of all the species

To get the native distribution for all our species, we need to use POWO’s lookup API for every single one.

First we’ll extract a list of IDs from our results, using the map function from purrr.

ids <- map(iceland_species$results, ~str_extract(.x$fqId, "[\\d\\-]+$"))
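To see what the regular expression keeps, here is a small sketch on made-up records. The fqId values are invented for illustration, assuming an identifier that ends in digits followed by a hyphenated suffix. It also uses map_chr, which returns a character vector where map returns a list.

```r
library(purrr)
library(stringr)

# Made-up records shaped like search results; only the fqId field matters here.
mock_results <- list(
  list(fqId = "urn:lsid:ipni.org:names:12345-1"),
  list(fqId = "urn:lsid:ipni.org:names:67890-2")
)

# "[\\d\\-]+$" matches the trailing run of digits and hyphens in each string.
mock_ids <- map_chr(mock_results, ~str_extract(.x$fqId, "[\\d\\-]+$"))
mock_ids
```

Either form works with the rest of this walkthrough, since map iterates over lists and vectors alike.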

Then we need to make one request per species. To make this easier, we’ll define a small function that accepts a species ID, advances a progress bar, and returns the record for that ID.

pb <- progress_bar$new(
  format="  requesting [:bar] :current/:total (:percent)",
  total=length(ids)
)

fcn <- function(id) {
  pb$tick()
  
  lookup_powo(id, distribution=TRUE)
}

iceland_records <- map(ids, fcn)
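With several hundred requests, a single failure would stop the whole map call. One way to guard against that, sketched here under assumptions, is to wrap the lookup function in possibly from purrr. fake_lookup is a hypothetical stand-in for the real request function, so the example runs without network access.

```r
library(purrr)

# Hypothetical stand-in for the real lookup; it fails for one ID on purpose.
fake_lookup <- function(id) {
  if (id == "bad-id") stop("request failed")
  list(id = id)
}

# possibly() returns a wrapped function that yields `otherwise` instead of erroring.
safe_lookup <- possibly(fake_lookup, otherwise = NULL)

records <- map(c("1-1", "bad-id", "2-1"), safe_lookup)

# compact() drops the NULL entries left behind by failed lookups.
records <- compact(records)
length(records)
```

The same wrapper could be applied to fcn, at the cost of silently dropping species whose request failed, so it may be worth comparing the number of records against the number of IDs afterwards.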

Now that we have all the records for our species, we can tidy them into a data frame to make subsequent analysis a bit easier.

iceland_checklist <- map_dfr(iceland_records, tidy)
iceland_checklist
#> # A tibble: 889 × 28
#>    modified   bibliographicCitation  genus taxonomicStatus kingdom phylum family
#>    <chr>      <chr>                  <chr> <chr>           <chr>   <chr>  <chr> 
#>  1 2021-12-0… IPNI 2021. Published … Ante… Accepted        Plantae Magno… Aster…
#>  2 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  3 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  4 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  5 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  6 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  7 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  8 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#>  9 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#> 10 2021-12-0… IPNI 2021. Published … Hier… Accepted        Plantae Magno… Aster…
#> # … with 879 more rows, and 21 more variables: nomenclaturalCode <chr>,
#> #   source <chr>, namePublishedInYear <int>, taxonRemarks <chr>,
#> #   nomenclaturalStatus <chr>, synonym <lgl>, plantae <lgl>, fungi <lgl>,
#> #   fqId <chr>, name <chr>, authors <chr>, species <chr>, rank <chr>,
#> #   reference <chr>, classification <list>, distribution <list>,
#> #   distributionEnvelope <list>, synonyms <list>, basionym <list>,
#> #   childNameUsages <list>, basionymOf <list>

3. Narrow the checklist to native species

To narrow our species down, we’ll add an extra column to indicate if a species is native to Iceland or not. This will let us filter our data using that column.

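I’ve done this below in a single, chained command using the pipe operator (%>%), which dplyr re-exports from magrittr. I’ve also taken advantage of the rowwise feature in the newer versions of dplyr, which makes mutate evaluate check_native once per row, so that distribution refers to a single species’ entry instead of the whole list column.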

check_native <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  
  country %in% native_dist$name
}

iceland_checklist <-
  iceland_checklist %>%
  rowwise() %>%
  mutate(is_native=check_native(distribution)) %>%
  ungroup() %>%
  filter(is_native)
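An alternative to rowwise is to map over the list column directly with map_lgl from purrr. The sketch below shows this on hand-built rows. mock_checklist is hypothetical, and its nested shape (a natives entry holding a data frame with a name column) is an assumption inferred from how check_native indexes its argument.

```r
library(dplyr)
library(purrr)

check_native <- function(dist, country = "Iceland") {
  native_dist <- dist$natives[[1]]
  country %in% native_dist$name
}

# Hand-built rows; the nested shape is assumed, not taken from a real response.
mock_checklist <- tibble(
  name = c("Species a", "Species b"),
  distribution = list(
    list(natives = list(data.frame(name = c("Iceland", "Norway")))),
    list(natives = list(data.frame(name = "Norway")))
  )
)

# map_lgl() calls check_native() once per list element and returns a logical vector.
native_only <-
  mock_checklist %>%
  mutate(is_native = map_lgl(distribution, check_native)) %>%
  filter(is_native)

native_only$name
```

This avoids the rowwise and ungroup pair, and map_lgl raises an error if the function ever returns something other than a single TRUE or FALSE.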

Now all we have to do is tidy up our data frame by keeping only the columns we still need.

iceland_checklist <-
  iceland_checklist %>%
  select(fqId, name, authors, taxonomicStatus, plantae, fungi,
         kingdom, phylum, family, genus, species)

iceland_checklist
#> # A tibble: 783 × 11
#>    fqId  name  authors taxonomicStatus plantae fungi kingdom phylum family genus
#>    <chr> <chr> <chr>   <chr>           <lgl>   <lgl> <chr>   <chr>  <chr>  <chr>
#>  1 urn:… Ante… A.E.Po… Accepted        TRUE    FALSE Plantae Magno… Aster… Ante…
#>  2 urn:… Hier… Omang   Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  3 urn:… Hier… Ósk.    Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  4 urn:… Hier… (Dahls… Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  5 urn:… Hier… (Omang… Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  6 urn:… Hier… Omang   Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  7 urn:… Hier… Ósk.    Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  8 urn:… Hier… Omang   Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#>  9 urn:… Hier… Ósk.    Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#> 10 urn:… Hier… Omang   Accepted        TRUE    FALSE Plantae Magno… Aster… Hier…
#> # … with 773 more rows, and 1 more variable: species <chr>

4. Build a checklist of endemic species

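We can reuse the records we requested earlier to build a second list of just the species that are endemic to Iceland. Because we dropped the distribution column from iceland_checklist in the previous step, we tidy the original records again before filtering.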

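check_endemic <- function(dist, country="Iceland") {
  native_dist <- dist$natives[[1]]
  
  # native: the country appears among the native regions
  native <- country %in% native_dist$name
  # endemic: the species is native to exactly one region
  endemic <- length(native_dist$name) == 1
  
  native & endemic
}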

iceland_endemics <- map_dfr(iceland_records, tidy)

iceland_endemics <-
  iceland_endemics %>%
  rowwise() %>%
  mutate(is_endemic=check_endemic(distribution)) %>%
  ungroup() %>%
  filter(is_endemic) %>%
  select(fqId, name, authors, taxonomicStatus, plantae, fungi,
         kingdom, phylum, family, genus, species)
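The endemic test combines two conditions: Iceland must be among the native regions, and it must be the only one. A quick check on mock distributions illustrates the three possible cases. mock_dist is a hypothetical helper, and the nested shape it builds is an assumption inferred from how check_endemic indexes its argument.

```r
check_endemic <- function(dist, country = "Iceland") {
  native_dist <- dist$natives[[1]]
  native <- country %in% native_dist$name
  endemic <- length(native_dist$name) == 1
  native & endemic
}

# Hypothetical helper that builds a distribution entry from a vector of region names.
mock_dist <- function(regions) {
  list(natives = list(data.frame(name = regions)))
}

endemic_flags <- c(
  only_iceland       = check_endemic(mock_dist("Iceland")),
  iceland_and_norway = check_endemic(mock_dist(c("Iceland", "Norway"))),
  only_norway        = check_endemic(mock_dist("Norway"))
)
endemic_flags
```

Only the first case is flagged as endemic: the second is native but more widespread, and the third has a single native region that is not Iceland.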

How does the number of species in each list compare?

paste("native species:", nrow(iceland_checklist))
#> [1] "native species: 783"
paste("endemic species:", nrow(iceland_endemics))
#> [1] "endemic species: 262"
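To put the counts in proportion, a short calculation can be run on the numbers reported above. The three counts are copied from the output in this walkthrough, and the variable names are only illustrative.

```r
# Counts copied from the output above.
n_all     <- 889  # accepted species recorded in Iceland
n_native  <- 783  # species native to Iceland
n_endemic <- 262  # species native only to Iceland

# Species recorded in Iceland but not listed as native there.
n_not_native <- n_all - n_native

# Share of native species that are endemic, as a percentage.
pct_endemic <- round(100 * n_endemic / n_native, 1)

n_not_native
pct_endemic
```

On these numbers, 106 of the recorded species are not native, and roughly a third of the native species are endemic.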