The Kew Reconciliation Service (KRS) allows a user to submit a taxon name for matching against IPNI.
The reconciliation service is an OpenRefine-style API that matches a single name against IPNI. The matching is done by a series of transformations configured for botanical names in IPNI. These transformations appear to be detailed here.
It appears that KRS is the service that sits behind the Kew Names Matching Service (KNMS). KNMS matches batches of names in one request but does not allow matching on individual parts of a name. If you have a set of names and just want simple matching, I’d use KNMS. If you want to specify which parts of the names to match on, I’d use KRS.
library(kewr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
To use KRS, you can just submit a single name for matching.
match <- match_krs("Solanum sanchez-vegae S.Knapp")
match
#> <KRS match: 1 names matched to 'Solanum sanchez-vegae S.Knapp'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "77103635-1"
#> ..$ name : chr "Solanaceae Solanum sanchez-vegae S.Knapp"
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 100
#> ..$ match: logi TRUE
This also works without the author string included:
match <- match_krs("Solanum sanchez-vegae")
match
#> <KRS match: 1 names matched to 'Solanum sanchez-vegae'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "77103635-1"
#> ..$ name : chr "Solanaceae Solanum sanchez-vegae S.Knapp"
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 100
#> ..$ match: logi TRUE
The match results can be converted to a table for easier inspection.
tidy(match)
#> # A tibble: 1 × 5
#> id name type score match
#> <chr> <chr> <list> <dbl> <lgl>
#> 1 77103635-1 Solanaceae Solanum sanchez-vegae S.Knapp <tibble [1 × … 100 TRUE
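The type column in this table is a nested tibble, which can get in the way when printing or exporting. The sketch below shows two ways of flattening it. It runs on a hand-built stand-in for the tidy(match) output rather than on a live result, and it assumes the nested tibble holds the id and name fields seen in the printed match.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hand-built stand-in for the tidy(match) output shown above,
# so the sketch runs without calling the service.
tidied <- tibble(
  id = "77103635-1",
  name = "Solanaceae Solanum sanchez-vegae S.Knapp",
  type = list(tibble(
    id = "/biology/organism_classification/scientific_name",
    name = "Scientific name"
  )),
  score = 100,
  match = TRUE
)

# Option 1: drop the nested column and keep the fields used most often.
flat <- tidied %>%
  select(id, name, score, match)

# Option 2: expand the nested tibble, prefixing its columns with "type_"
# so they do not clash with the outer id and name columns.
expanded <- tidied %>%
  unnest(cols = type, names_sep = "_")
```

The first option is usually enough when only the IPNI ID and the score matter. The second keeps the type information as ordinary columns.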
The reconciliation service provides a specification for matching to different parts of a botanical name. This is described in detail here.
For example, if we want to match to the genus name Myrcia, we could submit a simple request like before.
match <- match_krs("Myrcia")
match
#> <KRS match: 2 names matched to 'Myrcia'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "30001220-2"
#> ..$ name : chr "Myrtaceae Myrcia DC."
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 50
#> ..$ match: logi FALSE
However, this query matched two names, so the score is only 50 and the match flag is FALSE. We can be more specific by matching on both the genus and the author.
match <- match_krs(list(genus="Myrcia", author="DC"))
match
#> <KRS match: 1 names matched to genus='Myrcia', author='DC'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "30001220-2"
#> ..$ name : chr "Myrtaceae Myrcia DC."
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 100
#> ..$ match: logi TRUE
This narrows the result down to a single match.
We can specify a match for every part of a name like this.
match <- match_krs(list(genus="Myrcia", species="magnolifolia", infra="latifolia",
author="Berg"))
match
#> <KRS match: 1 names matched to genus='Myrcia', species='magnolifolia', infra='latifolia', author='Berg'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "165832-2"
#> ..$ name : chr "Myrtaceae Myrcia magnoliifolia DC. var. latifolia O.Berg"
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 100
#> ..$ match: logi TRUE
This match has worked even though both the specific epithet ("magnolifolia" for "magnoliifolia") and the author string ("Berg" for "O.Berg") are slightly misspelled. Matching to the taxon name works by a set of pre-configured string transformations that catch some common mistakes in botanical names. The matching to author strings is also slightly fuzzy.
This matching also handles different taxonomic ranks using ordered epithets, where the highest rank is specified as epithet_1 down to epithet_3.
match <- match_krs(list(epithet_1="Solanaceae"))
match
#> <KRS match: 2 names matched to epithet_1='Solanaceae'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "60437408-2"
#> ..$ name : chr "Solanaceae Adans."
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 50
#> ..$ match: logi FALSE
This also works for infrageneric names.
match <- match_krs(list(epithet_1="Acacia", epithet_2="Aculeiferum", author="Vassal"))
match
#> <KRS match: 3 names matched to epithet_1='Acacia', epithet_2='Aculeiferum', author='Vassal'>
#> List of 1
#> $ :List of 5
#> ..$ id : chr "53905-3"
#> ..$ name : chr "Mimosaceae Acacia sect. Aculeiferum (Vassal) Vassal"
#> ..$ type :List of 1
#> .. ..$ :List of 2
#> .. .. ..$ id : chr "/biology/organism_classification/scientific_name"
#> .. .. ..$ name: chr "Scientific name"
#> ..$ score: num 33
#> ..$ match: logi FALSE
Note that these last two examples give a score lower than 100 because each query returned more than one match.
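One way to act on the score and match fields is to split results into those that can be accepted automatically and those that need checking by hand. The sketch below is only an illustration. The candidates table is a hand-built stand-in that combines values printed in the examples above, and the names accepted and needs_review are hypothetical.

```r
library(dplyr)
library(tibble)

# Stand-in table mirroring the id, name, score and match fields printed above.
# The rows come from three separate queries and are combined for illustration.
candidates <- tibble(
  id = c("30001220-2", "60437408-2", "53905-3"),
  name = c(
    "Myrtaceae Myrcia DC.",
    "Solanaceae Adans.",
    "Mimosaceae Acacia sect. Aculeiferum (Vassal) Vassal"
  ),
  score = c(100, 50, 33),
  match = c(TRUE, FALSE, FALSE)
)

# Results flagged as a match by the service can be accepted as they are.
accepted <- candidates %>%
  filter(match)

# Everything else is ambiguous: list it with the highest score first.
needs_review <- candidates %>%
  filter(!match) %>%
  arrange(desc(score))
```

Filtering on the match flag rather than on a score threshold avoids having to guess how the service calculates its scores.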
If you want to do simple matching to more than one name, it might be easier to use KNMS.
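A minimal sketch of that route is below. It assumes that kewr's match_knms accepts a character vector of names and that tidy works on its result in the same way as for KRS matches. The wrapper match_batch is a hypothetical helper, and calling it needs the kewr package and network access.

```r
# Hypothetical helper: batch matching of full name strings through KNMS.
match_batch <- function(names) {
  stopifnot(is.character(names), length(names) > 0)
  # Assumed behaviour: match_knms() submits the whole vector in one request.
  matches <- kewr::match_knms(names)
  # Assumed behaviour: tidy() converts the result to a table,
  # as it does for the KRS matches above.
  kewr::tidy(matches)
}

# Full name strings built from the names used elsewhere in this article.
names_to_match <- c(
  "Poa annua L.",
  "Myrcia almasensis NicLugh.",
  "Solanum sanchez-vegae S.Knapp"
)
```

Calling match_batch(names_to_match) would then send all three names to KNMS in a single request.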
If you want to match the individual parts of multiple names, you can apply the matching function to the rows of a data frame, using dplyr::rowwise.
names <- tibble(
genus=c("Poa", "Myrcia", "Solanum"),
species=c("annua", "almasensis", "sanchez-vegae"),
author=c("L.", "NicLugh.", "S.Knapp")
)
matches <-
names %>%
rowwise() %>%
mutate(match=list(match_krs(list(genus=genus, species=species, author=author)))) %>%
mutate(match=list(tidy(match))) %>%
unnest(cols=c(match))
matches
#> # A tibble: 3 × 8
#> genus species author id name type score match
#> <chr> <chr> <chr> <chr> <chr> <list> <dbl> <lgl>
#> 1 Poa annua L. 320035-2 Poaceae Poa an… <tibble… 100 TRUE
#> 2 Myrcia almasensis NicLugh. 304073-2 Myrtaceae Myrc… <tibble… 100 TRUE
#> 3 Solanum sanchez-vegae S.Knapp 77103635-1 Solanaceae Sol… <tibble… 100 TRUE
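All three names here found a match. With this pattern, a name that finds no match may silently vanish from the result, because tidyr::unnest drops rows whose nested table is empty. The sketch below shows the effect of keep_empty = TRUE. It uses a hand-built stand-in instead of live results, and it assumes that tidy returns a zero-row tibble when nothing is matched, which has not been checked against the service. The name "Notagenus fictitia" is made up.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in for per-row tidy() output: the second (made-up) name has no candidates.
queried <- tibble(
  genus = c("Poa", "Notagenus"),
  species = c("annua", "fictitia"),
  match = list(
    tibble(id = "320035-2", score = 100, match = TRUE),
    tibble(id = character(), score = numeric(), match = logical())
  )
)

# Default behaviour: rows whose nested tibble is empty are dropped.
dropped <- queried %>%
  unnest(cols = c(match))

# keep_empty = TRUE keeps those rows and fills the match columns with NA.
kept <- queried %>%
  unnest(cols = c(match), keep_empty = TRUE)
```

Keeping the empty rows makes it easy to list the names that still need attention, for example with filter(is.na(id)).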