
Pivot a textgrid into wide format, respecting nested tiers
Source: R/pivot.R
Arguments
- data
a textgrid dataframe created with read_textgrid()
- tiers
character vector of tiers to pivot into wide format. When tiers has more than 1 element, the tiers are treated as nested. For example, if tiers is c("utterance", "word", "phone"), where "utterance" intervals contain "word" intervals which in turn contain "phone" intervals, the output will have one row per "phone" interval and include utterance_* and word_* columns for the utterance and word intervals that contain each phone interval. tiers should be ordered from broadest to narrowest (e.g., "word" preceding "phone").
- join_cols
character vector of the columns that uniquely identify a textgrid file. Defaults to "file" because this column has identical values for all tiers read from the same textgrid file.
Value
a dataframe containing only the intervals from the tiers named in tiers,
converted into a wide format. Columns are renamed so that the text column
of each tier is pivoted into a column named after that tier. For example,
the text column in a words tier is renamed to words. The xmax, xmin,
annotation_num, tier_num, and tier_type columns are also prefixed with the
tier name. For example, the xmax column in a words tier is renamed to
words_xmax. An additional helper column xmid (the interval midpoint) is
added and prefixed in the same way. See examples below.
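The renaming convention described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper prefix_tier_columns() and the toy data frame are hypothetical, and the toy values are made up.

```r
# Hypothetical helper, not part of the package: applies the naming
# convention described above to a data frame holding a single tier.
prefix_tier_columns <- function(df, tier) {
  to_prefix <- c("xmin", "xmax", "annotation_num", "tier_num", "tier_type")
  # Helper column: midpoint of each interval
  df$xmid <- (df$xmin + df$xmax) / 2
  # The text column takes the tier's name
  names(df)[names(df) == "text"] <- tier
  # The remaining interval columns get a "<tier>_" prefix
  hits <- names(df) %in% c(to_prefix, "xmid")
  names(df)[hits] <- paste0(tier, "_", names(df)[hits])
  df
}

# Made-up intervals for illustration only
toy <- data.frame(
  text = c("hug", "daddy"),
  xmin = c(0.4, 0.9),
  xmax = c(0.8, 1.4),
  stringsAsFactors = FALSE
)
renamed <- prefix_tier_columns(toy, "words")
names(renamed)
```

Columns that are absent from the input (here annotation_num, tier_num, and tier_type) are simply skipped by this sketch.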
Details
When joining nested intervals, two intervals a and b are combined into
the same row if they match on the values in the join_cols columns and if
a$xmin <= b$xmid and b$xmid <= a$xmax. That is, if the midpoint of
b is contained inside the interval a.
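The midpoint rule can be illustrated with a small base R sketch. It assumes two hand-built data frames with made-up values and a hypothetical file name, and it uses merge() followed by a filter. The package may implement the join differently.

```r
# Made-up intervals for illustration; "demo.TextGrid" is a hypothetical file.
words <- data.frame(
  file = "demo.TextGrid",
  words = c("hug", "daddy"),
  words_xmin = c(0.4, 0.9),
  words_xmax = c(0.8, 1.4),
  stringsAsFactors = FALSE
)
phones <- data.frame(
  file = "demo.TextGrid",
  phones = c("HH", "AH1", "G", "D", "AE1"),
  phones_xmin = c(0.4, 0.5, 0.6, 0.9, 1.1),
  phones_xmax = c(0.5, 0.6, 0.8, 1.1, 1.4),
  stringsAsFactors = FALSE
)
phones$phones_xmid <- (phones$phones_xmin + phones$phones_xmax) / 2

# Pair every word with every phone from the same file (the join_cols match),
# then keep the pairs where the phone midpoint lies inside the word interval.
pairs <- merge(words, phones, by = "file")
nested <- pairs[
  pairs$words_xmin <= pairs$phones_xmid &
    pairs$phones_xmid <= pairs$words_xmax,
]
nested[c("words", "phones")]
```

Comparing midpoints rather than requiring b$xmin and b$xmax to fall inside a means that a narrow interval whose boundary is slightly misaligned with its parent is still matched to that parent.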
Examples
data <- example_textgrid(3) |>
read_textgrid()
data
#> # A tibble: 17 × 10
#> file tier_num tier_name tier_type tier_xmin tier_xmax xmin xmax text
#> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 nested-in… 1 words Interval… 0 1.86 0 0.419 ""
#> 2 nested-in… 1 words Interval… 0 1.86 0.419 0.761 "hug"
#> 3 nested-in… 1 words Interval… 0 1.86 0.761 0.854 ""
#> 4 nested-in… 1 words Interval… 0 1.86 0.854 1.44 "dad…
#> 5 nested-in… 1 words Interval… 0 1.86 1.44 1.86 ""
#> 6 nested-in… 2 phones Interval… 0 1.86 0 0.419 "sil"
#> 7 nested-in… 2 phones Interval… 0 1.86 0.419 0.524 "HH"
#> 8 nested-in… 2 phones Interval… 0 1.86 0.524 0.637 "AH1"
#> 9 nested-in… 2 phones Interval… 0 1.86 0.637 0.761 "G"
#> 10 nested-in… 2 phones Interval… 0 1.86 0.761 0.854 "sp"
#> 11 nested-in… 2 phones Interval… 0 1.86 0.854 1.05 "D"
#> 12 nested-in… 2 phones Interval… 0 1.86 1.05 1.23 "AE1"
#> 13 nested-in… 2 phones Interval… 0 1.86 1.23 1.32 "D"
#> 14 nested-in… 2 phones Interval… 0 1.86 1.32 1.44 "IY0"
#> 15 nested-in… 2 phones Interval… 0 1.86 1.44 1.79 "sp"
#> 16 nested-in… 2 phones Interval… 0 1.86 1.79 1.86 ""
#> 17 nested-in… 3 utterance Interval… 0 1.86 0 1.86 "hug…
#> # ℹ 1 more variable: annotation_num <int>
# With a single tier, we get just that tier with the columns prefixed with
# the tier_name
pivot_textgrid_tiers(data, "utterance")
#> # A tibble: 1 × 10
#> file utterance utterance_xmin utterance_xmax utterance_xmid
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 nested-intervals.TextG… hug daddy 0 1.86 0.932
#> # ℹ 5 more variables: utterance_annotation_num <int>, utterance_tier_num <int>,
#> # utterance_tier_type <chr>, tier_xmin <dbl>, tier_xmax <dbl>
pivot_textgrid_tiers(data, "words")
#> # A tibble: 5 × 10
#> file words words_xmin words_xmax words_xmid words_annotation_num
#> <chr> <chr> <dbl> <dbl> <dbl> <int>
#> 1 nested-intervals.… "" 0 0.419 0.210 1
#> 2 nested-intervals.… "hug" 0.419 0.761 0.590 2
#> 3 nested-intervals.… "" 0.761 0.854 0.808 3
#> 4 nested-intervals.… "dad… 0.854 1.44 1.15 4
#> 5 nested-intervals.… "" 1.44 1.86 1.65 5
#> # ℹ 4 more variables: words_tier_num <int>, words_tier_type <chr>,
#> # tier_xmin <dbl>, tier_xmax <dbl>
# With multiple tiers, intervals in one tier that contain intervals in
# another tier are combined into the same row.
a <- pivot_textgrid_tiers(data, c("utterance", "words"))
cols <- c(
"utterance", "utterance_xmin", "utterance_xmax",
"words", "words_xmin", "words_xmax"
)
a[cols]
#> # A tibble: 5 × 6
#> utterance utterance_xmin utterance_xmax words words_xmin words_xmax
#> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 hug daddy 0 1.86 "" 0 0.419
#> 2 hug daddy 0 1.86 "hug" 0.419 0.761
#> 3 hug daddy 0 1.86 "" 0.761 0.854
#> 4 hug daddy 0 1.86 "daddy" 0.854 1.44
#> 5 hug daddy 0 1.86 "" 1.44 1.86
a <- pivot_textgrid_tiers(data, c("utterance", "words", "phones"))
cols <- c(cols, "phones", "phones_xmin", "phones_xmax")
a[cols]
#> # A tibble: 11 × 9
#> utterance utterance_xmin utterance_xmax words words_xmin words_xmax phones
#> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr>
#> 1 hug daddy 0 1.86 "" 0 0.419 "sil"
#> 2 hug daddy 0 1.86 "hug" 0.419 0.761 "HH"
#> 3 hug daddy 0 1.86 "hug" 0.419 0.761 "AH1"
#> 4 hug daddy 0 1.86 "hug" 0.419 0.761 "G"
#> 5 hug daddy 0 1.86 "" 0.761 0.854 "sp"
#> 6 hug daddy 0 1.86 "daddy" 0.854 1.44 "D"
#> 7 hug daddy 0 1.86 "daddy" 0.854 1.44 "AE1"
#> 8 hug daddy 0 1.86 "daddy" 0.854 1.44 "D"
#> 9 hug daddy 0 1.86 "daddy" 0.854 1.44 "IY0"
#> 10 hug daddy 0 1.86 "" 1.44 1.86 "sp"
#> 11 hug daddy 0 1.86 "" 1.44 1.86 ""
#> # ℹ 2 more variables: phones_xmin <dbl>, phones_xmax <dbl>