Pivot a textgrid into wide format, respecting nested tiers

Usage

pivot_textgrid_tiers(data, tiers, join_cols = "file")

Arguments

data: a textgrid dataframe created with read_textgrid()
tiers: character vector of tiers to pivot into wide format. When tiers has more than one element, the tiers are treated as nested. For example, if tiers is c("utterance", "word", "phone"), where "utterance" intervals contain "word" intervals, which in turn contain "phone" intervals, the output will have one row per "phone" interval and include utterance_* and word_* columns for the utterance and word intervals that contain each phone interval. tiers should be ordered from broadest to narrowest (e.g., "word" preceding "phone").
join_cols: character vector of the columns that uniquely identify a textgrid file. Defaults to "file" because that column has identical values for all tiers read from the same textgrid file.

Value

a dataframe containing only the intervals from the tiers named in tiers, converted into wide format. Columns are renamed so that each tier's text column becomes a column named after that tier. For example, the text column in a words tier is renamed to words. The xmin, xmax, annotation_num, tier_num, and tier_type columns are also prefixed with the tier name; for example, the xmax column in a words tier is renamed to words_xmax. An additional helper column, xmid (the midpoint of the interval), is added and prefixed in the same way. The tier_xmin and tier_xmax columns are kept without a prefix. See the examples below.
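To make the renaming scheme concrete, here is a minimal base-R sketch of how a single tier's columns could be renamed. prefix_tier_columns() is a hypothetical helper written for illustration, not part of the package, and it assumes the input has the columns produced by read_textgrid() as shown in the examples.

```r
# Hypothetical helper (not part of the package): rename one tier's columns
# following the scheme described above.
prefix_tier_columns <- function(df, tier) {
  # Helper midpoint column, later prefixed like xmin/xmax
  df$xmid <- (df$xmin + df$xmax) / 2
  # The tier name is encoded in the new column names, so drop the column
  df$tier_name <- NULL
  # The text column takes the tier's name
  names(df)[names(df) == "text"] <- tier
  # These columns get a "<tier>_" prefix; tier_xmin/tier_xmax are left alone
  prefixed <- c("xmin", "xmax", "xmid", "annotation_num", "tier_num", "tier_type")
  hit <- names(df) %in% prefixed
  names(df)[hit] <- paste0(tier, "_", names(df)[hit])
  df
}

words <- data.frame(
  file = "nested-intervals.TextGrid", tier_num = 1L, tier_name = "words",
  tier_type = "IntervalTier", tier_xmin = 0, tier_xmax = 1.86,
  xmin = c(0, 0.419), xmax = c(0.419, 0.761), text = c("", "hug"),
  annotation_num = 1:2
)
names(prefix_tier_columns(words, "words"))
```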

Details

When joining nested intervals, two intervals a and b are combined into the same row if they match on the values in the join_cols columns and if a$xmin <= b$xmid and b$xmid <= a$xmax, that is, if the midpoint of b lies inside interval a.
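The midpoint rule can be sketched in base R as a join on join_cols followed by a filter. join_by_midpoint() is a hypothetical helper for illustration only; it is not claimed to be how pivot_textgrid_tiers() is implemented. The interval boundaries below are copied from the words and phones tiers in the examples.

```r
# Hypothetical sketch of the midpoint-containment join (not the package code).
join_by_midpoint <- function(a, b, join_cols = "file") {
  # Midpoint of each interval in the narrower tier b
  b$xmid <- (b$xmin + b$xmax) / 2
  # Pair every row of a with every row of b that shares the join_cols values;
  # shared columns (xmin, xmax, text) get _a / _b suffixes
  pairs <- merge(a, b, by = join_cols, suffixes = c("_a", "_b"))
  # Keep only pairs where b's midpoint falls inside interval a
  keep <- pairs$xmin_a <= pairs$xmid & pairs$xmid <= pairs$xmax_a
  pairs[keep, , drop = FALSE]
}

words <- data.frame(
  file = "f",
  xmin = c(0, 0.419, 0.761, 0.854, 1.44),
  xmax = c(0.419, 0.761, 0.854, 1.44, 1.86),
  text = c("", "hug", "", "daddy", "")
)
phones <- data.frame(
  file = "f",
  xmin = c(0, 0.419, 0.524, 0.637, 0.761, 0.854, 1.05, 1.23, 1.32, 1.44, 1.79),
  xmax = c(0.419, 0.524, 0.637, 0.761, 0.854, 1.05, 1.23, 1.32, 1.44, 1.79, 1.86),
  text = c("sil", "HH", "AH1", "G", "sp", "D", "AE1", "D", "IY0", "sp", "")
)
res <- join_by_midpoint(words, phones)
res[c("text_a", "text_b")]
```

Because each phone is assigned by its midpoint, a phone whose edges do not line up exactly with a word boundary is still placed in a single word.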

Examples

data <- example_textgrid(3) |>
  read_textgrid()
data
#> # A tibble: 17 × 10
#>    file       tier_num tier_name tier_type tier_xmin tier_xmax  xmin  xmax text 
#>    <chr>         <int> <chr>     <chr>         <dbl>     <dbl> <dbl> <dbl> <chr>
#>  1 nested-in…        1 words     Interval…         0      1.86 0     0.419 ""   
#>  2 nested-in…        1 words     Interval…         0      1.86 0.419 0.761 "hug"
#>  3 nested-in…        1 words     Interval…         0      1.86 0.761 0.854 ""   
#>  4 nested-in…        1 words     Interval…         0      1.86 0.854 1.44  "dad…
#>  5 nested-in…        1 words     Interval…         0      1.86 1.44  1.86  ""   
#>  6 nested-in…        2 phones    Interval…         0      1.86 0     0.419 "sil"
#>  7 nested-in…        2 phones    Interval…         0      1.86 0.419 0.524 "HH" 
#>  8 nested-in…        2 phones    Interval…         0      1.86 0.524 0.637 "AH1"
#>  9 nested-in…        2 phones    Interval…         0      1.86 0.637 0.761 "G"  
#> 10 nested-in…        2 phones    Interval…         0      1.86 0.761 0.854 "sp" 
#> 11 nested-in…        2 phones    Interval…         0      1.86 0.854 1.05  "D"  
#> 12 nested-in…        2 phones    Interval…         0      1.86 1.05  1.23  "AE1"
#> 13 nested-in…        2 phones    Interval…         0      1.86 1.23  1.32  "D"  
#> 14 nested-in…        2 phones    Interval…         0      1.86 1.32  1.44  "IY0"
#> 15 nested-in…        2 phones    Interval…         0      1.86 1.44  1.79  "sp" 
#> 16 nested-in…        2 phones    Interval…         0      1.86 1.79  1.86  ""   
#> 17 nested-in…        3 utterance Interval…         0      1.86 0     1.86  "hug…
#> # ℹ 1 more variable: annotation_num <int>

# With a single tier, we get just that tier with the columns prefixed with
# the tier_name
pivot_textgrid_tiers(data, "utterance")
#> # A tibble: 1 × 10
#>   file                    utterance utterance_xmin utterance_xmax utterance_xmid
#>   <chr>                   <chr>              <dbl>          <dbl>          <dbl>
#> 1 nested-intervals.TextG… hug daddy              0           1.86          0.932
#> # ℹ 5 more variables: utterance_annotation_num <int>, utterance_tier_num <int>,
#> #   utterance_tier_type <chr>, tier_xmin <dbl>, tier_xmax <dbl>
pivot_textgrid_tiers(data, "words")
#> # A tibble: 5 × 10
#>   file               words words_xmin words_xmax words_xmid words_annotation_num
#>   <chr>              <chr>      <dbl>      <dbl>      <dbl>                <int>
#> 1 nested-intervals.… ""         0          0.419      0.210                    1
#> 2 nested-intervals.… "hug"      0.419      0.761      0.590                    2
#> 3 nested-intervals.… ""         0.761      0.854      0.808                    3
#> 4 nested-intervals.… "dad…      0.854      1.44       1.15                     4
#> 5 nested-intervals.… ""         1.44       1.86       1.65                     5
#> # ℹ 4 more variables: words_tier_num <int>, words_tier_type <chr>,
#> #   tier_xmin <dbl>, tier_xmax <dbl>

# With multiple tiers, intervals in one tier that contain intervals in
# another tier are combined into the same row.
a <- pivot_textgrid_tiers(data, c("utterance", "words"))
cols <- c(
  "utterance", "utterance_xmin", "utterance_xmax",
  "words", "words_xmin", "words_xmax"
)
a[cols]
#> # A tibble: 5 × 6
#>   utterance utterance_xmin utterance_xmax words   words_xmin words_xmax
#>   <chr>              <dbl>          <dbl> <chr>        <dbl>      <dbl>
#> 1 hug daddy              0           1.86 ""           0          0.419
#> 2 hug daddy              0           1.86 "hug"        0.419      0.761
#> 3 hug daddy              0           1.86 ""           0.761      0.854
#> 4 hug daddy              0           1.86 "daddy"      0.854      1.44 
#> 5 hug daddy              0           1.86 ""           1.44       1.86 

a <- pivot_textgrid_tiers(data, c("utterance", "words", "phones"))
cols <- c(cols, "phones", "phones_xmin", "phones_xmax")
a[cols]
#> # A tibble: 11 × 9
#>    utterance utterance_xmin utterance_xmax words   words_xmin words_xmax phones
#>    <chr>              <dbl>          <dbl> <chr>        <dbl>      <dbl> <chr> 
#>  1 hug daddy              0           1.86 ""           0          0.419 "sil" 
#>  2 hug daddy              0           1.86 "hug"        0.419      0.761 "HH"  
#>  3 hug daddy              0           1.86 "hug"        0.419      0.761 "AH1" 
#>  4 hug daddy              0           1.86 "hug"        0.419      0.761 "G"   
#>  5 hug daddy              0           1.86 ""           0.761      0.854 "sp"  
#>  6 hug daddy              0           1.86 "daddy"      0.854      1.44  "D"   
#>  7 hug daddy              0           1.86 "daddy"      0.854      1.44  "AE1" 
#>  8 hug daddy              0           1.86 "daddy"      0.854      1.44  "D"   
#>  9 hug daddy              0           1.86 "daddy"      0.854      1.44  "IY0" 
#> 10 hug daddy              0           1.86 ""           1.44       1.86  "sp"  
#> 11 hug daddy              0           1.86 ""           1.44       1.86  ""    
#> # ℹ 2 more variables: phones_xmin <dbl>, phones_xmax <dbl>