<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://tjmahr.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tjmahr.github.io/" rel="alternate" type="text/html" /><updated>2025-11-14T15:45:57-06:00</updated><id>https://tjmahr.github.io/feed.xml</id><title type="html">Higher Order Functions</title><subtitle>Blog and research notebook by an R programming enthusiast</subtitle><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><entry><title type="html">readtextgrid now uses C++ (and ChatGPT helped)</title><link href="https://tjmahr.github.io/readtextgrid-cpp-llms/" rel="alternate" type="text/html" title="readtextgrid now uses C++ (and ChatGPT helped)" /><published>2025-11-14T00:00:00-06:00</published><updated>2025-11-14T00:00:00-06:00</updated><id>https://tjmahr.github.io/readtextgrid-cpp-llms</id><content type="html" xml:base="https://tjmahr.github.io/readtextgrid-cpp-llms/"><![CDATA[<p>In this post, I announce the release of version 0.2.0 of the
<a href="https://cran.r-project.org/package=readtextgrid" title="readtextgrid on CRAN">readtextgrid</a> R package, describe the problem that
the package solves, and share some thoughts on LLM-assisted programming.</p>

<h2 id="textgrids-are-a-way-to-annotate-audio-data">Textgrids are a way to annotate audio data</h2>

<p><a href="https://www.fon.hum.uva.nl/praat/">Praat</a> is a program for speech and 
acoustic analysis that has been around for over 30 years. It includes a 
scripting language for manipulating and analyzing data and for creating
annotation workflows. Users can annotate intervals or points of time 
in a sound file using a <strong>textgrid</strong> object. Here is a screenshot of a
textgrid in Praat:</p>

<figure class="" style="max-width: 100%; display: block; margin: 2em auto;"><img src="/assets/images/2025-11-library-tidyverse.png" alt="Screenshot of a Praat editor window showing the amplitude waveform, spectrogram, and textgrid annotations. The audio file is of me saying 'library tidyverse library brms'." /><figcaption>
      Screenshot of a Praat editor window.

    </figcaption></figure>

<p>There are three rows in the image, all three of them sharing the same
<em>x</em> axis (time).</p>

<ol>
  <li>Amplitude waveform, showing intensity over time</li>
  <li>Spectrogram, showing how the intensity (<em>color</em>) at frequencies (<em>y</em>) changes over
time. Red dots mark estimated formants (resonances) in the speech signal.</li>
  <li>Textgrid of text annotations for the recording</li>
</ol>

<p>A user can edit the textgrid by adding or adjusting boundaries and
adding annotations, and Praat will save this data to a <code class="language-plaintext highlighter-rouge">.TextGrid</code> file.</p>

<p>Other programs can produce <code class="language-plaintext highlighter-rouge">.TextGrid</code> files: the textgrid pictured here
is the result of forced alignment, specifically by the <a href="https://montreal-forced-aligner.readthedocs.io/en/latest/" title="Montreal Forced Aligner homepage">Montreal Forced
Aligner</a>. I told the program I said “library tidy verse library
b r m s”, and it looked up the pronunciations of those words and used an
acoustic model to estimate the time intervals of each word and each
speech sound. The aligner produced a <code class="language-plaintext highlighter-rouge">.TextGrid</code> file for this alignment.</p>

<p>These textgrids are the bread and butter of some of the research that we
do. For example, <a href="https://pubs.asha.org/doi/10.1044/2021_JSLHR-21-00206" title="Speech Development Between 30 and 119 Months in Typical Children II: Articulation Rate Growth Curves">our article</a> on speaking/articulation rate in children
involved over 30,000 single-sentence <code class="language-plaintext highlighter-rouge">.wav</code> files and <code class="language-plaintext highlighter-rouge">.TextGrid</code> files. We
used the alignments to determine the time spent speaking and the
number of vowels in each utterance, and hence the speaking rate in
syllables per second.</p>

<p>Reading these <code class="language-plaintext highlighter-rouge">.TextGrid</code> files into R was cumbersome, so I wrote and
released <a href="https://cran.r-project.org/package=readtextgrid" title="readtextgrid on CRAN">readtextgrid</a>, an R package built around one 
simple function:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readtextgrid</span><span class="p">)</span><span class="w">

</span><span class="n">path_tg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"_R/data/mfa-out/library-tidyverse-library-brms.TextGrid"</span><span class="w"> 
</span><span class="n">data_tg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_textgrid</span><span class="p">(</span><span class="n">path_tg</span><span class="p">)</span><span class="w">

</span><span class="n">data_tg</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 43 × 10</span><span class="w">
</span><span class="c1">#&gt;    file       tier_num tier_name tier_type tier_xmin tier_xmax  xmin  xmax text </span><span class="w">
</span><span class="c1">#&gt;    &lt;chr&gt;         &lt;int&gt; &lt;chr&gt;     &lt;chr&gt;         &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 library-t…        1 words     Interval…         0      3.60  0     0.08 ""   </span><span class="w">
</span><span class="c1">#&gt;  2 library-t…        1 words     Interval…         0      3.60  0.08  0.74 "lib…</span><span class="w">
</span><span class="c1">#&gt;  3 library-t…        1 words     Interval…         0      3.60  0.74  1.12 "tid…</span><span class="w">
</span><span class="c1">#&gt;  4 library-t…        1 words     Interval…         0      3.60  1.12  1.58 "ver…</span><span class="w">
</span><span class="c1">#&gt;  5 library-t…        1 words     Interval…         0      3.60  1.58  1.74 ""   </span><span class="w">
</span><span class="c1">#&gt;  6 library-t…        1 words     Interval…         0      3.60  1.74  2.46 "lib…</span><span class="w">
</span><span class="c1">#&gt;  7 library-t…        1 words     Interval…         0      3.60  2.46  2.72 "b"  </span><span class="w">
</span><span class="c1">#&gt;  8 library-t…        1 words     Interval…         0      3.60  2.72  2.9  "r"  </span><span class="w">
</span><span class="c1">#&gt;  9 library-t…        1 words     Interval…         0      3.60  2.9   3.04 "m"  </span><span class="w">
</span><span class="c1">#&gt; 10 library-t…        1 words     Interval…         0      3.60  3.04  3.46 "s"  </span><span class="w">
</span><span class="c1">#&gt; # ℹ 33 more rows</span><span class="w">
</span><span class="c1">#&gt; # ℹ 1 more variable: annotation_num &lt;int&gt;</span><span class="w">
</span></code></pre></div></div>

<p>The function returns a tidy tibble with one row per annotation. The filename is
also stored as a column, so we can <code class="language-plaintext highlighter-rouge">lapply()</code> over a directory of files and combine the results.
Annotations are numbered, so we can <code class="language-plaintext highlighter-rouge">group_by(text, annotation_num)</code> and 
handle repeated words separately.</p>
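<p>To see why the annotation number matters, here is a small base-R sketch. It uses <code class="language-plaintext highlighter-rouge">aggregate()</code> rather than dplyr so that it needs no extra packages. The rows are typed by hand from the first few rows of the <code class="language-plaintext highlighter-rouge">words</code> tier printed above, so they are illustrative rather than read from the file.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hand-typed rows mimicking part of the "words" tier (illustrative values)
words &lt;- data.frame(
  text = c("library", "tidy", "verse", "library"),
  annotation_num = c(2L, 3L, 4L, 6L),
  xmin = c(0.08, 0.74, 1.12, 1.74),
  xmax = c(0.74, 1.12, 1.58, 2.46)
)
words$duration &lt;- words$xmax - words$xmin

# Grouping by text alone pools both occurrences of "library"
by_text &lt;- aggregate(duration ~ text, data = words, FUN = sum)

# Adding annotation_num keeps each occurrence in its own group
by_token &lt;- aggregate(duration ~ text + annotation_num, data = words, FUN = sum)

nrow(by_text)
#&gt; [1] 3
nrow(by_token)
#&gt; [1] 4
</code></pre></div></div>

<p>Grouping by <code class="language-plaintext highlighter-rouge">text</code> alone sums both occurrences of “library” into one row, while adding <code class="language-plaintext highlighter-rouge">annotation_num</code> gives each occurrence its own row.</p>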

<p>With this textgrid in R, I can measure speaking rate, for example:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_tg</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="n">tier_name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"phones"</span><span class="p">,</span><span class="w"> </span><span class="n">text</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">summarise</span><span class="p">(</span><span class="w">
    </span><span class="n">speaking_time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">xmax</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">xmin</span><span class="p">),</span><span class="w">
    </span><span class="c1"># vowels have numbers to indicate degree of stress</span><span class="w">
    </span><span class="n">num_vowels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">str_detect</span><span class="p">(</span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="s2">"\\d"</span><span class="p">))</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">syllables_per_sec</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">num_vowels</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">speaking_time</span><span class="w"> 
  </span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 1 × 3</span><span class="w">
</span><span class="c1">#&gt;   speaking_time num_vowels syllables_per_sec</span><span class="w">
</span><span class="c1">#&gt;           &lt;dbl&gt;      &lt;int&gt;             &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt; 1          3.22         13              4.04</span><span class="w">
</span></code></pre></div></div>
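<p>The same summary can be computed for many files at once. Below is a base-R sketch that assumes a data frame with the same columns as the output of <code class="language-plaintext highlighter-rouge">read_textgrid()</code> (<code class="language-plaintext highlighter-rouge">file</code>, <code class="language-plaintext highlighter-rouge">tier_name</code>, <code class="language-plaintext highlighter-rouge">text</code>, <code class="language-plaintext highlighter-rouge">xmin</code>, <code class="language-plaintext highlighter-rouge">xmax</code>), such as the results for several files stacked with <code class="language-plaintext highlighter-rouge">rbind()</code>. The helper name <code class="language-plaintext highlighter-rouge">speaking_rate_by_file()</code> is hypothetical, and the toy phone rows are made up for illustration.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: syllables per second for each file in a data frame
# shaped like stacked read_textgrid() output. Vowels are the phone labels
# that contain a stress digit.
speaking_rate_by_file &lt;- function(data) {
  phones &lt;- data[data$tier_name == "phones" &amp; data$text != "", ]
  per_file &lt;- lapply(split(phones, phones$file), function(d) {
    speaking_time &lt;- sum(d$xmax - d$xmin)
    num_vowels &lt;- sum(grepl("[0-9]", d$text))
    data.frame(
      file = d$file[1],
      speaking_time = speaking_time,
      num_vowels = num_vowels,
      syllables_per_sec = num_vowels / speaking_time
    )
  })
  do.call(rbind, per_file)
}

# Made-up toy rows for two files (not real alignments)
toy &lt;- data.frame(
  file = c("a.TextGrid", "a.TextGrid", "a.TextGrid", "b.TextGrid", "b.TextGrid"),
  tier_name = "phones",
  text = c("T", "AY1", "", "B", "IY1"),
  xmin = c(0.0, 0.1, 0.4, 0.0, 0.2),
  xmax = c(0.1, 0.4, 0.5, 0.2, 0.5)
)
speaking_rate_by_file(toy)
</code></pre></div></div>

<p>As in the dplyr version, a phone counts as a vowel when its label contains a stress digit (for example, <code class="language-plaintext highlighter-rouge">AY1</code>), and empty intervals are dropped before summing durations.</p>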

<p>Or annotate a spectrogram:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">path_spectrogram</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"_R/data/mfa/library-tidyverse-library-brms.csv"</span><span class="w">
</span><span class="n">data_spectrogram</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path_spectrogram</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Rows: 249366 Columns: 6</span><span class="w">
</span><span class="c1">#&gt; ── Column specification ────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt; Delimiter: ","</span><span class="w">
</span><span class="c1">#&gt; dbl (6): y, x, power, time, frequency, db</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ℹ Use `spec()` to retrieve the full column specification for this data.</span><span class="w">
</span><span class="c1">#&gt; ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</span><span class="w">

</span><span class="n">data_spectrogram</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="c1"># reserve more of the color variation for intensities above 15 dB</span><span class="w">
    </span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">db</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="n">db</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">time</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">frequency</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_raster</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_text</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">xmin</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">xmax</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_tg</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">tier_name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"words"</span><span class="p">),</span><span class="w">
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6500</span><span class="p">,</span><span class="w">
    </span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="w">
  </span><span class="p">)</span><span class="w">  </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_text</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">xmin</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">xmax</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_tg</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">tier_name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"phones"</span><span class="p">),</span><span class="w">
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6100</span><span class="p">,</span><span class="w">
    </span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
    </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
  </span><span class="p">)</span><span class="w">  </span><span class="o">+</span><span class="w">
  </span><span class="n">ylim</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="m">6600</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_fill_gradient</span><span class="p">(</span><span class="n">low</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">high</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"time [s]"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"frequency [Hz]"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<div class="figure" style="text-align: center">
<img src="/figs/2025-11-14-readtextgrid-cpp-llms/unnamed-chunk-4-1.png" alt="Spectrogram of me saying 'library tidyverse library brms'" width="80%" />
<p class="caption">Spectrogram of me saying 'library tidyverse library brms'</p>
</div>

<p><img src="/assets/images/2021-03-read-textgrid-logo.png" alt="Package hex logo" class="align-right" style="max-width: 30%;" /></p>

<p>I released the first version of the package in 2020. Notably for me, this
package has the first hex badge I ever made.</p>

<h2 id="my-original-textgrid-parser-and-its-problem">My original <code class="language-plaintext highlighter-rouge">.TextGrid</code> parser and its problem</h2>

<p>Here is what the contents of the <code class="language-plaintext highlighter-rouge">.TextGrid</code> file look like. It’s not the whole
file, but it’s enough to give a sense of the structure:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">path_tg</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">readLines</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">head</span><span class="p">(</span><span class="m">26</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="nf">c</span><span class="p">(</span><span class="s2">"[... TRUNCATED ... ]"</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">writeLines</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; File type = "ooTextFile"</span><span class="w">
</span><span class="c1">#&gt; Object class = "TextGrid"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; xmin = 0 </span><span class="w">
</span><span class="c1">#&gt; xmax = 3.596009 </span><span class="w">
</span><span class="c1">#&gt; tiers? &lt;exists&gt; </span><span class="w">
</span><span class="c1">#&gt; size = 2 </span><span class="w">
</span><span class="c1">#&gt; item []: </span><span class="w">
</span><span class="c1">#&gt;     item [1]:</span><span class="w">
</span><span class="c1">#&gt;         class = "IntervalTier" </span><span class="w">
</span><span class="c1">#&gt;         name = "words" </span><span class="w">
</span><span class="c1">#&gt;         xmin = 0 </span><span class="w">
</span><span class="c1">#&gt;         xmax = 3.596009 </span><span class="w">
</span><span class="c1">#&gt;         intervals: size = 11 </span><span class="w">
</span><span class="c1">#&gt;         intervals [1]:</span><span class="w">
</span><span class="c1">#&gt;             xmin = 0.0 </span><span class="w">
</span><span class="c1">#&gt;             xmax = 0.08 </span><span class="w">
</span><span class="c1">#&gt;             text = "" </span><span class="w">
</span><span class="c1">#&gt;         intervals [2]:</span><span class="w">
</span><span class="c1">#&gt;             xmin = 0.08 </span><span class="w">
</span><span class="c1">#&gt;             xmax = 0.74 </span><span class="w">
</span><span class="c1">#&gt;             text = "library" </span><span class="w">
</span><span class="c1">#&gt;         intervals [3]:</span><span class="w">
</span><span class="c1">#&gt;             xmin = 0.74 </span><span class="w">
</span><span class="c1">#&gt;             xmax = 1.12 </span><span class="w">
</span><span class="c1">#&gt;             text = "tidy" </span><span class="w">
</span><span class="c1">#&gt; [... TRUNCATED ... ]</span><span class="w">
</span></code></pre></div></div>

<p>The first 7 lines provide metadata about the time range of the
audio and the number of tiers (<code class="language-plaintext highlighter-rouge">size = 2</code>). The file then writes out each
tier (the <code class="language-plaintext highlighter-rouge">item [n]</code> lines), first giving its <code class="language-plaintext highlighter-rouge">class</code>, <code class="language-plaintext highlighter-rouge">name</code>, time
range, and number of marks or intervals. Each mark or interval is then
listed with its time values (<code class="language-plaintext highlighter-rouge">xmin</code>, <code class="language-plaintext highlighter-rouge">xmax</code>) and its <code class="language-plaintext highlighter-rouge">text</code> value.</p>

<p>Because nearly everything here follows a <code class="language-plaintext highlighter-rouge">key = value</code> syntax and
because sections are split from each other very neatly with <code class="language-plaintext highlighter-rouge">item [n]:</code>
or <code class="language-plaintext highlighter-rouge">intervals [n]:</code> lines, I was able to write <strong>a simple parser using
regular expressions</strong>: split the file into <code class="language-plaintext highlighter-rouge">item [n]</code> sections, split
those into <code class="language-plaintext highlighter-rouge">intervals [n]</code> sections, and extract the key-value pairs.</p>
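<p>Here is a minimal sketch of that regular-expression approach, run on lines shaped like the excerpt above. It is not the package’s legacy parser: the helpers <code class="language-plaintext highlighter-rouge">split_intervals()</code> and <code class="language-plaintext highlighter-rouge">parse_key_values()</code> are hypothetical, they only work at the interval level, and they assume the long format with one <code class="language-plaintext highlighter-rouge">key = value</code> pair per line.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helpers sketching a regex-based, long-format-only parser
split_intervals &lt;- function(lines) {
  # Each chunk starts at an "intervals [n]:" line
  starts &lt;- grepl("^\\s*intervals \\[[0-9]+\\]:", lines)
  group &lt;- cumsum(starts)
  unname(split(lines[group &gt; 0], group[group &gt; 0]))
}

parse_key_values &lt;- function(lines) {
  # Capture "key = value" and drop any trailing whitespace
  pattern &lt;- "^\\s*([A-Za-z]+) = (.*?)\\s*$"
  hits &lt;- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  hits &lt;- hits[lengths(hits) == 3]
  keys &lt;- vapply(hits, `[`, character(1), 2)
  values &lt;- vapply(hits, `[`, character(1), 3)
  # Strip the surrounding double quotes from string values
  setNames(sub('^"(.*)"$', "\\1", values), keys)
}

tier_lines &lt;- c(
  "        intervals: size = 11 ",
  "        intervals [1]:",
  "            xmin = 0.0 ",
  "            xmax = 0.08 ",
  '            text = "" ',
  "        intervals [2]:",
  "            xmin = 0.08 ",
  "            xmax = 0.74 ",
  '            text = "library" '
)
lapply(split_intervals(tier_lines), parse_key_values)
</code></pre></div></div>

<p>Doubled quotes inside strings are left untouched here, and nothing in this sketch can cope with a file that has no keys at all.</p>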

<p>This easy approach came with limitations. First, the <a href="https://www.fon.hum.uva.nl/praat/manual/TextGrid_file_formats.html" title="TextGrid file formats">TextGrid
specification</a> is much more flexible than my parser allowed. For example, Praat
also provides much less verbose “short” format textgrids, which are just
a stream of time and text annotations:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">path_tg_short</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"_R/data/mfa-out/library-tidyverse-library-brms-short.TextGrid"</span><span class="w">
</span><span class="n">path_tg_short</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">readLines</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">head</span><span class="p">(</span><span class="m">26</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="nf">c</span><span class="p">(</span><span class="s2">"[... TRUNCATED ... ]"</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">writeLines</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; File type = "ooTextFile"</span><span class="w">
</span><span class="c1">#&gt; Object class = "TextGrid"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; 0</span><span class="w">
</span><span class="c1">#&gt; 3.596009</span><span class="w">
</span><span class="c1">#&gt; &lt;exists&gt;</span><span class="w">
</span><span class="c1">#&gt; 2</span><span class="w">
</span><span class="c1">#&gt; "IntervalTier"</span><span class="w">
</span><span class="c1">#&gt; "words"</span><span class="w">
</span><span class="c1">#&gt; 0</span><span class="w">
</span><span class="c1">#&gt; 3.596009</span><span class="w">
</span><span class="c1">#&gt; 11</span><span class="w">
</span><span class="c1">#&gt; 0</span><span class="w">
</span><span class="c1">#&gt; 0.08</span><span class="w">
</span><span class="c1">#&gt; ""</span><span class="w">
</span><span class="c1">#&gt; 0.08</span><span class="w">
</span><span class="c1">#&gt; 0.74</span><span class="w">
</span><span class="c1">#&gt; "library"</span><span class="w">
</span><span class="c1">#&gt; 0.74</span><span class="w">
</span><span class="c1">#&gt; 1.12</span><span class="w">
</span><span class="c1">#&gt; "tidy"</span><span class="w">
</span><span class="c1">#&gt; 1.12</span><span class="w">
</span><span class="c1">#&gt; 1.58</span><span class="w">
</span><span class="c1">#&gt; "verse"</span><span class="w">
</span><span class="c1">#&gt; 1.58</span><span class="w">
</span><span class="c1">#&gt; 1.74</span><span class="w">
</span><span class="c1">#&gt; [... TRUNCATED ... ]</span><span class="w">
</span></code></pre></div></div>

<p>Everything is in the same order, but the labels (<code class="language-plaintext highlighter-rouge">xmin =</code>, <code class="language-plaintext highlighter-rouge">intervals [1]:</code>, and so on) are gone. It turns
out that all of the helpful labels in the long format were actually <em>comments</em>
that get ignored: everything that isn’t a number, a string in
double quotes, or a <code class="language-plaintext highlighter-rouge">&lt;flag&gt;</code> is a comment.</p>
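<p>One way to see this rule in action: if everything except numbers, quoted strings, and flags is thrown away, lines from the long and short formats reduce to the same tokens. The sketch below is deliberately simplified, and <code class="language-plaintext highlighter-rouge">keep_data_tokens()</code> is a hypothetical helper, not a readtextgrid function. It splits on whitespace, so it assumes no string contains a space, and it ignores <code class="language-plaintext highlighter-rouge">!</code> comments and escaped quotes.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: keep only the data tokens in TextGrid lines.
# Simplification: splits on whitespace, so strings must not contain spaces.
keep_data_tokens &lt;- function(lines) {
  chunks &lt;- unlist(strsplit(lines, "\\s+"))
  chunks &lt;- chunks[nzchar(chunks)]
  is_number &lt;- grepl("^-?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?$", chunks)
  is_string &lt;- grepl('^".*"$', chunks)
  is_flag &lt;- grepl("^&lt;[^&gt;]*&gt;$", chunks)
  # Everything else ("xmin", "=", "intervals", "[2]:", ...) is a comment
  chunks[is_number | is_string | is_flag]
}

long_lines &lt;- c(
  "tiers? &lt;exists&gt; ",
  "size = 2 ",
  "        intervals [2]:",
  "            xmin = 0.08 ",
  "            xmax = 0.74 ",
  '            text = "library" '
)
short_lines &lt;- c("&lt;exists&gt;", "2", "0.08", "0.74", '"library"')

identical(keep_data_tokens(long_lines), keep_data_tokens(short_lines))
#&gt; [1] TRUE
</code></pre></div></div>

<p>The real files differ in small ways that this comparison skips over. For instance, the long file above writes the first <code class="language-plaintext highlighter-rouge">xmin</code> as <code class="language-plaintext highlighter-rouge">0.0</code> while the short file writes <code class="language-plaintext highlighter-rouge">0</code>, so a real parser has to compare numbers as numbers, not as strings.</p>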

<p>There are also other quirks (<code class="language-plaintext highlighter-rouge">"</code> escaping, <code class="language-plaintext highlighter-rouge">!</code> comments, deviations
between Praat’s description of the format and the behavior of
<code class="language-plaintext highlighter-rouge">praat.exe</code>). I documented them as a kind of <a href="https://www.tjmahr.com/readtextgrid/articles/textgrid-specification.html" title="Textgrid Specification article">unofficial
specification</a> in an article on the package website.</p>

<p>But my original regular-expression-based parser could only handle the
verbose long-format textgrids. I knew this. I even noted it in <a href="https://github.com/tjmahr/readtextgrid/issues/4" title="ugh support textgrid grammar #4">a GitHub issue
in 2020</a>. Still, this limitation was never a
problem for me until I tried a new phonetics tool that saved
textgrids in the short format by default. Now, readtextgrid could not
in fact “read textgrid”.</p>

<h2 id="the-new-r-based-tokenizer">The new R-based tokenizer</h2>

<p><a href="https://jofrhwld.github.io/" title="Josef Fruehwald's homepage">Josef Fruehwald</a>, a linguist who has written <a href="https://jofrhwld.github.io/software/" title="Josef Fruehwald's software page">lots of
acoustics/phonetics software</a>, submitted a pull request implementing a
proper parser, which I eventually rewrote to handle various edge cases and
behaviors that the <code class="language-plaintext highlighter-rouge">.TextGrid</code> specification does not document. I made an
<a href="https://github.com/tjmahr/readtextgrid/blob/ed971e48ab3ea33e3efe0ba59f45ae3e41d07a32/tests/testthat/test-data/hard-to-parse.TextGrid" title="hard-to-parse.TextGrid on GitHub">adversarial <code class="language-plaintext highlighter-rouge">.TextGrid</code> file</a> 😈 that could still be opened
by <code class="language-plaintext highlighter-rouge">praat.exe</code> but was meant to be difficult to parse. This was a fun
development loop: Make the file harder, update the parser to handle the
new feature, repeat.</p>

<p>Because the essential data in the file are just string tokens and
number tokens, I needed to make a <a href="https://en.wikipedia.org/wiki/Lexical_analysis" title="Wikipedia page on Lexical Analysis">tokenizer</a>: a piece
of software that reads in characters, groups them into tokens, and
figures out what kind of data the token represents. The initial R-based
version of the tokenizer did the following:</p>

<ul>
  <li>Read the file character by character</li>
  <li>Gather the characters for the current token and keep them when they
form a valid string or number</li>
  <li>Shift between three states (<code class="language-plaintext highlighter-rouge">in_string</code>, <code class="language-plaintext highlighter-rouge">in_strong_comment</code> for <code class="language-plaintext highlighter-rouge">!
comments</code>, <code class="language-plaintext highlighter-rouge">in_escaped_quote</code>)</li>
</ul>

<p>These three states determine how we interpret spaces, newlines, and <code class="language-plaintext highlighter-rouge">"</code>
characters. For example, a newline ends a <code class="language-plaintext highlighter-rouge">! comment</code>, but it does not end a
string, because strings can contain newlines. Moreover, in a comment,
<code class="language-plaintext highlighter-rouge">"</code> is ignored, but in a string, it might be the end of the string or an
escaped quote (doubled double-quotes are used for <code class="language-plaintext highlighter-rouge">"</code> characters: the
string <code class="language-plaintext highlighter-rouge">"""a"""</code> has the text <code class="language-plaintext highlighter-rouge">"a"</code>).</p>
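<p>Once the tokenizer has found where a string token ends, turning it into text is the easy part: drop the outer quotes and collapse each doubled quote. Here is a minimal sketch, where <code class="language-plaintext highlighter-rouge">decode_praat_string()</code> is a hypothetical name rather than a readtextgrid function:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: turn a raw string token into the text it encodes
decode_praat_string &lt;- function(token) {
  inner &lt;- substr(token, 2, nchar(token) - 1)  # drop the outer quotes
  gsub('""', '"', inner, fixed = TRUE)         # each "" stands for one "
}

writeLines(decode_praat_string('"""a"""'))
#&gt; "a"
</code></pre></div></div>

<p>The hard part is the step before this one: deciding whether a given <code class="language-plaintext highlighter-rouge">"</code> ends the string or starts an escaped quote, which is why the tokenizer has to keep track of its state.</p>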

<p>But at a high level, the code was simple:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_along</span><span class="p">(</span><span class="n">all_char</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">

  </span><span class="c1"># { ... examine current character ... }</span><span class="w">
  
  </span><span class="c1"># { ... handle comment state ... }</span><span class="w">
  
  </span><span class="c1"># { ... collect token if we see whitespace and are not in a string  ... }</span><span class="w">
  
  </span><span class="c1"># { ... handle string and escaped quote state ... }</span><span class="w">

</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
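<p>Filling in that outline gives something like the following sketch. It is hypothetical, not readtextgrid’s code. It assumes whitespace-delimited tokens and treats a <code class="language-plaintext highlighter-rouge">!</code> at the start of a token as a comment that runs to the end of the line. Instead of a separate <code class="language-plaintext highlighter-rouge">in_escaped_quote</code> state, it peeks at the next character to tell a doubled quote apart from the end of a string.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch of a character-by-character TextGrid tokenizer, not
# readtextgrid's code. Assumptions: tokens are separated by whitespace,
# strings use "" for an escaped quote, and a "!" at the start of a token
# begins a comment that runs to the end of the line.
is_data_token &lt;- function(token) {
  is_string &lt;- nchar(token) &gt;= 2 &amp;&amp; startsWith(token, '"') &amp;&amp; endsWith(token, '"')
  is_number &lt;- grepl("^-?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?$", token)
  is_flag &lt;- grepl("^&lt;[^&gt;]*&gt;$", token)
  is_string || is_number || is_flag
}

tokenize_textgrid &lt;- function(text) {
  chars &lt;- strsplit(text, "")[[1]]
  n &lt;- length(chars)
  tokens &lt;- character(0)
  current &lt;- character(0)
  in_string &lt;- FALSE
  in_strong_comment &lt;- FALSE
  i &lt;- 1
  while (i &lt;= n) {
    ch &lt;- chars[i]
    if (in_strong_comment) {
      # Only a newline ends a "!" comment
      if (ch == "\n") in_strong_comment &lt;- FALSE
    } else if (in_string) {
      current &lt;- c(current, ch)
      if (ch == '"') {
        if (i &lt; n &amp;&amp; chars[i + 1] == '"') {
          # Doubled quote: keep both characters and stay in the string
          current &lt;- c(current, '"')
          i &lt;- i + 1
        } else {
          in_string &lt;- FALSE  # a lone quote closes the string
        }
      }
    } else if (ch %in% c(" ", "\t", "\r", "\n")) {
      # Whitespace outside a string ends the current token
      token &lt;- paste(current, collapse = "")
      if (is_data_token(token)) tokens &lt;- c(tokens, token)
      current &lt;- character(0)
    } else if (ch == "!" &amp;&amp; length(current) == 0) {
      in_strong_comment &lt;- TRUE
    } else {
      # A quote at the start of a token opens a string
      if (ch == '"' &amp;&amp; length(current) == 0) in_string &lt;- TRUE
      current &lt;- c(current, ch)
    }
    i &lt;- i + 1
  }
  # Keep the final token if the text does not end in whitespace
  token &lt;- paste(current, collapse = "")
  if (is_data_token(token)) tokens &lt;- c(tokens, token)
  tokens
}

x &lt;- 'xmin = 0 \ntiers? &lt;exists&gt; \ntext = """a""" ! skip "this"\nname = "two words" \n'
writeLines(tokenize_textgrid(x))
#&gt; 0
#&gt; &lt;exists&gt;
#&gt; """a"""
#&gt; "two words"
</code></pre></div></div>

<p>Growing vectors with <code class="language-plaintext highlighter-rouge">c()</code> one character at a time inside an interpreted R loop is one likely reason that code in this style is slow.</p>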

<p>The new character-by-character parser worked 🎉. It had conquered
the adversarial example file, but there was still one more problem: it was
barely faster than the original regular-expression parser.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tg_lines</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readLines</span><span class="p">(</span><span class="n">path_tg</span><span class="p">)</span><span class="w">

</span><span class="n">bench</span><span class="o">::</span><span class="n">mark</span><span class="p">(</span><span class="w">
  </span><span class="n">legacy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readtextgrid</span><span class="o">:::</span><span class="n">legacy_read_textgrid_lines</span><span class="p">(</span><span class="n">tg_lines</span><span class="p">),</span><span class="w">
  </span><span class="n">new_r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readtextgrid</span><span class="o">:::</span><span class="n">r_read_textgrid_lines</span><span class="p">(</span><span class="n">tg_lines</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 2 × 6</span><span class="w">
</span><span class="c1">#&gt;   expression      min   median `itr/sec` mem_alloc `gc/sec`</span><span class="w">
</span><span class="c1">#&gt;   &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt;     &lt;dbl&gt; &lt;bch:byt&gt;    &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt; 1 legacy       75.1ms   76.1ms      13.1    6.81MB     19.7</span><span class="w">
</span><span class="c1">#&gt; 2 new_r        70.7ms   72.3ms      13.7  590.88KB     10.3</span><span class="w">
</span></code></pre></div></div>

<p>At this point, I asked ChatGPT for tips on speeding up the tokenizer.</p>

<h2 id="some-thoughts-about-llms">Some thoughts about LLMs</h2>

<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:4t2ziwnnescprzorvmrfduey/app.bsky.feed.post/3ltdjbaktss2s" data-bluesky-cid="bafyreihipumdkez3ifgsyt3eqbgelzuzljvllfwj4lbnfju55vvoxqc5gu" data-bluesky-embed-color-mode="system"><p lang="en">the thing about (the current) chatgpt is that it writes like a fucking idiot with excellent grammar</p>&mdash; sarah jeong (<a href="https://bsky.app/profile/did:plc:4t2ziwnnescprzorvmrfduey?ref_src=embed">@sarahjeong.bsky.social</a>) <a href="https://bsky.app/profile/did:plc:4t2ziwnnescprzorvmrfduey/post/3ltdjbaktss2s?ref_src=embed">July 6, 2025 at 7:20 PM</a></blockquote>

<p>Now, let’s talk about large language models (LLMs). There’s a lot I
could say about them.<sup id="fnref:fn-etc" role="doc-noteref"><a href="#fn:fn-etc" class="footnote" rel="footnote">1</a></sup> As a language scientist, I’ll start here: They
know <em>syntax</em>. They know which words go together and can generate very
plausible sequences of words. They do not know <em>semantics</em>, however. They
don’t have any firsthand knowledge or experience about what those
sequences express. They can’t introspect about that knowledge or
experience to see whether things “make sense”.<sup id="fnref:fn-reasoning" role="doc-noteref"><a href="#fn:fn-reasoning" class="footnote" rel="footnote">2</a></sup> They
<a href="https://www.andrewheiss.com/ai/#text" title="Andrew Heiss's AI position statement">don’t care</a> about the truth or falsity of statements.
They just make plausible sequences of words.</p>

<p>Now, it turns out that if you learn how to make sequences of words from an
Internet-sized corpus of text, then a lot of the plausible sequences you make
will turn out to be true. If you read 10,000 cookbooks, you could probably
provide a very classic recipe for scrambled eggs. But because you don’t know
about sarcasm or can’t draw on your own experience of trying to not ingest
non-food chemicals, you might suggest putting <a href="https://www.bbc.com/news/articles/cd11gzejgz4o" title="Glue pizza and eat rocks: Google AI search errors go viral">glue on a pizza</a>.</p>

<p>So, as we use an LLM, we need to ask ourselves how much we care about
the truth or care about knowing or understanding things. That may sound
like a glib or weird statement: Shouldn’t we always care about the
truth? Well, sometimes we don’t. We just want some syntax; we want
<a href="https://usethis.r-lib.org/" title="usethis R package">boilerplate or templates</a> to fill out.<sup id="fnref:fn-tradition" role="doc-noteref"><a href="#fn:fn-tradition" class="footnote" rel="footnote">3</a></sup>
For example, I can ask an LLM to “write some unit tests for a function
<code class="language-plaintext highlighter-rouge">round_to(xs, unit)</code> that rounds a vector of values to an arbitrary
unit” and receive:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">test_that</span><span class="p">(</span><span class="s2">"round_to() rounds to nearest multiple of unit"</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">expect_equal</span><span class="p">(</span><span class="n">round_to</span><span class="p">(</span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">6</span><span class="p">)</span><span class="w">
  </span><span class="n">expect_equal</span><span class="p">(</span><span class="n">round_to</span><span class="p">(</span><span class="m">4.9</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
  </span><span class="n">expect_equal</span><span class="p">(</span><span class="n">round_to</span><span class="p">(</span><span class="m">5.1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">6</span><span class="p">)</span><span class="w">
  </span><span class="n">expect_equal</span><span class="p">(</span><span class="n">round_to</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">),</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="p">})</span><span class="w">
</span></code></pre></div></div>

<p>These tests are not useful <em>until</em> I plug in the correct values for the expected 
output.</p>
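<p>The expected values are exactly where a plausible guess can go wrong. Suppose, hypothetically, that <code class="language-plaintext highlighter-rouge">round_to()</code> is defined as <code class="language-plaintext highlighter-rouge">unit * round(x / unit)</code>. Then ties depend on the rounding convention: R’s <code class="language-plaintext highlighter-rouge">round()</code> rounds halves to the even digit, so <code class="language-plaintext highlighter-rouge">round_to(5, 2)</code> would give 4 rather than 6, and <code class="language-plaintext highlighter-rouge">round_to(c(1, 2, 3, 4), 2)</code> would give <code class="language-plaintext highlighter-rouge">c(0, 2, 4, 4)</code>. The generated tests quietly assume halves round away from zero. A small C++ sketch makes the two conventions explicit:</p>

```cpp
#include <cmath>
#include <vector>

// Hypothetical definition: round each value to the nearest multiple of
// `unit` via unit * round(x / unit). The tie-breaking rule is a parameter
// because the expected test values depend on it.
std::vector<double> round_to(const std::vector<double>& xs, double unit,
                             bool half_to_even) {
  std::vector<double> out;
  out.reserve(xs.size());
  for (double x : xs) {
    double q = x / unit;
    // std::nearbyint follows the current rounding mode, which defaults to
    // round-half-to-even; std::round always rounds halves away from zero.
    double r = half_to_even ? std::nearbyint(q) : std::round(q);
    out.push_back(unit * r);
  }
  return out;
}
```

<p>Either convention could be the intended one; the tests only become useful once they encode the decision someone actually made.</p>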

<p>In other cases, we don’t quite care about truth or comprehension because we can
get external corroboration.<sup id="fnref:fn-wood" role="doc-noteref"><a href="#fn:fn-wood" class="footnote" rel="footnote">4</a></sup> When I ask ChatGPT for an obfuscated R script to
make Pac-Man in ggplot2, I can run the code to see if it works without trying to
decipher its syntax:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">()</span><span class="o">+</span><span class="w">
</span><span class="n">geom_polygon</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">data</span><span class="o">=</span><span class="n">within</span><span class="p">(</span><span class="n">data.frame</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">sapply</span><span class="p">(</span><span class="n">seq</span><span class="p">(</span><span class="n">a</span><span class="o">&lt;-</span><span class="nb">pi</span><span class="o">/</span><span class="m">9</span><span class="p">,</span><span class="m">2</span><span class="o">*</span><span class="nb">pi</span><span class="o">-</span><span class="n">a</span><span class="p">,</span><span class="n">l</span><span class="o">&lt;</span><span class="m">-4e2</span><span class="p">),</span><span class="w">
</span><span class="k">function</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="nf">c</span><span class="p">(</span><span class="nf">cos</span><span class="p">(</span><span class="n">t</span><span class="p">),</span><span class="nf">sin</span><span class="p">(</span><span class="n">t</span><span class="p">))))),</span><span class="w">
</span><span class="p">{</span><span class="n">rbind</span><span class="p">(</span><span class="n">.</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="nf">cos</span><span class="p">(</span><span class="n">a</span><span class="p">),</span><span class="nf">sin</span><span class="p">(</span><span class="n">a</span><span class="p">))</span><span class="o">-&gt;</span><span class="n">df</span><span class="p">;</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[,</span><span class="m">1</span><span class="p">];</span><span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[,</span><span class="m">2</span><span class="p">]}),</span><span class="w">
</span><span class="n">fill</span><span class="o">=</span><span class="s2">"#FF0"</span><span class="p">,</span><span class="n">col</span><span class="o">=</span><span class="m">1</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">annotate</span><span class="p">(</span><span class="s2">"point"</span><span class="p">,</span><span class="n">x</span><span class="o">=</span><span class="m">.35</span><span class="p">,</span><span class="n">y</span><span class="o">=</span><span class="m">.5</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">3</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">annotate</span><span class="p">(</span><span class="s2">"point"</span><span class="p">,</span><span class="n">x</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1.4</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">2.6</span><span class="p">),</span><span class="n">y</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="n">size</span><span class="o">=</span><span class="m">3</span><span class="p">)</span><span class="o">+</span><span class="w">
</span><span class="n">coord_equal</span><span class="p">(</span><span class="n">xlim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">-1.2</span><span class="p">,</span><span class="m">3</span><span class="p">),</span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">-1.2</span><span class="p">,</span><span class="m">1.2</span><span class="p">))</span><span class="o">+</span><span class="w">
</span><span class="n">theme_void</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; Error in eval(substitute(expr), e): object '.' not found</span><span class="w">
</span></code></pre></div></div>

<p>(Strangely, this is the case where a dot 
<a href="https://www.youtube.com/watch?v=NxSj2T2vx7M">kills</a> Pac-Man.)</p>

<h3 id="vibes-are-semantic-vapor">Vibes are semantic vapor</h3>

<p>When we abandon caring about truth or understanding things and just rely
on external corroboration, we are in the realm of <a href="https://en.wikipedia.org/wiki/Vibe_coding">vibe
coding</a>. I like this term
because of its insouciant honesty: <em>Truth? Comprehension? We’re just
going off the vibes.</em> It would be a great help if we used the word more
liberally. A YouTube video called “A vibe history of NES videogames”? No
thanks.<sup id="fnref:fn-abadox" role="doc-noteref"><a href="#fn:fn-abadox" class="footnote" rel="footnote">5</a></sup></p>

<p>If we lean into vibes, we need to get better at external corroboration
and know our programming languages even better. R is a flexible
programming language, and some of the things it does to “help” the user
can lead to silent bugs. Famously, function arguments and <code class="language-plaintext highlighter-rouge">$</code> will match
partial names.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Look at the "Call:" in the output</span><span class="w">
</span><span class="n">lm</span><span class="p">(</span><span class="n">f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hp</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">cyl</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Call:</span><span class="w">
</span><span class="c1">#&gt; lm(formula = hp ~ cyl, data = mtcars)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Coefficients:</span><span class="w">
</span><span class="c1">#&gt; (Intercept)          cyl  </span><span class="w">
</span><span class="c1">#&gt;      -51.05        31.96</span><span class="w">

</span><span class="c1"># There is no `m` column</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">m</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">mtcars</span><span class="o">$</span><span class="n">mpg</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] TRUE</span><span class="w">
</span></code></pre></div></div>
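<p>As a rough illustration of the mechanism, the sketch below mimics the core of R’s partial matching in C++: an exact name wins, otherwise a prefix that matches exactly one name is accepted, and an ambiguous prefix matches nothing. It is a simplification (for example, R requires exact names for arguments that come after <code class="language-plaintext highlighter-rouge">...</code>), and the helper name <code class="language-plaintext highlighter-rouge">partial_match()</code> is hypothetical.</p>

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified sketch of R-style partial matching (requires C++17).
// Returns the matched name, or std::nullopt for no match or an ambiguous one.
std::optional<std::string> partial_match(const std::string& prefix,
                                         const std::vector<std::string>& names) {
  std::optional<std::string> found;
  int n_hits = 0;
  for (const auto& name : names) {
    if (name == prefix) return name;  // an exact match takes priority
    if (name.compare(0, prefix.size(), prefix) == 0) {
      found = name;  // remember the prefix match
      ++n_hits;
    }
  }
  if (n_hits == 1) return found;
  return std::nullopt;
}
```

<p>With the <code class="language-plaintext highlighter-rouge">mtcars</code> column names, only <code class="language-plaintext highlighter-rouge">mpg</code> starts with <code class="language-plaintext highlighter-rouge">m</code>, so <code class="language-plaintext highlighter-rouge">mtcars$m</code> quietly resolves to it, while <code class="language-plaintext highlighter-rouge">d</code> is ambiguous (<code class="language-plaintext highlighter-rouge">disp</code>, <code class="language-plaintext highlighter-rouge">drat</code>), which is why <code class="language-plaintext highlighter-rouge">mtcars$d</code> returns <code class="language-plaintext highlighter-rouge">NULL</code> instead.</p>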

<p>A student I work with was trying to compute sensitivity and specificity on 
weighted data. The LLM suggested the following:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Make some weighted data using frequencies</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">pROC</span><span class="o">::</span><span class="n">aSAH</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">dplyr</span><span class="o">::</span><span class="n">count</span><span class="p">(</span><span class="n">outcome</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"weight"</span><span class="p">)</span><span class="w">

</span><span class="c1"># What the LLM did:</span><span class="w">
</span><span class="n">pROC</span><span class="o">::</span><span class="n">roc</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="s2">"outcome"</span><span class="p">,</span><span class="w"> </span><span class="s2">"age"</span><span class="p">,</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">weight</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Setting levels: control = Good, case = Poor</span><span class="w">
</span><span class="c1">#&gt; Setting direction: controls &lt; cases</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Call:</span><span class="w">
</span><span class="c1">#&gt; roc.data.frame(data = data, response = "outcome", predictor = "age",     weights = data$weight)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Data: age in 44 controls (outcome Good) &lt; 30 cases (outcome Poor).</span><span class="w">
</span><span class="c1">#&gt; Area under the curve: 0.5947</span><span class="w">
</span></code></pre></div></div>

<p>This code runs without any problems. It’s wrong, but it runs. The problem 
is that <a href="https://rdrr.io/pkg/pROC/man/roc.html" title="Documentation for pROC::roc()"><code class="language-plaintext highlighter-rouge">pROC::roc(...)</code></a> supports variadic arguments (<code class="language-plaintext highlighter-rouge">...</code>):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Note the dots</span><span class="w">
</span><span class="n">pROC</span><span class="o">:::</span><span class="n">roc</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">formals</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">str</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; Dotted pair list of 1</span><span class="w">
</span><span class="c1">#&gt;  $ ...: symbol</span><span class="w">
</span><span class="n">pROC</span><span class="o">:::</span><span class="n">roc.data.frame</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">formals</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">str</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; Dotted pair list of 5</span><span class="w">
</span><span class="c1">#&gt;  $ data     : symbol </span><span class="w">
</span><span class="c1">#&gt;  $ response : symbol </span><span class="w">
</span><span class="c1">#&gt;  $ predictor: symbol </span><span class="w">
</span><span class="c1">#&gt;  $ ret      : language c("roc", "coords", "all_coords")</span><span class="w">
</span><span class="c1">#&gt;  $ ...      : symbol</span><span class="w">
</span></code></pre></div></div>

<p>Those <code class="language-plaintext highlighter-rouge">...</code> are for forwarding arguments to other functions that <code class="language-plaintext highlighter-rouge">roc()</code> 
might call internally. Unfortunately,
functions by default don’t check the contents of the <code class="language-plaintext highlighter-rouge">...</code> to see if they
have unsupported arguments. Thus, bad arguments are ignored silently:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># method and weights are not real arguments</span><span class="w">
</span><span class="n">pROC</span><span class="o">::</span><span class="n">roc</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="s2">"outcome"</span><span class="p">,</span><span class="w"> </span><span class="s2">"age"</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fake</span><span class="p">,</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fake</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Setting levels: control = Good, case = Poor</span><span class="w">
</span><span class="c1">#&gt; Setting direction: controls &lt; cases</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Call:</span><span class="w">
</span><span class="c1">#&gt; roc.data.frame(data = data, response = "outcome", predictor = "age",     method = fake, weights = fake)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Data: age in 44 controls (outcome Good) &lt; 30 cases (outcome Poor).</span><span class="w">
</span><span class="c1">#&gt; Area under the curve: 0.5947</span><span class="w">
</span></code></pre></div></div>

<p>The LLM hallucinated a <code class="language-plaintext highlighter-rouge">weights</code> argument, which is a plausible
argument,<sup id="fnref:fn_weights" role="doc-noteref"><a href="#fn:fn_weights" class="footnote" rel="footnote">6</a></sup> and the <code class="language-plaintext highlighter-rouge">...</code> syntax behavior swallowed it up like
Pac-Man. It always comes back to Pac-Man. I ended up writing <a href="https://www.tjmahr.com/wisclabmisc/reference/compute_sens_spec_from_ecdf.html">a
function</a>
that could compute sens and spec on weighted data.</p>

<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:5wm25vgenhgut3iqfjf4ozj5/app.bsky.feed.post/3m5kfxmdkx22a" data-bluesky-cid="bafyreidioowqsfnvvsvesncnb5s6ukkyhxqadinceqnax6ogjt2cj3635i" data-bluesky-embed-color-mode="system"><p lang="en">Unfortunately the space of LLM code errors and the space of human errors are not the same, making hard-won code review instincts misfire</p>&mdash; Eugene Vinitsky 🍒 (<a href="https://bsky.app/profile/did:plc:5wm25vgenhgut3iqfjf4ozj5?ref_src=embed">@eugenevinitsky.bsky.social</a>) <a href="https://bsky.app/profile/did:plc:5wm25vgenhgut3iqfjf4ozj5/post/3m5kfxmdkx22a?ref_src=embed">November 13, 2025 at 6:21 PM</a></blockquote>

<p>As users, we can guard against the partial-matching problems with
<code class="language-plaintext highlighter-rouge">options(warnPartialMatchArgs = TRUE, warnPartialMatchDollar = TRUE)</code>, and as
developers, we can prevent the swallowed-argument problem with
<a href="https://rlang.r-lib.org/reference/check_dots_used.html" title="Documentation for check_dots_used()"><code class="language-plaintext highlighter-rouge">rlang::check_dots_used()</code></a> and friends. But like I said
at the outset, external corroboration requires us to know <em>even more</em>
about the language in order to vibe safely.</p>
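<p>The defensive pattern behind <code class="language-plaintext highlighter-rouge">rlang::check_dots_used()</code> (fail loudly when a passed-along argument goes unused) is not specific to R. Here is a hypothetical C++ analogue in which named options arrive in a map and any name the function does not use raises an error. <code class="language-plaintext highlighter-rouge">check_options_used()</code> and <code class="language-plaintext highlighter-rouge">toy_roc()</code> are made-up names for illustration, not part of pROC or rlang.</p>

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical analogue of checking `...`: named options arrive in a map,
// and any name the function does not use is an error instead of being
// silently ignored.
using Options = std::map<std::string, std::string>;

void check_options_used(const Options& opts, const std::set<std::string>& used) {
  for (const auto& kv : opts) {
    if (used.count(kv.first) == 0) {
      throw std::invalid_argument("unused argument: " + kv.first);
    }
  }
}

// A toy function that only understands a `direction` option
std::string toy_roc(const Options& opts) {
  check_options_used(opts, {"direction"});
  auto it = opts.find("direction");
  return it == opts.end() ? "auto" : it->second;
}
```

<p>With this check in place, a hallucinated <code class="language-plaintext highlighter-rouge">weights</code> option throws an error instead of producing a plausible-looking result.</p>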

<h3 id="syntax-and-semantics-again">Syntax and semantics, again</h3>

<p>In this mini-position statement on LLM assistance, the two principles I
am trying to develop are:</p>

<ul>
  <li>LLMs know text distributions very well. Use them to generate starter syntax.</li>
  <li>LLMs don’t understand anything. It’s all bullshit and vibes.</li>
</ul>

<p>If we think of LLMs as syntax generators, we can imagine
some pretty good use cases:</p>

<ul>
  <li>Write unit tests for a function that does…</li>
  <li>Set up Roxygen docs for this function</li>
  <li>Create a function to simulate data for a model of <code class="language-plaintext highlighter-rouge">rt ~ group + (1 | id)</code></li>
  <li>Write a Stan program to fit this model. (<a href="https://chatgpt.com/share/691763d2-87a4-8005-9342-bee0d0222348">Mind your priors.</a>)</li>
  <li><em>Spoiler alert</em>: Convert this R loop into C++ code</li>
</ul>

<p>Still, we need to be mindful of the semantic limitations and skeptical
of the output. We should audit the results and make sure we comprehend
them, or admit upfront that this code is running on vibes. In either case, 
we also need to be vigilant about bugs that could happen silently or bugs
that a machine might make but a human wouldn’t (hallucinations).</p>

<p>One thing I worry about with LLM reliance is skill atrophy. If I keep
using this bot as a crutch, then some of my skills will get weaker. Sam Mehr
has a take I quite like that puts this concern upfront. LLMs are
fine for code we can’t be bothered to learn:</p>

<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:v6qwaqo24zfrq5fj7ceibxqk/app.bsky.feed.post/3lp42tel3lk2e" data-bluesky-cid="bafyreigsxov4rjblnirmd7samors5k6ygw63lfrenx35g5iwczrs6eawxi" data-bluesky-embed-color-mode="system"><p lang="en">re AI, a PhD student mentioned sheepishly that they used chatgpt for advice on coding up an unusual element in javascript. Almost apologized
<br /><br />
I&#x27;m like no no no you&#x27;re a psych PhD, not CS, this is exactly what LLMs are for! Doing a so-so job at things you just need done &amp; don&#x27;t care about learning!</p>&mdash; samuel mehr (<a href="https://bsky.app/profile/did:plc:v6qwaqo24zfrq5fj7ceibxqk?ref_src=embed">@mehr.nz</a>) <a href="https://bsky.app/profile/did:plc:v6qwaqo24zfrq5fj7ceibxqk/post/3lp42tel3lk2e?ref_src=embed">May 13, 2025 at 10:32 PM</a></blockquote>

<p>I quite like programming and want to learn. I like to read the release
notes, dig into the documentation and
<a href="https://bsky.app/profile/tjmahr.com/post/3m5hjc3ct6s26">experiment</a>
with new modeling features. At the same time, sometimes I just want a
bash script to unzip all <code class="language-plaintext highlighter-rouge">.zip</code> files in a directory. Time was, we would
find something from Stack Overflow to adapt for that problem. Now, we
ask ChatGPT for the code, look it over quickly, test it, and move on. That
seems fine. For an LLM user, a metacognitive awareness of what is worth
learning and which problems are worth solving in a slower, methodical way
is very useful.</p>

<p>Finally, to be clear—I can’t believe I need to make this
disclaimer—we should always care about truth and accuracy when we
write prose and publish it and put our name on it. Vibes are not
scientific or scholarly. When I see emails or code documentation with
immaculate formatting and perfect language, my bullshit sensor goes off
and I worry that I need to read extra carefully because a smooth-talking
robot is trying to pull a fast one on me. I don’t use LLMs for writing
except for proofreading or requests for nitpicking. I have an
instruction in ChatGPT that says not to revise anything I write unless
it sneaks Magic: The Gathering card names into the output. (Alas, it
generally ignores that <em>diabolic edict</em> of mine.)</p>

<h2 id="ai-assistance-in-readtextgrid">AI assistance in readtextgrid</h2>

<p>Because the old parser was outperforming the newer, more robust parser, I
asked ChatGPT for ways to make my textgrid parsing faster. For example,
one version of the loop collected characters in a vector and then
<code class="language-plaintext highlighter-rouge">paste0()</code>-ed them together. ChatGPT suggested that, because we were
iterating over character indices, we instead use
<code class="language-plaintext highlighter-rouge">substring()</code> to extract tokens from the text. That worked, and it ran
faster, until it failed a unit test on a character wearing a diacritic.
After a few rounds of trying to improve the loop, I asked quite
bluntly: “How can we move the tokenize loop into Rcpp or cpp11 with the
viewest [<em>sic</em>] headaches possible”.</p>

<p>And it provided some very legible cpp11 code. I had never used C++ with
R before. To get started, I had to call on
<a href="https://usethis.r-lib.org/reference/use_cpp11.html"><code class="language-plaintext highlighter-rouge">usethis::use_cpp11()</code></a>
to make the necessary boilerplate—you just need syntax sometimes—and
I had to troubleshoot the first couple versions of the function because
of errors. The <a href="https://cpp11.r-lib.org/articles/cpp11.html">cpp11
documentation</a> is small in
a good way. It has examples of converting R code into C++ equivalents,
which is precisely the activity that I was up to.</p>

<p>What I liked about the ChatGPT output is how clear the translation was.
In the R version, part of the character processing loop is to peek ahead
to the next character to see whether <code class="language-plaintext highlighter-rouge">"</code> is an escaped quote <code class="language-plaintext highlighter-rouge">""</code> or the
end of a string:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ... in the character processing loop</span><span class="w">

    </span><span class="c1"># Start or close string mode if we see "</span><span class="w">
    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c_starts_string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="c1"># Check for "" escapes</span><span class="w">
      </span><span class="n">peek_c</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">all_char</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
      </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">peek_c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"\""</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">in_string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">in_escaped_quote</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
      </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">in_string</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="o">!</span><span class="n">in_string</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">

</span><span class="c1"># ...</span><span class="w">
</span></code></pre></div></div>

<p>And here is the C++ version of the peek ahead code:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// ... helper functions ...</span>

  <span class="c1">// Is this a UTF-8 continuation byte? (10xxxxxx)</span>
  <span class="k">auto</span> <span class="n">is_cont</span> <span class="o">=</span> <span class="p">[](</span><span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">b</span><span class="p">)</span><span class="o">-&gt;</span><span class="kt">bool</span> <span class="p">{</span>
    <span class="c1">// Are the first two bits 10?</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="mh">0xC0</span><span class="p">)</span> <span class="o">==</span> <span class="mh">0x80</span><span class="p">;</span>
  <span class="p">};</span>

<span class="c1">// ... in the character processing loop ...</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">b</span> <span class="o">==</span> <span class="mh">0x22</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// '"'</span>
      <span class="c1">// peek ahead to see if we have a double "" escapement</span>
      <span class="kt">size_t</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
      <span class="c1">// We need the next character, not just the next byte, so we skip</span>
      <span class="c1">// continuation characters.</span>
      <span class="k">while</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="n">nbytes</span> <span class="o">&amp;&amp;</span> <span class="n">is_cont</span><span class="p">(</span><span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="o">&gt;</span><span class="p">(</span><span class="n">src</span><span class="p">[</span><span class="n">j</span><span class="p">])))</span> <span class="o">++</span><span class="n">j</span><span class="p">;</span>
      <span class="c1">// Use `0x00` dummy character if we are at the end of the string</span>
      <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">nextb</span> <span class="o">=</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="n">nbytes</span><span class="p">)</span> <span class="o">?</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="o">&gt;</span><span class="p">(</span><span class="n">src</span><span class="p">[</span><span class="n">j</span><span class="p">])</span> <span class="o">:</span> <span class="mh">0x00</span><span class="p">;</span>

      <span class="k">if</span> <span class="p">(</span><span class="n">in_string</span> <span class="o">&amp;&amp;</span> <span class="n">nextb</span> <span class="o">==</span> <span class="mh">0x22</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">esc_next</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>    <span class="c1">// consume next '"' once</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">in_string</span> <span class="o">=</span> <span class="o">!</span><span class="n">in_string</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">}</span>

<span class="c1">// ...</span>
</code></pre></div></div>

<p>There is a logical correspondence between the lines that I wrote myself
in R and the lines that the LLM provided for C++. The C++ version works
at the level of bytes instead of characters, and that matters:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s2">"é"</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">nchar</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"chars"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] 1</span><span class="w">
</span><span class="s2">"é"</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">nchar</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bytes"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] 2</span><span class="w">
</span></code></pre></div></div>

<p>But the C++ code makes sense to me. It looks <em>plausible</em>, right? Still,
plausible isn’t enough. I asked the LLM a lot of follow-up questions:
what does <code class="language-plaintext highlighter-rouge">auto</code> do, what is <code class="language-plaintext highlighter-rouge">size_t</code> doing, and so on. And I annotated
the C++ code with comments for my own understanding.</p>
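<p>To double-check what the <code class="language-plaintext highlighter-rouge">""</code> escapement in the peek-ahead is for, here is a minimal R sketch that decodes a single quoted string literal byte by byte. It assumes, as the C++ comments describe, that a doubled quote inside a string stands for one literal quote. The helper <code class="language-plaintext highlighter-rouge">decode_praat_string()</code> is hypothetical and not part of readtextgrid:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper, not part of readtextgrid: decode one quoted
# string literal in which a doubled quote "" stands for a literal quote.
# Like the C++ loop, it walks over bytes rather than characters.
decode_praat_string &lt;- function(x) {
  src &lt;- charToRaw(x)
  n &lt;- length(src)
  dq &lt;- as.raw(0x22)  # the byte for a double quote
  stopifnot(n &gt;= 2, src[1] == dq, src[n] == dq)
  out &lt;- raw(0)
  i &lt;- 2
  # Stop before the closing quote at position n
  while (i &lt; n) {
    if (src[i] == dq &amp;&amp; src[i + 1] == dq) {
      # Escaped quote: keep one double quote and skip its partner
      out &lt;- c(out, dq)
      i &lt;- i + 2
    } else {
      out &lt;- c(out, src[i])
      i &lt;- i + 1
    }
  }
  rawToChar(out)
}

decode_praat_string('"say ""library"" again"')
#&gt; [1] "say \"library\" again"
</code></pre></div></div>

<p>A real tokenizer also has to track whether it is inside a string at all and where each string ends, which is the job of <code class="language-plaintext highlighter-rouge">in_string</code> in the C++ code.</p>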

<p>During my auditing, I went down a particular rabbit hole to make sure I
understood how Unicode codepoints get packed into UTF-8 byte sequences. I
learned, for example, that the character <code class="language-plaintext highlighter-rouge">é</code> has the codepoint (character number)
<code class="language-plaintext highlighter-rouge">U+00E9</code> in Unicode, so it falls in the range of codepoints (<code class="language-plaintext highlighter-rouge">U+0080</code> to <code class="language-plaintext highlighter-rouge">U+07FF</code>) that are
encoded with two bytes. The <a href="https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G27288" title="Table 3-6. UTF-8 Bit Distribution">scheme for two-byte
encoding</a> is</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>character number               -&gt;               character encoding
codepoint -&gt; 00000yyy yyxxxxxx -&gt; 110yyyyy 10xxxxxx -&gt; UTF-8 bytes
00E9      -&gt; 00000000 11101001 -&gt; 11000011 10101001 -&gt; c3 a9
</code></pre></div></div>

<p>We can check this by hand in R:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bitchar_to_raw</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">xs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">xs</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">strsplit</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">lapply</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">as.integer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">rev</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">packBits</span><span class="p">())</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">unlist</span><span class="p">()</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">bitchar_to_raw</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"11000011"</span><span class="p">,</span><span class="w"> </span><span class="s2">"10101001"</span><span class="p">))</span><span class="w">
</span><span class="c1">#&gt; [1] c3 a9</span><span class="w">
</span><span class="n">charToRaw</span><span class="p">(</span><span class="s2">"é"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] c3 a9</span><span class="w">
</span></code></pre></div></div>
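<p>The same bit packing can be written with R’s bitwise functions. This sketch only handles the two-byte range (<code class="language-plaintext highlighter-rouge">U+0080</code> to <code class="language-plaintext highlighter-rouge">U+07FF</code>), and <code class="language-plaintext highlighter-rouge">encode_two_byte()</code> is a made-up helper for illustration:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: UTF-8 encode a codepoint from the two-byte range
# using the 110yyyyy 10xxxxxx scheme
encode_two_byte &lt;- function(cp) {
  stopifnot(cp &gt;= 0x80, cp &lt;= 0x7FF)
  lead &lt;- bitwOr(0xC0, bitwShiftR(cp, 6))  # 110 followed by the five y bits
  cont &lt;- bitwOr(0x80, bitwAnd(cp, 0x3F))  # 10 followed by the six x bits
  as.raw(c(lead, cont))
}

encode_two_byte(0xE9)
#&gt; [1] c3 a9
identical(encode_two_byte(0xE9), charToRaw("\u00e9"))
#&gt; [1] TRUE
</code></pre></div></div>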

<p>In the UTF-8 scheme, bytes that start with <code class="language-plaintext highlighter-rouge">10</code> appear only as the second,
third, or fourth byte of a character’s encoding. In other words, they are
always <em>continuation</em> bytes. Now we can see <em>why</em> the C++ code
checks for continuation bytes and why that check looks at the first two
bits.</p>
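<p>Here is a rough R translation of the <code class="language-plaintext highlighter-rouge">is_cont</code> lambda (again a sketch, not package code), applied to the bytes of a word with one accented character. Counting the bytes that are <em>not</em> continuation bytes recovers the number of characters:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># R analogue of the C++ is_cont(): a byte is a UTF-8 continuation
# byte when its first two bits are 10, i.e., (b &amp; 0xC0) == 0x80
is_cont &lt;- function(b) bitwAnd(as.integer(b), 0xC0) == 0x80

bytes &lt;- charToRaw("caf\u00e9")
bytes
#&gt; [1] 63 61 66 c3 a9
is_cont(bytes)
#&gt; [1] FALSE FALSE FALSE FALSE  TRUE
sum(!is_cont(bytes))
#&gt; [1] 4
</code></pre></div></div>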

<p>Another rabbit hole involved how to parse numbers. At first, the LLM
suggested I use one of R’s own C functions to handle it. That idea
struck me as really powerful (wait, I can tap into R’s own
routines?!), but R’s parser was a bit stricter than what I needed to
match <code class="language-plaintext highlighter-rouge">praat.exe</code>.</p>

<p>This new C++-based tokenizer yielded a <strong>huge performance gain</strong>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bench</span><span class="o">::</span><span class="n">mark</span><span class="p">(</span><span class="w">
  </span><span class="n">legacy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readtextgrid</span><span class="o">:::</span><span class="n">legacy_read_textgrid_lines</span><span class="p">(</span><span class="n">tg_lines</span><span class="p">),</span><span class="w">
  </span><span class="n">new_r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readtextgrid</span><span class="o">:::</span><span class="n">r_read_textgrid_lines</span><span class="p">(</span><span class="n">tg_lines</span><span class="p">),</span><span class="w">
  </span><span class="n">new_cpp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readtextgrid</span><span class="o">::</span><span class="n">read_textgrid_lines</span><span class="p">(</span><span class="n">tg_lines</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 3 × 6</span><span class="w">
</span><span class="c1">#&gt;   expression      min   median `itr/sec` mem_alloc `gc/sec`</span><span class="w">
</span><span class="c1">#&gt;   &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt;     &lt;dbl&gt; &lt;bch:byt&gt;    &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt; 1 legacy      65.22ms  68.77ms      13.8    6.49MB     4.59</span><span class="w">
</span><span class="c1">#&gt; 2 new_r       72.67ms   88.5ms      11.5  363.33KB     5.74</span><span class="w">
</span><span class="c1">#&gt; 3 new_cpp      3.12ms   3.64ms     272.    96.77KB     4.11</span><span class="w">
</span></code></pre></div></div>

<p>Comparing medians, that’s a speedup of roughly 19x over the legacy
code and 24x over the new R code! Now, I find myself wondering: What
else could use a cpp11 speed boost?</p>

<p>One downside of adopting cpp11 is that the package now contains code
that has to be compiled. As a result, I can’t just tell people to try the
developer version of the package with
<a href="https://remotes.r-lib.org/reference/install_github.html"><code class="language-plaintext highlighter-rouge">remotes::install_github()</code></a>,
because that installs from source and needs a working compiler toolchain
(such as Rtools on Windows). CRAN builds binary versions of packages for
Windows and macOS, so most end users don’t face this issue when
installing the official released version.</p>

<p>One workaround I adopted was relying on <a href="https://ropensci.org/r-universe/">R
Universe</a>, which provides compiled
versions of packages hosted on GitHub. Then we change the installation
instructions to:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">install.packages</span><span class="p">(</span><span class="w">
  </span><span class="s2">"readtextgrid"</span><span class="p">,</span><span class="w"> 
  </span><span class="n">repos</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"https://tjmahr.r-universe.dev"</span><span class="p">,</span><span class="w"> </span><span class="s2">"https://cloud.r-project.org"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>You might have seen this pattern elsewhere.
<a href="https://mc-stan.org/cmdstanr/">cmdstanr</a> skips CRAN entirely and only
uses R Universe.</p>
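<p>If you install from an R-universe repository often, an alternative to passing <code class="language-plaintext highlighter-rouge">repos</code> every time is to set the <code class="language-plaintext highlighter-rouge">repos</code> option, for example in an <code class="language-plaintext highlighter-rouge">.Rprofile</code> file. Here is a sketch; the element names <code class="language-plaintext highlighter-rouge">tjmahr</code> and <code class="language-plaintext highlighter-rouge">CRAN</code> are only labels:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Add the R-universe repository alongside CRAN for this session.
# install.packages() uses getOption("repos") by default.
options(repos = c(
  tjmahr = "https://tjmahr.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
getOption("repos")

# Now this would search both repositories:
# install.packages("readtextgrid")
</code></pre></div></div>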

<h2 id="parting-thoughts">Parting thoughts</h2>

<p>An LLM helped me translate pokey R code into fast C++ code. The code is
<em>live now</em> on <a href="https://cran.r-project.org/package=readtextgrid" title="readtextgrid on CRAN">CRAN</a>, released in readtextgrid 0.2.0. I’m
maybe kind of a C++ developer now? (Nah.)</p>

<p>This kind of code translation strikes me as an easy win for R developers: 
“I have my version that works right now, but I think it can
go faster. Help me convert this to C++.” I took care to make sure I
understood the output. The syntax came easily, but the semantics
(comprehension and validation) took more time.</p>

<p>So, <em>could I have done this translation to C++ without an LLM?</em>
No, not in a reasonable timeframe, and certainly not as fast as the two
days it took me in this case. That’s a pretty undeniable boost.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:fn-etc" role="doc-endnote">
      <p>Things I won’t talk about: Plagiarism, safety, energy use, hype,
undercooked AI features making things slower and dumber, stupid people 
emboldened by how trivial AI makes everything seem—<em>we won’t need 
programmers or doctors or historians or whatever</em> is what someone with no
expertise in programming, medicine, history, etc. would say—dumdums 
tearing down <a href="https://theknowledge.io/chestertons-fence-explained/">fences</a>,
<a href="https://www.notebookcheck.net/Hideo-Kojima-says-AI-is-a-friend-not-a-threat-to-creativity-in-game-development.1141848.0.html">creativity versus productivity</a>, aesthetic homogenization or 
how I keep seeing the same comic style in YouTube thumbnails, nobody asked 
for slop, oh they did ask for slop, etc. <a href="#fnref:fn-etc" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fn-reasoning" role="doc-endnote">
      <p>There is something introspective about 
<a href="https://magazine.sebastianraschka.com/p/understanding-reasoning-llms" title="Understanding Reasoning LLMs">reasoning models</a>, which break a prompt into steps and 
work through them. But still, I’m thinking about what the ground truth is in 
this reasoning. The statistical regularities of word patterns? <a href="#fnref:fn-reasoning" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fn-tradition" role="doc-endnote">
      <p>I think there is a great “tradition”—not sure of the right 
word here—in learning programming and other tools where we start from a
starter template or maybe a small sample project and we experimentally
tweak the code and iterate until it turns into the thing we want. It’s
like <a href="https://en.wikipedia.org/wiki/Lev_Vygotsky#Scaffolding">scaffolding</a>
but at a less metaphorical level: Code that sets a foundation for 
self-directed learning. <a href="#fnref:fn-tradition" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fn-wood" role="doc-endnote">
      <p>I asked ChatGPT for help making a shopping list for a small woodworking
project, and it offered a cutting plan for the lumber. <em>Sure, why not?</em> It
messed up the math with a plan that involved cutting off 74 inches of
wood from a 6-foot piece of lumber. My external corroboration in this case 
was a scrap of wood. <a href="#fnref:fn-wood" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fn-abadox" role="doc-endnote">
      <p>I am still immensely annoyed about a YouTube video that tried to 
tell me Abadox was a “controversial” NES game. Get out of here. Nobody 
talked about that game. Show me a newspaper clipping or something. <a href="#fnref:fn-abadox" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fn_weights" role="doc-endnote">
      <p>Let’s count functions with <code class="language-plaintext highlighter-rouge">weights</code> arguments in some base and
recommended R packages:</p>

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">get_funcs_with_weights</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">pkg</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">ns</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">asNamespace</span><span class="p">(</span><span class="n">pkg</span><span class="p">)</span><span class="w">
  </span><span class="n">ls</span><span class="p">(</span><span class="n">ns</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">lapply</span><span class="p">(</span><span class="n">get</span><span class="p">,</span><span class="w"> </span><span class="n">envir</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ns</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">setNames</span><span class="p">(</span><span class="n">ls</span><span class="p">(</span><span class="n">ns</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">Filter</span><span class="p">(</span><span class="n">f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">is.function</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">lapply</span><span class="p">(</span><span class="n">formals</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">Filter</span><span class="p">(</span><span class="n">f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="s2">"weights"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">x</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="nf">names</span><span class="p">()</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">get_funcs_with_weights</span><span class="p">(</span><span class="s2">"stats"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;  [1] "density.default" "glm"             "glm.fit"         "lm"             </span><span class="w">
</span><span class="c1">#&gt;  [5] "loess"           "nls"             "ppr.default"     "ppr.formula"    </span><span class="w">
</span><span class="c1">#&gt;  [9] "predict.lm"      "predLoess"       "simpleLoess"</span><span class="w">
</span><span class="n">get_funcs_with_weights</span><span class="p">(</span><span class="s2">"mgcv"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;  [1] "bam"             "bfgs"            "deriv.check"     "deriv.check5"   </span><span class="w">
</span><span class="c1">#&gt;  [5] "efsud"           "efsudr"          "find.null.dev"   "gam"            </span><span class="w">
</span><span class="c1">#&gt;  [9] "gam.fit3"        "gam.fit4"        "gam.fit5"        "gamm"           </span><span class="w">
</span><span class="c1">#&gt; [13] "gammPQL"         "initial.spg"     "jagam"           "mgcv.find.theta"</span><span class="w">
</span><span class="c1">#&gt; [17] "mgcv.get.scale"  "newton"          "scasm"           "score.transect" </span><span class="w">
</span><span class="c1">#&gt; [21] "simplyFit"</span><span class="w">
</span><span class="n">get_funcs_with_weights</span><span class="p">(</span><span class="s2">"MASS"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] "glm.nb"      "glmmPQL"     "polr"        "rlm.default" "rlm.formula"</span><span class="w">
</span><span class="c1">#&gt; [6] "theta.md"    "theta.ml"    "theta.mm"</span><span class="w">
</span><span class="n">get_funcs_with_weights</span><span class="p">(</span><span class="s2">"nlme"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;  [1] "gls"               "gnls"              "lme"              </span><span class="w">
</span><span class="c1">#&gt;  [4] "lme.formula"       "lme.groupedData"   "lme.lmList"       </span><span class="w">
</span><span class="c1">#&gt;  [7] "nlme"              "nlme.formula"      "nlme.nlsList"     </span><span class="w">
</span><span class="c1">#&gt; [10] "plot.simulate.lme"</span><span class="w">
</span></code></pre></div>      </div>
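      <p>To turn those lists into actual numbers, here is a more compact variant. The helper <code class="language-plaintext highlighter-rouge">count_funcs_with_arg()</code> and its <code class="language-plaintext highlighter-rouge">arg</code> parameter are made up for this sketch, and the counts depend on the installed package versions (with the versions behind the lists above, they come to 11, 21, 8, and 10):</p>

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>count_funcs_with_arg &lt;- function(pkg, arg = "weights") {
  ns &lt;- asNamespace(pkg)
  # mget() returns a named list of every object in the namespace
  funs &lt;- Filter(is.function, mget(ls(ns), envir = ns))
  # formals() is NULL for primitives, so they never match
  sum(vapply(funs, function(f) arg %in% names(formals(f)), logical(1)))
}

vapply(c("stats", "mgcv", "MASS", "nlme"), count_funcs_with_arg, integer(1))
</code></pre></div>      </div>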
      <p><a href="#fnref:fn_weights" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="c++" /><category term="cpp11" /><category term="llms" /><summary type="html"><![CDATA[Text parsing and thoughts about LLM-assisted R development]]></summary></entry><entry><title type="html">Notes on Citing R and R Packages</title><link href="https://tjmahr.github.io/r-package-citation-notes/" rel="alternate" type="text/html" title="Notes on Citing R and R Packages" /><published>2024-05-03T00:00:00-05:00</published><updated>2024-05-03T00:00:00-05:00</updated><id>https://tjmahr.github.io/r-package-citation-notes</id><content type="html" xml:base="https://tjmahr.github.io/r-package-citation-notes/"><![CDATA[<p>Our group has started using a new knowledge base system, so I have been
writing up and revisiting some of my documentation. Here I am going to
share a guide I wrote about citing R packages in academic writing.</p>

<h2 id="which-software-to-cite">Which software to cite</h2>

<p>Let’s make a distinction here between <em>reporting</em> (or summarizing) an
analysis and <em>reproducing</em> (or carrying out) an analysis.</p>

<p>Our main manuscript document is for <em>reporting</em>. We want to report which
tools and which versions of those tools we used to get our statistical
results. We don’t need to include every computational detail. We will
save that level of detail for a supplemental document that shows the
exact modeling code and <a href="https://r-lib.github.io/sessioninfo/reference/session_info.html"><code class="language-plaintext highlighter-rouge">sessioninfo::session_info()</code></a> for
<em>reproducing</em> our results. Moreover, journals will sometimes limit the
number of references in a manuscript, and a full R analysis might draw
on 15 packages, so in general we cannot cite everything that helped us
get our results. Instead, we can think in terms of <strong>citation
priorities</strong>.</p>

<p>For an analysis carried out in R, these items have the <strong>highest
priority</strong> for citations:</p>

<ul>
  <li>R (the programming language / analysis environment).</li>
  <li>Third-party packages that carried out the analyses.
    <ul>
      <li>For example, nlme, lme4, ordinal, rms, brms.</li>
    </ul>
  </li>
  <li>If a package calls on another language or analysis tool, cite that
tool as well.
    <ul>
      <li>For example, brms and rstanarm fit models using the Stan
programming language, so we need to cite and version Stan as
well.</li>
    </ul>
  </li>
  <li>Packages that performed additional computation on analysis results.
    <ul>
      <li>For example, emmeans to get marginal means from a fitted model.</li>
    </ul>
  </li>
  <li>Packages that visualized analysis results automatically.
    <ul>
      <li>For example, <a href="https://easystats.github.io/see/">see</a> or
<a href="https://interactions.jacob-long.com/">interactions</a>.</li>
    </ul>
  </li>
</ul>

<p>The following items would have the <strong>lowest priority</strong> for citations:</p>

<ul>
  <li>RStudio: It’s just an interface to the language. (Ideally, an
analysis could be run without touching RStudio.)</li>
  <li>The built-in stats package.</li>
  <li>knitr/quarto/rmarkdown: These performed R computations for us and
stored the results in a document.</li>
  <li>Siloed-off parts of a main package.
    <ul>
      <li>For example, the gamlss package
fits GAMLSS models, but the distributions for model families are
stored in the package gamlss.dist. gamlss needs gamlss.dist to work,
but gamlss is the main package to cite.</li>
    </ul>
  </li>
  <li>Data storage formats.</li>
</ul>

<p>If space and the publication venue permit, we can also cite and version
the key R packages that manipulated or visualized the data, such as
tidyverse, ggplot2, broom, tidybayes/ggdist, etc. Be generous. We do
want to credit the tools we used to get our results after all!</p>
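<p>When deciding what to cite, it can help to start from what the analysis session actually attached. A quick sketch with base R’s <code class="language-plaintext highlighter-rouge">sessionInfo()</code>, whose <code class="language-plaintext highlighter-rouge">otherPkgs</code> element holds the attached non-base packages (packages that were loaded but not attached, such as many dependencies, show up in <code class="language-plaintext highlighter-rouge">loadedOnly</code> instead):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Attached non-base packages and their versions
attached &lt;- names(sessionInfo()$otherPkgs)
versions &lt;- vapply(
  attached,
  function(pkg) as.character(utils::packageVersion(pkg)),
  character(1)
)
versions
</code></pre></div></div>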

<h2 id="where-to-get-citation-information">Where to get citation information</h2>

<p>Creators of scientific software will often tell users how to cite their
software. Scientific software tools often have an associated article
that announces the software and describes how to use it, and authors will
ask users to cite that publication so they can obtain academic credit
for their software work.</p>

<p>For R and R packages, the <a href="https://rdrr.io/r/utils/citation.html"><code class="language-plaintext highlighter-rouge">citation()</code></a> function will tell
users how to cite their software. lme4 is one of those packages that
directs users to a publication.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">citation</span><span class="p">(</span><span class="s2">"lme4"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; To cite lme4 in publications use:</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;   Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).</span><span class="w">
</span><span class="c1">#&gt;   Fitting Linear Mixed-Effects Models Using lme4. Journal of</span><span class="w">
</span><span class="c1">#&gt;   Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; A BibTeX entry for LaTeX users is</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;   @Article{,</span><span class="w">
</span><span class="c1">#&gt;     title = {Fitting Linear Mixed-Effects Models Using {lme4}},</span><span class="w">
</span><span class="c1">#&gt;     author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},</span><span class="w">
</span><span class="c1">#&gt;     journal = {Journal of Statistical Software},</span><span class="w">
</span><span class="c1">#&gt;     year = {2015},</span><span class="w">
</span><span class="c1">#&gt;     volume = {67},</span><span class="w">
</span><span class="c1">#&gt;     number = {1},</span><span class="w">
</span><span class="c1">#&gt;     pages = {1--48},</span><span class="w">
</span><span class="c1">#&gt;     doi = {10.18637/jss.v067.i01},</span><span class="w">
</span><span class="c1">#&gt;   }</span><span class="w">
</span></code></pre></div></div>

<p>Notice in the BibTeX entry at the bottom how <code class="language-plaintext highlighter-rouge">{lme4}</code> is put in braces.
These braces tell LaTeX not to change the capitalization of that word
when printing the title. Some journals or formats have different
preferences for how to capitalize titles, but as a general rule of
thumb, software titles need to be printed verbatim, or as they would be
used by the user. (That is, <code class="language-plaintext highlighter-rouge">library(Lme4)</code> will not load the lme4
package.) When creating bibliography entries, take care to preserve the
capitalization so that the software name is accurate. Take care also to
differentiate between statistical methods and software names: “We fit
GAMLSS models with the gamlss package”.</p>

<p>For CRAN packages, the output of <code class="language-plaintext highlighter-rouge">citation()</code> is also provided online in
HTML. The CRAN package description page (e.g.,
<a href="https://cran.r-project.org/web/packages/lme4/">lme4</a>) includes a
<em>Citation</em> entry which generates a formatted version of the citation
information (e.g., <a href="https://cran.r-project.org/web/packages/lme4/citation.html">lme4 citation
info</a>).</p>
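<p>R itself sits at the top of the citation priorities, and <code class="language-plaintext highlighter-rouge">citation()</code> with no arguments returns the citation for R. For BibTeX users, <code class="language-plaintext highlighter-rouge">toBibtex()</code> converts a citation object into BibTeX lines. As in the lme4 output, the entry key after the opening brace is left empty, so we need to add our own before using it in a <code class="language-plaintext highlighter-rouge">.bib</code> file:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># citation() defaults to package = "base", which gives R's citation
r_cite &lt;- citation()
format(r_cite, style = "text")

# BibTeX version of the same entry
toBibtex(r_cite)
</code></pre></div></div>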

<p>When the software doesn’t have a publication, R will generate a citation
for you. The ordinal package is one such example.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">citation</span><span class="p">(</span><span class="s2">"ordinal"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; To cite 'ordinal' in publications use:</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;   Christensen R (2023). _ordinal-Regression Models for Ordinal Data_. R</span><span class="w">
</span><span class="c1">#&gt;   package version 2023.12-4,</span><span class="w">
</span><span class="c1">#&gt;   &lt;https://CRAN.R-project.org/package=ordinal&gt;.</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; A BibTeX entry for LaTeX users is</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;   @Manual{,</span><span class="w">
</span><span class="c1">#&gt;     title = {ordinal---Regression Models for Ordinal Data},</span><span class="w">
</span><span class="c1">#&gt;     author = {Rune H. B. Christensen},</span><span class="w">
</span><span class="c1">#&gt;     year = {2023},</span><span class="w">
</span><span class="c1">#&gt;     note = {R package version 2023.12-4},</span><span class="w">
</span><span class="c1">#&gt;     url = {https://CRAN.R-project.org/package=ordinal},</span><span class="w">
</span><span class="c1">#&gt;   }</span><span class="w">
</span></code></pre></div></div>
<p>The underscores <code class="language-plaintext highlighter-rouge">_</code> in the title indicate that the title is
italicized when the citation is <a href="https://cran.r-project.org/web/packages/ordinal/citation.html">viewed on
CRAN</a>.</p>

<h2 id="how-to-cite-and-version-r-and-r-packages">How to cite and version R and R packages</h2>

<p>As a rule of thumb, any citation of a resource should answer these
questions:</p>

<ul>
  <li>Who (authors)</li>
  <li>What (title and sometimes format)</li>
  <li>When (year)</li>
  <li>Where (journal, URL, book, DOI)</li>
</ul>

<p>Then for software, we can add the following:</p>

<ul>
  <li>Which (version)</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">citation()</code> function will answer most of these questions for you.</p>
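<p>A citation object also exposes these pieces individually through <code class="language-plaintext highlighter-rouge">$</code>. Here is a sketch with R’s own citation. That entry does not carry a version, so the <em>Which</em> has to come from elsewhere, such as <code class="language-plaintext highlighter-rouge">getRversion()</code>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>r_cite &lt;- citation()
r_cite$author  # Who
r_cite$title   # What
r_cite$year    # When
r_cite$url     # Where
getRversion()  # Which (not part of R's citation entry)
</code></pre></div></div>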

<p>There are a couple of other functions to know when it comes to package versions.
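<a href="https://rdrr.io/r/utils/packageDescription.html"><code class="language-plaintext highlighter-rouge">utils::packageVersion()</code></a> returns the package version as a <code class="language-plaintext highlighter-rouge">package_version</code> object, which prints like a string:</p>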

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">utils</span><span class="o">::</span><span class="n">packageVersion</span><span class="p">(</span><span class="s2">"lme4"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] '1.1.35.3'</span><span class="w">
</span><span class="n">utils</span><span class="o">::</span><span class="n">packageVersion</span><span class="p">(</span><span class="s2">"ordinal"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] '2023.12.4'</span><span class="w">
</span></code></pre></div></div>
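<p>Note that <code class="language-plaintext highlighter-rouge">packageVersion()</code> normalizes the separators, so ordinal’s <code class="language-plaintext highlighter-rouge">2023.12-4</code> prints as <code class="language-plaintext highlighter-rouge">'2023.12.4'</code>. If you want the version exactly as written in a package’s DESCRIPTION file, <code class="language-plaintext highlighter-rouge">utils::packageDescription()</code> can return that field. Version objects also compare component by component, which is handy for sanity checks in analysis scripts:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Version exactly as recorded in DESCRIPTION (NA if not installed)
utils::packageDescription("ordinal", fields = "Version")

# Version objects compare sensibly against strings.
# For example, the native pipe |&gt; requires R 4.1.0 or later.
getRversion() &gt;= "4.1.0"
</code></pre></div></div>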

<p>For the current R version, a bunch of built-in functions can tell you
everything you need to know. I can never remember which of these
functions I want (it’s <a href="https://rdrr.io/r/base/numeric_version.html"><code class="language-plaintext highlighter-rouge">getRversion()</code></a>), so I will sometimes use
<code class="language-plaintext highlighter-rouge">utils::packageVersion("base")</code> to get a simple version number.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">R.version.string</span><span class="w">
</span><span class="c1">#&gt; [1] "R version 4.4.0 (2024-04-24 ucrt)"</span><span class="w">
</span><span class="n">R.version</span><span class="w">
</span><span class="c1">#&gt;                _                                </span><span class="w">
</span><span class="c1">#&gt; platform       x86_64-w64-mingw32               </span><span class="w">
</span><span class="c1">#&gt; arch           x86_64                           </span><span class="w">
</span><span class="c1">#&gt; os             mingw32                          </span><span class="w">
</span><span class="c1">#&gt; crt            ucrt                             </span><span class="w">
</span><span class="c1">#&gt; system         x86_64, mingw32                  </span><span class="w">
</span><span class="c1">#&gt; status                                          </span><span class="w">
</span><span class="c1">#&gt; major          4                                </span><span class="w">
</span><span class="c1">#&gt; minor          4.0                              </span><span class="w">
</span><span class="c1">#&gt; year           2024                             </span><span class="w">
</span><span class="c1">#&gt; month          04                               </span><span class="w">
</span><span class="c1">#&gt; day            24                               </span><span class="w">
</span><span class="c1">#&gt; svn rev        86474                            </span><span class="w">
</span><span class="c1">#&gt; language       R                                </span><span class="w">
</span><span class="c1">#&gt; version.string R version 4.4.0 (2024-04-24 ucrt)</span><span class="w">
</span><span class="c1">#&gt; nickname       Puppy Cup</span><span class="w">
</span><span class="n">getRversion</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; [1] '4.4.0'</span><span class="w">

</span><span class="n">utils</span><span class="o">::</span><span class="n">packageVersion</span><span class="p">(</span><span class="s2">"base"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] '4.4.0'</span><span class="w">
</span></code></pre></div></div>

<p>For Stan, depending on the backend used, the software version is available via:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># rstanarm and default brms</span><span class="w">
</span><span class="n">rstan</span><span class="o">::</span><span class="n">stan_version</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; [1] "2.32.2"</span><span class="w">

</span><span class="c1"># non-default for brms</span><span class="w">
</span><span class="n">cmdstanr</span><span class="o">::</span><span class="n">cmdstan_version</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; [1] "2.34.1"</span><span class="w">
</span></code></pre></div></div>

<h2 id="examples">Examples</h2>

<p>A simple example citing R, a modeling R package, and a helper R package:</p>

<blockquote>
  <p>Analyses were carried out in the R programming language (vers. 4.2.0, R
Core Team, 2021). Mixed models were estimated using the lme4 package
(vers. 1.1.28, Bates et al., 2015). We estimated marginal means and
contrasts using the emmeans package (vers. 1.7.2, Lenth, 2021).</p>
</blockquote>

<p>Below is the RMarkdown source for that paragraph. Version numbers and
citations are inlined automatically. (We’re omitting details on creating
.bib files or using pandoc’s <code class="language-plaintext highlighter-rouge">@</code> citations.)</p>

<div class="language-md highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">{r}
</span><span class="sb">v_lme4 &lt;- packageVersion("lme4")
v_r &lt;- packageVersion("base")
v_emmeans &lt;- packageVersion("emmeans")</span>
<span class="p">```</span>

Analyses were carried out in the R programming language [vers. <span class="sb">`r v_r`</span>,
@rstats]. Mixed models were estimated using the lme4 package
[vers. <span class="sb">`r v_lme4`</span>, @lme4]. We estimated marginal means and contrasts
using the emmeans package [vers. <span class="sb">`r v_emmeans`</span>, @emmeans].
</code></pre></div></div>

<p>This aspect of the code is invisible, but I use <a href="https://en.wikipedia.org/wiki/Non-breaking_space">nonbreaking
spaces <code class="language-plaintext highlighter-rouge"> </code></a> (HTML
<code class="language-plaintext highlighter-rouge">&amp;nbsp;</code>) after <code class="language-plaintext highlighter-rouge">vers.</code> so that the <code class="language-plaintext highlighter-rouge">vers.</code> and the version number stay
on the same line.</p>
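<p>If typing an invisible character into the source feels fragile, one option is to build the label in R instead. This is a minimal sketch with a hypothetical helper, <code class="language-plaintext highlighter-rouge">fmt_vers()</code>, that joins <code class="language-plaintext highlighter-rouge">vers.</code> and the version number with the Unicode nonbreaking space <code class="language-plaintext highlighter-rouge">"\u00a0"</code>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: "vers." + nonbreaking space (U+00A0) + version number
fmt_vers &lt;- function(pkg) {
  paste0("vers.\u00a0", as.character(utils::packageVersion(pkg)))
}

fmt_vers("base")
</code></pre></div></div>

<p>In the RMarkdown source, an inline chunk like <code class="language-plaintext highlighter-rouge">`r fmt_vers("lme4")`</code> would then stand in for both the <code class="language-plaintext highlighter-rouge">vers.</code> text and the inline version number.</p>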

<p>Here is a more involved example with an additional language and an R
package that interfaces with that language:</p>

<blockquote>
  <p>We estimated the models using Stan (vers. 2.27.0, Carpenter et al., 2017) via
the brms package (vers. 2.16.1, Bürkner, 2017) and tidybayes package
(vers. 3.0.4, Kay, 2021) in R (vers. 4.3.0, R Core Team, 2021).</p>
</blockquote>

<p>Behind the scenes, I had written the following RMarkdown:</p>

<div class="language-md highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">{r}
</span><span class="sb">model &lt;- targets::tar_read(model_random_slope)
v_stan &lt;- model$version$cmdstan
v_brms &lt;- model$version$brms
v_tidybayes &lt;- packageVersion("tidybayes")
v_r &lt;- getRversion()</span>
<span class="p">```</span>

We estimated the models using Stan [vers. <span class="sb">`r v_stan`</span>, @stan] via the
brms package [vers. <span class="sb">`r v_brms`</span>, @brms-jss] and tidybayes package
[vers. <span class="sb">`r v_tidybayes`</span>, @R-tidybayes] in R [vers. <span class="sb">`r v_r`</span>, @r-base].
</code></pre></div></div>

<p>Notice that I am reading in a cached model object (<code class="language-plaintext highlighter-rouge">targets::tar_read()</code>) and
reading the software versions from that object. This arrangement avoids problems
where a model is fitted with one version of a package but
<code class="language-plaintext highlighter-rouge">utils::packageVersion()</code> later returns a different, more recent version. brms
stores these versions automatically. In general, when I cache a model
like this, I store the package version in the model object.</p>
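<p>For a modeling function that does not record versions on its own, the same idea can be sketched by attaching them as an attribute before caching. Here, the wrapper <code class="language-plaintext highlighter-rouge">fit_and_record()</code> and the attribute name <code class="language-plaintext highlighter-rouge">"versions"</code> are hypothetical choices for illustration, and <code class="language-plaintext highlighter-rouge">lm()</code> stands in for whatever model is being fitted:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical wrapper: fit a model and record the software versions in use,
# so that a cached copy keeps them even after packages are updated
fit_and_record &lt;- function(formula, data) {
  model &lt;- stats::lm(formula, data = data)
  attr(model, "versions") &lt;- list(
    r = getRversion(),
    stats = utils::packageVersion("stats")
  )
  model
}

model &lt;- fit_and_record(mpg ~ wt, data = mtcars)

# Later, e.g., after reading the cached object back in
attr(model, "versions")$r
</code></pre></div></div>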

<h2 id="a-note-on-automatic-citation-helpers">A note on automatic citation helpers</h2>

<p>A tool like the <a href="https://pakillo.github.io/grateful/">grateful package</a>
will generate a list of references and citations for us. Suppose we
fit a model and look at a summary of it:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">mgcv</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Loading required package: nlme</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Attaching package: 'nlme'</span><span class="w">
</span><span class="c1">#&gt; The following object is masked from 'package:dplyr':</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;     collapse</span><span class="w">
</span><span class="c1">#&gt; This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">MASS</span><span class="o">::</span><span class="n">mcycle</span><span class="w">
</span><span class="n">model</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gam</span><span class="p">(</span><span class="n">accel</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">times</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cr"</span><span class="p">),</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">)</span><span class="w">
</span><span class="n">broom</span><span class="o">::</span><span class="n">tidy</span><span class="p">(</span><span class="n">model</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 1 × 5</span><span class="w">
</span><span class="c1">#&gt;   term       edf ref.df statistic p.value</span><span class="w">
</span><span class="c1">#&gt;   &lt;chr&gt;    &lt;dbl&gt;  &lt;dbl&gt;     &lt;dbl&gt;   &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt; 1 s(times)  8.39   8.87      53.8       0</span><span class="w">
</span></code></pre></div></div>

<p>grateful detects the following packages in use:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">grateful</span><span class="o">::</span><span class="n">scan_packages</span><span class="p">(</span><span class="n">pkgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Session"</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;         pkg version</span><span class="w">
</span><span class="c1">#&gt; 1      base   4.4.0</span><span class="w">
</span><span class="c1">#&gt; 2     knitr    1.46</span><span class="w">
</span><span class="c1">#&gt; 3      mgcv   1.9.1</span><span class="w">
</span><span class="c1">#&gt; 4      nlme 3.1.164</span><span class="w">
</span><span class="c1">#&gt; 5 tidyverse   2.0.0</span><span class="w">
</span></code></pre></div></div>

<p>Note that broom is excluded despite being used and that nlme is included
despite not being loaded directly. That’s because broom is a tidyverse package
and gets absorbed by the tidyverse citation, and because mgcv
loads nlme (as its start-up message says). (And knitr is loaded when I render
my blog posts but not when I work within this R session interactively.)</p>

<p>Okay, so we should list the packages manually and stop the tidyverse
citation from absorbing broom. grateful can then create a bibliography for us:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">grateful</span><span class="o">::</span><span class="n">get_pkgs_info</span><span class="p">(</span><span class="w">
  </span><span class="n">pkgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"mgcv"</span><span class="p">,</span><span class="w"> </span><span class="s2">"broom"</span><span class="p">),</span><span class="w"> 
  </span><span class="n">out.dir</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getwd</span><span class="p">(),</span><span class="w"> 
  </span><span class="n">cite.tidyverse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;     pkg version                                         citekeys</span><span class="w">
</span><span class="c1">#&gt; 1 broom   1.0.5                                            broom</span><span class="w">
</span><span class="c1">#&gt; 2  mgcv   1.9.1 mgcv2011, mgcv2016, mgcv2004, mgcv2017, mgcv2003</span><span class="w">

</span><span class="n">a</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">grateful</span><span class="o">::</span><span class="n">cite_packages</span><span class="p">(</span><span class="w">
  </span><span class="n">pkgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"mgcv"</span><span class="p">,</span><span class="w"> </span><span class="s2">"broom"</span><span class="p">),</span><span class="w"> 
  </span><span class="n">out.dir</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getwd</span><span class="p">(),</span><span class="w"> 
  </span><span class="n">cite.tidyverse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"paragraph"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">a</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">stringr</span><span class="o">::</span><span class="n">str_wrap</span><span class="p">(</span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">72</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">writeLines</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; We used the following R packages: broom v. 1.0.5 [@broom], mgcv v. 1.9.1</span><span class="w">
</span><span class="c1">#&gt; [@mgcv2003; @mgcv2004; @mgcv2011; @mgcv2016; @mgcv2017].</span><span class="w">
</span></code></pre></div></div>

<p>Now we have the issue that <code class="language-plaintext highlighter-rouge">citation("mgcv")</code> has <a href="https://cran.r-project.org/web/packages/mgcv/citation.html">multiple publications
associated</a>
with it, not all of which are relevant for our usage.</p>
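<p>Because <code class="language-plaintext highlighter-rouge">citation()</code> returns a <code class="language-plaintext highlighter-rouge">bibentry</code> object, individual entries can be selected with <code class="language-plaintext highlighter-rouge">[</code> before converting them to BibTeX. A sketch; the index <code class="language-plaintext highlighter-rouge">1</code> below is a placeholder, and the right entries depend on which parts of mgcv an analysis actually used:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>refs &lt;- citation("mgcv")

# How many publications are listed?
length(refs)

# Keep only the relevant entry (index 1 is a placeholder choice)
chosen &lt;- refs[1]
print(chosen, style = "text")

# BibTeX to paste into a .bib file
toBibtex(chosen)
</code></pre></div></div>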

<p>The point of this example is that a tool like grateful (and, more
generally, any tool that produces code for us) can be useful for compiling
information and getting the ball rolling. But we still have
to edit and refine its outputs so that they are correct.</p>

<hr />

<p><em>Last knitted on 2024-05-07. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2024-05-03-r-package-citation-notes.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting         value</span><span class="w">
</span><span class="c1">#&gt;  version         R version 4.4.0 (2024-04-24 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os              Windows 10 x64 (build 19045)</span><span class="w">
</span><span class="c1">#&gt;  system          x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui              RTerm</span><span class="w">
</span><span class="c1">#&gt;  language        (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate         English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype           English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz              America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date            2024-05-07</span><span class="w">
</span><span class="c1">#&gt;  pandoc          NA</span><span class="w">
</span><span class="c1">#&gt;  stan (rstan)    2.32.2</span><span class="w">
</span><span class="c1">#&gt;  stan (cmdstanr) 2.34.1</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  ! package        * version  date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;    abind            1.4-5    2016-07-21 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    backports        1.4.1    2021-12-13 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    broom            1.0.5    2023-06-09 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    cachem           1.0.8    2023-05-01 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    checkmate        2.3.1    2023-12-04 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    cli              3.6.2    2023-12-11 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    cmdstanr         0.7.1    2024-05-03 [1] local</span><span class="w">
</span><span class="c1">#&gt;    codetools        0.2-20   2024-03-31 [2] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    colorspace       2.1-0    2023-01-23 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    curl             5.2.1    2024-03-01 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    distributional   0.4.0    2024-02-07 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    downlit          0.4.3    2023-06-29 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    dplyr          * 1.1.4    2023-11-17 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    evaluate         0.23     2023-11-01 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    fansi            1.0.6    2023-12-08 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    fastmap          1.1.1    2023-02-24 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    forcats        * 1.0.0    2023-01-29 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    generics         0.1.3    2022-07-05 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    ggplot2        * 3.5.1    2024-04-23 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    git2r            0.33.0   2023-11-26 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    glue             1.7.0    2024-01-09 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    grateful         0.2.4    2023-10-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    gridExtra        2.3      2017-09-09 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    gtable           0.3.5    2024-04-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    here             1.0.1    2020-12-13 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    hms              1.1.3    2023-03-21 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    inline           0.3.19   2021-05-31 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    jsonlite         1.8.8    2023-12-04 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    knitr          * 1.46     2024-04-06 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    lattice          0.22-6   2024-03-20 [2] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    lifecycle        1.0.4    2023-11-07 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    loo              2.7.0    2024-02-24 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    lubridate      * 1.9.3    2023-09-27 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    MASS             7.3-60.2 2024-04-24 [2] local</span><span class="w">
</span><span class="c1">#&gt;    Matrix           1.7-0    2024-03-22 [2] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    matrixStats      1.3.0    2024-04-11 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    memoise          2.0.1    2021-11-26 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    mgcv           * 1.9-1    2023-12-21 [2] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    munsell          0.5.1    2024-04-01 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    nlme           * 3.1-164  2023-11-27 [2] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    pillar           1.9.0    2023-03-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    pkgbuild         1.4.4    2024-03-17 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    posterior        1.5.0    2023-10-31 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    processx         3.8.4    2024-03-16 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    ps               1.7.6    2024-01-18 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    purrr          * 1.0.2    2023-08-10 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    QuickJSR         1.1.3    2024-01-31 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    R6               2.5.1    2021-08-19 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    ragg             1.3.0    2024-03-13 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    Rcpp             1.0.12   2024-01-09 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;  D RcppParallel     5.1.7    2023-02-27 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    readr          * 2.1.5    2024-01-10 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    rlang            1.1.3    2024-01-10 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    rprojroot        2.0.4    2023-11-05 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    rstan            2.32.6   2024-03-05 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    rstudioapi       0.16.0   2024-03-24 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    scales           1.3.0    2023-11-28 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    sessioninfo      1.2.2    2021-12-06 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    StanHeaders      2.32.7   2024-04-25 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    stringi          1.8.3    2023-12-11 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    stringr        * 1.5.1    2023-11-14 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    systemfonts      1.0.6    2024-03-07 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tensorA          0.36.2.1 2023-12-13 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    textshaping      0.3.7    2023-10-09 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tibble         * 3.2.1    2023-03-20 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyr          * 1.3.1    2024-01-24 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyselect       1.2.1    2024-03-11 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyverse      * 2.0.0    2023-02-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    timechange       0.3.0    2024-01-18 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    tzdb             0.4.0    2023-05-12 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    utf8             1.2.4    2023-10-22 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    V8               4.4.2    2024-02-15 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    vctrs            0.6.5    2023-12-01 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    withr            3.0.0    2024-01-16 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    xfun             0.43     2024-03-25 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt;    yaml             2.3.8    2023-12-11 [1] CRAN (R 4.4.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/mahr/AppData/Local/R/win-library/4.4</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.4.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  D ── DLL MD5 mismatch, broken installation.</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="brms" /><category term="stan" /><summary type="html"><![CDATA[Who, what, where, when and which]]></summary></entry><entry><title type="html">Ordering constraints in brms using contrast coding</title><link href="https://tjmahr.github.io/bayesian-ordering-constraint/" rel="alternate" type="text/html" title="Ordering constraints in brms using contrast coding" /><published>2023-07-03T00:00:00-05:00</published><updated>2023-07-03T00:00:00-05:00</updated><id>https://tjmahr.github.io/bayesian-ordering-constraint</id><content type="html" xml:base="https://tjmahr.github.io/bayesian-ordering-constraint/"><![CDATA[<p>Mattan S. Ben-Shachar wrote an <a href="https://blog.msbstats.info/posts/2023-06-26-order-constraints-in-brms/" title="Order Constraints in Bayes Models (with brms)">excellent tutorial</a>
about how to impose ordering constraints in Bayesian regression models.
In that post, the data comes from archaeology (inspired by
<a href="https://arxiv.org/abs/1704.07141">Buck, 2017</a> but not an exact copy).
We have samples from different layers (<code class="language-plaintext highlighter-rouge">Layer</code>) in a site, and for each
sample, we have a <code class="language-plaintext highlighter-rouge">C14</code> radiocarbon date measurement and its associated
measurement <code class="language-plaintext highlighter-rouge">error</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">

</span><span class="n">table1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tribble</span><span class="p">(</span><span class="w">
  </span><span class="o">~</span><span class="n">Layer</span><span class="p">,</span><span class="w">  </span><span class="o">~</span><span class="n">C14</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">error</span><span class="p">,</span><span class="w">
     </span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5773</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5654</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5585</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"C"</span><span class="p">,</span><span class="w"> </span><span class="m">-5861</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"C"</span><span class="p">,</span><span class="w"> </span><span class="m">-5755</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5850</span><span class="p">,</span><span class="w">     </span><span class="m">50</span><span class="p">,</span><span class="w">
     </span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5928</span><span class="p">,</span><span class="w">     </span><span class="m">50</span><span class="p">,</span><span class="w">
     </span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5905</span><span class="p">,</span><span class="w">     </span><span class="m">50</span><span class="p">,</span><span class="w">
     </span><span class="s2">"G"</span><span class="p">,</span><span class="w"> </span><span class="m">-6034</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"G"</span><span class="p">,</span><span class="w"> </span><span class="m">-6184</span><span class="p">,</span><span class="w">     </span><span class="m">30</span><span class="p">,</span><span class="w">
     </span><span class="s2">"I"</span><span class="p">,</span><span class="w"> </span><span class="m">-6248</span><span class="p">,</span><span class="w">     </span><span class="m">50</span><span class="p">,</span><span class="w">
     </span><span class="s2">"I"</span><span class="p">,</span><span class="w"> </span><span class="m">-6350</span><span class="p">,</span><span class="w">     </span><span class="m">50</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Because of how the layers are ordered—new stuff piled on top of older
stuff—we <em>a priori</em> expect deeper layers to have older dates, so these
are the ordering constraints:</p>

\[\mu_{\text{Layer I}} &lt; \mu_{\text{Layer G}} &lt; \mu_{\text{Layer E}} &lt; \mu_{\text{Layer C}} &lt; \mu_{\text{Layer B}}\]

<p>where <em>μ</em> is the average <code class="language-plaintext highlighter-rouge">C14</code> age of a layer.</p>

<p>Ben-Shachar’s post works through some ways in brms to achieve this
constraint:</p>

<ol>
  <li>
    <p>Fit the usual model but filter out posterior draws where the
ordering constraint is violated.</p>
  </li>
  <li>
    <p>Have the Stan sampler <code class="language-plaintext highlighter-rouge">reject</code> draws where the constraint is
violated. But note that the <a href="https://mc-stan.org/docs/reference-manual/reject-statements.html" title="Stan Manual: Reject statements">documentation for
<code class="language-plaintext highlighter-rouge">reject</code></a> has a section titled “Rejection is not for
constraints”.</p>
  </li>
  <li>
    <p>Use brms’s monotonic effect <a href="https://paul-buerkner.github.io/brms/articles/brms_monotonic.html" title="Estimating Monotonic Effects with brms"><code class="language-plaintext highlighter-rouge">mo()</code></a> syntax.</p>
  </li>
</ol>
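<p>Option 1 is the easiest to picture. In this sketch, independent normal draws (made up, not output from a fitted brms model) stand in for posterior draws of the five layer means. We then keep only the rows that respect the ordering. The <code class="language-plaintext highlighter-rouge">draws</code> matrix and the values used to simulate it are hypothetical:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set.seed(1)
n_draws &lt;- 4000

# Hypothetical stand-in for posterior draws of the five layer means:
# independent normal draws, one column per layer, deepest layer (I) first
centers &lt;- c(I = -6299, G = -6109, E = -5894, C = -5808, B = -5671)
draws &lt;- sapply(centers, function(m) rnorm(n_draws, mean = m, sd = 60))

# Keep a draw only if the means increase from layer I up to layer B
keep &lt;- apply(draws, 1, function(mu) all(diff(mu) &gt; 0))
constrained &lt;- draws[keep, , drop = FALSE]

# Proportion of draws that survive the filter
mean(keep)
</code></pre></div></div>

<p>Every violating draw is simply thrown away, so the fraction that survives shrinks as the layer estimates overlap more.</p>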

<p>In this post, I am going to add another option to this list:</p>

<ol start="4"><li> Use contrast coding so the model parameters 
represent the differences between successive levels, and use priors to enforce 
the ordering constraint.</li></ol>
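<p>The reasoning behind option 4 fits in a few lines of R. If we start from the deepest layer’s mean and each shallower layer adds a nonnegative difference to the layer below it, the means can never decrease. The numbers here are arbitrary illustrations, not estimates from the data:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Arbitrary illustrative values (not estimates): a starting mean for the
# deepest layer and nonnegative differences between successive layers
deepest &lt;- -6300
diffs &lt;- c(190, 215, 86, 137)

# Each layer mean is the deepest mean plus all the differences below it
mu &lt;- deepest + cumsum(c(0, diffs))
names(mu) &lt;- c("I", "G", "E", "C", "B")

# With no negative differences, the means can never decrease
all(diff(mu) &gt;= 0)
#&gt; [1] TRUE
</code></pre></div></div>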

<h2 id="big-idea-of-contrast-coding">Big idea of contrast coding</h2>

<p>When our model includes categorical variables, we need some way to code
those variables in our model (that is, use numbers to represent the
category levels). Our choice of coding scheme will change the meaning of
the model parameters, allowing us to perform different comparisons (test
different statistical hypotheses) about the means of the category
levels. Let’s spell that out again, because it is the big idea of
contrast coding:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>different contrast coding schemes &lt;-&gt; 
  different parameter meanings &lt;-&gt; 
    different comparisons / hypotheses
</code></pre></div></div>

<p>(Isn’t that an eye-popping graphic?)</p>

<p>The toolbox of contrast coding schemes is deep but also confusing.
Whenever I step away from R’s default contrast coding, I usually have
these pages open to help me: <a href="https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/" title="https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/">some tutorial on a UCLA
page</a>, Lisa DeBruine’s <a href="https://debruine.github.io/faux/articles/contrasts.html">comparison
article</a>, and the menu of <a href="https://rdrr.io/pkg/emmeans/man/emmc-functions.html">contrast schemes in
emmeans</a>. So, let’s 
review the basics by looking at R’s default contrast coding scheme.</p>

<h2 id="the-default-dummy-coding">The default: dummy coding</h2>

<p>By default, R will code categorical variables in a regression model
using “treatment” or “dummy” coding. In this scheme,</p>

<ul>
  <li>The intercept is the mean of one of the category levels (the
<em>reference level</em>)</li>
  <li>Parameters estimate the difference between each other level and the
reference level</li>
</ul>
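<p>Base R’s <code class="language-plaintext highlighter-rouge">contr.treatment()</code> generates this scheme. Passing it the layer names shows how each level is coded, with the reference level as the row of zeros:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Dummy (treatment) coding for the five layers; the first level (B) is the
# reference level, so its row is all zeros
contr.treatment(c("B", "C", "E", "G", "I"))
#&gt;   C E G I
#&gt; B 0 0 0 0
#&gt; C 1 0 0 0
#&gt; E 0 1 0 0
#&gt; G 0 0 1 0
#&gt; I 0 0 0 1

# The base argument picks a different reference level (here, layer I)
contr.treatment(c("B", "C", "E", "G", "I"), base = 5)
</code></pre></div></div>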

<p>Let’s fit a simple linear model and work through the parameter meanings:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">)</span><span class="w">
</span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; (Intercept)      LayerC      LayerE      LayerG      LayerI </span><span class="w">
</span><span class="c1">#&gt;  -5670.6667   -137.3333   -223.6667   -438.3333   -628.3333</span><span class="w">
</span></code></pre></div></div>

<p>Here, the <code class="language-plaintext highlighter-rouge">(Intercept)</code> is the mean of the reference level, and the
reference level is the level of the categorical variable not listed in
the other parameter names (<code class="language-plaintext highlighter-rouge">LayerB</code>). Each of the other parameters is a
difference from that reference level. Layer C’s mean is <code class="language-plaintext highlighter-rouge">(Intercept)</code> +
<code class="language-plaintext highlighter-rouge">LayerC</code>. The <a href="https://rdrr.io/r/stats/model.matrix.html"><code class="language-plaintext highlighter-rouge">model.matrix()</code></a> function shows how these
categorical variables are coded in the model’s design/contrast matrix:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Matrix has 1 row per observation but we just want 1 per category level</span><span class="w">
</span><span class="n">mat_m1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">m1</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">model.matrix</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
  </span><span class="n">unique</span><span class="p">()</span><span class="w">
</span><span class="n">mat_m1</span><span class="w">
</span><span class="c1">#&gt;    (Intercept) LayerC LayerE LayerG LayerI</span><span class="w">
</span><span class="c1">#&gt; 1            1      0      0      0      0</span><span class="w">
</span><span class="c1">#&gt; 4            1      1      0      0      0</span><span class="w">
</span><span class="c1">#&gt; 6            1      0      1      0      0</span><span class="w">
</span><span class="c1">#&gt; 9            1      0      0      1      0</span><span class="w">
</span><span class="c1">#&gt; 11           1      0      0      0      1</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">(Intercept)</code> is the model constant, so naturally, it’s switched on
(equals 1) for every row. Each of the other columns is an <em>indicator
variable</em>. <code class="language-plaintext highlighter-rouge">LayerC</code> turns on for the layer C rows, <code class="language-plaintext highlighter-rouge">LayerE</code> turns on
for the layer E rows, and so on.</p>
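<p>These indicator columns are simple enough to build by hand. This sketch uses a hypothetical factor with one row per layer (not the real <code class="language-plaintext highlighter-rouge">table1</code>) and checks that the hand-built columns match what <code class="language-plaintext highlighter-rouge">model.matrix()</code> produces:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical factor with one row per layer (not the real table1)
layer &lt;- factor(c("B", "C", "E", "G", "I"))

# One indicator column per non-reference level: 1 for rows of that layer, else 0
indicators &lt;- sapply(levels(layer)[-1], function(l) as.numeric(layer == l))
X_by_hand &lt;- cbind("(Intercept)" = 1, indicators)

# Same numbers as R's dummy-coded design matrix (names aside)
all.equal(X_by_hand, model.matrix(~ layer), check.attributes = FALSE)
#&gt; [1] TRUE
</code></pre></div></div>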

<p>Matrix-multiplying the contrast matrix by the model coefficients
computes the mean of each layer:</p>

\[\mathbf{\hat y} = \mathbf{X}\boldsymbol{\beta}\]

<p>Think of this equation as a contract for a contrast coding scheme:
Multiplying the contrast matrix by the model coefficients should give us
the means of the category levels.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m1</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;         [,1]</span><span class="w">
</span><span class="c1">#&gt; 1  -5670.667</span><span class="w">
</span><span class="c1">#&gt; 4  -5808.000</span><span class="w">
</span><span class="c1">#&gt; 6  -5894.333</span><span class="w">
</span><span class="c1">#&gt; 9  -6109.000</span><span class="w">
</span><span class="c1">#&gt; 11 -6299.000</span><span class="w">

</span><span class="c1"># Means by hand</span><span class="w">
</span><span class="n">aggregate</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;   Layer       C14</span><span class="w">
</span><span class="c1">#&gt; 1     B -5670.667</span><span class="w">
</span><span class="c1">#&gt; 2     C -5808.000</span><span class="w">
</span><span class="c1">#&gt; 3     E -5894.333</span><span class="w">
</span><span class="c1">#&gt; 4     G -6109.000</span><span class="w">
</span><span class="c1">#&gt; 5     I -6299.000</span><span class="w">
</span></code></pre></div></div>

<p>If the matrix multiplication is too quick, here it is in slow motion:
each column has been weighted (multiplied) by its coefficient, so each row sums to a layer mean:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Sums of the rows are the means</span><span class="w">
</span><span class="n">mat_m1</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">diag</span><span class="p">(</span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">))</span><span class="w">
</span><span class="c1">#&gt;         [,1]      [,2]      [,3]      [,4]      [,5]</span><span class="w">
</span><span class="c1">#&gt; 1  -5670.667    0.0000    0.0000    0.0000    0.0000</span><span class="w">
</span><span class="c1">#&gt; 4  -5670.667 -137.3333    0.0000    0.0000    0.0000</span><span class="w">
</span><span class="c1">#&gt; 6  -5670.667    0.0000 -223.6667    0.0000    0.0000</span><span class="w">
</span><span class="c1">#&gt; 9  -5670.667    0.0000    0.0000 -438.3333    0.0000</span><span class="w">
</span><span class="c1">#&gt; 11 -5670.667    0.0000    0.0000    0.0000 -628.3333</span><span class="w">
</span></code></pre></div></div>
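<p>For one more view of the slow-motion version: multiplying by <code class="language-plaintext highlighter-rouge">diag(b)</code> scales column <em>j</em> by coefficient <em>j</em>, which is exactly what <code class="language-plaintext highlighter-rouge">sweep()</code> does, and the row sums of the weighted matrix equal <code class="language-plaintext highlighter-rouge">X %*% b</code>. This sketch rebuilds the dummy-coded matrix with <code class="language-plaintext highlighter-rouge">diag()</code> and uses the rounded coefficients of <code class="language-plaintext highlighter-rouge">m1</code> printed above:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Dummy-coded design matrix: an intercept column plus four indicator columns
X &lt;- cbind(1, diag(5)[, -1])

# Rounded coefficients of m1 from above
b &lt;- c(-5670.6667, -137.3333, -223.6667, -438.3333, -628.3333)

# Weight each column by its coefficient, two equivalent ways
weighted &lt;- sweep(X, 2, b, "*")
all.equal(weighted, X %*% diag(b))
#&gt; [1] TRUE

# Summing each weighted row gives the layer means
rowSums(weighted)
#&gt; [1] -5670.667 -5808.000 -5894.333 -6109.000 -6299.000
</code></pre></div></div>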

<h3 id="successive-differences-coding">Successive differences coding</h3>

<p>Now, let’s look at a different kind of coding: (reverse) successive differences 
coding. In this scheme:</p>

<ul>
  <li>The intercept is the mean of the level means</li>
  <li>Parameters estimate the difference between adjacent levels</li>
  <li>To make those differences positive, I reverse the order of the levels in the
underlying <a href="https://rdrr.io/r/base/factor.html"><code class="language-plaintext highlighter-rouge">factor()</code></a>, so that each layer is compared
with the one <em>below</em> it (<code class="language-plaintext highlighter-rouge">LayerB - LayerC</code> should be positive).</li>
</ul>
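<p><code class="language-plaintext highlighter-rouge">MASS::contr.sdif()</code> also accepts level names, so we can preview the coding for the reversed layer order before fitting anything. Each column sums to zero, which is why the intercept becomes the mean of the level means instead of the mean of a reference level:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Successive-differences coding with the layers ordered deepest (I) to
# shallowest (B), as in the reversed factor
sdif &lt;- MASS::contr.sdif(c("I", "G", "E", "C", "B"))
sdif
#&gt;    G-I  E-G  C-E  B-C
#&gt; I -0.8 -0.6 -0.4 -0.2
#&gt; G  0.2 -0.6 -0.4 -0.2
#&gt; E  0.2  0.4 -0.4 -0.2
#&gt; C  0.2  0.4  0.6 -0.2
#&gt; B  0.2  0.4  0.6  0.8

# Every column sums to zero (up to floating-point error)
colSums(sdif)
</code></pre></div></div>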

<p>We apply this coding by creating a new factor and setting its
<a href="https://rdrr.io/r/stats/contrasts.html"><code class="language-plaintext highlighter-rouge">contrasts()</code></a>. R lets us set
the contrasts to the name of a function that computes them, so
we use <code class="language-plaintext highlighter-rouge">"contr.sdif"</code>. Because R looks up that function by name,
we first copy <code class="language-plaintext highlighter-rouge">MASS::contr.sdif</code> into our workspace.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">contr.sdif</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">MASS</span><span class="o">::</span><span class="n">contr.sdif</span><span class="w">

</span><span class="c1"># Reverse the factor levels</span><span class="w">
</span><span class="n">table1</span><span class="o">$</span><span class="n">LayerAlt</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">rev</span><span class="p">(</span><span class="n">levels</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">)))</span><span class="w">

</span><span class="n">contrasts</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">LayerAlt</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"contr.sdif"</span><span class="w">
</span></code></pre></div></div>
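<p>As an aside, <code class="language-plaintext highlighter-rouge">contrasts&lt;-</code> also accepts a numeric matrix. Another option, then, is to assign the matrix that <code class="language-plaintext highlighter-rouge">MASS::contr.sdif()</code> returns for the reversed levels, which avoids copying the function into the workspace. This is a sketch on a hypothetical one-row-per-layer factor rather than <code class="language-plaintext highlighter-rouge">table1</code>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical factor with one row per layer
layer &lt;- factor(c("B", "C", "E", "G", "I"))

# Reverse the levels so each difference compares a layer with the one below it
layer_alt &lt;- factor(layer, rev(levels(layer)))

# Assign the contrast matrix itself rather than the name of a function
contrasts(layer_alt) &lt;- MASS::contr.sdif(levels(layer_alt))
colnames(contrasts(layer_alt))
#&gt; [1] "G-I" "E-G" "C-E" "B-C"
</code></pre></div></div>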

<p>Then we just fit the model as usual. As intended, the model’s
coefficients are different.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">)</span><span class="w">
</span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C </span><span class="w">
</span><span class="c1">#&gt; -5956.20000   190.00000   214.66667    86.33333   137.33333</span><span class="w">
</span></code></pre></div></div>

<p>We can compute the mean of layer means and the layer differences by hand
to confirm that the model parameters are computing what we expect.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Make a list so we can write out the diffs easily</span><span class="w">
</span><span class="n">layer_means</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">table1</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">split</span><span class="p">(</span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">lapply</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">C14</span><span class="p">))</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">layer_means</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 5</span><span class="w">
</span><span class="c1">#&gt;  $ B: num -5671</span><span class="w">
</span><span class="c1">#&gt;  $ C: num -5808</span><span class="w">
</span><span class="c1">#&gt;  $ E: num -5894</span><span class="w">
</span><span class="c1">#&gt;  $ G: num -6109</span><span class="w">
</span><span class="c1">#&gt;  $ I: num -6299</span><span class="w">

</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
  </span><span class="n">model_coef</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">),</span><span class="w">
  </span><span class="n">by_hand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
    </span><span class="n">mean</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">layer_means</span><span class="p">)),</span><span class="w">
    </span><span class="n">layer_means</span><span class="o">$</span><span class="n">G</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">I</span><span class="p">,</span><span class="w">
    </span><span class="n">layer_means</span><span class="o">$</span><span class="n">E</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">G</span><span class="p">,</span><span class="w">
    </span><span class="n">layer_means</span><span class="o">$</span><span class="n">C</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">E</span><span class="p">,</span><span class="w">
    </span><span class="n">layer_means</span><span class="o">$</span><span class="n">B</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">C</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;              model_coef     by_hand</span><span class="w">
</span><span class="c1">#&gt; (Intercept) -5956.20000 -5956.20000</span><span class="w">
</span><span class="c1">#&gt; LayerAltG-I   190.00000   190.00000</span><span class="w">
</span><span class="c1">#&gt; LayerAltE-G   214.66667   214.66667</span><span class="w">
</span><span class="c1">#&gt; LayerAltC-E    86.33333    86.33333</span><span class="w">
</span><span class="c1">#&gt; LayerAltB-C   137.33333   137.33333</span><span class="w">
</span></code></pre></div></div>

<p>Returning to our contrast coding contract, we see that matrix-multiplying
the contrast matrix by the model coefficients still gives us the level means.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">model.matrix</span><span class="p">(</span><span class="n">m2</span><span class="p">))</span><span class="w">

</span><span class="n">mat_m2</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;         [,1]</span><span class="w">
</span><span class="c1">#&gt; 1  -5670.667</span><span class="w">
</span><span class="c1">#&gt; 4  -5808.000</span><span class="w">
</span><span class="c1">#&gt; 6  -5894.333</span><span class="w">
</span><span class="c1">#&gt; 9  -6109.000</span><span class="w">
</span><span class="c1">#&gt; 11 -6299.000</span><span class="w">

</span><span class="c1"># By hand</span><span class="w">
</span><span class="n">aggregate</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;   Layer       C14</span><span class="w">
</span><span class="c1">#&gt; 1     B -5670.667</span><span class="w">
</span><span class="c1">#&gt; 2     C -5808.000</span><span class="w">
</span><span class="c1">#&gt; 3     E -5894.333</span><span class="w">
</span><span class="c1">#&gt; 4     G -6109.000</span><span class="w">
</span><span class="c1">#&gt; 5     I -6299.000</span><span class="w">
</span></code></pre></div></div>

<p>It’s so clean and simple. We still get the level means and the
parameters estimate specific comparisons of interest to us. So, how are
the categorical variables and their differences coded in the model’s
contrast matrix?</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#&gt;    (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#&gt; 1            1         0.2         0.4         0.6         0.8</span><span class="w">
</span><span class="c1">#&gt; 4            1         0.2         0.4         0.6        -0.2</span><span class="w">
</span><span class="c1">#&gt; 6            1         0.2         0.4        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 9            1         0.2        -0.6        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 11           1        -0.8        -0.6        -0.4        -0.2</span><span class="w">
</span></code></pre></div></div>

<p>Wait… what? 😕</p>

<h2 id="the-comparison-matrix">The comparison matrix</h2>

<p>When I first started drafting this post, I made it to this point and
noped out for a few days. My curiosity did win out eventually, and I hit
the books (remembered <a href="https://twitter.com/CookieSci/status/1562221740230676481">this
tweet</a> and
<a href="https://twitter.com/bolkerb/status/1565077056169312257">this handout</a>,
watched <a href="https://www.youtube.com/watch?v=yLgPpmXVVbs">this video</a>, read
<a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">this
paper</a>,
and read section 9.1.2 in <em>Applied Regression Analysis &amp; Generalized
Linear Models</em>). Now, for the rest of the post.</p>

<p>The best formal, citable source for what I describe here is <a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">Schad and
colleagues
(2020)</a>,
but what they call a “hypothesis matrix”, I’m calling a <em>comparison
matrix</em>. I do this for two reasons: 1) to get away from a hypothesis-testing
mindset (see Figure 1) and 2) because we are using the
hypothesis matrix to apply a constraint among parameter values (remember
that?).</p>

<figure class="" style="max-width: 66%; display: block; margin: 2em auto;"><img src="/assets/images/2023-07-bayes-sign.jpeg" alt="In this house, we believe: Bayes is good, estimate with uncertainty is better than hypothesis testing, math is hard, sampling is easy, Bayesian estimation with informative priors is indistinguishable from data falsifications, and it kicks ass." /><figcaption>
      Figure 1. The sign in my yard.

    </figcaption></figure>

<p>In this approach, we define the model parameters <strong><em>β</em></strong> by
matrix-multiplying the comparison matrix <strong>C</strong> (which activates or
weights different level means) and the level means <strong><em>μ</em></strong>.</p>

\[\mathbf{C}\boldsymbol{\mu} = \boldsymbol{\beta} \\
 \begin{bmatrix}
  \textrm{weights for comparison 1} \\
  \textrm{weights for comparison 2} \\
  \textrm{weights for comparison 3} \\
  \cdots \\
 \end{bmatrix}
 \begin{bmatrix}
  \mu_1 \\
  \mu_2 \\
  \mu_3 \\
  \cdots \\
 \end{bmatrix} = 
 \begin{bmatrix}
  \beta_0 \\
  \beta_1 \\
  \beta_2 \\
  \cdots \\
 \end{bmatrix}\]

<p>So, in the dummy-coded version of the model, we had the following
comparison matrix:</p>

\[\mathbf{C}_\text{dummy}\boldsymbol{\mu} = \boldsymbol{\beta}_\text{dummy} \\
 \begin{bmatrix}
  1 &amp; 0 &amp; 0 &amp; 0 &amp; 0 \\
  -1 &amp; 1 &amp; 0 &amp; 0 &amp; 0 \\
  -1 &amp; 0 &amp; 1 &amp; 0 &amp; 0 \\
  -1 &amp; 0 &amp; 0 &amp; 1 &amp; 0 \\
  -1 &amp; 0 &amp; 0 &amp; 0 &amp; 1 \\
 \end{bmatrix}
 \begin{bmatrix}
  \mu_{\text{Layer B}} \\
  \mu_{\text{Layer C}} \\
  \mu_{\text{Layer E}} \\
  \mu_{\text{Layer G}} \\
  \mu_{\text{Layer I}} \\
 \end{bmatrix} = 
 \begin{bmatrix}
  \beta_0: \mu_{\text{Layer B}} \\
  \beta_1: \mu_{\text{Layer C}} - \mu_{\text{Layer B}} \\
  \beta_2: \mu_{\text{Layer E}} - \mu_{\text{Layer B}} \\
  \beta_3: \mu_{\text{Layer G}} - \mu_{\text{Layer B}} \\
  \beta_4: \mu_{\text{Layer I}} - \mu_{\text{Layer B}} \\
 \end{bmatrix}\]

<p>The first row in <strong>C</strong> sets Layer B as the reference level for the
dummy coding. The second row turns on both Layer B and Layer C, but
Layer B is negatively weighted. Thus, the corresponding model
coefficient is the difference between Layers C and B.</p>
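<p>We can check this comparison matrix numerically. Multiplying it by the rounded layer means from earlier reproduces the dummy-coded coefficients of <code class="language-plaintext highlighter-rouge">m1</code>, up to rounding:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Layer means (rounded), in the factor's level order B, C, E, G, I
mu &lt;- c(-5670.667, -5808, -5894.333, -6109, -6299)

# Comparison matrix for dummy coding with layer B as the reference
C_dummy &lt;- rbind(
  c( 1, 0, 0, 0, 0),
  c(-1, 1, 0, 0, 0),
  c(-1, 0, 1, 0, 0),
  c(-1, 0, 0, 1, 0),
  c(-1, 0, 0, 0, 1)
)

drop(C_dummy %*% mu)
#&gt; [1] -5670.667  -137.333  -223.666  -438.333  -628.333
</code></pre></div></div>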

<p>The comparison matrix for the reversed successive differences contrast
coding is similar. The first row activates all of the layers but
weights them equally, so we get a mean of means for the model intercept. Each
row after the first is the difference between two adjacent layer means.</p>

\[\mathbf{C}_\text{rev-diffs}\boldsymbol{\mu} = \boldsymbol{\beta}_\text{rev-diffs} \\
 \begin{bmatrix}
  .2 &amp; .2 &amp; .2 &amp; .2 &amp; .2 \\
  0 &amp;  0 &amp;  0 &amp;  1 &amp; -1 \\
  0 &amp;  0 &amp;  1 &amp; -1 &amp;  0 \\
  0 &amp;  1 &amp; -1 &amp;  0 &amp;  0 \\
  1 &amp; -1 &amp;  0 &amp;  0 &amp;  0 \\
 \end{bmatrix}
 \begin{bmatrix}
  \mu_{\text{Layer B}} \\
  \mu_{\text{Layer C}} \\
  \mu_{\text{Layer E}} \\
  \mu_{\text{Layer G}} \\
  \mu_{\text{Layer I}} \\
 \end{bmatrix} = 
 \begin{bmatrix}
  \beta_0: \text{mean of } \mu \\
  \beta_1: \mu_{\text{Layer G}} - \mu_{\text{Layer I}} \\
  \beta_2: \mu_{\text{Layer E}} - \mu_{\text{Layer G}} \\
  \beta_3: \mu_{\text{Layer C}} - \mu_{\text{Layer E}} \\
  \beta_4: \mu_{\text{Layer B}} - \mu_{\text{Layer C}} \\
 \end{bmatrix}\]
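<p>The same numerical check works here. Multiplying this comparison matrix by the rounded layer means recovers the coefficients of <code class="language-plaintext highlighter-rouge">m2</code>, up to rounding:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Layer means (rounded), in the order B, C, E, G, I used by the matrix above
mu &lt;- c(-5670.667, -5808, -5894.333, -6109, -6299)

C_rev_diffs &lt;- rbind(
  c(.2, .2, .2, .2, .2),  # mean of the layer means
  c( 0,  0,  0,  1, -1),  # G - I
  c( 0,  0,  1, -1,  0),  # E - G
  c( 0,  1, -1,  0,  0),  # C - E
  c( 1, -1,  0,  0,  0)   # B - C
)

drop(C_rev_diffs %*% mu)
#&gt; [1] -5956.200   190.000   214.667    86.333   137.333
</code></pre></div></div>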

<p>Now, here is the magic part 🔮. Left-multiplying both sides by the inverse of
the comparison matrix gives us a design matrix for the linear model
that follows the contract for contrast matrices I described above:</p>

\[\mathbf{C}\boldsymbol{\mu} = \boldsymbol{\beta} \\
\mathbf{C}^{-1}\mathbf{C}\boldsymbol{\mu} = \mathbf{C}^{-1}\boldsymbol{\beta} \\
\boldsymbol{\mu} = \mathbf{C}^{-1}\boldsymbol{\beta} \\
\mathbf{\hat y} = \mathbf{X}\boldsymbol{\beta}, \quad \text{where } \mathbf{X} = \mathbf{C}^{-1} \text{ (one row per level)}\]

<p>So, we can invert<sup id="fnref:invert" role="doc-noteref"><a href="#fn:invert" class="footnote" rel="footnote">1</a></sup> our comparison matrix to get the model’s contrast matrix:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">comparisons</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
  </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w">
   </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w">
   </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">
   </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">
   </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="p">,</span><span class="w">  </span><span class="m">0</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">mat_comparisons</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">comparisons</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">byrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_comparisons</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;      [,1] [,2] [,3] [,4] [,5]</span><span class="w">
</span><span class="c1">#&gt; [1,]    1  0.2  0.4  0.6  0.8</span><span class="w">
</span><span class="c1">#&gt; [2,]    1  0.2  0.4  0.6 -0.2</span><span class="w">
</span><span class="c1">#&gt; [3,]    1  0.2  0.4 -0.4 -0.2</span><span class="w">
</span><span class="c1">#&gt; [4,]    1  0.2 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="c1">#&gt; [5,]    1 -0.8 -0.6 -0.4 -0.2</span><span class="w">

</span><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#&gt;    (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#&gt; 1            1         0.2         0.4         0.6         0.8</span><span class="w">
</span><span class="c1">#&gt; 4            1         0.2         0.4         0.6        -0.2</span><span class="w">
</span><span class="c1">#&gt; 6            1         0.2         0.4        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 9            1         0.2        -0.6        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 11           1        -0.8        -0.6        -0.4        -0.2</span><span class="w">
</span></code></pre></div></div>

<p>Or, perhaps more commonly, we can take the contrast matrix used by a model and
recover the comparison matrix, which is a handy trick when R sets the
contrast values for us automatically:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Dummy coding example</span><span class="w">
</span><span class="n">mat_m1</span><span class="w">
</span><span class="c1">#&gt;    (Intercept) LayerC LayerE LayerG LayerI</span><span class="w">
</span><span class="c1">#&gt; 1            1      0      0      0      0</span><span class="w">
</span><span class="c1">#&gt; 4            1      1      0      0      0</span><span class="w">
</span><span class="c1">#&gt; 6            1      0      1      0      0</span><span class="w">
</span><span class="c1">#&gt; 9            1      0      0      1      0</span><span class="w">
</span><span class="c1">#&gt; 11           1      0      0      0      1</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;              1 4 6 9 11</span><span class="w">
</span><span class="c1">#&gt; (Intercept)  1 0 0 0  0</span><span class="w">
</span><span class="c1">#&gt; LayerC      -1 1 0 0  0</span><span class="w">
</span><span class="c1">#&gt; LayerE      -1 0 1 0  0</span><span class="w">
</span><span class="c1">#&gt; LayerG      -1 0 0 1  0</span><span class="w">
</span><span class="c1">#&gt; LayerI      -1 0 0 0  1</span><span class="w">

</span><span class="c1"># Successive differences coding example</span><span class="w">
</span><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#&gt;    (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#&gt; 1            1         0.2         0.4         0.6         0.8</span><span class="w">
</span><span class="c1">#&gt; 4            1         0.2         0.4         0.6        -0.2</span><span class="w">
</span><span class="c1">#&gt; 6            1         0.2         0.4        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 9            1         0.2        -0.6        -0.4        -0.2</span><span class="w">
</span><span class="c1">#&gt; 11           1        -0.8        -0.6        -0.4        -0.2</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;               1    4    6    9   11</span><span class="w">
</span><span class="c1">#&gt; (Intercept) 0.2  0.2  0.2  0.2  0.2</span><span class="w">
</span><span class="c1">#&gt; LayerAltG-I 0.0  0.0  0.0  1.0 -1.0</span><span class="w">
</span><span class="c1">#&gt; LayerAltE-G 0.0  0.0  1.0 -1.0  0.0</span><span class="w">
</span><span class="c1">#&gt; LayerAltC-E 0.0  1.0 -1.0  0.0  0.0</span><span class="w">
</span><span class="c1">#&gt; LayerAltB-C 1.0 -1.0  0.0  0.0  0.0</span><span class="w">
</span></code></pre></div></div>

<p>As I said earlier, there are many contrast coding schemes that let us
define the model parameters in terms of specific comparisons, and this
post covers only two of them (dummy coding and a reversed version of
successive differences coding).</p>
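<p>The inversion also runs in the other direction. Instead of writing down a contrast matrix and inverting it to see which comparisons it encodes, we can write down the comparisons we want (a hypothesis matrix, in the terminology of Schad and colleagues) and invert that to get the contrast matrix. Here is a minimal base-R sketch. It assumes the five layers are ordered B, C, E, G, I, as in the column names above. The names <code class="language-plaintext highlighter-rouge">layers</code>, <code class="language-plaintext highlighter-rouge">hyp</code>, and <code class="language-plaintext highlighter-rouge">contrast_matrix</code> are my own and do not come from the post's code.</p>

```r
# Layer labels taken from the column names printed above
layers <- c("B", "C", "E", "G", "I")

# Hypothesis matrix: one row per model parameter, one column per layer mean.
# Each row holds the weights that define that parameter.
hyp <- rbind(
  "(Intercept)" = rep(1 / 5, 5), # grand mean of the five layer means
  "G-I"         = c(0, 0, 0, 1, -1),
  "E-G"         = c(0, 0, 1, -1, 0),
  "C-E"         = c(0, 1, -1, 0, 0),
  "B-C"         = c(1, -1, 0, 0, 0)
)
colnames(hyp) <- layers

# Inverting the hypothesis matrix gives the contrast (design) matrix:
# one row per layer, one column per parameter
contrast_matrix <- solve(hyp)
round(contrast_matrix, 2)
```

<p>Up to rounding, the non-intercept columns of <code class="language-plaintext highlighter-rouge">contrast_matrix</code> match those of <code class="language-plaintext highlighter-rouge">mat_m2</code> printed above, so <code class="language-plaintext highlighter-rouge">solve()</code> moves us back and forth between the comparisons we want and the codes that produce them.</p>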

<h2 id="finally-in-layer-i-of-this-post-the-brms-model">Finally, in Layer I of this post, the brms model</h2>

<p>Now that we know about contrasts, and how they let us define model
parameters in terms of the comparisons we want to make, we can use this
technique to enforce an ordering constraint.</p>
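<p>Why does the coding make this possible? With the reversed successive differences coding, each non-intercept coefficient is exactly the gap between two adjacent layer means. So constraining every coefficient to be non-negative forces the layer means into order. Below is a quick numerical check. The coefficient values are made up for illustration, and only the intercept echoes the prior mean used in the model. The names <code class="language-plaintext highlighter-rouge">contr</code>, <code class="language-plaintext highlighter-rouge">design</code>, and <code class="language-plaintext highlighter-rouge">beta</code> are hypothetical.</p>

```r
# Reversed successive differences coding for k ordered layers. Column j
# is j / k for the first k - j layers and j / k - 1 for the rest, which
# reproduces the non-intercept columns of `mat_m2` printed above.
k <- 5
contr <- outer(
  seq_len(k), seq_len(k - 1),
  function(i, j) ifelse(i <= k - j, j / k, j / k - 1)
)
design <- cbind(1, contr)

# Made-up coefficients: an intercept plus four non-negative differences
beta <- c(-5975, 200, 200, 100, 150)
layer_means <- drop(design %*% beta)

# Each gap between adjacent layer means equals one difference coefficient
gaps <- -diff(layer_means)
gaps
rev(beta[-1])

# So non-negative coefficients imply non-increasing layer means,
# and the intercept is the average of the layer means
all(gaps >= 0)
mean(layer_means)
```

<p>Flip the sign of any difference coefficient and the corresponding gap turns negative. That is why putting a lower bound on these coefficients is enough to order the means.</p>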

<p>We set up our model as in Ben-Shachar’s <a href="https://blog.msbstats.info/posts/2023-06-26-order-constraints-in-brms/" title="Order Constraints in Bayes Models (with brms)">post</a>, but
here we put a <code class="language-plaintext highlighter-rouge">normal(500, 250)</code> prior on the non-intercept
coefficients with a lower bound of 0 (<code class="language-plaintext highlighter-rouge">lb = 0</code>) to enforce the
ordering constraint.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">brms</span><span class="p">)</span><span class="w">
</span><span class="n">priors</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> 
  </span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"normal(-5975, 1000)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Intercept"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"normal(500, 250)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"b"</span><span class="p">,</span><span class="w"> </span><span class="n">lb</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"exponential(0.01)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sigma"</span><span class="p">)</span><span class="w">

</span><span class="n">validate_prior</span><span class="p">(</span><span class="w">
  </span><span class="n">priors</span><span class="p">,</span><span class="w">
  </span><span class="n">bf</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">se</span><span class="p">(</span><span class="n">error</span><span class="p">,</span><span class="w"> </span><span class="n">sigma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">),</span><span class="w">
  </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table1</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;                prior     class        coef group resp dpar nlpar lb ub</span><span class="w">
</span><span class="c1">#&gt;     normal(500, 250)         b                                    0   </span><span class="w">
</span><span class="c1">#&gt;     normal(500, 250)         b LayerAltBMC                        0   </span><span class="w">
</span><span class="c1">#&gt;     normal(500, 250)         b LayerAltCME                        0   </span><span class="w">
</span><span class="c1">#&gt;     normal(500, 250)         b LayerAltEMG                        0   </span><span class="w">
</span><span class="c1">#&gt;     normal(500, 250)         b LayerAltGMI                        0   </span><span class="w">
</span><span class="c1">#&gt;  normal(-5975, 1000) Intercept                                        </span><span class="w">
</span><span class="c1">#&gt;    exponential(0.01)     sigma                                    0   </span><span class="w">
</span><span class="c1">#&gt;        source</span><span class="w">
</span><span class="c1">#&gt;          user</span><span class="w">
</span><span class="c1">#&gt;  (vectorized)</span><span class="w">
</span><span class="c1">#&gt;  (vectorized)</span><span class="w">
</span><span class="c1">#&gt;  (vectorized)</span><span class="w">
</span><span class="c1">#&gt;  (vectorized)</span><span class="w">
</span><span class="c1">#&gt;          user</span><span class="w">
</span><span class="c1">#&gt;          user</span><span class="w">
</span></code></pre></div></div>
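<p>The <code class="language-plaintext highlighter-rouge">lb = 0</code> bound restricts the difference coefficients to non-negative values. As a result, the <code class="language-plaintext highlighter-rouge">normal(500, 250)</code> prior acts like a normal distribution truncated at 0. Because 0 sits two standard deviations below the prior mean, the truncation trims only about 2.3% of the prior mass. Here is a small base-R sketch of that truncated density. The helper <code class="language-plaintext highlighter-rouge">dtruncnorm_lb0()</code> is hypothetical and was written only for this illustration.</p>

```r
# Prior mass that normal(500, 250) places below the bound of 0:
# this is pnorm(-2), about 0.023
mass_below <- pnorm(0, mean = 500, sd = 250)

# Hypothetical helper: density of normal(mean, sd) truncated below at 0.
# It is zero below the bound and rescaled by the mass remaining above it.
dtruncnorm_lb0 <- function(x, mean = 500, sd = 250) {
  ifelse(x < 0, 0, dnorm(x, mean, sd) / (1 - pnorm(0, mean, sd)))
}

# Sanity check: the truncated density integrates to (about) 1.
# The upper limit of 500 + 20 * 250 leaves out a negligible tail.
total_mass <- integrate(dtruncnorm_lb0, lower = 0, upper = 500 + 20 * 250)$value
total_mass

# Above 0, the truncated density is the untruncated one inflated by
# a factor of 1 / (1 - mass_below), about 1.023
dtruncnorm_lb0(500) / dnorm(500, 500, 250)
```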

<p>We fit the model:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">brm</span><span class="p">(</span><span class="w">
  </span><span class="n">bf</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">se</span><span class="p">(</span><span class="n">error</span><span class="p">,</span><span class="w"> </span><span class="n">sigma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">),</span><span class="w">
  </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gaussian</span><span class="p">(</span><span class="s2">"identity"</span><span class="p">),</span><span class="w">
  </span><span class="n">prior</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">priors</span><span class="p">,</span><span class="w">
  </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w">
  </span><span class="n">seed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4321</span><span class="p">,</span><span class="w">
  </span><span class="n">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cmdstanr"</span><span class="p">,</span><span class="w">
  </span><span class="n">cores</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> 
  </span><span class="c1"># caching</span><span class="w">
  </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"_caches/2023-07-03"</span><span class="p">,</span><span class="w"> 
  </span><span class="n">file_refit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"on_change"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>The level differences are all positive, as the constraint requires,
and each 95% interval lies above zero.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">summary</span><span class="p">(</span><span class="n">m3</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;  Family: gaussian </span><span class="w">
</span><span class="c1">#&gt;   Links: mu = identity; sigma = identity </span><span class="w">
</span><span class="c1">#&gt; Formula: C14 | se(error, sigma = TRUE) ~ 1 + LayerAlt </span><span class="w">
</span><span class="c1">#&gt;    Data: table1 (Number of observations: 12) </span><span class="w">
</span><span class="c1">#&gt;   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;</span><span class="w">
</span><span class="c1">#&gt;          total post-warmup draws = 4000</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Population-Level Effects: </span><span class="w">
</span><span class="c1">#&gt;             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS</span><span class="w">
</span><span class="c1">#&gt; Intercept   -5957.60     27.91 -6011.89 -5900.71 1.00     1964     1715</span><span class="w">
</span><span class="c1">#&gt; LayerAltGMI   211.00     82.29    51.67   378.86 1.00     1693      939</span><span class="w">
</span><span class="c1">#&gt; LayerAltEMG   206.15     71.30    68.47   349.07 1.00     1937     1185</span><span class="w">
</span><span class="c1">#&gt; LayerAltCME   105.55     62.84     7.90   243.81 1.00     1377     1023</span><span class="w">
</span><span class="c1">#&gt; LayerAltBMC   145.95     65.13    23.63   279.12 1.00     1684      857</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Family Specific Parameters: </span><span class="w">
</span><span class="c1">#&gt;       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS</span><span class="w">
</span><span class="c1">#&gt; sigma    79.03     26.95    41.05   142.49 1.00     1651     2149</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; Draws were sampled using sample(hmc). For each parameter, Bulk_ESS</span><span class="w">
</span><span class="c1">#&gt; and Tail_ESS are effective sample size measures, and Rhat is the potential</span><span class="w">
</span><span class="c1">#&gt; scale reduction factor on split chains (at convergence, Rhat = 1).</span><span class="w">
</span><span class="n">bayesplot</span><span class="o">::</span><span class="n">mcmc_intervals</span><span class="p">(</span><span class="n">m3</span><span class="p">,</span><span class="w"> </span><span class="n">regex_pars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Layer"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<div class="figure" style="text-align: center">
<img src="/figs/2023-07-03-bayesian-ordering-constraint/level-diffs-1.png" alt="Estimates of the level differences." width="80%" />
<p class="caption">Estimates of the level differences.</p>
</div>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conditional_effects</span><span class="p">(</span><span class="n">m3</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<div class="figure" style="text-align: center">
<img src="/figs/2023-07-03-bayesian-ordering-constraint/level-means-1.png" alt="Conditional means for each layer." width="80%" />
<p class="caption">Conditional means for each layer.</p>
</div>

<h2 id="normally-i-dont-think-you-need-contrast-codes">Normally, I don’t think you need contrast codes</h2>

<p>My general advice for contrast coding is to just fit the model and then
have the software compute the appropriate estimates and comparisons
afterwards on the outcome scale. For example,
<a href="https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html">emmeans</a>
can take a fitted model, run requested comparisons, and handle multiple
comparisons and <em>p</em>-value adjustments for us.
<a href="https://vincentarelbundock.github.io/marginaleffects/">marginaleffects</a>
probably does this too. (I really need to play with it.) In a Bayesian
model, we can also compute comparisons of interest by doing math on the
posterior samples: compute the quantities of interest in each draw, take
their differences, and summarize the distribution of those differences.
This particular model, however, needed the contrast coding to impose the
ordering constraint through the prior, which ruled out the
post-processing approach.</p>
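<p>For a model without such a constraint, the post-processing route is short. The sketch below uses simulated, independent normal draws as a stand-in for posterior draws of two layer means. They are made up and are <em>not</em> output from <code class="language-plaintext highlighter-rouge">m3</code>. With a real <code class="language-plaintext highlighter-rouge">brmsfit</code>, the draws would come from the fitted model instead, for example via <code class="language-plaintext highlighter-rouge">as_draws_df()</code> or <code class="language-plaintext highlighter-rouge">posterior_epred()</code>.</p>

```r
set.seed(20230703)

# Simulated stand-in for 4000 posterior draws of two layer means.
# These are made-up, independent normal draws, NOT output from the model.
n_draws <- 4000
draws <- data.frame(
  mean_B = rnorm(n_draws, mean = -5700, sd = 60),
  mean_C = rnorm(n_draws, mean = -5850, sd = 60)
)

# Do the math draw by draw: one difference per posterior draw
diff_BC <- draws$mean_B - draws$mean_C

# Then summarize the distribution of the differences
summary_BC <- c(
  median = median(diff_BC),
  quantile(diff_BC, probs = c(0.025, 0.975)),
  prob_positive = mean(diff_BC > 0)
)
round(summary_BC, 2)
```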

<hr />

<p><em>Last knitted on 2023-07-05. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2023-07-03-bayesian-ordering-constraint.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">2</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:invert" role="doc-endnote">
      <p>I use <a href="https://rdrr.io/r/base/solve.html"><code class="language-plaintext highlighter-rouge">solve()</code></a> here for the inversion, but <a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">Schad and 
colleagues 
(2020)</a> 
use the generalized inverse <a href="https://rdrr.io/pkg/MASS/man/ginv.html"><code class="language-plaintext highlighter-rouge">MASS::ginv()</code></a> or 
<a href="https://cran.r-project.org/web/packages/matlib/vignettes/ginv.html"><code class="language-plaintext highlighter-rouge">matlib::Ginv()</code></a>. 
<code class="language-plaintext highlighter-rouge">solve()</code> only inverts square (non-singular) matrices, whereas the generalized inverse 
also works on non-square matrices. <a href="#fnref:invert" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting         value</span><span class="w">
</span><span class="c1">#&gt;  version         R version 4.3.0 (2023-04-21 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os              Windows 11 x64 (build 22621)</span><span class="w">
</span><span class="c1">#&gt;  system          x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui              RTerm</span><span class="w">
</span><span class="c1">#&gt;  language        (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate         English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype           English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz              America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date            2023-07-05</span><span class="w">
</span><span class="c1">#&gt;  pandoc          NA</span><span class="w">
</span><span class="c1">#&gt;  stan (rstan)    2.26.1</span><span class="w">
</span><span class="c1">#&gt;  stan (cmdstanr) 2.32.0</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  ! package        * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;    abind            1.4-5   2016-07-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    backports        1.4.1   2021-12-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    base64enc        0.1-3   2015-07-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    bayesplot        1.10.0  2022-11-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    bridgesampling   1.1-2   2021-04-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    brms           * 2.19.0  2023-03-14 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    Brobdingnag      1.2-9   2022-10-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    cachem           1.0.8   2023-05-01 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    callr            3.7.3   2022-11-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    checkmate        2.2.0   2023-04-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    cmdstanr         0.5.3   2023-04-24 [1] local</span><span class="w">
</span><span class="c1">#&gt;    coda             0.19-4  2020-09-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    codetools        0.2-19  2023-02-01 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    colourpicker     1.2.0   2022-10-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    crayon           1.5.2   2022-09-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    crosstalk        1.2.0   2021-11-04 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    curl             5.0.1   2023-06-07 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    digest           0.6.32  2023-06-26 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    distributional   0.3.2   2023-03-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    downlit          0.4.3   2023-06-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    DT               0.28    2023-05-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    dygraphs         1.1.1.6 2018-07-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    emmeans          1.8.7   2023-06-23 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    estimability     1.4.1   2022-08-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    ggplot2        * 3.4.2   2023-04-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    git2r            0.32.0  2023-04-12 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    gridExtra        2.3     2017-09-09 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    gtools           3.9.4   2022-11-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    here             1.0.1   2020-12-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    highr            0.10    2022-12-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    htmltools        0.5.5   2023-03-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    httpuv           1.6.11  2023-05-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    igraph           1.5.0   2023-06-16 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    inline           0.3.19  2021-05-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    jsonlite         1.8.5   2023-06-05 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    knitr          * 1.43    2023-05-25 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    later            1.3.1   2023-05-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    lattice          0.21-8  2023-04-05 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    loo              2.6.0   2023-03-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    markdown         1.7     2023-05-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    MASS           * 7.3-60  2023-05-04 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    Matrix           1.5-4   2023-04-04 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    matrixStats      1.0.0   2023-06-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    memoise          2.0.1   2021-11-26 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    mime             0.12    2021-09-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    miniUI           0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    mvtnorm          1.2-2   2023-06-08 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    nlme             3.1-162 2023-01-31 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    pkgbuild         1.4.2   2023-06-26 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    plyr             1.8.8   2022-11-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    posterior        1.4.1   2023-03-14 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    prettyunits      1.1.1   2020-01-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    processx         3.8.1   2023-04-18 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    promises         1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    ps               1.7.5   2023-04-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    purrr          * 1.0.1   2023-01-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    ragg             1.2.5   2023-01-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    Rcpp           * 1.0.10  2023-01-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;  D RcppParallel     5.1.7   2023-02-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    reshape2         1.4.4   2020-04-09 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    rstan            2.26.22 2023-05-02 [1] local</span><span class="w">
</span><span class="c1">#&gt;    rstantools       2.3.1   2023-03-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    rstudioapi       0.14    2022-08-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    shiny            1.7.4   2022-12-15 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    shinyjs          2.1.0   2021-12-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    shinystan        2.6.0   2022-03-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    shinythemes      1.2.0   2021-01-25 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    StanHeaders      2.26.27 2023-06-14 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    systemfonts      1.0.4   2022-02-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tensorA          0.36.2  2020-11-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    textshaping      0.3.6   2021-10-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    threejs          0.3.3   2020-01-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    V8               4.3.0   2023-04-08 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#&gt;    withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    xfun             0.39    2023-04-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    xtable           1.8-4   2019-04-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    xts              0.13.1  2023-04-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt;    zoo              1.8-12  2023-04-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.3</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.3.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  D ── DLL MD5 mismatch, broken installation.</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="bayesian" /><category term="brms" /><category term="math" /><summary type="html"><![CDATA[But mostly how contrast matrices are computed]]></summary></entry><entry><title type="html">How to score Rock Paper Scissors</title><link href="https://tjmahr.github.io/rock-paper-scissors-lists-are-trees/" rel="alternate" type="text/html" title="How to score Rock Paper Scissors" /><published>2022-12-06T00:00:00-06:00</published><updated>2022-12-06T00:00:00-06:00</updated><id>https://tjmahr.github.io/rock-paper-scissors-lists-are-trees</id><content type="html" xml:base="https://tjmahr.github.io/rock-paper-scissors-lists-are-trees/"><![CDATA[<p>Ho ho ho, it is the most wonderful time of the year: Advent of code!</p>

<p>AOC is a yearly collection of programming puzzles released over the
first 25 days of December. I like it… so much so that I wrote <a href="https://github.com/tjmahr/aoc">an R
package</a> for completing the puzzles within the structure of an R
package. The puzzles start out easy and get progressively more elaborate
or devious in their requirements. In this post, though, I am going to
talk about an easy puzzle, and specifically, one little trick I used in
my solution.</p>

<p><a href="https://adventofcode.com/2022/day/2">Day 2 of 2022</a> requires us to score games of Rock Paper Scissors. The
moves are encoded as letters: our opponent’s moves are coded as
<code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">B</code>, <code class="language-plaintext highlighter-rouge">C</code> and ours are coded as <code class="language-plaintext highlighter-rouge">X</code>, <code class="language-plaintext highlighter-rouge">Y</code>, <code class="language-plaintext highlighter-rouge">Z</code>. So, an input
describing three rounds looks like the following:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">example_input</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
  </span><span class="s2">"A Y"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"B X"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"C Z"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Where the letters mean the following:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">move_codes</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
  </span><span class="s2">"A"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rock"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"B"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"paper"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"C"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"scissors"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"X"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rock"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"Y"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"paper"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"Z"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"scissors"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>This encoding seems like a weird extra layer of indirection, and <em>it is</em>,
because the puzzle changes the meanings of the letters in Part 2. Still,
it is straightforward to parse the input into a list of roshambo moves.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">example_input</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">strsplit</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="c1"># Use character subsetting to convert letters to moves</span><span class="w">
  </span><span class="n">lapply</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">unname</span><span class="p">(</span><span class="n">move_codes</span><span class="p">[</span><span class="n">x</span><span class="p">]))</span><span class="w"> 

</span><span class="c1"># Our character's move is the second element in each vector</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 3</span><span class="w">
</span><span class="c1">#&gt;  $ : chr [1:2] "rock" "paper"</span><span class="w">
</span><span class="c1">#&gt;  $ : chr [1:2] "paper" "rock"</span><span class="w">
</span><span class="c1">#&gt;  $ : chr [1:2] "scissors" "scissors"</span><span class="w">
</span></code></pre></div></div>
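<p>Because character subsetting is vectorized, the per-line <code class="language-plaintext highlighter-rouge">lapply()</code> is not strictly necessary. Here is a minimal sketch of an alternative (not the approach above) that looks up every letter at once and keeps one character vector per player. The names <code class="language-plaintext highlighter-rouge">opponent</code> and <code class="language-plaintext highlighter-rouge">me</code> are my own. The sketch also adds a guard, because a lookup table silently returns <code class="language-plaintext highlighter-rouge">NA</code> for a letter it does not know.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>move_codes &lt;- c(
  "A" = "rock", "B" = "paper", "C" = "scissors",
  "X" = "rock", "Y" = "paper", "Z" = "scissors"
)
example_input &lt;- c("A Y", "B X", "C Z")

# Split every line, then pull out the first and second letter of each round
pieces &lt;- strsplit(example_input, " ")
opponent_letters &lt;- vapply(pieces, "[", character(1), 1)
my_letters &lt;- vapply(pieces, "[", character(1), 2)

# One character-subsetting call per player does all of the lookups
opponent &lt;- unname(move_codes[opponent_letters])
me &lt;- unname(move_codes[my_letters])

# Unknown letters would come back as NA, so fail loudly instead
stopifnot(!anyNA(opponent), !anyNA(me))

data.frame(opponent, me)
#&gt;   opponent       me
#&gt; 1     rock    paper
#&gt; 2    paper     rock
#&gt; 3 scissors scissors
</code></pre></div></div>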

<p>Now, for the point of this post, <strong>how do we score each game?</strong></p>

<p>The naive approach is to start typing away furiously</p>

<p><img src="/figs/2022-12-06-rock-paper-scissors-lists-are-trees//unnamed-chunk-5.svg" alt="center" width="100%" style="display: block; margin: auto;" /></p>

<p>before eventually noping the hell out of there.</p>

<p>What we have is a decision tree: we need to follow a branch for player
one and another branch for player two. And here’s the main point of this
post: <strong>nested lists are trees</strong>. (Yes, I love lists—see <a href="/lists-knitr-secret-weapon/">this
post</a> where I use them in my knitr
reporting.) The top (outer) level of the list will be all of the player
one options, and then the bottom (inner) level will be all the player
two options. The leaves of the tree (the bottom-level values) are the
outcomes of the games.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">run_game</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">pair</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="c1"># nested lists are trees</span><span class="w">
  </span><span class="n">rules</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
    </span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
      </span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
      </span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="p">,</span><span class="w">
      </span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="w">
    </span><span class="p">),</span><span class="w">
    </span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
      </span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
      </span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w">
      </span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="w">
    </span><span class="p">),</span><span class="w">
    </span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
      </span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
      </span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w">
      </span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="w">
    </span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">

  </span><span class="c1"># Because `rules[[pair[1]]][[pair[2]]]` is unsightly:</span><span class="w">
  </span><span class="n">rules</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
    </span><span class="n">getElement</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="m">1</span><span class="p">])</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
    </span><span class="n">getElement</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="m">2</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
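<p>As an aside, base R has a more compact spelling of that double lookup: when <code class="language-plaintext highlighter-rouge">[[</code> receives a character vector, it indexes a list recursively, one level per element. The sketch below rebuilds the same <code class="language-plaintext highlighter-rouge">rules</code> list outside of <code class="language-plaintext highlighter-rouge">run_game()</code> to check that the two spellings agree.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rules &lt;- list(
  rock = list(rock = "draw", scissors = "lose", paper = "win"),
  scissors = list(scissors = "draw", rock = "win", paper = "lose"),
  paper = list(paper = "draw", scissors = "win", rock = "lose")
)
pair &lt;- c("rock", "paper")

# A length-2 character vector walks two levels down the tree:
# rules[[pair]] is equivalent to rules[[pair[1]]][[pair[2]]]
rules[[pair]]
#&gt; [1] "win"

identical(rules[[pair]], rules[[pair[1]]][[pair[2]]])
#&gt; [1] TRUE
</code></pre></div></div>

<p>Whether this reads better than the <code class="language-plaintext highlighter-rouge">getElement()</code> pipeline is a matter of taste; recursive indexing is easy to miss when skimming code.</p>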

<p>At this point, we could take a second to ponder how the structure of
several nested if-elses (the actual shape of the code, indenting in and
out and in again) resembles the structure and shape of the
nested list, and ponder further about how the regular, orderly shape of
code could be the whispers of hidden data, saying “<code class="language-plaintext highlighter-rouge">list()</code> me, <code class="language-plaintext highlighter-rouge">list()</code>
me”. Or, we could run the code and see it in action.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">lapply</span><span class="p">(</span><span class="n">run_game</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [[1]]</span><span class="w">
</span><span class="c1">#&gt; [1] "win"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; [[2]]</span><span class="w">
</span><span class="c1">#&gt; [1] "lose"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; [[3]]</span><span class="w">
</span><span class="c1">#&gt; [1] "draw"</span><span class="w">

</span><span class="c1"># Or, to label each outcome with the game it came from</span><span class="w">
</span><span class="n">input</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">stats</span><span class="o">::</span><span class="n">setNames</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">lapply</span><span class="p">(</span><span class="n">run_game</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; $`c("rock", "paper")`</span><span class="w">
</span><span class="c1">#&gt; [1] "win"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; $`c("paper", "rock")`</span><span class="w">
</span><span class="c1">#&gt; [1] "lose"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; $`c("scissors", "scissors")`</span><span class="w">
</span><span class="c1">#&gt; [1] "draw"</span><span class="w">
</span></code></pre></div></div>
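<p>The puzzle ultimately asks for points rather than outcome labels. As I understand the Part 1 rules, each round scores 1, 2, or 3 points for the shape we play (rock, paper, scissors) plus 0, 3, or 6 points for a loss, draw, or win. Assuming those rules, here is a minimal sketch that bolts two more lookup tables onto the same tree. <code class="language-plaintext highlighter-rouge">score_game()</code>, <code class="language-plaintext highlighter-rouge">shape_points</code>, and <code class="language-plaintext highlighter-rouge">outcome_points</code> are hypothetical names, not part of the original solution.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Outcomes from our (player two's) perspective, as in run_game()
rules &lt;- list(
  rock = list(rock = "draw", scissors = "lose", paper = "win"),
  scissors = list(scissors = "draw", rock = "win", paper = "lose"),
  paper = list(paper = "draw", scissors = "win", rock = "lose")
)
input &lt;- list(
  c("rock", "paper"),
  c("paper", "rock"),
  c("scissors", "scissors")
)

# Two more lookup tables: points for the shape we play and for the outcome
shape_points &lt;- c(rock = 1, paper = 2, scissors = 3)
outcome_points &lt;- c(lose = 0, draw = 3, win = 6)

score_game &lt;- function(pair) {
  # Recursive [[ indexing: same as rules[[pair[1]]][[pair[2]]]
  outcome &lt;- rules[[pair]]
  shape_points[[pair[2]]] + outcome_points[[outcome]]
}

scores &lt;- vapply(input, score_game, numeric(1))
scores
#&gt; [1] 8 1 6
sum(scores)
#&gt; [1] 15
</code></pre></div></div>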

<hr />

<p>Earlier in the post, I used <a href="https://adv-r.hadley.nz/subsetting.html#lookup-tables">character
subsetting</a> to
convert letters into moves. This process turned a matching/replacement
problem into a data lookup problem. Scoring Rock Paper Scissors uses the
same trick again: it converts a decision tree into a data lookup problem.</p>
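<p>The trick can be pushed one step further. This tree is only two levels deep with three options per level, so it also fits in a three-by-three matrix whose row and column names are the moves. The sketch below is an alternative, not the approach used above: indexing a matrix that has dimnames with a two-column character matrix looks up one cell per row, so every game is scored in a single vectorized step. <code class="language-plaintext highlighter-rouge">outcomes</code> and <code class="language-plaintext highlighter-rouge">games</code> are illustrative names.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>moves &lt;- c("rock", "paper", "scissors")

# Rows: player one's move. Columns: player two's (our) move.
# Each cell is the outcome from our perspective.
outcomes &lt;- matrix(
  c("draw", "win",  "lose",
    "lose", "draw", "win",
    "win",  "lose", "draw"),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(moves, moves)
)

# One game per row, as (player one, player two)
games &lt;- rbind(
  c("rock", "paper"),
  c("paper", "rock"),
  c("scissors", "scissors")
)

# A two-column character matrix looks up cells by row and column name
outcomes[games]
#&gt; [1] "win"  "lose" "draw"
</code></pre></div></div>

<p>The nested list is more flexible (its branches need not all have the same options), while the matrix trades that flexibility for a vectorized lookup.</p>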

<hr />

<p><em>Last knitted on 2022-12-06. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-12-06-rock-paper-scissors-lists-are-trees.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.2 (2022-10-31 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22621)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-12-06</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package     * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  asciicast     2.3.0   2022-12-05 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#&gt;  cli           3.4.1   2022-09-23 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  curl          4.3.3   2022-10-06 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  evaluate      0.18    2022-11-07 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#&gt;  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r         0.30.1  2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here          1.0.1   2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jsonlite      1.8.3   2022-10-21 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  knitr       * 1.40    2022-08-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  magick        2.7.3   2021-08-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  processx      3.8.0   2022-10-26 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  ps            1.7.2   2022-10-26 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg          1.2.4   2022-10-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  stringr       1.4.1   2022-08-20 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts   1.0.4   2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  V8            4.2.2   2022-11-03 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#&gt;  vctrs         0.5.0   2022-10-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt;  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun          0.34    2022-10-18 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/trist/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.2/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="advent of code" /><summary type="html"><![CDATA[Lists are trees]]></summary></entry><entry><title type="html">Creating a Summoning Salt-style speedrun plot</title><link href="https://tjmahr.github.io/summoning-salt-plot/" rel="alternate" type="text/html" title="Creating a Summoning Salt-style speedrun plot" /><published>2022-05-24T00:00:00-05:00</published><updated>2022-05-24T00:00:00-05:00</updated><id>https://tjmahr.github.io/summoning-salt-plot</id><content type="html" xml:base="https://tjmahr.github.io/summoning-salt-plot/"><![CDATA[<p>A videogame speedrun is a challenge to beat the game as quickly as
possible. It’s time attack racing but for a videogame. There are, in my
mind, two ways to make a run’s time go faster: playing better and more
smoothly (optimizations, better luck) and playing less of the
game (better routing, new glitches/skips). The history of a speedrun
category then is often an exciting mix of evolutionary improvements as
players level up their skills and revolutionary jumps as players find
new ways to cut through the game.</p>

<p><a href="https://www.youtube.com/c/SummoningSalt">Summoning Salt</a> is a YouTube
creator who makes documentaries that trace out the world record
progression in a speedrun. The videos are immensely enjoyable, as Salt
dishes out the history bit by bit, record by record, sometimes in a suspenseful fashion.</p>

<p>As a data visualization person, I’ve noticed that Summoning
Salt recently started to use a new prop in the videos: a step graph of the
world record times. The graph builds up over the course of a video as players
(represented by individual colors) lower the time with new
records (points), until it reveals a full timeline like the
following:</p>

<figure class="" style="max-width: 100%; display: block; margin: 2em auto;"><img src="/assets/images/2022-05-wr-plot-1.png" alt="Screenshot of a timeline from a Summoning Salt video." /><figcaption>
      Screenshot of a timeline from a Summoning Salt video.

    </figcaption></figure>

<p>Let’s recreate this figure in R with ggplot2.</p>

<h2 id="warp-pipe-obtaining-the-data">Warp pipe: Obtaining the data</h2>

<p>The game in question is <em>New Super Mario Bros. Wii</em>, and the record
keeper is the site <a href="https://www.speedrun.com/nsmbw">speedrun.com</a>. The
game has several speedrun categories, and we
want the “Any%” record history (i.e., “any percent”: you don’t have to
play every level, and you can skip parts of the game).</p>

<p>We need to get the leaderboard history data from speedrun.com. There is an
<a href="https://github.com/speedruncomorg/api">official REST API</a> for the
site’s data, but it’s not obvious how to query it for the
data needed for a world record progression. (Apparently, one could
request <a href="https://github.com/speedruncomorg/api/issues/123">the leaderboard on different
dates</a> and work
backwards through time.) But that’s okay, we are not going to use the
API. Instead, the <a href="https://www.speedrun.com/nsmbw/gamestats">statistics page for the
game</a> has a plot that is
tantalizingly close to the one we want to create.</p>

<figure class="" style="max-width: 100%; display: block; margin: 2em auto;"><img src="/assets/images/2022-05-wr-plot-2.png" alt="A timeline figure from speedrun.com." /><figcaption>
      A timeline figure from speedrun.com.

    </figcaption></figure>

<p>This plot is <em>interactive</em>, and our browser is downloading the data and
plotting it for us. If we snoop around the page, we can find the JSON
data behind the plot. In Firefox, when I right-click on the plot and hit
“Inspect”, I see the HTML code that contains the plot. Just below the
plot’s div is a chunk of JavaScript.</p>

<figure class="" style="max-width: 100%; display: block; margin: 2em auto;"><img src="/assets/images/2022-05-firefox-shot2.png" alt="A screenshot of the Firefox inspector showing the speedrun data in a JavaScript script tag." /><figcaption>
      A screenshot of the Firefox inspector showing the speedrun data in a JavaScript script tag.

    </figcaption></figure>

<p>The first line of that script contains all of the speedrun data being plotted. We
save that JSON into <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/data/2022-05-23-nsmbw-runs.json">its own file</a>.</p>

<h2 id="ground-pound-filtering-and-cleaning-the-data">Ground pound: Filtering and cleaning the data</h2>

<p>Let’s read the data into R. JSON is short for “JavaScript Object
Notation”, and it’s basically the equivalent of a <code class="language-plaintext highlighter-rouge">list()</code> in R. Hence,
<a href="https://rdrr.io/pkg/jsonlite/man/read_json.html">jsonlite</a> provides a large, deeply nested list for us.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">

</span><span class="c1"># a helper function to build the URL of the data on GitHub</span><span class="w">
</span><span class="c1"># in case you want to play along</span><span class="w">
</span><span class="n">path_blog_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">file.path</span><span class="p">(</span><span class="w">
    </span><span class="s2">"https://raw.githubusercontent.com"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"tjmahr/tjmahr.github.io/master/_R/data"</span><span class="p">,</span><span class="w">
    </span><span class="n">x</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">json_runs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"2022-05-23-nsmbw-runs.json"</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">read_json</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
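<p>To see how JSON maps onto R lists, here is a toy example (the JSON string is made up, only loosely shaped like the speedrun data). <code class="language-plaintext highlighter-rouge">jsonlite::parse_json()</code> is the string-based sibling of <code class="language-plaintext highlighter-rouge">read_json()</code>, and both leave everything as lists by default, whereas <code class="language-plaintext highlighter-rouge">jsonlite::fromJSON()</code> tries to simplify an array of records into a dataframe.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A tiny, made-up JSON document
json &lt;- '{"label": "Any% - Physical", "data": [{"x": 1, "y": 2}, {"x": 3, "y": 4}]}'

# parse_json(), like read_json(), does not simplify by default:
# JSON objects become named lists and JSON arrays become unnamed lists
parsed &lt;- jsonlite::parse_json(json)
str(parsed, max.level = 2)

# fromJSON() simplifies by default, so an array of records becomes a dataframe
simplified &lt;- jsonlite::fromJSON(json)
simplified$data
#&gt;   x y
#&gt; 1 1 2
#&gt; 2 3 4
</code></pre></div></div>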

<p>The plot on the statistics page has a dropdown menu for different
kinds of records to display, so this JSON object has a sublist for each
dropdown menu choice. We want the first sublist (full game runs),
then its first sublist (with a <code class="language-plaintext highlighter-rouge">label</code> of <code class="language-plaintext highlighter-rouge">"Any% - Physical"</code>), then that sublist’s
<code class="language-plaintext highlighter-rouge">"data"</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Dropdown menu choices</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">json_runs</span><span class="p">,</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 10</span><span class="w">
</span><span class="c1">#&gt;  $ 0   :List of 7</span><span class="w">
</span><span class="c1">#&gt;  $ 6789:List of 18</span><span class="w">
</span><span class="c1">#&gt;  $ 6805:List of 18</span><span class="w">
</span><span class="c1">#&gt;  $ 6815:List of 18</span><span class="w">
</span><span class="c1">#&gt;  $ 6826:List of 19</span><span class="w">
</span><span class="c1">#&gt;  $ 6841:List of 18</span><span class="w">
</span><span class="c1">#&gt;  $ 6846:List of 20</span><span class="w">
</span><span class="c1">#&gt;  $ 6859:List of 19</span><span class="w">
</span><span class="c1">#&gt;  $ 6868:List of 22</span><span class="w">
</span><span class="c1">#&gt;  $ 6882:List of 18</span><span class="w">

</span><span class="c1"># Full game run histories</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">json_runs</span><span class="p">[[</span><span class="m">1</span><span class="p">]],</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 7</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "Any% - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 30</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#EE4444"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#EE4444"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#EE4444"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi FALSE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "Cannonless - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 25</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#EF8241"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#EF8241"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#EF8241"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi FALSE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "100% - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 17</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#F0C03E"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#F0C03E"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#F0C03E"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi FALSE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "Any% No W5 - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 22</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#8AC951"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#8AC951"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#8AC951"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "Low% - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 18</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#09B876"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#09B876"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#09B876"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "Any% Multiplayer - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 11</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#44BBEE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#44BBEE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#44BBEE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ label                    : chr "All Regular Exits - Physical"</span><span class="w">
</span><span class="c1">#&gt;   ..$ data                     :List of 7</span><span class="w">
</span><span class="c1">#&gt;   ..$ borderColor              : chr "#6666EE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointBorderColor         : chr "#6666EE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ pointHoverBackgroundColor: chr "#6666EE"</span><span class="w">
</span><span class="c1">#&gt;   ..$ hidden                   : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;   ..$ steppedLine              : logi TRUE</span><span class="w">

</span><span class="c1"># Just want the data field from the first one</span><span class="w">
</span><span class="n">json_any_percent</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">json_runs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][[</span><span class="m">1</span><span class="p">]][[</span><span class="s2">"data"</span><span class="p">]]</span><span class="w">
</span></code></pre></div></div>
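<p>Indexing by position works, but it would quietly grab the wrong category if the dropdown menu were ever reordered. Here is a slightly more defensive sketch that matches on the <code class="language-plaintext highlighter-rouge">label</code> field instead. It runs on a small made-up list (<code class="language-plaintext highlighter-rouge">full_game</code>) shaped like <code class="language-plaintext highlighter-rouge">json_runs[[1]]</code>, not on the real data. For the positional version, <code class="language-plaintext highlighter-rouge">purrr::pluck(json_runs, 1, 1, "data")</code> is an equivalent tidyverse spelling of the chained <code class="language-plaintext highlighter-rouge">[[</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A made-up stand-in for json_runs[[1]]: an unnamed list of categories
full_game &lt;- list(
  list(label = "Any% - Physical", data = list(1, 2)),
  list(label = "Cannonless - Physical", data = list(3))
)

# Find the category by its label instead of trusting its position
target &lt;- "Any% - Physical"
labels &lt;- vapply(full_game, function(x) x$label, character(1))
pos &lt;- match(target, labels)
if (is.na(pos)) stop("No category labelled ", target)

any_percent &lt;- full_game[[pos]][["data"]]
length(any_percent)
#&gt; [1] 2
</code></pre></div></div>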

<p>Here are the first two points’ worth of data. We have a not-so-obviously
encoded date (<code class="language-plaintext highlighter-rouge">x</code>), the run length in seconds (<code class="language-plaintext highlighter-rouge">y</code>), and the <code class="language-plaintext highlighter-rouge">players</code>. We
are going to convert each of these lists into a dataframe and bind them
together.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">json_any_percent</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> </span><span class="n">str</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; List of 2</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 4</span><span class="w">
</span><span class="c1">#&gt;   ..$ x      : int 1306670400</span><span class="w">
</span><span class="c1">#&gt;   ..$ y      : int 1616</span><span class="w">
</span><span class="c1">#&gt;   ..$ players:List of 1</span><span class="w">
</span><span class="c1">#&gt;   .. ..$ : chr "RaikerZ"</span><span class="w">
</span><span class="c1">#&gt;   ..$ link   : chr "/nsmbw/run/2216987"</span><span class="w">
</span><span class="c1">#&gt;  $ :List of 4</span><span class="w">
</span><span class="c1">#&gt;   ..$ x      : int 1325246400</span><span class="w">
</span><span class="c1">#&gt;   ..$ y      : int 1549</span><span class="w">
</span><span class="c1">#&gt;   ..$ players:List of 1</span><span class="w">
</span><span class="c1">#&gt;   .. ..$ : chr "RaikerZ"</span><span class="w">
</span><span class="c1">#&gt;   ..$ link   : chr "/nsmbw/run/2216995"</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">json_any_percent</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">lapply</span><span class="p">(</span><span class="w">
    </span><span class="c1"># turn one list into a dataframe</span><span class="w">
    </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> 
      </span><span class="n">tibble</span><span class="p">(</span><span class="w">
        </span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">x</span><span class="p">,</span><span class="w"> 
        </span><span class="n">run_time_s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">y</span><span class="p">,</span><span class="w"> 
        </span><span class="n">player</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">players</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w">
      </span><span class="p">)</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">bind_rows</span><span class="p">()</span><span class="w">

</span><span class="n">data</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 30 × 3</span><span class="w">
</span><span class="c1">#&gt;          date run_time_s player       </span><span class="w">
</span><span class="c1">#&gt;         &lt;int&gt;      &lt;dbl&gt; &lt;chr&gt;        </span><span class="w">
</span><span class="c1">#&gt;  1 1306670400       1616 RaikerZ      </span><span class="w">
</span><span class="c1">#&gt;  2 1325246400       1549 RaikerZ      </span><span class="w">
</span><span class="c1">#&gt;  3 1332763200       1531 RaikerZ      </span><span class="w">
</span><span class="c1">#&gt;  4 1349870400       1527 RaikerZ      </span><span class="w">
</span><span class="c1">#&gt;  5 1457179200       1526 GreenUprooter</span><span class="w">
</span><span class="c1">#&gt;  6 1461585600       1523 Auchgard     </span><span class="w">
</span><span class="c1">#&gt;  7 1461672000       1522 Auchgard     </span><span class="w">
</span><span class="c1">#&gt;  8 1461758400       1519 Auchgard     </span><span class="w">
</span><span class="c1">#&gt;  9 1470744000       1514 Auchgard     </span><span class="w">
</span><span class="c1">#&gt; 10 1471521600       1512 Auchgard     </span><span class="w">
</span><span class="c1">#&gt; # … with 20 more rows</span><span class="w">
</span></code></pre></div></div>
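<p>The <code class="language-plaintext highlighter-rouge">lapply()</code> plus <code class="language-plaintext highlighter-rouge">bind_rows()</code> approach builds one small tibble per record. As an alternative sketch (not the approach used in this post), base R’s <code class="language-plaintext highlighter-rouge">vapply()</code> can pull out each field as a whole column. The <code class="language-plaintext highlighter-rouge">records</code> list below is a hand-typed stand-in for <code class="language-plaintext highlighter-rouge">json_any_percent</code>, copied from the two records printed above.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Stand-in for json_any_percent: the first two records shown above
records &lt;- list(
  list(x = 1306670400, y = 1616, players = list("RaikerZ"), link = "/nsmbw/run/2216987"),
  list(x = 1325246400, y = 1549, players = list("RaikerZ"), link = "/nsmbw/run/2216995")
)

# Extract one field from every record at once; vapply() checks each value's type
records_df &lt;- data.frame(
  date = vapply(records, function(r) as.numeric(r$x), numeric(1)),
  run_time_s = vapply(records, function(r) as.numeric(r$y), numeric(1)),
  player = vapply(records, function(r) r$players[[1]], character(1))
)

records_df
</code></pre></div></div>

<p>Building each column in a single call avoids creating and binding many one-row data frames. For 30 records, either approach is fine.</p>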

<p>Lastly, we need to do something about those dates. When you see a
date-time represented by a single large number, it’s probably a
<a href="https://rdrr.io/r/base/as.POSIXlt.html">POSIX</a> date representing the date-time as the number of
seconds since some origin date-time (see also <a href="https://en.wikipedia.org/wiki/Unix_time">Unix
Time</a>). Using the default Unix
origin time seems to give the correct date conversion:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">date_posix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.POSIXct</span><span class="p">(</span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">tz</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"UTC"</span><span class="p">,</span><span class="w"> </span><span class="n">origin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"1970-01-01"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> 

</span><span class="n">data</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 30 × 4</span><span class="w">
</span><span class="c1">#&gt;          date run_time_s player        date_posix         </span><span class="w">
</span><span class="c1">#&gt;         &lt;int&gt;      &lt;dbl&gt; &lt;chr&gt;         &lt;dttm&gt;             </span><span class="w">
</span><span class="c1">#&gt;  1 1306670400       1616 RaikerZ       2011-05-29 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  2 1325246400       1549 RaikerZ       2011-12-30 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  3 1332763200       1531 RaikerZ       2012-03-26 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  4 1349870400       1527 RaikerZ       2012-10-10 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  5 1457179200       1526 GreenUprooter 2016-03-05 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  6 1461585600       1523 Auchgard      2016-04-25 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  7 1461672000       1522 Auchgard      2016-04-26 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  8 1461758400       1519 Auchgard      2016-04-27 12:00:00</span><span class="w">
</span><span class="c1">#&gt;  9 1470744000       1514 Auchgard      2016-08-09 12:00:00</span><span class="w">
</span><span class="c1">#&gt; 10 1471521600       1512 Auchgard      2016-08-18 12:00:00</span><span class="w">
</span><span class="c1">#&gt; # … with 20 more rows</span><span class="w">
</span></code></pre></div></div>
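<p>As a quick sanity check on the conversion: Unix time counts seconds, and a day has 86,400 seconds, so dividing by 86,400 gives days since 1970-01-01. Here is a small sketch using the first timestamp from the table:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>first_x &lt;- 1306670400

# 15123.5 days after the origin: day 15123 plus half a day
first_x / 86400

# The whole-day part as a Date, and the full date-time in UTC
as.Date(floor(first_x / 86400), origin = "1970-01-01")
as.POSIXct(first_x, tz = "UTC", origin = "1970-01-01")
</code></pre></div></div>

<p>The half-day remainder matches the 12:00:00 time in the first converted row.</p>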

<h2 id="triple-jump-plotting">Triple jump: Plotting</h2>

<p>First, let’s get the data on the panel. I could spend an endless amount
of time tweaking or customizing a plot’s theme, so I do the styling
last. Otherwise, styling would fill up all of the time I’ve set aside to
work on the plot.</p>

<p>We want to draw a point for each record-setting event, and we want to
draw a line that connects all of the points.
<a href="https://rdrr.io/pkg/ggplot2/man/geom_path.html"><code class="language-plaintext highlighter-rouge">geom_step()</code></a> draws a line plot that only moves
straight up/down or straight left/right (no diagonal lines), which is
what we want because a record stands until it is broken. We also want the color of these geometries to change
with the record holder (<code class="language-plaintext highlighter-rouge">player</code>).</p>
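<p>To make the “no diagonal lines” idea concrete, here is a sketch of the path a step line traces under <code class="language-plaintext highlighter-rouge">geom_step()</code>’s default <code class="language-plaintext highlighter-rouge">direction = "hv"</code> (horizontal first, then vertical). The helper <code class="language-plaintext highlighter-rouge">stairstep_hv()</code> is hypothetical and written only for illustration. It is not a ggplot2 function.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: expand points into an "hv" staircase path
stairstep_hv &lt;- function(x, y) {
  n &lt;- length(x)
  # Every x after the first is visited twice: at the old y, then at the new y
  xs &lt;- c(x[1], rep(x[-1], each = 2))
  # Every y before the last is held across the following horizontal segment
  ys &lt;- c(rep(y[-n], each = 2), y[n])
  data.frame(x = xs, y = ys)
}

# Three records: each drop in time is a vertical jump at the new record's date
stairstep_hv(x = c(1, 2, 4), y = c(10, 8, 7))
</code></pre></div></div>

<p>Each consecutive pair of path points differs in only one coordinate, so every segment is either horizontal or vertical.</p>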

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/plot-oops-1.png" title="A step plot with one line per player. It is not what we want." alt="A step plot with one line per player. It is not what we want." width="80%" style="display: block; margin: auto;" /></p>

<p>Oops! Because <code class="language-plaintext highlighter-rouge">color</code> is mapped to a discrete variable, ggplot2 grouped the data by player and connected the dots separately for
each color. We have to set the <code class="language-plaintext highlighter-rouge">group</code> aesthetic to a constant value so
there is only one line drawn.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/plot-grouped-correctly-1.png" title="A step plot showing the world record progression. There is a single line and it changes color whenever a new record-holder takes over." alt="A step plot showing the world record progression. There is a single line and it changes color whenever a new record-holder takes over." width="80%" style="display: block; margin: auto;" /></p>
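<p>One way to see the grouping behind the first plot is to inspect the data ggplot2 computes for the step layer with <code class="language-plaintext highlighter-rouge">layer_data()</code>. The <code class="language-plaintext highlighter-rouge">toy</code> data frame below is made up for illustration. A discrete <code class="language-plaintext highlighter-rouge">color</code> mapping implicitly creates one group per player, while <code class="language-plaintext highlighter-rouge">group = 1</code> puts every row into a single group.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Made-up record progression with three record holders
toy &lt;- data.frame(
  date = 1:6,
  run_time_s = c(60, 58, 57, 55, 54, 50),
  player = c("A", "A", "B", "B", "C", "C")
)

p_split &lt;- ggplot(toy) +
  aes(x = date, y = run_time_s, color = player) +
  geom_step()

p_single &lt;- ggplot(toy) +
  aes(x = date, y = run_time_s, color = player) +
  geom_step(aes(group = 1))

# One group per player: three separate lines
unique(layer_data(p_split)$group)
# A single group: one continuous line
unique(layer_data(p_single)$group)
</code></pre></div></div>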

<p>Making the Summoning Salt version is just a matter of theming at this
point. We use <a href="https://rdrr.io/pkg/ggplot2/man/ggtheme.html"><code class="language-plaintext highlighter-rouge">theme_void()</code></a> to completely wipe out
the current theme, and we hide the color legend.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme_void</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-1-1.png" title="A step plot showing the world record progression." alt="A step plot showing the world record progression." width="80%" style="display: block; margin: auto;" /></p>

<p>Next, we are going to use the showtext package to obtain an 8-bit font:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">showtext</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Loading required package: sysfonts</span><span class="w">
</span><span class="c1">#&gt; Loading required package: showtextdb</span><span class="w">
</span><span class="n">font_add_google</span><span class="p">(</span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w">
</span><span class="n">showtext_auto</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>The void theme provides nothing, so we have to specify the main colors,
the axis lines, and the plotting margin. We also crank up the chroma
(<code class="language-plaintext highlighter-rouge">c</code>) of the color scale so that the colors are more intense against the black background.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">scale_color_discrete</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World Record Timeline"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_void</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme</span><span class="p">(</span><span class="w">
    </span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w"> 
    </span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w">
    </span><span class="n">axis.line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="w">
      </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> 
      </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> 
      </span><span class="c1"># more 8-bit looking lines</span><span class="w">
      </span><span class="n">lineend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"square"</span><span class="w">
    </span><span class="p">),</span><span class="w"> 
    </span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> 
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-2-1.png" title="A step plot showing the world record progression. There is a black background now and an 8-bit looking font." alt="A step plot showing the world record progression. There is a black background now and an 8-bit looking font." width="80%" style="display: block; margin: auto;" /></p>
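<p>The <code class="language-plaintext highlighter-rouge">c</code> argument is the chroma of an HCL hue palette. Assuming that <code class="language-plaintext highlighter-rouge">scale_color_discrete()</code> hands <code class="language-plaintext highlighter-rouge">c</code> off to <code class="language-plaintext highlighter-rouge">scale_colour_hue()</code>, the colors come from a palette like <code class="language-plaintext highlighter-rouge">scales::hue_pal()</code>. Here is a sketch comparing the default chroma of 100 with 255:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(scales)

# Three evenly spaced hues at the default chroma (100) and at chroma 255
default_cols &lt;- hue_pal()(3)
vivid_cols &lt;- hue_pal(c = 255)(3)

default_cols
vivid_cols

# Preview both palettes as color swatches
show_col(c(default_cols, vivid_cols))
</code></pre></div></div>

<p>A chroma of 255 is beyond what RGB can display for many hues, so those colors are likely clipped to the nearest valid RGB value. They still come out noticeably more saturated than the defaults.</p>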

<p>To keep overlapping points from looking like blobs, we can use a filled
point. For these, <code class="language-plaintext highlighter-rouge">color</code> is used on the border and <code class="language-plaintext highlighter-rouge">fill</code> is used on
the inside. We will set the outline of the points to black and the fill
to the player color. (If you look at more professional data
visualizations, you see this trick frequently with white bordering
around points.) With a new fill aesthetic in place, we have to make sure
that the guide for the fill doesn’t appear and that fill and color share the
same color scale.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">),</span><span class="w">
    </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">21</span><span class="p">,</span><span class="w">
    </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="c1"># no legend for color or fill</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="c1"># fill and color get same scale</span><span class="w">
  </span><span class="n">scale_color_discrete</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">255</span><span class="p">,</span><span class="w"> </span><span class="n">aesthetics</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"color"</span><span class="p">,</span><span class="w"> </span><span class="s2">"fill"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World Record Timeline"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_void</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme</span><span class="p">(</span><span class="w">
    </span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w"> 
    </span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w">
    </span><span class="n">axis.line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="w">
      </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> 
      </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> 
      </span><span class="n">lineend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"square"</span><span class="w">
    </span><span class="p">),</span><span class="w"> 
    </span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> 
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-3-1.png" title="A step plot showing the world record progression. The points have been restyled to have a black outline." alt="A step plot showing the world record progression. The points have been restyled to have a black outline." width="80%" style="display: block; margin: auto;" /></p>
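<p>To check what the shared scale buys us, we can compare the computed colors with <code class="language-plaintext highlighter-rouge">layer_data()</code>. This sketch uses made-up toy data and calls <code class="language-plaintext highlighter-rouge">scale_color_hue()</code> directly. With <code class="language-plaintext highlighter-rouge">aesthetics = c("colour", "fill")</code>, each point’s fill matches its border color. Without it, fill falls back to the default fill scale at ordinary chroma.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Made-up data: one point per player
toy &lt;- data.frame(x = 1:3, y = 3:1, player = c("A", "B", "C"))

# One scale controls both aesthetics
p_shared &lt;- ggplot(toy) +
  aes(x = x, y = y, color = player, fill = player) +
  geom_point(shape = 21) +
  scale_color_hue(c = 255, aesthetics = c("colour", "fill"))

# Only color gets the high-chroma scale; fill uses its default scale
p_separate &lt;- ggplot(toy) +
  aes(x = x, y = y, color = player, fill = player) +
  geom_point(shape = 21) +
  scale_color_hue(c = 255)

shared &lt;- layer_data(p_shared)
separate &lt;- layer_data(p_separate)

shared[c("colour", "fill")]
separate[c("colour", "fill")]
</code></pre></div></div>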

<p>Finally, let’s make another version of this figure. How might we make a
more accessible presentation of this information (of who held a record
and when), assuming that we only have a static image? A legend with
players/colors is a nonstarter. We could give each player their own
distinct point shape so that color/shape encode the same information,
but shapes get rough once you have to use more than four of them. We
could use a player’s first letter instead of a point (show an F for
FadeVanity) but the letters quickly overlap.</p>

<p>One idea would be to label the point with an annotation whenever there
is a new record holder.</p>
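<p>The “era” bookkeeping in the next block is a run-length encoding of the record holders: an era starts whenever the player differs from the player in the previous row. Here is a base-R sketch of the same idea on a made-up sequence of players, with <code class="language-plaintext highlighter-rouge">rle()</code> as a cross-check:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Made-up sequence of record holders, in date order
holders &lt;- c("A", "A", "B", "B", "B", "A", "C")

# TRUE wherever the holder differs from the previous row (the first row always starts an era)
change &lt;- c(TRUE, holders[-1] != holders[-length(holders)])
era &lt;- cumsum(change)
era

# rle() finds the same runs: one run per era
runs &lt;- rle(holders)
runs$values
runs$lengths
</code></pre></div></div>

<p>A player who reclaims the record (like A above) starts a new era, so each takeover gets its own label.</p>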

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">showtext_auto</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|&gt;</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="c1"># Remove the country flag annotation from this player</span><span class="w">
    </span><span class="n">player2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
      </span><span class="n">player</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"[gb/eng]FadeVanity"</span><span class="p">,</span><span class="w"> 
      </span><span class="s2">"FadeVanity"</span><span class="p">,</span><span class="w"> 
      </span><span class="n">player</span><span class="w">
    </span><span class="p">),</span><span class="w">
    </span><span class="c1"># Start a new "era" whenever the record holder changes</span><span class="w">
    </span><span class="n">change</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">lag</span><span class="p">(</span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nf">is.na</span><span class="p">(</span><span class="n">lag</span><span class="p">(</span><span class="n">player</span><span class="p">)),</span><span class="w">
    </span><span class="n">era</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">cumsum</span><span class="p">(</span><span class="n">change</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> 

</span><span class="c1"># I am going to hardcode some vertical position adjustments for the labels.</span><span class="w">
</span><span class="n">offsets</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-4</span><span class="p">,</span><span class="w"> </span><span class="m">-3</span><span class="p">,</span><span class="w"> </span><span class="m">-2</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">)</span><span class="w">

</span><span class="n">data_lab</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">era</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="c1"># Label the last point in an era</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">run_time_s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">run_time_s</span><span class="p">))</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">offsets</span><span class="p">)</span><span class="w">

</span><span class="n">nudge_factor</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">30</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_text</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="w">
      </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player2</span><span class="p">,</span><span class="w">
      </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">nudge_factor</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">offset</span><span class="w"> 
    </span><span class="p">),</span><span class="w">
    </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
    </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_lab</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_segment</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="w">
      </span><span class="c1"># i.e., run the line up to .95 of the label's nudging</span><span class="w">
      </span><span class="n">yend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">nudge_factor</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">.95</span><span class="p">,</span><span class="w"> 
      </span><span class="n">xend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="w">
    </span><span class="p">),</span><span class="w">     
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_lab</span><span class="p">,</span><span class="w"> 
    </span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dashed"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="c1"># yes, I'm adding forty million seconds to the last datetime</span><span class="w">
  </span><span class="n">expand_limits</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">date_posix</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">4e7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_x_datetime</span><span class="p">(</span><span class="w">
    </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">,</span><span class="w">
    </span><span class="n">date_breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2 years"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">date_labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"%Y"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="w">
    </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World record"</span><span class="p">,</span><span class="w">
    </span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">21</span><span class="o">:</span><span class="m">27</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">60</span><span class="p">,</span><span class="w">
    </span><span class="c1"># Show the minutes value with zero-padded seconds</span><span class="w">
    </span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"%d:%02.f"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%/%</span><span class="w"> </span><span class="m">60</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="m">60</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme_minimal</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">14</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme</span><span class="p">(</span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-05-24-summoning-salt-plot/informative-plot-1.png" title="A step plot showing the world record progression. The name of the player is next to their point whenever the record changes." alt="A step plot showing the world record progression. The name of the player is next to their point whenever the record changes." width="80%" style="display: block; margin: auto;" /></p>
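<p>Two small tricks in the code above may be worth pulling out on their own. The <code>era</code> column is a run-length id: flag every row where the record holder changes, then take a cumulative sum so that each unbroken run gets its own number. The <code>labels</code> function turns seconds into minutes and zero-padded seconds. Here is a minimal sketch of both on made-up inputs; the <code>records</code> tibble and the <code>format_mmss()</code> name are hypothetical and not part of the original analysis.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(dplyr)

# Hypothetical record holders, in chronological order
records &lt;- tibble(player = c("A", "A", "B", "B", "B", "A", "C"))

records &lt;- records |&gt;
  mutate(
    # TRUE when the holder differs from the previous row; the is.na() check
    # makes the first row (which has no previous row) TRUE as well
    change = player != lag(player) | is.na(lag(player)),
    # A running count of changes gives each unbroken run its own id
    era = cumsum(change)
  )

records$era
#&gt; [1] 1 1 2 2 2 3 4

# The y-axis labeller as a named (hypothetical) function: seconds as m:ss
format_mmss &lt;- function(x) sprintf("%d:%02.f", x %/% 60, x %% 60)
format_mmss(c(1260, 1275, 1322))
#&gt; [1] "21:00" "21:15" "22:02"
</code></pre></div></div>

<p>Note that player "A" gets two different eras because their two runs are separated by player "B". Newer versions of dplyr (1.1.0 and later) also provide <code>consecutive_id()</code>, which should produce the same run ids in one call; the <code>lag()</code> plus <code>cumsum()</code> version also works with older releases.</p>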

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-05-24-summoning-salt-plot.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package     * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  broom         0.8.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  curl          4.3.2   2021-06-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  downlit       0.4.0   2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  farver        2.1.0   2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r         0.30.1  2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here          1.0.1   2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  httr          1.4.3   2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr       * 1.39    2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  labeling      0.4.2   2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg          1.2.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  showtext    * 0.9-5   2022-02-09 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sysfonts    * 0.8.8   2022-03-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts   1.0.4   2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="ggplot2" /><summary type="html"><![CDATA[*Cool 8-bit music plays over a montage of me editing R code*]]></summary></entry><entry><title type="html">The cursed Morgan Stanley Covid-19 visualization</title><link href="https://tjmahr.github.io/morgan-stanley-cursed-covid-plot/" rel="alternate" type="text/html" title="The cursed Morgan Stanley Covid-19 visualization" /><published>2022-03-23T00:00:00-05:00</published><updated>2022-03-23T00:00:00-05:00</updated><id>https://tjmahr.github.io/morgan-stanley-cursed-covid-plot</id><content type="html" xml:base="https://tjmahr.github.io/morgan-stanley-cursed-covid-plot/"><![CDATA[<p>Darren Dahly, username <a href="https://twitter.com/statsepi">@statsepi</a>, asked
people on Twitter to share some of their favorite or least favorite data
visualizations from the pandemic. I nominated the notorious <a href="https://twitter.com/WhiteHouseCEA45/status/1257680258364555264">“cubic fit”
‘forecast’</a>
from the Council of Economic Advisers. But then there was the reply
by Travis Whitfill, username
<a href="https://twitter.com/twhitfill">@twhitfill</a>, showing a nightmare of a
figure from a report produced by Morgan Stanley:</p>

<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">I’d like to submit this one from Morgan Stanley 🤦🏻‍♂️ <a href="https://t.co/D5CYi6zSrT">pic.twitter.com/D5CYi6zSrT</a></p>
  <img src="/assets/images/2022-03-morgan-stanley.jpg" alt="A two panel plot showing the current number of Covid-19 patients in ICU beds in 'closed' versus 'open' states." />
  <br />
&mdash; Travis Whitfill MPH (@twhitfill) <a href="https://twitter.com/twhitfill/status/1505974833217437696?ref_src=twsrc%5Etfw">March 21, 2022</a></blockquote>

<p>The main statistical problem here is the completely inappropriate
“smoothing” line. The panel on the left is really two linear trends: a
steady trend around 8,500 patients until May 6th and a decreasing trend
starting from 11,000 patients on May 7th. Upon seeing data like these,
I would be inclined to ask, “What changed in the data? Was a new
state added to the dataset? Did the definition of what counts as an ICU
bed change?” The analysts here instead imposed a single linear trend on
all of the points.</p>
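<p>To see how a jump can fool a single straight line, here is a minimal sketch on made-up, noise-free numbers that only loosely mimic the pattern described above (they are not the data behind the cursed plot). One <code>lm()</code> fit runs a single line through every point; a second fit, using a hypothetical <code>regime</code> indicator for before versus after the jump, gives each segment its own intercept and slope.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Made-up, noise-free numbers: flat at 8,500 for a week, then a jump to
# 11,000 followed by a steady decline of 200 per day.
# These are NOT the Morgan Stanley or COVID Tracking Project numbers.
toy &lt;- data.frame(day = 1:14)
toy$regime &lt;- factor(
  ifelse(toy$day &lt;= 7, "before", "after"),
  levels = c("before", "after")
)
toy$patients &lt;- ifelse(toy$day &lt;= 7, 8500, 11000 - 200 * (toy$day - 8))

# One straight line through every point
single_fit &lt;- lm(patients ~ day, data = toy)

# regime / day expands to regime + regime:day, i.e., a separate
# intercept and a separate slope within each regime
split_fit &lt;- lm(patients ~ regime / day, data = toy)

coef(single_fit)[["day"]]
#&gt; [1] 180

# Within-regime slopes: about 0 before the jump and -200 after it
coef(split_fit)[c("regimebefore:day", "regimeafter:day")]
</code></pre></div></div>

<p>Neither segment increases, yet the single line slopes upward at 180 patients per day, simply because the jump sits between the two halves of the data.</p>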

<p>Another problem with this plot is rhetorical: it’s tryhard
counterintuitive bullshit. I think analysts tend to fetishize surprising or
counterintuitive findings, with an attitude of “oh, you would think that
such-and-such is true, but the data show us that <em>actually</em> the opposite
is true.” At the time of this plot, our belief was something like
“Covid-19 protections like stay-at-home orders can help flatten the curve
and reduce the spread of the disease and the number of
hospitalizations.” This plot sashays into the room and tells us, “Well,
according to the data, it’s the states without Covid-19 protections that have
decreasing numbers of ICU patients, and get this: Covid lockdowns make things
worse!” Granted, I could not find the original report for this
image, so I don’t know how the authors interpreted it in the report’s
narrative. Still, I can only assume the authors added these linear trend
lines (overriding the default GAM or LOESS smooth used by
<a href="https://rdrr.io/pkg/ggplot2/man/geom_smooth.html"><code class="language-plaintext highlighter-rouge">stat_smooth()</code></a>) to make this particular point.</p>

<p>When I first saw it, this plot <a href="https://twitter.com/tjmahr/status/1506019955661234184">made me
quip</a>: “I hate
statistics now. it’s been a good run. gonna live my days out as a
druid”. But it’s been a few days, and I’m still haunted by this plot.
What did go wrong? Why do the ICU counts shoot upwards like that? So, I
investigated it.</p>

<h2 id="attempt-1-there-is-no-jump">Attempt 1: There is no jump</h2>

<p>I tried to find the original report, searching Google and Twitter for a
report with this image from around May 12, 2020 (when @twhitfill <a href="https://twitter.com/twhitfill/status/1263119423847661569">first
shared it</a>),
but nothing came up. After dredging through a bunch of Morgan Stanley
report PDFs, I noticed that the reports usually had a small number of
authors, so I am wondering whether (and hoping that) the original report
was something more akin to a dashed-off newsletter than a research
report.</p>

<p>Failing to find the original image, I tried to recreate it in R. The
original image credits The COVID Tracking Project, and <a href="https://covidtracking.com/data/download">their downloads
page</a> provides a .csv file with
state-level data. Here we read in just the relevant columns, filter down
to the time range of the cursed image, and plot the total number of
current ICU patients.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">

</span><span class="c1"># a helper function that builds the URL of a data file on GitHub,</span><span class="w">
</span><span class="c1"># in case you want to play along</span><span class="w">
</span><span class="n">path_blog_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">file.path</span><span class="p">(</span><span class="w">
    </span><span class="s2">"https://raw.githubusercontent.com"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"tjmahr/tjmahr.github.io/master/_R/data"</span><span class="p">,</span><span class="w">
    </span><span class="n">x</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
  </span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"all-states-history.csv"</span><span class="p">),</span><span class="w"> 
  </span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(),</span><span class="w"> 
    </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w"> 
    </span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w"> 
    </span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
  </span><span class="p">),</span><span class="w"> 
  </span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="w">
    </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (March 23, 2022)"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Warning: Removed 454 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/most-recent-totals-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." width="80%" style="display: block; margin: auto;" /></p>

<p>There is no jump in ICU patients ❌. Because the jump disappeared
when we used a more recent (and presumably better) version of the
dataset, it was probably some kind of artifact.</p>
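<p>One of the questions raised earlier (was a new state added to the dataset?) can be checked by tracking how many states contribute to each daily total. Here is a minimal sketch, assuming the column names of the COVID Tracking Project file read in above (<code>date</code>, <code>state</code>, <code>inIcuCurrently</code>); the <code>icu_totals_by_date()</code> helper and the toy states are hypothetical.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(dplyr)

# Sketch: per-date ICU total alongside the number of states that reported a
# value. A jump in the total that lines up with a jump in n_reporting points
# to a change in coverage rather than a change in patients.
icu_totals_by_date &lt;- function(data) {
  data %&gt;%
    group_by(date) %&gt;%
    summarise(
      total_icu = sum(inIcuCurrently, na.rm = TRUE),
      n_reporting = sum(!is.na(inIcuCurrently)),
      .groups = "drop"
    )
}

# Hypothetical states and counts: state "CC" starts reporting on day two
toy &lt;- tibble(
  date = rep(as.Date(c("2020-05-06", "2020-05-07")), each = 3),
  state = rep(c("AA", "BB", "CC"), times = 2),
  inIcuCurrently = c(100, 200, NA, 95, 190, 2500)
)
icu_totals_by_date(toy)
</code></pre></div></div>

<p>In the toy input, the total jumps from 300 to 2,785 even though the two existing states both decline, only because a third state starts reporting. Calling <code>icu_totals_by_date(data)</code> on the filtered data from above would show whether the number of reporting states changes anywhere in the window.</p>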

<p>Out of curiosity, let’s look at the state-by-state data. Because
(<em>spoiler alert</em>) about half the states only have <code class="language-plaintext highlighter-rouge">NA</code> values for this
time period, we will filter out the <code class="language-plaintext highlighter-rouge">NA</code> points and look at the
remaining points.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"state"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Project (March 23, 2022)"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">   
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/most-recent-state-1.png" title="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 30)." alt="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 30)." width="80%" style="display: block; margin: auto;" /></p>

<p>So, some states have ICU patient data added midway through this window, and
many states are missing data for the entire window. The whole
open-versus-closed-states question was doomed from the get-go because we
don’t know what happened in every state.</p>
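<p>A quick way to catch this problem before plotting a total is to count how
many states report a value on each date. Below is a minimal base R sketch. The
helper <code class="language-plaintext highlighter-rouge">count_reporting()</code> and the toy data frame are hypothetical,
made up for illustration, but the columns mirror the <code class="language-plaintext highlighter-rouge">date</code>,
<code class="language-plaintext highlighter-rouge">state</code>, and <code class="language-plaintext highlighter-rouge">inIcuCurrently</code> columns used above.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: how many states have a non-missing ICU count on each date?
count_reporting &lt;- function(data) {
  reporting &lt;- !is.na(data$inIcuCurrently)
  tapply(reporting, data$date, sum)
}

# Made-up toy data (not real values): state "C" only starts reporting on May 7
toy &lt;- data.frame(
  date = rep(as.Date("2020-05-04") + 0:4, times = 3),
  state = rep(c("A", "B", "C"), each = 5),
  inIcuCurrently = c(
    100, 100, 100, 100, 100,
    200, 200, 200, 200, 200,
    NA, NA, NA, 3000, 3000
  )
)

# Two states report from May 4 through May 6; all three report on May 7 and 8
count_reporting(toy)
</code></pre></div></div>

<p>If the count changes across the window, as it should for the real data above,
then a plain sum across states compares a different set of states from one
day to the next.</p>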

<h2 id="attempt-2-lets-go-back-in-time">Attempt 2: Let’s go back in time</h2>

<p>If we poke around the COVID Tracking Project’s GitHub repository, we
find a <a href="https://github.com/COVID19Tracking/covid-tracking-data/tree/master/data">folder of data
backups</a>
with a file called <code class="language-plaintext highlighter-rouge">states_daily_4pm_et.csv</code>. This file provides the
same result as the previously loaded data.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
  </span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"states_daily_4pm_et.csv"</span><span class="p">),</span><span class="w"> 
  </span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(</span><span class="s2">"%Y%m%d"</span><span class="p">),</span><span class="w">
    </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w">
    </span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w">
    </span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="w">
    </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Project (March 23, 2022)"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Warning: Removed 454 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/latest-total-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." width="80%" style="display: block; margin: auto;" /></p>

<p>But because this file is hosted on GitHub, we can go back in time and find
the <a href="https://github.com/COVID19Tracking/covid-tracking-data/blob/5ec9962d5f5f6505bb0593df150ab62867af98f7/data/states_daily_4pm_et.csv">version of the data from
May 12, 2020</a>
and use that file instead.</p>
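<p>To fetch that snapshot from R, one option is to point
<code class="language-plaintext highlighter-rouge">readr::read_csv()</code> at the raw version of the file from that commit.
Here is a minimal sketch under a few assumptions: the helper
<code class="language-plaintext highlighter-rouge">raw_github_url()</code> is hypothetical (not from any package), it relies on
GitHub’s <code class="language-plaintext highlighter-rouge">raw.githubusercontent.com/owner/repo/ref/path</code> URL scheme, and
actually downloading the file requires that the archived repository is still online.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: URL for the raw contents of a file at a specific commit,
# using GitHub's raw.githubusercontent.com/owner/repo/ref/path scheme
raw_github_url &lt;- function(owner, repo, ref, path) {
  paste("https://raw.githubusercontent.com", owner, repo, ref, path, sep = "/")
}

# Pin the file to the commit linked above (the May 12, 2020 version)
url_2020_05_12 &lt;- raw_github_url(
  owner = "COVID19Tracking",
  repo = "covid-tracking-data",
  ref = "5ec9962d5f5f6505bb0593df150ab62867af98f7",
  path = "data/states_daily_4pm_et.csv"
)

# readr::read_csv() accepts a URL in place of a file path, so the snapshot
# could be read directly (this step needs an internet connection):
# data &lt;- readr::read_csv(url_2020_05_12)
</code></pre></div></div>

<p>Pinning to a commit hash rather than a branch name means the URL keeps
pointing at the May 12 version even as the file on the main branch changes.</p>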

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
  </span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"2020-05-12-states_daily_4pm_et.csv"</span><span class="p">),</span><span class="w"> 
  </span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(</span><span class="s2">"%Y%m%d"</span><span class="p">),</span><span class="w">
    </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w">
    </span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w">
    </span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="w">
    </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
    </span><span class="n">date</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID19 Project (May 12, 2020)"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Warning: Removed 477 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/old-total-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers hover around 9000 and then rapidly jump to over 12000 after May 7." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers hover around 9000 and then rapidly jump to over 12000 after May 7." width="80%" style="display: block; margin: auto;" /></p>

<p>There it is: the jump in ICU patients on May 7th ✔️. Let’s look at the
state-by-state data:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"state"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Project (May 12, 2020)"</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/old-state-1.png" title="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 25). Of note is New York, which has only 5 points, all above 2000." alt="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 25). Of note is New York, which has only 5 points, all above 2000." width="80%" style="display: block; margin: auto;" /></p>

<p>Look at New York (NY)! That’s the jump in the original plot. New York had a
large number of ICU patients, but its data only became available on
May 7th, which produced the spurious increase in total ICU patients.</p>

<p>By adding incomplete data from NY to the rest of the states, the analyst
effectively treated all of the missing points in the NY panel as zeros.</p>
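<p>Here is a toy illustration of that point, with made-up numbers rather than the
real data. ggplot2 drops rows with missing values before
<code class="language-plaintext highlighter-rouge">stat_summary(fun = "sum")</code> computes its totals (hence the warnings about
removed rows above), and summing only the rows that remain gives exactly the
same totals as replacing every <code class="language-plaintext highlighter-rouge">NA</code> with zero. This base R sketch uses
<code class="language-plaintext highlighter-rouge">tapply()</code> in place of the plotting code.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Made-up toy data: states "A" and "B" report every day, while state "C"
# only starts reporting on the fourth day (May 7)
toy &lt;- data.frame(
  date = rep(as.Date("2020-05-04") + 0:4, times = 3),
  state = rep(c("A", "B", "C"), each = 5),
  inIcuCurrently = c(
    100, 100, 100, 100, 100,
    200, 200, 200, 200, 200,
    NA, NA, NA, 3000, 3000
  )
)

# Daily total after dropping missing values (what the summed plot shows)
total_drop_na &lt;- tapply(toy$inIcuCurrently, toy$date, sum, na.rm = TRUE)

# Daily total after replacing missing values with zero
toy_zero &lt;- toy
toy_zero$inIcuCurrently[is.na(toy_zero$inIcuCurrently)] &lt;- 0
total_as_zero &lt;- tapply(toy_zero$inIcuCurrently, toy_zero$date, sum)

# The two totals are identical, and both jump when state C appears
identical(total_drop_na, total_as_zero)
total_drop_na
</code></pre></div></div>

<p>Both versions give 300 patients for the first three days and 3,300 for the
last two. The whole 3,000-patient jump comes from state C entering the data,
not from any change in states A and B.</p>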

<h2 id="what-could-they-have-done-differently">What could they have done differently?</h2>

<p>It’s fun to complain about haunted plots, but I will try to be
constructive for a moment. What would a fixed version of this plot look like?</p>

<p><strong>Option 1: Don’t do it.</strong> Given all the missing and incomplete data,
it’s just not worth it to make this plot.</p>

<p><strong>Option 2: Don’t aggregate.</strong> Or we might embrace the missingness and
show all and only the data we have. Here is a sketch of this kind of
approach. We will show individual state data and provide labels for the
states that stand out from the pack. We will also note in the caption the
number of states and territories with no data.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_for_plot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">state</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> 

</span><span class="n">total_regions</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">state</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">unique</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="nf">length</span><span class="p">()</span><span class="w">
</span><span class="n">plotted_regions</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data_for_plot</span><span class="o">$</span><span class="n">state</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">unique</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="nf">length</span><span class="p">()</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data_for_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geomtextpath</span><span class="o">::</span><span class="n">geom_textline</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">),</span><span class="w">
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">250</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geomtextpath</span><span class="o">::</span><span class="n">scale_hjust_discrete</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_line</span><span class="p">(</span><span class="w">
    </span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">),</span><span class="w">
    </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="m">250</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="w">
    </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
    </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">glue</span><span class="o">::</span><span class="n">glue</span><span class="p">(</span><span class="w">
      </span><span class="s2">"
      Data from The COVID Project (May 12, 2020).
      No data available for {total_regions - plotted_regions} states/territories.
      "</span><span class="w">
    </span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">14</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/try-to-fix-it-1.png" title="An attempt to fix the plot that uses the bad data. It shows one line per included state. In the middle of the line is the abbreviation for the state. In the top right, we can see the NY line dominating the rest of the lines. The caption notes the number of missing states/territories." alt="An attempt to fix the plot that uses the bad data. It shows one line per included state. In the middle of the line is the abbreviation for the state. In the top right, we can see the NY line dominating the rest of the lines. The caption notes the number of missing states/territories." width="80%" style="display: block; margin: auto;" /></p>

<p>And then we can put the linear regression “smooth” on it. 🙃</p>

<h2 id="update-notes-from-the-tracking-project-trenches-mar-24-2022">Update: Notes from the Tracking Project trenches [<em>Mar. 24, 2022</em>]</h2>

<p>After releasing this post, COVID Tracking Project alum Quang Nguyen
<a href="https://twitter.com/quangpmnguyen/status/1506807264295936002">shared some behind-the-scenes
details</a>
of what happened around May 7th, 2020. I will repost the Twitter thread here:</p>

<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">OMG <a href="https://twitter.com/COVID19Tracking?ref_src=twsrc%5Etfw">@COVID19Tracking</a> history lesson (short 🧵)!! First, shoutout to our data infrastructure folks <a href="https://twitter.com/zachlipton?ref_src=twsrc%5Etfw">@zachlipton</a> <a href="https://twitter.com/JuliaKodysh?ref_src=twsrc%5Etfw">@JuliaKodysh</a> for the GitHub archive! Second, I actually dug through the slack to figure out what happened (jokes on me, I was shift lead that day). <a href="https://t.co/7U6LOm8HKE">https://t.co/7U6LOm8HKE</a> <a href="https://t.co/iXdPO9EV6A">pic.twitter.com/iXdPO9EV6A</a></p>

&mdash; Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807264295936002?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>

<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">The problem was, back in May 2020, the only way you can get hospitalization data for the state of NY was to take low-res screenshots of the governor&#39;s presentation and then try to piece the information together (also shoutout to <a href="https://twitter.com/justinhendrix?ref_src=twsrc%5Etfw">@justinhendrix</a> for watching these press conf.). <a href="https://t.co/5UnGP1RUox">pic.twitter.com/5UnGP1RUox</a></p>

  <img src="/assets/images/2022-03-cuomo.jpg" alt="A screenshot of a Slack post of two screenshots of a Cuomo Covid update showing statistics drawn on hard-to-read plots in the background." />
  <br />
  
&mdash; Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807269752815617?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>

<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">Using this weird graph, we actually tried to back-calculate total hospitalization numbers, but unfortunately, it was super messy and nothing came out of it. This source also doesn&#39;t have current ICU numbers.</p>&mdash; Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807271698968579?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>

<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">We actually found a new source from Twitter (!!) who apparently got these numbers from a press email list from the governor (??). May 7th was the first day where we got data directly from the email list, which was the BLIP in total ICU data that made it onto the disastrous graph.</p>&mdash; Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807272990773248?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>

<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">The bottom line is: data from 2020 was a mess, and don&#39;t trust anything that came out of it. A group of volunteers taped it together using nothing but hot glue and scotch tape.</p>&mdash; Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807274211270662?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>

<p>The fact that they had to pull numbers from the graphs in the Governor’s
Covid briefings is an important reminder that high-quality Covid-19 data was
hard to come by at the start of the pandemic (<a href="https://www.nytimes.com/2022/03/15/nyregion/nursing-home-deaths-cuomo-covid.html">especially from the Cuomo
administration</a>).
We needed something like the COVID Tracking Project, where volunteers
would go to heroic lengths to curate data.</p>

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-03-23-morgan-stanley-cursed-covid-plot.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package      * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  backports      1.4.1   2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  bit            4.0.4   2020-08-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  bit64          4.0.5   2020-08-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  broom          0.8.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cachem         1.0.6   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cellranger     1.1.0   2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli            3.3.0   2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  colorspace     2.0-3   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon         1.5.1   2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  curl           4.3.2   2021-06-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  DBI            1.1.2   2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dbplyr         2.1.1   2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  digest         0.6.29  2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  downlit        0.4.0   2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dplyr        * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  evaluate       0.15    2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi          1.0.3   2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  farver         2.1.0   2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  forcats      * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fs             1.5.2   2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  generics       0.1.2   2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  geomtextpath   0.1.0   2022-01-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ggplot2      * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r          0.30.1  2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  gtable         0.3.0   2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  haven          2.5.0   2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here           1.0.1   2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  highr          0.9     2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  hms            1.1.1   2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  httr           1.4.3   2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jsonlite       1.8.0   2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr        * 1.39    2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  labeling       0.4.2   2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lubridate      1.8.0   2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  memoise        2.0.1   2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  modelr         0.1.8   2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  munsell        0.5.0   2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  purrr        * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6             2.5.1   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg           1.2.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readr        * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readxl         1.4.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  reprex         2.0.1   2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot      2.0.3   2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rvest          1.0.2   2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  scales         1.2.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr      * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts    1.0.4   2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping    0.3.6   2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble       * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyr        * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyselect     1.1.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyverse    * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tzdb           0.3.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs          0.4.1   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vroom          1.5.7   2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun           0.31    2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xml2           1.3.3   2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="covid19" /><category term="ggplot2" /><summary type="html"><![CDATA[What went wrong?]]></summary></entry><entry><title type="html">Self-documenting plots in ggplot2</title><link href="https://tjmahr.github.io/self-titled-ggplot2-plots/" rel="alternate" type="text/html" title="Self-documenting plots in ggplot2" /><published>2022-03-10T00:00:00-06:00</published><updated>2022-03-10T00:00:00-06:00</updated><id>https://tjmahr.github.io/self-titled-ggplot2-plots</id><content type="html" xml:base="https://tjmahr.github.io/self-titled-ggplot2-plots/"><![CDATA[<p>When I am showing off a plotting technique in
<a href="https://ggplot2.tidyverse.org/">ggplot2</a>, I sometimes like to include
the R code that produced the plot <em>as part of the plot</em>. Here is an
example I made to demonstrate the <code class="language-plaintext highlighter-rouge">debug</code> parameter in
<a href="https://rdrr.io/pkg/ggplot2/man/element.html"><code class="language-plaintext highlighter-rouge">element_text()</code></a>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">

</span><span class="n">self_document</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme</span><span class="p">(</span><span class="n">axis.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">debug</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/basic-example-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. The plot theme includes yellow shading and points in the x and y axis titles." alt="A ggplot2 plot of a histogram with the plotting code above the image. The plot theme includes yellow shading and points in the x and y axis titles." width="80%" style="display: block; margin: auto;" /></p>

<p>Let’s call these “self-documenting plots”. If we’re feeling nerdy, we
might also call them “ggquines”, although they are not true
<a href="https://en.wikipedia.org/wiki/Quine_%28computing%29">quines</a>.</p>

<p>In this post, we will build up a <code class="language-plaintext highlighter-rouge">self_document()</code> function from scratch. Here are
the problems we need to sort out:</p>

<ul>
  <li>how to put plotting code above a plot</li>
  <li>how to capture plotting code and convert it into text</li>
</ul>

<h2 id="creating-the-code-annotation">Creating the code annotation</h2>

<p>As a first step, let’s just treat our plotting code as a string that 
is ready to use for annotation.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_text</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s1">'ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 20, color = "white") +
  labs(title = "A basic histogram")'</span><span class="w">

</span><span class="n">p_plot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>To use this code as a title above the plot, we need some way to
combine these two graphical objects (the code and the plot produced by
ggplot2). I like the
<a href="https://patchwork.data-imaginist.com/articles/patchwork.html">patchwork</a>
package for this job. Here we use
<a href="https://patchwork.data-imaginist.com/reference/wrap_elements.html"><code class="language-plaintext highlighter-rouge">wrap_elements()</code></a> to capture the plot into a
“patch” that patchwork can annotate.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">patchwork</span><span class="p">)</span><span class="w">
</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">p_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in the default font." alt="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in the default font." width="50%" style="display: block; margin: auto;" /></p>

<p>Let’s style this title to use a monospaced font. I use Windows and like
Consolas, so I will use that font.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Use default mono font if "Consolas" is not available</span><span class="w">
</span><span class="n">extrafont</span><span class="o">::</span><span class="n">loadfonts</span><span class="p">(</span><span class="n">device</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w"> </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">monofont</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
  </span><span class="n">extrafont</span><span class="o">::</span><span class="n">choose_font</span><span class="p">(</span><span class="s2">"Consolas"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> 
  </span><span class="s2">"mono"</span><span class="p">,</span><span class="w"> 
  </span><span class="s2">"Consolas"</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">title_theme</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="w">
  </span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="w">
    </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">monofont</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rel</span><span class="p">(</span><span class="m">.9</span><span class="p">),</span><span class="w"> 
    </span><span class="n">margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">5.5</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">p_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">,</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">  
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-consolas-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in Consolas." alt="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in Consolas." width="50%" style="display: block; margin: auto;" /></p>

<p>One problem with this setup is that the plotting code has to be edited
in two places: the plot <code class="language-plaintext highlighter-rouge">p_plot</code> and the title <code class="language-plaintext highlighter-rouge">p_text</code>. As a result,
it’s easy for these two pieces of code to fall out of sync with each
other, turning our self-documenting plot into a lying liar plot.</p>

<p>The solution is pretty easy: Tell R that <code class="language-plaintext highlighter-rouge">p_text</code> is code with
<a href="https://rdrr.io/r/base/parse.html"><code class="language-plaintext highlighter-rouge">parse()</code></a> and evaluate the code with
<a href="https://rdrr.io/r/base/eval.html"><code class="language-plaintext highlighter-rouge">eval()</code></a>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">parse</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">,</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">  
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-consolas-eval-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
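<p>To see what <code class="language-plaintext highlighter-rouge">parse()</code> and <code class="language-plaintext highlighter-rouge">eval()</code> do on their own, here is a small sketch that uses a non-plotting string (the names <code class="language-plaintext highlighter-rouge">code_text</code>, <code class="language-plaintext highlighter-rouge">code_parsed</code>, and <code class="language-plaintext highlighter-rouge">result</code> are just for illustration). <code class="language-plaintext highlighter-rouge">parse(text = code_text)</code> turns the string into an unevaluated <code class="language-plaintext highlighter-rouge">expression</code> object, and <code class="language-plaintext highlighter-rouge">eval()</code> runs it:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A string that happens to contain R code
code_text &lt;- "sum(1:10) * 2"

# parse() turns the text into an unevaluated "expression" object
code_parsed &lt;- parse(text = code_text)
class(code_parsed)
#&gt; [1] "expression"

# eval() runs the parsed code and returns its value
result &lt;- eval(code_parsed)
result
#&gt; [1] 110
</code></pre></div></div>

<p>In the plotting example, the value returned by <code class="language-plaintext highlighter-rouge">eval()</code> is the ggplot object, which <code class="language-plaintext highlighter-rouge">wrap_elements()</code> then places into the patchwork layout.</p>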

<p>This <em>works</em>. It gets the job done. But we find ourselves in a clumsy
workflow, either editing R code inside of quotes or editing the
plot interactively and then wrapping it in quotes. Let’s do better.</p>

<h2 id="capturing-plotting-code-as-a-string">Capturing plotting code as a string</h2>

<p>Time for some <em>nonstandard evaluation</em>. I will use the
<a href="https://rlang.r-lib.org/">rlang</a> package, although in principle we
could use functions in base R to accomplish these goals.</p>

<p>First, we are going to use <a href="https://rdrr.io/pkg/rlang/man/expr.html"><code class="language-plaintext highlighter-rouge">rlang::expr()</code></a> to
capture/quote/<a href="https://rlang.r-lib.org/reference/topic-defuse.html">defuse</a>
the R code as an expression. We can print the code as code, print it as
text, and use <code class="language-plaintext highlighter-rouge">eval()</code> to show the plot.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="c1"># print the expression</span><span class="w">
</span><span class="n">p_code</span><span class="w">
</span><span class="c1">#&gt; ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 20, color = "white") + </span><span class="w">
</span><span class="c1">#&gt;     labs(title = "A basic histogram")</span><span class="w">

</span><span class="c1"># expression =&gt; text</span><span class="w">
</span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] "ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 20, color = \"white\") + \n    labs(title = \"A basic histogram\")"</span><span class="w">

</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
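<p>For comparison, here is a rough base R version of the same steps, for readers who would rather not depend on rlang. This is a sketch, not the approach used in the rest of this post: <code class="language-plaintext highlighter-rouge">quote()</code> captures code much like <code class="language-plaintext highlighter-rouge">rlang::expr()</code>, <code class="language-plaintext highlighter-rouge">deparse()</code> converts it to text much like <code class="language-plaintext highlighter-rouge">rlang::expr_text()</code>, and <code class="language-plaintext highlighter-rouge">eval()</code> is unchanged. The names <code class="language-plaintext highlighter-rouge">p_code_base</code> and <code class="language-plaintext highlighter-rouge">p_text_base</code> are illustrative.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># quote() captures the code without running it, like rlang::expr()
p_code_base &lt;- quote(
  ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(bins = 20, color = "white") +
    labs(title = "A basic histogram")
)

# deparse() converts the expression to text, like rlang::expr_text().
# It may return several strings for long code, so join them with newlines.
p_text_base &lt;- paste0(deparse(p_code_base), collapse = "\n")
cat(p_text_base)

# eval(p_code_base) would then build the plot (this step needs ggplot2)
</code></pre></div></div>

<p>Because nothing is evaluated until <code class="language-plaintext highlighter-rouge">eval()</code>, the capture and text-conversion steps work even before ggplot2 is loaded.</p>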

<p>Then, it should be straightforward to make the self-documenting plot, right?</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">),</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">  
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-eval-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. In this case, the title is mostly on one line and some text is cut off from the image." alt="A ggplot2 plot of a histogram with the plotting code above the image. In this case, the title is mostly on one line and some text is cut off from the image." width="50%" style="display: block; margin: auto;" /></p>

<p>Hey, it reformatted the title! Indeed, in the process of capturing the
code, the code formatting was lost. To get something closer to the
source code we provided, we have to reformat the captured code before we
print it.</p>
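<p>A tiny base R example shows why this happens. A captured expression stores the structure of the code (the function calls and their arguments), not the spaces or line breaks we typed, so converting it back into text produces R’s default layout. The name <code class="language-plaintext highlighter-rouge">squished</code> is just for illustration:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Cramped spacing and a line break inside the call
squished &lt;- quote(c(1,2,
  3))

# Deparsing rebuilds the text from the structure alone
deparse(squished)
#&gt; [1] "c(1, 2, 3)"
</code></pre></div></div>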

<p>The <a href="https://styler.r-lib.org/">styler</a> package provides a suite of
functions for reformatting code. We can define our own coding
styles/formatting rules to customize how styler works. I like the styler
rules used by Garrick Aden-Buie in his
<a href="https://github.com/gadenbuie/grkstyle">grkstyle</a> package, so I will use
<code class="language-plaintext highlighter-rouge">grkstyle::grk_style_text()</code> to reformat the code.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">plot_annotation</span><span class="p">(</span><span class="w">
    </span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
      </span><span class="n">grkstyle</span><span class="o">::</span><span class="n">grk_style_text</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
      </span><span class="c1"># reformatting returns a vector of lines,</span><span class="w">
      </span><span class="c1"># so we have to combine them</span><span class="w">
      </span><span class="n">paste0</span><span class="p">(</span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">),</span><span class="w"> 
    </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="w">
  </span><span class="p">)</span><span class="w"> 
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-eval-style-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
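<p>If grkstyle is not installed, styler’s own <code class="language-plaintext highlighter-rouge">style_text()</code>, which applies the tidyverse style guide by default, can fill the same slot. Here is a minimal sketch under that assumption. The helper name <code class="language-plaintext highlighter-rouge">style_code_text()</code> is made up for this example, and the resulting layout may differ from what grkstyle produces.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: expression to text, restyle, join into one string.
# Uses styler's default (tidyverse) rules in place of grkstyle's.
style_code_text &lt;- function(code) {
  code |&gt;
    rlang::expr_text() |&gt;
    styler::style_text() |&gt;
    # style_text() returns one element per line, so collapse them
    paste0(collapse = "\n")
}

p_code &lt;- rlang::expr(
  ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(bins = 20, color = "white")
)
cat(style_code_text(p_code))
</code></pre></div></div>

<p>The resulting string could then be passed as the <code class="language-plaintext highlighter-rouge">title</code> argument of <code class="language-plaintext highlighter-rouge">plot_annotation()</code>, just like the grkstyle version above.</p>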

<h2 id="putting-it-all-together">Putting it all together</h2>

<p>When we write our <code class="language-plaintext highlighter-rouge">self_document()</code> function, the only change we have to
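make is using <a href="https://rdrr.io/pkg/rlang/man/defusing-advanced.html"><code class="language-plaintext highlighter-rouge">rlang::enexpr()</code></a> instead of <code class="language-plaintext highlighter-rouge">rlang::expr()</code>. The
en-variant is used when we want to <em>en</em>-quote exactly what the user
provided: inside a function, <code class="language-plaintext highlighter-rouge">rlang::expr(expr)</code> would capture only the
argument name <code class="language-plaintext highlighter-rouge">expr</code>, whereas <code class="language-plaintext highlighter-rouge">rlang::enexpr(expr)</code> captures the code
the caller passed in. Aside from that change, our <code class="language-plaintext highlighter-rouge">self_document()</code> function just bundles together all of the code we developed above:</p>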

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">monofont</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
    </span><span class="n">extrafont</span><span class="o">::</span><span class="n">choose_font</span><span class="p">(</span><span class="s2">"Consolas"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> 
    </span><span class="s2">"mono"</span><span class="p">,</span><span class="w"> 
    </span><span class="s2">"Consolas"</span><span class="w">
  </span><span class="p">)</span><span class="w">
  
  </span><span class="n">p</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">enexpr</span><span class="p">(</span><span class="n">expr</span><span class="p">)</span><span class="w">
  </span><span class="n">title</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">grkstyle</span><span class="o">::</span><span class="n">grk_style_text</span><span class="p">()</span><span class="w"> </span><span class="o">|&gt;</span><span class="w"> 
    </span><span class="n">paste0</span><span class="p">(</span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">)</span><span class="w">
  
  </span><span class="n">patchwork</span><span class="o">::</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
    </span><span class="n">patchwork</span><span class="o">::</span><span class="n">plot_annotation</span><span class="p">(</span><span class="w">
      </span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title</span><span class="p">,</span><span class="w"> 
      </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="w">
        </span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="w">
          </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">monofont</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rel</span><span class="p">(</span><span class="m">.9</span><span class="p">),</span><span class="w"> 
          </span><span class="n">margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">5.5</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
        </span><span class="p">)</span><span class="w">
      </span><span class="p">)</span><span class="w">
    </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
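<p>The difference between the two capture functions is easiest to see in a toy example. This is a minimal sketch with two hypothetical helpers, not part of <code class="language-plaintext highlighter-rouge">self_document()</code>: <code class="language-plaintext highlighter-rouge">expr()</code> quotes the code written inside the function (the symbol <code class="language-plaintext highlighter-rouge">x</code>), whereas <code class="language-plaintext highlighter-rouge">enexpr()</code> quotes the code the caller supplied for the argument <code class="language-plaintext highlighter-rouge">x</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helpers: each captures its argument a different way
capture_expr &lt;- function(x) rlang::expr(x)      # quotes the literal code `x`
capture_enexpr &lt;- function(x) rlang::enexpr(x)  # quotes what the caller passed in

capture_expr(a + b)
#&gt; x
capture_enexpr(a + b)
#&gt; a + b
</code></pre></div></div>

<p>Neither helper evaluates <code class="language-plaintext highlighter-rouge">a + b</code>, so the call works even though <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> do not exist.</p>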

<p>And let’s confirm that it works.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>

<p>Because we developed this function on top of rlang, we can do some tricks like 
injecting a variable’s value when capturing the code. For example, here I 
use <code class="language-plaintext highlighter-rouge">!! color</code> to replace the <code class="language-plaintext highlighter-rouge">color</code> variable with its value at the moment the code is captured.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">color</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"white"</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-inject-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
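<p>The injection happens during capture, before anything is plotted, so we can watch it without ggplot2 at all. Here is a minimal sketch with a hypothetical <code class="language-plaintext highlighter-rouge">show_code()</code> helper that only captures and deparses its argument (the <code class="language-plaintext highlighter-rouge">geom_histogram()</code> call is never evaluated):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: capture an argument and return it as text
show_code &lt;- function(expr) rlang::expr_text(rlang::enexpr(expr))

color &lt;- "white"

# Without injection, the symbol `color` is kept as written
show_code(geom_histogram(color = color))
#&gt; [1] "geom_histogram(color = color)"

# With `!!`, the value of `color` is spliced into the captured code
show_code(geom_histogram(color = !! color))
#&gt; [1] "geom_histogram(color = \"white\")"
</code></pre></div></div>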

<p>And if you are wondering, yes, we can <code class="language-plaintext highlighter-rouge">self_document()</code> a
<code class="language-plaintext highlighter-rouge">self_document()</code> plot.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="p">(</span><span class="w">
  </span><span class="n">self_document</span><span class="p">(</span><span class="w">
    </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-self-document-1.png" title="A self_document() plot of a plot of a histogram with the plotting code above the image. There are two sets of code on top of each other." alt="A self_document() plot of a plot of a histogram with the plotting code above the image. There are two sets of code on top of each other." width="50%" style="display: block; margin: auto;" /></p>

<h2 id="alas-comments-are-lost">Alas, comments are lost</h2>

<p>One downside of this approach is that helpful comments are lost.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="p">(</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="c1"># get rid of that grey</span><span class="w">
    </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-no-comments-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
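<p>The comment vanishes because R’s parser discards comments: they are not part of the expression that <code class="language-plaintext highlighter-rouge">enexpr()</code> captures, so <code class="language-plaintext highlighter-rouge">expr_text()</code> has nothing to deparse. A minimal demonstration, independent of <code class="language-plaintext highlighter-rouge">self_document()</code>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Capture a braced block that contains a comment
code &lt;- rlang::expr({
  1 + 1 # this comment is dropped by the parser
})

# Deparsing the captured expression yields only the code
cat(rlang::expr_text(code))
#&gt; {
#&gt;     1 + 1
#&gt; }
</code></pre></div></div>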

<p>I am not sure how to include comments. One place where comments are stored 
and printed is in function bodies:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="c1"># get rid of that grey</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">print</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="w"> </span><span class="n">useSource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; function() {</span><span class="w">
</span><span class="c1">#&gt; ggplot(mtcars, aes(x = mpg)) +</span><span class="w">
</span><span class="c1">#&gt;   geom_histogram(bins = 20, color = !! color) +</span><span class="w">
</span><span class="c1">#&gt;   # get rid of that grey</span><span class="w">
</span><span class="c1">#&gt;   theme_minimal() +</span><span class="w">
</span><span class="c1">#&gt;   labs(title = "A basic histogram")</span><span class="w">
</span><span class="c1">#&gt; }</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: 0x00000222d313b848&gt;</span><span class="w">
</span></code></pre></div></div>
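<p>The comment survives in that printout because, when source references are kept, R attaches the original text of the function to it as a <code class="language-plaintext highlighter-rouge">srcref</code> attribute. Here is a minimal sketch of where that text lives. It forces <code class="language-plaintext highlighter-rouge">keep.source = TRUE</code> through <code class="language-plaintext highlighter-rouge">parse()</code> (source references are off by default in non-interactive sessions) and reads the stored lines back with <code class="language-plaintext highlighter-rouge">utils::getSrcref()</code>. The function <code class="language-plaintext highlighter-rouge">g</code> is a hypothetical stand-in for <code class="language-plaintext highlighter-rouge">f</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Parse a function definition while keeping source references
exprs &lt;- parse(
  text = c(
    "function() {",
    "  1 + 1 # get rid of that grey",
    "}"
  ),
  keep.source = TRUE
)

# Evaluating the parsed definition creates a closure with a srcref attribute
g &lt;- eval(exprs[[1]])

# The srcref holds the original source lines, comment included
src &lt;- as.character(utils::getSrcref(g))
src
</code></pre></div></div>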

<p>I have no idea how to go about exploiting this feature for
self-documenting plots, however.</p>

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-03-10-self-titled-ggplot2-plots.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package     * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  downlit       0.4.0   2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dplyr         1.0.9   2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  extrafont     0.18    2022-04-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  extrafontdb   1.0     2012-06-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  farver        2.1.0   2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r         0.30.1  2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  grkstyle      0.0.3   2022-05-25 [1] Github (gadenbuie/grkstyle@6a7011c)</span><span class="w">
</span><span class="c1">#&gt;  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here          1.0.1   2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr       * 1.39    2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  labeling      0.4.2   2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  patchwork   * 1.1.1   2020-12-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg          1.2.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  Rttf2pt1      1.3.10  2022-02-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  styler        1.7.0   2022-03-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts   1.0.4   2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="rlang" /><category term="ggplot2" /><category term="nonstandard evaluation" /><summary type="html"><![CDATA[Including plotting code as an annotation on a plot]]></summary></entry><entry><title type="html">Custom syntax highlighting themes in RMarkdown (and pandoc)</title><link href="https://tjmahr.github.io/custom-highlighting-pandoc-rmarkdown/" rel="alternate" type="text/html" title="Custom syntax highlighting themes in RMarkdown (and pandoc)" /><published>2021-11-17T00:00:00-06:00</published><updated>2021-11-17T00:00:00-06:00</updated><id>https://tjmahr.github.io/custom-highlighting-pandoc-rmarkdown</id><content type="html" xml:base="https://tjmahr.github.io/custom-highlighting-pandoc-rmarkdown/"><![CDATA[<p>I recently developed and released an R package called
<a href="https://github.com/tjmahr/solarizeddocx" title="GitHub page for solarizeddocx">solarizeddocx</a>. It provides <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code>, an
<a href="https://rmarkdown.rstudio.com/">RMarkdown</a> output format for
<a href="https://github.com/altercation/solarized" title="GitHub page for solarized">solarized</a>-highlighted Microsoft Word documents. The image below
shows a comparison of the solarizeddocx and default docx formats:</p>

<figure class="" style="max-width: 100%; display: block; margin: 2em auto;"><img src="/assets/images/2021-11-solarized.png" alt="Side-by-side comparison of solarizeddocx::document() and rmarkdown::word_document()" /><figcaption>
      Side-by-side comparison of <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code> and <a href="https://pkgs.rstudio.com/rmarkdown/reference/word_document.html"><code class="language-plaintext highlighter-rouge">rmarkdown::word_document()</code></a>.

    </figcaption></figure>

<p>The package provides a demo document which is essentially a vignette
where I describe all the customizations used by the package and put the
syntax highlighting to the test. The demo can be rendered and viewed
with:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># install.packages("devtools")</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"tjmahr/solarizeddocx"</span><span class="p">)</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">demo_document</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>

<p>The format can be used in an RMarkdown document via YAML metadata:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>output: 
  solarizeddocx::document: default
</code></pre></div></div>

<p>Or explicitly with rmarkdown:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
  </span><span class="s2">"README.Rmd"</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">document</span><span class="p">()</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>solarizeddocx also exports its document assets so that they can be used
in other output formats, and it exports theme-building tools to create
new <a href="https://pandoc.org/MANUAL.html" title="Pandoc User's Guide">pandoc</a> syntax highlighting themes. I am most proud of these
features, so I will demonstrate each of these in turn and create a brand
new syntax highlighting theme in this post.</p>

<h2 id="knitr-rmd-to-md-conversion">knitr: .Rmd to .md conversion</h2>

<p>To give a simplified description, RMarkdown works by knitting the code
in an RMarkdown (.Rmd) file with <a href="https://yihui.org/knitr/" title="knitr homepage">knitr</a> to obtain a markdown (.md)
file and then post-processing this knitr output with other tools. In
particular, it uses pandoc, which converts between all kinds of document
formats. For this demonstration, we will do the knitting and pandoc
steps separately without relying on RMarkdown. That said, the options we
pass to pandoc can usually be used in RMarkdown (as we demonstrate at
the very end of this post).</p>

<p>Our input file is a small .Rmd file. It’s very basic, meant to
illustrate some function calls, strings, numbers, code comments and
output.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#&gt;")
```
Fit a model with `lm`():
```{r}
model &lt;- lm(mpg ~ 1 + cyl, mtcars)
coefs &lt;- coef(model)

# prediction for 8 cylinders
coefs["(Intercept)"] + 8 * coefs["cyl"]

predict(model, data.frame(cyl = 8L))
```
</code></pre></div></div>

<p>We <a href="https://rdrr.io/pkg/knitr/man/knit.html"><code class="language-plaintext highlighter-rouge">knit()</code></a> the document to run the code and store results
in a markdown file. (Actually, we use <a href="https://rdrr.io/pkg/knitr/man/knit_child.html"><code class="language-plaintext highlighter-rouge">knit_child()</code></a>
because I was getting some weird using-<code class="language-plaintext highlighter-rouge">knit()</code>-inside-of-<code class="language-plaintext highlighter-rouge">knit()</code>
issues when rendering this post. But in general, we would <code class="language-plaintext highlighter-rouge">knit()</code>.)</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">md_file</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".md"</span><span class="p">)</span><span class="w">
</span><span class="n">knit_func</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nf">interactive</span><span class="p">())</span><span class="w"> </span><span class="n">knitr</span><span class="o">::</span><span class="n">knit</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">knitr</span><span class="o">::</span><span class="n">knit_child</span><span class="w">
</span><span class="n">knit_func</span><span class="p">(</span><span class="w">
  </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_code_block</span><span class="p">(),</span><span class="w"> 
  </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">md_file</span><span class="p">,</span><span class="w">
  </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>This is the content of the file.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Fit a model with `lm`():

```r
model &lt;- lm(mpg ~ 1 + cyl, mtcars)
coefs &lt;- coef(model)

# prediction for 8 cylinders
coefs["(Intercept)"] + 8 * coefs["cyl"]
#&gt; (Intercept) 
#&gt;    14.87826

predict(model, data.frame(cyl = 8L))
#&gt;        1 
#&gt; 14.87826
```
</code></pre></div></div>

<h2 id="pandoc-md-to-everything-conversion">pandoc: .md to <em>everything</em> conversion</h2>

<p>All of our syntax highlighting happens at this stage, once we have an .md
file. For this demo, we will use pandoc to convert this .md
file to an HTML document.</p>

<p>To make life easier, let’s set up a workflow for quickly converting a
.md file to an HTML document and taking a screenshot of the document.
<code class="language-plaintext highlighter-rouge">run_pandoc()</code> is a wrapper over
<a href="https://pkgs.rstudio.com/rmarkdown/reference/pandoc_convert.html"><code class="language-plaintext highlighter-rouge">rmarkdown::pandoc_convert()</code></a> but hard-codes
some output options and lets us more easily forward <code class="language-plaintext highlighter-rouge">options</code> to pandoc
using <code class="language-plaintext highlighter-rouge">...</code>. <code class="language-plaintext highlighter-rouge">page_thumbnail()</code> is a wrapper over
<a href="http://wch.github.io/webshot/reference/webshot.html"><code class="language-plaintext highlighter-rouge">webshot::webshot()</code></a> with some predefined output
options. <code class="language-plaintext highlighter-rouge">pd_style()</code> and <code class="language-plaintext highlighter-rouge">pd_syntax()</code> are helpers we will use later
for setting pandoc options.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">run_pandoc</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">output</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".html"</span><span class="p">)</span><span class="w">
  </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">pandoc_convert</span><span class="p">(</span><span class="w">
    </span><span class="n">input</span><span class="p">,</span><span class="w"> 
    </span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"html5"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">output</span><span class="p">,</span><span class="w">
    </span><span class="n">options</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
      </span><span class="s2">"--standalone"</span><span class="p">,</span><span class="w"> 
      </span><span class="n">...</span><span class="w">
    </span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
  </span><span class="n">output</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">page_thumbnail</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">webshot</span><span class="o">::</span><span class="n">webshot</span><span class="p">(</span><span class="w">
    </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">url</span><span class="p">,</span><span class="w">  
    </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">file</span><span class="p">,</span><span class="w">
    </span><span class="n">vwidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">500</span><span class="p">,</span><span class="w"> 
    </span><span class="n">vheight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">350</span><span class="p">,</span><span class="w">
    </span><span class="n">zoom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">pd_style</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"--highlight-style"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">pd_syntax</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"--syntax-definition"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">

</span><span class="c1"># Update from May 2022: Make file paths into urls</span><span class="w">
</span><span class="n">url_file</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"file://localhost/"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
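<p>Because <code class="language-plaintext highlighter-rouge">pd_style()</code> and <code class="language-plaintext highlighter-rouge">pd_syntax()</code> only build character vectors, we can check what <code class="language-plaintext highlighter-rouge">run_pandoc()</code> would forward to pandoc without running pandoc at all. The two helpers are repeated here so the snippet runs on its own, and <code class="language-plaintext highlighter-rouge">"my-r.xml"</code> is a hypothetical file name. The <code class="language-plaintext highlighter-rouge">c("--standalone", ...)</code> line mirrors how <code class="language-plaintext highlighter-rouge">run_pandoc()</code> splices its <code class="language-plaintext highlighter-rouge">...</code> into <code class="language-plaintext highlighter-rouge">options</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pd_style &lt;- function(x) c("--highlight-style", x)
pd_syntax &lt;- function(x) c("--syntax-definition", x)

# c() flattens the helper vectors into one vector of flags and values,
# the same shape run_pandoc() hands to rmarkdown::pandoc_convert(options = )
opts &lt;- c("--standalone", pd_style("tango"), pd_syntax("my-r.xml"))
paste(opts, collapse = " ")
#&gt; [1] "--standalone --highlight-style tango --syntax-definition my-r.xml"
</code></pre></div></div>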

<p>These tools let us preview the default syntax highlighting in pandoc:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="n">md_file</span><span class="p">,</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="s2">"tango"</span><span class="p">))</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot1.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot1-1.png" title="Screenshot of html file created by pandoc" alt="Screenshot of html file created by pandoc" width="80%" style="display: block; margin: auto;" /></p>

<h2 id="setting-pandoc-options">Setting pandoc options</h2>

<p>Here is the pandoc HTML output again, but this time using my solarized (light)
highlighting style:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">theme_sl</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_solarized_light_theme</span><span class="p">()</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="n">md_file</span><span class="p">,</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="n">theme_sl</span><span class="p">))</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot2.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot2-1.png" title="Screenshot of html file created by pandoc. It now has solarized colors." alt="Screenshot of html file created by pandoc. It now has solarized colors." width="80%" style="display: block; margin: auto;" /></p>

<p>By convention, we see two kinds of comment lines: actual code comments
(<code class="language-plaintext highlighter-rouge">#</code>) and R output (<code class="language-plaintext highlighter-rouge">#&gt;</code>). The <code class="language-plaintext highlighter-rouge">#&gt;</code> comments are helpful because I can copy
a whole code block (output included) and run it in R without that output
being interpreted as code. But <strong>these comments represent two different
kinds of information</strong>, and I’d like them to be styled differently. The
<code class="language-plaintext highlighter-rouge">#</code> code comments can stay unintrusive (light italic type), but the <code class="language-plaintext highlighter-rouge">#&gt;</code>
output comments should be legible (darker roman type).</p>
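<p>Here is a minimal sketch of that distinction in plain R. It is only an illustration, not how pandoc’s syntax definitions work (those are KDE XML files). The detail that matters is the order of the checks: every <code class="language-plaintext highlighter-rouge">#&gt;</code> line also starts with <code class="language-plaintext highlighter-rouge">#</code>, so the more specific <code class="language-plaintext highlighter-rouge">#&gt;</code> pattern has to be tested first.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lines &lt;- c(
  "# prediction for 8 cylinders",
  'coefs["(Intercept)"] + 8 * coefs["cyl"]',
  "#&gt; (Intercept) ",
  "#&gt;    14.87826"
)
# test the specific "#&gt;" prefix before the general "#" prefix;
# otherwise every output line would also be labeled a code comment
kind &lt;- ifelse(
  grepl("^\\s*#&gt;", lines), "output",
  ifelse(grepl("^\\s*#", lines), "comment", "code")
)
kind  # one label per line: comment, code, output, output
</code></pre></div></div>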

<p>To treat these two types of comments differently, I modified the <a href="https://github.com/KDE/syntax-highlighting/blob/master/data/syntax/r.xml" title="GitHub page for the r.xml syntax definition">R
syntax definition</a> used by pandoc to recognize <code class="language-plaintext highlighter-rouge">#</code> and <code class="language-plaintext highlighter-rouge">#&gt;</code>
as different entities. We can pass that syntax definition to pandoc:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">syntax_sl</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_syntax_definition</span><span class="p">()</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">pd_style</span><span class="p">(</span><span class="n">theme_sl</span><span class="p">),</span><span class="w"> 
  </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot3.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot3-1.png" title="Screenshot of html file created by pandoc. It now has solarized colors and differently styled #&gt; comments." alt="Screenshot of html file created by pandoc. It now has solarized colors and differently styled #&gt; comments." width="80%" style="display: block; margin: auto;" /></p>

<h2 id="creating-a-theme-from-scratch">Creating a theme from scratch</h2>

<p>Maybe you’re thinking, <em>that’s cool… if you like solarized. What about
something fun like Fairy Floss?</em> Okay, fine, let’s make <a href="https://github.com/sailorhg/fairyfloss">Fairy
Floss</a>… right now… in this
blog post.</p>

<p>First, let’s store the Fairy Floss colors in a handy list:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ff_colors</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
  </span><span class="n">gold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#e6c000"</span><span class="p">,</span><span class="w">
  </span><span class="n">yellow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ffea00"</span><span class="p">,</span><span class="w">
  </span><span class="n">dark_purple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#5a5475"</span><span class="p">,</span><span class="w">
  </span><span class="n">white</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#f8f8f2"</span><span class="p">,</span><span class="w">
  </span><span class="n">pink</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ffb8d1"</span><span class="p">,</span><span class="w">
  </span><span class="n">salmon</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ff857f"</span><span class="p">,</span><span class="w">
  </span><span class="n">purple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#c5a3ff"</span><span class="p">,</span><span class="w">
  </span><span class="n">teal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#c2ffdf"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Pandoc can print any of its built-in syntax highlighting themes as a JSON
file (<code class="language-plaintext highlighter-rouge">pandoc --print-highlight-style</code>), and <code class="language-plaintext highlighter-rouge">copy_base_pandoc_theme()</code> calls
this command for us. We can read that file into R and see that it is a
list of global style options followed by a list of individual style
definitions.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">temptheme</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".theme"</span><span class="p">)</span><span class="w"> 
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">copy_base_pandoc_theme</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)</span><span class="w">

</span><span class="n">data_theme</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">read_json</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">data_theme</span><span class="p">,</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 5</span><span class="w">
</span><span class="c1">#&gt;  $ text-color                  : NULL</span><span class="w">
</span><span class="c1">#&gt;  $ background-color            : NULL</span><span class="w">
</span><span class="c1">#&gt;  $ line-number-color           : chr "#aaaaaa"</span><span class="w">
</span><span class="c1">#&gt;  $ line-number-background-color: NULL</span><span class="w">
</span><span class="c1">#&gt;  $ text-styles                 :List of 29</span><span class="w">
</span><span class="c1">#&gt;   ..$ Other         :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Attribute     :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ SpecialString :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Annotation    :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Function      :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ String        :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ ControlFlow   :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Operator      :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Error         :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ BaseN         :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Alert         :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Variable      :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ BuiltIn       :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Extension     :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Preprocessor  :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Information   :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ VerbatimString:List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Warning       :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Documentation :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Import        :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Char          :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ DataType      :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Float         :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Comment       :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ CommentVar    :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Constant      :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ SpecialChar   :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ DecVal        :List of 5</span><span class="w">
</span><span class="c1">#&gt;   ..$ Keyword       :List of 5</span><span class="w">
</span></code></pre></div></div>

<p>Each of those individual style definitions is a list of color options
and font style options:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span><span class="p">(</span><span class="n">data_theme</span><span class="o">$</span><span class="n">`text-styles`</span><span class="o">$</span><span class="n">Comment</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; List of 5</span><span class="w">
</span><span class="c1">#&gt;  $ text-color      : chr "#60a0b0"</span><span class="w">
</span><span class="c1">#&gt;  $ background-color: NULL</span><span class="w">
</span><span class="c1">#&gt;  $ bold            : logi FALSE</span><span class="w">
</span><span class="c1">#&gt;  $ italic          : logi TRUE</span><span class="w">
</span><span class="c1">#&gt;  $ underline       : logi FALSE</span><span class="w">
</span></code></pre></div></div>
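<p>To see where that structure comes from, here is a cut-down theme written by hand as a JSON string and parsed with <code class="language-plaintext highlighter-rouge">jsonlite::parse_json()</code>, the string-input counterpart of <code class="language-plaintext highlighter-rouge">read_json()</code>. The keys and the <code class="language-plaintext highlighter-rouge">Comment</code> values are copied from the output above; a real theme has many more entries under <code class="language-plaintext highlighter-rouge">text-styles</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A trimmed pandoc theme: global options plus one entry in text-styles
json &lt;- '{
  "text-color": null,
  "background-color": null,
  "line-number-color": "#aaaaaa",
  "line-number-background-color": null,
  "text-styles": {
    "Comment": {
      "text-color": "#60a0b0",
      "background-color": null,
      "bold": false,
      "italic": true,
      "underline": false
    }
  }
}'
theme &lt;- jsonlite::parse_json(json)

# JSON null becomes a NULL list entry, which is why str() showed NULLs
names(theme)
theme[["text-styles"]][["Comment"]][["italic"]]
</code></pre></div></div>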

<p>solarizeddocx provides a helper function <code class="language-plaintext highlighter-rouge">set_theme_text_style()</code> for
setting individual style options. Let’s set up Fairy Floss’s global, comment,
and string styles. We use the placeholder name <code class="language-plaintext highlighter-rouge">"global"</code> to access the global
style options, and we use style definition names like <code class="language-plaintext highlighter-rouge">"Comment"</code> to
access individual style definitions.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">magrittr</span><span class="p">)</span><span class="w">
</span><span class="n">ff_theme</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data_theme</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
    </span><span class="s2">"global"</span><span class="p">,</span><span class="w"> 
    </span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="p">,</span><span class="w">
    </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
    </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w">
    </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
    </span><span class="s2">"String"</span><span class="p">,</span><span class="w">
    </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">yellow</span><span class="w"> 
  </span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
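<p>We don’t need the internals of <code class="language-plaintext highlighter-rouge">set_theme_text_style()</code> to use it, but here is one way such a helper could work on the parsed list. <code class="language-plaintext highlighter-rouge">set_style()</code> is a hypothetical stand-in, not the solarizeddocx source. It assumes that <code class="language-plaintext highlighter-rouge">text</code> and <code class="language-plaintext highlighter-rouge">background</code> map onto the <code class="language-plaintext highlighter-rouge">"text-color"</code> and <code class="language-plaintext highlighter-rouge">"background-color"</code> keys and that <code class="language-plaintext highlighter-rouge">"global"</code> means the top level of the theme.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical stand-in for solarizeddocx::set_theme_text_style()
set_style &lt;- function(theme, name, text = NULL, background = NULL, ...) {
  # "global" edits the top-level options; other names edit a text-styles entry
  # (as.list() turns a missing style, i.e. NULL, into an empty list)
  style &lt;- as.list(if (name == "global") theme else theme[["text-styles"]][[name]])
  if (!is.null(text)) style[["text-color"]] &lt;- text
  if (!is.null(background)) style[["background-color"]] &lt;- background
  # any other named options (bold, italic, underline) are copied as given
  extras &lt;- list(...)
  for (key in names(extras)) style[[key]] &lt;- extras[[key]]
  if (name == "global") return(style)
  theme[["text-styles"]][[name]] &lt;- style
  theme
}

theme &lt;- list(
  `text-color` = NULL,
  `background-color` = NULL,
  `text-styles` = list(Comment = list(`text-color` = "#60a0b0", italic = TRUE))
)
theme &lt;- set_style(theme, "global", text = "#f8f8f2", background = "#5a5475")
theme &lt;- set_style(theme, "Comment", text = "#e6c000")
</code></pre></div></div>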

<p>Let’s preview our partial theme:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">write_pandoc_theme</span><span class="p">(</span><span class="n">ff_theme</span><span class="p">,</span><span class="w"> </span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> 
  </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot4.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot4-1.png" title="Screenshot of html file created by pandoc. It has a purple background, white text, gold comments and yellow strings, but it still looks bad because not all of the colors are done." alt="Screenshot of html file created by pandoc. It has a purple background, white text, gold comments and yellow strings, but it still looks bad because not all of the colors are done." width="80%" style="display: block; margin: auto;" /></p>

<p>This is a good start, but when I first ported the solarized theme, I had
to use 20 calls to <code class="language-plaintext highlighter-rouge">set_theme_text_style()</code>. That’s a lot. Plus,
<strong>themes are data</strong>. Can’t we just describe what needs to change in a
list? Yes. For this post, I added
<code class="language-plaintext highlighter-rouge">solarizeddocx::patch_theme_text_style()</code>, which takes the changes
to make as a list of patches.</p>
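<p>As a rough idea of how a list of patches can be applied, here is a sketch that loops over the patch names and merges each one into the matching style with <code class="language-plaintext highlighter-rouge">utils::modifyList()</code>. <code class="language-plaintext highlighter-rouge">apply_patches()</code> is hypothetical, not the solarizeddocx implementation, and the short names <code class="language-plaintext highlighter-rouge">text</code> and <code class="language-plaintext highlighter-rouge">background</code> are assumed to stand for the theme’s <code class="language-plaintext highlighter-rouge">"text-color"</code> and <code class="language-plaintext highlighter-rouge">"background-color"</code> keys.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch of patch-based theme editing
apply_patches &lt;- function(theme, patches) {
  # short patch names -&gt; the theme's JSON keys
  key_map &lt;- c(text = "text-color", background = "background-color")
  for (name in names(patches)) {
    patch &lt;- patches[[name]]
    renamed &lt;- names(patch) %in% names(key_map)
    names(patch)[renamed] &lt;- unname(key_map[names(patch)[renamed]])
    if (name == "global") {
      theme &lt;- utils::modifyList(theme, patch)
    } else {
      # as.list() lets a patch create a style the theme did not have yet
      old &lt;- as.list(theme[["text-styles"]][[name]])
      theme[["text-styles"]][[name]] &lt;- utils::modifyList(old, patch)
    }
  }
  theme
}

theme &lt;- list(
  `text-color` = NULL,
  `background-color` = NULL,
  `text-styles` = list(
    Comment = list(`text-color` = "#60a0b0", bold = FALSE, italic = TRUE)
  )
)
patches &lt;- list(
  global = list(text = "#f8f8f2", background = "#5a5475"),
  Comment = list(text = "#e6c000", italic = TRUE, bold = FALSE),
  Information = list(text = "#e6c000", italic = FALSE, bold = TRUE)
)
theme &lt;- apply_patches(theme, patches)
</code></pre></div></div>

<p>Keeping the patches as plain data means the same list can be printed, diffed, or reused for another theme without touching the code that applies it.</p>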

<p>Let’s write our list of patches to make to the base theme. Because some
style definitions are identical, we will use tibble’s lazy list
<a href="https://rdrr.io/pkg/tibble/man/lst.html"><code class="language-plaintext highlighter-rouge">tibble::lst()</code></a> to reuse patches along the way. For this
application of the palette, I consulted the <a href="http://tmtheme-editor.herokuapp.com/#!/editor/url/https://raw.githubusercontent.com/sailorhg/fairyfloss/gh-pages/fairyfloss.tmTheme" title="Fairy Floss Theme in online editor">Fairy Floss .tmTheme
file</a> and the <a href="https://github.com/gadenbuie/rsthemes/blob/main/inst/templates/fairyfloss.scss#L31-L42" title="GitHub source for  rsthemes/inst/templates/fairyfloss.scss">rsthemes implementation</a> of Fairy
Floss.</p>
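<p>Here is the reuse trick on its own. <code class="language-plaintext highlighter-rouge">tibble::lst()</code> builds its elements in order, so a later entry can refer to an earlier one by name, which base <code class="language-plaintext highlighter-rouge">list()</code> cannot do. The name <code class="language-plaintext highlighter-rouge">float_patches</code> is only for this illustration.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># DecVal and BaseN copy the Float entry defined just before them
float_patches &lt;- tibble::lst(
  Float = list(text = "#c5a3ff"),
  DecVal = Float,
  BaseN = Float
)
identical(float_patches$DecVal, float_patches$Float)
#&gt; [1] TRUE
</code></pre></div></div>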

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">patches</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tibble</span><span class="o">::</span><span class="n">lst</span><span class="p">(</span><span class="w">
  </span><span class="n">global</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
    </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">,</span><span class="w">
    </span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="c1"># # comments</span><span class="w">
  </span><span class="n">Comment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="p">,</span><span class="w"> </span><span class="n">italic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">),</span><span class="w">
  </span><span class="c1"># ## comments</span><span class="w">
  </span><span class="n">Documentation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Comment</span><span class="p">,</span><span class="w">
  </span><span class="c1"># #&gt; comments</span><span class="w">
  </span><span class="n">Information</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="p">,</span><span class="w"> </span><span class="n">italic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">),</span><span class="w">
  </span><span class="n">Keyword</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">),</span><span class="w">
  </span><span class="n">ControlFlow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">),</span><span class="w">
  </span><span class="n">Operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">),</span><span class="w">
  </span><span class="n">Function</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">teal</span><span class="p">),</span><span class="w">
  </span><span class="n">Attribute</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
  </span><span class="n">Variable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
  </span><span class="c1"># this should be code outside of a code block</span><span class="w">
  </span><span class="n">VerbatimString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
    </span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">,</span><span class="w"> 
    </span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="n">Other</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Variable</span><span class="p">,</span><span class="w">
  </span><span class="n">Constant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">purple</span><span class="p">),</span><span class="w">
  </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">salmon</span><span class="p">),</span><span class="w">
  </span><span class="n">Alert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span><span class="w">
  </span><span class="n">Warning</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span><span class="w">
  </span><span class="n">Float</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">purple</span><span class="p">),</span><span class="w">
  </span><span class="n">DecVal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Float</span><span class="p">,</span><span class="w">
  </span><span class="n">BaseN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Float</span><span class="p">,</span><span class="w">
  </span><span class="n">SpecialChar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
  </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">yellow</span><span class="p">),</span><span class="w">
  </span><span class="n">Char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w">
  </span><span class="n">SpecialString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<div class="notice--info">
  <p><strong>Save yourself from guessing and checking.</strong> These style definition
names are documented on <a href="https://docs.kde.org/stable5/en/kate/katepart/highlight.html#kate-highlight-default-styles">this
page</a>.
I wish I had found this page before starting to port the solarized
theme. My initial approach was to use the style inspector in Microsoft
Word and look at the style names applied to pieces of code. The downside
of that approach is that in order to figure out what a <code class="language-plaintext highlighter-rouge">SpecialChar</code>
was, I had to write a <code class="language-plaintext highlighter-rouge">SpecialChar</code>. (Escape sequences inside of strings
like <code class="language-plaintext highlighter-rouge">"hello\nthere"</code> are <code class="language-plaintext highlighter-rouge">SpecialChars</code> in the R syntax definition used
by pandoc.)</p>
</div>

<p>Now we apply our patches to the theme:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ff_theme</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">patch_theme_text_style</span><span class="p">(</span><span class="w">
  </span><span class="n">data_theme</span><span class="p">,</span><span class="w">
  </span><span class="n">patches</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">write_pandoc_theme</span><span class="p">(</span><span class="n">ff_theme</span><span class="p">,</span><span class="w"> </span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> 
  </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot5.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot5-1.png" title="Screenshot of html file created by pandoc. It now has Fairy Floss colors." alt="Screenshot of html file created by pandoc. It now has Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>

<p>Wonderful!</p>
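<p>For a sense of what a patch does to the theme data, here is a minimal base-R sketch. It assumes the theme is a nested list that mirrors pandoc’s JSON theme format, in which each entry of <code class="language-plaintext highlighter-rouge">text-styles</code> carries <code class="language-plaintext highlighter-rouge">text-color</code>, <code class="language-plaintext highlighter-rouge">background-color</code>, <code class="language-plaintext highlighter-rouge">bold</code>, <code class="language-plaintext highlighter-rouge">italic</code>, and <code class="language-plaintext highlighter-rouge">underline</code> fields. The helper <code class="language-plaintext highlighter-rouge">patch_styles()</code> and the hex colors are hypothetical placeholders; this is not the source of <code class="language-plaintext highlighter-rouge">solarizeddocx::patch_theme_text_style()</code>.</p>

```r
# Hypothetical helper: overwrite colors in a pandoc-style theme list.
# `patches` is a named list such as list(String = list(text = "#123456")),
# where `text` and `background` become `text-color` and `background-color`.
patch_styles <- function(theme, patches) {
  for (style in names(patches)) {
    patch <- patches[[style]]
    entry <- theme[["text-styles"]][[style]]
    # Start from a blank style if the theme does not define this one yet
    if (is.null(entry)) {
      entry <- list(
        "text-color" = NULL, "background-color" = NULL,
        bold = FALSE, italic = FALSE, underline = FALSE
      )
    }
    if (!is.null(patch[["text"]])) {
      entry[["text-color"]] <- patch[["text"]]
    }
    if (!is.null(patch[["background"]])) {
      entry[["background-color"]] <- patch[["background"]]
    }
    theme[["text-styles"]][[style]] <- entry
  }
  theme
}

# A one-style theme with placeholder colors
theme <- list("text-styles" = list(
  String = list(
    "text-color" = "#4070a0", "background-color" = NULL,
    bold = FALSE, italic = FALSE, underline = FALSE
  )
))

patched <- patch_styles(theme, list(
  String = list(text = "#123456"),
  VerbatimString = list(text = "#ffffff", background = "#000000")
))

patched[["text-styles"]][["String"]][["text-color"]]
#> [1] "#123456"
patched[["text-styles"]][["VerbatimString"]][["background-color"]]
#> [1] "#000000"
```

<p>Styles missing from the theme, like <code class="language-plaintext highlighter-rouge">VerbatimString</code> here, get a fresh entry, so a patch list can add styles as well as recolor existing ones.</p>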

<h2 id="sneaking-these-features-into-rmarkdown">Sneaking these features into RMarkdown</h2>

<div class="notice--info">
  <p><strong>Update: This problem has been fixed</strong>. When I first wrote this post,
it was not possible to use custom highlighting themes with RMarkdown
HTML documents. The syntax highlighting for this format was overhauled
in
<a href="https://cran.r-project.org/web/packages/rmarkdown/news/news.html">rmarkdown 2.12</a>.
[<em>May 27, 2022</em>]</p>
</div>

<p>So far, we have passed the style and syntax options by calling pandoc
directly. <del>We can use these options in RMarkdown <em>some of
the time</em>. For example, here we try to send the Fairy Floss theme into
an <a href="https://pkgs.rstudio.com/rmarkdown/reference/html_document.html"><code class="language-plaintext highlighter-rouge">html_document()</code></a> and fail.</del></p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">html_document</span><span class="p">(</span><span class="w">
    </span><span class="c1"># Update, May 2022: Adding this line fixes things</span><span class="w">
    </span><span class="n">highlight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)[</span><span class="m">2</span><span class="p">],</span><span class="w">
    </span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
      </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
    </span><span class="p">)</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">out</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot6.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot6-1.png" title="Screenshot of html file created by RMarkdown. It has the default colors." alt="Screenshot of html file created by RMarkdown. It has the default colors." width="80%" style="display: block; margin: auto;" /></p>

<p><del>RMarkdown assembles and runs a giant pandoc command. The problem,
as far as I can tell, is that this command includes our
<code class="language-plaintext highlighter-rouge">pd_style(temptheme)</code>, which sets the
<code class="language-plaintext highlighter-rouge">--highlight-style</code> option, but later on it also includes <code class="language-plaintext highlighter-rouge">--no-highlight</code>,
which blocks our style. Bummer.</del></p>
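<p>The diagnosis in that struck-through paragraph can be checked mechanically by looking at where the competing flags sit in the pandoc arguments (for example, in the command that <code class="language-plaintext highlighter-rouge">render()</code> prints when <code class="language-plaintext highlighter-rouge">quiet = FALSE</code>). Here is a minimal sketch that assumes, as that paragraph does, that a later <code class="language-plaintext highlighter-rouge">--no-highlight</code> overrides an earlier <code class="language-plaintext highlighter-rouge">--highlight-style</code>. The helper <code class="language-plaintext highlighter-rouge">highlight_blocked()</code> is hypothetical.</p>

```r
# Hypothetical helper: TRUE when a --no-highlight flag comes after the last
# --highlight-style flag (or when there is no --highlight-style at all)
highlight_blocked <- function(args) {
  style_at <- which(startsWith(args, "--highlight-style"))
  off_at <- which(args == "--no-highlight")
  if (length(off_at) == 0) return(FALSE)
  if (length(style_at) == 0) return(TRUE)
  max(off_at) > max(style_at)
}

highlight_blocked(c("--highlight-style", "ff.theme", "--no-highlight"))
#> [1] TRUE
highlight_blocked(c("--no-highlight", "--highlight-style", "ff.theme"))
#> [1] FALSE
```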

<p>If we use the simpler <a href="https://pkgs.rstudio.com/rmarkdown/reference/html_document_base.html"><code class="language-plaintext highlighter-rouge">html_document_base()</code></a>
format, however, we can see Fairy Floss output.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">html_document_base</span><span class="p">(</span><span class="w">
    </span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">))</span><span class="w">
  </span><span class="p">),</span><span class="w">
  </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">out</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot7.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot7-1.png" title="Screenshot of html file created by RMarkdown. It has the Fairy Floss colors." alt="Screenshot of html file created by RMarkdown. It has the Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>

<p>The options also work for the <a href="https://pkgs.rstudio.com/rmarkdown/reference/pdf_document.html"><code class="language-plaintext highlighter-rouge">pdf_document()</code></a>
format.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
  </span><span class="n">md_file</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">pdf_document</span><span class="p">(</span><span class="w">
    </span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">))</span><span class="w">
  </span><span class="p">),</span><span class="w"> 
  </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="c1"># Convert to png and crop most of the empty page</span><span class="w">
</span><span class="n">png</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">pdftools</span><span class="o">::</span><span class="n">pdf_convert</span><span class="p">(</span><span class="n">out</span><span class="p">,</span><span class="w"> </span><span class="n">dpi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">144</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; Converting page 1 to file343c662113f3_1.png... done!</span><span class="w">
</span><span class="n">magick</span><span class="o">::</span><span class="n">image_read</span><span class="p">(</span><span class="n">png</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">magick</span><span class="o">::</span><span class="n">image_crop</span><span class="p">(</span><span class="n">magick</span><span class="o">::</span><span class="n">geometry_area</span><span class="p">(</span><span class="m">1050</span><span class="p">,</span><span class="w"> </span><span class="m">400</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>

<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot8-1.png" title="Screenshot of a cropped pdf file created by RMarkdown. It has the Fairy Floss colors." alt="Screenshot of a cropped pdf file created by RMarkdown. It has the Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>
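<p>The crop box above is an ImageMagick geometry of the form <code class="language-plaintext highlighter-rouge">WIDTHxHEIGHT+X+Y</code>, with the offsets measured from the top-left corner of the image. As a rough check that the box fits on the page, here is some base-R arithmetic. It assumes a US letter page (8.5 by 11 inches) rendered at the 144 dpi used above; <code class="language-plaintext highlighter-rouge">crop_geometry()</code> is a hypothetical helper meant to build the same kind of string as <code class="language-plaintext highlighter-rouge">magick::geometry_area()</code>, not a copy of it.</p>

```r
# Hypothetical helper: build a WIDTHxHEIGHT+X+Y geometry string
crop_geometry <- function(width, height, x_off = 0, y_off = 0) {
  sprintf(
    "%dx%d%+d%+d",
    as.integer(width), as.integer(height),
    as.integer(x_off), as.integer(y_off)
  )
}

crop_geometry(1050, 400, 100, 100)
#> [1] "1050x400+100+100"

# Assumed page size in pixels: inches times dots per inch
page_px <- c(width = 8.5, height = 11) * 144

# Does the crop box (offset + size) stay inside the page?
(100 + 1050) <= page_px[["width"]] && (100 + 400) <= page_px[["height"]]
#> [1] TRUE
```

<p>On an A4 page the pixel dimensions would differ slightly, so the crop numbers depend on the paper size of the PDF.</p>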

<p>The options also work with <a href="https://pkgs.rstudio.com/rmarkdown/reference/word_document.html"><code class="language-plaintext highlighter-rouge">word_document()</code></a>. In
fact, that’s how <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code> works.</p>
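<p>Here is a sketch of how a custom format could forward these options to <code class="language-plaintext highlighter-rouge">word_document()</code> through its <code class="language-plaintext highlighter-rouge">pandoc_args</code> argument. The names <code class="language-plaintext highlighter-rouge">highlight_args()</code> and <code class="language-plaintext highlighter-rouge">themed_word_document()</code> and the file names are hypothetical; this illustrates the idea and is not the source of <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code>.</p>

```r
# Hypothetical helper: pandoc arguments for a highlighting theme and an
# optional syntax definition file
highlight_args <- function(theme, syntax = NULL) {
  args <- c("--highlight-style", theme)
  if (!is.null(syntax)) {
    args <- c(args, "--syntax-definition", syntax)
  }
  args
}

# Hypothetical custom format: a Word document with our highlighting options.
# Other arguments (toc, reference_docx, ...) pass through to word_document().
themed_word_document <- function(theme, syntax = NULL, ...) {
  rmarkdown::word_document(pandoc_args = highlight_args(theme, syntax), ...)
}

highlight_args("fairyfloss.theme", "r.xml")

# Usage (requires rmarkdown and pandoc; file names are placeholders):
# rmarkdown::render(
#   "report.Rmd",
#   output_format = themed_word_document("fairyfloss.theme", "r.xml")
# )
```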

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-11-17-custom-highlighting-pandoc-rmarkdown.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package       * version    date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  askpass         1.1        2019-01-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  bslib           0.3.1      2021-10-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cachem          1.0.6      2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  callr           3.7.0      2021-04-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli             3.3.0      2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon          1.5.1      2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  digest          0.6.29     2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  downlit         0.4.0      2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  evaluate        0.15       2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi           1.0.3      2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r           0.30.1     2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here            1.0.1      2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  highr           0.9        2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  htmltools       0.5.2      2021-08-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jquerylib       0.1.4      2021-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jsonlite        1.8.0      2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr         * 1.39       2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle       1.0.1      2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magick          2.7.3      2021-08-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      * 2.0.3      2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  memoise         2.0.1      2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pdftools        3.2.0      2022-04-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar          1.7.0      2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  processx        3.5.3      2022-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ps              1.7.0      2022-04-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  qpdf            1.1        2019-03-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg            1.2.2      2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  Rcpp            1.0.8.3    2022-03-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang           1.0.2      2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rmarkdown       2.14       2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot       2.0.3      2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi      0.13       2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sass            0.4.1      2022-03-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  solarizeddocx   0.0.1.9000 2022-05-25 [1] Github (tjmahr/solarizeddocx@8f82bf1)</span><span class="w">
</span><span class="c1">#&gt;  stringi         1.7.6      2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr         1.4.0      2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts     1.0.4      2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping     0.3.6      2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble          3.1.7      2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tinytex         0.39       2022-05-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8            1.2.2      2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs           0.4.1      2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  webshot         0.5.3      2022-04-14 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun            0.31       2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  yaml            2.3.5      2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="knitr" /><category term="pandoc" /><category term="syntax highlighting" /><category term="rmarkdown" /><category term="solarizeddocx" /><summary type="html"><![CDATA[Now you can have Fairy Floss in quarterly-report.docx]]></summary></entry><entry><title type="html">A one-liner for generating random participant IDs</title><link href="https://tjmahr.github.io/one-liner-to-generate-ids/" rel="alternate" type="text/html" title="A one-liner for generating random participant IDs" /><published>2021-10-12T00:00:00-05:00</published><updated>2021-10-12T00:00:00-05:00</updated><id>https://tjmahr.github.io/one-liner-to-generate-ids</id><content type="html" xml:base="https://tjmahr.github.io/one-liner-to-generate-ids/"><![CDATA[<p>On one of the Slacks I browse, someone asked how to de-identify a
column of participant IDs. The original dataset was a wait list, so
the ordering of the IDs was itself a sensitive feature of the data, and
we needed to scramble the order of the new IDs.</p>

<p>For example, suppose we have the following <em>repeated measures</em> dataset.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tibble</span><span class="o">::</span><span class="n">tribble</span><span class="p">(</span><span class="w">
  </span><span class="o">~</span><span class="w"> </span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">timepoint</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">score</span><span class="p">,</span><span class="w">
           </span><span class="s2">"DB"</span><span class="p">,</span><span class="w">           </span><span class="m">1</span><span class="p">,</span><span class="w">       </span><span class="m">7</span><span class="p">,</span><span class="w">
           </span><span class="s2">"DB"</span><span class="p">,</span><span class="w">           </span><span class="m">2</span><span class="p">,</span><span class="w">       </span><span class="m">8</span><span class="p">,</span><span class="w">
           </span><span class="s2">"DB"</span><span class="p">,</span><span class="w">           </span><span class="m">3</span><span class="p">,</span><span class="w">       </span><span class="m">8</span><span class="p">,</span><span class="w">
           </span><span class="s2">"TW"</span><span class="p">,</span><span class="w">           </span><span class="m">1</span><span class="p">,</span><span class="w">      </span><span class="kc">NA</span><span class="p">,</span><span class="w">
           </span><span class="s2">"TW"</span><span class="p">,</span><span class="w">           </span><span class="m">2</span><span class="p">,</span><span class="w">       </span><span class="m">9</span><span class="p">,</span><span class="w">
           </span><span class="s2">"CF"</span><span class="p">,</span><span class="w">           </span><span class="m">1</span><span class="p">,</span><span class="w">       </span><span class="m">9</span><span class="p">,</span><span class="w">
           </span><span class="s2">"CF"</span><span class="p">,</span><span class="w">           </span><span class="m">2</span><span class="p">,</span><span class="w">       </span><span class="m">8</span><span class="p">,</span><span class="w">
           </span><span class="s2">"JH"</span><span class="p">,</span><span class="w">           </span><span class="m">1</span><span class="p">,</span><span class="w">      </span><span class="m">10</span><span class="p">,</span><span class="w">
           </span><span class="s2">"JH"</span><span class="p">,</span><span class="w">           </span><span class="m">2</span><span class="p">,</span><span class="w">      </span><span class="m">10</span><span class="p">,</span><span class="w">
           </span><span class="s2">"JH"</span><span class="p">,</span><span class="w">           </span><span class="m">3</span><span class="p">,</span><span class="w">      </span><span class="m">10</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>We want to map the <code class="language-plaintext highlighter-rouge">participant</code> identifiers onto some sort of
shuffled-up random IDs. Suggestions included hashing the IDs with
<a href="https://rdrr.io/pkg/digest/man/sha1.html">digest</a>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># This approach cryptographically compresses the input into a short</span><span class="w">
</span><span class="c1"># "digest". (It is not a random ID.)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Vectorize</span><span class="p">(</span><span class="n">digest</span><span class="o">::</span><span class="n">sha1</span><span class="p">)(</span><span class="n">participant</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant                              timepoint score</span><span class="w">
</span><span class="c1">#&gt;    &lt;chr&gt;                                        &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 ad61ec1247b2381922bec89483c3ce2fb67f98d9         1     7</span><span class="w">
</span><span class="c1">#&gt;  2 ad61ec1247b2381922bec89483c3ce2fb67f98d9         2     8</span><span class="w">
</span><span class="c1">#&gt;  3 ad61ec1247b2381922bec89483c3ce2fb67f98d9         3     8</span><span class="w">
</span><span class="c1">#&gt;  4 c080f9a87edc6d47f28185279fd8be068c566a37         1    NA</span><span class="w">
</span><span class="c1">#&gt;  5 c080f9a87edc6d47f28185279fd8be068c566a37         2     9</span><span class="w">
</span><span class="c1">#&gt;  6 1f9da22bf684761daec27326331c58b46502a25b         1     9</span><span class="w">
</span><span class="c1">#&gt;  7 1f9da22bf684761daec27326331c58b46502a25b         2     8</span><span class="w">
</span><span class="c1">#&gt;  8 627d211747438ae59690cea8f0a8d6adf666b974         1    10</span><span class="w">
</span><span class="c1">#&gt;  9 627d211747438ae59690cea8f0a8d6adf666b974         2    10</span><span class="w">
</span><span class="c1">#&gt; 10 627d211747438ae59690cea8f0a8d6adf666b974         3    10</span><span class="w">
</span></code></pre></div></div>

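<p>But this approach seems like overkill, and hashing only transforms these
IDs: the same ID always produces the same hash, so anyone who can guess
the original IDs can rebuild the mapping. We want to be rid of them completely.</p>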
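<p>To see why hashing is not enough, suppose, as the example data suggest, that the IDs are two-letter initials. There are only 26 × 26 = 676 such pairs, so hashing every candidate and matching the results recovers the original IDs. A minimal sketch, assuming the digest package is installed; <code class="language-plaintext highlighter-rouge">hash_id()</code> is a hypothetical helper.</p>

```r
# Hypothetical helper: SHA-1 hash each element of a character vector
hash_id <- function(x) {
  vapply(x, digest::sha1, character(1), USE.NAMES = FALSE)
}

# Hashed IDs as they might appear in a "de-identified" file
hashed <- hash_id(c("DB", "TW", "CF", "JH"))

# Hash every possible pair of initials and use the hashes as lookup keys
candidates <- as.vector(outer(LETTERS, LETTERS, paste0))
lookup <- setNames(candidates, hash_id(candidates))

# Each hashed ID maps straight back to the original initials
unname(lookup[hashed])
#> [1] "DB" "TW" "CF" "JH"
```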

<p>The <a href="https://rdrr.io/pkg/uuid/man/UUIDgenerate.html">uuid</a> package provides <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)">another approach</a>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">uuid</span><span class="o">::</span><span class="n">UUIDgenerate</span><span class="p">(</span><span class="n">use.time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">relocate</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant                          timepoint score</span><span class="w">
</span><span class="c1">#&gt;    &lt;chr&gt;                                    &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 03e9536d-1446-4779-ac4d-67848fa73ef4         1     7</span><span class="w">
</span><span class="c1">#&gt;  2 03e9536d-1446-4779-ac4d-67848fa73ef4         2     8</span><span class="w">
</span><span class="c1">#&gt;  3 03e9536d-1446-4779-ac4d-67848fa73ef4         3     8</span><span class="w">
</span><span class="c1">#&gt;  4 f7b73ca6-57c7-4c9a-9211-86b434912856         1    NA</span><span class="w">
</span><span class="c1">#&gt;  5 f7b73ca6-57c7-4c9a-9211-86b434912856         2     9</span><span class="w">
</span><span class="c1">#&gt;  6 81b02d88-c3bd-490b-b2dc-150077f03172         1     9</span><span class="w">
</span><span class="c1">#&gt;  7 81b02d88-c3bd-490b-b2dc-150077f03172         2     8</span><span class="w">
</span><span class="c1">#&gt;  8 60f80714-77ba-4e9f-a7d2-1943ca6724fc         1    10</span><span class="w">
</span><span class="c1">#&gt;  9 60f80714-77ba-4e9f-a7d2-1943ca6724fc         2    10</span><span class="w">
</span><span class="c1">#&gt; 10 60f80714-77ba-4e9f-a7d2-1943ca6724fc         3    10</span><span class="w">
</span></code></pre></div></div>

<p>Again, these IDs seem excessive. Imagine plotting the data with one participant
per facet: every facet label would be a 36-character string.</p>

<p>When I create blog posts for this site, I use a function to create a new
.Rmd file with the date and a <a href="https://rdrr.io/pkg/ids/man/adjective_animal.html">random adjective-animal
phrase</a> for a
placeholder (e.g., <code class="language-plaintext highlighter-rouge">2021-06-28-mild-capybara.Rmd</code>). We could try that for
fun:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ids</span><span class="o">::</span><span class="n">adjective_animal</span><span class="p">()</span><span class="w">
  </span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">relocate</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant              timepoint score</span><span class="w">
</span><span class="c1">#&gt;    &lt;chr&gt;                        &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 chrysoprase_bushsqueaker         1     7</span><span class="w">
</span><span class="c1">#&gt;  2 chrysoprase_bushsqueaker         2     8</span><span class="w">
</span><span class="c1">#&gt;  3 chrysoprase_bushsqueaker         3     8</span><span class="w">
</span><span class="c1">#&gt;  4 hideous_cheetah                  1    NA</span><span class="w">
</span><span class="c1">#&gt;  5 hideous_cheetah                  2     9</span><span class="w">
</span><span class="c1">#&gt;  6 powdery_siamang                  1     9</span><span class="w">
</span><span class="c1">#&gt;  7 powdery_siamang                  2     8</span><span class="w">
</span><span class="c1">#&gt;  8 ducal_hornshark                  1    10</span><span class="w">
</span><span class="c1">#&gt;  9 ducal_hornshark                  2    10</span><span class="w">
</span><span class="c1">#&gt; 10 ducal_hornshark                  3    10</span><span class="w">
</span></code></pre></div></div>

<p>But that’s too whimsical (and something like <code class="language-plaintext highlighter-rouge">hideous_cheetah</code> seems
disrespectful to human subjects).</p>

<p>One user suggested <a href="https://forcats.tidyverse.org/reference/fct_anon.html"><code class="language-plaintext highlighter-rouge">forcats::fct_anon()</code></a>:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
      </span><span class="n">as.factor</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
      </span><span class="n">forcats</span><span class="o">::</span><span class="n">fct_anon</span><span class="p">(</span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"p0"</span><span class="p">)</span><span class="w">
    </span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant timepoint score</span><span class="w">
</span><span class="c1">#&gt;    &lt;fct&gt;           &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 p04                 1     7</span><span class="w">
</span><span class="c1">#&gt;  2 p04                 2     8</span><span class="w">
</span><span class="c1">#&gt;  3 p04                 3     8</span><span class="w">
</span><span class="c1">#&gt;  4 p02                 1    NA</span><span class="w">
</span><span class="c1">#&gt;  5 p02                 2     9</span><span class="w">
</span><span class="c1">#&gt;  6 p03                 1     9</span><span class="w">
</span><span class="c1">#&gt;  7 p03                 2     8</span><span class="w">
</span><span class="c1">#&gt;  8 p01                 1    10</span><span class="w">
</span><span class="c1">#&gt;  9 p01                 2    10</span><span class="w">
</span><span class="c1">#&gt; 10 p01                 3    10</span><span class="w">
</span></code></pre></div></div>

<p>This approach works wonderfully. The only wrinkle is that it requires
converting our IDs to a factor, and it returns a factor.</p>

<h2 id="call-me-the-match-maker">Call me the <code class="language-plaintext highlighter-rouge">match()</code>-maker</h2>

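<p>My approach is a nice combination of three base R functions, <code class="language-plaintext highlighter-rouge">match()</code>, <code class="language-plaintext highlighter-rouge">sample()</code>, and <code class="language-plaintext highlighter-rouge">unique()</code>:</p>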

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">participant</span><span class="p">)))</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant timepoint score</span><span class="w">
</span><span class="c1">#&gt;          &lt;int&gt;     &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1           3         1     7</span><span class="w">
</span><span class="c1">#&gt;  2           3         2     8</span><span class="w">
</span><span class="c1">#&gt;  3           3         3     8</span><span class="w">
</span><span class="c1">#&gt;  4           1         1    NA</span><span class="w">
</span><span class="c1">#&gt;  5           1         2     9</span><span class="w">
</span><span class="c1">#&gt;  6           2         1     9</span><span class="w">
</span><span class="c1">#&gt;  7           2         2     8</span><span class="w">
</span><span class="c1">#&gt;  8           4         1    10</span><span class="w">
</span><span class="c1">#&gt;  9           4         2    10</span><span class="w">
</span><span class="c1">#&gt; 10           4         3    10</span><span class="w">
</span></code></pre></div></div>

<p><a href="https://rdrr.io/r/base/match.html"><code class="language-plaintext highlighter-rouge">match(x, table)</code></a> returns, for each
element of <code class="language-plaintext highlighter-rouge">x</code>, the position of its first match in the vector <code class="language-plaintext highlighter-rouge">table</code>. For
example, what are the positions in the alphabet of the letters L, Q, and L again?</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"L"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Q"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] 12 17 12</span><span class="w">
</span></code></pre></div></div>
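
<p>Because <code class="language-plaintext highlighter-rouge">match()</code> reports only the first position, a value that appears
several times in <code class="language-plaintext highlighter-rouge">table</code> still maps to a single number. A quick check with a
made-up table (<code class="language-plaintext highlighter-rouge">table_with_dupes</code> is just an illustrative name):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A toy table in which "b" appears twice (positions 2 and 3)
table_with_dupes &lt;- c("a", "b", "b", "c")

# "b" gets its first position (2), never 3
match(c("b", "c"), table_with_dupes)
#&gt; [1] 2 4
</code></pre></div></div>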

<p><a href="https://rdrr.io/r/base/sample.html"><code class="language-plaintext highlighter-rouge">sample()</code></a> shuffles the values in
the <code class="language-plaintext highlighter-rouge">table</code> so that the original order of the IDs is lost. The <code class="language-plaintext highlighter-rouge">unique()</code> is
not strictly required: we could shuffle <code class="language-plaintext highlighter-rouge">sample(data$participant)</code> directly. But
then the table contains duplicates, so the first position of an ID might be a
number larger than 4 (the number of participants), leaving gaps in the new IDs:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">shuffle</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="n">shuffle</span><span class="w">
</span><span class="c1">#&gt;  [1] "CF" "JH" "TW" "JH" "DB" "DB" "DB" "JH" "CF" "TW"</span><span class="w">

</span><span class="n">match</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">shuffle</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt;  [1] 5 5 5 3 3 1 1 2 2 2</span><span class="w">
</span></code></pre></div></div>
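
<p>To reuse the one-liner, we could wrap it in a small function. This is a
sketch under a few assumptions: <code class="language-plaintext highlighter-rouge">anonymize_ids()</code> is a hypothetical name (not
from any package), the initials are a toy vector, and <code class="language-plaintext highlighter-rouge">set.seed()</code> is there only
to make the shuffle reproducible. Because the shuffled table holds each ID
once, the new numbers run from 1 to the number of participants with no gaps.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: replace each distinct ID with a random integer from 1 to n
anonymize_ids &lt;- function(x) {
  u &lt;- unique(x)
  # Shuffle by index: sample() on a lone number n would draw from 1:n instead
  match(x, u[sample.int(length(u))])
}

set.seed(20211012)  # only so that the shuffle is reproducible
ids &lt;- c("DB", "DB", "DB", "TW", "TW", "CF", "CF", "JH", "JH", "JH")
new_ids &lt;- anonymize_ids(ids)

# The new IDs are exactly 1 to 4, with no gaps
sort(unique(new_ids))
#&gt; [1] 1 2 3 4
</code></pre></div></div>

<p>Indexing with <code class="language-plaintext highlighter-rouge">sample.int()</code> instead of calling <code class="language-plaintext highlighter-rouge">sample(unique(x))</code> is a
defensive choice: as <code class="language-plaintext highlighter-rouge">?sample</code> documents, when <code class="language-plaintext highlighter-rouge">sample()</code> gets a single number n
(at least 1), it samples from <code class="language-plaintext highlighter-rouge">1:n</code>, which would bite if the data had only one
numeric ID. For character IDs like these, both versions behave the same.</p>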

<p>For more aesthetically pleasing names, and for names that will sort
correctly, we can zero-pad the results with
<a href="https://rdrr.io/r/base/sprintf.html"><code class="language-plaintext highlighter-rouge">sprintf()</code></a>. I am mostly
including this step so that I have it written down somewhere for my own
reference.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">zero_pad</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="c1"># use widest element if bigger than `width`</span><span class="w">
  </span><span class="n">width</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">nchar</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span><span class="w"> </span><span class="n">width</span><span class="p">))</span><span class="w">
  </span><span class="n">sprintf</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="s2">"%0"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="p">,</span><span class="w"> </span><span class="s2">"d"</span><span class="p">),</span><span class="w"> </span><span class="n">xs</span><span class="p">)</span><span class="w">    
</span><span class="p">}</span><span class="w">

</span><span class="n">data</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="w">
    </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">participant</span><span class="p">))),</span><span class="w">
    </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">zero_pad</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="s2">"p"</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#&gt;    participant timepoint score</span><span class="w">
</span><span class="c1">#&gt;    &lt;chr&gt;           &lt;dbl&gt; &lt;dbl&gt;</span><span class="w">
</span><span class="c1">#&gt;  1 p003                1     7</span><span class="w">
</span><span class="c1">#&gt;  2 p003                2     8</span><span class="w">
</span><span class="c1">#&gt;  3 p003                3     8</span><span class="w">
</span><span class="c1">#&gt;  4 p004                1    NA</span><span class="w">
</span><span class="c1">#&gt;  5 p004                2     9</span><span class="w">
</span><span class="c1">#&gt;  6 p002                1     9</span><span class="w">
</span><span class="c1">#&gt;  7 p002                2     8</span><span class="w">
</span><span class="c1">#&gt;  8 p001                1    10</span><span class="w">
</span><span class="c1">#&gt;  9 p001                2    10</span><span class="w">
</span><span class="c1">#&gt; 10 p001                3    10</span><span class="w">
</span></code></pre></div></div>
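
<p>The <code class="language-plaintext highlighter-rouge">max()</code> line in <code class="language-plaintext highlighter-rouge">zero_pad()</code> makes <code class="language-plaintext highlighter-rouge">width</code> a minimum rather than a fixed
size: if some number has more digits than <code class="language-plaintext highlighter-rouge">width</code>, every label is padded to
that wider size, so the labels still sort in numeric order as text. A quick
check with made-up numbers (the definition is repeated so the snippet runs on
its own):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zero_pad &lt;- function(xs, prefix = "", width = 0) {
  # use widest element if bigger than `width`
  width &lt;- max(c(nchar(xs), width))
  sprintf(paste0(prefix, "%0", width, "d"), xs)
}

# 1000 has 4 digits, so all labels get 4 digits even though width = 3
zero_pad(c(1L, 10L, 100L, 1000L), "p", 3)
#&gt; [1] "p0001" "p0010" "p0100" "p1000"

# Padding "9" to "09" lets it sort before "10" as a string
sort(zero_pad(c(10L, 9L), "p"))
#&gt; [1] "p09" "p10"
</code></pre></div></div>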

<h3 id="bonus-match-in-disguise">Bonus: <code class="language-plaintext highlighter-rouge">match()</code> <code class="language-plaintext highlighter-rouge">%in%</code> disguise</h3>

<p>What happens when <code class="language-plaintext highlighter-rouge">match()</code> fails to find an <code class="language-plaintext highlighter-rouge">x</code> in the table? By
default, we get <code class="language-plaintext highlighter-rouge">NA</code>. But we can customize the results with the
<code class="language-plaintext highlighter-rouge">nomatch</code> argument.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] NA  1 12</span><span class="w">
</span><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-99</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] -99   1  12</span><span class="w">
</span><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1]  0  1 12</span><span class="w">
</span></code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">nomatch = 0</code>, as in this last example, we can check whether each
element of <code class="language-plaintext highlighter-rouge">x</code> has a match by testing for positions greater than 0.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="c1">#&gt; [1] FALSE  TRUE  TRUE</span><span class="w">
</span></code></pre></div></div>

<p>And that is how the functions <a href="https://rdrr.io/r/base/match.html"><code class="language-plaintext highlighter-rouge">%in%</code></a> and <a href="https://rdrr.io/r/base/sets.html"><code class="language-plaintext highlighter-rouge">is.element()</code></a> are implemented
behind the scenes:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">)</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nb">LETTERS</span><span class="w">
</span><span class="c1">#&gt; [1] FALSE  TRUE  TRUE</span><span class="w">

</span><span class="c1"># The L suffix makes 0L an integer instead of a floating-point number</span><span class="w">
</span><span class="n">`%in%`</span><span class="w">
</span><span class="c1">#&gt; function (x, table) </span><span class="w">
</span><span class="c1">#&gt; match(x, table, nomatch = 0L) &gt; 0L</span><span class="w">
</span><span class="c1">#&gt; &lt;bytecode: 0x0000019f10fbf0a0&gt;</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:base&gt;</span><span class="w">

</span><span class="n">is.element</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] FALSE  TRUE  TRUE</span><span class="w">

</span><span class="n">is.element</span><span class="w">
</span><span class="c1">#&gt; function (el, set) </span><span class="w">
</span><span class="c1">#&gt; match(as.vector(el), as.vector(set), 0L) &gt; 0L</span><span class="w">
</span><span class="c1">#&gt; &lt;bytecode: 0x0000019f13c60db0&gt;</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:base&gt;</span><span class="w">
</span></code></pre></div></div>
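
<p>Since <code class="language-plaintext highlighter-rouge">%in%</code> is just a <code class="language-plaintext highlighter-rouge">match()</code> call, flipping the comparison gives a
“not in” operator. A minimal sketch; <code class="language-plaintext highlighter-rouge">%notin%</code> is a name made up here, not a
function that base R provides:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical operator: TRUE where an element of x has no match in table
`%notin%` &lt;- function(x, table) {
  match(x, table, nomatch = 0L) == 0L
}

c("7", "A", "L") %notin% LETTERS
#&gt; [1]  TRUE FALSE FALSE
</code></pre></div></div>

<p>This gives the same result as <code class="language-plaintext highlighter-rouge">!(x %in% table)</code>; writing it with <code class="language-plaintext highlighter-rouge">match()</code> just
mirrors the definition of <code class="language-plaintext highlighter-rouge">%in%</code> printed above.</p>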

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-10-12-one-liner-to-generate-ids.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package     * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  broom         0.8.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r         0.30.1  2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here          1.0.1   2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  httr          1.4.3   2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ids           1.0.1   2017-05-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr       * 1.39    2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg          1.2.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts   1.0.4   2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  uuid          1.1-0   2022-04-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><summary type="html"><![CDATA[Find a `match()` in your base R library]]></summary></entry><entry><title type="html">Keep your R scripts locally sourced</title><link href="https://tjmahr.github.io/keep-it-locally-sourced/" rel="alternate" type="text/html" title="Keep your R scripts locally sourced" /><published>2021-08-16T00:00:00-05:00</published><updated>2021-08-16T00:00:00-05:00</updated><id>https://tjmahr.github.io/keep-it-locally-sourced</id><content type="html" xml:base="https://tjmahr.github.io/keep-it-locally-sourced/"><![CDATA[<p>A few weeks ago, I had a <em>bad</em> debugging session. The code was just not
doing what I expected, and I went down a lot of dead ends trying to fix
or simplify things. I could not get the problem to happen in a
reproducible example (<a href="https://reprex.tidyverse.org/">reprex</a>) or
interactively (in RStudio). Eventually, the minimal example of the
problem completely broke my mental model for how the code should work.</p>

<p>The problem had to do with names and what they mean. <code class="language-plaintext highlighter-rouge">select()</code> is a
function that lives in both the MASS package and the dplyr package, and I
always intend for <code class="language-plaintext highlighter-rouge">select()</code> to point to
<a href="https://dplyr.tidyverse.org/reference/select.html"><code class="language-plaintext highlighter-rouge">dplyr::select()</code></a>.
But sometimes a statistics package will attach MASS and mask
<code class="language-plaintext highlighter-rouge">select()</code> so that it points to
<a href="https://rdrr.io/pkg/MASS/man/lm.ridge.html"><code class="language-plaintext highlighter-rouge">MASS::select()</code></a>. And in
this case, my attempts to use <code class="language-plaintext highlighter-rouge">select()</code> in a
<a href="https://rdrr.io/r/base/source.html"><code class="language-plaintext highlighter-rouge">source()</code></a>-ed file kept reverting
to <code class="language-plaintext highlighter-rouge">MASS::select()</code> instead of <code class="language-plaintext highlighter-rouge">dplyr::select()</code>. A tweet from the
session shows the minimal example and my wracked brain. (I will describe
the example in more detail below.)</p>

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">i&#39;m dry heaving here wtf is going <a href="https://t.co/KIeRJT6kwY">pic.twitter.com/KIeRJT6kwY</a></p>

  <img src="/assets/images/2021-08-wtf-debugging.jpg" width="60%" alt="Code/output where I map `select` to `dplyr::select`, create a file with one function that prints the environment of `select`, print `select` (namespace:dplyr), call the function (namespace:MASS), and print `select` (namespace:dplyr)" />
  <br />
  &mdash; tj mahr 🍍🍕 (@tjmahr) <a href="https://twitter.com/tjmahr/status/1417894498080800769?ref_src=twsrc%5Etfw">July 21, 2021</a>
</blockquote>

<p>Here’s what happens:</p>

<ol>
  <li>I explicitly assign <code class="language-plaintext highlighter-rouge">select</code> to <code class="language-plaintext highlighter-rouge">dplyr::select()</code>.</li>
  <li>I make a function <code class="language-plaintext highlighter-rouge">f()</code> that prints the environment of <code class="language-plaintext highlighter-rouge">select</code>
(where the name/function is defined), store the function in a <code class="language-plaintext highlighter-rouge">.R</code>
text file, and <code class="language-plaintext highlighter-rouge">source()</code> that file. (<code class="language-plaintext highlighter-rouge">source()</code> runs the code
in an R script.)</li>
  <li>I print the value of <code class="language-plaintext highlighter-rouge">select</code> and see that it is indeed from the
dplyr environment.</li>
  <li>I call my function, and it says that <code class="language-plaintext highlighter-rouge">select</code> is actually in the
MASS package.</li>
  <li>I check the value of <code class="language-plaintext highlighter-rouge">select</code>, and it reports the dplyr environment
once again.</li>
</ol>

<h2 id="a-similar-problem-using-functions">A similar problem using functions</h2>

<p>This problem only happened while knitting <a href="https://github.com/tjmahr/notestar" title="My notebook system">one of my analysis
notebooks</a> (which was a clue). Right now, it’s proving
difficult for me to write examples of this problem for this blog post, so
I’m going to show the source 😉 of the problem using functions.</p>

<p>First, let’s set things up so that <code class="language-plaintext highlighter-rouge">select</code> belongs to the MASS package.
We are also going to use the <a href="https://conflicted.r-lib.org/" title="conflicted: An Alternative Conflict Resolution Strategy">conflicted</a> package, which normally guards against
package name <em>conflicts</em> by turning an ambiguous name into an error. This part isn’t necessary or
helpful; I just want to illustrate that this is not a simple name
conflict problem.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">conflicted</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">MASS</span><span class="p">)</span><span class="w">
</span><span class="n">environment</span><span class="p">(</span><span class="n">select</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:MASS&gt;</span><span class="w">
</span></code></pre></div></div>

<p>We are going to make a function that does what my original code example
tried to do:</p>

<ul>
  <li>set <code class="language-plaintext highlighter-rouge">select</code> to dplyr explicitly</li>
  <li><code class="language-plaintext highlighter-rouge">source()</code> a file that defines a function returning the environment of <code class="language-plaintext highlighter-rouge">select</code></li>
  <li>return the environment of <code class="language-plaintext highlighter-rouge">select</code>, both using the <code class="language-plaintext highlighter-rouge">source()</code>-ed
function and directly.</li>
</ul>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">source_in_my_code</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="c1"># set dplyr select</span><span class="w">
  </span><span class="n">select</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dplyr</span><span class="o">::</span><span class="n">select</span><span class="w">
  
  </span><span class="c1"># write a script to temporary file</span><span class="w">
  </span><span class="n">temp_script</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".R"</span><span class="p">)</span><span class="w">
  </span><span class="n">my_code</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"
    f &lt;- function() environment(select)
  "</span><span class="w">
  </span><span class="n">writeLines</span><span class="p">(</span><span class="n">my_code</span><span class="p">,</span><span class="w"> </span><span class="n">temp_script</span><span class="p">)</span><span class="w">
  
  </span><span class="c1"># run the script</span><span class="w">
  </span><span class="n">source</span><span class="p">(</span><span class="n">temp_script</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w">
  
  </span><span class="nf">list</span><span class="p">(</span><span class="w">
    </span><span class="n">source_select_environment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">f</span><span class="p">(),</span><span class="w">
    </span><span class="n">function_select_environment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">environment</span><span class="p">(</span><span class="n">select</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">


</span><span class="n">default_results</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">source_in_my_code</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>

<p>What do you think the <code class="language-plaintext highlighter-rouge">select</code> environment should be? dplyr, right?
That’s what <code class="language-plaintext highlighter-rouge">select</code> means everywhere else inside of the function.
<code class="language-plaintext highlighter-rouge">source()</code> is just like dropping in some R code and running it, right?
That’s what I thought.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">default_results</span><span class="w">
</span><span class="c1">#&gt; $source_select_environment</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:MASS&gt;</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; $function_select_environment</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:dplyr&gt;</span><span class="w">
</span></code></pre></div></div>

<p>No, it’s the MASS environment. 😕</p>

<h2 id="local-and-parent-environments">Local and parent environments</h2>

<p>To understand what’s happening, let’s first note that R works
by evaluating expressions in an environment. The environment defines the
values of names. If a name is not found in an environment, R searches the
parent environment for the name (then the parent’s parent, and so on).
This idea is <a href="https://adv-r.hadley.nz/environments.html#parents">illustrated beautifully in <em>Advanced R</em> using
diagrams</a>.</p>

<p>For an analogy, you might think of name lookup as looking someone up in
an office directory, then a building directory, then a city directory:</p>

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">I like the multi-company building analogy. If you want to call Jim, first you look in your company directory. If there isn’t a Jim there, you look in the all-building maintenance dir. If not there, you look in the city services dir. You don’t look in another company-specific dir
  </p>
  &mdash; Brenton Wiernik 🏳️‍🌈 (@bmwiernik) <a href="https://twitter.com/bmwiernik/status/1387164714451488772?ref_src=twsrc%5Etfw">April 27, 2021</a>
</blockquote>
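<p>To make the directory analogy concrete, here is a minimal sketch that
builds a chain of environments by hand with <code class="language-plaintext highlighter-rouge">new.env()</code> and looks
names up with <code class="language-plaintext highlighter-rouge">get()</code> and <code class="language-plaintext highlighter-rouge">exists()</code>. The environment names
(<code class="language-plaintext highlighter-rouge">city</code>, <code class="language-plaintext highlighter-rouge">building</code>, <code class="language-plaintext highlighter-rouge">company</code>) and their contents are made up
for illustration.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Three nested "directories", built by hand (hypothetical names)
city = new.env(parent = emptyenv())
building = new.env(parent = city)
company = new.env(parent = building)

city$plumber = "city services directory"
building$jim = "building maintenance directory"

# get() searches the starting environment, then each parent in turn
get("jim", envir = company)
#&gt; [1] "building maintenance directory"
get("plumber", envir = company)
#&gt; [1] "city services directory"

# With inherits = FALSE, the search stops at the first directory
exists("jim", envir = company, inherits = FALSE)
#&gt; [1] FALSE
</code></pre></div></div>

<p>Nobody looks in another company’s directory: the search only follows
the chain of parents, which ends at the empty environment.</p>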

<p>Here is a small example showing a local function environment, its
parent environment, and how a name takes different values depending on
the context.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">where_am_i</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"outside of the function"</span><span class="w">
</span><span class="n">where_are_you</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"outside of the function too"</span><span class="w">

</span><span class="n">where_is_everyone</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">where_am_i</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"inside of the function"</span><span class="w">
  </span><span class="nf">list</span><span class="p">(</span><span class="w">
    </span><span class="n">where_am_i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">where_am_i</span><span class="p">,</span><span class="w">
    </span><span class="n">where_are_you</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">where_are_you</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w"> 

</span><span class="n">where_am_i</span><span class="w">
</span><span class="c1">#&gt; [1] "outside of the function"</span><span class="w">
</span><span class="n">where_is_everyone</span><span class="p">()</span><span class="w">
</span><span class="c1">#&gt; $where_am_i</span><span class="w">
</span><span class="c1">#&gt; [1] "inside of the function"</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; $where_are_you</span><span class="w">
</span><span class="c1">#&gt; [1] "outside of the function too"</span><span class="w">
</span><span class="n">where_am_i</span><span class="w">
</span><span class="c1">#&gt; [1] "outside of the function"</span><span class="w">
</span></code></pre></div></div>

<p>Outside of the function, <code class="language-plaintext highlighter-rouge">where_am_i</code> is <code class="language-plaintext highlighter-rouge">"outside of the function"</code>,
but in the body of the function, it is defined as <code class="language-plaintext highlighter-rouge">"inside of the
function"</code>. The variable <code class="language-plaintext highlighter-rouge">where_are_you</code> is <em>only</em> defined outside
of the function (as <code class="language-plaintext highlighter-rouge">"outside of the function too"</code>), so the function
has to search for the variable in its parent environment.</p>
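<p>We can also ask a function for its parent environment directly. In this
sketch, the helper <code class="language-plaintext highlighter-rouge">show_my_parent()</code> is a hypothetical name, and the
result assumes the function is defined at the top level of an ordinary R
session (the console or <code class="language-plaintext highlighter-rouge">Rscript</code>).</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A hypothetical helper that reports its own parent environment
show_my_parent = function() {
  # environment() is the local environment for this call;
  # its parent is the environment where the function was defined
  parent.env(environment())
}

# Defined at the top level of a session, so the parent is global
identical(show_my_parent(), globalenv())
#&gt; [1] TRUE
</code></pre></div></div>

<p>Code written directly in a function body gets this chain of lookups:
the local environment first, then the environment where the function was
defined.</p>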

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">&quot;parent&quot; environment suggests a family metaphor. if you cant find what a symbol means, ask a parent.</p>
  &mdash; tj mahr 🍍🍕 (@tjmahr) <a href="https://twitter.com/tjmahr/status/1387087953982328833?ref_src=twsrc%5Etfw">April 27, 2021</a>
</blockquote>

<h2 id="locally-sourced-r-code">Locally sourced R code</h2>

<p>Reading the <a href="https://rdrr.io/r/base/source.html">documentation for <code class="language-plaintext highlighter-rouge">source()</code></a>, we find the solution to the
original problem:</p>

<blockquote>
  <p><strong>Arguments</strong></p>

  <p><strong><code class="language-plaintext highlighter-rouge">local</code></strong> <br />
<code class="language-plaintext highlighter-rouge">TRUE</code>, <code class="language-plaintext highlighter-rouge">FALSE</code> or an environment, determining where the parsed
expressions are evaluated. <code class="language-plaintext highlighter-rouge">FALSE</code> (the default) corresponds to the
user’s workspace (the global environment) and <code class="language-plaintext highlighter-rouge">TRUE</code> to the
environment from which <code class="language-plaintext highlighter-rouge">source</code> is called.</p>
</blockquote>

<p>By default, the code evaluated by <code class="language-plaintext highlighter-rouge">source()</code> runs in the global
environment, that is, “outside” of the body of the function. The code
<em>breaks out</em> of the function environment and runs in the top-level
environment instead.</p>
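<p>Here is a minimal sketch of that escape, using a throwaway script written
to a temporary file. The wrapper <code class="language-plaintext highlighter-rouge">run_script()</code> and the variable
<code class="language-plaintext highlighter-rouge">made_by_script</code> are hypothetical names; the point is only where the
script’s assignment ends up when <code class="language-plaintext highlighter-rouge">source()</code> uses its default
<code class="language-plaintext highlighter-rouge">local = FALSE</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Write a one-line script to a temporary file
script = tempfile(fileext = ".R")
writeLines('made_by_script = "hello from the script"', script)

# Hypothetical wrapper: source the script from inside a function
run_script = function(...) {
  here = environment()  # this call's local environment
  source(script, ...)   # default: local = FALSE
  # Did the script's assignment land in the local environment?
  exists("made_by_script", envir = here, inherits = FALSE)
}

run_script()
#&gt; [1] FALSE

# It landed in the global environment instead
exists("made_by_script", envir = globalenv(), inherits = FALSE)
#&gt; [1] TRUE
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">run_script(local = TRUE)</code> returns <code class="language-plaintext highlighter-rouge">TRUE</code>, because the
script is then evaluated in the function’s own environment.</p>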

<p>My mental model for <code class="language-plaintext highlighter-rouge">source()</code> was completely wrong. <code class="language-plaintext highlighter-rouge">source()</code> is not
like dropping in the R code from a file and running it. It is more like
pausing everything that you’re doing in your current context, backing
out to the highest level context, running that code, and then resuming
what you’re doing.</p>

<p>Fortunately, if we ask <code class="language-plaintext highlighter-rouge">source()</code> to run locally (<code class="language-plaintext highlighter-rouge">local = TRUE</code>), <code class="language-plaintext highlighter-rouge">select</code>
has the same environment inside the function and in the code run using
<code class="language-plaintext highlighter-rouge">source()</code>.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># I defined the function so it could pass arguments to source()</span><span class="w">
</span><span class="n">source_in_my_code</span><span class="p">(</span><span class="n">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; $source_select_environment</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:dplyr&gt;</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; $function_select_environment</span><span class="w">
</span><span class="c1">#&gt; &lt;environment: namespace:dplyr&gt;</span><span class="w">
</span></code></pre></div></div>

<p>When we’re using <code class="language-plaintext highlighter-rouge">source()</code> in the first few lines of an R
script that itself runs in the global environment, the default global
environment for <code class="language-plaintext highlighter-rouge">source()</code> doesn’t really matter. But in contexts like
the function example or code run by a custom knitr/R Markdown setup (my
original problem), this difference <em>is</em> a problem. Therefore, in the
future, I’m going to abide by the motto <em>Keep it locally sourced</em>. This
habit fits my mental model for <code class="language-plaintext highlighter-rouge">source()</code>
as something that drops in R code and runs it in place.</p>
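<p>Besides <code class="language-plaintext highlighter-rouge">TRUE</code> and <code class="language-plaintext highlighter-rouge">FALSE</code>, the documentation quoted above says
<code class="language-plaintext highlighter-rouge">local</code> can be an environment. As a minimal sketch, sourcing into a
fresh environment from <code class="language-plaintext highlighter-rouge">new.env()</code> keeps a script’s definitions out of
the global environment. The temporary script and the helper
<code class="language-plaintext highlighter-rouge">double_it()</code> are hypothetical.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A throwaway script that defines one helper function
script = tempfile(fileext = ".R")
writeLines("double_it = function(x) x * 2", script)

# Evaluate the script in a fresh environment instead of the global one
script_env = new.env()
source(script, local = script_env)

script_env$double_it(21)
#&gt; [1] 42
exists("double_it", envir = globalenv(), inherits = FALSE)
#&gt; [1] FALSE
</code></pre></div></div>

<p>By default, <code class="language-plaintext highlighter-rouge">new.env()</code> uses the environment it was called from as its
parent, so code in the script can still find names defined there.</p>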

<p>And by the way, yes, even though I cited <em>Advanced R</em> above, I clearly
did not do all of the exercises:</p>

<blockquote>
  <p><a href="https://adv-r.hadley.nz/evaluation.html#exercises-61">20.2.4 Exercises</a></p>

  <ol>
    <li>Carefully read the documentation for <code class="language-plaintext highlighter-rouge">source()</code>. What environment
does it use by default? What if you supply <code class="language-plaintext highlighter-rouge">local = TRUE</code>? How do
you provide a custom environment?</li>
  </ol>
</blockquote>

<hr />

<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-08-16-keep-it-locally-sourced.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:si" role="doc-endnote">

      <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#&gt; ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  setting  value</span><span class="w">
</span><span class="c1">#&gt;  version  R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#&gt;  os       Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#&gt;  system   x86_64, mingw32</span><span class="w">
</span><span class="c1">#&gt;  ui       RTerm</span><span class="w">
</span><span class="c1">#&gt;  language (EN)</span><span class="w">
</span><span class="c1">#&gt;  collate  English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  ctype    English_United States.utf8</span><span class="w">
</span><span class="c1">#&gt;  tz       America/Chicago</span><span class="w">
</span><span class="c1">#&gt;  date     2022-05-27</span><span class="w">
</span><span class="c1">#&gt;  pandoc   NA</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#&gt;  package     * version    date (UTC) lib source</span><span class="w">
</span><span class="c1">#&gt;  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cachem        1.0.6      2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  cli           3.3.0      2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  conflicted  * 1.1.0      2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  crayon        1.5.1      2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  DBI           1.1.2      2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  dplyr         1.0.9      2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  emo           0.0.0.9000 2022-05-25 [1] Github (hadley/emo@3f03b11)</span><span class="w">
</span><span class="c1">#&gt;  evaluate      0.15       2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fansi         1.0.3      2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  generics      0.1.2      2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  git2r         0.30.1     2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  here          1.0.1      2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  knitr       * 1.39       2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  MASS        * 7.3-56     2022-03-23 [2] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  memoise       2.0.1      2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  ragg          1.2.2      2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rprojroot     2.0.3      2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  systemfonts   1.0.4      2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  textshaping   0.3.6      2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tibble        3.1.7      2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  vctrs         0.4.1      2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt;  xfun          0.31       2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt;  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#&gt;  [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#&gt; </span><span class="w">
</span><span class="c1">#&gt; ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div>      </div>
      <p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tristan Mahr</name><email>tjmahrweb@gmail.com</email></author><category term="r" /><category term="knitr" /><category term="nonstandard evaluation" /><summary type="html"><![CDATA[A lesson from debugging `source()`]]></summary></entry></feed>