<h1>Ordering constraints in brms using contrast coding</h1>
<p>Tristan Mahr, <a href="https://tjmahr.github.io/bayesian-ordering-constraint">Higher Order Functions</a>, 2023-07-03</p>
<p>Mattan S. Ben-Shachar wrote an <a href="https://blog.msbstats.info/posts/2023-06-26-order-constraints-in-brms/" title="Order Constraints in Bayes Models (with brms)">excellent tutorial</a>
about how to impose ordering constraints in Bayesian regression models.
In that post, the data comes from archaeology (inspired by
<a href="https://arxiv.org/abs/1704.07141">Buck, 2017</a> but not an exact copy).
We have samples from different layers (<code class="language-plaintext highlighter-rouge">Layer</code>) in a site, and for each
sample, we have a <code class="language-plaintext highlighter-rouge">C14</code> radiocarbon date measurement and its associated
measurement <code class="language-plaintext highlighter-rouge">error</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">table1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tribble</span><span class="p">(</span><span class="w">
</span><span class="o">~</span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">C14</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="n">error</span><span class="p">,</span><span class="w">
</span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5773</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5654</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"B"</span><span class="p">,</span><span class="w"> </span><span class="m">-5585</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"C"</span><span class="p">,</span><span class="w"> </span><span class="m">-5861</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"C"</span><span class="p">,</span><span class="w"> </span><span class="m">-5755</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5850</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
</span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5928</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
</span><span class="s2">"E"</span><span class="p">,</span><span class="w"> </span><span class="m">-5905</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
</span><span class="s2">"G"</span><span class="p">,</span><span class="w"> </span><span class="m">-6034</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"G"</span><span class="p">,</span><span class="w"> </span><span class="m">-6184</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w">
</span><span class="s2">"I"</span><span class="p">,</span><span class="w"> </span><span class="m">-6248</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w">
</span><span class="s2">"I"</span><span class="p">,</span><span class="w"> </span><span class="m">-6350</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Because of how the layers are ordered—new stuff piled on top of older
stuff—we <em>a priori</em> expect deeper layers to have older dates, so these
are the ordering constraints:</p>
\[\mu_{\text{Layer I}} < \mu_{\text{Layer G}} < \mu_{\text{Layer E}} < \mu_{\text{Layer C}} < \mu_{\text{Layer B}}\]
<p>where <em>μ</em> is the average <code class="language-plaintext highlighter-rouge">C14</code> age of a layer.</p>
<p>Ben-Shachar’s post works through some ways in brms to achieve this
constraint:</p>
<ol>
<li>
<p>Fit the usual model but filter out posterior draws where the
ordering constraint is violated.</p>
</li>
<li>
<p>Have the Stan sampler <code class="language-plaintext highlighter-rouge">reject</code> draws where the constraint is
violated. But note that the <a href="https://mc-stan.org/docs/reference-manual/reject-statements.html" title="Stan Manual: Reject statements">documentation for
<code class="language-plaintext highlighter-rouge">reject</code></a> has a section titled “Rejection is not for
constraints”.</p>
</li>
<li>
<p>Use brms’s monotonic effect <a href="https://paul-buerkner.github.io/brms/articles/brms_monotonic.html" title="Estimating Monotonic Effects with brms"><code class="language-plaintext highlighter-rouge">mo()</code></a> syntax.</p>
</li>
</ol>
<p>In this post, I am going to add another option to this list:</p>
<ol start="4"><li> Use contrast coding so the model parameters
represent the differences between successive levels, and use priors to enforce
the ordering constraint.</li></ol>
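<p>To see why successive differences fit this problem, note that the chain of inequalities above can be restated as four differences between adjacent layers, each of which has to be positive. This is only a rewriting of the constraint, not an extra assumption:</p>
\[\mu_{\text{Layer G}} - \mu_{\text{Layer I}} > 0, \quad
\mu_{\text{Layer E}} - \mu_{\text{Layer G}} > 0, \quad
\mu_{\text{Layer C}} - \mu_{\text{Layer E}} > 0, \quad
\mu_{\text{Layer B}} - \mu_{\text{Layer C}} > 0\]
<p>If each of these differences is a model parameter, then a prior that only allows positive values for those parameters would enforce the ordering.</p>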
<h2 id="big-idea-of-contrast-coding">Big idea of contrast coding</h2>
<p>When our model includes categorical variables, we need some way to code
those variables in our model (that is, use numbers to represent the
category levels). Our choice of coding scheme will change the meaning of
the model parameters, allowing us to perform different comparisons (test
different statistical hypotheses) about the means of the category
levels. Let’s spell that out again, because it is the big idea of
contrast coding:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>different contrast coding schemes <->
different parameter meanings <->
different comparisons / hypotheses
</code></pre></div></div>
<p>(Isn’t that an eye-popping graphic?)</p>
<p>The toolbox of contrast coding schemes is deep but also confusing.
Whenever I step away from R’s default contrast coding, I usually have
these pages open to help me: <a href="https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/" title="https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/">some tutorial on a UCLA
page</a>, Lisa DeBruine’s <a href="https://debruine.github.io/faux/articles/contrasts.html">comparison
article</a>, and the menu of <a href="https://rdrr.io/pkg/emmeans/man/emmc-functions.html">contrast schemes in
emmeans</a>. So, let’s
review the basics by looking at R’s default contrast coding scheme.</p>
<h2 id="the-default-dummy-coding">The default: dummy coding</h2>
<p>By default, R will code categorical variables in a regression model
using “treatment” or “dummy” coding. In this scheme,</p>
<ul>
<li>The intercept is the mean of one of the category levels (the
<em>reference level</em>)</li>
<li>Parameters estimate the difference between each other level and the
reference level</li>
</ul>
<p>Let’s fit a simple linear model and work through the parameter meanings:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">)</span><span class="w">
</span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#> (Intercept) LayerC LayerE LayerG LayerI </span><span class="w">
</span><span class="c1">#> -5670.6667 -137.3333 -223.6667 -438.3333 -628.3333</span><span class="w">
</span></code></pre></div></div>
<p>Here, the <code class="language-plaintext highlighter-rouge">(Intercept)</code> is the mean of the reference level, and the
reference level is the level of the categorical variable not listed in
the other parameter names (<code class="language-plaintext highlighter-rouge">LayerB</code>). Each of the other parameters is a
difference from that reference level. Layer C’s mean is <code class="language-plaintext highlighter-rouge">(Intercept)</code> +
<code class="language-plaintext highlighter-rouge">LayerC</code>. The <a href="https://rdrr.io/r/stats/model.matrix.html"><code class="language-plaintext highlighter-rouge">model.matrix()</code></a> shows how these
categorical variables are coded in the model’s design/contrast matrix:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Matrix has 1 row per observation but we just want 1 per category level</span><span class="w">
</span><span class="n">mat_m1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">m1</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">model.matrix</span><span class="p">()</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">unique</span><span class="p">()</span><span class="w">
</span><span class="n">mat_m1</span><span class="w">
</span><span class="c1">#> (Intercept) LayerC LayerE LayerG LayerI</span><span class="w">
</span><span class="c1">#> 1 1 0 0 0 0</span><span class="w">
</span><span class="c1">#> 4 1 1 0 0 0</span><span class="w">
</span><span class="c1">#> 6 1 0 1 0 0</span><span class="w">
</span><span class="c1">#> 9 1 0 0 1 0</span><span class="w">
</span><span class="c1">#> 11 1 0 0 0 1</span><span class="w">
</span></code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">(Intercept)</code> is the model constant, so naturally, it’s switched on
(equals 1) for every row. Each of the other columns is an <em>indicator
variable</em>. <code class="language-plaintext highlighter-rouge">LayerC</code> turns on for the layer C rows, <code class="language-plaintext highlighter-rouge">LayerE</code> turns on
for layer E rows, and so on.</p>
<p>Matrix multiplying the contrast matrix by the model coefficients will
compute the mean values of each layer.</p>
\[\mathbf{\hat y} = \mathbf{X}\boldsymbol{\beta}\]
<p>Think of this equation as a contract for a contrast coding scheme:
Multiplying the contrast matrix by the model coefficients should give us
the means of the category levels.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m1</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [,1]</span><span class="w">
</span><span class="c1">#> 1 -5670.667</span><span class="w">
</span><span class="c1">#> 4 -5808.000</span><span class="w">
</span><span class="c1">#> 6 -5894.333</span><span class="w">
</span><span class="c1">#> 9 -6109.000</span><span class="w">
</span><span class="c1">#> 11 -6299.000</span><span class="w">
</span><span class="c1"># Means by hand</span><span class="w">
</span><span class="n">aggregate</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Layer C14</span><span class="w">
</span><span class="c1">#> 1 B -5670.667</span><span class="w">
</span><span class="c1">#> 2 C -5808.000</span><span class="w">
</span><span class="c1">#> 3 E -5894.333</span><span class="w">
</span><span class="c1">#> 4 G -6109.000</span><span class="w">
</span><span class="c1">#> 5 I -6299.000</span><span class="w">
</span></code></pre></div></div>
<p>If the matrix multiplication is too quick, here it is in slow motion
where each column has been weighted (multiplied) by its coefficient:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Sums of the rows are the means</span><span class="w">
</span><span class="n">mat_m1</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">diag</span><span class="p">(</span><span class="n">coef</span><span class="p">(</span><span class="n">m1</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [,1] [,2] [,3] [,4] [,5]</span><span class="w">
</span><span class="c1">#> 1 -5670.667 0.0000 0.0000 0.0000 0.0000</span><span class="w">
</span><span class="c1">#> 4 -5670.667 -137.3333 0.0000 0.0000 0.0000</span><span class="w">
</span><span class="c1">#> 6 -5670.667 0.0000 -223.6667 0.0000 0.0000</span><span class="w">
</span><span class="c1">#> 9 -5670.667 0.0000 0.0000 -438.3333 0.0000</span><span class="w">
</span><span class="c1">#> 11 -5670.667 0.0000 0.0000 0.0000 -628.3333</span><span class="w">
</span></code></pre></div></div>
<h3 id="successive-differences-coding">Successive differences coding</h3>
<p>Now, let’s look at a different kind of coding: (reverse) successive differences
coding. In this scheme:</p>
<ul>
<li>The intercept is the mean of the level means</li>
<li>Parameters estimate the difference between adjacent levels</li>
<li>But I have to reverse how the levels are ordered in the underlying
<a href="https://rdrr.io/r/base/factor.html"><code class="language-plaintext highlighter-rouge">factor()</code></a> so that the differences are positive, comparing each
layer with the one <em>below</em> it. (<code class="language-plaintext highlighter-rouge">LayerB - LayerC</code> should be positive.)</li>
</ul>
<p>We apply this coding by creating a new factor and setting the
<a href="https://rdrr.io/r/stats/contrasts.html"><code class="language-plaintext highlighter-rouge">contrasts()</code></a>. R lets us set
the contrast to the name of a function that computes contrasts, so
we use <code class="language-plaintext highlighter-rouge">"contr.sdif"</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">contr.sdif</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">MASS</span><span class="o">::</span><span class="n">contr.sdif</span><span class="w">
</span><span class="c1"># Reverse the factor levels</span><span class="w">
</span><span class="n">table1</span><span class="o">$</span><span class="n">LayerAlt</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">rev</span><span class="p">(</span><span class="n">levels</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">Layer</span><span class="p">)))</span><span class="w">
</span><span class="n">contrasts</span><span class="p">(</span><span class="n">table1</span><span class="o">$</span><span class="n">LayerAlt</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"contr.sdif"</span><span class="w">
</span></code></pre></div></div>
<p>Then we just fit the model as usual. As intended, the model’s
coefficients are different.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">)</span><span class="w">
</span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#> (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C </span><span class="w">
</span><span class="c1">#> -5956.20000 190.00000 214.66667 86.33333 137.33333</span><span class="w">
</span></code></pre></div></div>
<p>We can compute the mean of layer means and the layer differences by hand
to confirm that the model parameters are computing what we expect.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Make a list so we can write out the diffs easily</span><span class="w">
</span><span class="n">layer_means</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">table1</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">split</span><span class="p">(</span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">lapply</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">mean</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">C14</span><span class="p">))</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">layer_means</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 5</span><span class="w">
</span><span class="c1">#> $ B: num -5671</span><span class="w">
</span><span class="c1">#> $ C: num -5808</span><span class="w">
</span><span class="c1">#> $ E: num -5894</span><span class="w">
</span><span class="c1">#> $ G: num -6109</span><span class="w">
</span><span class="c1">#> $ I: num -6299</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">model_coef</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">),</span><span class="w">
</span><span class="n">by_hand</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">unlist</span><span class="p">(</span><span class="n">layer_means</span><span class="p">)),</span><span class="w">
</span><span class="n">layer_means</span><span class="o">$</span><span class="n">G</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">I</span><span class="p">,</span><span class="w">
</span><span class="n">layer_means</span><span class="o">$</span><span class="n">E</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">G</span><span class="p">,</span><span class="w">
</span><span class="n">layer_means</span><span class="o">$</span><span class="n">C</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">E</span><span class="p">,</span><span class="w">
</span><span class="n">layer_means</span><span class="o">$</span><span class="n">B</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">layer_means</span><span class="o">$</span><span class="n">C</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> model_coef by_hand</span><span class="w">
</span><span class="c1">#> (Intercept) -5956.20000 -5956.20000</span><span class="w">
</span><span class="c1">#> LayerAltG-I 190.00000 190.00000</span><span class="w">
</span><span class="c1">#> LayerAltE-G 214.66667 214.66667</span><span class="w">
</span><span class="c1">#> LayerAltC-E 86.33333 86.33333</span><span class="w">
</span><span class="c1">#> LayerAltB-C 137.33333 137.33333</span><span class="w">
</span></code></pre></div></div>
<p>Back to our contrast coding contract, we see that the contrast matrix
matrix-multiplied by the model coefficients gives us the level means.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">model.matrix</span><span class="p">(</span><span class="n">m2</span><span class="p">))</span><span class="w">
</span><span class="n">mat_m2</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [,1]</span><span class="w">
</span><span class="c1">#> 1 -5670.667</span><span class="w">
</span><span class="c1">#> 4 -5808.000</span><span class="w">
</span><span class="c1">#> 6 -5894.333</span><span class="w">
</span><span class="c1">#> 9 -6109.000</span><span class="w">
</span><span class="c1">#> 11 -6299.000</span><span class="w">
</span><span class="c1"># By hand</span><span class="w">
</span><span class="n">aggregate</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">Layer</span><span class="p">,</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Layer C14</span><span class="w">
</span><span class="c1">#> 1 B -5670.667</span><span class="w">
</span><span class="c1">#> 2 C -5808.000</span><span class="w">
</span><span class="c1">#> 3 E -5894.333</span><span class="w">
</span><span class="c1">#> 4 G -6109.000</span><span class="w">
</span><span class="c1">#> 5 I -6299.000</span><span class="w">
</span></code></pre></div></div>
<p>It’s so clean and simple. We still get the level means and the
parameters estimate specific comparisons of interest to us. So, how are
the categorical variables and their differences coded in the model’s
contrast matrix?</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#> (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#> 1 1 0.2 0.4 0.6 0.8</span><span class="w">
</span><span class="c1">#> 4 1 0.2 0.4 0.6 -0.2</span><span class="w">
</span><span class="c1">#> 6 1 0.2 0.4 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 9 1 0.2 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 11 1 -0.8 -0.6 -0.4 -0.2</span><span class="w">
</span></code></pre></div></div>
<p>Wait… what? 😕</p>
<h2 id="the-comparison-matrix">The Comparison Matrix</h2>
<p>When I first started drafting this post, I made it to this point and
noped out for a few days. My curiosity did win out eventually, and I hit
the books (remembered <a href="https://twitter.com/CookieSci/status/1562221740230676481">this
tweet</a> and
<a href="https://twitter.com/bolkerb/status/1565077056169312257">this handout</a>,
watched <a href="https://www.youtube.com/watch?v=yLgPpmXVVbs">this video</a>, read
<a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">this
paper</a>,
and read section 9.1.2 in <em>Applied Regression Analysis & Generalized
Linear Models</em>). Now, for the rest of the post.</p>
<p>The best formal, citable source for what I describe here is <a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">Schad and
colleagues
(2020)</a>,
but what they call a “hypothesis matrix”, I’m calling a <em>comparison
matrix</em>. I do this for two reasons: 1) to get away from a hypothesis-testing
mindset (see Figure 1) and 2) because we are using the
hypothesis matrix to apply a constraint among parameter values (remember
that?).</p>
<figure class="" style="max-width: 66%; display: block; margin: 2em auto;">
  <img src="/assets/images/2023-07-bayes-sign.jpeg" alt="In this house, we believe: Bayes is good, estimate with uncertainty is better than hypothesis testing, math is hard, sampling is easy, Bayesian estimation with informative priors is indistinguishable from data falsifications, and it kicks ass." /><figcaption>
Figure 1. The sign in my yard.
</figcaption></figure>
<p>In this approach, we define the model parameters <strong><em>β</em></strong> by
matrix-multiplying the comparison matrix <strong>C</strong> (which activates or
weights different level means) and the level means <strong><em>μ</em></strong>.</p>
\[\mathbf{C}\boldsymbol{\mu} = \boldsymbol{\beta} \\
\begin{bmatrix}
\textrm{weights for comparison 1} \\
\textrm{weights for comparison 2} \\
\textrm{weights for comparison 3} \\
\cdots \\
\end{bmatrix}
\begin{bmatrix}
\mu_1 \\
\mu_2 \\
\mu_3 \\
\cdots \\
\end{bmatrix} =
\begin{bmatrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\cdots \\
\end{bmatrix}\]
<p>So, in the dummy-coded version of the model, we had the following
comparison matrix:</p>
\[\mathbf{C}_\text{dummy}\boldsymbol{\mu} = \boldsymbol{\beta}_\text{dummy} \\
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
\mu_{\text{Layer B}} \\
\mu_{\text{Layer C}} \\
\mu_{\text{Layer E}} \\
\mu_{\text{Layer G}} \\
\mu_{\text{Layer I}} \\
\end{bmatrix} =
\begin{bmatrix}
\beta_0: \mu_{\text{Layer B}} \\
\beta_1: \mu_{\text{Layer C}} - \mu_{\text{Layer B}} \\
\beta_2: \mu_{\text{Layer E}} - \mu_{\text{Layer B}} \\
\beta_3: \mu_{\text{Layer G}} - \mu_{\text{Layer B}} \\
\beta_4: \mu_{\text{Layer I}} - \mu_{\text{Layer B}} \\
\end{bmatrix}\]
<p>The first row in <strong>C</strong> sets Layer B as the reference level for the
dummy coding. The second row turns on both Layer B and Layer C, but
Layer B is negatively weighted. Thus, the corresponding model
coefficient is the difference between Layers C and B.</p>
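<p>As a quick check of that second row, plugging in the layer means computed earlier (rounded to three decimals) should reproduce the <code class="language-plaintext highlighter-rouge">LayerC</code> coefficient from the dummy-coded model:</p>
\[\beta_1 = (-1)\,\mu_{\text{Layer B}} + (1)\,\mu_{\text{Layer C}}
= (-1)(-5670.667) + (1)(-5808.000) = -137.333\]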
<p>The comparison matrix for the reverse successive difference contrast
coding is similar. The first row activates all of the layers but
weights them equally, so we get a mean of means for the model intercept. Each
row after the first is the difference between two layer means.</p>
\[\mathbf{C}_\text{rev-diffs}\boldsymbol{\mu} = \boldsymbol{\beta}_\text{rev-diffs} \\
\begin{bmatrix}
.2 & .2 & .2 & .2 & .2 \\
0 & 0 & 0 & 1 & -1 \\
0 & 0 & 1 & -1 & 0 \\
0 & 1 & -1 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
\mu_{\text{Layer B}} \\
\mu_{\text{Layer C}} \\
\mu_{\text{Layer E}} \\
\mu_{\text{Layer G}} \\
\mu_{\text{Layer I}} \\
\end{bmatrix} =
\begin{bmatrix}
\beta_0: \text{mean of } \mu \\
\beta_1: \mu_{\text{Layer G}} - \mu_{\text{Layer I}} \\
\beta_2: \mu_{\text{Layer E}} - \mu_{\text{Layer G}} \\
\beta_3: \mu_{\text{Layer C}} - \mu_{\text{Layer E}} \\
\beta_4: \mu_{\text{Layer B}} - \mu_{\text{Layer C}} \\
\end{bmatrix}\]
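<p>To make the first two rows concrete, here they are worked out with the layer means from earlier (rounded to three decimals). The results should agree with the <code class="language-plaintext highlighter-rouge">(Intercept)</code> and <code class="language-plaintext highlighter-rouge">LayerAltG-I</code> coefficients of the second model:</p>
\[\beta_0 = 0.2 \times (-5670.667 - 5808.000 - 5894.333 - 6109.000 - 6299.000) = 0.2 \times (-29781.000) = -5956.200 \\
\beta_1 = \mu_{\text{Layer G}} - \mu_{\text{Layer I}} = -6109.000 - (-6299.000) = 190.000\]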
<p>Now, here is the magic part 🔮. Multiplying both sides by the inverse of
the comparison matrix will set up a design matrix for the linear model
which follows the contract for the contrast matrices I described above:</p>
\[\mathbf{C}\boldsymbol{\mu} = \boldsymbol{\beta} \\
\mathbf{C}^{-1}\mathbf{C}\boldsymbol{\mu} = \mathbf{C}^{-1}\boldsymbol{\beta} \\
\boldsymbol{\mu} = \mathbf{C}^{-1}\boldsymbol{\beta} \\
\mathbf{\hat y} = \mathbf{X}\boldsymbol{\beta} \\\]
<p>So, we can invert<sup id="fnref:invert" role="doc-noteref"><a href="#fn:invert" class="footnote" rel="footnote">1</a></sup> our comparison matrix to get the model’s contrast matrix:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">comparisons</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="m">.2</span><span class="p">,</span><span class="w">
</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w">
</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">mat_comparisons</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">comparisons</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">byrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_comparisons</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [,1] [,2] [,3] [,4] [,5]</span><span class="w">
</span><span class="c1">#> [1,] 1 0.2 0.4 0.6 0.8</span><span class="w">
</span><span class="c1">#> [2,] 1 0.2 0.4 0.6 -0.2</span><span class="w">
</span><span class="c1">#> [3,] 1 0.2 0.4 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> [4,] 1 0.2 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> [5,] 1 -0.8 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#> (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#> 1 1 0.2 0.4 0.6 0.8</span><span class="w">
</span><span class="c1">#> 4 1 0.2 0.4 0.6 -0.2</span><span class="w">
</span><span class="c1">#> 6 1 0.2 0.4 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 9 1 0.2 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 11 1 -0.8 -0.6 -0.4 -0.2</span><span class="w">
</span></code></pre></div></div>
<p>Or, perhaps more commonly, we can take the contrast matrix used by a model and
recover the comparison matrix, which is a nice trick when we have R
automatically set the contrast values for us:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Dummy coding example</span><span class="w">
</span><span class="n">mat_m1</span><span class="w">
</span><span class="c1">#> (Intercept) LayerC LayerE LayerG LayerI</span><span class="w">
</span><span class="c1">#> 1 1 0 0 0 0</span><span class="w">
</span><span class="c1">#> 4 1 1 0 0 0</span><span class="w">
</span><span class="c1">#> 6 1 0 1 0 0</span><span class="w">
</span><span class="c1">#> 9 1 0 0 1 0</span><span class="w">
</span><span class="c1">#> 11 1 0 0 0 1</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_m1</span><span class="p">)</span><span class="w">
</span><span class="c1">#> 1 4 6 9 11</span><span class="w">
</span><span class="c1">#> (Intercept) 1 0 0 0 0</span><span class="w">
</span><span class="c1">#> LayerC -1 1 0 0 0</span><span class="w">
</span><span class="c1">#> LayerE -1 0 1 0 0</span><span class="w">
</span><span class="c1">#> LayerG -1 0 0 1 0</span><span class="w">
</span><span class="c1">#> LayerI -1 0 0 0 1</span><span class="w">
</span><span class="c1"># Successive differences coding example</span><span class="w">
</span><span class="n">mat_m2</span><span class="w">
</span><span class="c1">#> (Intercept) LayerAltG-I LayerAltE-G LayerAltC-E LayerAltB-C</span><span class="w">
</span><span class="c1">#> 1 1 0.2 0.4 0.6 0.8</span><span class="w">
</span><span class="c1">#> 4 1 0.2 0.4 0.6 -0.2</span><span class="w">
</span><span class="c1">#> 6 1 0.2 0.4 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 9 1 0.2 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="c1">#> 11 1 -0.8 -0.6 -0.4 -0.2</span><span class="w">
</span><span class="n">solve</span><span class="p">(</span><span class="n">mat_m2</span><span class="p">)</span><span class="w">
</span><span class="c1">#> 1 4 6 9 11</span><span class="w">
</span><span class="c1">#> (Intercept) 0.2 0.2 0.2 0.2 0.2</span><span class="w">
</span><span class="c1">#> LayerAltG-I 0.0 0.0 0.0 1.0 -1.0</span><span class="w">
</span><span class="c1">#> LayerAltE-G 0.0 0.0 1.0 -1.0 0.0</span><span class="w">
</span><span class="c1">#> LayerAltC-E 0.0 1.0 -1.0 0.0 0.0</span><span class="w">
</span><span class="c1">#> LayerAltB-C 1.0 -1.0 0.0 0.0 0.0</span><span class="w">
</span></code></pre></div></div>
<p>As I said earlier, there are all kinds of contrast coding schemes which
allow us to define the model parameters in terms of specific
comparisons, and this post only mentions two such schemes (dummy coding
and a reversed version of successive differences coding).</p>
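<p>As a rough sketch of how the same inversion carries over to another
scheme, here is sum coding from base R (<code class="language-plaintext highlighter-rouge">contr.sum()</code>). The
three-level factor and the names <code class="language-plaintext highlighter-rouge">contrast_sum</code> and
<code class="language-plaintext highlighter-rouge">comparison_sum</code> are made up for illustration and are not part of
the radiocarbon example.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Contrast matrix for a hypothetical 3-level factor under sum coding,
# with a column of 1s added for the intercept
contrast_sum <- cbind("(Intercept)" = 1, contr.sum(3))

# Invert it to recover the comparison that each coefficient makes
comparison_sum <- solve(contrast_sum)
round(comparison_sum, 2)
</code></pre></div></div>
<p>Under these assumptions, the intercept row should weight the three
level means equally (the grand mean), and each remaining row should
compare one level mean against that grand mean.</p>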
<h2 id="finally-in-layer-i-of-this-post-the-brms-model">Finally, in Layer I of this post, the brms model</h2>
<p>Now that we know about contrasts, and how they let us define model
parameters in terms of the comparisons we want to make, we can use this
technique to enforce an ordering constraint.</p>
<p>We set up our model as in Ben-Shachar’s <a href="https://blog.msbstats.info/posts/2023-06-26-order-constraints-in-brms/" title="Order Constraints in Bayes Models (with brms)">post</a>, but
here we put a <code class="language-plaintext highlighter-rouge">normal(500, 250)</code> prior on the non-intercept
coefficients with a lower bound of 0 (<code class="language-plaintext highlighter-rouge">lb = 0</code>) to enforce the
ordering constraint.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">brms</span><span class="p">)</span><span class="w">
</span><span class="n">priors</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"normal(-5975, 1000)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Intercept"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"normal(500, 250)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"b"</span><span class="p">,</span><span class="w"> </span><span class="n">lb</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">set_prior</span><span class="p">(</span><span class="s2">"exponential(0.01)"</span><span class="p">,</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sigma"</span><span class="p">)</span><span class="w">
</span><span class="n">validate_prior</span><span class="p">(</span><span class="w">
</span><span class="n">priors</span><span class="p">,</span><span class="w">
</span><span class="n">bf</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">se</span><span class="p">(</span><span class="n">error</span><span class="p">,</span><span class="w"> </span><span class="n">sigma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">),</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table1</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> prior class coef group resp dpar nlpar lb ub</span><span class="w">
</span><span class="c1">#> normal(500, 250) b 0 </span><span class="w">
</span><span class="c1">#> normal(500, 250) b LayerAltBMC 0 </span><span class="w">
</span><span class="c1">#> normal(500, 250) b LayerAltCME 0 </span><span class="w">
</span><span class="c1">#> normal(500, 250) b LayerAltEMG 0 </span><span class="w">
</span><span class="c1">#> normal(500, 250) b LayerAltGMI 0 </span><span class="w">
</span><span class="c1">#> normal(-5975, 1000) Intercept </span><span class="w">
</span><span class="c1">#> exponential(0.01) sigma 0 </span><span class="w">
</span><span class="c1">#> source</span><span class="w">
</span><span class="c1">#> user</span><span class="w">
</span><span class="c1">#> (vectorized)</span><span class="w">
</span><span class="c1">#> (vectorized)</span><span class="w">
</span><span class="c1">#> (vectorized)</span><span class="w">
</span><span class="c1">#> (vectorized)</span><span class="w">
</span><span class="c1">#> user</span><span class="w">
</span><span class="c1">#> user</span><span class="w">
</span></code></pre></div></div>
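<p>To see why lower-bounded differences imply ordered layers, here is a
minimal base-R sketch that simulates from these priors and converts the
draws into layer means with the contrast matrix. It is only an
approximation of the idea, not what brms does internally (for one
thing, brms places the <code class="language-plaintext highlighter-rouge">Intercept</code> prior on a centered intercept).
The helper <code class="language-plaintext highlighter-rouge">r_trunc_normal()</code> and the other object names are
hypothetical.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set.seed(20230703)

# Draw from a normal distribution truncated below at `lower`
# using the inverse-CDF method
r_trunc_normal <- function(n, mean, sd, lower = 0) {
  p_lower <- pnorm(lower, mean, sd)
  qnorm(runif(n, p_lower, 1), mean, sd)
}

# Comparison matrix from earlier: grand mean, then G-I, E-G, C-E, B-C
comparisons <- c(
  1/5, 1/5, 1/5, 1/5, 1/5,
    0,   0,   0,   1,  -1,
    0,   0,   1,  -1,   0,
    0,   1,  -1,   0,   0,
    1,  -1,   0,   0,   0
)
mat_contrast <- solve(matrix(comparisons, nrow = 5, byrow = TRUE))
rownames(mat_contrast) <- c("B", "C", "E", "G", "I")

# Prior draws: one column per draw; rows are the intercept and the
# four lower-bounded differences
n_draws <- 1000
prior_betas <- rbind(
  rnorm(n_draws, -5975, 1000),
  matrix(r_trunc_normal(4 * n_draws, 500, 250), nrow = 4)
)

# Layer means implied by each prior draw
prior_means <- mat_contrast %*% prior_betas

# Check whether every draw is ordered B > C > E > G > I
all_ordered <- all(apply(prior_means, 2, function(x) all(diff(x) < 0)))
</code></pre></div></div>
<p>Because every difference is drawn above 0, we would expect
<code class="language-plaintext highlighter-rouge">all_ordered</code> to be <code class="language-plaintext highlighter-rouge">TRUE</code>: the ordering holds in the prior before
the model sees any data.</p>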
<p>We fit the model:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brm</span><span class="p">(</span><span class="w">
</span><span class="n">bf</span><span class="p">(</span><span class="n">C14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">se</span><span class="p">(</span><span class="n">error</span><span class="p">,</span><span class="w"> </span><span class="n">sigma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">LayerAlt</span><span class="p">),</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gaussian</span><span class="p">(</span><span class="s2">"identity"</span><span class="p">),</span><span class="w">
</span><span class="n">prior</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">priors</span><span class="p">,</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table1</span><span class="p">,</span><span class="w">
</span><span class="n">seed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4321</span><span class="p">,</span><span class="w">
</span><span class="n">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"cmdstanr"</span><span class="p">,</span><span class="w">
</span><span class="n">cores</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="c1"># caching</span><span class="w">
</span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"_caches/2023-07-03"</span><span class="p">,</span><span class="w">
</span><span class="n">file_refit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"on_change"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>We can see that the estimated level differences are indeed positive, and
their 95% intervals contain only positive values.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">summary</span><span class="p">(</span><span class="n">m3</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Family: gaussian </span><span class="w">
</span><span class="c1">#> Links: mu = identity; sigma = identity </span><span class="w">
</span><span class="c1">#> Formula: C14 | se(error, sigma = TRUE) ~ 1 + LayerAlt </span><span class="w">
</span><span class="c1">#> Data: table1 (Number of observations: 12) </span><span class="w">
</span><span class="c1">#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;</span><span class="w">
</span><span class="c1">#> total post-warmup draws = 4000</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> Population-Level Effects: </span><span class="w">
</span><span class="c1">#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS</span><span class="w">
</span><span class="c1">#> Intercept -5957.60 27.91 -6011.89 -5900.71 1.00 1964 1715</span><span class="w">
</span><span class="c1">#> LayerAltGMI 211.00 82.29 51.67 378.86 1.00 1693 939</span><span class="w">
</span><span class="c1">#> LayerAltEMG 206.15 71.30 68.47 349.07 1.00 1937 1185</span><span class="w">
</span><span class="c1">#> LayerAltCME 105.55 62.84 7.90 243.81 1.00 1377 1023</span><span class="w">
</span><span class="c1">#> LayerAltBMC 145.95 65.13 23.63 279.12 1.00 1684 857</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> Family Specific Parameters: </span><span class="w">
</span><span class="c1">#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS</span><span class="w">
</span><span class="c1">#> sigma 79.03 26.95 41.05 142.49 1.00 1651 2149</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS</span><span class="w">
</span><span class="c1">#> and Tail_ESS are effective sample size measures, and Rhat is the potential</span><span class="w">
</span><span class="c1">#> scale reduction factor on split chains (at convergence, Rhat = 1).</span><span class="w">
</span><span class="n">bayesplot</span><span class="o">::</span><span class="n">mcmc_intervals</span><span class="p">(</span><span class="n">m3</span><span class="p">,</span><span class="w"> </span><span class="n">regex_pars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Layer"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<div class="figure" style="text-align: center">
<img src="/figs/2023-07-03-bayesian-ordering-constraint/level-diffs-1.png" alt="Estimates of the level differences." width="80%" />
<p class="caption">Estimates of the level differences.</p>
</div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conditional_effects</span><span class="p">(</span><span class="n">m3</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<div class="figure" style="text-align: center">
<img src="/figs/2023-07-03-bayesian-ordering-constraint/level-means-1.png" alt="Conditional means for each layer." width="80%" />
<p class="caption">Conditional means for each layer.</p>
</div>
<h2 id="normally-i-dont-think-you-need-contrast-codes">Normally, I don’t think you need contrast codes</h2>
<p>My general advice for contrast coding is to just fit the model and then
have the software compute the appropriate estimates and comparisons
afterwards on the outcome scale. For example,
<a href="https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html">emmeans</a>
can take a fitted model, run requested comparisons, and handle multiple
comparisons and <em>p</em>-value adjustments for us.
<a href="https://vincentarelbundock.github.io/marginaleffects/">marginaleffects</a>
probably does this too. (I really need to play with it.) And in a
Bayesian model, we can compute comparisons of interest by doing math on
the posterior samples (estimating things, computing differences, and
summarizing the distribution of the differences). This particular
model is the exception: the coding was needed to build the ordering
constraint into the prior, which ruled out the posterior
post-processing approach.</p>
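<p>For the usual case, the posterior arithmetic described above can be
sketched in a few lines of base R. The draws here are simulated
stand-ins, not output from any model in this post, and the column names
<code class="language-plaintext highlighter-rouge">mean_B</code> and <code class="language-plaintext highlighter-rouge">mean_C</code> are hypothetical.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set.seed(20230703)

# Hypothetical stand-in for 4000 posterior draws of two layer means
draws <- data.frame(
  mean_B = rnorm(4000, -5700, 40),
  mean_C = rnorm(4000, -5850, 40)
)

# Compute the comparison of interest within each draw
draws$diff_B_C <- draws$mean_B - draws$mean_C

# Summarize the distribution of the differences
summary_diff <- c(
  estimate = mean(draws$diff_B_C),
  quantile(draws$diff_B_C, c(0.025, 0.975))
)

# Proportion of draws in which layer B is later than layer C
prob_positive <- mean(draws$diff_B_C > 0)
</code></pre></div></div>
<p>With a fitted model, the simulated columns would be replaced by draws
of the quantities being compared, and the rest of the arithmetic would
stay the same.</p>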
<hr />
<p><em>Last knitted on 2023-07-05. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2023-07-03-bayesian-ordering-constraint.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">2</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:invert" role="doc-endnote">
<p>I use <a href="https://rdrr.io/r/base/solve.html"><code class="language-plaintext highlighter-rouge">solve()</code></a> here for the inversion, but <a href="https://www.sciencedirect.com/science/article/pii/S0749596X19300695">Schad and
colleagues
(2020)</a>
use the generalized inverse <a href="https://rdrr.io/pkg/MASS/man/ginv.html"><code class="language-plaintext highlighter-rouge">MASS::ginv()</code></a> or
<a href="https://cran.r-project.org/web/packages/matlib/vignettes/ginv.html"><code class="language-plaintext highlighter-rouge">matlib::Ginv()</code></a>.
<code class="language-plaintext highlighter-rouge">solve()</code> only works on square matrices, but the generalized inverse
works on non-square matrices. <a href="#fnref:invert" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.3.0 (2023-04-21 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 11 x64 (build 22621)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2023-07-05</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> stan (rstan) 2.26.1</span><span class="w">
</span><span class="c1">#> stan (cmdstanr) 2.32.0</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> ! package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> bayesplot 1.10.0 2022-11-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> bridgesampling 1.1-2 2021-04-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> brms * 2.19.0 2023-03-14 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> Brobdingnag 1.2-9 2022-10-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> checkmate 2.2.0 2023-04-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> cmdstanr 0.5.3 2023-04-24 [1] local</span><span class="w">
</span><span class="c1">#> coda 0.19-4 2020-09-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> colourpicker 1.2.0 2022-10-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> curl 5.0.1 2023-06-07 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.32 2023-06-26 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> distributional 0.3.2 2023-03-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> downlit 0.4.3 2023-06-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> DT 0.28 2023-05-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> emmeans 1.8.7 2023-06-23 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> estimability 1.4.1 2022-08-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> git2r 0.32.0 2023-04-12 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> highr 0.10 2022-12-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> igraph 1.5.0 2023-06-16 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> inline 0.3.19 2021-05-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.5 2023-06-05 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> knitr * 1.43 2023-05-25 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> lattice 0.21-8 2023-04-05 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> loo 2.6.0 2023-03-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> markdown 1.7 2023-05-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> MASS * 7.3-60 2023-05-04 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> Matrix 1.5-4 2023-04-04 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> mvtnorm 1.2-2 2023-06-08 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> nlme 3.1-162 2023-01-31 [2] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> pkgbuild 1.4.2 2023-06-26 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> posterior 1.4.1 2023-03-14 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.5 2023-01-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> Rcpp * 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> D RcppParallel 5.1.7 2023-02-27 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> rstan 2.26.22 2023-05-02 [1] local</span><span class="w">
</span><span class="c1">#> rstantools 2.3.1 2023-03-30 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> shinyjs 2.1.0 2021-12-23 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> shinystan 2.6.0 2022-03-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> shinythemes 1.2.0 2021-01-25 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> StanHeaders 2.26.27 2023-06-14 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tensorA 0.36.2 2020-11-19 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> threejs 0.3.3 2020-01-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> V8 4.3.0 2023-04-08 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.1)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> xts 0.13.1 2023-04-16 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.3</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.3.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> D ── DLL MD5 mismatch, broken installation.</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Tristan Mahr (tjmahrweb@gmail.com). But mostly how contrast matrices are computed.</p>
<hr />
<p><strong>How to score Rock Paper Scissors</strong><br />
2022-12-06.
<a href="https://tjmahr.github.io/rock-paper-scissors-lists-are-trees">https://tjmahr.github.io/rock-paper-scissors-lists-are-trees</a></p>
<p>Ho ho ho, it is the most wonderful time of the year: Advent of Code!</p>
<p>AOC is a yearly collection of programming puzzles released over the
first 25 days of December. I like it… so much so that I wrote <a href="https://github.com/tjmahr/aoc">an R
package</a> for completing my puzzles using
the structure of an R package. The puzzles start out easy and get
progressively more elaborate or devious in their requirements. But I am
going to talk about an easy puzzle in this post, and specifically, one
little trick I used in my solution.</p>
<p><a href="https://adventofcode.com/2022/day/2">Day 2 of 2022</a> requires us to score games of Rock Paper Scissors. The
moves are encoded using letters, where our opponent’s moves are coded as
<code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">B</code>, <code class="language-plaintext highlighter-rouge">C</code> and ours are coded as <code class="language-plaintext highlighter-rouge">X</code>, <code class="language-plaintext highlighter-rouge">Y</code>, <code class="language-plaintext highlighter-rouge">Z</code>. So, an input
describing three rounds will look like the following:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">example_input</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"A Y"</span><span class="p">,</span><span class="w">
</span><span class="s2">"B X"</span><span class="p">,</span><span class="w">
</span><span class="s2">"C Z"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Where the letters mean the following:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">move_codes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"A"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rock"</span><span class="p">,</span><span class="w">
</span><span class="s2">"B"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"paper"</span><span class="p">,</span><span class="w">
</span><span class="s2">"C"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"scissors"</span><span class="p">,</span><span class="w">
</span><span class="s2">"X"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rock"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Y"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"paper"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Z"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"scissors"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>This encoding seems like a weird bit of indirection thrown on, and <em>it is</em>,
because the puzzle changes the meanings of the letters in Part 2. Still,
it is straightforward to parse the input into a list of roshambo moves.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">example_input</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">strsplit</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="c1"># Use character subsetting to convert letters to moves</span><span class="w">
</span><span class="n">lapply</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">unname</span><span class="p">(</span><span class="n">move_codes</span><span class="p">[</span><span class="n">x</span><span class="p">]))</span><span class="w">
</span><span class="c1"># Our character's move is the second element in each vector</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 3</span><span class="w">
</span><span class="c1">#> $ : chr [1:2] "rock" "paper"</span><span class="w">
</span><span class="c1">#> $ : chr [1:2] "paper" "rock"</span><span class="w">
</span><span class="c1">#> $ : chr [1:2] "scissors" "scissors"</span><span class="w">
</span></code></pre></div></div>
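<p>To see why keeping the codes in a named vector pays off, here is a rough sketch of the Part 2 reading of the same input. It assumes, going by the puzzle statement and not anything in this post, that <code class="language-plaintext highlighter-rouge">X</code>, <code class="language-plaintext highlighter-rouge">Y</code>, <code class="language-plaintext highlighter-rouge">Z</code> now mean lose, draw, win. The names <code class="language-plaintext highlighter-rouge">opponent_codes</code>, <code class="language-plaintext highlighter-rouge">outcome_codes</code> and <code class="language-plaintext highlighter-rouge">part2_input</code> are made up for the illustration.</p>

```r
example_input <- c("A Y", "B X", "C Z")

# The opponent's letters keep their meaning
opponent_codes <- c(A = "rock", B = "paper", C = "scissors")

# Assumption: Part 2 meanings, taken from the puzzle statement
outcome_codes <- c(X = "lose", Y = "draw", Z = "win")

part2_input <- example_input |>
  strsplit(" ") |>
  # First letter is a move, second letter is the outcome we need
  lapply(function(x) {
    c(unname(opponent_codes[x[1]]), unname(outcome_codes[x[2]]))
  })

str(part2_input)
```

<p>Only the lookup vectors changed. The parsing code is the same shape as before.</p>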
<p>Now, for the point of this post, <strong>how do we score each game?</strong></p>
<p>The naive approach is to start typing away furiously</p>
<p><img src="/figs/2022-12-06-rock-paper-scissors-lists-are-trees//unnamed-chunk-5.svg" alt="center" width="100%" style="display: block; margin: auto;" /></p>
<p>before eventually noping the hell out of there.</p>
<p>What we have is a decision tree: we need to follow a branch for player
one and another branch for player two. And here’s the main point of this
post: <strong>nested lists are trees</strong>. (Yes, I love lists—see <a href="/lists-knitr-secret-weapon/">this
post</a> where I use them in my knitr
reporting.) The top (outer) level of the list will be all of the player
one options, and then the bottom (inner) level will be all the player
two options. The leaves of the tree (bottom level values) are the
outcomes of the games.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">run_game</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">pair</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="c1"># nested lists are trees</span><span class="w">
</span><span class="n">rules</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
</span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="p">,</span><span class="w">
</span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
</span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w">
</span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">paper</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"draw"</span><span class="p">,</span><span class="w">
</span><span class="n">scissors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w">
</span><span class="n">rock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lose"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Because `rules[[pair[1]]][[pair[2]]]` is unsightly:</span><span class="w">
</span><span class="n">rules</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">getElement</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="m">1</span><span class="p">])</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">getElement</span><span class="p">(</span><span class="n">pair</span><span class="p">[</span><span class="m">2</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>At this point, we could take a second to ponder how the structure of
several nested if-elses—the actual shape of the code, indenting in and
out in and in again—resembles the structure and the shape of the
nested list, and ponder further about how the regular, orderly shape of
code could be the whispers of hidden data, saying “<code class="language-plaintext highlighter-rouge">list()</code> me, <code class="language-plaintext highlighter-rouge">list()</code>
me”. Or, we could run the code and see it in action.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">lapply</span><span class="p">(</span><span class="n">run_game</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [[1]]</span><span class="w">
</span><span class="c1">#> [1] "win"</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [[2]]</span><span class="w">
</span><span class="c1">#> [1] "lose"</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [[3]]</span><span class="w">
</span><span class="c1">#> [1] "draw"</span><span class="w">
</span><span class="c1"># Or to repeat the input</span><span class="w">
</span><span class="n">input</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">stats</span><span class="o">::</span><span class="n">setNames</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">lapply</span><span class="p">(</span><span class="n">run_game</span><span class="p">)</span><span class="w">
</span><span class="c1">#> $`c("rock", "paper")`</span><span class="w">
</span><span class="c1">#> [1] "win"</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> $`c("paper", "rock")`</span><span class="w">
</span><span class="c1">#> [1] "lose"</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> $`c("scissors", "scissors")`</span><span class="w">
</span><span class="c1">#> [1] "draw"</span><span class="w">
</span></code></pre></div></div>
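<p>As a side note on walking down the tree: base R’s <code class="language-plaintext highlighter-rouge">[[</code> indexes a list recursively when given a vector, so the two <code class="language-plaintext highlighter-rouge">getElement()</code> calls could probably be collapsed into one lookup. A minimal sketch, with the rules retyped compactly so that the block runs on its own:</p>

```r
# Same tree as in `run_game()`, written on fewer lines
rules <- list(
  rock     = list(rock = "draw", scissors = "lose", paper = "win"),
  scissors = list(scissors = "draw", rock = "win", paper = "lose"),
  paper    = list(paper = "draw", scissors = "win", rock = "lose")
)

pair <- c("rock", "paper")

# A length-2 index follows the first branch, then the second
rules[[pair]]

# Same result as indexing one level at a time
identical(rules[[pair]], rules[[pair[1]]][[pair[2]]])
```

<p>This only works when every level before the last is a list, which is the case for this tree.</p>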
<hr />
<p>Earlier in the post, I used <a href="https://adv-r.hadley.nz/subsetting.html#lookup-tables">character
subsetting</a> to
convert letters into moves. This process turned a matching/replacement
problem into a data lookup problem. The Rock Paper Scissors rules are the same
trick again: converting a decision tree into a data lookup problem.</p>
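<p>The lookup idea can be pushed one step further to get the numeric score the puzzle asks for. The sketch below assumes the scoring from the puzzle statement (1, 2, 3 points for playing rock, paper, scissors, plus 0, 3, 6 points for a loss, draw, win), which is not spelled out in this post. It also swaps the nested list for a named matrix, which is a different data structure from the one in <code class="language-plaintext highlighter-rouge">run_game()</code> but encodes the same rules. The names <code class="language-plaintext highlighter-rouge">shape_points</code>, <code class="language-plaintext highlighter-rouge">outcome_points</code>, <code class="language-plaintext highlighter-rouge">outcomes</code> and <code class="language-plaintext highlighter-rouge">score_game()</code> are illustrative.</p>

```r
# Assumption: point values come from the puzzle statement
shape_points <- c(rock = 1, paper = 2, scissors = 3)
outcome_points <- c(lose = 0, draw = 3, win = 6)

# Rows are the opponent's move, columns are our move
moves <- c("rock", "paper", "scissors")
outcomes <- matrix(
  c(
    "draw", "win",  "lose",
    "lose", "draw", "win",
    "win",  "lose", "draw"
  ),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(opponent = moves, ours = moves)
)

score_game <- function(pair) {
  outcome <- outcomes[pair[1], pair[2]]
  # Points for the shape we played plus points for the result
  unname(shape_points[pair[2]] + outcome_points[outcome])
}

games <- list(
  c("rock", "paper"),
  c("paper", "rock"),
  c("scissors", "scissors")
)

scores <- vapply(games, score_game, numeric(1))
scores
#> [1] 8 1 6
sum(scores)
#> [1] 15
```

<p>Every step is a lookup by name: move pair to outcome, outcome to points, move to points.</p>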
<hr />
<p><em>Last knitted on 2022-12-06. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-12-06-rock-paper-scissors-lists-are-trees.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.2 (2022-10-31 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22621)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-12-06</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> asciicast 2.3.0 2022-12-05 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#> cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> curl 4.3.3 2022-10-06 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> evaluate 0.18 2022-11-07 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.3 2022-10-21 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> knitr * 1.40 2022-08-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> magick 2.7.3 2021-08-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> processx 3.8.0 2022-10-26 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> ps 1.7.2 2022-10-26 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.4 2022-10-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> stringr 1.4.1 2022-08-20 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> V8 4.2.2 2022-11-03 [1] CRAN (R 4.2.2)</span><span class="w">
</span><span class="c1">#> vctrs 0.5.0 2022-10-22 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.34 2022-10-18 [1] CRAN (R 4.2.1)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/trist/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.2/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comLists are treesCreating a Summoning Salt-style speedrun plot2022-05-24T00:00:00-05:002022-05-24T00:00:00-05:00https://tjmahr.github.io/summoning-salt-plot<p>A videogame speedrun is a challenge to beat the game as quickly as
possible. It’s time attack racing but for a videogame. There are, in my
mind, two ways to make a run’s time go faster: Playing better and more
smoothly (optimizations, having better luck) and playing less of the
game (better routing, new glitches/skips). The history of a speedrun
category then is often an exciting mix of evolutionary improvements as
players level up their skills and revolutionary jumps as players find
new ways to cut through the game.</p>
<p><a href="https://www.youtube.com/c/SummoningSalt">Summoning Salt</a> is a YouTube
creator who creates documentaries that trace out the world record
progression in a speedrun. The videos are immensely enjoyable, as Salt
dishes out the history bit by bit, record by record, sometimes in a suspenseful fashion.</p>
<p>As a data visualization person, I’ve noticed that Summoning
Salt recently started to use a new prop in the videos: A step graph of the
world record times. The graph is developed throughout a video as players
(represented by individual colors) lower the times with new
records (points) until you get a full reveal of a timeline like the
following:</p>
<figure class="" style="max-width: 100%; display: block; margin: 2em auto;">
<img src="/assets/images/2022-05-wr-plot-1.png" alt="Screenshot of a timeline from a Summoning Salt video." /><figcaption>
Screenshot of a timeline from a Summoning Salt video.
</figcaption></figure>
<p>Let’s recreate this figure in R with ggplot2.</p>
<h2 id="warp-pipe-obtaining-the-data">Warp pipe: Obtaining the data</h2>
<p>The game in question is <em>New Super Mario Bros Wii</em>, and the record
keeper is the site <a href="https://www.speedrun.com/nsmbw">speedrun.com</a>. There
is not just one speedrun category for this game, so in particular, we
want the “Any%” record history (i.e., “any percent”: you don’t have
to play every level, and you can skip parts of the game).</p>
<p>We need to get the leaderboard history data from speedrun.com. There is an
<a href="https://github.com/speedruncomorg/api">official REST API</a> for the
site’s data, but it’s not straightforward how to query it to obtain the
data needed for a world record progression. (Apparently, one could
request <a href="https://github.com/speedruncomorg/api/issues/123">the leaderboard on different
dates</a> and work
backwards through time.) But that’s okay, we are not going to use the
API. Instead, the <a href="https://www.speedrun.com/nsmbw/gamestats">statistics page for the
game</a> has a plot that is
tantalizingly close to the one we want to create.</p>
<figure class="" style="max-width: 100%; display: block; margin: 2em auto;">
<img src="/assets/images/2022-05-wr-plot-2.png" alt="A timeline figure from speedrun.com." /><figcaption>
A timeline figure from speedrun.com.
</figcaption></figure>
<p>This plot is <em>interactive</em>, and our browser is downloading the data and
plotting it for us. If we snoop around the page, we can find the JSON
data behind the plot. In Firefox, when I right-click on the plot and hit
“Inspect”, I see the HTML code that contains the plot. Just below the
plot’s div is a chunk of Javascript.</p>
<figure class="" style="max-width: 100%; display: block; margin: 2em auto;">
<img src="/assets/images/2022-05-firefox-shot2.png" alt="A screenshot of the Firefox inspector showing the speedrun data in a Javascript script tag." /><figcaption>
A screenshot of the Firefox inspector showing the speedrun data in a Javascript script tag.
</figcaption></figure>
<p>The first line of it is all the speedrun data that is being plotted. We
save that JSON into <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/data/2022-05-23-nsmbw-runs.json">its own file</a>.</p>
<h2 id="ground-pound-filtering-and-cleaning-the-data">Ground pound: Filtering and cleaning the data</h2>
<p>Let’s read the data into R. JSON is short for “JavaScript Object
Notation”, and it’s basically the equivalent of a <code class="language-plaintext highlighter-rouge">list()</code> in R. Hence,
<a href="https://rdrr.io/pkg/jsonlite/man/read_json.html">jsonlite</a> provides a large, deeply nested list for us.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="c1"># a helper function to download the data from github</span><span class="w">
</span><span class="c1"># in case you want to play along</span><span class="w">
</span><span class="n">path_blog_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">file.path</span><span class="p">(</span><span class="w">
</span><span class="s2">"https://raw.githubusercontent.com"</span><span class="p">,</span><span class="w">
</span><span class="s2">"tjmahr/tjmahr.github.io/master/_R/data"</span><span class="p">,</span><span class="w">
</span><span class="n">x</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">json_runs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"2022-05-23-nsmbw-runs.json"</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">jsonlite</span><span class="o">::</span><span class="n">read_json</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
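<p>To make the JSON-to-list correspondence concrete, here is a small sketch that parses a hand-written JSON fragment with <code class="language-plaintext highlighter-rouge">jsonlite::parse_json()</code>. The fragment is my own imitation of one entry in the downloaded file, not a copy of it, and the names <code class="language-plaintext highlighter-rouge">json_text</code> and <code class="language-plaintext highlighter-rouge">parsed</code> are just for illustration.</p>

```r
# Hypothetical fragment shaped like one category in the downloaded file
json_text <- '{
  "label": "Any% - Physical",
  "data": [
    {"x": 1306670400, "y": 1616, "players": ["RaikerZ"]}
  ]
}'

# JSON objects become named lists; JSON arrays become unnamed lists
parsed <- jsonlite::parse_json(json_text)
str(parsed)

# Walking the list mirrors walking the JSON
parsed$data[[1]]$players[[1]]
```

<p>Objects in curly braces turn into named lists and arrays in square brackets turn into unnamed lists, which is why the real file comes back as a deeply nested list.</p>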
<p>The plot on the statistics page has a dropdown menu for different
kinds of records to display, so this JSON object has a sublist for each
dropdown menu choice. What we want is the first sublist (full game runs)
then its first sublist (with a <code class="language-plaintext highlighter-rouge">label</code> of <code class="language-plaintext highlighter-rouge">"Any% - Physical"</code>) then its
<code class="language-plaintext highlighter-rouge">"data"</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Dropdown menu choices</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">json_runs</span><span class="p">,</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 10</span><span class="w">
</span><span class="c1">#> $ 0 :List of 7</span><span class="w">
</span><span class="c1">#> $ 6789:List of 18</span><span class="w">
</span><span class="c1">#> $ 6805:List of 18</span><span class="w">
</span><span class="c1">#> $ 6815:List of 18</span><span class="w">
</span><span class="c1">#> $ 6826:List of 19</span><span class="w">
</span><span class="c1">#> $ 6841:List of 18</span><span class="w">
</span><span class="c1">#> $ 6846:List of 20</span><span class="w">
</span><span class="c1">#> $ 6859:List of 19</span><span class="w">
</span><span class="c1">#> $ 6868:List of 22</span><span class="w">
</span><span class="c1">#> $ 6882:List of 18</span><span class="w">
</span><span class="c1"># Full game run histories</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">json_runs</span><span class="p">[[</span><span class="m">1</span><span class="p">]],</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 7</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "Any% - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 30</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#EE4444"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#EE4444"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#EE4444"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi FALSE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "Cannonless - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 25</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#EF8241"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#EF8241"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#EF8241"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi FALSE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "100% - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 17</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#F0C03E"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#F0C03E"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#F0C03E"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi FALSE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "Any% No W5 - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 22</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#8AC951"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#8AC951"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#8AC951"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi TRUE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "Low% - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 18</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#09B876"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#09B876"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#09B876"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi TRUE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "Any% Multiplayer - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 11</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#44BBEE"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#44BBEE"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#44BBEE"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi TRUE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1">#> $ :List of 7</span><span class="w">
</span><span class="c1">#> ..$ label : chr "All Regular Exits - Physical"</span><span class="w">
</span><span class="c1">#> ..$ data :List of 7</span><span class="w">
</span><span class="c1">#> ..$ borderColor : chr "#6666EE"</span><span class="w">
</span><span class="c1">#> ..$ pointBorderColor : chr "#6666EE"</span><span class="w">
</span><span class="c1">#> ..$ pointHoverBackgroundColor: chr "#6666EE"</span><span class="w">
</span><span class="c1">#> ..$ hidden : logi TRUE</span><span class="w">
</span><span class="c1">#> ..$ steppedLine : logi TRUE</span><span class="w">
</span><span class="c1"># Just want the data field from the first one</span><span class="w">
</span><span class="n">json_any_percent</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">json_runs</span><span class="p">[[</span><span class="m">1</span><span class="p">]][[</span><span class="m">1</span><span class="p">]][[</span><span class="s2">"data"</span><span class="p">]]</span><span class="w">
</span></code></pre></div></div>
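<p>Indexing by position works here because the category we want happens to come first. If the order were ever different, one could look the sublist up by its <code class="language-plaintext highlighter-rouge">label</code> instead. A minimal sketch on a toy stand-in for <code class="language-plaintext highlighter-rouge">json_runs[[1]]</code> (the list <code class="language-plaintext highlighter-rouge">categories</code> is invented and trimmed to two entries so the block runs on its own):</p>

```r
# Toy stand-in for `json_runs[[1]]`, trimmed to two categories
categories <- list(
  list(
    label = "Any% - Physical",
    data = list(list(x = 1306670400L, y = 1616L))
  ),
  list(
    label = "Cannonless - Physical",
    data = list()
  )
)

# Pull out every label, then find the position of the one we want
labels <- vapply(categories, function(x) x[["label"]], character(1))
any_percent_data <- categories[[match("Any% - Physical", labels)]][["data"]]

str(any_percent_data)
```

<p>With the real data, the same two lines would be applied to <code class="language-plaintext highlighter-rouge">json_runs[[1]]</code> in place of <code class="language-plaintext highlighter-rouge">categories</code>.</p>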
<p>Here are the first two points’ worth of data. We have a not-so-obviously
encoded date (<code class="language-plaintext highlighter-rouge">x</code>), the run length in seconds (<code class="language-plaintext highlighter-rouge">y</code>) and the <code class="language-plaintext highlighter-rouge">players</code>. We
are going to convert each of these lists into a dataframe and bind them
together.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">json_any_percent</span><span class="w"> </span><span class="o">|></span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w"> </span><span class="n">str</span><span class="p">()</span><span class="w">
</span><span class="c1">#> List of 2</span><span class="w">
</span><span class="c1">#> $ :List of 4</span><span class="w">
</span><span class="c1">#> ..$ x : int 1306670400</span><span class="w">
</span><span class="c1">#> ..$ y : int 1616</span><span class="w">
</span><span class="c1">#> ..$ players:List of 1</span><span class="w">
</span><span class="c1">#> .. ..$ : chr "RaikerZ"</span><span class="w">
</span><span class="c1">#> ..$ link : chr "/nsmbw/run/2216987"</span><span class="w">
</span><span class="c1">#> $ :List of 4</span><span class="w">
</span><span class="c1">#> ..$ x : int 1325246400</span><span class="w">
</span><span class="c1">#> ..$ y : int 1549</span><span class="w">
</span><span class="c1">#> ..$ players:List of 1</span><span class="w">
</span><span class="c1">#> .. ..$ : chr "RaikerZ"</span><span class="w">
</span><span class="c1">#> ..$ link : chr "/nsmbw/run/2216995"</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">json_any_percent</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">lapply</span><span class="p">(</span><span class="w">
</span><span class="c1"># turn one list into a dataframe</span><span class="w">
</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">tibble</span><span class="p">(</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">x</span><span class="p">,</span><span class="w">
</span><span class="n">run_time_s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">y</span><span class="p">,</span><span class="w">
</span><span class="n">player</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">players</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">bind_rows</span><span class="p">()</span><span class="w">
</span><span class="n">data</span><span class="w">
</span><span class="c1">#> # A tibble: 30 × 3</span><span class="w">
</span><span class="c1">#> date run_time_s player </span><span class="w">
</span><span class="c1">#> <int> <dbl> <chr> </span><span class="w">
</span><span class="c1">#> 1 1306670400 1616 RaikerZ </span><span class="w">
</span><span class="c1">#> 2 1325246400 1549 RaikerZ </span><span class="w">
</span><span class="c1">#> 3 1332763200 1531 RaikerZ </span><span class="w">
</span><span class="c1">#> 4 1349870400 1527 RaikerZ </span><span class="w">
</span><span class="c1">#> 5 1457179200 1526 GreenUprooter</span><span class="w">
</span><span class="c1">#> 6 1461585600 1523 Auchgard </span><span class="w">
</span><span class="c1">#> 7 1461672000 1522 Auchgard </span><span class="w">
</span><span class="c1">#> 8 1461758400 1519 Auchgard </span><span class="w">
</span><span class="c1">#> 9 1470744000 1514 Auchgard </span><span class="w">
</span><span class="c1">#> 10 1471521600 1512 Auchgard </span><span class="w">
</span><span class="c1">#> # … with 20 more rows</span><span class="w">
</span></code></pre></div></div>
<p>Lastly, we need to do something about those dates. When you see a
date-time represented by a single large number, it’s probably a
<a href="https://rdrr.io/r/base/as.POSIXlt.html">POSIX</a> date representing the date-time as the number of
seconds since some origin date-time (see also <a href="https://en.wikipedia.org/wiki/Unix_time">Unix
Time</a>). Using the default Unix
origin time seems to give the correct date conversion:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">date_posix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.POSIXct</span><span class="p">(</span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">tz</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"UTC"</span><span class="p">,</span><span class="w"> </span><span class="n">origin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"1970-01-01"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w">
</span><span class="c1">#> # A tibble: 30 × 4</span><span class="w">
</span><span class="c1">#> date run_time_s player date_posix </span><span class="w">
</span><span class="c1">#> <int> <dbl> <chr> <dttm> </span><span class="w">
</span><span class="c1">#> 1 1306670400 1616 RaikerZ 2011-05-29 12:00:00</span><span class="w">
</span><span class="c1">#> 2 1325246400 1549 RaikerZ 2011-12-30 12:00:00</span><span class="w">
</span><span class="c1">#> 3 1332763200 1531 RaikerZ 2012-03-26 12:00:00</span><span class="w">
</span><span class="c1">#> 4 1349870400 1527 RaikerZ 2012-10-10 12:00:00</span><span class="w">
</span><span class="c1">#> 5 1457179200 1526 GreenUprooter 2016-03-05 12:00:00</span><span class="w">
</span><span class="c1">#> 6 1461585600 1523 Auchgard 2016-04-25 12:00:00</span><span class="w">
</span><span class="c1">#> 7 1461672000 1522 Auchgard 2016-04-26 12:00:00</span><span class="w">
</span><span class="c1">#> 8 1461758400 1519 Auchgard 2016-04-27 12:00:00</span><span class="w">
</span><span class="c1">#> 9 1470744000 1514 Auchgard 2016-08-09 12:00:00</span><span class="w">
</span><span class="c1">#> 10 1471521600 1512 Auchgard 2016-08-18 12:00:00</span><span class="w">
</span><span class="c1">#> # … with 20 more rows</span><span class="w">
</span></code></pre></div></div>
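<p>As a quick sanity check on that conversion, here is a minimal sketch
that uses only base R and the first timestamp from the table above. It
converts the number to a date-time and back again. The variable names
are just for illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># First timestamp from the table above (seconds since 1970-01-01 UTC)
secs <- 1306670400

as_datetime <- as.POSIXct(secs, tz = "UTC", origin = "1970-01-01")
format(as_datetime, "%Y-%m-%d %H:%M:%S")
#> [1] "2011-05-29 12:00:00"

# Converting back recovers the original number of seconds
as.numeric(as_datetime) == secs
#> [1] TRUE
</code></pre></div></div>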
<h2 id="triple-jump-plotting">Triple jump: Plotting</h2>
<p>First, let’s get the data on the panel. I could spend an endless amount
of time tweaking or customizing a plot’s theme, so I do the styling
last. Otherwise, styling would fill up all of the time I’ve set aside to
work on the plot.</p>
<p>We want to draw a point for each particular record-setting event, and we
want to draw a line that connects all of the points.
<a href="https://rdrr.io/pkg/ggplot2/man/geom_path.html"><code class="language-plaintext highlighter-rouge">geom_step()</code></a> draws a line that only moves
straight up/down or straight left/right (no diagonal lines), so it’s
what we want. We also want the color of these geometries to change
with the record holder (<code class="language-plaintext highlighter-rouge">player</code>).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/plot-oops-1.png" title="A step plot with one line per player. It is not what we want." alt="A step plot with one line per player. It is not what we want." width="80%" style="display: block; margin: auto;" /></p>
<p>Oops! ggplot2 assumed that we wanted to connect the dots separately for
each color. We have to set the <code class="language-plaintext highlighter-rouge">group</code> aesthetic to a constant value so
there is only one line drawn.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/plot-grouped-correctly-1.png" title="A step plot showing the world record progression. There is a single line and it changes color whenever a new record-holder takes over." alt="A step plot showing the world record progression. There is a single line and it changes color whenever a new record-holder takes over." width="80%" style="display: block; margin: auto;" /></p>
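<p>To see why the first attempt drew several lines, we can count the
groups that ggplot2 computes for the step layer. This is a rough sketch
on made-up data (the <code class="language-plaintext highlighter-rouge">toy</code> data frame is hypothetical). It uses
<code class="language-plaintext highlighter-rouge">layer_data()</code> to peek at the data that a layer is built from.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Made-up record progression with three record holders
toy <- data.frame(
  day = 1:6,
  time = c(60, 58, 57, 55, 54, 53),
  player = c("A", "A", "B", "B", "C", "A")
)

p <- ggplot(toy) +
  aes(x = day, y = time, color = player)

# Without a group aesthetic, each player becomes a separate group (line)
n_default <- length(unique(layer_data(p + geom_step())$group))
n_default
#> [1] 3

# A constant group puts every row on a single line
n_constant <- length(unique(layer_data(p + geom_step(aes(group = 1)))$group))
n_constant
#> [1] 1
</code></pre></div></div>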
<p>Making the Summoning Salt version is just a matter of theming at this
point. We use <a href="https://rdrr.io/pkg/ggplot2/man/ggtheme.html"><code class="language-plaintext highlighter-rouge">theme_void()</code></a> to completely wipe out
the current theme, and we hide the color legend.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_void</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-1-1.png" title="A step plot showing the world record progression." alt="A step plot showing the world record progression." width="80%" style="display: block; margin: auto;" /></p>
<p>Next, we are going to use the showtext package to obtain an 8-bit font:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">showtext</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Loading required package: sysfonts</span><span class="w">
</span><span class="c1">#> Loading required package: showtextdb</span><span class="w">
</span><span class="n">font_add_google</span><span class="p">(</span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w">
</span><span class="n">showtext_auto</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>The void theme provides nothing, so we have to specify the main colors,
the axis lines, and the plotting margin. We also crank up the chroma
values to have more intense colors for the black background.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_color_discrete</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">255</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World Record Timeline"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_void</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
</span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w">
</span><span class="n">axis.line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w">
</span><span class="c1"># more 8-bit looking lines</span><span class="w">
</span><span class="n">lineend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"square"</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-2-1.png" title="A step plot showing the world record progression. There is a black background now and an 8-bit looking font." alt="A step plot showing the world record progression. There is a black background now and an 8-bit looking font." width="80%" style="display: block; margin: auto;" /></p>
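<p>To get a feel for what cranking up the chroma does, we can look at
the palette directly. This sketch assumes that the default discrete
color scale is built on <code class="language-plaintext highlighter-rouge">scales::hue_pal()</code>, which is the usual case
unless the <code class="language-plaintext highlighter-rouge">ggplot2.discrete.colour</code> option has been changed. The choice
of three colors is arbitrary.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(scales)

# Default chroma (100) versus the cranked-up value used in the plot
default_colors <- hue_pal()(3)
intense_colors <- hue_pal(c = 255)(3)

# View the two palettes as swatches: top row default, bottom row intense
show_col(c(default_colors, intense_colors), ncol = 3)
</code></pre></div></div>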
<p>To keep overlapping points from looking like blobs, we can use a filled
point. For these, <code class="language-plaintext highlighter-rouge">color</code> is used on the border and <code class="language-plaintext highlighter-rouge">fill</code> is used on
the inside. We will set the outline of the points to black and the fill
to the player color. (If you look at more professional data
visualizations, you see this trick frequently with white bordering
around points.) With a new fill aesthetic in place, we have to make sure
that the guide for the fill doesn’t appear and that fill and color have the
same color scale.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">),</span><span class="w">
</span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">21</span><span class="p">,</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="c1"># no legend for fill</span><span class="w">
</span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="c1"># fill and color get same scale</span><span class="w">
</span><span class="n">scale_color_discrete</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">255</span><span class="p">,</span><span class="w"> </span><span class="n">aesthetics</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"color"</span><span class="p">,</span><span class="w"> </span><span class="s2">"fill"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World Record Timeline"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_void</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Press Start 2P"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w">
</span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">),</span><span class="w">
</span><span class="n">axis.line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w">
</span><span class="n">lineend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"square"</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/void-plot-3-1.png" title="A step plot showing the world record progression. The points have been restyled to have a black outline." alt="A step plot showing the world record progression. The points have been restyled to have a black outline." width="80%" style="display: block; margin: auto;" /></p>
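<p>The split between border and inside can be checked on a small
example. This sketch uses made-up data and <code class="language-plaintext highlighter-rouge">layer_data()</code> to inspect
the values a point layer ends up with. It is an illustration, not part
of the plot above.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Made-up data with three record holders
toy <- data.frame(
  day = 1:4,
  time = c(60, 58, 57, 55),
  player = c("A", "A", "B", "C")
)

p <- ggplot(toy) +
  aes(x = day, y = time) +
  # shape 21 is a circle with a separate border (color) and inside (fill)
  geom_point(aes(fill = player), shape = 21, color = "black", size = 2)

built <- layer_data(p)

# The border is the constant we set
unique(built$colour)
#> [1] "black"

# The inside gets one color per player
length(unique(built$fill))
#> [1] 3
</code></pre></div></div>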
<p>Finally, let’s make another version of this figure. How might we make a
more accessible presentation of this information (of who held a record
and when), assuming that we only have a static image? A legend with
players/colors is a nonstarter. We could give each player their own
distinct point shape so that color/shape encode the same information,
but shapes get rough once you have to use more than four of them. We
could use a player’s first letter instead of a point (show an F for
FadeVanity) but the letters quickly overlap.</p>
<p>One idea would be to label the point with an annotation whenever there
is a new record holder.</p>
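<p>Finding the points where a new record holder takes over boils down to
comparing each row with the one before it. Here is a minimal sketch of
that idea on a made-up table (the <code class="language-plaintext highlighter-rouge">toy</code> data are hypothetical), using
<code class="language-plaintext highlighter-rouge">lag()</code> from dplyr and <code class="language-plaintext highlighter-rouge">cumsum()</code> from base R.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(dplyr)

toy <- tibble(
  player = c("A", "A", "B", "B", "A"),
  run_time_s = c(100, 98, 97, 95, 94)
)

toy <- toy |>
  mutate(
    # TRUE on the first row and whenever the player differs from the previous row
    change = player != lag(player) | is.na(lag(player)),
    # A running count of changes gives each stretch its own id
    era = cumsum(change)
  )
toy$era
#> [1] 1 1 2 2 3

# The slowest time in each stretch is where that holder took over
takeover_times <- toy |>
  group_by(era) |>
  filter(run_time_s == max(run_time_s)) |>
  pull(run_time_s)
takeover_times
#> [1] 100  97  94
</code></pre></div></div>
<p>Note that player A gets two separate stretches here because the
record changed hands in between. The same idea carries over to the full
data.</p>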
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">showtext_auto</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="c1"># Remove the country flag annotation from this player</span><span class="w">
</span><span class="n">player2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
</span><span class="n">player</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"[gb/eng]FadeVanity"</span><span class="p">,</span><span class="w">
</span><span class="s2">"FadeVanity"</span><span class="p">,</span><span class="w">
</span><span class="n">player</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="c1"># Record whenever the title holder changes as an "era"</span><span class="w">
</span><span class="n">change</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">lag</span><span class="p">(</span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nf">is.na</span><span class="p">(</span><span class="n">lag</span><span class="p">(</span><span class="n">player</span><span class="p">)),</span><span class="w">
</span><span class="n">era</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">cumsum</span><span class="p">(</span><span class="n">change</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># I am going to hardcode some vertical position adjustments for the labels.</span><span class="w">
</span><span class="n">offsets</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">-4</span><span class="p">,</span><span class="w"> </span><span class="m">-3</span><span class="p">,</span><span class="w"> </span><span class="m">-2</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">)</span><span class="w">
</span><span class="n">data_lab</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">era</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="c1"># Label the first point in an era (its slowest time)</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">run_time_s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">run_time_s</span><span class="p">))</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">offsets</span><span class="p">)</span><span class="w">
</span><span class="n">nudge_factor</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">30</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_text</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="w">
</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">player2</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">nudge_factor</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">offset</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_lab</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_segment</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="w">
</span><span class="c1"># i.e., run the line up to .95 of the label's nudging</span><span class="w">
</span><span class="n">yend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">run_time_s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">nudge_factor</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">.95</span><span class="p">,</span><span class="w">
</span><span class="n">xend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date_posix</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data_lab</span><span class="p">,</span><span class="w">
</span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dashed"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="c1"># yes, I'm adding forty million seconds to the last datetime</span><span class="w">
</span><span class="n">expand_limits</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">date_posix</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">4e7</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">guides</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_datetime</span><span class="p">(</span><span class="w">
</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">,</span><span class="w">
</span><span class="n">date_breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2 years"</span><span class="p">,</span><span class="w">
</span><span class="n">date_labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"%Y"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="w">
</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"World record"</span><span class="p">,</span><span class="w">
</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">21</span><span class="o">:</span><span class="m">27</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">60</span><span class="p">,</span><span class="w">
</span><span class="c1"># Show the minutes value with zero-padded seconds</span><span class="w">
</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"%d:%02.f"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%/%</span><span class="w"> </span><span class="m">60</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="m">60</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_minimal</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">14</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">plot.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-05-24-summoning-salt-plot/informative-plot-1.png" title="A step plot showing the world record progression. The name of the player is next to their point whenever the record changes." alt="A step plot showing the world record progression. The name of the player is next to their point whenever the record changes." width="80%" style="display: block; margin: auto;" /></p>
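<p>The <code class="language-plaintext highlighter-rouge">labels</code> function in the y-axis scale is easy to check on its own. Here is a minimal sketch that wraps the same <code class="language-plaintext highlighter-rouge">sprintf()</code> call in a helper and applies it to a few times in seconds. The name <code class="language-plaintext highlighter-rouge">format_min_sec</code> is made up for this illustration and does not appear in the plotting code.</p>

```r
# Hypothetical helper wrapping the same sprintf() call used for the axis labels
format_min_sec <- function(x) {
  # x %/% 60 gives the whole minutes; x %% 60 gives the leftover seconds.
  # "%02.f" prints the seconds with no decimals, zero-padded to width 2.
  sprintf("%d:%02.f", x %/% 60, x %% 60)
}

format_min_sec(c(21 * 60, 22 * 60 + 5, 27 * 60 + 30))
#> [1] "21:00" "22:05" "27:30"
```

<p>Because <code class="language-plaintext highlighter-rouge">"%02.f"</code> rounds to whole seconds, this format suits axis breaks on whole minutes better than times with fractional seconds.</p>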
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-05-24-summoning-salt-plot.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> broom 0.8.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> downlit 0.4.0 2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> showtext * 0.9-5 2022-02-09 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> showtextdb * 3.0 2020-06-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sysfonts * 0.8.8 2022-03-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Tristan Mahr
tjmahrweb@gmail.com
*Cool 8-bit music plays over a montage of me editing R code*
The cursed Morgan Stanley Covid-19 visualization
2022-03-23T00:00:00-05:00
https://tjmahr.github.io/morgan-stanley-cursed-covid-plot
<p>Darren Dahly, username <a href="https://twitter.com/statsepi">@statsepi</a>, asked
people on Twitter to share some of their favorite or least favorite data
visualizations from the pandemic. I nominated the notorious <a href="https://twitter.com/WhiteHouseCEA45/status/1257680258364555264">“cubic fit”
‘forecast’</a>
from the Council of Economic Advisers. But then there was the reply
by Travis Whitfill, username
<a href="https://twitter.com/twhitfill">@twhitfill</a>, showing a nightmare of a
figure from a report produced by Morgan Stanley:</p>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">I’d like to submit this one from Morgan Stanley 🤦🏻♂️ <a href="https://t.co/D5CYi6zSrT">pic.twitter.com/D5CYi6zSrT</a></p>
<img src="/assets/images/2022-03-morgan-stanley.jpg" alt="A two panel plot showing the current number of Covid-19 patients in ICU beds in 'closed' versus 'open' states." />
<br />
— Travis Whitfill MPH (@twhitfill) <a href="https://twitter.com/twhitfill/status/1505974833217437696?ref_src=twsrc%5Etfw">March 21, 2022</a></blockquote>
<p>The main statistical problem here is the completely inappropriate
“smoothing” line. The panel on the left is really two linear trends: a
steady trend around 8,500 patients until May 6th and a decreasing trend
from 11,000 patients starting on May 7th. Upon seeing data like these
points, I would be inclined to ask, “What changed in the data? Was a new
state added to the dataset? Did the definition of what counts as an ICU
bed change?” The analysts here instead imposed a single linear trend on
all of the points.</p>
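<p>To see why one line is misleading here, consider a rough sketch with simulated numbers. These values only mimic the shape described above (flat near 8,500, then a jump to 11,000 followed by a decline); they are not the Morgan Stanley or COVID Tracking Project data, and the day of the jump is an assumption for illustration.</p>

```r
# Made-up numbers that mimic the shape described above; NOT the real data
set.seed(20220323)
d <- data.frame(day = 1:14)
d$after_jump <- d$day >= 10          # pretend the jump happens on day 10
d$icu <- ifelse(
  d$after_jump,
  11000 - 150 * (d$day - 10),        # decreasing trend after the jump
  8500                               # flat trend before the jump
) + rnorm(nrow(d), sd = 50)

# One line through everything, as in the cursed plot
fit_single <- lm(icu ~ day, data = d)

# Separate lines before and after the jump
fit_before <- lm(icu ~ day, data = d, subset = !after_jump)
fit_after  <- lm(icu ~ day, data = d, subset = after_jump)

# Slopes: the single fit reports a steep daily increase, while the
# segments are roughly flat (before) and decreasing (after)
c(
  single = unname(coef(fit_single)["day"]),
  before = unname(coef(fit_before)["day"]),
  after  = unname(coef(fit_after)["day"])
)
```

<p>The upward slope of the single fit comes entirely from the jump between the two segments, which is exactly the feature that should prompt questions about the data.</p>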
<p>Another problem with this plot is rhetorical: it’s tryhard
counterintuitive bullshit. I think analysts will fetishize surprising or
counterintuitive findings, with an attitude of “oh, you would think that
such-and-such is true but the data show us that <em>actually</em> the opposite
is true”. At the time of this plot, our belief was something like
“Covid-19 protections like stay-at-home orders can help flatten the curve
and reduce the spread of the disease and the number of
hospitalizations.” This plot sashays into the room and tells us “well,
according to the data, it’s the states without Covid-19 protections that have
decreasing numbers of ICU patients, and get this: Covid lockdowns make things
worse!”. Granted, I could not find the original report for this
image, so I don’t know how the authors interpreted it in the report’s
narrative. Yet, I can only assume the authors added these linear trend
lines–overriding the default GAM or LOESS smooth used by
<a href="https://rdrr.io/pkg/ggplot2/man/geom_smooth.html"><code class="language-plaintext highlighter-rouge">stat_smooth()</code></a>–to make this particular point.</p>
<p>When I first saw it, this plot <a href="https://twitter.com/tjmahr/status/1506019955661234184">made me
quip</a>: “I hate
statistics now. it’s been a good run. gonna live my days out as a
druid”. But it’s been a few days, and I’m still haunted by this plot.
What did go wrong? Why do the ICU counts shoot upwards like that? So, I
investigated it.</p>
<h2 id="attempt-1-there-is-no-jump">Attempt 1: There is no jump</h2>
<p>I tried to find the original report, searching Google and Twitter for a
report with this image from around May 12, 2020 (when @twhitfill <a href="https://twitter.com/twhitfill/status/1263119423847661569">first
shared it</a>),
but nothing came up. After dredging through a bunch of Morgan Stanley
report PDFs, I noticed that the reports usually had a small number of
authors, so I am wondering whether (and hoping that) the original report
was something more akin to a dashed-off newsletter than a research
report.</p>
<p>Failing to find the original image, I tried to recreate it in R. The
original image credits The COVID Tracking Project, and <a href="https://covidtracking.com/data/download">their downloads
page</a> provides a .csv file with
state-level data. Here we read in just the relevant columns, filter down
to the time range of the cursed image, and plot the total number of
current ICU patients.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="c1"># a helper function to build the GitHub URL for a data file</span><span class="w">
</span><span class="c1"># so you can download it and play along</span><span class="w">
</span><span class="n">path_blog_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">file.path</span><span class="p">(</span><span class="w">
</span><span class="s2">"https://raw.githubusercontent.com"</span><span class="p">,</span><span class="w">
</span><span class="s2">"tjmahr/tjmahr.github.io/master/_R/data"</span><span class="p">,</span><span class="w">
</span><span class="n">x</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
</span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"all-states-history.csv"</span><span class="p">),</span><span class="w">
</span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(),</span><span class="w">
</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w">
</span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w">
</span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="w">
</span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (March 23, 2022)"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Warning: Removed 454 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/most-recent-totals-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." width="80%" style="display: block; margin: auto;" /></p>
<p>There is no jump in ICU patients ❌, and because the jump disappeared
when we used a more recent (and presumably better) version of the
dataset, the jump was probably some kind of artifact.</p>
<p>Out of curiosity, let’s look at the state-by-state data. Because
(<em>spoiler alert</em>) about half the states only have <code class="language-plaintext highlighter-rouge">NA</code> values for this
time period, we will filter out the <code class="language-plaintext highlighter-rouge">NA</code> points and look at the
remaining points.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"state"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (March 23, 2022)"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/most-recent-state-1.png" title="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 30)." alt="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 30)." width="80%" style="display: block; margin: auto;" /></p>
<p>So, some states have ICU patient data added midway through this window and
many states are completely missing data from this window. The whole
open-versus-closed-states question was doomed from the get-go because we
don’t know what happened in every state.</p>
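<p>A toy example shows how this kind of missingness can manufacture a jump in a daily total. The states and counts below are invented for illustration and are not from the COVID Tracking Project data. One state reports every day, one starts reporting midway, and one never reports.</p>

```r
# Toy data with made-up states and counts; NOT the COVID Tracking Project data
toy <- data.frame(
  date = rep(as.Date("2020-05-04") + 0:3, times = 3),
  state = rep(c("AA", "BB", "CC"), each = 4),
  inIcuCurrently = c(
    100, 98, 96, 94,    # AA reports every day
    NA, NA, 50, 49,     # BB starts reporting midway
    NA, NA, NA, NA      # CC never reports
  )
)

# Daily total with missing values dropped, which is effectively what
# stat_summary(fun = "sum") plots after removing the non-finite rows
total <- tapply(toy$inIcuCurrently, toy$date, sum, na.rm = TRUE)

# Number of states contributing to each daily total
n_reporting <- tapply(!is.na(toy$inIcuCurrently), toy$date, sum)

# total is 100, 98, 146, 143 and n_reporting is 1, 1, 2, 2:
# the total jumps on the day BB starts reporting, even though
# every reporting state's own count is going down
data.frame(
  date = names(total),
  total = as.vector(total),
  n_reporting = as.vector(n_reporting)
)
```

<p>Tabulating the number of reporting states next to the total is a cheap check to run before reading anything into a sudden change in level.</p>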
<h2 id="attempt-2-lets-go-back-in-time">Attempt 2: Let’s go back in time</h2>
<p>If we poke around the COVID Tracking Project’s GitHub repository, we
find a <a href="https://github.com/COVID19Tracking/covid-tracking-data/tree/master/data">folder of data
backups</a>
with a file called <code class="language-plaintext highlighter-rouge">states_daily_4pm_et.csv</code>. Plotting this file
gives the same result as the previously loaded data.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
</span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"states_daily_4pm_et.csv"</span><span class="p">),</span><span class="w">
</span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(</span><span class="s2">"%Y%m%d"</span><span class="p">),</span><span class="w">
</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w">
</span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w">
</span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="w">
</span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (March 23, 2022)"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Warning: Removed 454 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/latest-total-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers steadily decrease from around 14,000 to 12,000." width="80%" style="display: block; margin: auto;" /></p>
<p>But because this file is hosted on GitHub, we can go back in time and find
the <a href="https://github.com/COVID19Tracking/covid-tracking-data/blob/5ec9962d5f5f6505bb0593df150ab62867af98f7/data/states_daily_4pm_et.csv">version of the data from
May 12, 2020</a>
and use that file instead.</p>
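<p>If you want to follow along, here is a minimal sketch of one way to get
that pinned file. It assumes GitHub’s usual URL layout, where a
<code class="language-plaintext highlighter-rouge">/blob/</code> page has a matching raw file on
<code class="language-plaintext highlighter-rouge">raw.githubusercontent.com</code>. The helper name
<code class="language-plaintext highlighter-rouge">github_blob_to_raw()</code> is made up for this
illustration, and the download step is commented out because it needs
network access.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: turn a GitHub "blob" URL (pinned to a commit) into
# the matching raw-file URL, assuming GitHub's usual URL layout.
github_blob_to_raw <- function(url) {
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the first "/blob/" path segment; the commit hash and path stay put
  sub("/blob/", "/", url, fixed = TRUE)
}

blob_url <- paste0(
  "https://github.com/COVID19Tracking/covid-tracking-data/blob/",
  "5ec9962d5f5f6505bb0593df150ab62867af98f7/data/states_daily_4pm_et.csv"
)
raw_url <- github_blob_to_raw(blob_url)
raw_url

# Not run here (needs network access): save a local copy of the pinned file.
# download.file(raw_url, "2020-05-12-states_daily_4pm_et.csv", mode = "wb")
</code></pre></div></div>
<p>Because the URL contains the commit hash, it should keep pointing at the
same version of the file even as the repository changes.</p>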
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readr</span><span class="o">::</span><span class="n">read_csv</span><span class="p">(</span><span class="w">
</span><span class="n">path_blog_data</span><span class="p">(</span><span class="s2">"2020-05-12-states_daily_4pm_et.csv"</span><span class="p">),</span><span class="w">
</span><span class="n">col_types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">(</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_date</span><span class="p">(</span><span class="s2">"%Y%m%d"</span><span class="p">),</span><span class="w">
</span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_character</span><span class="p">(),</span><span class="w">
</span><span class="n">inIcuCurrently</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_number</span><span class="p">(),</span><span class="w">
</span><span class="n">.default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col_skip</span><span class="p">()</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">progress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="w">
</span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-04-28"</span><span class="p">)</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w">
</span><span class="n">date</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s2">"2020-05-11"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">stat_summary</span><span class="p">(</span><span class="n">fun</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sum"</span><span class="p">,</span><span class="w"> </span><span class="n">geom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"point"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (May 12, 2020)"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Warning: Removed 477 rows containing non-finite values (stat_summary).</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/old-total-1.png" title="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers hover around 9000 and then rapidly jump to over 12000 after May 7." alt="A plot showing the total number of Covid-19 patients in ICU beds from April 28, 2020 to May 11, 2020. The numbers hover around 9000 and then rapidly jump to over 12000 after May 7." width="80%" style="display: block; margin: auto;" /></p>
<p>There it is: the jump in ICU patients on May 7th ✔️. Let’s look at the
state-by-state data:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"state"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Data from The COVID Tracking Project (May 12, 2020)"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/old-state-1.png" title="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 25). Of note is New York, which has only 5 points, all of them above 2000." alt="A plot showing the current number of Covid-19 patients in ICU beds in states with available data (around 25). Of note is New York, which has only 5 points, all of them above 2000." width="80%" style="display: block; margin: auto;" /></p>
<p>Look at New York (NY)! That’s the jump in the original plot. New York had a
large number of ICU patients, but its data only became available on
May 7th, producing the spurious increase in ICU patients.</p>
<p>By adding incomplete data from NY to the rest of the states, the analyst
effectively treated all of the missing points in the NY panel as zeros.</p>
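<p>To make that concrete, here is a minimal sketch with made-up numbers. The
states, days and counts below are hypothetical and not taken from the
Tracking Project data. State “B” only starts reporting on day 3, and we
compare a sum that drops missing values (which is what
<code class="language-plaintext highlighter-rouge">stat_summary()</code> did above when it removed
the non-finite rows) with a sum that refuses to hide them.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(dplyr)

# Hypothetical toy data: state "A" reports every day, state "B" starts on day 3
toy <- tibble::tribble(
  ~day, ~state, ~icu,
     1,    "A",  100,
     2,    "A",   98,
     3,    "A",   96,
     4,    "A",   94,
     1,    "B",   NA,
     2,    "B",   NA,
     3,    "B",  500,
     4,    "B",  490
)

totals <- toy %>%
  group_by(day) %>%
  summarise(
    # Dropping NAs before summing is the same as counting them as 0
    total_na_as_zero = sum(icu, na.rm = TRUE),
    # Keeping NAs makes the total NA whenever any state is missing
    total_strict = sum(icu),
    # How many states actually contributed to each day's total
    n_reporting = sum(!is.na(icu))
  )
totals
</code></pre></div></div>
<p>In this toy example, the lenient total goes from 98 on day 2 to 596 on
day 3 even though both states are slowly declining. The strict total is
missing for the first two days, and the count of reporting states shows
why.</p>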
<h2 id="what-could-they-have-done-differently">What could they have done differently?</h2>
<p>It’s fun to complain about haunted plots, but I will try to be
constructive for a moment. How would a fixed version of this plot look?</p>
<p><strong>Option 1: Don’t do it.</strong> Given all the missing and incomplete data,
it’s just not worth it to make this plot.</p>
<p><strong>Option 2: Don’t aggregate.</strong> Or we might embrace the missingness, and
show all and only the data we have. Here is a sketch of this kind of
approach. We will show individual state data and provide labels for the
states that stand out from the pack. We will also note in the caption how
many states/territories have no data.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_for_plot</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">state</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">inIcuCurrently</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w">
</span><span class="n">total_regions</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">state</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">unique</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="nf">length</span><span class="p">()</span><span class="w">
</span><span class="n">plotted_regions</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data_for_plot</span><span class="o">$</span><span class="n">state</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">unique</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="nf">length</span><span class="p">()</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">data_for_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">inIcuCurrently</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geomtextpath</span><span class="o">::</span><span class="n">geom_textline</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">),</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">250</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geomtextpath</span><span class="o">::</span><span class="n">scale_hjust_discrete</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_line</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">state</span><span class="p">),</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">state_icu_max</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="m">250</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Current patients in ICU"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">glue</span><span class="o">::</span><span class="n">glue</span><span class="p">(</span><span class="w">
</span><span class="s2">"
Data from The COVID Tracking Project (May 12, 2020).
No data available for {total_regions - plotted_regions} states/territories.
"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">14</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-23-morgan-stanley-cursed-covid-plot/try-to-fix-it-1.png" title="An attempt to fix the plot that uses the bad data. It shows one line per included state. In the middle of the line is the abbreviation for the state. In the top right, we can see the NY line dominating the rest of the lines. The caption notes the number of missing states/territories." alt="An attempt to fix the plot that uses the bad data. It shows one line per included state. In the middle of the line is the abbreviation for the state. In the top right, we can see the NY line dominating the rest of the lines. The caption notes the number of missing states/territories." width="80%" style="display: block; margin: auto;" /></p>
<p>And then we can put the linear regression “smooth” on it. 🙃</p>
<h2 id="update-notes-from-the-tracking-project-trenches-mar-24-2022">Update: Notes from the Tracking Project trenches [<em>Mar. 24, 2022</em>]</h2>
<p>After releasing this post, COVID Tracking Project alum Quang Nguyen
<a href="https://twitter.com/quangpmnguyen/status/1506807264295936002">shared some behind-the-scenes
details</a>
of what happened around May 7th, 2020. I will repost the Twitter thread here:</p>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">OMG <a href="https://twitter.com/COVID19Tracking?ref_src=twsrc%5Etfw">@COVID19Tracking</a> history lesson (short 🧵)!! First, shoutout to our data infrastructure folks <a href="https://twitter.com/zachlipton?ref_src=twsrc%5Etfw">@zachlipton</a> <a href="https://twitter.com/JuliaKodysh?ref_src=twsrc%5Etfw">@JuliaKodysh</a> for the GitHub archive! Second, I actually dug through the slack to figure out what happened (jokes on me, I was shift lead that day). <a href="https://t.co/7U6LOm8HKE">https://t.co/7U6LOm8HKE</a> <a href="https://t.co/iXdPO9EV6A">pic.twitter.com/iXdPO9EV6A</a></p>
— Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807264295936002?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">The problem was, back in May 2020, the only way you can get hospitalization data for the state of NY was to take low-res screenshots of the governor's presentation and then try to piece the information together (also shoutout to <a href="https://twitter.com/justinhendrix?ref_src=twsrc%5Etfw">@justinhendrix</a> for watching these press conf.). <a href="https://t.co/5UnGP1RUox">pic.twitter.com/5UnGP1RUox</a></p>
<img src="/assets/images/2022-03-cuomo.jpg" alt="A screenshot of a Slack post of two screenshots of a Cuomo Covid update showing statistics drawn on hard-to-read plots in the background." />
<br />
— Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807269752815617?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">Using this weird graph, we actually tried to back-calculate total hospitalization numbers, but unfortunately, it was super messy and nothing came out of it. This source also doesn't have current ICU numbers.</p>— Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807271698968579?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>
<blockquote class="twitter-tweet" data-conversation="none" data-dnt="true"><p lang="en" dir="ltr">We actually found a new source from Twitter (!!) who apparently got these numbers from a press email list from the governor (??). May 7th was the first day where we got data directly from the email list, which was the BLIP in total ICU data that made it onto the disastrous graph.</p>— Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807272990773248?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>
<blockquote class="twitter-tweet" data-dnt="true"><p lang="en" dir="ltr">The bottom line is: data from 2020 was a mess, and don't trust anything that came out of it. A group of volunteers taped it together using nothing but hot glue and scotch tape.</p>— Quang Nguyen (@quangpmnguyen) <a href="https://twitter.com/quangpmnguyen/status/1506807274211270662?ref_src=twsrc%5Etfw">March 24, 2022</a></blockquote>
<p>The fact that they had to pull numbers from the graphs in the Governor’s
Covid briefings is an important reminder that high-quality Covid-19 data was
hard to come by at the start of the pandemic (<a href="https://www.nytimes.com/2022/03/15/nyregion/nursing-home-deaths-cuomo-covid.html">especially from the Cuomo
administration</a>).
We needed something like the COVID Tracking Project where volunteers
would go to heroic lengths to curate data.</p>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-03-23-morgan-stanley-cursed-covid-plot.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> broom 0.8.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> downlit 0.4.0 2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> geomtextpath 0.1.0 2022-01-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vroom 1.5.7 2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>Self-documenting plots in ggplot2</h1>
<p>Tristan Mahr, 2022-03-10, <a href="https://tjmahr.github.io/self-titled-ggplot2-plots">https://tjmahr.github.io/self-titled-ggplot2-plots</a></p>
<p>When I am showing off a plotting technique in
<a href="https://ggplot2.tidyverse.org/">ggplot2</a>, I sometimes like to include
the R code that produced the plot <em>as part of the plot</em>. Here is an
example I made to demonstrate the <code class="language-plaintext highlighter-rouge">debug</code> parameter in
<a href="https://rdrr.io/pkg/ggplot2/man/element.html"><code class="language-plaintext highlighter-rouge">element_text()</code></a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">axis.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">debug</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/basic-example-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. The plot theme includes yellow shading and points in the x and y axis titles." alt="A ggplot2 plot of a histogram with the plotting code above the image. The plot theme includes yellow shading and points in the x and y axis titles." width="80%" style="display: block; margin: auto;" /></p>
<p>Let’s call these “self-documenting plots”. If we’re feeling nerdy, we
might also call them “quines”, although they are not true
<a href="https://en.wikipedia.org/wiki/Quine_%28computing%29">quines</a>
(programs that output their own source code).</p>
<p>In this post, we will build up a <code class="language-plaintext highlighter-rouge">self_document()</code> function from scratch. Here are
the problems we need to sort out:</p>
<ul>
<li>how to put plotting code above a titled plot</li>
<li>how to capture plotting code and convert it into text</li>
</ul>
<h2 id="creating-the-code-annotation">Creating the code annotation</h2>
<p>As a first step, let’s just treat our plotting code as a string that
is ready to use for annotation.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 20, color = "white") +
labs(title = "A basic histogram")'</span><span class="w">
</span><span class="n">p_plot</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>In order to have a titled plot along with this annotation, we need some
way to combine these two graphical objects together (the code and the
plot produced by ggplot2). I like the
<a href="https://patchwork.data-imaginist.com/articles/patchwork.html">patchwork</a>
package for this job. Here we use
<a href="https://patchwork.data-imaginist.com/reference/wrap_elements.html"><code class="language-plaintext highlighter-rouge">wrap_elements()</code></a> to capture the plot into a
“patch” that patchwork can annotate.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">patchwork</span><span class="p">)</span><span class="w">
</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">p_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in the default font." alt="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in the default font." width="50%" style="display: block; margin: auto;" /></p>
<p>Let’s style this title to use a monospaced font. I use Windows and like
Consolas, so I will use that font.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Use default mono font if "Consolas" is not available</span><span class="w">
</span><span class="n">extrafont</span><span class="o">::</span><span class="n">loadfonts</span><span class="p">(</span><span class="n">device</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"win"</span><span class="p">,</span><span class="w"> </span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">monofont</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
</span><span class="n">extrafont</span><span class="o">::</span><span class="n">choose_font</span><span class="p">(</span><span class="s2">"Consolas"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
</span><span class="s2">"mono"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Consolas"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">title_theme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">monofont</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rel</span><span class="p">(</span><span class="m">.9</span><span class="p">),</span><span class="w">
</span><span class="n">margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">5.5</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">p_plot</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">,</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-consolas-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in Consolas." alt="A ggplot2 plot of a histogram with the plotting code above the image. Here the title is in Consolas." width="50%" style="display: block; margin: auto;" /></p>
<p>One problem with this setup is that the plotting code has to be edited
in two places: the plot <code class="language-plaintext highlighter-rouge">p_plot</code> and the title <code class="language-plaintext highlighter-rouge">p_text</code>. As a result,
it’s easy for these two pieces of code to fall out of sync with each
other, turning our self-documenting plot into a lying liar plot.</p>
<p>The solution is pretty easy: Tell R that <code class="language-plaintext highlighter-rouge">p_text</code> is code with
<a href="https://rdrr.io/r/base/parse.html"><code class="language-plaintext highlighter-rouge">parse()</code></a> and evaluate the code with
<a href="https://rdrr.io/r/base/eval.html"><code class="language-plaintext highlighter-rouge">eval()</code></a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">parse</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p_text</span><span class="p">,</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-strings-consolas-eval-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
<p>This <em>works</em>. It gets the job done. But we find ourselves in a clumsy
workflow: either editing R code inside of quotes, or editing the
plotting code interactively and then wrapping it in quotes. Let’s do better.</p>
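<p>As a small aside, the <code class="language-plaintext highlighter-rouge">parse()</code>/<code class="language-plaintext highlighter-rouge">eval()</code> round trip used above can be tried on
its own, without any plotting. This is only a sketch: <code class="language-plaintext highlighter-rouge">toy_text</code> is a made-up
stand-in for <code class="language-plaintext highlighter-rouge">p_text</code> and is not part of the original example.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># `toy_text` is a hypothetical stand-in for `p_text`
toy_text <- "1 + 2 *
  3"

# parse() reads the string as R code but does not run it
toy_parsed <- parse(text = toy_text)
class(toy_parsed)
#> [1] "expression"

# eval() runs the parsed code
toy_value <- eval(toy_parsed)
toy_value
#> [1] 7
</code></pre></div></div>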
<h2 id="capturing-plotting-code-as-a-string">Capturing plotting code as a string</h2>
<p>Time for some <em>nonstandard evaluation</em>. I will use the
<a href="https://rlang.r-lib.org/">rlang</a> package, although in principle we
could use functions in base R to accomplish these goals.</p>
<p>First, we are going to use <a href="https://rdrr.io/pkg/rlang/man/expr.html"><code class="language-plaintext highlighter-rouge">rlang::expr()</code></a> to
capture/quote/<a href="https://rlang.r-lib.org/reference/topic-defuse.html">defuse</a>
the R code as an expression. We can print the code as code, print it as
text, and use <code class="language-plaintext highlighter-rouge">eval()</code> to show the plot.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># print the expression</span><span class="w">
</span><span class="n">p_code</span><span class="w">
</span><span class="c1">#> ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 20, color = "white") + </span><span class="w">
</span><span class="c1">#> labs(title = "A basic histogram")</span><span class="w">
</span><span class="c1"># expression => text</span><span class="w">
</span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] "ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 20, color = \"white\") + \n labs(title = \"A basic histogram\")"</span><span class="w">
</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
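<p>For comparison, here is a rough sketch of the same three steps with base R
functions. It uses a toy expression (<code class="language-plaintext highlighter-rouge">toy_code</code>, a made-up name) instead of
plotting code, and it assumes that <code class="language-plaintext highlighter-rouge">quote()</code> and <code class="language-plaintext highlighter-rouge">deparse()</code> are close enough
stand-ins for <code class="language-plaintext highlighter-rouge">rlang::expr()</code> and <code class="language-plaintext highlighter-rouge">rlang::expr_text()</code> for this purpose.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># quote() plays the role of rlang::expr()
toy_code <- quote(sum(1, 2) + 3)
toy_code
#> sum(1, 2) + 3

# deparse() plays the role of rlang::expr_text();
# long expressions come back as several strings, so collapse them
toy_string <- paste0(deparse(toy_code), collapse = "\n")
toy_string
#> [1] "sum(1, 2) + 3"

# eval() works on the quoted code as before
toy_result <- eval(toy_code)
toy_result
#> [1] 6
</code></pre></div></div>
<p>One difference to keep in mind is that <code class="language-plaintext highlighter-rouge">quote()</code> does not understand rlang’s
<code class="language-plaintext highlighter-rouge">!!</code> operator. Base R’s <code class="language-plaintext highlighter-rouge">bquote()</code> offers a similar feature with <code class="language-plaintext highlighter-rouge">.()</code>.</p>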
<p>Then, it should be straightforward to make the self-documenting plot, right?</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">plot_annotation</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">),</span><span class="w"> </span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-eval-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image. In this case, the title is mostly on one line and some text is cut off from the image." alt="A ggplot2 plot of a histogram with the plotting code above the image. In this case, the title is mostly on one line and some text is cut off from the image." width="50%" style="display: block; margin: auto;" /></p>
<p>Hey, it reformatted the title! Indeed, in the process of capturing the
code, the code formatting was lost. To get something closer to the
source code we provided, we have to reformat the captured code before we
print it.</p>
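<p>A toy example (not from the plots above) may make the cause easier to see. A
captured expression stores the structure of the code, not the characters that
were typed, so two differently formatted inputs end up as identical
expressions. The function <code class="language-plaintext highlighter-rouge">f</code> here is hypothetical and is never called.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># two ways of typing the same (toy) code
toy_a <- rlang::expr(f(x,
                       y))
toy_b <- rlang::expr(f(x, y))

# the captured expressions are identical:
# whitespace and line breaks are not stored
identical(toy_a, toy_b)
#> [1] TRUE

# so the text is regenerated with default formatting
toy_a_text <- rlang::expr_text(toy_a)
toy_a_text
#> [1] "f(x, y)"
</code></pre></div></div>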
<p>The <a href="https://styler.r-lib.org/">styler</a> package provides a suite of
functions for reformatting code. We can define our own coding
styles/formatting rules to customize how styler works. I like the styler
rules used by Garrick Aden-Buie in his
<a href="https://github.com/gadenbuie/grkstyle">grkstyle</a> package, so I will use
<code class="language-plaintext highlighter-rouge">grkstyle::grk_style_text()</code> to reformat the code.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">p_code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p_code</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">plot_annotation</span><span class="p">(</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p_code</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">grkstyle</span><span class="o">::</span><span class="n">grk_style_text</span><span class="p">()</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="c1"># reformatting returns a vector of lines,</span><span class="w">
</span><span class="c1"># so we have to combine them</span><span class="w">
</span><span class="n">paste0</span><span class="p">(</span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">),</span><span class="w">
</span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title_theme</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/from-code-eval-style-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
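<p>If grkstyle is not installed, the same reformat-then-collapse step can be
sketched with <code class="language-plaintext highlighter-rouge">styler::style_text()</code> and its default tidyverse style. This is a
swapped-in alternative on a toy string, not what the plots in this post use,
and the line breaks it chooses for longer code may differ from the
<code class="language-plaintext highlighter-rouge">grk_style_text()</code> output.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># styler's default (tidyverse) style, swapped in for grkstyle here
toy_lines <- styler::style_text("f(a=1,b=2)")

# the result is a character vector with one element per line
is.character(toy_lines)
#> [1] TRUE

# collapse the lines into a single string for use as a title
toy_title <- paste0(toy_lines, collapse = "\n")
toy_title
#> [1] "f(a = 1, b = 2)"
</code></pre></div></div>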
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>When we write our <code class="language-plaintext highlighter-rouge">self_document()</code> function, the main change we have to
make is using <a href="https://rdrr.io/pkg/rlang/man/defusing-advanced.html"><code class="language-plaintext highlighter-rouge">rlang::enexpr()</code></a> instead of <code class="language-plaintext highlighter-rouge">rlang::expr()</code>. The
en-variant is used inside a function when we want to <em>en</em>-quote exactly what the user
provided as an argument. Aside from that change, our <code class="language-plaintext highlighter-rouge">self_document()</code> function essentially bundles together the code we developed above:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">monofont</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="w">
</span><span class="n">extrafont</span><span class="o">::</span><span class="n">choose_font</span><span class="p">(</span><span class="s2">"Consolas"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
</span><span class="s2">"mono"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Consolas"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">p</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">enexpr</span><span class="p">(</span><span class="n">expr</span><span class="p">)</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rlang</span><span class="o">::</span><span class="n">expr_text</span><span class="p">(</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">grkstyle</span><span class="o">::</span><span class="n">grk_style_text</span><span class="p">()</span><span class="w"> </span><span class="o">|></span><span class="w">
</span><span class="n">paste0</span><span class="p">(</span><span class="n">collapse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"\n"</span><span class="p">)</span><span class="w">
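</span><span class="c1"># evaluate in the caller's environment so that the local variables</span><span class="w">
</span><span class="c1"># here (p, title, monofont) cannot shadow the user's own variables</span><span class="w">
</span><span class="n">patchwork</span><span class="o">::</span><span class="n">wrap_elements</span><span class="p">(</span><span class="n">eval</span><span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="n">parent.frame</span><span class="p">()))</span><span class="w"> </span><span class="o">+</span><span class="w">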
</span><span class="n">patchwork</span><span class="o">::</span><span class="n">plot_annotation</span><span class="p">(</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">title</span><span class="p">,</span><span class="w">
</span><span class="n">theme</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">plot.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="w">
</span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">monofont</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rel</span><span class="p">(</span><span class="m">.9</span><span class="p">),</span><span class="w">
</span><span class="n">margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">margin</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">5.5</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"pt"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>And let’s confirm that it works.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
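<p>To see why the function needs <code class="language-plaintext highlighter-rouge">rlang::enexpr()</code> rather than <code class="language-plaintext highlighter-rouge">rlang::expr()</code>, here
is a minimal comparison. The two helper functions are made up for this
illustration and are not part of <code class="language-plaintext highlighter-rouge">self_document()</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hypothetical helpers for comparing the two capture functions
capture_with_expr <- function(x) rlang::expr(x)
capture_with_enexpr <- function(x) rlang::enexpr(x)

# expr() captures the code as written inside the function: the name `x`
text_expr <- rlang::expr_text(capture_with_expr(1 + 2))
text_expr
#> [1] "x"

# enexpr() captures what the caller supplied for `x`
text_enexpr <- rlang::expr_text(capture_with_enexpr(1 + 2))
text_enexpr
#> [1] "1 + 2"
</code></pre></div></div>
<p>With <code class="language-plaintext highlighter-rouge">expr()</code>, every plot would be titled with the argument name instead of the
plotting code. In base R, <code class="language-plaintext highlighter-rouge">substitute()</code> plays a similar role to <code class="language-plaintext highlighter-rouge">enexpr()</code>.</p>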
<p>Because we developed this function on top of rlang, we can do some tricks like
injecting a variable’s value when capturing the code. For example, here I
use <code class="language-plaintext highlighter-rouge">!! color</code> to replace the <code class="language-plaintext highlighter-rouge">color</code> variable with the actual value.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">color</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"white"</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-inject-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
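<p>To see the injection on its own, here is a minimal sketch that leaves out <code class="language-plaintext highlighter-rouge">self_document()</code> entirely and only uses <code class="language-plaintext highlighter-rouge">rlang::expr()</code> and <code class="language-plaintext highlighter-rouge">rlang::expr_text()</code>. The names <code class="language-plaintext highlighter-rouge">captured_plain</code> and <code class="language-plaintext highlighter-rouge">captured</code> are made up for this illustration, and the code is captured but never evaluated, so ggplot2 does not need to be loaded.</p>

```r
library(rlang)

color <- "white"

# Without injection, the captured code keeps the variable name `color`.
captured_plain <- expr(geom_histogram(bins = 20, color = color))

# With `!!`, the current value of `color` is inserted at capture time.
captured <- expr(geom_histogram(bins = 20, color = !!color))

# Turn both captured expressions back into text for comparison.
expr_text(captured_plain)
expr_text(captured)
```

<p>The first string still reads <code class="language-plaintext highlighter-rouge">color = color</code>, while the second reads <code class="language-plaintext highlighter-rouge">color = "white"</code>, which is the text that ends up printed above the plot.</p>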
<p>And if you are wondering, yes, we can <code class="language-plaintext highlighter-rouge">self_document()</code> a
<code class="language-plaintext highlighter-rouge">self_document()</code> plot.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-self-document-1.png" title="A self_document() plot of a plot of a histogram with the plotting code above the image. There are two sets of code on top of each other." alt="A self_document() plot of a plot of a histogram with the plotting code above the image. There are two sets of code on top of each other." width="50%" style="display: block; margin: auto;" /></p>
<h2 id="alas-comments-are-lost">Alas, comments are lost</h2>
<p>One downside of this approach is that helpful comments are lost.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">self_document</span><span class="p">(</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="c1"># get rid of that grey</span><span class="w">
</span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2022-03-10-self-titled-ggplot2-plots/final-demo-no-comments-1.png" title="A ggplot2 plot of a histogram with the plotting code above the image." alt="A ggplot2 plot of a histogram with the plotting code above the image." width="50%" style="display: block; margin: auto;" /></p>
<p>I am not sure how to include comments. One place where comments are stored
and printed is in function bodies:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mpg</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">bins</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">!!</span><span class="w"> </span><span class="n">color</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="c1"># get rid of that grey</span><span class="w">
</span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"A basic histogram"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="w"> </span><span class="n">useSource</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> function() {</span><span class="w">
</span><span class="c1">#> ggplot(mtcars, aes(x = mpg)) +</span><span class="w">
</span><span class="c1">#> geom_histogram(bins = 20, color = !! color) +</span><span class="w">
</span><span class="c1">#> # get rid of that grey</span><span class="w">
</span><span class="c1">#> theme_minimal() +</span><span class="w">
</span><span class="c1">#> labs(title = "A basic histogram")</span><span class="w">
</span><span class="c1">#> }</span><span class="w">
</span><span class="c1">#> <environment: 0x00000222d313b848></span><span class="w">
</span></code></pre></div></div>
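<p>The comments survive in this case because R can keep a function's original source text in its <code class="language-plaintext highlighter-rouge">srcref</code> attribute. As a rough sketch of reading that text back (an illustration only, not something <code class="language-plaintext highlighter-rouge">self_document()</code> does; <code class="language-plaintext highlighter-rouge">plot_code</code> and <code class="language-plaintext highlighter-rouge">src_lines</code> are made-up names), we can parse some code with <code class="language-plaintext highlighter-rouge">keep.source = TRUE</code> and inspect the attribute:</p>

```r
# Source text for a function. It is only parsed and never called, so
# ggplot2 does not need to be loaded for this sketch.
plot_code <- '
function() {
  ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(bins = 20) +
    # get rid of that grey
    theme_minimal()
}'

# keep.source = TRUE asks the parser to retain the original text.
f <- eval(parse(text = plot_code, keep.source = TRUE))

# The srcref attribute gives back the source lines, comments included.
src_lines <- as.character(attr(f, "srcref"))
writeLines(src_lines)
```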
<p>I have no idea how to go about exploiting this feature for
self-documenting plots, however.</p>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2022-03-10-self-titled-ggplot2-plots.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> downlit 0.4.0 2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> extrafont 0.18 2022-04-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> extrafontdb 1.0 2012-06-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> grkstyle 0.0.3 2022-05-25 [1] Github (gadenbuie/grkstyle@6a7011c)</span><span class="w">
</span><span class="c1">#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> patchwork * 1.1.1 2020-12-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> Rttf2pt1 1.3.10 2022-02-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comIncluding plotting code as an annotation on a plotCustom syntax highlighting themes in RMarkdown (and pandoc)2021-11-17T00:00:00-06:002021-11-17T00:00:00-06:00https://tjmahr.github.io/custom-highlighting-pandoc-rmarkdown<p>I recently developed and released an R package called
<a href="https://github.com/tjmahr/solarizeddocx" title="GitHub page for solarizeddocx">solarizeddocx</a>. It provides <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code>, an
<a href="https://rmarkdown.rstudio.com/">RMarkdown</a> output format for
<a href="https://github.com/altercation/solarized" title="GitHub page for solarized">solarized</a>-highlighted Microsoft Word documents. The image below
shows a comparison of the solarizeddocx and the default docx formats:</p>
<figure class="" style="max-width: 100%; display: block; margin: 2em auto;">
<img src="/assets/images/2021-11-solarized.png" alt="Side-by-side comparison of solarizeddocx::document() and rmarkdown::word_document()" /><figcaption>
Side-by-side comparison of <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code> and <a href="https://pkgs.rstudio.com/rmarkdown/reference/word_document.html"><code class="language-plaintext highlighter-rouge">rmarkdown::word_document()</code></a>.
</figcaption></figure>
<p>The package provides a demo document which is essentially a vignette
where I describe all the customizations used by the package and put the
syntax highlighting to the test. The demo can be rendered and viewed
with:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># install.packages("devtools")</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"tjmahr/solarizeddocx"</span><span class="p">)</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">demo_document</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>The format can be used in an RMarkdown document via YAML metadata.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>output:
  solarizeddocx::document: default
</code></pre></div></div>
<p>Or explicitly with rmarkdown:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
</span><span class="s2">"README.Rmd"</span><span class="p">,</span><span class="w">
</span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">document</span><span class="p">()</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>solarizeddocx also exports its document assets so that they can be used
in other output formats, and it exports theme-building tools to create
new <a href="https://pandoc.org/MANUAL.html" title="Pandoc User's Guide">pandoc</a> syntax highlighting themes. I am most proud of these
features, so I will demonstrate each of these in turn and create a brand
new syntax highlighting theme in this post.</p>
<h2 id="knitr-rmd-to-md-conversion">knitr: .Rmd to .md conversion</h2>
<p>To give a simplified description, RMarkdown works by knitting the code
in an RMarkdown (.Rmd) file with <a href="https://yihui.org/knitr/" title="knitr homepage">knitr</a> to obtain a markdown (.md)
file and then post-processing this knitr output with other tools. In
particular, it uses pandoc, which converts between all kinds of document
formats. For this demonstration, we will do the knitting and pandoc
steps separately without relying on RMarkdown. That said, the options we
pass to pandoc can usually be used in RMarkdown (as we demonstrate at
the very end of this post).</p>
<p>Our input file is a small .Rmd file. It’s very basic, meant to
illustrate some function calls, strings, numbers, code comments and
output.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
Fit a model with `lm`():
```{r}
model <- lm(mpg ~ 1 + cyl, mtcars)
coefs <- coef(model)
# prediction for 8 cylinders
coefs["(Intercept)"] + 8 * coefs["cyl"]
predict(model, data.frame(cyl = 8L))
```
</code></pre></div></div>
<p>We <a href="https://rdrr.io/pkg/knitr/man/knit.html"><code class="language-plaintext highlighter-rouge">knit()</code></a> the document to run the code and store results
in a markdown file. (Actually, we use <a href="https://rdrr.io/pkg/knitr/man/knit_child.html"><code class="language-plaintext highlighter-rouge">knit_child()</code></a>
because I was getting some weird using-<code class="language-plaintext highlighter-rouge">knit()</code>-inside-of-<code class="language-plaintext highlighter-rouge">knit()</code>
issues when rendering this post. But in general, we would <code class="language-plaintext highlighter-rouge">knit()</code>.)</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">md_file</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".md"</span><span class="p">)</span><span class="w">
</span><span class="n">knit_func</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nf">interactive</span><span class="p">())</span><span class="w"> </span><span class="n">knitr</span><span class="o">::</span><span class="n">knit</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">knitr</span><span class="o">::</span><span class="n">knit_child</span><span class="w">
</span><span class="n">knit_func</span><span class="p">(</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_code_block</span><span class="p">(),</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>This is the content of the file.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Fit a model with `lm`():
```r
model <- lm(mpg ~ 1 + cyl, mtcars)
coefs <- coef(model)
# prediction for 8 cylinders
coefs["(Intercept)"] + 8 * coefs["cyl"]
#> (Intercept)
#> 14.87826
predict(model, data.frame(cyl = 8L))
#> 1
#> 14.87826
```
</code></pre></div></div>
<h2 id="pandoc-md-to-everything-conversion">pandoc: .md to <em>everything</em> conversion</h2>
<p>Everything we do with syntax highlighting occurs at this point when we
have an .md file. For this demo, we will use pandoc to convert this .md
file to an HTML document.</p>
<p>To make life easier, let’s set up a workflow for quickly converting a
.md file to an HTML document and taking a screenshot of the document.
<code class="language-plaintext highlighter-rouge">run_pandoc()</code> is a wrapper over
<a href="https://pkgs.rstudio.com/rmarkdown/reference/pandoc_convert.html"><code class="language-plaintext highlighter-rouge">rmarkdown::pandoc_convert()</code></a> but hard-codes
some output options and lets us more easily forward <code class="language-plaintext highlighter-rouge">options</code> to pandoc
using <code class="language-plaintext highlighter-rouge">...</code>. <code class="language-plaintext highlighter-rouge">page_thumbnail()</code> is a wrapper over
<a href="http://wch.github.io/webshot/reference/webshot.html"><code class="language-plaintext highlighter-rouge">webshot::webshot()</code></a> with some predefined output
options. <code class="language-plaintext highlighter-rouge">pd_style()</code> and <code class="language-plaintext highlighter-rouge">pd_syntax()</code> are helpers we will use later
for setting pandoc options.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">run_pandoc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".html"</span><span class="p">)</span><span class="w">
</span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">pandoc_convert</span><span class="p">(</span><span class="w">
</span><span class="n">input</span><span class="p">,</span><span class="w">
</span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"html5"</span><span class="p">,</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">output</span><span class="p">,</span><span class="w">
</span><span class="n">options</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"--standalone"</span><span class="p">,</span><span class="w">
</span><span class="n">...</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">output</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">webshot</span><span class="o">::</span><span class="n">webshot</span><span class="p">(</span><span class="w">
</span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">url</span><span class="p">,</span><span class="w">
</span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">file</span><span class="p">,</span><span class="w">
</span><span class="n">vwidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">500</span><span class="p">,</span><span class="w">
</span><span class="n">vheight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">350</span><span class="p">,</span><span class="w">
</span><span class="n">zoom</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">pd_style</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"--highlight-style"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">pd_syntax</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"--syntax-definition"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="c1"># Update from May 2022: Make file paths into urls</span><span class="w">
</span><span class="n">url_file</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"file://localhost/"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
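<p>It may help to see how the option helpers combine before pandoc is involved. The sketch below repeats <code class="language-plaintext highlighter-rouge">pd_style()</code> and <code class="language-plaintext highlighter-rouge">pd_syntax()</code> from above and adds a hypothetical <code class="language-plaintext highlighter-rouge">build_options()</code> that stands in for the <code class="language-plaintext highlighter-rouge">options</code> argument inside <code class="language-plaintext highlighter-rouge">run_pandoc()</code>. The file name <code class="language-plaintext highlighter-rouge">my-r.xml</code> is a placeholder, and nothing here calls pandoc.</p>

```r
# Helpers as defined in the post.
pd_style <- function(x) c("--highlight-style", x)
pd_syntax <- function(x) c("--syntax-definition", x)

# Hypothetical stand-in for the `options = c("--standalone", ...)` part of
# run_pandoc(). It only builds the character vector of flags.
build_options <- function(...) c("--standalone", ...)

# Each helper returns a flag/value pair, and c() flattens the pairs into
# one vector of command-line arguments.
opts <- build_options(pd_style("tango"), pd_syntax("my-r.xml"))
print(opts)
```

<p>Because each helper returns a flag followed by its value, any number of them can be passed through <code class="language-plaintext highlighter-rouge">...</code> in any order.</p>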
<p>These tools let us preview the default syntax highlighting in pandoc:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="n">md_file</span><span class="p">,</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="s2">"tango"</span><span class="p">))</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot1.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot1-1.png" title="Screenshot of html file created by pandoc" alt="Screenshot of html file created by pandoc" width="80%" style="display: block; margin: auto;" /></p>
<h2 id="setting-pandoc-options">Setting pandoc options</h2>
<p>Here is the pandoc HTML output but this time using my solarized (light)
highlighting style:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">theme_sl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_solarized_light_theme</span><span class="p">()</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="n">md_file</span><span class="p">,</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="n">theme_sl</span><span class="p">))</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot2.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot2-1.png" title="Screenshot of html file created by pandoc. It now has solarized colors." alt="Screenshot of html file created by pandoc. It now has solarized colors." width="80%" style="display: block; margin: auto;" /></p>
<p>By convention, we see two kinds of comment lines: actual code comments
(<code class="language-plaintext highlighter-rouge">#</code>) and R output (<code class="language-plaintext highlighter-rouge">#></code>). The <code class="language-plaintext highlighter-rouge">#></code> comments are helpful because I can copy
a whole code block (output included) and run it in R without that output
being interpreted as code. But <strong>these comments represent two different
kinds of information</strong>, and I’d like them to be styled differently. The
<code class="language-plaintext highlighter-rouge">#</code> code comments can stay unintrusive (light italic type), but the <code class="language-plaintext highlighter-rouge">#></code>
out comments should be legible (darker roman type).</p>
<p>To treat these two types of comments differently, I modified the <a href="https://github.com/KDE/syntax-highlighting/blob/master/data/syntax/r.xml" title="GitHub page for the r.xml syntax definition">R
syntax definition</a> used by pandoc to recognize <code class="language-plaintext highlighter-rouge">#</code> and <code class="language-plaintext highlighter-rouge">#></code>
as different entities. We can pass that syntax definition to pandoc:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">syntax_sl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">file_syntax_definition</span><span class="p">()</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">pd_style</span><span class="p">(</span><span class="n">theme_sl</span><span class="p">),</span><span class="w">
</span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot3.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot3-1.png" title="Screenshot of html file created by pandoc. It now has solarized colors and differently styled #> comments." alt="Screenshot of html file created by pandoc. It now has solarized colors and differently styled #> comments." width="80%" style="display: block; margin: auto;" /></p>
<h2 id="creating-a-theme-from-scratch">Creating a theme from scratch</h2>
<p>Maybe you’re thinking, <em>that’s cool… if you like solarized. What about
something fun like Fairy Floss?</em> Okay, fine, let’s make <a href="https://github.com/sailorhg/fairyfloss">Fairy
Floss</a>… right now… in this
blog post.</p>
<p>First, let’s store the Fairy Floss colors in a handy list:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ff_colors</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">gold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#e6c000"</span><span class="p">,</span><span class="w">
</span><span class="n">yellow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ffea00"</span><span class="p">,</span><span class="w">
</span><span class="n">dark_purple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#5a5475"</span><span class="p">,</span><span class="w">
</span><span class="n">white</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#f8f8f2"</span><span class="p">,</span><span class="w">
</span><span class="n">pink</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ffb8d1"</span><span class="p">,</span><span class="w">
</span><span class="n">salmon</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#ff857f"</span><span class="p">,</span><span class="w">
</span><span class="n">purple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#c5a3ff"</span><span class="p">,</span><span class="w">
</span><span class="n">teal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#c2ffdf"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>If we use the correct command, pandoc will provide us with a syntax
highlighting theme as a JSON file. <code class="language-plaintext highlighter-rouge">copy_base_pandoc_theme()</code> will call
this command for us. We can read that file into R and see that it is a
list of global style options followed by a list of individual style
definitions.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">temptheme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".theme"</span><span class="p">)</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">copy_base_pandoc_theme</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">data_theme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jsonlite</span><span class="o">::</span><span class="n">read_json</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">str</span><span class="p">(</span><span class="n">data_theme</span><span class="p">,</span><span class="w"> </span><span class="n">max.level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 5</span><span class="w">
</span><span class="c1">#> $ text-color : NULL</span><span class="w">
</span><span class="c1">#> $ background-color : NULL</span><span class="w">
</span><span class="c1">#> $ line-number-color : chr "#aaaaaa"</span><span class="w">
</span><span class="c1">#> $ line-number-background-color: NULL</span><span class="w">
</span><span class="c1">#> $ text-styles :List of 29</span><span class="w">
</span><span class="c1">#> ..$ Other :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Attribute :List of 5</span><span class="w">
</span><span class="c1">#> ..$ SpecialString :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Annotation :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Function :List of 5</span><span class="w">
</span><span class="c1">#> ..$ String :List of 5</span><span class="w">
</span><span class="c1">#> ..$ ControlFlow :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Operator :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Error :List of 5</span><span class="w">
</span><span class="c1">#> ..$ BaseN :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Alert :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Variable :List of 5</span><span class="w">
</span><span class="c1">#> ..$ BuiltIn :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Extension :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Preprocessor :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Information :List of 5</span><span class="w">
</span><span class="c1">#> ..$ VerbatimString:List of 5</span><span class="w">
</span><span class="c1">#> ..$ Warning :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Documentation :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Import :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Char :List of 5</span><span class="w">
</span><span class="c1">#> ..$ DataType :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Float :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Comment :List of 5</span><span class="w">
</span><span class="c1">#> ..$ CommentVar :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Constant :List of 5</span><span class="w">
</span><span class="c1">#> ..$ SpecialChar :List of 5</span><span class="w">
</span><span class="c1">#> ..$ DecVal :List of 5</span><span class="w">
</span><span class="c1">#> ..$ Keyword :List of 5</span><span class="w">
</span></code></pre></div></div>
<p>Each of those individual style definitions is a list of color options
and font style options:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">str</span><span class="p">(</span><span class="n">data_theme</span><span class="o">$</span><span class="n">`text-styles`</span><span class="o">$</span><span class="n">Comment</span><span class="p">)</span><span class="w">
</span><span class="c1">#> List of 5</span><span class="w">
</span><span class="c1">#> $ text-color : chr "#60a0b0"</span><span class="w">
</span><span class="c1">#> $ background-color: NULL</span><span class="w">
</span><span class="c1">#> $ bold : logi FALSE</span><span class="w">
</span><span class="c1">#> $ italic : logi TRUE</span><span class="w">
</span><span class="c1">#> $ underline : logi FALSE</span><span class="w">
</span></code></pre></div></div>
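<p>For reference, pandoc’s <code class="language-plaintext highlighter-rouge">--print-highlight-style</code> option prints a built-in
theme as JSON. The sketch below calls it directly. It assumes that pandoc is
installed and on the system path and that <code class="language-plaintext highlighter-rouge">pygments</code> is the base style; it is
not necessarily how <code class="language-plaintext highlighter-rouge">copy_base_pandoc_theme()</code> is implemented, and the name
<code class="language-plaintext highlighter-rouge">base_theme_file</code> is just for illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assumption: pandoc is installed and available on the PATH
base_theme_file <- tempfile(fileext = ".theme")

# Ask pandoc to print its built-in pygments theme and save it to a file
system2(
  "pandoc",
  "--print-highlight-style=pygments",
  stdout = base_theme_file
)

# The theme is plain JSON, so it can be read as a nested list
base_theme <- jsonlite::read_json(base_theme_file)
names(base_theme)
</code></pre></div></div>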
<p>solarizeddocx provides a helper function <code class="language-plaintext highlighter-rouge">set_theme_text_style()</code> for
setting individual style options. Let’s set up Fairy Floss’s global,
comment and string styles. We use the fake name <code class="language-plaintext highlighter-rouge">"global"</code> to access the global
style options, and we use style definition names like <code class="language-plaintext highlighter-rouge">"Comment"</code> to
access individual style definitions.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">magrittr</span><span class="p">)</span><span class="w">
</span><span class="n">ff_theme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data_theme</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
</span><span class="s2">"global"</span><span class="p">,</span><span class="w">
</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="p">,</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
</span><span class="s2">"Comment"</span><span class="p">,</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">set_theme_text_style</span><span class="p">(</span><span class="w">
</span><span class="s2">"String"</span><span class="p">,</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">yellow</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
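<p>Because a theme is just a nested list, a helper like this does not need to do
much. Here is a rough base R sketch of the idea. The function
<code class="language-plaintext highlighter-rouge">set_style_sketch()</code> and the toy theme are hypothetical and only mimic the
structure shown earlier; this is not the solarizeddocx implementation.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: update the options of one style in a theme list
set_style_sketch <- function(theme, name, text = NULL, background = NULL,
                             bold = NULL, italic = NULL, underline = NULL) {
  changes <- list(
    "text-color" = text,
    "background-color" = background,
    bold = bold,
    italic = italic,
    underline = underline
  )
  # Keep only the options that were actually supplied
  changes <- Filter(Negate(is.null), changes)

  if (name == "global") {
    # Global options live at the top level of the theme
    theme[names(changes)] <- changes
  } else {
    # Everything else lives inside the "text-styles" sublist
    theme[["text-styles"]][[name]][names(changes)] <- changes
  }
  theme
}

# A toy theme with the same shape as the pandoc JSON
toy_theme <- list(
  "text-color" = NULL,
  "background-color" = NULL,
  "text-styles" = list(
    Comment = list("text-color" = "#60a0b0", italic = TRUE)
  )
)

toy_theme <- set_style_sketch(toy_theme, "global", background = "#5a5475")
toy_theme <- set_style_sketch(toy_theme, "Comment", text = "#e6c000")
str(toy_theme)
</code></pre></div></div>
<p>Options that are not mentioned in a call, like the italic setting on
<code class="language-plaintext highlighter-rouge">Comment</code>, are left alone.</p>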
<p>Let’s preview our partial theme:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">write_pandoc_theme</span><span class="p">(</span><span class="n">ff_theme</span><span class="p">,</span><span class="w"> </span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w">
</span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot4.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot4-1.png" title="Screenshot of html file created by pandoc. It has a purple background, white text, gold comments and yellow strings, but it still looks bad because not all of the colors are done." alt="Screenshot of html file created by pandoc. It has a purple background, white text, gold comments and yellow strings, but it still looks bad because not all of the colors are done." width="80%" style="display: block; margin: auto;" /></p>
<p>This is a good start, but when I first ported the solarized theme, I had
to use 20 calls to <code class="language-plaintext highlighter-rouge">set_theme_text_style()</code>. That’s a lot. Plus,
<strong>themes are data</strong>. Can’t we just describe what needs to change in a
list? Yes. For this post, I made
<code class="language-plaintext highlighter-rouge">solarizeddocx::patch_theme_text_style()</code> where we describe the changes
to make as a list of patches.</p>
<p>Let’s write our list of patches to make to the base theme. Because some
style definitions are identical, we will use tibble’s lazy list
<a href="https://rdrr.io/pkg/tibble/man/lst.html"><code class="language-plaintext highlighter-rouge">tibble::lst()</code></a> to reuse patches along the way. For this
application of the palette, I consulted the <a href="http://tmtheme-editor.herokuapp.com/#!/editor/url/https://raw.githubusercontent.com/sailorhg/fairyfloss/gh-pages/fairyfloss.tmTheme" title="Fairy Floss Theme in online editor">Fairy Floss .tmTheme
file</a> and the <a href="https://github.com/gadenbuie/rsthemes/blob/main/inst/templates/fairyfloss.scss#L31-L42" title="GitHub source for rsthemes/inst/templates/fairyfloss.scss">rsthemes implementation</a> of Fairy
Floss.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">patches</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tibble</span><span class="o">::</span><span class="n">lst</span><span class="p">(</span><span class="w">
</span><span class="n">global</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">,</span><span class="w">
</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="c1"># # comments</span><span class="w">
</span><span class="n">Comment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="p">,</span><span class="w"> </span><span class="n">italic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">),</span><span class="w">
</span><span class="c1"># ## comments</span><span class="w">
</span><span class="n">Documentation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Comment</span><span class="p">,</span><span class="w">
</span><span class="c1"># #> comments</span><span class="w">
</span><span class="n">Information</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">gold</span><span class="p">,</span><span class="w"> </span><span class="n">italic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">),</span><span class="w">
</span><span class="n">Keyword</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">),</span><span class="w">
</span><span class="n">ControlFlow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">,</span><span class="w"> </span><span class="n">bold</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">),</span><span class="w">
</span><span class="n">Operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">pink</span><span class="p">),</span><span class="w">
</span><span class="n">Function</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">teal</span><span class="p">),</span><span class="w">
</span><span class="n">Attribute</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
</span><span class="n">Variable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
</span><span class="c1"># this should be code outside of a code block</span><span class="w">
</span><span class="n">VerbatimString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">,</span><span class="w">
</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">dark_purple</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">Other</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Variable</span><span class="p">,</span><span class="w">
</span><span class="n">Constant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">purple</span><span class="p">),</span><span class="w">
</span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">salmon</span><span class="p">),</span><span class="w">
</span><span class="n">Alert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span><span class="w">
</span><span class="n">Warning</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span><span class="w">
</span><span class="n">Float</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">purple</span><span class="p">),</span><span class="w">
</span><span class="n">DecVal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Float</span><span class="p">,</span><span class="w">
</span><span class="n">BaseN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Float</span><span class="p">,</span><span class="w">
</span><span class="n">SpecialChar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">white</span><span class="p">),</span><span class="w">
</span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ff_colors</span><span class="o">$</span><span class="n">yellow</span><span class="p">),</span><span class="w">
</span><span class="n">Char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w">
</span><span class="n">SpecialString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
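<p>The reuse in this list (for example, <code class="language-plaintext highlighter-rouge">Documentation = Comment</code>) relies on
<code class="language-plaintext highlighter-rouge">tibble::lst()</code> building its elements in order, so that later elements can
refer to earlier ones. Here is a small illustration with a made-up list called
<code class="language-plaintext highlighter-rouge">patches_demo</code>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># lst() evaluates its arguments sequentially, so Char can reuse String
patches_demo <- tibble::lst(
  String = list(text = "#ffea00"),
  Char = String
)

identical(patches_demo$Char, patches_demo$String)
#> [1] TRUE
</code></pre></div></div>
<p>A plain <code class="language-plaintext highlighter-rouge">list()</code> call would look for <code class="language-plaintext highlighter-rouge">String</code> in the calling environment
instead of among the list’s own elements.</p>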
<div class="notice--info">
<p><strong>Save yourself from guessing and checking.</strong> These style definition
names are documented on <a href="https://docs.kde.org/stable5/en/kate/katepart/highlight.html#kate-highlight-default-styles">this
page</a>.
I wish I had found this page before starting to port the solarized
theme. My initial approach was to use the style inspector in Microsoft
Word and look at the style names applied to pieces of code. The downside
of that approach is that in order to figure out what a <code class="language-plaintext highlighter-rouge">SpecialChar</code>
was, I had to write a <code class="language-plaintext highlighter-rouge">SpecialChar</code>. (Escape sequences inside of strings
like <code class="language-plaintext highlighter-rouge">"hello\nthere"</code> are <code class="language-plaintext highlighter-rouge">SpecialChars</code> in the R syntax definition used
by pandoc.)</p>
</div>
<p>Now we apply our patches to the theme:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ff_theme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">patch_theme_text_style</span><span class="p">(</span><span class="w">
</span><span class="n">data_theme</span><span class="p">,</span><span class="w">
</span><span class="n">patches</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">solarizeddocx</span><span class="o">::</span><span class="n">write_pandoc_theme</span><span class="p">(</span><span class="n">ff_theme</span><span class="p">,</span><span class="w"> </span><span class="n">temptheme</span><span class="p">)</span><span class="w">
</span><span class="n">results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">run_pandoc</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w">
</span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">results</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot5.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot5-1.png" title="Screenshot of html file created by pandoc. It now has Fairy Floss colors." alt="Screenshot of html file created by pandoc. It now has Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>
<p>Wonderful!</p>
<h2 id="sneaking-these-features-into-rmarkdown">Sneaking these features into RMarkdown</h2>
<div class="notice--info">
<p><strong>Update: This problem has been fixed</strong>. When I first wrote this post,
it was not possible to use custom highlighting themes with RMarkdown
HTML documents. The syntax highlighting for this format was overhauled
in
<a href="https://cran.r-project.org/web/packages/rmarkdown/news/news.html">rmarkdown 2.12</a>.
[<em>May 27, 2022</em>]</p>
</div>
<p>So far, we have set these options by directly calling pandoc with the
style and syntax options. <del>We can use these options in RMarkdown <em>some of
the time</em>. For example, here we try to send the Fairy Floss theme into
an <a href="https://pkgs.rstudio.com/rmarkdown/reference/html_document.html"><code class="language-plaintext highlighter-rouge">html_document()</code></a> and fail.</del></p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">html_document</span><span class="p">(</span><span class="w">
</span><span class="c1"># Update, May 2022: Adding this line fixes things</span><span class="w">
</span><span class="n">highlight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">)[</span><span class="m">2</span><span class="p">],</span><span class="w">
</span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">out</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot6.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot6-1.png" title="Screenshot of html file created by RMarkdown. It has the default colors." alt="Screenshot of html file created by RMarkdown. It has the default colors." width="80%" style="display: block; margin: auto;" /></p>
<p><del>RMarkdown assembles and performs a giant pandoc command. The problem,
as far as I can tell, is that this command includes our
<code class="language-plaintext highlighter-rouge">pd_style(temptheme)</code> which sets the option for
<code class="language-plaintext highlighter-rouge">--highlight-style</code>—but later on it also includes <code class="language-plaintext highlighter-rouge">--no-highlight</code>
which blocks our style. Bummer.</del></p>
<p>If we use the simpler <a href="https://pkgs.rstudio.com/rmarkdown/reference/html_document_base.html"><code class="language-plaintext highlighter-rouge">html_document_base()</code></a>
format, however, we can see Fairy Floss output.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">html_document_base</span><span class="p">(</span><span class="w">
</span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">))</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">page_thumbnail</span><span class="p">(</span><span class="n">url_file</span><span class="p">(</span><span class="n">out</span><span class="p">),</span><span class="w"> </span><span class="s2">"shot7.png"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot7-1.png" title="Screenshot of html file created by RMarkdown. It has the Fairy Floss colors." alt="Screenshot of html file created by RMarkdown. It has the Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>
<p>The options also work for the <a href="https://pkgs.rstudio.com/rmarkdown/reference/pdf_document.html"><code class="language-plaintext highlighter-rouge">pdf_document()</code></a>
format.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">out</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">render</span><span class="p">(</span><span class="w">
</span><span class="n">md_file</span><span class="p">,</span><span class="w">
</span><span class="n">output_format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rmarkdown</span><span class="o">::</span><span class="n">pdf_document</span><span class="p">(</span><span class="w">
</span><span class="n">pandoc_args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">pd_style</span><span class="p">(</span><span class="n">temptheme</span><span class="p">),</span><span class="w"> </span><span class="n">pd_syntax</span><span class="p">(</span><span class="n">syntax_sl</span><span class="p">))</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">quiet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Convert to png and crop most of the empty page</span><span class="w">
</span><span class="n">png</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pdftools</span><span class="o">::</span><span class="n">pdf_convert</span><span class="p">(</span><span class="n">out</span><span class="p">,</span><span class="w"> </span><span class="n">dpi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">144</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Converting page 1 to file343c662113f3_1.png... done!</span><span class="w">
</span><span class="n">magick</span><span class="o">::</span><span class="n">image_read</span><span class="p">(</span><span class="n">png</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">magick</span><span class="o">::</span><span class="n">image_crop</span><span class="p">(</span><span class="n">magick</span><span class="o">::</span><span class="n">geometry_area</span><span class="p">(</span><span class="m">1050</span><span class="p">,</span><span class="w"> </span><span class="m">400</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-11-17-custom-highlighting-pandoc-rmarkdown/shot8-1.png" title="Screenshot of a cropped pdf file created by RMarkdown. It has the Fairy Floss colors." alt="Screenshot of a cropped pdf file created by RMarkdown. It has the Fairy Floss colors." width="80%" style="display: block; margin: auto;" /></p>
<p>The options also work with <a href="https://pkgs.rstudio.com/rmarkdown/reference/word_document.html"><code class="language-plaintext highlighter-rouge">word_document()</code></a>. In
fact, that’s how <code class="language-plaintext highlighter-rouge">solarizeddocx::document()</code> works.</p>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-11-17-custom-highlighting-pandoc-rmarkdown.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc 2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> askpass 1.1 2019-01-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> bslib 0.3.1 2021-10-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> downlit 0.4.0 2021-10-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magick 2.7.3 2021-08-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr * 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pdftools 3.2.0 2022-04-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> processx 3.5.3 2022-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ps 1.7.0 2022-04-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> qpdf 1.1 2019-03-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sass 0.4.1 2022-03-23 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> solarizeddocx 0.0.1.9000 2022-05-25 [1] Github (tjmahr/solarizeddocx@8f82bf1)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tinytex 0.39 2022-05-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> webshot 0.5.3 2022-04-14 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comNow you can have Fairy Floss in quarterly-report.docxA one-liner for generating random participant IDs2021-10-12T00:00:00-05:002021-10-12T00:00:00-05:00https://tjmahr.github.io/one-liner-to-generate-ids<p>On one of the Slacks I browse, someone asked how to de-identify a
column of participant IDs. The original dataset was a wait list, so
the ordering of the IDs was itself a sensitive feature of the data, and we
needed to scramble the order of the IDs produced.</p>
<p>For example, suppose we have the following <em>repeated measures</em> dataset.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tibble</span><span class="o">::</span><span class="n">tribble</span><span class="p">(</span><span class="w">
</span><span class="o">~</span><span class="w"> </span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">timepoint</span><span class="p">,</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">score</span><span class="p">,</span><span class="w">
</span><span class="s2">"DB"</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w">
</span><span class="s2">"DB"</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="p">,</span><span class="w">
</span><span class="s2">"DB"</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="p">,</span><span class="w">
</span><span class="s2">"TW"</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w">
</span><span class="s2">"TW"</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">9</span><span class="p">,</span><span class="w">
</span><span class="s2">"CF"</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">9</span><span class="p">,</span><span class="w">
</span><span class="s2">"CF"</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="p">,</span><span class="w">
</span><span class="s2">"JH"</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w">
</span><span class="s2">"JH"</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w">
</span><span class="s2">"JH"</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>We want to map the <code class="language-plaintext highlighter-rouge">participant</code> identifiers onto some sort of
shuffled-up random IDs. Suggestions included hashing the IDs with
<a href="https://rdrr.io/pkg/digest/man/sha1.html">digest</a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># This approach cryptographically compresses the input into a short</span><span class="w">
</span><span class="c1"># "digest". (It is not a random ID.)</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Vectorize</span><span class="p">(</span><span class="n">digest</span><span class="o">::</span><span class="n">sha1</span><span class="p">)(</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <chr> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 ad61ec1247b2381922bec89483c3ce2fb67f98d9 1 7</span><span class="w">
</span><span class="c1">#> 2 ad61ec1247b2381922bec89483c3ce2fb67f98d9 2 8</span><span class="w">
</span><span class="c1">#> 3 ad61ec1247b2381922bec89483c3ce2fb67f98d9 3 8</span><span class="w">
</span><span class="c1">#> 4 c080f9a87edc6d47f28185279fd8be068c566a37 1 NA</span><span class="w">
</span><span class="c1">#> 5 c080f9a87edc6d47f28185279fd8be068c566a37 2 9</span><span class="w">
</span><span class="c1">#> 6 1f9da22bf684761daec27326331c58b46502a25b 1 9</span><span class="w">
</span><span class="c1">#> 7 1f9da22bf684761daec27326331c58b46502a25b 2 8</span><span class="w">
</span><span class="c1">#> 8 627d211747438ae59690cea8f0a8d6adf666b974 1 10</span><span class="w">
</span><span class="c1">#> 9 627d211747438ae59690cea8f0a8d6adf666b974 2 10</span><span class="w">
</span><span class="c1">#> 10 627d211747438ae59690cea8f0a8d6adf666b974 3 10</span><span class="w">
</span></code></pre></div></div>
<p>But this approach seems like overkill, and hashing just transforms these
IDs. We want to be rid of them completely.</p>
<p>The <a href="https://rdrr.io/pkg/uuid/man/UUIDgenerate.html">uuid</a> package provides <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)">another approach</a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">uuid</span><span class="o">::</span><span class="n">UUIDgenerate</span><span class="p">(</span><span class="n">use.time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">relocate</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <chr> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 03e9536d-1446-4779-ac4d-67848fa73ef4 1 7</span><span class="w">
</span><span class="c1">#> 2 03e9536d-1446-4779-ac4d-67848fa73ef4 2 8</span><span class="w">
</span><span class="c1">#> 3 03e9536d-1446-4779-ac4d-67848fa73ef4 3 8</span><span class="w">
</span><span class="c1">#> 4 f7b73ca6-57c7-4c9a-9211-86b434912856 1 NA</span><span class="w">
</span><span class="c1">#> 5 f7b73ca6-57c7-4c9a-9211-86b434912856 2 9</span><span class="w">
</span><span class="c1">#> 6 81b02d88-c3bd-490b-b2dc-150077f03172 1 9</span><span class="w">
</span><span class="c1">#> 7 81b02d88-c3bd-490b-b2dc-150077f03172 2 8</span><span class="w">
</span><span class="c1">#> 8 60f80714-77ba-4e9f-a7d2-1943ca6724fc 1 10</span><span class="w">
</span><span class="c1">#> 9 60f80714-77ba-4e9f-a7d2-1943ca6724fc 2 10</span><span class="w">
</span><span class="c1">#> 10 60f80714-77ba-4e9f-a7d2-1943ca6724fc 3 10</span><span class="w">
</span></code></pre></div></div>
<p>Again, these IDs seem excessive: Imagine plotting data with one participant
per facet.</p>
<p>When I create blogposts for this site, I use a function to create a new
.Rmd file with the date and a <a href="https://rdrr.io/pkg/ids/man/adjective_animal.html">random adjective-animal
phrase</a> for a
placeholder (e.g., <code class="language-plaintext highlighter-rouge">2021-06-28-mild-capybara.Rmd</code>). We could try that for
fun:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ids</span><span class="o">::</span><span class="n">adjective_animal</span><span class="p">()</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">relocate</span><span class="p">(</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <chr> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 chrysoprase_bushsqueaker 1 7</span><span class="w">
</span><span class="c1">#> 2 chrysoprase_bushsqueaker 2 8</span><span class="w">
</span><span class="c1">#> 3 chrysoprase_bushsqueaker 3 8</span><span class="w">
</span><span class="c1">#> 4 hideous_cheetah 1 NA</span><span class="w">
</span><span class="c1">#> 5 hideous_cheetah 2 9</span><span class="w">
</span><span class="c1">#> 6 powdery_siamang 1 9</span><span class="w">
</span><span class="c1">#> 7 powdery_siamang 2 8</span><span class="w">
</span><span class="c1">#> 8 ducal_hornshark 1 10</span><span class="w">
</span><span class="c1">#> 9 ducal_hornshark 2 10</span><span class="w">
</span><span class="c1">#> 10 ducal_hornshark 3 10</span><span class="w">
</span></code></pre></div></div>
<p>But that’s too whimsical (and something like <code class="language-plaintext highlighter-rouge">hideous_cheetah</code> seems
disrespectful to human subjects).</p>
<p>One user suggested <a href="https://forcats.tidyverse.org/reference/fct_anon.html"><code class="language-plaintext highlighter-rouge">forcats::fct_anon()</code></a>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">participant</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">as.factor</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">forcats</span><span class="o">::</span><span class="n">fct_anon</span><span class="p">(</span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"p0"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <fct> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 p04 1 7</span><span class="w">
</span><span class="c1">#> 2 p04 2 8</span><span class="w">
</span><span class="c1">#> 3 p04 3 8</span><span class="w">
</span><span class="c1">#> 4 p02 1 NA</span><span class="w">
</span><span class="c1">#> 5 p02 2 9</span><span class="w">
</span><span class="c1">#> 6 p03 1 9</span><span class="w">
</span><span class="c1">#> 7 p03 2 8</span><span class="w">
</span><span class="c1">#> 8 p01 1 10</span><span class="w">
</span><span class="c1">#> 9 p01 2 10</span><span class="w">
</span><span class="c1">#> 10 p01 3 10</span><span class="w">
</span></code></pre></div></div>
<p>This approach works wonderfully. The only wrinkle is that it requires
converting our IDs to a factor in order to work.</p>
<h2 id="call-me-the-match-maker">Call me the <code class="language-plaintext highlighter-rouge">match()</code>-maker</h2>
<p>My approach is a nice combination of base R functions:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">participant</span><span class="p">)))</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <int> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 3 1 7</span><span class="w">
</span><span class="c1">#> 2 3 2 8</span><span class="w">
</span><span class="c1">#> 3 3 3 8</span><span class="w">
</span><span class="c1">#> 4 1 1 NA</span><span class="w">
</span><span class="c1">#> 5 1 2 9</span><span class="w">
</span><span class="c1">#> 6 2 1 9</span><span class="w">
</span><span class="c1">#> 7 2 2 8</span><span class="w">
</span><span class="c1">#> 8 4 1 10</span><span class="w">
</span><span class="c1">#> 9 4 2 10</span><span class="w">
</span><span class="c1">#> 10 4 3 10</span><span class="w">
</span></code></pre></div></div>
<p><a href="https://rdrr.io/r/base/match.html"><code class="language-plaintext highlighter-rouge">match(x, table)</code></a> returns, for each
element of <code class="language-plaintext highlighter-rouge">x</code>, the position of its first match in the vector <code class="language-plaintext highlighter-rouge">table</code>. For example, what are the
positions in the alphabet of the letters L, Q, and L again?</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"L"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Q"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 12 17 12</span><span class="w">
</span></code></pre></div></div>
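<p>To see why <code class="language-plaintext highlighter-rouge">match()</code> suits relabeling, here is a minimal sketch in which the shuffled table is written out by hand instead of being drawn at random. The vectors <code class="language-plaintext highlighter-rouge">ids</code> and <code class="language-plaintext highlighter-rouge">shuffled</code> are made up for illustration. Every repeat of an ID receives the same position, so rows from the same participant stay linked.</p>

```r
# Hypothetical IDs with repeats, as in a repeated-measures dataset
ids <- c("DB", "DB", "TW", "CF", "JH", "JH")

# A hand-written stand-in for a random shuffle of the unique IDs
shuffled <- c("TW", "CF", "DB", "JH")

# Each ID is replaced by its position in the shuffled table
new_ids <- match(ids, shuffled)
new_ids
#> [1] 3 3 1 2 4 4

# Values that are absent from the table come back as NA
match("ZZ", shuffled)
#> [1] NA
```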
<p><a href="https://rdrr.io/r/base/sample.html"><code class="language-plaintext highlighter-rouge">sample()</code></a> shuffles the values in
the <code class="language-plaintext highlighter-rouge">table</code> so that the original order of the IDs is lost. The <code class="language-plaintext highlighter-rouge">unique()</code> is
optional: we could just <code class="language-plaintext highlighter-rouge">sample(data$participant)</code>, but then the first
position of an ID might be a number larger than 4, the number of participants:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">shuffle</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">participant</span><span class="p">)</span><span class="w">
</span><span class="n">shuffle</span><span class="w">
</span><span class="c1">#> [1] "CF" "JH" "TW" "JH" "DB" "DB" "DB" "JH" "CF" "TW"</span><span class="w">
</span><span class="n">match</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">shuffle</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 5 5 5 3 3 1 1 2 2 2</span><span class="w">
</span></code></pre></div></div>
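<p>Because the shuffle is random, the new IDs change on every run. The sketch below assumes we want a repeatable shuffle, so it calls <code class="language-plaintext highlighter-rouge">set.seed()</code> with an arbitrary seed. It then checks two properties of the <code class="language-plaintext highlighter-rouge">unique()</code> version: the new IDs are exactly the integers 1 to 4, and each original ID maps to exactly one new ID. The <code class="language-plaintext highlighter-rouge">participant</code> vector is copied from the example data. The seed and the <code class="language-plaintext highlighter-rouge">key</code> data frame are illustrative additions, not part of the original one-liner.</p>

```r
# Arbitrary seed (an assumption) so that the shuffle can be repeated
set.seed(20211012)

participant <- c("DB", "DB", "DB", "TW", "TW", "CF", "CF", "JH", "JH", "JH")
new_id <- match(participant, sample(unique(participant)))

# The new IDs are always the integers 1 through the number of participants
sort(unique(new_id))
#> [1] 1 2 3 4

# One row per participant: each original ID maps to exactly one new ID
key <- unique(data.frame(participant, new_id))
nrow(key)
#> [1] 4
```

<p>A table like <code class="language-plaintext highlighter-rouge">key</code> can re-identify participants, so it would need to be stored apart from the de-identified data or thrown away.</p>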
<p>For more aesthetically pleasing names, and for names that will sort
correctly, we can zero-pad the results with
<a href="https://rdrr.io/r/base/sprintf.html"><code class="language-plaintext highlighter-rouge">sprintf()</code></a>. I am mostly
including this step so that I have it written down somewhere for my own
reference.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">zero_pad</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="c1"># use widest element if bigger than `width`</span><span class="w">
</span><span class="n">width</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">nchar</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span><span class="w"> </span><span class="n">width</span><span class="p">))</span><span class="w">
</span><span class="n">sprintf</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="s2">"%0"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="p">,</span><span class="w"> </span><span class="s2">"d"</span><span class="p">),</span><span class="w"> </span><span class="n">xs</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">participant</span><span class="p">))),</span><span class="w">
</span><span class="n">participant</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">zero_pad</span><span class="p">(</span><span class="n">participant</span><span class="p">,</span><span class="w"> </span><span class="s2">"p"</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1">#> # A tibble: 10 × 3</span><span class="w">
</span><span class="c1">#> participant timepoint score</span><span class="w">
</span><span class="c1">#> <chr> <dbl> <dbl></span><span class="w">
</span><span class="c1">#> 1 p003 1 7</span><span class="w">
</span><span class="c1">#> 2 p003 2 8</span><span class="w">
</span><span class="c1">#> 3 p003 3 8</span><span class="w">
</span><span class="c1">#> 4 p004 1 NA</span><span class="w">
</span><span class="c1">#> 5 p004 2 9</span><span class="w">
</span><span class="c1">#> 6 p002 1 9</span><span class="w">
</span><span class="c1">#> 7 p002 2 8</span><span class="w">
</span><span class="c1">#> 8 p001 1 10</span><span class="w">
</span><span class="c1">#> 9 p001 2 10</span><span class="w">
</span><span class="c1">#> 10 p001 3 10</span><span class="w">
</span></code></pre></div></div>
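<p>To see why the padding helps with sorting, here is a minimal sketch. The
vector <code class="language-plaintext highlighter-rouge">ids</code> is made up for illustration and is not part of the data
above. It compares how unpadded and zero-padded names sort as text.</p>

```r
# Hypothetical ids, chosen so that the numbers have different widths
ids <- c(2, 10, 1)

# Unpadded names sort as text, so "p10" lands before "p2"
unpadded <- paste0("p", ids)
sort(unpadded)
#> [1] "p1"  "p10" "p2"

# "%03d" pads each whole number with zeros to a width of 3
padded <- sprintf("p%03d", ids)
sort(padded)
#> [1] "p001" "p002" "p010"
```

<p>With the padding, the text order and the numeric order agree.</p>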
<h3 id="bonus-match-in-disguise">Bonus: <code class="language-plaintext highlighter-rouge">match()</code> <code class="language-plaintext highlighter-rouge">%in%</code> disguise</h3>
<p>What happens when <code class="language-plaintext highlighter-rouge">match()</code> fails to find an <code class="language-plaintext highlighter-rouge">x</code> in the table? By
default, we get <code class="language-plaintext highlighter-rouge">NA</code>. But we can customize the results with the
<code class="language-plaintext highlighter-rouge">nomatch</code> argument.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] NA 1 12</span><span class="w">
</span><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-99</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] -99 1 12</span><span class="w">
</span><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 0 1 12</span><span class="w">
</span></code></pre></div></div>
<p>If we do something like this last example, then we can check whether an
element in <code class="language-plaintext highlighter-rouge">x</code> has a match by checking for numbers greater than 0.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">match</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">,</span><span class="w"> </span><span class="n">nomatch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="c1">#> [1] FALSE TRUE TRUE</span><span class="w">
</span></code></pre></div></div>
<p>And that is how the functions <a href="https://rdrr.io/r/base/match.html"><code class="language-plaintext highlighter-rouge">%in%</code></a> and <a href="https://rdrr.io/r/base/sets.html"><code class="language-plaintext highlighter-rouge">is.element()</code></a> are implemented
behind the scenes:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">)</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nb">LETTERS</span><span class="w">
</span><span class="c1">#> [1] FALSE TRUE TRUE</span><span class="w">
</span><span class="c1"># The 0L means it's an integer number instead of floating point number</span><span class="w">
</span><span class="n">`%in%`</span><span class="w">
</span><span class="c1">#> function (x, table) </span><span class="w">
</span><span class="c1">#> match(x, table, nomatch = 0L) > 0L</span><span class="w">
</span><span class="c1">#> <bytecode: 0x0000019f10fbf0a0></span><span class="w">
</span><span class="c1">#> <environment: namespace:base></span><span class="w">
</span><span class="n">is.element</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"7"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">,</span><span class="w"> </span><span class="s2">"L"</span><span class="p">),</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] FALSE TRUE TRUE</span><span class="w">
</span><span class="n">is.element</span><span class="w">
</span><span class="c1">#> function (el, set) </span><span class="w">
</span><span class="c1">#> match(as.vector(el), as.vector(set), 0L) > 0L</span><span class="w">
</span><span class="c1">#> <bytecode: 0x0000019f13c60db0></span><span class="w">
</span><span class="c1">#> <environment: namespace:base></span><span class="w">
</span></code></pre></div></div>
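<p>Because <code class="language-plaintext highlighter-rouge">%in%</code> is such a thin wrapper around <code class="language-plaintext highlighter-rouge">match()</code>, the same trick
can be reused for variations on it. As a sketch, here is a hypothetical
<code class="language-plaintext highlighter-rouge">%not_in%</code> operator. The name is my own invention and it is not part of
base R.</p>

```r
# Hypothetical helper: TRUE where an element of `x` has no match in `table`
`%not_in%` <- function(x, table) {
  match(x, table, nomatch = 0L) == 0L
}

x <- c("7", "A", "L")

# Gives the same answer as negating %in%
x %not_in% LETTERS
#> [1]  TRUE FALSE FALSE
!(x %in% LETTERS)
#> [1]  TRUE FALSE FALSE
```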
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-10-12-one-liner-to-generate-ids.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> broom 0.8.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ids 1.0.1 2017-05-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> uuid 1.1-0 2022-04-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comFind a `match()` in your base R libraryKeep your R scripts locally sourced2021-08-16T00:00:00-05:002021-08-16T00:00:00-05:00https://tjmahr.github.io/keep-it-locally-sourced<p>A few weeks ago, I had a <em>bad</em> debugging session. The code was just not
doing what I expected, and I went down a lot of dead ends trying to fix
or simplify things. I could not get the problem to happen in a
reproducible example (<a href="https://reprex.tidyverse.org/">reprex</a>) or
interactively (in RStudio). Eventually, the most minimal example of the
problem completely broke my mental model for how the code should work.</p>
<p>The problem had to do with names and what they mean. <code class="language-plaintext highlighter-rouge">select()</code> is a
function that lives in both the MASS package and the dplyr package, and I
always intend for <code class="language-plaintext highlighter-rouge">select()</code> to point to
<a href="https://dplyr.tidyverse.org/reference/select.html"><code class="language-plaintext highlighter-rouge">dplyr::select()</code></a>.
But sometimes a statistics package will load in MASS and overwrite
<code class="language-plaintext highlighter-rouge">select()</code> to point to
<a href="https://rdrr.io/pkg/MASS/man/lm.ridge.html"><code class="language-plaintext highlighter-rouge">MASS::select()</code></a>. And in
this case, my attempts to use <code class="language-plaintext highlighter-rouge">select()</code> in a
<a href="https://rdrr.io/r/base/source.html"><code class="language-plaintext highlighter-rouge">source()</code></a>-ed file kept reverting
to <code class="language-plaintext highlighter-rouge">MASS::select()</code> instead of <code class="language-plaintext highlighter-rouge">dplyr::select()</code>. A tweet from the
session shows the minimal example and my wracked brain. (I will describe
the example in more detail below.)</p>
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">i'm dry heaving here wtf is going <a href="https://t.co/KIeRJT6kwY">pic.twitter.com/KIeRJT6kwY</a></p>
<img src="/assets/images/2021-08-wtf-debugging.jpg" width="60%" alt="Code/output where I map `select` to `dplyr::select`, create a file with one function that prints the environment of `select`, print `select` (namespace:dplyr), call the function (namespace:MASS), and print `select` (namespace:dplyr)" />
<br />
— tj mahr 🍍🍕 (@tjmahr) <a href="https://twitter.com/tjmahr/status/1417894498080800769?ref_src=twsrc%5Etfw">July 21, 2021</a>
</blockquote>
<p>Here’s what happens:</p>
<ol>
<li>I explicitly assign <code class="language-plaintext highlighter-rouge">select</code> to <code class="language-plaintext highlighter-rouge">dplyr::select()</code>.</li>
<li>I make a function <code class="language-plaintext highlighter-rouge">f()</code> that prints the environment of <code class="language-plaintext highlighter-rouge">select</code>
(where the name/function is defined), store the function in a <code class="language-plaintext highlighter-rouge">.R</code>
text file and <code class="language-plaintext highlighter-rouge">source()</code> in the text file. (<code class="language-plaintext highlighter-rouge">source()</code> runs the code
in an R script.)</li>
<li>I print the value of <code class="language-plaintext highlighter-rouge">select</code> and see that it is indeed from the
dplyr environment.</li>
<li>I call my function, and it says that <code class="language-plaintext highlighter-rouge">select</code> is actually in the
MASS package.</li>
<li>I check the value of <code class="language-plaintext highlighter-rouge">select</code>, and it reports the dplyr environment
once again.</li>
</ol>
<h2 id="a-similar-problem-using-functions">A similar problem using functions</h2>
<p>This problem only happened while knitting <a href="https://github.com/tjmahr/notestar" title="My notebook system">one of my analysis
notebooks</a> (which was a clue). Right now, it’s proving
difficult for me to write examples of this problem for this blogpost, so
I’m going to show the source 😉 of the problem using functions.</p>
<p>First, let’s set up things so that <code class="language-plaintext highlighter-rouge">select</code> belongs to the MASS package.
We are also going to use the <a href="https://conflicted.r-lib.org/" title="conflicted: An Alternative Conflict Resolution Strategy">conflicted</a> package which normally prevents
package name <em>conflicts</em> from happening. This part isn’t necessary or
helpful; I just want to illustrate that this is not a simple name
conflict problem.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">conflicted</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">MASS</span><span class="p">)</span><span class="w">
</span><span class="n">environment</span><span class="p">(</span><span class="n">select</span><span class="p">)</span><span class="w">
</span><span class="c1">#> <environment: namespace:MASS></span><span class="w">
</span></code></pre></div></div>
<p>We are going to make a function that does what my original code example
tried to do:</p>
<ul>
<li>set <code class="language-plaintext highlighter-rouge">select</code> to dplyr explicitly</li>
<li><code class="language-plaintext highlighter-rouge">source()</code> in a file that gives the environment of <code class="language-plaintext highlighter-rouge">select</code></li>
<li>return the environment of <code class="language-plaintext highlighter-rouge">select</code>, both using the <code class="language-plaintext highlighter-rouge">source()</code>-ed
function and directly.</li>
</ul>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">source_in_my_code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="c1"># set dplyr select</span><span class="w">
</span><span class="n">select</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dplyr</span><span class="o">::</span><span class="n">select</span><span class="w">
</span><span class="c1"># write a script to temporary file</span><span class="w">
</span><span class="n">temp_script</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".R"</span><span class="p">)</span><span class="w">
</span><span class="n">my_code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"
f <- function() environment(select)
"</span><span class="w">
</span><span class="n">writeLines</span><span class="p">(</span><span class="n">my_code</span><span class="p">,</span><span class="w"> </span><span class="n">temp_script</span><span class="p">)</span><span class="w">
</span><span class="c1"># run the script</span><span class="w">
</span><span class="n">source</span><span class="p">(</span><span class="n">temp_script</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w">
</span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">source_select_environment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">f</span><span class="p">(),</span><span class="w">
</span><span class="n">function_select_environment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">environment</span><span class="p">(</span><span class="n">select</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">default_results</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">source_in_my_code</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>What do you think the <code class="language-plaintext highlighter-rouge">select</code> environment should be? dplyr, right?
That’s what <code class="language-plaintext highlighter-rouge">select</code> means everywhere else inside of the function.
<code class="language-plaintext highlighter-rouge">source()</code> is just like dropping in some R code and running it, right?
That’s what I thought.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">default_results</span><span class="w">
</span><span class="c1">#> $source_select_environment</span><span class="w">
</span><span class="c1">#> <environment: namespace:MASS></span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> $function_select_environment</span><span class="w">
</span><span class="c1">#> <environment: namespace:dplyr></span><span class="w">
</span></code></pre></div></div>
<p>No, it’s the MASS environment. 😕</p>
<h2 id="local-and-parent-environments">Local and parent environments</h2>
<p>In order to understand what’s happening, let’s first note that R works
by evaluating expressions in an environment. The environment defines the
values of names. If a name is not found in an environment, R searches
the parent environment for the name (or the parent’s parent, and so on).
This idea is <a href="https://adv-r.hadley.nz/environments.html#parents">illustrated beautifully in <em>Advanced R</em> using
diagrams</a>.</p>
<p>For an analogy, you might think of name lookup as searching for someone in
an office directory, then a building directory, then an area directory:</p>
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">I like the multi-company building analogy. If you want to call Jim, first you look in your company directory. If there isn’t a Jim there, you look in the all-building maintenance dir. If not there, you look in the city services dir. You don’t look in another company-specific dir
</p>
— Brenton Wiernik 🏳️🌈 (@bmwiernik) <a href="https://twitter.com/bmwiernik/status/1387164714451488772?ref_src=twsrc%5Etfw">April 27, 2021</a>
</blockquote>
<p>Here is a small example showing a local function environment, its
parent environment and how a name will take different values depending on
the context.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">where_am_i</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"outside of the function"</span><span class="w">
</span><span class="n">where_are_you</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"outside of the function too"</span><span class="w">
</span><span class="n">where_is_everyone</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">where_am_i</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"inside of the function"</span><span class="w">
</span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">where_am_i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">where_am_i</span><span class="p">,</span><span class="w">
</span><span class="n">where_are_you</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">where_are_you</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">where_am_i</span><span class="w">
</span><span class="c1">#> [1] "outside of the function"</span><span class="w">
</span><span class="n">where_is_everyone</span><span class="p">()</span><span class="w">
</span><span class="c1">#> $where_am_i</span><span class="w">
</span><span class="c1">#> [1] "inside of the function"</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> $where_are_you</span><span class="w">
</span><span class="c1">#> [1] "outside of the function too"</span><span class="w">
</span><span class="n">where_am_i</span><span class="w">
</span><span class="c1">#> [1] "outside of the function"</span><span class="w">
</span></code></pre></div></div>
<p>Outside of the function, <code class="language-plaintext highlighter-rouge">where_am_i</code> is <code class="language-plaintext highlighter-rouge">"outside of the function"</code>,
but in the body of the function, it is redefined as <code class="language-plaintext highlighter-rouge">"inside of the function"</code>.
The variable <code class="language-plaintext highlighter-rouge">where_are_you</code> is <em>only</em> defined outside of the function
(as <code class="language-plaintext highlighter-rouge">"outside of the function too"</code>), so the function has to search for the variable in its
parent environment.</p>
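<p>We can make that search explicit with <code class="language-plaintext highlighter-rouge">exists()</code> and its <code class="language-plaintext highlighter-rouge">inherits</code>
argument. This is only a sketch with made-up names (<code class="language-plaintext highlighter-rouge">outer_value</code>,
<code class="language-plaintext highlighter-rouge">inner_value</code>, <code class="language-plaintext highlighter-rouge">lookup_demo()</code>). It asks whether a name is defined in the
function’s own environment or only reachable through a parent.</p>

```r
outer_value <- "defined outside"

lookup_demo <- function() {
  inner_value <- "defined inside"
  here <- environment()  # the function's own (local) environment
  list(
    # inherits = FALSE checks only the local environment
    inner_is_local = exists("inner_value", envir = here, inherits = FALSE),
    outer_is_local = exists("outer_value", envir = here, inherits = FALSE),
    # inherits = TRUE also searches the parent environments
    outer_is_reachable = exists("outer_value", envir = here, inherits = TRUE)
  )
}

results <- lookup_demo()
str(results)
#> List of 3
#>  $ inner_is_local    : logi TRUE
#>  $ outer_is_local    : logi FALSE
#>  $ outer_is_reachable: logi TRUE
```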
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en" data-dnt="true" data-theme="light">
<p lang="en" dir="ltr">"parent" environment suggests a family metaphor. if you cant find what a symbol means, ask a parent.</p>
— tj mahr 🍍🍕 (@tjmahr) <a href="https://twitter.com/tjmahr/status/1387087953982328833?ref_src=twsrc%5Etfw">April 27, 2021</a>
</blockquote>
<h2 id="locally-sourced-r-code">Locally sourced R code</h2>
<p>Reading the <a href="https://rdrr.io/r/base/source.html">documentation to <code class="language-plaintext highlighter-rouge">source()</code></a>, we find the solution to the
original problem:</p>
<blockquote>
<p><strong>Arguments</strong></p>
<p><strong><code class="language-plaintext highlighter-rouge">local</code></strong> <br />
<code class="language-plaintext highlighter-rouge">TRUE</code>, <code class="language-plaintext highlighter-rouge">FALSE</code> or an environment, determining where the parsed
expressions are evaluated. <code class="language-plaintext highlighter-rouge">FALSE</code> (the default) corresponds to the
user’s workspace (the global environment) and <code class="language-plaintext highlighter-rouge">TRUE</code> to the
environment from which <code class="language-plaintext highlighter-rouge">source</code> is called.</p>
</blockquote>
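<p>By default, the code evaluated by <code class="language-plaintext highlighter-rouge">source()</code> runs in the global
environment, that is, “outside” of the body of the function. The code
<em>breaks out</em> of the function environment and runs in the top-level
environment. In our example, that means <code class="language-plaintext highlighter-rouge">f()</code> is defined in the global
environment, so when it looks up <code class="language-plaintext highlighter-rouge">select</code>, it never sees the dplyr
assignment made inside the function and finds the MASS version instead.</p>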
<p>My mental model for <code class="language-plaintext highlighter-rouge">source()</code> was completely wrong. <code class="language-plaintext highlighter-rouge">source()</code> is not
like dropping in the R code from a file and running it. It is more like
pausing everything that you’re doing in your current context, backing
out to the highest level context, running that code, and then resuming
what you’re doing.</p>
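<p>A minimal sketch of that difference, using made-up names (<code class="language-plaintext highlighter-rouge">run_script()</code>,
<code class="language-plaintext highlighter-rouge">sourced_value</code>): the script assigns one variable, and we check where that
variable ends up. Note that the default call really does write to the global
environment, so the sketch cleans up after itself.</p>

```r
# Write a one-line script to a temporary file
temp_script <- tempfile(fileext = ".R")
writeLines("sourced_value <- 'set by the script'", temp_script)

run_script <- function(...) {
  source(temp_script, ...)
  # Did the script create the variable in this function's own environment?
  exists("sourced_value", envir = environment(), inherits = FALSE)
}

# Default (local = FALSE): the variable is created in the global environment
default_is_local <- run_script()
default_in_global <- exists("sourced_value", envir = globalenv(), inherits = FALSE)
rm(sourced_value, envir = globalenv())

# local = TRUE: the variable stays inside the function
local_is_local <- run_script(local = TRUE)
local_in_global <- exists("sourced_value", envir = globalenv(), inherits = FALSE)

c(default_is_local, default_in_global, local_is_local, local_in_global)
#> [1] FALSE  TRUE  TRUE FALSE
```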
<p>Fortunately, if we ask <code class="language-plaintext highlighter-rouge">source()</code> to run locally (<code class="language-plaintext highlighter-rouge">local = TRUE</code>), <code class="language-plaintext highlighter-rouge">select</code>
has the same environment inside the function and in the code run using
<code class="language-plaintext highlighter-rouge">source()</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># I defined the function so it could pass arguments to source()</span><span class="w">
</span><span class="n">source_in_my_code</span><span class="p">(</span><span class="n">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> $source_select_environment</span><span class="w">
</span><span class="c1">#> <environment: namespace:dplyr></span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> $function_select_environment</span><span class="w">
</span><span class="c1">#> <environment: namespace:dplyr></span><span class="w">
</span></code></pre></div></div>
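<p>The same behavior can be sketched without dplyr. Everything here is made up for illustration: the temporary script, the variable <code class="language-plaintext highlighter-rouge">x</code>, and the helper <code class="language-plaintext highlighter-rouge">doubled_in_function()</code> are hypothetical names. The sketch assumes it is run at the top level of a fresh R session, so that the first assignment to <code class="language-plaintext highlighter-rouge">x</code> lands in the global environment.</p>

```r
# Hypothetical one-line script written to a temporary file
script <- tempfile(fileext = ".R")
writeLines("doubled <- x * 2", script)

x <- 1  # global x (assumes we are at the top level of the session)

doubled_in_function <- function(local) {
  x <- 100  # function-level x
  source(script, local = local)
  # Look up `doubled`, starting in this function's environment and
  # falling back to the enclosing (global) environment
  get("doubled")
}

# Default behavior: the script runs in the global environment and sees x = 1
doubled_in_function(local = FALSE)
#> [1] 2

# Locally sourced: the script runs inside the function and sees x = 100
doubled_in_function(local = TRUE)
#> [1] 200
```

<p>With the default, the script also leaves <code class="language-plaintext highlighter-rouge">doubled</code> behind in the global environment, which is the kind of side effect that is easy to miss.</p>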
<p>When we’re using <code class="language-plaintext highlighter-rouge">source()</code> as one of the first few lines of an R
script, the default global environment for <code class="language-plaintext highlighter-rouge">source()</code> doesn’t really
matter. But in contexts like the function example or code stored in a
custom knitr/RMarkdown setup (my original problem), this difference <em>is</em>
a problem. Therefore, in the future, I’m going to abide by the motto
<em>Keep it locally sourced</em>. This way fits my mental model for <code class="language-plaintext highlighter-rouge">source()</code>
as something that drops in R code and runs it in place.</p>
<p>And by the way, yes, even though I cited <em>Advanced R</em> above, I clearly
did not do all of the exercises:</p>
<blockquote>
<p><a href="https://adv-r.hadley.nz/evaluation.html#exercises-61">20.2.4 Exercises</a></p>
<ol>
<li>Carefully read the documentation for <code class="language-plaintext highlighter-rouge">source()</code>. What environment
does it use by default? What if you supply <code class="language-plaintext highlighter-rouge">local = TRUE</code>? How do
you provide a custom environment?</li>
</ol>
</blockquote>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-08-16-keep-it-locally-sourced.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> conflicted * 1.1.0 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> emo 0.0.0.9000 2022-05-25 [1] Github (hadley/emo@3f03b11)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> MASS * 7.3-56 2022-03-23 [2] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Tristan Mahr (tjmahrweb@gmail.com)<br />
A lesson from debugging <code class="language-plaintext highlighter-rouge">source()</code></p>
<hr />
<p><strong>Snecko eye lets you play more cards</strong><br />
2021-07-07T00:00:00-05:00<br />
https://tjmahr.github.io/slay-the-spire-snecko-eye-simulation</p>
<p>In <a href="/pokemon-go-unown-simulation">a previous post</a>, I used simulations
to estimate how long it would take to collect the unique Unowns in
<em>Pokemon Go!</em> The message of the post was that we can use simulations
to solve problems when the analytic solution is not clear or obvious.
The current post is another example of using simulations to
understand a weird counting/probability problem.</p>
<hr />
<p>Over the past strange year, I sunk a lot of time into <a href="https://www.megacrit.com/" title="Slay the Spire website"><em>Slay the
Spire</em></a>, a <a href="https://en.wikipedia.org/wiki/Roguelike" title="Roguelike on Wikipedia">rogue-like</a> deck-building game. You have to
escape from a 50-floor spire, fighting monsters by playing cards. The
cards let you attack, defend, apply buffs and debuffs, draw cards, etc.
You start each turn with a given amount of energy, and the cards cost energy
(with more powerful effects costing more energy), so you need to
plan out how to play your turns in order to defeat the monsters. You
receive more cards from winning battles and can receive special relics
that will make you stronger or change how your deck plays.</p>
<p>That’s the basic gist of the game. The sublime part comes when the
cards and relics start synergistically empowering each other and
comboing off each other. You might get the curse <a href="https://slay-the-spire.fandom.com/wiki/Pain" title="Pain on the Slay the Spire Fandom wiki">Pain</a> which drains you of 1
health every time you play a card. (This is bad.) But
then you find a <a href="https://slay-the-spire.fandom.com/wiki/Rupture" title="Rupture on the Slay the Spire Fandom wiki">Rupture</a> which increases your strength every time you
take damage from a card. Then you get <a href="https://slay-the-spire.fandom.com/wiki/Runic_Cube" title="Runic Cube on the Slay the Spire Fandom wiki">Runic Cube</a> which draws you an extra
card every time you take damage. Finally, you find <a href="https://slay-the-spire.fandom.com/wiki/Reaper" title="Reaper on the Slay the Spire Fandom wiki">Reaper</a> which
converts damage into health. So you now have this card-drawing,
strength-building, self-sustaining engine that makes you unstoppable.
(This <a href="https://youtu.be/OySMKDSWsTE">particular scenario</a> unfolded in a
recent game by the streamer Jorbs.)</p>
<figure class="" style="max-width: 80%; display: block; margin: 2em auto;">
<img src="/assets/images/2021-07-ss.jpg" alt="A screenshot of Slay The Spire gameplay." /><figcaption>
Screenshot of <em>Slay the Spire</em> gameplay. We see a hand of 5 cards at the bottom with 1/3 energy available on the left.
</figcaption></figure>
<p>The exercise today: <strong>simulate the maximum number of cards we can play
per turn</strong> for 3 energy under normal circumstances and when a
game-warping relic (Snecko Eye) is active.</p>
<h2 id="a-baseline-deck">A baseline deck</h2>
<p>Let’s consider a setup as a baseline for comparison.</p>
<ul>
<li>We have 3 energy to play cards per turn.</li>
<li>We draw 5 cards per turn.</li>
<li>Our deck has 17 cards: 1 card costs 0 energy, 12 cost 1 energy, 3
cost 2 energy, and 1 costs 3 energy.</li>
</ul>
<p>(I made up this deck for this example.)</p>
<p>If we build our deck, we can find the average cost of our cards and
simulate some draws by <a href="https://rdrr.io/r/base/sample.html" title="Documentation for sample()"><code class="language-plaintext highlighter-rouge">sample()</code></a>-ing without replacement.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">magrittr</span><span class="p">)</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">20210707</span><span class="p">)</span><span class="w">
</span><span class="n">costs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">12</span><span class="p">),</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">),</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">costs</span><span class="w">
</span><span class="c1">#> [1] 0 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">costs</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 1.235294</span><span class="w">
</span><span class="c1"># Simulate 3 hands</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 1 2 2 1 1</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 1 3 2 1 1</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 1 1 1 1 1</span><span class="w">
</span></code></pre></div></div>
<p>Suppose that we don’t really care about what the cards do. We want to
maximize the number of cards that we play per turn. We just want to
know: <strong>How many cards per turn can I expect to play on average?</strong></p>
<p>Let’s write a function that counts the number of playable cards in a
hand given a certain energy budget. The basic logic is that we sort the
card costs, compute the cumulative sum (cumulative energy spent on each
card), and count how many sums (played cards) are less than or equal to
the energy limit.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># A worked example</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">3</span><span class="w">
</span><span class="n">hand</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="n">hand</span><span class="w">
</span><span class="c1">#> [1] 3 1 0 1 1</span><span class="w">
</span><span class="n">sort</span><span class="p">(</span><span class="n">hand</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 0 1 1 1 3</span><span class="w">
</span><span class="nf">cumsum</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">hand</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] 0 1 2 3 6</span><span class="w">
</span><span class="nf">cumsum</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">hand</span><span class="p">))</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">energy</span><span class="w">
</span><span class="c1">#> [1] TRUE TRUE TRUE TRUE FALSE</span><span class="w">
</span><span class="nf">sum</span><span class="p">(</span><span class="nf">cumsum</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">hand</span><span class="p">))</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 4</span><span class="w">
</span><span class="n">count_max_playable</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">hand</span><span class="p">,</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nf">sum</span><span class="p">(</span><span class="nf">cumsum</span><span class="p">(</span><span class="n">sort</span><span class="p">(</span><span class="n">hand</span><span class="p">))</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">hand</span><span class="p">,</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] 4</span><span class="w">
</span></code></pre></div></div>
<p>Now, we can do this procedure on several thousand hands and run summary
statistics on the number of playable cards in each hand.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">simulated_cards_played</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">replicate</span><span class="p">(</span><span class="w">
</span><span class="m">10000</span><span class="p">,</span><span class="w">
</span><span class="n">costs</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">simulated_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><span class="w">
</span><span class="c1">#> 2.000 3.000 3.000 3.173 3.000 4.000</span><span class="w">
</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> simulated_cards_played</span><span class="w">
</span><span class="c1">#> 2 3 4 </span><span class="w">
</span><span class="c1">#> 451 7364 2185</span><span class="w">
</span><span class="n">proportions</span><span class="p">(</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_cards_played</span><span class="p">))</span><span class="w">
</span><span class="c1">#> simulated_cards_played</span><span class="w">
</span><span class="c1">#> 2 3 4 </span><span class="w">
</span><span class="c1">#> 0.0451 0.7364 0.2185</span><span class="w">
</span></code></pre></div></div>
<p>The expected number of playable cards per hand is 3.2. Dreaded hands
like (1, 2, 2, 2, 3), where only two cards are playable, appear about 4.5% of the time, but the
one 0-cost card in our deck lets us play a fourth card about 21.8% of
the time.</p>
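<p>Because the baseline deck is small, the simulation can be cross-checked by enumerating every possible hand. The sketch below assumes the same 17-card deck and repeats the <code class="language-plaintext highlighter-rouge">count_max_playable()</code> definition so that it runs on its own. <code class="language-plaintext highlighter-rouge">combn()</code> treats the cards as distinct positions, so each of the <code class="language-plaintext highlighter-rouge">choose(17, 5)</code> hands is equally likely.</p>

```r
# Same deck and helper as above, repeated so this block is self-contained
costs <- c(0, rep(1, 12), rep(2, 3), 3)
count_max_playable <- function(hand, energy) {
  sum(cumsum(sort(hand)) <= energy)
}

# Each column of all_hands is one 5-card hand
all_hands <- combn(costs, 5)
ncol(all_hands)
#> [1] 6188

exact_cards_played <- apply(all_hands, 2, count_max_playable, energy = 3)

# Exact proportions: about 0.045, 0.733, and 0.222 for 2, 3, and 4 cards
round(proportions(table(exact_cards_played)), 3)

# Exact expected number of playable cards (about 3.18)
mean(exact_cards_played)
```

<p>The simulated proportions (0.045, 0.736, 0.219) land close to these exact values.</p>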
<h2 id="enter-the-snecko">Enter the Snecko</h2>
<figure class="" style="max-width: 80%; display: block; margin: 2em auto;">
<img src="/assets/images/2021-07-snecko.png" alt="A screenshot of Snecko Eye" /><figcaption>
Snecko Eye is probably the best relic in the game.
</figcaption></figure>
<p>Let’s suppose we obtain the mighty <a href="https://slay-the-spire.fandom.com/wiki/Snecko_Eye" title="Snecko Eye on the Slay the Spire Fandom wiki">Snecko Eye</a> relic. It says
“Draw 2 additional cards each turn. Start each combat Confused.”
Confused is a debuff that randomizes the costs of cards when we draw
them. So now our setup is the following:</p>
<ul>
<li>We have 3 energy to play cards per turn.</li>
<li>We draw 7 cards per turn.</li>
<li>Our deck has 17 cards: the costs are random integers between 0 and 3
energy.</li>
</ul>
<p>The average energy cost of any given card in our deck is now
<code class="language-plaintext highlighter-rouge">mean(0:3)</code> = 1.5. In the baseline example, the average energy cost
was 1.24. (One obvious strategy with Snecko Eye is to maximize the costs of
new cards: try to get as many 2s and 3s as possible, because
their new expected cost of 1.5 is less than their original cost. But let’s ignore that
dimension of gameplay for now.)</p>
<p>So here’s the puzzle: <strong>how many cards per turn can I play with Snecko
Eye?</strong> We can run the same simulations as above.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">snecko_costs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="o">:</span><span class="m">3</span><span class="w">
</span><span class="n">simulated_snecko_cards_played</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">replicate</span><span class="p">(</span><span class="w">
</span><span class="m">10000</span><span class="p">,</span><span class="w">
</span><span class="n">snecko_costs</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">simulated_snecko_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><span class="w">
</span><span class="c1">#> 1.000 3.000 4.000 3.826 5.000 7.000</span><span class="w">
</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_snecko_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> simulated_snecko_cards_played</span><span class="w">
</span><span class="c1">#> 1 2 3 4 5 6 7 </span><span class="w">
</span><span class="c1">#> 83 979 2911 3397 1945 621 64</span><span class="w">
</span><span class="n">proportions</span><span class="p">(</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_snecko_cards_played</span><span class="p">))</span><span class="w">
</span><span class="c1">#> simulated_snecko_cards_played</span><span class="w">
</span><span class="c1">#> 1 2 3 4 5 6 7 </span><span class="w">
</span><span class="c1">#> 0.0083 0.0979 0.2911 0.3397 0.1945 0.0621 0.0064</span><span class="w">
</span></code></pre></div></div>
<p>Let us note that the dream (playing 7 cards in one turn) happened
about 0.6% of the time and the nightmare (playing just 1 card, which
almost always means drawing only 2-cost and 3-cost cards) happened 0.8%
of the time. Recall that in the
baseline setup, we got to play 4 cards 21.8% of the time. With Snecko
Eye, we can play 4 or more cards per turn 60.3% of the time.
Snecko Eye simply lets us play more cards on average.</p>
<div class="notice--info">
<p><strong>Yes, we could skip the random sampling.</strong> For this problem where there
are 4^7 = 16384 combinations, a brute-force enumeration is possible. The
proportions from counting the full set are within about .005 (half a
percentage point) of the proportions from simulating 10,000 hands.</p>
<pre>
# Generate all combinations
expand.grid(rep(list(0:3), 7)) %>%
# Count playables in each row
apply(MARGIN = 1, count_max_playable, energy = 3) %>%
table() %>%
proportions() %>%
round(3)
#> .
#> 1 2 3 4 5 6 7
#> 0.008 0.098 0.287 0.341 0.200 0.059 0.007
</pre>
</div>
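<p>The two extreme cases can also be counted by hand, which makes for a quick check on the enumeration. This is a back-of-the-envelope sketch under the same assumptions (7 cards, each cost drawn uniformly from 0 to 3, 3 energy): playing all 7 cards requires the seven costs to sum to 3 or less, and playing only 1 card requires the two cheapest cards to cost more than 3 together.</p>

```r
n_hands <- 4^7

# Dream: all 7 cards are playable when the 7 costs sum to 3 or less.
# Stars and bars: nonnegative solutions to c1 + ... + c7 + slack = 3.
n_dream <- choose(7 + 3, 3)
n_dream / n_hands
#> [1] 0.007324219

# Nightmare: only 1 card is playable when the two cheapest cards cost more
# than 3 together. Either every card costs 2 or 3 (2^7 hands), or one card
# costs 1 and the other six cost 3 (7 hands).
n_nightmare <- 2^7 + 7
n_nightmare / n_hands
#> [1] 0.008239746
```

<p>Both values round to the 0.007 and 0.008 reported by the brute-force table.</p>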
<h3 id="where-does-this-power-come-from">Where does this power come from?</h3>
<p>Is the magic of Snecko Eye the card draw or the cost randomization?
Well, let’s suppose that we are just confused and we draw only 5 cards
(as in the baseline example).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">simulated_confused_cards_played</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">replicate</span><span class="p">(</span><span class="w">
</span><span class="m">10000</span><span class="p">,</span><span class="w">
</span><span class="n">snecko_costs</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">simulated_confused_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><span class="w">
</span><span class="c1">#> 1.00 2.00 3.00 3.03 4.00 5.00</span><span class="w">
</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_confused_cards_played</span><span class="p">)</span><span class="w">
</span><span class="c1">#> simulated_confused_cards_played</span><span class="w">
</span><span class="c1">#> 1 2 3 4 5 </span><span class="w">
</span><span class="c1">#> 358 2496 4162 2454 530</span><span class="w">
</span><span class="n">proportions</span><span class="p">(</span><span class="n">table</span><span class="p">(</span><span class="n">simulated_confused_cards_played</span><span class="p">))</span><span class="w">
</span><span class="c1">#> simulated_confused_cards_played</span><span class="w">
</span><span class="c1">#> 1 2 3 4 5 </span><span class="w">
</span><span class="c1">#> 0.0358 0.2496 0.4162 0.2454 0.0530</span><span class="w">
</span></code></pre></div></div>
<p>Here the average number of cards played is 3.0 and we play 4–5 cards
per turn 29.8% of the time. This percentage is greater than the
baseline case (21.8%), but the nightmare case is worse (1 card),
occurring 3.6% of the time.</p>
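<p>To compare spread as well as center, the sketch below recomputes the mean and standard deviation of each simulation from the count tables printed above. The helper <code class="language-plaintext highlighter-rouge">summarize_counts()</code> is a hypothetical name, and the counts are copied from the <code class="language-plaintext highlighter-rouge">table()</code> output, so the results describe those particular 10,000-hand runs.</p>

```r
# Counts copied from the table() output of each simulation
baseline_counts <- c("2" = 451, "3" = 7364, "4" = 2185)
snecko_counts <- c(
  "1" = 83, "2" = 979, "3" = 2911, "4" = 3397, "5" = 1945, "6" = 621, "7" = 64
)
confused_counts <- c("1" = 358, "2" = 2496, "3" = 4162, "4" = 2454, "5" = 530)

# Hypothetical helper: mean and (population) SD of the number of cards played
summarize_counts <- function(counts) {
  cards <- as.numeric(names(counts))
  p <- counts / sum(counts)
  avg <- sum(cards * p)
  c(mean = avg, sd = sqrt(sum(p * (cards - avg)^2)))
}

# One row per setup, with columns for the mean and the SD
round(
  rbind(
    Baseline = summarize_counts(baseline_counts),
    Confused = summarize_counts(confused_counts),
    "Snecko Eye" = summarize_counts(snecko_counts)
  ),
  2
)
```

<p>Confusion alone roughly doubles the spread (an SD near 0.92 versus 0.48) while leaving the mean slightly lower, and the two extra cards then lift the mean to about 3.83.</p>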
<p>We can plot the three simulations side by side and observe the
distributions. First, we package them together into a single dataframe
suitable for plotting and plot a bar chart.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">sim1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Baseline"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulated_cards_played</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">sim2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Snecko Eye"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulated_snecko_cards_played</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">sim3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Confused"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulated_confused_cards_played</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">sims</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">sim1</span><span class="p">,</span><span class="w"> </span><span class="n">sim2</span><span class="p">,</span><span class="w"> </span><span class="n">sim3</span><span class="p">)</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">sims</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cards</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">after_stat</span><span class="p">(</span><span class="n">prop</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_wrap</span><span class="p">(</span><span class="s2">"set"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">minor_breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scales</span><span class="o">::</span><span class="n">label_percent</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Confusion increases variance. Card draw increases mean."</span><span class="p">,</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Number of playable cards in hand"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage of hands"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"N = 10,000 simulations per panel"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-07-07-slay-the-spire-snecko-eye-simulation/bar-by-three-1.png" title="A one by three plot showing a histogram of playable cards for each simulation set. Its title says 'Confusion increases variance. Card draw increases mean.'" alt="A one by three plot showing a histogram of playable cards for each simulation set. Its title says 'Confusion increases variance. Card draw increases mean.'" width="100%" style="display: block; margin: auto;" /></p>
<p>Both the Confused and the Snecko Eye panels have increased variance. The
bars are shorter and more spread out, compared to the Baseline panel.
The peak (the mode) shifts from 3 cards in the Confused panel to 4 cards
in the Snecko Eye panel.</p>
<p>A more statistically niche technique would be plotting the <a href="https://en.wikipedia.org/wiki/Empirical_distribution_function" title="Empirical distribution function on Wikipedia">empirical
cumulative distribution function</a>. Imagine taking the bars from
the previous plot and summing them along the <em>x</em> axis so that they are
cumulative percentages. Each cumulative percentage would tell you the
share of cases less than or equal to that value. In the plot
below, I do that procedure on a reversed <em>x</em> axis, so we can look at what
proportion of simulations had at least 4 playable cards. (I chose the
reversed <em>x</em> axis to visually convey the advantage of Snecko Eye.)</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">props</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sims</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count</span><span class="p">(</span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">cards</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="c1"># Fill in rows that would be n = 0</span><span class="w">
</span><span class="n">tidyr</span><span class="o">::</span><span class="n">complete</span><span class="p">(</span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="c1"># Compute ECDF in reverse order (starting at 7 cards)</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">desc</span><span class="p">(</span><span class="n">cards</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">set</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="w">
</span><span class="n">proportion</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">n</span><span class="p">),</span><span class="w">
</span><span class="n">ecdf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">cumsum</span><span class="p">(</span><span class="n">proportion</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">ungroup</span><span class="p">()</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">props</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cards</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_step</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ecdf</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">set</span><span class="p">),</span><span class="w">
</span><span class="n">direction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"mid"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_label</span><span class="p">(</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ecdf</span><span class="p">),</span><span class="w">
</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Snecko can play 4 or more\ncards in 60% of hands"</span><span class="p">,</span><span class="w">
</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">set</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"Snecko Eye"</span><span class="p">,</span><span class="w"> </span><span class="n">cards</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">4</span><span class="p">),</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">.65</span><span class="p">,</span><span class="w">
</span><span class="n">nudge_x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-.25</span><span class="p">,</span><span class="w">
</span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1.0</span><span class="p">,</span><span class="w">
</span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scales</span><span class="o">::</span><span class="n">alpha</span><span class="p">(</span><span class="s2">"grey93"</span><span class="p">,</span><span class="w"> </span><span class="m">.6</span><span class="p">),</span><span class="w">
</span><span class="n">label.size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">show.legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4.5</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_reverse</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="o">:</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">minor_breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scales</span><span class="o">::</span><span class="n">label_percent</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"At least X playable cards in hand"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage of hands"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"N = 10,000 simulations per line"</span><span class="p">,</span><span class="w">
</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">,</span><span class="w">
</span><span class="n">linetype</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="w">
</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"top"</span><span class="p">,</span><span class="w">
</span><span class="n">legend.justification</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"left"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="/figs/2021-07-07-slay-the-spire-snecko-eye-simulation/ecdf-1.png" title="A plot of the ECDF for the proportion of hands with at least x playable cards. Snecko dominates the other lines in the plot because we can play more cards per turn with it. There is a caption at the center of the plot that says 'Snecko can play 4 or more cards in 60% of hands'." alt="A plot of the ECDF for the proportion of hands with at least x playable cards. Snecko dominates the other lines in the plot because we can play more cards per turn with it. There is a caption at the center of the plot that says 'Snecko can play 4 or more cards in 60% of hands'." width="80%" style="display: block; margin: auto;" /></p>
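<p>To make the reversed cumulative sum concrete, here is a minimal sketch on a made-up set of proportions. The numbers in <code class="language-plaintext highlighter-rouge">toy_props</code> are illustrative only and are not simulation results. Summing from the highest card count downward gives the share of hands with at least that many playable cards.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy proportions of hands by number of playable cards (illustrative only)
toy_props <- c("1" = 0.10, "2" = 0.20, "3" = 0.40, "4" = 0.20, "5" = 0.10)

# Usual ECDF: share of hands with at most x playable cards
at_most <- cumsum(toy_props)

# Reversed ECDF: share of hands with at least x playable cards
at_least <- rev(cumsum(rev(toy_props)))

round(rbind(at_most, at_least), 2)
</code></pre></div></div>
<p>In this toy example, <code class="language-plaintext highlighter-rouge">at_least["4"]</code> is 0.3, so 30% of the toy hands can play 4 or more cards.</p>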
<h3 id="the-advantage-at-higher-energy">The advantage at higher energy</h3>
<p>During a run through the game, we can obtain up to two relics (along
with Snecko Eye) that each increase our energy per turn by 1 unit. Let’s see
how these new energy budgets affect the simulations.</p>
<p>First, we run the simulations. We put the main code into functions so that
we can build the dataframes more easily.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">simulate_decko</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">energy</span><span class="p">,</span><span class="w"> </span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">replicate</span><span class="p">(</span><span class="w">
</span><span class="n">n</span><span class="p">,</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">size</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">simulate_snecko</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">energy</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">snecko_costs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="o">:</span><span class="m">3</span><span class="w">
</span><span class="n">replicate</span><span class="p">(</span><span class="w">
</span><span class="n">n</span><span class="p">,</span><span class="w">
</span><span class="n">sample</span><span class="p">(</span><span class="n">snecko_costs</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="n">replace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">count_max_playable</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">additional_sims</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="w">
</span><span class="c1"># include old results</span><span class="w">
</span><span class="n">sims</span><span class="p">,</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Baseline"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_decko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">costs</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Baseline"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_decko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">costs</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Snecko Eye"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_snecko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Snecko Eye"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_snecko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Confused"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_snecko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="w">
</span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Confused"</span><span class="p">,</span><span class="w">
</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w">
</span><span class="n">cards</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">simulate_snecko</span><span class="p">(</span><span class="m">10000</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>We can make the same kind of plot as before. We see that the
distribution with the highest mode (the peak that lands on the highest
number of cards) in each row is Snecko Eye.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">(</span><span class="n">additional_sims</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">energy</span><span class="p">,</span><span class="w"> </span><span class="s2">" energy"</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cards</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_bar</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">after_stat</span><span class="p">(</span><span class="n">prop</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">facet_grid</span><span class="p">(</span><span class="n">energy</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">set</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">minor_breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_y_continuous</span><span class="p">(</span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">scales</span><span class="o">::</span><span class="n">label_percent</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Number of playable cards in hand"</span><span class="p">,</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Percentage of hands"</span><span class="p">,</span><span class="w">
</span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"N = 10,000 simulations per panel"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
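<p><img src="/figs/2021-07-07-slay-the-spire-snecko-eye-simulation/3-by-3-1.png" title="A three by three grid of bar charts showing the distribution of playable cards for each simulation set (columns) at 3, 4, and 5 energy (rows)." alt="A three by three grid of bar charts showing the distribution of playable cards for each simulation set (columns) at 3, 4, and 5 energy (rows)." width="100%" style="display: block; margin: auto;" /></p>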
<p>One limitation of the two non-Snecko sets becomes more obvious in
the 5-energy row: They can never play 6 or 7 cards in a turn because they
draw only 5 cards. Their distributions are cut off at 5 cards.</p>
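<p>The cap follows from hand size alone, and a small simulation can confirm it. The sketch below is self-contained and rests on two assumptions: <code class="language-plaintext highlighter-rouge">count_playable_sketch()</code> is a hypothetical stand-in for the post's <code class="language-plaintext highlighter-rouge">count_max_playable()</code> that plays the cheapest cards first, and <code class="language-plaintext highlighter-rouge">draw_random_costs()</code> is a hypothetical helper that draws costs uniformly from 0 to 3, as <code class="language-plaintext highlighter-rouge">simulate_snecko()</code> does.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical stand-in for count_max_playable(): play the cheapest cards
# first and count how many fit within the energy budget.
count_playable_sketch <- function(costs, energy) {
  sum(cumsum(sort(costs)) <= energy)
}

# Hypothetical helper: one hand with costs drawn uniformly from 0 to 3
draw_random_costs <- function(size) {
  sample(0:3, size = size, replace = TRUE)
}

set.seed(20210707)
five_card_hands <- replicate(
  2000,
  count_playable_sketch(draw_random_costs(5), energy = 5)
)
seven_card_hands <- replicate(
  2000,
  count_playable_sketch(draw_random_costs(7), energy = 5)
)

# A 5-card hand can never yield more than 5 playable cards
max(five_card_hands)

# A 7-card hand has room for 6 or 7 playable cards
table(seven_card_hands)
</code></pre></div></div>
<p>However many simulations we run, the 5-card maximum cannot exceed 5, because the count of playable cards is bounded by the number of cards drawn.</p>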
<p>If we look at numerical summaries, we get some sense that the benefit of
Snecko Eye over Baseline diminishes as energy increases, but we won’t
explore this trend in any detail.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">additional_sims</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">group_by</span><span class="p">(</span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">energy</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">summarise</span><span class="p">(</span><span class="w">
</span><span class="n">mean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mean</span><span class="p">(</span><span class="n">cards</span><span class="p">),</span><span class="w">
</span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sd</span><span class="p">(</span><span class="n">cards</span><span class="p">),</span><span class="w">
</span><span class="n">median</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">median</span><span class="p">(</span><span class="n">cards</span><span class="p">),</span><span class="w">
</span><span class="n">.groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"drop"</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">tidyr</span><span class="o">::</span><span class="n">pivot_longer</span><span class="p">(</span><span class="n">cols</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="p">,</span><span class="w"> </span><span class="n">median</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">tidyr</span><span class="o">::</span><span class="n">pivot_wider</span><span class="p">(</span><span class="w">
</span><span class="n">names_from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">energy</span><span class="p">,</span><span class="w">
</span><span class="n">values_from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w">
</span><span class="n">names_prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Energy "</span><span class="w">
</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">rename</span><span class="p">(</span><span class="n">Set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">set</span><span class="p">,</span><span class="w"> </span><span class="n">Statistic</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">arrange</span><span class="p">(</span><span class="n">Statistic</span><span class="p">,</span><span class="w"> </span><span class="n">Set</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">knitr</span><span class="o">::</span><span class="n">kable</span><span class="p">(</span><span class="n">digits</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<table>
<thead>
<tr>
<th style="text-align: left">Set</th>
<th style="text-align: left">Statistic</th>
<th style="text-align: right">Energy 3</th>
<th style="text-align: right">Energy 4</th>
<th style="text-align: right">Energy 5</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Baseline</td>
<td style="text-align: left">mean</td>
<td style="text-align: right">3.17</td>
<td style="text-align: right">3.81</td>
<td style="text-align: right">4.27</td>
</tr>
<tr>
<td style="text-align: left">Confused</td>
<td style="text-align: left">mean</td>
<td style="text-align: right">3.03</td>
<td style="text-align: right">3.44</td>
<td style="text-align: right">3.79</td>
</tr>
<tr>
<td style="text-align: left">Snecko Eye</td>
<td style="text-align: left">mean</td>
<td style="text-align: right">3.83</td>
<td style="text-align: right">4.30</td>
<td style="text-align: right">4.72</td>
</tr>
<tr>
<td style="text-align: left">Baseline</td>
<td style="text-align: left">median</td>
<td style="text-align: right">3.00</td>
<td style="text-align: right">4.00</td>
<td style="text-align: right">4.00</td>
</tr>
<tr>
<td style="text-align: left">Confused</td>
<td style="text-align: left">median</td>
<td style="text-align: right">3.00</td>
<td style="text-align: right">3.00</td>
<td style="text-align: right">4.00</td>
</tr>
<tr>
<td style="text-align: left">Snecko Eye</td>
<td style="text-align: left">median</td>
<td style="text-align: right">4.00</td>
<td style="text-align: right">4.00</td>
<td style="text-align: right">5.00</td>
</tr>
<tr>
<td style="text-align: left">Baseline</td>
<td style="text-align: left">sd</td>
<td style="text-align: right">0.48</td>
<td style="text-align: right">0.56</td>
<td style="text-align: right">0.53</td>
</tr>
<tr>
<td style="text-align: left">Confused</td>
<td style="text-align: left">sd</td>
<td style="text-align: right">0.92</td>
<td style="text-align: right">0.89</td>
<td style="text-align: right">0.84</td>
</tr>
<tr>
<td style="text-align: left">Snecko Eye</td>
<td style="text-align: left">sd</td>
<td style="text-align: right">1.11</td>
<td style="text-align: right">1.08</td>
<td style="text-align: right">1.06</td>
</tr>
</tbody>
</table>
<h3 id="you-should-probably-play-snecko">You should probably play Snecko</h3>
<figure class="align-right">
<img src="/assets/images/2021-07-no-steppo.jpg" alt="Don't tread on me flag with the Snecko Eye flag." /><figcaption>
Pwease no steppo. Posted by <a href="https://www.reddit.com/r/slaythespire/comments/l0qm6z/no_step_on_snecko/?ref=share&ref_source=link">u/usernameequalspants</a>.
</figcaption></figure>
<p>We could go on and on with the simulations. Suppose you are dying and
you are in desperate need of the <a href="https://slay-the-spire.fandom.com/wiki/Apparition" title="Apparition on the Slay the Spire Fandom wiki">Apparition</a> on the top of your deck.
How many cards can you play after you are forced to play that card? Or
suppose that you are, rightly, playing on Ascension 20 and one of the
cards is an unplayable curse. Does that change anything?</p>
<p>The point here is that we used simulations to visualize how
randomization increased the variance of playable cards but the extra
cards shifted the mode of the distribution upwards. You can play more
cards with Snecko Eye because you simply have more cards you can play
per turn.</p>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-07-07-slay-the-spire-snecko-eye-simulation.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr * 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comA simulation study of the mighty *Slay the Spire* relicThink of `&&` as a stricter `&`2021-07-01T00:00:00-05:002021-07-01T00:00:00-05:00https://tjmahr.github.io/think-of-stricter-logical-operators<p>In programming languages, we find logical operators for <em>and</em>
and <em>or</em>. In fact, Python uses the actual words <code class="language-plaintext highlighter-rouge">and</code> and <code class="language-plaintext highlighter-rouge">or</code>
for these operators.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Python via the reticulate package
</span><span class="n">x</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">y</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">x</span> <span class="ow">and</span> <span class="n">y</span>
<span class="c1">#> False
</span><span class="n">x</span> <span class="ow">or</span> <span class="n">y</span>
<span class="c1">#> True
</span></code></pre></div></div>
<p>In Javascript, we see <code class="language-plaintext highlighter-rouge">&&</code> for <em>and</em> and <code class="language-plaintext highlighter-rouge">||</code> for <em>or</em> instead.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Javascript via `engine = "node"` in knitr</span>
<span class="kd">let</span> <span class="nx">x</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">y</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">x</span> <span class="o">&&</span> <span class="nx">y</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">x</span> <span class="o">||</span> <span class="nx">y</span><span class="p">);</span>
<span class="c1">// false</span>
<span class="c1">// true</span>
</code></pre></div></div>
<p>In R, we have <em>two</em> versions of logical <em>and</em> (<code class="language-plaintext highlighter-rouge">&</code> and <code class="language-plaintext highlighter-rouge">&&</code>) and logical
<em>or</em> (<code class="language-plaintext highlighter-rouge">|</code> and <code class="language-plaintext highlighter-rouge">||</code>). What’s going on?</p>
<p>Documentation in <a href="https://rdrr.io/r/base/Logic.html"><code class="language-plaintext highlighter-rouge">help("Logic",
package = "base")</code></a> provides the
following:</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">&</code> and <code class="language-plaintext highlighter-rouge">&&</code> indicate logical AND and <code class="language-plaintext highlighter-rouge">|</code> and <code class="language-plaintext highlighter-rouge">||</code> indicate logical
OR. The shorter form performs elementwise comparisons in much the same
way as arithmetic operators. The longer form evaluates left to right
examining only the first element of each vector. Evaluation proceeds
only until the result is determined. The longer form is appropriate
for programming control-flow and typically preferred in <code class="language-plaintext highlighter-rouge">if</code> clauses.</p>
</blockquote>
<p>Let’s unpack this paragraph.</p>
<h2 id="the-shorter-operators-are-vectorized">The shorter operators are vectorized</h2>
<p>The crucial difference is that the shorter versions (<code class="language-plaintext highlighter-rouge">&</code>, <code class="language-plaintext highlighter-rouge">|</code>) are
vectorized. Given two vectors, they will apply logical <em>and</em>/<em>or</em> on
pairs of elements from each vector. In the example below, <code class="language-plaintext highlighter-rouge">ttff[1]</code> is
<em>and</em>-ed with <code class="language-plaintext highlighter-rouge">tftf[1]</code>, <code class="language-plaintext highlighter-rouge">ttff[2]</code> is <em>and</em>-ed with <code class="language-plaintext highlighter-rouge">tftf[2]</code>, and so
on. This vectorization is an important feature: it is what lets us
compare columns in a dataframe in order to filter rows.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Returns something of length four</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">tftf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE FALSE</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE TRUE TRUE FALSE</span><span class="w">
</span></code></pre></div></div>
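<p>As a minimal sketch of that row-filtering use case, the base R code below builds a small, made-up data frame (the <code class="language-plaintext highlighter-rouge">samples</code> name and its columns are hypothetical, not from any real dataset) and uses <code class="language-plaintext highlighter-rouge">&</code> to combine two column-wise conditions into one logical value per row.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical data: four samples with two measured columns
samples <- data.frame(
  id = c("a", "b", "c", "d"),
  score = c(10, 25, 30, 5),
  passed = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# `&` compares the two logical vectors element by element,
# so we get one TRUE/FALSE per row
keep <- samples$score > 8 & samples$passed
keep
#> [1]  TRUE  TRUE FALSE FALSE

# Use the length-four result to subset the rows
kept_rows <- samples[keep, ]
kept_rows$id
#> [1] "a" "b"
</code></pre></div></div>
<p>Row "c" has a high enough score but did not pass, so it is dropped along with row "d".</p>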
<p>In contrast, <code class="language-plaintext highlighter-rouge">&&</code> and <code class="language-plaintext highlighter-rouge">||</code> only work on <em>scalars</em> (length-one values).
They return just one element. In this example, they look at <code class="language-plaintext highlighter-rouge">ttff[1]</code>
and <code class="language-plaintext highlighter-rouge">tftf[1]</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R 4.1.2 behavior</span><span class="w">
</span><span class="c1"># Returns something of length one</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
<div class="notice--info">
<p><strong>Update: R is stricter now about some behaviors I had documented in
this post</strong>. R 4.2.0 was released after this post was written. Two
<a href="https://cran.rstudio.com/bin/windows/base/NEWS.R-4.2.0.html">changes in that
version</a>
affect the behavior shown in this post. Here are the bullet points from the
release notes:</p>
<ul>
<li>
<p>Calling <code class="language-plaintext highlighter-rouge">&&</code> or <code class="language-plaintext highlighter-rouge">||</code> with either argument of length greater than one
now gives a warning (which it is intended will become an error).</p>
</li>
<li>
<p>Calling <code class="language-plaintext highlighter-rouge">if()</code> or <code class="language-plaintext highlighter-rouge">while()</code> with a condition of length greater than
one gives an error rather than a warning. Consequently, environment variable <code class="language-plaintext highlighter-rouge">_R_CHECK_LENGTH_1_CONDITION_</code> no longer has any effect.</p>
</li>
</ul>
<p>These are good changes. They make it harder to do the wrong thing. But
they also make this post inaccurate. Like, comically inaccurate: I went
out of my way to demonstrate <code class="language-plaintext highlighter-rouge">_R_CHECK_LENGTH_1_CONDITION_</code> and now it
doesn’t do anything. I have updated some examples to show old and new
behaviors.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R current behavior</span><span class="w">
</span><span class="c1"># Returns something of length one</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Warning in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> Warning in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Warning in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
<p>The fact that <code class="language-plaintext highlighter-rouge">&&</code> throws two warnings and <code class="language-plaintext highlighter-rouge">||</code> throws one warning follows
from the short-circuit behavior discussed below in this post. [<em>May 25, 2022</em>]</p>
</div>
<p>To help remember the distinction, <strong>think of the longer versions (<code class="language-plaintext highlighter-rouge">&&</code>,
<code class="language-plaintext highlighter-rouge">||</code>) as <em>stricter</em> forms of the logical operators</strong>. They don’t just care
about truthiness or falsiness, but they also care about length. The
extra <em>and</em>/<em>or</em> characters are there because the operators are
<a href="http://extra.urbanup.com/245251#.YN3RSfYxVDc.twitter">extra</a>, if you will.</p>
<p>Okay, that was the point of this post: to describe the difference
between the short and long operators and introduce the intuition that
the longer forms are stricter. The rest of this post will dig into some
other oddities and notes about <em>and</em> and <em>or</em>.</p>
<h2 id="short-circuit-evaluation">Short circuit evaluation</h2>
<p>You may have noticed a strange or unclear detail in the documentation.</p>
<blockquote>
<p>The longer form evaluates left to right examining only the first
element of each vector. Evaluation proceeds only until the result is
determined.</p>
</blockquote>
<p>This part is describing the semantics of <a href="https://en.wikipedia.org/wiki/Short-circuit_evaluation">short-circuit
evaluation</a>.
Here are two facts about <em>and</em> and <em>or</em>:</p>
<ul>
<li><em>x and y</em> is false when either <em>x</em> or <em>y</em> is false. Therefore, if
<em>x</em> is false, we don’t need to look at <em>y</em> at all.</li>
<li><em>x or y</em> is true when either <em>x</em> or <em>y</em> is true. Therefore, if <em>x</em>
is true, we don’t need to look at <em>y</em> at all.</li>
</ul>
<p>R will not evaluate the second operand for <code class="language-plaintext highlighter-rouge">&&</code> and <code class="language-plaintext highlighter-rouge">||</code> if it can learn
the answer from the first operand. Thus, short-circuit evaluation will
ignore the <code class="language-plaintext highlighter-rouge">stop()</code> calls in the first two examples below.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kc">FALSE</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="c1"># Short-circuiting doesn't apply. Need the second operand.</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Error in eval(expr, envir, enclos): this is an error</span><span class="w">
</span><span class="kc">FALSE</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">)</span><span class="w">
</span><span class="c1">#> Error in eval(expr, envir, enclos): this is an error</span><span class="w">
</span></code></pre></div></div>
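<p>One place where short-circuiting pays off is in guard conditions, where a later check only makes sense if an earlier one passed. The sketch below is one way to write such a guard. The helper name <code class="language-plaintext highlighter-rouge">is_positive_number()</code> is made up for illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: is `x` a single, non-missing, positive number?
is_positive_number <- function(x) {
  # Each check runs only if every check before it was TRUE, so `x > 0`
  # is never evaluated for non-numeric, wrong-length or missing input
  is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0
}

is_positive_number(3)
#> [1] TRUE
is_positive_number("3")
#> [1] FALSE
is_positive_number(c(1, 2))
#> [1] FALSE
is_positive_number(NULL)
#> [1] FALSE
is_positive_number(NA_real_)
#> [1] FALSE
</code></pre></div></div>
<p>Because the length check comes before the comparison, the length-two input never reaches <code class="language-plaintext highlighter-rouge">x > 0</code>, and every operand that <code class="language-plaintext highlighter-rouge">&&</code> sees is a scalar.</p>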
<h3 id="the-null-or-default-pattern">The NULL-or-default pattern</h3>
<p>Logically, the short-circuit evaluation for <code class="language-plaintext highlighter-rouge">||</code> is equivalent to a
particular kind of <em>if</em> statement:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">if_x_then_x_else_y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">y</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">if_x_then_x_else_y</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="c1"># short-circuited</span><span class="w">
</span><span class="n">if_x_then_x_else_y</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
<p>In languages that treat undefined values as falsy and defined values
as truthy, this <code class="language-plaintext highlighter-rouge">if (x) x else y</code> behavior is sometimes used as an idiom
to set a default, backup value for a variable. In the Javascript code below,
the undefined variable <code class="language-plaintext highlighter-rouge">name</code> is treated as falsy and the
string <code class="language-plaintext highlighter-rouge">"I don't know your name!"</code> is treated as truthy, so the <em>or</em>
returns the string. In other words, <a href="https://flexdinesh.github.io/short-circuit-assignment-in-javascript/">“the first truthy value is
returned”</a>.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">let</span> <span class="nx">name</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">fallback</span> <span class="o">=</span> <span class="nx">name</span> <span class="o">||</span> <span class="dl">"</span><span class="s2">I don't know your name!</span><span class="dl">"</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fallback</span><span class="p">);</span>
<span class="c1">// undefined</span>
<span class="c1">// I don't know your name!</span>
</code></pre></div></div>
<p>(This pattern, incidentally, appears to have earned its own operator in
Javascript with the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator">“nullish coalescing operator”
<code class="language-plaintext highlighter-rouge">??</code></a>.)</p>
<p>Why do I mention this programming idiom from Javascript? Because setting
a default for missing values is pretty useful, and this syntax is pretty
nice. The tidyverse provides <a href="https://rlang.r-lib.org/reference/op-null-default.html">a null coalescing
operator</a>,
inspired by Ruby’s <code class="language-plaintext highlighter-rouge">||</code> operator.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w">
</span><span class="m">1</span><span class="w"> </span><span class="o">%||%</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="c1">#> [1] 1</span><span class="w">
</span><span class="kc">NULL</span><span class="w"> </span><span class="o">%||%</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="c1">#> [1] 2</span><span class="w">
</span><span class="c1"># Not exactly or-like. It just cares about NULL-ness.</span><span class="w">
</span><span class="kc">FALSE</span><span class="w"> </span><span class="o">%||%</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="c1"># See the source code</span><span class="w">
</span><span class="n">`%||%`</span><span class="w">
</span><span class="c1">#> function (x, y) </span><span class="w">
</span><span class="c1">#> {</span><span class="w">
</span><span class="c1">#> if (is_null(x)) </span><span class="w">
</span><span class="c1">#> y</span><span class="w">
</span><span class="c1">#> else x</span><span class="w">
</span><span class="c1">#> }</span><span class="w">
</span><span class="c1">#> <bytecode: 0x00000262684332f0></span><span class="w">
</span><span class="c1">#> <environment: namespace:rlang></span><span class="w">
</span></code></pre></div></div>
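<p>To make the default-setting idiom concrete, here is a small sketch of how the operator might be used for optional arguments and missing list elements. It defines its own base-R copy of <code class="language-plaintext highlighter-rouge">%||%</code> so that it runs without rlang or purrr. The function <code class="language-plaintext highlighter-rouge">describe_plot()</code> and the <code class="language-plaintext highlighter-rouge">options_list</code> object are hypothetical names used only for illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A base-R copy of the operator shown above (an assumption made so
# that this sketch is self-contained)
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical function: `title` is optional and falls back to a default
describe_plot <- function(data_name, title = NULL) {
  title <- title %||% paste("Plot of", data_name)
  title
}

describe_plot("mtcars")
#> [1] "Plot of mtcars"
describe_plot("mtcars", title = "Fuel economy")
#> [1] "Fuel economy"

# List elements that do not exist are NULL, so the same idiom can
# supply defaults for missing options
options_list <- list(width = 80)
options_list$height %||% 24
#> [1] 24
options_list$width %||% 100
#> [1] 80
</code></pre></div></div>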
<h2 id="if-statements-want-the-stricter-operators"><code class="language-plaintext highlighter-rouge">if()</code> statements want the stricter operators</h2>
<p>Recall the following from the documentation:</p>
<blockquote>
<p>The longer form is appropriate for programming control-flow and
typically preferred in <code class="language-plaintext highlighter-rouge">if</code> clauses.</p>
</blockquote>
<p><code class="language-plaintext highlighter-rouge">if()</code> statements are not vectorized. (See
<a href="https://rdrr.io/r/base/ifelse.html"><code class="language-plaintext highlighter-rouge">ifelse()</code></a> instead.) <code class="language-plaintext highlighter-rouge">if()</code>
statements <del>complain</del> <u>error</u> when they see a vector:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R 4.1.2 behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"We used to get a warning (before R 4.2.0)."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> Warning in if (ttff | tftf) {: the condition has length > 1 and only the first</span><span class="w">
</span><span class="c1">#> element will be used</span><span class="w">
</span><span class="c1">#> [1] "We used to get a warning (before R 4.2.0)." </span><span class="w">
</span></code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R current behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"We used to get a warning (before R 4.2.0)."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> Error in if (ttff | tftf) {: the condition has length > 1</span><span class="w">
</span></code></pre></div></div>
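<p>For comparison, here is a short sketch of the vectorized alternative mentioned above. <code class="language-plaintext highlighter-rouge">ifelse()</code> accepts the length-four result of <code class="language-plaintext highlighter-rouge">|</code> and returns one answer per element. The labels <code class="language-plaintext highlighter-rouge">"some"</code> and <code class="language-plaintext highlighter-rouge">"none"</code> and the name <code class="language-plaintext highlighter-rouge">verdict</code> are arbitrary choices for this example.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ttff <- c(TRUE, TRUE, FALSE, FALSE)
tftf <- c(TRUE, FALSE, TRUE, FALSE)

# ifelse() is vectorized: it picks "some" or "none" separately for
# each element of the condition
verdict <- ifelse(ttff | tftf, "some", "none")
verdict
#> [1] "some" "some" "some" "none"
</code></pre></div></div>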
<p>The idea behind the documentation is that because <code class="language-plaintext highlighter-rouge">&</code> and <code class="language-plaintext highlighter-rouge">|</code> return
vectors and because <code class="language-plaintext highlighter-rouge">if()</code> only likes scalars, we should not use the
shorter forms in <code class="language-plaintext highlighter-rouge">if()</code> statements. They provide the wrong output for
<code class="language-plaintext highlighter-rouge">if()</code>. But that <em>does not mean</em> the following code with the stricter
<em>or</em> operator is correct.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R 4.1.2 behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"We used to not get a warning (before R 4.2.0), "</span><span class="p">,</span><span class="w">
</span><span class="s2">"even though this code is not right."</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> [1] "We used to not get a warning (before R 4.2.0), "</span><span class="w">
</span><span class="c1">#> [2] "even though this code is not right." </span><span class="w">
</span></code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R current behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"We used to not get a warning (before R 4.2.0), "</span><span class="p">,</span><span class="w">
</span><span class="s2">"even though this code is not right."</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> Warning in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> [1] "We used to not get a warning (before R 4.2.0), "</span><span class="w">
</span><span class="c1">#> [2] "even though this code is not right."</span><span class="w">
</span></code></pre></div></div>
<p>Although the code here <del>does not</del> <u>used to not</u> raise any warnings, it
reduces all of the information in <code class="language-plaintext highlighter-rouge">ttff</code> and <code class="language-plaintext highlighter-rouge">tftf</code> into just <code class="language-plaintext highlighter-rouge">ttff[1]</code>
and <code class="language-plaintext highlighter-rouge">tftf[1]</code>. Those values likely will not be appropriate for the
programming task at hand, so we should provide the scalars ourselves.</p>
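<p>Here is a minimal sketch of what providing the scalars ourselves could look like. It assumes <code class="language-plaintext highlighter-rouge">ttff</code> and <code class="language-plaintext highlighter-rouge">tftf</code> hold the values their names suggest, and the three reductions are illustrative choices rather than the only reasonable ones.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assumed values, inferred from the variable names
ttff <- c(TRUE, TRUE, FALSE, FALSE)
tftf <- c(TRUE, FALSE, TRUE, FALSE)

# Say which scalar we mean instead of letting R pick the first elements
first_pair <- ttff[1] || tftf[1]   # only the first elements, on purpose
any_pair   <- any(ttff | tftf)     # is at least one pairwise "or" TRUE?
all_pair   <- all(ttff | tftf)     # is every pairwise "or" TRUE?

if (any_pair) {
  "At least one position is TRUE in ttff or tftf."
}
#> [1] "At least one position is TRUE in ttff or tftf."
</code></pre></div></div>
<p>Whichever reduction fits the task, the point is that the choice is now written down in the code.</p>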
<h3 id="all-and-any-can-apply-and-and-or-down-a-vector"><code class="language-plaintext highlighter-rouge">all()</code> and <code class="language-plaintext highlighter-rouge">any()</code> can apply <em>and</em> and <em>or</em> down a vector</h3>
<p>Because I am talking about <em>and</em> and <em>or</em> and about creating logical
scalars, I want to advertise two particular functions that can reduce a
logical vector into a scalar. <a href="https://rdrr.io/r/base/all.html"><code class="language-plaintext highlighter-rouge">all()</code></a>
is <code class="language-plaintext highlighter-rouge">TRUE</code> when all of the elements are <code class="language-plaintext highlighter-rouge">TRUE</code>. For a vector, it would be
like replacing the commas in <code class="language-plaintext highlighter-rouge">c(TRUE, TRUE, FALSE)</code> with <code class="language-plaintext highlighter-rouge">&</code>s.
<a href="https://rdrr.io/r/base/any.html"><code class="language-plaintext highlighter-rouge">any()</code></a> provides the analogous
down-vector behavior for <code class="language-plaintext highlighter-rouge">|</code>. <code class="language-plaintext highlighter-rouge">any()</code> is <code class="language-plaintext highlighter-rouge">TRUE</code> when any of the elements
in the vector are <code class="language-plaintext highlighter-rouge">TRUE</code> (and not <code class="language-plaintext highlighter-rouge">NA</code>—more on that later).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">all</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="kc">TRUE</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="kc">FALSE</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="kc">FALSE</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="c1"># The input can be scalars or vectors</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="nf">any</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="c1"># These appear not to short circuit</span><span class="w">
</span><span class="nf">any</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">))</span><span class="w">
</span><span class="c1">#> Error in eval(expr, envir, enclos): this is an error</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"this is an error"</span><span class="p">))</span><span class="w">
</span><span class="c1">#> Error in eval(expr, envir, enclos): this is an error</span><span class="w">
</span></code></pre></div></div>
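<p>One way to picture the "replace the commas" idea is as a fold over the vector. This is only a sketch of the analogy using <code class="language-plaintext highlighter-rouge">Reduce()</code>, not a claim about how <code class="language-plaintext highlighter-rouge">all()</code> and <code class="language-plaintext highlighter-rouge">any()</code> are implemented.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- c(TRUE, TRUE, FALSE)

# Fold the elementwise operators across the vector, one pair at a time
and_fold <- Reduce(`&`, x)   # same as TRUE & TRUE & FALSE
or_fold  <- Reduce(`|`, x)   # same as TRUE | TRUE | FALSE

identical(and_fold, all(x))
#> [1] TRUE
identical(or_fold, any(x))
#> [1] TRUE
</code></pre></div></div>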
<h3 id="we-can-make-the-strict-operators-even-stricter">We can make the strict operators even stricter</h3>
<p>Recall the following unsettling example where just <code class="language-plaintext highlighter-rouge">ttff[1]</code> and
<code class="language-plaintext highlighter-rouge">tftf[1]</code> are considered in the <code class="language-plaintext highlighter-rouge">if()</code> statement.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R 4.1.2 behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"We used to get a warning (before R 4.2.0)."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> Warning in if (ttff | tftf) {: the condition has length > 1 and only the first</span><span class="w">
</span><span class="c1">#> element will be used</span><span class="w">
</span><span class="c1">#> [1] "We used to get a warning (before R 4.2.0)." </span><span class="w">
</span></code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R current behavior</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ttff</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tftf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"We used to get a warning (before R 4.2.0)."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#> Error in if (ttff | tftf) {: the condition has length > 1</span><span class="w">
</span></code></pre></div></div>
<p>It would be nice to rule out this behavior outright. In fact, we can
make our code stricter by setting the system environment variable
<code class="language-plaintext highlighter-rouge">_R_CHECK_LENGTH_1_LOGIC2_</code>. Once it is set, using the strict forms
(<code class="language-plaintext highlighter-rouge">&&</code> and <code class="language-plaintext highlighter-rouge">||</code>) on inputs longer than 1 throws an error.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R 4.1.2 behavior</span><span class="w">
</span><span class="n">Sys.setenv</span><span class="p">(</span><span class="s2">"_R_CHECK_LENGTH_1_LOGIC2_"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"TRUE"</span><span class="p">)</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Error in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Error in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1"># Default behavior</span><span class="w">
</span><span class="n">Sys.unsetenv</span><span class="p">(</span><span class="s2">"_R_CHECK_LENGTH_1_LOGIC2_"</span><span class="p">)</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># R current behavior</span><span class="w">
</span><span class="n">Sys.setenv</span><span class="p">(</span><span class="s2">"_R_CHECK_LENGTH_1_LOGIC2_"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"TRUE"</span><span class="p">)</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Error in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Error in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1"># Default behavior</span><span class="w">
</span><span class="n">Sys.unsetenv</span><span class="p">(</span><span class="s2">"_R_CHECK_LENGTH_1_LOGIC2_"</span><span class="p">)</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Warning in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> Warning in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="n">ttff</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tftf</span><span class="w">
</span><span class="c1">#> Warning in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)'</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
<p>A related environment variable <del>is</del> <u>used to be</u>
<code class="language-plaintext highlighter-rouge">_R_CHECK_LENGTH_1_CONDITION_</code> that turns vectors inside of <code class="language-plaintext highlighter-rouge">if()</code> into
errors instead of warnings. [I had documented this variable in this
post but I have removed it.–<em>May 25, 2022</em>].</p>
<p>To make project code more robust, one might consider setting
<code class="language-plaintext highlighter-rouge">_R_CHECK_LENGTH_1_LOGIC2_</code> inside of <a href="https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf">a .Renviron
file</a>
or loading it with <a href="https://github.com/gaborcsardi/dotenv">dotenv</a>. (But
if I remember correctly, this check also applies to package code, so you
can get errors for legal-but-dodgy R code in other people’s packages.)</p>
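<p>Environment variables aside, a function can also police its own conditions. Here is a minimal sketch of that idea. Note that <code class="language-plaintext highlighter-rouge">assert_flag()</code> is a hypothetical helper I made up for this illustration, not a base R function.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: insist on a single, non-missing TRUE or FALSE
assert_flag <- function(x) {
  # Every operand of || here is itself a scalar, so this check is safe
  if (!is.logical(x) || length(x) != 1L || is.na(x)) {
    stop("expected a single TRUE or FALSE", call. = FALSE)
  }
  x
}

ttff <- c(TRUE, TRUE, FALSE, FALSE)

# A reduced vector passes the check
ok <- assert_flag(any(ttff))

# The raw vector is rejected; capture the error message instead of halting
msg <- tryCatch(assert_flag(ttff), error = function(e) conditionMessage(e))
msg
#> [1] "expected a single TRUE or FALSE"
</code></pre></div></div>
<p>Unlike the environment variable, a check like this only affects the code that calls it, so other people’s packages are left alone.</p>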
<h2 id="nas-infect-other-values"><code class="language-plaintext highlighter-rouge">NA</code>s infect other values</h2>
<p>All of the above examples conveniently avoided <code class="language-plaintext highlighter-rouge">NA</code>s. These are missing
values that infect other logical values, turning them into <code class="language-plaintext highlighter-rouge">NA</code>s. For this
section, I want to briefly highlight some behaviors of <code class="language-plaintext highlighter-rouge">NA</code>s and some
functions that can help us work around them.</p>
<p>For <em>and</em>, an <code class="language-plaintext highlighter-rouge">NA</code> with any non-<code class="language-plaintext highlighter-rouge">FALSE</code> value is <code class="language-plaintext highlighter-rouge">NA</code>. For <em>or</em>, an <code class="language-plaintext highlighter-rouge">NA</code>
with any non-<code class="language-plaintext highlighter-rouge">TRUE</code> value is <code class="language-plaintext highlighter-rouge">NA</code>. That’s a funny way to put it, but the
exceptions are the cases where we can infer the answer without knowing the <code class="language-plaintext highlighter-rouge">NA</code>
value. <code class="language-plaintext highlighter-rouge">TRUE | NA</code> (and <code class="language-plaintext highlighter-rouge">NA | TRUE</code>) returns <code class="language-plaintext highlighter-rouge">TRUE</code> because it would
return <code class="language-plaintext highlighter-rouge">TRUE</code> whether the <code class="language-plaintext highlighter-rouge">NA</code> was actually a <code class="language-plaintext highlighter-rouge">TRUE</code> or a <code class="language-plaintext highlighter-rouge">FALSE</code>. The same
holds for <code class="language-plaintext highlighter-rouge">FALSE & NA</code> (and <code class="language-plaintext highlighter-rouge">NA & FALSE</code>) returning <code class="language-plaintext highlighter-rouge">FALSE</code>. If we
un-missing-ed the <code class="language-plaintext highlighter-rouge">NA</code> into <code class="language-plaintext highlighter-rouge">TRUE</code> or <code class="language-plaintext highlighter-rouge">FALSE</code>, the statement would still
be <code class="language-plaintext highlighter-rouge">FALSE</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tfnn</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">nntf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">tfnn</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">nntf</span><span class="w">
</span><span class="c1">#> [1] NA FALSE NA FALSE</span><span class="w">
</span><span class="n">tfnn</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">nntf</span><span class="w">
</span><span class="c1">#> [1] TRUE NA TRUE NA</span><span class="w">
</span><span class="c1"># Infecting in &&, ||, all() and any()</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1">#> [1] NA</span><span class="w">
</span><span class="kc">FALSE</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1">#> [1] NA</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] NA</span><span class="w">
</span><span class="nf">any</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] NA</span><span class="w">
</span><span class="c1"># These return FALSE and TRUE because the answer would be the same</span><span class="w">
</span><span class="c1"># regardless of whether the NA was a TRUE or a FALSE.</span><span class="w">
</span><span class="kc">FALSE</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="kc">TRUE</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span><span class="nf">all</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] FALSE</span><span class="w">
</span><span class="nf">any</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE</span><span class="w">
</span></code></pre></div></div>
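<p>The rules above can also be laid out as truth tables. This is a small sketch using <code class="language-plaintext highlighter-rouge">outer()</code>; the names <code class="language-plaintext highlighter-rouge">vals</code>, <code class="language-plaintext highlighter-rouge">val_names</code>, <code class="language-plaintext highlighter-rouge">and_table</code> and <code class="language-plaintext highlighter-rouge">or_table</code> are just mine for the illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vals <- c(TRUE, FALSE, NA)
val_names <- c("TRUE", "FALSE", "NA")

# Rows are the left-hand value, columns are the right-hand value
and_table <- outer(vals, vals, `&`)
or_table  <- outer(vals, vals, `|`)
dimnames(and_table) <- list(val_names, val_names)
dimnames(or_table)  <- list(val_names, val_names)

and_table
or_table
</code></pre></div></div>
<p>In the <em>and</em> table, the <code class="language-plaintext highlighter-rouge">FALSE</code> row and column are <code class="language-plaintext highlighter-rouge">FALSE</code> all the way through. In the <em>or</em> table, the same is true of <code class="language-plaintext highlighter-rouge">TRUE</code>. The <code class="language-plaintext highlighter-rouge">NA</code>s only survive in the remaining cells.</p>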
<p>This uhh complicates things, so how do I check if what I have is <code class="language-plaintext highlighter-rouge">TRUE</code>
or <code class="language-plaintext highlighter-rouge">FALSE</code>? <a href="https://rdrr.io/r/base/Logic.html"><code class="language-plaintext highlighter-rouge">isTRUE()</code> and
<code class="language-plaintext highlighter-rouge">isFALSE()</code></a> provide direct tests of
whether the input is the scalar <code class="language-plaintext highlighter-rouge">TRUE</code> or the scalar <code class="language-plaintext highlighter-rouge">FALSE</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">c</span><span class="p">(</span><span class="n">isTRUE</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">),</span><span class="w"> </span><span class="n">isTRUE</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">),</span><span class="w"> </span><span class="n">isTRUE</span><span class="p">(</span><span class="kc">NA</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="n">isFALSE</span><span class="p">(</span><span class="kc">TRUE</span><span class="p">),</span><span class="w"> </span><span class="n">isFALSE</span><span class="p">(</span><span class="kc">FALSE</span><span class="p">),</span><span class="w"> </span><span class="n">isFALSE</span><span class="p">(</span><span class="kc">NA</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] FALSE TRUE FALSE</span><span class="w">
</span></code></pre></div></div>
<p>The documentation notes that</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">if(isTRUE(cond))</code> may be preferable to <code class="language-plaintext highlighter-rouge">if(cond)</code> because of <code class="language-plaintext highlighter-rouge">NA</code>s</p>
</blockquote>
<p>so <code class="language-plaintext highlighter-rouge">isTRUE()</code> is something we run into in packaged code.</p>
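<p>To see why the documentation makes that suggestion, here is a short sketch comparing a bare <code class="language-plaintext highlighter-rouge">NA</code> condition with a guarded one. The variable names are mine, and <code class="language-plaintext highlighter-rouge">tryCatch()</code> is only there so that the example keeps running after the error.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cond <- NA

# A bare NA condition is an error, so catch it to keep the script running
bare <- tryCatch(
  if (cond) "yes" else "no",
  error = function(e) "if (NA) is an error"
)
bare
#> [1] "if (NA) is an error"

# isTRUE() turns the NA into a plain FALSE, so the else branch runs
guarded <- if (isTRUE(cond)) "yes" else "no"
guarded
#> [1] "no"
</code></pre></div></div>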
<p><code class="language-plaintext highlighter-rouge">isTRUE()</code> and <code class="language-plaintext highlighter-rouge">isFALSE()</code> are not vectorized, but we can check whether
the elements in a vector are <code class="language-plaintext highlighter-rouge">TRUE</code> and only <code class="language-plaintext highlighter-rouge">TRUE</code> in a few different ways.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Make a new function</span><span class="w">
</span><span class="n">Vectorize</span><span class="p">(</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isTRUE</span><span class="p">)(</span><span class="n">tfnn</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE FALSE</span><span class="w">
</span><span class="c1"># Apply the function on a vector</span><span class="w">
</span><span class="n">vapply</span><span class="p">(</span><span class="n">X</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tfnn</span><span class="p">,</span><span class="w"> </span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isTRUE</span><span class="p">,</span><span class="w"> </span><span class="n">FUN.VALUE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">logical</span><span class="p">(</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE FALSE</span><span class="w">
</span><span class="c1"># Use table-lookup or set operations</span><span class="w">
</span><span class="n">tfnn</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE FALSE</span><span class="w">
</span><span class="n">is.element</span><span class="p">(</span><span class="n">el</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tfnn</span><span class="p">,</span><span class="w"> </span><span class="n">set</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="c1">#> [1] TRUE FALSE FALSE FALSE</span><span class="w">
</span></code></pre></div></div>
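<p>A few more options, sketched here as additions of my own: masking with <code class="language-plaintext highlighter-rouge">!is.na()</code>, asking <code class="language-plaintext highlighter-rouge">which()</code> for the <code class="language-plaintext highlighter-rouge">TRUE</code> positions, and using the <code class="language-plaintext highlighter-rouge">na.rm</code> argument of <code class="language-plaintext highlighter-rouge">all()</code> and <code class="language-plaintext highlighter-rouge">any()</code> when a scalar is the goal.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tfnn <- c(TRUE, FALSE, NA, NA)

# NA & FALSE is FALSE, so masking with !is.na() turns the NAs into FALSE
only_true <- tfnn & !is.na(tfnn)
only_true
#> [1]  TRUE FALSE FALSE FALSE

# which() keeps only the positions that are TRUE and drops the NAs
true_positions <- which(tfnn)
true_positions
#> [1] 1

# na.rm = TRUE removes the NAs before all() and any() reduce the vector
all_known <- all(tfnn, na.rm = TRUE)
any_known <- any(tfnn, na.rm = TRUE)
c(all_known, any_known)
#> [1] FALSE  TRUE
</code></pre></div></div>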
<hr />
<p>I think that’s just about every useful thing I want to say about <em>and</em>
and <em>or</em> in R. But just remember, <code class="language-plaintext highlighter-rouge">&&</code> and <code class="language-plaintext highlighter-rouge">||</code> are longer operators because
they are stricter.</p>
<hr />
<p><em>Last knitted on 2022-05-27. <a href="https://github.com/tjmahr/tjmahr.github.io/blob/master/_R/2021-07-01-think-of-stricter-logical-operators.Rmd">Source code on
GitHub</a>.</em><sup id="fnref:si" role="doc-noteref"><a href="#fn:si" class="footnote" rel="footnote">1</a></sup></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:si" role="doc-endnote">
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">.session_info</span><span class="w">
</span><span class="c1">#> ─ Session info ───────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> setting value</span><span class="w">
</span><span class="c1">#> version R version 4.2.0 (2022-04-22 ucrt)</span><span class="w">
</span><span class="c1">#> os Windows 10 x64 (build 22000)</span><span class="w">
</span><span class="c1">#> system x86_64, mingw32</span><span class="w">
</span><span class="c1">#> ui RTerm</span><span class="w">
</span><span class="c1">#> language (EN)</span><span class="w">
</span><span class="c1">#> collate English_United States.utf8</span><span class="w">
</span><span class="c1">#> ctype English_United States.utf8</span><span class="w">
</span><span class="c1">#> tz America/Chicago</span><span class="w">
</span><span class="c1">#> date 2022-05-27</span><span class="w">
</span><span class="c1">#> pandoc NA</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Packages ───────────────────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> package * version date (UTC) lib source</span><span class="w">
</span><span class="c1">#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> git2r 0.30.1 2022-03-16 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> here 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> knitr * 1.39 2022-04-26 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> Matrix 1.4-1 2022-03-23 [2] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> png 0.1-7 2013-12-03 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> ragg 1.2.2 2022-02-21 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> reticulate 1.25 2022-05-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2</span><span class="w">
</span><span class="c1">#> [2] C:/Program Files/R/R-4.2.0/library</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ─ Python configuration ───────────────────────────────────────────────────────</span><span class="w">
</span><span class="c1">#> python: C:/Users/Tristan/AppData/Local/r-miniconda/envs/r-reticulate/python.exe</span><span class="w">
</span><span class="c1">#> libpython: C:/Users/Tristan/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll</span><span class="w">
</span><span class="c1">#> pythonhome: C:/Users/Tristan/AppData/Local/r-miniconda/envs/r-reticulate</span><span class="w">
</span><span class="c1">#> version: 3.6.10 (default, Mar 5 2020, 10:17:47) [MSC v.1900 64 bit (AMD64)]</span><span class="w">
</span><span class="c1">#> Architecture: 64bit</span><span class="w">
</span><span class="c1">#> numpy: C:/Users/Tristan/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy</span><span class="w">
</span><span class="c1">#> numpy_version: 1.18.5</span><span class="w">
</span><span class="c1">#> </span><span class="w">
</span><span class="c1">#> ──────────────────────────────────────────────────────────────────────────────</span><span class="w">
</span></code></pre></div> </div>
<p><a href="#fnref:si" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Tristan Mahrtjmahrweb@gmail.comA crash course on the *and*s and *or*s in R