Gimli: Spec

Notation.

We denote by $\mathcal{W} = \{0, 1\}^{32}$ the set of bitstrings of length $32$. We will refer to the elements of this set as “words”. We use

$a \oplus b$ to denote a bitwise exclusive or (XOR) of the values $a$ and $b$,
$a \land b$ for a bitwise logical and of the values $a$ and $b$,
$a \lor b$ for a bitwise logical or of the values $a$ and $b$,
$a \lll k$ for a cyclic left shift of the value a by a shift distance of $k$, and
$a \ll k$ for a non-cyclic shift (i.e, a shift that is filling up with zero bits) of the value a by a shift distance of $k$.

We index all vectors and matrices starting at zero. We encode words as bytes in little-endian form.

The state.

Gimli applies a sequence of rounds to a $384$-bit state. The state is represented as a parallelepiped with dimensions $3 \times 4 \times 32$ or, equivalently, as a $3 \times 4$ matrix of $32$-bit words.

We name the following sets of bits:

a column $j$ is a sequence of $96$ bits such that $\textbf{s}_j = \{s_{0,j};s_{1,j};s_{2,j}\} \in \mathcal{W}^{3}$
a row $i$ is a sequence of $128$ bits such that $\textbf{s}_i = \{s_{i,0};s_{i,1};s_{i,2};s_{i,3}\} \in \mathcal{W}^{4}$

Each round is a sequence of three operations:

a non-linear layer, specifically a $96$-bit SP-box applied to each column;
in every second round, a linear mixing layer;
in every fourth round, a constant addition.

The non-linear layer.

The SP-box consists of three sub-operations: rotations of the first and second words; a 3-input nonlinear T-function; and a swap of the first and third words.

The linear layer.

The linear layer consists of two swap operations, namely Small-Swap and Big-Swap. Small-Swap occurs every 4 rounds starting from the 1st round. Big-Swap occurs every 4 rounds starting from the 3rd round.

The round constants.

There are 24 rounds in Gimli, numbered $24,23,\dots,1$. When the round number $r$ is $24,20,16,12,8,4$ we XOR the round constant $\texttt{0x9e377900} \oplus r$ to the first state word $s_{0,0}$.

Reference code in C.

#include <stdint.h>

uint32_t rotate(uint32_t x, int bits)
{
  if (bits == 0) return x;
  return (x << bits) | (x >> (32 - bits));
}

extern void gimli(uint32_t *state)
{
  int round;
  int column;
  uint32_t x;
  uint32_t y;
  uint32_t z;

  for (round = 24; round > 0; −−round)
  {
    for (column = 0; column < 4; ++column)
    {
      x = rotate(state[    column], 24);
      y = rotate(state[4 + column],  9);
      z =        state[8 + column];

      state[8 + column] = x ^ (z << 1) ^ ((y&z) << 2);
      state[4 + column] = y ^ x        ^ ((x|z) << 1);
      state[column]     = z ^ y        ^ ((x&y) << 3);
    }

    if ((round & 3) == 0) { // small swap: pattern s...s...s... etc.
      x = state[0];
      state[0] = state[1];
      state[1] = x;
      x = state[2];
      state[2] = state[3];
      state[3] = x;
    }
    if ((round & 3) == 2) { // big swap: pattern ..S...S...S. etc.
      x = state[0];
      state[0] = state[2];
      state[2] = x;
      x = state[1];
      state[1] = state[3];
      state[3] = x;
    }

    if ((round & 3) == 0) { // add constant: pattern c...c...c... etc.
      state[0] ^= (0x9e377900 | round);
    }
  }
}

Version: This is version 2017.09.16 of the Spec web page.