Notation.
We denote by $\mathcal{W} = \{0, 1\}^{32}$ the set of bitstrings of length $32$. We will refer to the elements of this set as “words”. We use
- $a \oplus b$ to denote a bitwise exclusive or (XOR) of the values $a$ and $b$,
- $a \land b$ for a bitwise logical and of the values $a$ and $b$,
- $a \lor b$ for a bitwise logical or of the values $a$ and $b$,
- $a \lll k$ for a cyclic left shift of the value a by a shift distance of $k$, and
- $a \ll k$ for a non-cyclic shift (i.e, a shift that is filling up with zero bits) of the value a by a shift distance of $k$.
We index all vectors and matrices starting at zero. We encode words as bytes in little-endian form.
The state.
Gimli applies a sequence of rounds to a $384$-bit state. The state is represented as a parallelepiped with dimensions $3 \times 4 \times 32$ or, equivalently, as a $3 \times 4$ matrix of $32$-bit words.
We name the following sets of bits:
- a column $j$ is a sequence of $96$ bits such that $\textbf{s}_j = \{s_{0,j};s_{1,j};s_{2,j}\} \in \mathcal{W}^{3}$
- a row $i$ is a sequence of $128$ bits such that $\textbf{s}_i = \{s_{i,0};s_{i,1};s_{i,2};s_{i,3}\} \in \mathcal{W}^{4}$
Each round is a sequence of three operations:
- a non-linear layer, specifically a $96$-bit SP-box applied to each column;
- in every second round, a linear mixing layer;
- in every fourth round, a constant addition.
The non-linear layer.
The SP-box consists of three sub-operations: rotations of the first and second words; a 3-input nonlinear T-function; and a swap of the first and third words.
The linear layer.
The linear layer consists of two swap operations, namely Small-Swap and Big-Swap. Small-Swap occurs every 4 rounds starting from the 1st round. Big-Swap occurs every 4 rounds starting from the 3rd round.
The round constants.
There are 24 rounds in Gimli, numbered $24,23,\dots,1$. When the round number $r$ is $24,20,16,12,8,4$ we XOR the round constant $\texttt{0x9e377900} \oplus r$ to the first state word $s_{0,0}$.
Reference code in C.
#include <stdint.h>
uint32_t rotate(uint32_t x, int bits)
{
if (bits == 0) return x;
return (x << bits) | (x >> (32 - bits));
}
extern void gimli(uint32_t *state)
{
int round;
int column;
uint32_t x;
uint32_t y;
uint32_t z;
for (round = 24; round > 0; −−round)
{
for (column = 0; column < 4; ++column)
{
x = rotate(state[ column], 24);
y = rotate(state[4 + column], 9);
z = state[8 + column];
state[8 + column] = x ^ (z << 1) ^ ((y&z) << 2);
state[4 + column] = y ^ x ^ ((x|z) << 1);
state[column] = z ^ y ^ ((x&y) << 3);
}
if ((round & 3) == 0) { // small swap: pattern s...s...s... etc.
x = state[0];
state[0] = state[1];
state[1] = x;
x = state[2];
state[2] = state[3];
state[3] = x;
}
if ((round & 3) == 2) { // big swap: pattern ..S...S...S. etc.
x = state[0];
state[0] = state[2];
state[2] = x;
x = state[1];
state[1] = state[3];
state[3] = x;
}
if ((round & 3) == 0) { // add constant: pattern c...c...c... etc.
state[0] ^= (0x9e377900 | round);
}
}
}
Version: This is version 2017.09.16 of the Spec web page.