GPU Trie optimization by data prefetching.

Multiple pattern search problem on GPU. Part 4

April 01, 2019
Artur Niewiadomski, Stanisław Piotrowski, Krzysztof Kaczmarski
Faculty of Mathematics and Information Science,
Warsaw University of Technology

Searching loop

The last employed optimization was the application of prefetching in the array-node loop. Previously, the first loop had begun with fetching a character from a text (Listing 1, line 3) and afterwards trying to find a child node based on the fetched data. To allow prefetching, we only had to place the instruction that loaded the character from the memory right after the last instruction that used the character from the previous iteration of the loop, in this case it was the line 5. In addition, in order for the first iteration of the loop to have the correct data, we had to place an additional fetch instruction before it. The resulting code is presented in Listing 2. The minor changes, shown in lines 1 and 6, led to an improvement of as much as 15%, increased throughput, from 52 million input lines per second to 60 million input lines per second.

Listing 1. The original loop and characters reading method.
while (textIt < textEnd)
{
    char c = *textIt++;    char normC = c | 0x20;
    bool alfa = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');    if (alfa)
    {
        if (!((trieOffset = reinterpret_cast<uint16_t*>(trie + trieOffset)[normC - 'a'])))
            break;
        if (trieOffset & 0x8000)
        {
            trieOffset &= 0x7FFF;
            finishString = true;
            break;
        }
    }
    else
        break;
}
Listing 2. The same loop with prefetching enabled.
char c = *textIt++;while (textIt < textEnd)
{
    char normC = c | 0x20;
    bool alfa = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    c = *textIt++;    if (alfa)
    {
        if (!((trieOffset = reinterpret_cast<uint16_t*>(trie + trieOffset)[normC - 'a'])))
            break;
        if (trieOffset & 0x8000)
        {
            trieOffset &= 0x7FFF;
            finishString = true;
            break;
        }
    }
    else
        break;
}