IsIsogram Without a Nested Loop

dawsonc623 · December 13, 2020, 12:07am

I did a little trickery here to avoid the nested loop and make the code a bit more efficient. Here is the code:

bool UBullCowCartridge::IsIsogram(const FString& Word) const
{
    bool bIsIsogram = true;

    std::unordered_set<TCHAR> Characters;

    for (const TCHAR Character : Word)
    {
        if (Characters.find(Character) != Characters.end())
        {
            bIsIsogram = false;
            break;
        }
        else
        {
            Characters.insert(Character);
        }
    }

    return bIsIsogram;
}

A bit of explanation here for those who need it. First, I use a range-based for loop instead of a regular for loop, but I did so as a style choice, not a hardcore efficiency thing. What does matter though is that I use a data structure known as an unordered set to store each character that I find, and then as I move on to the following character I check to see if that new character has already been found (and thus exists in my set). I chose an unordered set specifically because it has a constant look-up time and thus does not noticeably impact the run time of my function. Why this is more efficient gets into Big-O territory, but the simplified answer is that the number of comparisons my function makes at most is less than the nested loop solution’s number of comparisons at most.

The one thing that is impacted here is the amount of memory used. Since I save the characters in a separate data structure, this essentially increases the memory usage by the number of characters in the word at most. If you are in a memory constrained environment, this might not be ideal. But for most cases (especially in game development) memory constraints are not so tight as to make this a non-viable solution.

DanM · December 13, 2020, 10:35am

This is significantly decreasing the performance of the function. In English* N is going to max out at 26 which is nowhere near enough for the extra overhead of dynamic memory allocation and hashing to benefit from the constant time lookup/insertion.
Here’s a rather primitive comparison with the alphabet as the input - https://godbolt.org/z/dEM1ac

On top of that, the standard unordered containers are terrible performance-wise as they are rather restricted by the standard to be implemented a certain way (iterator invalidation guarantees) as such, lots of different implementations blow it out of the water - https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-03-01-result-InsertHugeInt/
I would imagine Unreal’s containers to easily beat them as well.

There is also a slight issue with your implementation. You do two lookups (one for find, one for insert) when you can do just one. insert returns an iterator, bool pair which is an iterator to the element inserted and whether or not an insertion took place.

bool UBullCowCartridge::IsIsogram(const FString& Word) const
{
    std::unordered_set<TCHAR> Characters;
    for (const TCHAR Character : Word)
    {
        // if not inserted, must already be in the set.
        if (!Characters.insert(Character).second)
        {
            return false;
        }
    }
    return true;
}

* Don’t think this game works with many other languages other than English anyway.

dawsonc623 · December 14, 2020, 6:25pm

Interesting. I did not realize the standard containers had performance issues. After reading up on it, I understand why, now. In our case, the extra safety afforded by the standard is not super relevant, so we can swap that for a faster implementation.

I also like your improvement. I was not paying attention to the return of insert, so I missed that. It takes a slight hit readability-wise for anyone who does not immediately know the return of the function, but I think it is worth the tradeoff. Of course, if we swapped implementations, it would be up to the new API how that would work. If a double look-up is necessary, I would be interested to see its performance implications.

I may play with a faster hash set implementation and do a comparison there, if only for curiosity’s sake. I have always been pushed to use data structures as a means to drop the Big-O of an algorithm, but you make a good point that for 26 characters the data structure allocation may be a hindrance more than a help.

DanM · December 15, 2020, 1:23pm

Then you might be surprised to know that std::regex also has horrible performance. I believe there are talks about deprecating it.

That can be improved via variables, in C++17 you can use a structured binding (using _ for the first as it’s unused so don’t care).

const auto [_, bInserted] =  Characters.insert(Character);

TSet has Add which can tell you whether or not it was inserted by passing in a pointer to bool (why pointer? who knows).

If you’re interested here are some benchmarks for different implementations with different inputs

Run on (16 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x2)
-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
NestedLoop/alphabet                   180 ns          180 ns      4072727
AdjacentFind/alphabet                 149 ns          151 ns      4977778
Array/alphabet                       21.8 ns         22.0 ns     32000000
Set/alphabet                         1655 ns         1650 ns       407273
UnorderedSet/alphabet                2738 ns         2787 ns       263529
RobinSet/alphabet                     832 ns          816 ns       746667

NestedLoop/subdermatoglyphic         84.5 ns         85.4 ns      8960000
AdjacentFind/subdermatoglyphic       98.7 ns         98.4 ns      7466667
Array/subdermatoglyphic              14.8 ns         14.6 ns     44800000
Set/subdermatoglyphic                 999 ns         1004 ns       746667
UnorderedSet/subdermatoglyphic       1082 ns         1074 ns       640000
RobinSet/subdermatoglyphic            786 ns          785 ns       896000

NestedLoop/planet                    11.4 ns         11.5 ns     64000000
AdjacentFind/planet                  28.0 ns         27.6 ns     24888889
Array/planet                         6.18 ns         6.14 ns    112000000
Set/planet                            369 ns          368 ns      1866667
UnorderedSet/planet                   426 ns          420 ns      1600000
RobinSet/planet                       394 ns          401 ns      1792000

NestedLoop/bob                       2.14 ns         2.15 ns    320000000
AdjacentFind/bob                     8.33 ns         8.16 ns     74666667
Array/bob                            3.84 ns         3.84 ns    179200000
Set/bob                               157 ns          157 ns      4480000
UnorderedSet/bob                      217 ns          215 ns      3200000
RobinSet/bob                          186 ns          188 ns      3733333

(alphabet being the alphabet not literally “alphabet”)

NestedLoop - what is done in the course
AdjacentFind - sort the word and then compare adjacent pairs.
Array - use a 26 length array with characters being indexes into the array.
Set - std::set
UnorderedSet - std::unordered_set
RobinMap - tsl::robin_set from Tessil/robin_map

dawsonc623 · December 15, 2020, 1:45pm

Oh wow. Neat! Thank you for putting that all together. I was working on it, myself, but not in such depth. I was just taking the tsl::robin_set and using the alphabet.

So, to summarize, I see two major takeaways here. First, never consider the standard implementations the best just because they are standard. I have run into this before, but I still default to the “standard == best” idea in my mind. That is a habit I need to break and start doing better research on them.

Second, do not discount larger Big-O solutions immediately, especially on known, smaller data sets. Again, this has bitten me in the past, but my CS background having had Big-O beaten into me I often just go “that Big-O is N-squared, but this Big-O is N. Therefore, this is better” which does not always translate in practice.

Thank you for the discussion. It is always really eye-opening to see the theory fail in practice. It shows that you need to do your research and really put your solutions to the test before assuming they are good because the theory is there.

wirepair · December 17, 2020, 12:27pm

What did you use to create those benchmarks? Here’s my implementation using bitmaps :>

    int bitmap = 0;
    for (auto &ch : Input.ToLower())
    {
        if ((bitmap & (1 << ch)) > 0)
        {
            return true;
        }
        bitmap |= (1 << ch);
    }
    return false;

DanM · December 17, 2020, 1:32pm

I used Google Benchmark. That should be != 0 not > 0 and I wouldn’t lower the whole thing as I would imagine it would be wasted if you return checking the second character.

I also forgot to mention that for the array I didn’t check case sensitivity. So with the tolowering of that implementation as well as a bitmask with/without tolower, here are the results

Run on (16 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x2)
-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
NestedLoop/alphabet                   190 ns          190 ns      3446154
AdjacentFind/alphabet                 151 ns          150 ns      4480000
Array/alphabet                       25.1 ns         25.1 ns     28000000
ArrayLowered/alphabet                71.8 ns         71.5 ns      8960000
Set/alphabet                         1715 ns         1726 ns       407273
UnorderedSet/alphabet                2745 ns         2699 ns       248889
RobinSet/alphabet                     804 ns          802 ns       896000
BitMask/alphabet                     16.1 ns         16.1 ns     40727273
BitMaskLower/alphabet                64.9 ns         65.6 ns     11200000

NestedLoop/subdermatoglyphic         82.3 ns         81.6 ns      7466667
AdjacentFind/subdermatoglyphic       89.2 ns         90.7 ns      8960000
Array/subdermatoglyphic              14.9 ns         14.8 ns     49777778
ArrayLowered/subdermatoglyphic       47.2 ns         47.1 ns     14933333
Set/subdermatoglyphic                 986 ns          984 ns       746667
UnorderedSet/subdermatoglyphic       2306 ns         2344 ns       320000
RobinSet/subdermatoglyphic            754 ns          750 ns       896000
BitMask/subdermatoglyphic            10.7 ns         10.7 ns     64000000
BitMaskLower/subdermatoglyphic       42.0 ns         41.4 ns     16592593

NestedLoop/planet                    10.7 ns         10.7 ns     64000000
AdjacentFind/planet                  28.3 ns         28.3 ns     24888889
Array/planet                         6.29 ns         6.28 ns    112000000
ArrayLowered/planet                  17.6 ns         17.6 ns     37333333
Set/planet                            377 ns          377 ns      1947826
UnorderedSet/planet                   402 ns          399 ns      1723077
RobinSet/planet                       378 ns          375 ns      1792000
BitMask/planet                       3.89 ns         3.93 ns    194782609
BitMaskLower/planet                  15.0 ns         15.0 ns     44800000

NestedLoop/bob                       2.15 ns         2.18 ns    344615385
AdjacentFind/bob                     8.61 ns         8.54 ns     89600000
Array/bob                            3.86 ns         3.92 ns    179200000
ArrayLowered/bob                     9.43 ns         9.42 ns     74666667
Set/bob                               154 ns          154 ns      4977778
UnorderedSet/bob                      201 ns          199 ns      3446154
RobinSet/bob                          177 ns          176 ns      3733333
BitMask/bob                          2.16 ns         2.18 ns    280000000
BitMaskLower/bob                     7.62 ns         7.50 ns     89600000

So whilst using a bitmask is better I don’t think it’s significant enough for the hit on readability to be worth it.

dawsonc623 · December 17, 2020, 5:09pm

Interesting solution and results. I agree that the hit to readability is not work the 1-2 ns improvement, but it is still cool to see creative solutions like that. I would have never thought about that solution as a possibility, to be honest.

wirepair · December 18, 2020, 12:27am

Yeah, you’ll find lots of these interesting, but probably not the best in reality, solutions from those leet code challenge sites.

Utkarsh_Adlakha · October 16, 2021, 12:00pm

bool UBullCowCartridge::IsIsogram(FString Word) const

{

    for (int32 Index = 0, Comparison = Index + 1; Comparison < Word.Len(); Comparison++)

    {

        if (Word[Index]==Word[Comparison])

        {

            return false;

        }

        else

        {

            ++Index;

        }

       

    }

    return true;

}

I used a slightly different code, but it works fine

RunninglVlan · January 11, 2022, 11:34pm

Yeah, I did it using Unreal Set:

TSet<TCHAR> Unique;
for (TCHAR Symbol : Guess)
{
    bool IsAlreadyInSet;
    Unique.Add(Symbol, &IsAlreadyInSet);
    if (IsAlreadyInSet)
    {
        return false;
    }
}
return true;

I understood that this is not the most performant solution, but I didn’t want to use nested loop =)
The code would be 2 lines less in C# by the way, as HashShet#Add returns bool there

Didn’t quite understand what you meant by Array solution - could you share some code?

How would it work without pointer? Add declaration looks like bool* bIsAlreadyInSetPtr, but I saw examples on StackOverflow of method declarations with & instead of *. Haven’t seen implementation of such methods though. I’d appreciate if you could provide some
Hm, maybe pointer is used to have a default value of nullptr?

DanM · January 12, 2022, 12:11pm

You can implement the same algorithm the course uses but with 1 (visible) loop, by using functions on FString, specifically FindLastChar. Though it does have the slight disadvantage that you can’t use an offset (Index + 1) which also means it needs an addition check that the found index isn’t Index.

Sure

bool UBullCowCartridge::IsIsogram(FString Word) const
{
    int32 Letters[26] = {};
    for (auto Letter : Word)
    {
        const int32 Index = Letter - TEXT('a');
        if (++Letters[Index] > 1)
        {
            return false;
        }
    }
    return true;
}

Though that has the assumption Word only contains ASCII, additional checks (Word.Len() <= 26 & 0 <= Index < 26) would be needed otherwise.

By using a reference instead. At the time of writing I thought there were overloads for 1 arg or 2 but it looks like it’s just the one with a default argument of nullptr. So it’s just a way to make that an optional parameter.