The answer is that it depends, folks!

There were two algorithms proposed for determining whether a word is an isogram:

Method A – The brute-force method, which compares each letter to every other letter (each pair exactly once). The actual order for this one is ((n^2) - n) / 2.

Method B – A cleverer design: this approach first sorts the word (which has its own cost that we will include) and then loops through the sorted word, checking each contiguous pair for a match. The actual order for this one is (n log n) + (n - 1).

First, it’s important to keep in mind what the order expression is really describing: the MAXIMUM number of comparisons an algorithm can possibly make when determining whether the word is an isogram. Obviously, a well-designed algorithm should return the answer as soon as it has it. The maximum number of comparisons WILL be made if the word is in fact an isogram; otherwise, a good algorithm will make fewer comparisons before determining that the word is not an isogram.

So to see how the two methods compare when the word is in fact an isogram, we plot it on the graph using:

(((n^2)-n)/2) vs ((n log n) + (n - 1)), n=2 to 15

Copy/paste that into the http://www.wolframalpha.com site.

I went to 15 because the longest known isogram is “dermatoglyphics” (15 letters). But you should also run the range out to 26 and beyond to really see how well method B scales at longer string lengths, even though 26 (one of each letter) is the highest length a valid isogram can have.

If you look closely, you’ll see that it’s only at 7 characters and beyond that method B becomes more efficient. Shorter than that, method A uses fewer comparisons. Don’t forget, we are including the comparisons required to run the sort() method.
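To spell out the crossover (my own arithmetic; note that Wolfram Alpha reads “log” as the natural log): at n = 6, method A makes (36 - 6) / 2 = 15 comparisons while method B makes about 6 ln 6 + 5 ≈ 15.8; at n = 7, method A makes (49 - 7) / 2 = 21 while method B makes about 7 ln 7 + 6 ≈ 19.6. That’s where the curves cross.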

And again, we are comparing the efficiency of the two methods only when the word is an isogram and therefore “stresses” the algorithms to the max. If the word is not an isogram, then which method is faster depends on a couple of factors: how long the word is and which character is repeated.

If the word is long, then sorting it has a high cost, possibly making method A faster. If the repeated letter comes early in the alphabet (like the letter “a”), then method B may end up being faster (because the word is sorted first and then scanned for pairs). If the repeated letter appears early in the word itself, then method A will be faster. And if a letter is repeated often (more than twice), method A will be faster as well.

So what’s my vote? If I had to vote, I would say that in a practical sense method A is faster. That’s because most common isograms are shorter than 7 letters, many commonly used words are shorter than 7 letters, and many words are not isograms, with the repeating letters appearing early in the word. :wink: Remember that a well-designed algorithm will return the answer as soon as it has it!

Here’s the code to prove all this. Just change the first line in main() to test whatever word you want.

#include <iostream>
#include <string>
#include <algorithm> // std::sort
#include <cmath>     // std::log

using namespace std;

// Method A: compare each letter against every letter that came before it.
void IsIsogramA(const char* word, int size)
{
	bool isIsogram = true;
	int count = 0; // number of character comparisons made

	for (int i = 0; i < size; i++)
	{
		char currChar = word[i];

		// Only compare against earlier letters, so each pair is checked exactly once.
		for (int j = 0; j < i; j++)
		{
			char compChar = word[j];
			count++;

			if (currChar == compChar)
			{
				cout << count << " comparisons made with Method A and NO, it is not an isogram." << endl << endl;
				isIsogram = false;
				break;
			}
		}

		if (!isIsogram) break;
	}

	if (isIsogram)
		cout << count << " comparisons made with Method A and YES, it is an isogram." << endl << endl;
}

// Method B: sort the word, then scan the result for matching contiguous pairs.
void IsIsogramB(const char* word, int size)
{
	string strWord(word, size);
	sort(strWord.begin(), strWord.end());

	bool isIsogram = true;
	// Charge the sort its theoretical n log n comparisons up front
	// (an estimate, not the count std::sort actually performs).
	int count = (int)(size * log((double)size));

	// Walk the sorted word, comparing each contiguous pair.
	for (int i = 0; i < (int)strWord.length(); i++)
	{
		if (i >= (int)strWord.length() - 1) // the last char has no right-hand neighbor
		{
			break;
		}
		else
		{
			count++;

			if (strWord[i] == strWord[i + 1]) // duplicates sit adjacent after sorting
			{
				cout << count << " comparisons made with Method B and NO, it is not an isogram." << endl << endl;
				isIsogram = false;
				break;
			}
		}
	}

	if (isIsogram)
		cout << count << " comparisons made with Method B and YES, it is an isogram." << endl << endl;
}

int main()
{
	string word("dermatoglyphics");

	cout << "Is \"" << word << "\" an isogram? " << endl << endl;

	IsIsogramA(word.c_str(), (int)word.length());
	IsIsogramB(word.c_str(), (int)word.length());

	cin.get();
	return 0;
}
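If I’ve done the arithmetic right, for “dermatoglyphics” (n = 15) you should see 105 comparisons from method A, i.e. (225 - 15) / 2, and 54 from method B, i.e. 40 charged to the sort (the integer part of 15 ln 15 ≈ 40.6) plus 14 pair checks.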

First, remember that Big O notation is about the upper bound, i.e. the worst offender. So in that case it’s n² for A and n log n for B.

And for the algorithm itself, the complexity doesn’t tell you how fast it is, just how well it scales.

Also, if you want a better solution than what Ben comes up with, you can do:

#include <algorithm> // std::count
#include <string>

bool IsIsogram(const std::string& Word)
{
	for (char Letter : Word)
	{
		// Any letter occurring more than once means it's not an isogram.
		if (std::count(Word.begin(), Word.end(), Letter) > 1)
			return false;
	}
	return true;
}

The algorithm will run the slowest when the word is an isogram, because it has to perform all the checks to be able to definitively say the word is an isogram. As the instructor, Ben Tristem, points out in the video, that can be expressed as (((n^2) - n) / 2) for method A.

He simplifies it to n^2 only because, as he mentions in the video, the “minus n” part just isn’t that significant relative to n^2. Well, he’s right about that, but it isn’t 100% accurate to just take it out. The “divide by 2” part could be thought of as just scaling, which isn’t important if you are comparing relative speeds or just looking at the basic shape of the algorithm on a graph, as he did. But the truth is that, as the drawing in his video indicates, the real order is (((n^2) - n) / 2). That is 15 comparisons for a 6-letter word (just count the empty, not-X’d-out squares outlined in his red triangle and see for yourself!). That drawing makes it clear why it is actually (((n^2) - n) / 2) and not just n^2.

As for method B, you are stating the performance of the sort algorithm, not the entirety of method B. Sorting takes n log n, but what about the actual check for matching pairs? Again, the instructor leaves this out because he thinks it’s insignificant. He also says, incorrectly, that this check is equal to n; it’s actually equal to n - 1. Think about it for a simple word like “dogs”: you would check “do”, then “og”, then “gs”. That’s 3 comparisons, not 4. But again, the instructor leaves this out so that we students can focus on a simplified, albeit inaccurate, expression when determining which algorithm is generally faster.

Run the code I included in the OP and you’ll see that, as far as comparisons are concerned (not actual run-time speed), one algorithm is not necessarily faster in all cases. This is true when processing isograms (which exercise the maximum order) as well as non-isograms.

I should have been clearer. I wasn’t speaking to the actual run-time speed of the algorithms, just to how many comparisons they make. I’m working under the assumption that speed depends chiefly on the number of comparisons, since that’s the main work being done, though that isn’t a 100% foolproof way of comparing actual run-time speeds.

We are using Big O notation, and Big O only talks about the worst offender, which is n².

“Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.” - https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/

“A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function” - https://en.wikipedia.org/wiki/Big_O_notation

"Here our runtime is O(n + n²), which we just call O(n²).Even if it was O(n²/2 + 100n), it would still be O(n²)

Similarly:
O(n³ + 50n² + 10000) is O(n³)
O((n + 30) * (n + 5)) is O(n²)

Again, we can get away with this because the less significant terms quickly become, well, less significant as n gets big." - https://www.interviewcake.com/article/java/big-o-notation-time-and-space-complexity

That is certainly better in terms of readability and conciseness, but not in terms of performance. First, you loop through EVERY letter in the word, executing count() n times, and count() itself then loops through every character as well! Just look at the implementation of count() and see what it does. count() is not optimized to find one matching pair and exit… it’s designed to count every instance of a letter, so it must walk through every char in the word.
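Just to illustrate (a minimal sketch of my own, not from the lecture): an early-exit variant that searches only the remainder of the word and bails on the first duplicate, rather than counting every occurrence:

#include <algorithm> // std::find
#include <string>

// Sketch: stop at the first duplicate instead of counting all occurrences.
bool IsIsogramEarlyExit(const std::string& Word)
{
	for (auto it = Word.begin(); it != Word.end(); ++it)
	{
		// Search only the letters after the current one; earlier pairs
		// were already checked on previous iterations.
		if (std::find(it + 1, Word.end(), *it) != Word.end())
			return false; // duplicate found
	}
	return true;
}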

As mentioned in my second post, the best you can do when scanning a sorted word for duplicate chars is n - 1 comparisons. The algorithm I posted in the OP does n - 1 comparisons (unless you are also counting the if statement in the loop). I wrote that code late last night without really thinking about optimizing it in its entirety. Obviously that part could be rewritten as:

int length = (int)strWord.length() - 1; // hoist the bound: the last char has no right-hand neighbor
for (int i = 0; i < length; i++)
{
	count++;

	if (strWord[i] == strWord[i + 1]) // duplicates sit adjacent after sorting
	{
		cout << count << " comparisons made with Method B and NO, it is not an isogram." << endl << endl;
		isIsogram = false;
		break;
	}
}

To be really precise, you would have to include the i < length comparison in the order of the algorithm as well, but I have instead focused just on direct char comparisons.
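Incidentally, the standard library already expresses this exact scan. Here’s a sketch (mine, not from the thread) using std::adjacent_find, which likewise makes at most n - 1 element comparisons:

#include <algorithm> // std::adjacent_find
#include <string>

// Sketch: after sorting, any duplicate letters sit next to each other,
// so one pass with adjacent_find catches them in at most n - 1 comparisons.
bool HasAdjacentDuplicate(const std::string& sortedWord)
{
	return std::adjacent_find(sortedWord.begin(), sortedWord.end()) != sortedWord.end();
}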

That’s what the first method is doing, is it not, apart from breaking early? I only included that one because it benchmarked better than the version I made using std::find.

The compiler is pretty damn amazing when it comes to optimising. (C does better on smaller words)

Also, you should prefer readability over micro-optimisations unless you are in a performance-critical environment.

You are focused on Big O notation, as though you want to teach Big O specifically. I learned Big O in college, buddy. I’m here for the Unreal lessons, but I reviewed the C++ lessons just to ensure nothing important was missed, and this particular lesson caught my attention (I program C++ on embedded systems for a living).

The challenge the instructor presented to us was to see which algorithm was better for finding isograms…at least that’s how I interpreted the challenge. In that case, when you say:

…we can’t actually “get away” with it. It’s certainly a true statement in a general sense and an important point when teaching Big O, but that’s not what we’re doing here. We’re trying to find the fastest algorithm to use in our make-believe program. My only point was that which one is faster (based on the number of comparisons) totally depends on several factors. If this were a real-world problem we were solving for a customer, for example, I would ask about the types of words that will be processed: Are they big words or small words? Will they tend to be isograms or not? The answers to those questions may determine which algorithm we choose.

Yes, but we weren’t talking about method A. We were talking about method B and two different ways to implement it: your way used count() and mine just compared pairs, as the instructor described.

And again, when I mentioned which is “faster” I was only referring to the number of comparisons, which was the metric focused on in the lesson. The output you show is focused on time (real performance).

I think we may be talking around each other. :slight_smile:

Good luck!

I’m focused on Big O because that’s what this lecture is about. You only know the complexity of any algorithm you use in terms of Big O, so how can you compare them when you know the exact comparison count of Method A but only the Big O of some parts of Method B?

I presented an additional method, not a different implementation of A or B. Maybe @ben should clarify the challenge and specify that you aren’t limited to the methods shown in the lecture.

Thanks @DanM, I’m uploading this annotation now…

It should be noted that in the next lecture, where you do #define TMap std::map, it should be #define TMap std::unordered_map (or TMapBase, looking at the docs), and then #include <unordered_map>.

Since a std::map is sorted, insertion into it is O(log n) rather than O(1), and thus your IsIsogram algorithm isn’t O(n).
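For reference, a minimal sketch of that fix (my own code; the TMap alias just mirrors the lecture’s naming): with the hash map, each lookup/insert is O(1) on average, keeping the whole check at O(n).

#include <string>
#include <unordered_map>

#define TMap std::unordered_map // hash map: average O(1) insert/lookup

bool IsIsogram(const std::string& Word)
{
	TMap<char, bool> LetterSeen;
	for (char Letter : Word)
	{
		if (LetterSeen[Letter]) // seen this letter before: not an isogram
			return false;
		LetterSeen[Letter] = true;
	}
	return true;
}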
