Strings: Comparison
Very often we need to check if two strings are equal or not. We cannot use == to compare the two strings in C, because strings are arrays and not just primitive types such as int or char variables. So to compare if the two strings are equal, you actually need to compare characters in each position one by one in a loop, and make sure that the lengths of these two strings are equal as well.
Note: many modern languages, such as Python or JavaScript, allow comparing strings with a simple comparison operator. You must understand that, under the hood, in the worst case they are still comparing all characters one by one, so it's an operation with linear complexity.
strcmp function
There is a function called strcmp which does exactly that: iterates over two strings and compares characters in each position. It returns 0 if the two strings are equal, a positive number if the first string is greater, and a negative number if the first string is less than the second string.
What exactly do I mean by "greater" and "less" here? Strings are compared by the lexicographical, or dictionary, order, but with one important difference: the comparison is by character numeric values. In ASCII, the character table all modern computers use, characters are arranged in the following order:
code ... 32 ... 48 49 50 51 52 53 54 55 56 57
char ' ' '0' '1' '2' '3' '4' '5' '6' '7' '8' '9'
code ... 65 66 ... 90 ... 97 98 ... 122 ...
char 'A' 'B' 'Z' 'a' 'b' 'z'
So, the comparison will place space first, then digits, then uppercase letters, and then lowercase letters.
The following strings are listed in the dictionary order based on the codes of the characters:
"A"
"A " /* a space character after A */
"A0"
"A9"
"AA"
"AT"
"AZ"
"Aa"
"At"
"Az"
"B"
"Bz"
"a"
"b"
"z"
I hope it makes sense! Of course this is not how you would sort strings in a real dictionary, but that's a much more difficult task: real languages are hard, use various alphabets, and it's not that easy to even define what's the proper dictionary order should be. For all purposes of C programming the comparison based on character codes is good enough.
Implementing strcmp
So, as I said above, strcmp(s1, s2) will return an integer:
- 0 if they are equal (that's pretty straightforward),
- a positive number if
s1is greater thans2, - a negative number if
s1is less thans2.
What are those positive and negative numbers? The standard does not define it, allowing different compilers do what they want, but in the majority of cases it will return a difference between the first pair of non-matching characters. Indeed, if the first string is "abc" and the second is "abe", we can see that 'a' == 'a' and 'b' == 'b', but 'c' != 'e', and the difference 'c' - 'e' will be negative (-2). If the strings were "AD" and "AA", the difference between the first pair of non-matching characters would be positive: 'D' - 'A' is 3.
Now I will ask you to actually implement your own version of strcmp! I must warn you, there are some corner cases you will need to think about. What if the first string is shorter than the second one? What if it's longer? You need to make sure not to read anything beyond the zero character which terminates a string.
The correct solution only iterates characters in the strings once.
In the test code, I will use scanf("%s", s) to read one word: scanf will stop reading the characters into a string when it sees a space (or a new line) character. Note that I'm not using & with %s, because scanf expects a pointer to the first element of the string in this case.
Each input will contain pair of words, and your code shoud print the result of strcmp.
#include <stdio.h>
int mystrcmp(char *s1, char *s2) {
// TODO: return positive number, zero, or negative number
return 0;
}
int main() {
char s1[100], s2[100];
int result;
scanf("%s%s", s1, s2);
result = mystrcmp(s1, s2);
if (result > 0) {
printf("greater\n");
} else if (result < 0) {
printf("less\n");
} else {
printf("equal\n");
}
return 0;
}
Take your time and try to make it work! I'll show you one of the possible solutions below.
If you struggle to make it work with just one loop, try to think what options you have when you compare two characters. First, one of them can be 0, meaning that that string ended. If they are both zeros, the strings are equal, but if not, the string which reached zero first is less. If they are not zeros and equal, we continue; if they are different, we stop.
s1[i] | s2[i] | result
-------+-------+----------------------
0 | 0 | equal
>0 | 0 | greater
0 | >0 | less
same character | keep going
different | return the difference
How do we write it in the shortest way possible? This is one of the options:
int mystrcmp(char *s1, char *s2) {
while (*s1 && *s1 == *s2) {
s1++;
s2++;
}
return (*s1 - *s2);
}
Try running this code in your head and look at the table with options above. Why I don't need to check for *b in the while loop condition?