char type

We'll soon start talking about strings and files, but before we go there, we must mention a special type which can be used for storing individual characters like 'a' or '0': char.

int is 4 bytes in most compilers

You may have noticed that we never formally defined what an int is. In the very beginning we just said that int is a type to store numbers, but we never mentioned how big those numbers can be. This is an exciting topic and we'll talk about it a lot, but later; for now, let me give you just a little bit of extra information: in most modern compilers, an int variable takes 4 bytes.

A byte in modern computers, as you surely know, consists of 8 bits, each of which can store either 0 or 1. If one bit can have two different values, 0 or 1, then two bits combined can have 4 values: 00, 01, 10, and 11. Three bits can make 8 different combinations: 000, 001, 010, 011, 100, 101, 110, and 111. Each extra bit doubles the number of combinations, so (it's simple combinatorics) one byte, which is 8 bits, has 2^8 = 256 different values.

I told you that an int is 4 bytes in most C compilers; 4 bytes contain 32 bits, so one int can have 2^32 different values, which is a little more than 4 billion: the exact number is 4294967296. Since int numbers can be both positive and negative, the range for an int variable is from -2^31 to 2^31 - 1, that is, from -2147483648 to 2147483647. We'll talk about it more later.

char is 1 byte

I was very cautious and never said that int is always 4 bytes: the C language does not actually enforce that, and in some older compilers–like the ones I started learning C on–int was actually 2 bytes. But sometimes we need a type that is smaller than that, and C has one numeric type which is always defined to be exactly 1 byte. This type is called char.

char a;

It is important to understand that char is still a numeric type, despite the name. Now the fun part is, it is left to the compiler to decide if a char variable can be negative (in which case its range will likely be from -128 to 127) or not (then it's from 0 to 255). The good news is that we mostly don't care, because we rarely, if ever, use char to store numbers; we use it to store characters. As we discussed when we were talking about reading a number, characters are defined by their ASCII codes, so 'A' is actually 65 and '0' is 48.

char c = 'A';  /* the value is 65 */
char d = '0';  /* the value is 48 */

It is very useful that the main part of the ASCII table, where the English letters and basic symbols are, is all within the range between 0 and 127, so it fits in a char whether the compiler made it signed or not; the extended ASCII table is 128 to 255. We can safely assume that you can use a variable of char type to store an ASCII character.

If you need to store a non-ASCII character (maybe a Greek or Cyrillic letter, or a Chinese character, or an emoji, or literally anything other than the characters listed in the ASCII table), then you're out of luck, and we'll need to talk about character encoding, Unicode, and UTF-8, which is a very complex topic to be discussed much later. For now in this course, we'll only use ASCII characters.

getchar() returns int and not a char

One question students often think about is why getchar() would return int and not a char, if its sole purpose is to read and return one character, and char is a perfect type to store one character. The answer is simple: getchar() must be able to signal that there are no more characters to be read, and it does that by returning the special value EOF. This EOF cannot be equal to any character, so it must be outside of the range of char; that's why getchar() returns int.

We'll get more practice with the char type on the next page, when we start talking about strings!

© Alexander Fenster (contact)