Introduction to Pointers

As I have said multiple times already, C was never supposed to be a language for beginners. We need to talk about one very advanced concept, the pointer, this early in the C course because you will need pointers for something as basic as reading int variables from standard input without our homemade read_int function.

So, let me explain what a pointer is.

Variables are stored in memory

Whenever you declare a variable, e.g. int a, the value you put into that variable must be stored somewhere in the computer's memory that is available to your program while it is running. The reality is more complex, but a simple enough mental model is to imagine that memory is just a sequence of numbered blocks, each block called a byte.

A byte is the smallest unit of memory that a program can access. On practically all systems you are likely to see today, except perhaps some very special cases, one byte contains 8 bits ("elementary particles" that store either 0 or 1), but you cannot read or write a single bit on its own: a byte is the minimum amount of data you can read or write.

A bit can be either 0 or 1, so if we take one byte, that is, 8 of these bits, we can represent any sequence from 00000000 to 11111111, which gives us 256 different sequences in total:

one bit gives us 2 options: 0 or 1;
two bits give us 2² = 4 options: 00, 01, 10, 11;
three bits give us 2³ = 8 options: 000, 001, 010, 011, 100, 101, 110, 111; ...
eight bits give us 2⁸ = 256 options: 00000000, 00000001, 00000010, ..., 11111110, 11111111.

On most machines, int values use 4 sequential bytes (8 × 4 = 32 bits) to allow 2³² different values, half of which represent negative values (we'll talk about that at some later time). The main idea for now is that when you declare a variable, int a, it will be placed at some location in the memory, which in our model is the sequence of numbered bytes. For example, we could imagine that int a takes 4 consecutive bytes numbered 1000, 1001, 1002, and 1003. In this case, we'll say that the address of a is 1000, and the size of a is 4 bytes.

Getting addresses and dereferencing

In reality, we don't really care what the actual address is, be it 1000 or any other number; we just need to be able to perform two operations:

& means "take an address": if a is a variable, &a is its address;
* means "dereference": if p is known to be an address of an int variable, *p is the value in the 4 bytes of memory that start at that address.

But what is p? We could technically imagine that p is just an int, because addresses are just numbers, but in C it has a special type of its own, called "a pointer".

Let's look at the dereference operation once more. If p points to an int, then *p is the value of the int it points to. So if you want to define p as a pointer to int, you write

int *p;

basically saying that *p is an int.

It can get more complicated when you have multiple variables in one line:

int *p, a, *q;

Three variables are defined in the line above: p is a pointer to int (because *p is an int), a is just an int (a number), and q is again a pointer to int.

A real example

Now, let's see how & and * work together. Read this code and run it:

#include <stdio.h>

int main() {
	int a, *p;

	a = 42;
	p = &a;

	printf("value of a: %d\n", a);
	printf("value of *p: %d\n", *p);

	/* we are not supposed to do it but... it works */
	printf("address stored in p is: %d\n", p); 

	*p = 11;
	printf("a: %d\n", a);

	return 0;
}

Let's see what's going on here. First, two variables a and p are allocated:

int a, *p;

When your code is executed here in your browser, it's run by a browser component called WebAssembly; both int and a pointer take 4 bytes in WebAssembly, and they are allocated like this:

address    value 
  ...          
 65523         
 65524    ┌─────┐
 65525    │(p)  │
 65526    │65528│
 65527    └─────┘
 65528    ┌─────┐
 65529    │(a)  │
 65530    │ 42  │
 65531    └─────┘
 65532         
  ...

In this picture, the variable a takes 4 bytes starting at address 65528, and p is allocated at 65524.

Since we assigned a = 42, the four bytes at 65528 now store the value 42 (how exactly it is stored, we'll discuss later).

After we assigned p = &a, the variable p now stores the address of a, which is 65528.

Then we had three printfs:

printf("value of a: %d\n", a);
printf("value of *p: %d\n", *p);

/* we are not supposed to do it but... it works */
printf("address stored in p is: %d\n", p);

We know that a is 42. Then, p points to a, so *p is also 42. The third line is tricky: strictly speaking, C does not allow printing a pointer with %d (most compilers will warn about it), but in this environment it does work (for reasons we'll talk about much later), so we can actually figure out where exactly in memory a is located.

Now, the next assignment does something interesting:

*p = 11;
printf("a: %d\n", a);

We put 11 into the block of memory that p points to, which is the memory occupied by a. So when we assign *p = 11, we actually store 11 in a, as our last printf proves.

Is this too much? Just try to remember these two things:

we use & to get the address of a variable;
we use * to get the value that a pointer points to.

Why would we need all this? You will find out very soon!