My understanding of pointers

T_PAAMAYIM_NEKUDATA · August 21, 2017, 8:47pm

I don’t.

I can’t understand the reason you would ever need to know the memory location of something vs that something’s value.
Example:

name = "Name";
address = &name;
//address is the string "Name"'s location in RAM? Why would you ever need this?
new_address = *address; //equals "Name"?

Why would you ever do this over just

name = new_address?

BKISHDROID · July 5, 2019, 4:34am

Pointers are one of those advanced programming techniques that seem really confusing at first, but once you have your “AH HA” moment it all clicks and you see how powerful they really are.
Pointers are used everywhere. In fact, you have probably used pointers plenty of times before without even knowing it. Arrays use pointers. They use a “shorthand” notation for pointers that you are probably familiar with. Ever used “Array[1]” before? That is essentially shorthand notation for a pointer. But I digress; let me explain not just what a pointer is, but some of the ways they are used…

What is a pointer?

It sounds like you already know this answer, but since others might not know, a pointer is literally the address of a location in memory. Every variable you make is physically stored somewhere in your ram.
Think of your ram like a city block, and each house inside of that city block holds data. Well each one of those houses has an address.

In a way, a pointer is kind of like a postman. The postman knows exactly what streets he has to take to get to the house, using the address. The pointer holds the address: not the data, but the address of the house. So you can use the postman to get the data that is inside the house by using the address of the house.

Your ram is like a city block. It has thousands of memory locations and every one of those memory locations has a very specific address to it. Usually the address is written in hexadecimal, like 0x0021.

So a pointer is just a variable that holds the address to the house that has the data. It isn’t the data itself, but it knows exactly where to go to get the data…

Here is an image to help you visualize this idea of how memory in your ram works conceptually:

In that image, the 0x9 represents the address for the data inside of the slot to the right of it. That is that data’s “Address”, and all of those data slots together make up the entirety of your ram. The image is a very small block of memory. Your ram has thousands if not millions of data slots, and every one of them has a unique address.
Arrays are just a simplified way of storing multiple data together in one chunk of memory…

An array is essentially just a “block” of memory. It is several bytes of data that are all next to each other or consecutive.
When you make an array, with 10 elements, you are literally telling the compiler, “Hey, go find 10 data slots that are all next to each other that I can use.” So the compiler searches the ram and says back, “Ok I found a physical location in the ram where there are 10 empty slots for data all next to each other, and here is the first address of the 10 slots. Oh, also, I won’t use those 10 data slots for anything else. I will only put the data you give me inside of there.”

This is exactly what you are doing when you declare an array with “int Array[10]”. The compiler says, “Ok, every time you use the word Array, I know you want me to store the data at that location of memory I reserved for you”.

Ok… Now that you have the concept of how an array works in memory, we can get to the point of how an array is nothing more than a pointer to that block of memory.

When you say something like “Array[4] = 127”, the compiler goes, “Oh ok, the user wants me to store the value of 127 in slot number 4 of that 10 slot block of data that I reserved earlier for him.” (Because arrays are zero based, slot number 4 is actually the fifth slot in the block.) But how does the compiler know the location of slot 4, or where that block of data even is in the ram? It can find slot 4 using two steps. First, it knows where the starting address of the array is, because when you declared the array it stored the FIRST address of that memory block under the array name. That is how it knows where the memory block is… Ok, so it knows the address of the first memory slot, that’s great, but what about slot 4? Well, it takes the address of the first memory slot and literally just adds 4 slots to it.

For instance, if the very first slot of the array has the address 0x0003, and you say “Array[4]” it will go, “Ok, I start at 0x0003, because I stored the first address of that memory location to the name ‘Array’ earlier, so now I just have to count up 4 from there (Because arrays are zero based)… sooooo I start at 0x0003…. Then I count… 0x0004… 0x0005… 0x0006… 0x0007! Ah ha! I need to store the value of 127 into address 0x0007!” and then it stores the value in the memory location.

So when we do “Array[4]” the compiler knows exactly where it needs to store the data.

I only used this array example because I assume you are at least a little familiar with arrays. BUT! There is another way of doing this exact same process without using an array… which is where a POINTER comes in…
Pointers hold the address of the memory location, not the data itself. Instead of asking the compiler to go find 10 empty memory slots for our array, we could simply set a pointer to point to that starting address ourselves! We could assign a pointer like this:

int * myPointer = (int *)0x0003; // made-up address, for illustration only

NOW we can change the value in the pointer by hand to tell the compiler where we want to get data from…

If we say “ * myPointer ” we are telling the compiler, “Hey, go get the data that is stored in the address 0x0003!”, and the compiler goes, “Ok” and returns the data. But our pointer doesn’t hold the data. It just holds the address TO the data. We use the “ * ” to tell the compiler that we want the DATA at that address. If we don’t use the “ * ” symbol in front of the pointer, we are telling the compiler, “Hey, what is the address that the pointer is pointing to?”. So if we say something like “myPointer” WITHOUT the “ * ” then the compiler responds with, “Hey the pointer is pointing to address 0x0003 at the moment”. And so we get the value of 0x0003 returned from the pointer, instead of the data that the address is pointing to.
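A minimal C++ sketch of the difference between the pointer and the data it points to. Instead of a hand-picked address like 0x0003, it assumes we take the address of a real variable; the function names `ReadThroughPointer` and `PointsAt` are made up for this example.

```cpp
#include <cassert>

// Returns the DATA stored at the address held by the pointer.
int ReadThroughPointer(const int* myPointer)
{
    return *myPointer; // the "*" means "go to that address and fetch the value"
}

// Returns true when the pointer holds the address of `variable`.
bool PointsAt(const int* myPointer, const int& variable)
{
    return myPointer == &variable; // no "*": we compare the address itself
}
```

With `int slot = 127; int* myPointer = &slot;`, `ReadThroughPointer(myPointer)` gives back 127, while `myPointer` on its own is just whatever address `slot` happens to live at.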

So how do we get to slot 4 of our array if we use “myPointer” instead of “Array[4]”? Good question, we just say “myPointer = myPointer + 4” so the pointer now moves the address it is pointing to up by 4 slots. So, counting one address per slot as we did above, 0x0003 + 4 = 0x0007!
Now our pointer is pointing to slot 4 in our 10 slot memory block. Now that it is pointing to the correct address, we use “*myPointer” to get the DATA at that address.

That process looks really familiar, doesn’t it? We just did the same thing that “Array[4]” did, but we had to write more code! Really an array is just a shorthand notation for a pointer, where the compiler keeps track of the memory addresses for us, and we don’t have to give it much thought. (It also does a few added things like reserve the memory so nothing else uses it, et cetera, but that is out of the scope of this question.)
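The equivalence between indexing and pointer arithmetic can be sketched in a few lines of C++. This assumes a real array rather than a made-up address, and the function name `StoreWithIndexReadWithPointer` is invented for the example.

```cpp
#include <cassert>

// Stores `value` at index 4 using array notation, then reads it back
// with plain pointer arithmetic starting from the first element.
int StoreWithIndexReadWithPointer(int value)
{
    int Array[10] = {};
    Array[4] = value;            // array "shorthand"

    int* myPointer = &Array[0];  // address of the first slot
    return *(myPointer + 4);     // same slot, reached by pointer arithmetic
}
```

Calling `StoreWithIndexReadWithPointer(127)` returns 127, because both notations end up at the same memory slot.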

WHY USE A POINTER? WHAT ARE THEY GOOD FOR?

Once you understand that a pointer is really just an address to a specific location in memory, it becomes easier to understand what it is used for. However, because pointers are an advanced topic, some of these techniques might seem foreign to you. So I will try to explain them the best I can.

Let’s use something you have seen before in this Unreal Course, and in fact you have encountered over and over in the course so far, which is objects.

We have seen things like “myCharacter->GetActorLocation();”

Well “myCharacter” is a pointer to a memory location, just as in our explanation above. What we are doing is we are saying, “Hey compiler, there is a chunk of data in your memory that tells you how to get my character’s location. Here is the address to the beginning of that chunk of data…”, the compiler goes “Oh, ok I know where that is in my memory location”, then you go, “Ok compiler, I am glad you know where that data is, now I want you to find the part of the data that holds the ‘GetActorLocation’ function and execute it!”, and the compiler goes, “Ok human, I now know where that data is, AND I will look for the section of that data that is labeled GetActorLocation and run it!”

This is a grossly oversimplified explanation of what is going on when we say “myCharacter->GetActorLocation();”
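As a rough sketch of what the arrow does, here is a tiny stand-in class. `Character` and `GetX` are hypothetical and have nothing to do with the real Unreal classes; the point is only that “->” is the same as dereferencing the pointer and then using “.”.

```cpp
#include <cassert>

// Hypothetical stand-in for a game character; not an Unreal class.
struct Character
{
    int x;
    int GetX() const { return x; }
};

// "->" follows the pointer to the object, then calls the member on it.
int GetXWithArrow(const Character* myCharacter)
{
    return myCharacter->GetX();
}

// The long form of the same thing: dereference first, then use ".".
int GetXWithStarAndDot(const Character* myCharacter)
{
    return (*myCharacter).GetX();
}
```

Both functions return the same value for the same pointer, which is why the arrow is usually described as a convenience notation.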

One of the biggest reasons for pointers is to access the data directly. See, the compiler likes to make copies of stuff. For example, if you use this function:

void MyAddTenFunction(int myVariable)
{
    myVariable = myVariable + 10;
}

And let’s say you have a variable in your code somewhere called “myCodeVar” which is equal to 10, so like this:

int myCodeVar = 10;

And you pass it into your function…

MyAddTenFunction(myCodeVar);

The compiler goes, “Oh hey the user wants me to use the data stored in ‘myCodeVar’ and perform some task on it in the function. So what I am going to do is I am going to make a COPY of it into a new memory location.” So the compiler goes, “Ok, myCodeVar’s data is stored in address 0x0009, so I am going to copy the contents of address 0x0009 into this new address I have allocated specifically for variables passed into the “MyAddTenFunction” in the “myVariable” parameter, which is address 0x2201 over here.” So the compiler copies the value in address 0x0009 to the new address which it has allocated for “myVariable”, then it goes “Oh, ok the function tells me I need to add 10 to the value. So I am going to add ten to the copy I made at address 0x2201.”… then the compiler goes “Ah, ok, I don’t have anything more I need to do, I added 10 to the copy I made, but now the function has come to an end, and I don’t need a copy of “myVariable” anymore because I did everything I was supposed to, so I am going to delete it and exit.”

But your original variable didn’t change! “myCodeVar” is still 10, not 20! Because the compiler made a COPY of myCodeVar and performed all the operations on the COPY of myCodeVar then deleted it when it was done… WTH compiler!?
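A small C++ sketch of the copy behaviour described above. `MyAddTenFunction` is the function from this post; the wrapper `CallerValueAfterPassByValue` is made up so the effect can be checked.

```cpp
#include <cassert>

// Receives a COPY of the caller's value; only the copy is changed.
void MyAddTenFunction(int myVariable)
{
    myVariable = myVariable + 10;
    (void)myVariable; // the copy is thrown away when the function ends
}

// Calls the function above and reports what the caller's variable holds afterwards.
int CallerValueAfterPassByValue()
{
    int myCodeVar = 10;
    MyAddTenFunction(myCodeVar);
    return myCodeVar; // the original was never touched
}
```

`CallerValueAfterPassByValue()` returns 10, not 20, because the addition happened on the copy.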

AHHHH now we do the same thing but with a POINTER instead… So now our function looks like this:

void MyAddTenFunction(int *myPtr)
{
    *myPtr = *myPtr + 10;
}

Now our function accepts a POINTER to data, and not the data itself…
There is one thing I haven’t explained yet that I have to explain before we continue. Normal variables have addresses to their data location in ram. If we want to get the address of where a variable is stored, and not the value of the variable then we put a “&” in front of it… SO….

We do “MyAddTenFunction(&myCodeVar);” this time…

“&myCodeVar” will not return the value of 10, which is what is stored in the variable, but instead it will return something like 0x7832, which is the ADDRESS of where its data is stored. SO… We are passing in the address of the data for the variable myCodeVar, and not the data itself into the function “MyAddTenFunction” when we say “MyAddTenFunction(&myCodeVar);”

So what? Ok we are passing the address of the variable into the function, and we are saving the address of the variable into the pointer named “myPtr” in the function, what does that do for us?

Well when we say “*myPtr = *myPtr + 10;” we are telling the compiler, “Hey I want you to go to the data located at this address directly, and add 10 to it.” The compiler goes “Ah ok, I will go DIRECTLY to where that variable’s data is at using the address you gave to me and I will add 10 to it, right where it is. No copies…” and then the function exits.
But NOW when you return to your code, you see that your “myCodeVar” now holds the value of 20! Wait what?
The function modified the variable in place! You told it where to find the variable’s data, and you modified it directly.
YOU, “Yeah, yeah, that is great, but I can do the same thing using the ‘return’ command” and you are absolutely correct… in this case… but what if you need to modify TWO variables? Well you can only return 1 variable back with “return”! You can’t return more than one variable! BUT you CAN EDIT two variables in place at the same time.
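Here is a minimal sketch of editing two variables in place through pointers. The helper name `AddTenToBoth` and the second variable are made up for this example, and the null checks are an added precaution rather than something the explanation above requires.

```cpp
#include <cassert>

// Hypothetical helper: adds 10 to two caller-owned variables in place.
// Null pointers are skipped so the function never dereferences an invalid address.
void AddTenToBoth(int* first, int* second)
{
    if (first != nullptr)  { *first  = *first  + 10; }
    if (second != nullptr) { *second = *second + 10; }
}
```

After `AddTenToBoth(&myCodeVar, &myOtherVar);` both of the caller's variables have gone up by 10, something a single “return” value could not do on its own.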

Now, these are just EXAMPLES of things you can do with pointers. However, I should point out before I get flamed that the last example, where you use pointers to edit a variable in place, is generally frowned upon and not good practice, and should be avoided at all costs, BUT it goes to show some of the things you can do with pointers… for example only…

The pointer concept comes hugely into play with Object Oriented Programming.

I am just going to touch on this briefly because this is again one of those advanced topics, but when you create an object in an OOP language, you are not actually making a complete copy of the class in memory. What you are actually doing is, like when defining an array, creating a block of memory in your ram where just the variables of the object can be stored. The object itself is actually a pointer to that block of memory! So for instance, say your class has a function inside of it that performs some task. When you create an object from that class, you do not actually copy all of the functions in the class; instead you are creating a pointer to a new block of memory that is large enough to hold all the variables for that class. Every time that function is called, the compiler uses the object’s pointer to go find the variables that then get passed through the function. But there is only one copy of the function in the memory; it is only the variables that are copied and run through the function by using pointers to the data’s location! This saves a lot of memory in your ram. Instead of making multiple copies of code that doesn’t change, it uses the same function code, and just moves the memory locations for each object through the function using pointers to the data!
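A rough C++ sketch of that idea, using a made-up class `Counter`. In C++ the pointer to the object's block of variables is available inside a member function as `this`; the helper `CountAfterIncrementingOnlyFirst` is only there to make the behaviour checkable.

```cpp
#include <cassert>

// Hypothetical class: one data member, one member function.
struct Counter
{
    int count = 0;

    // Inside a member function, "this" is the pointer to the object it was called on.
    void Increment() { this->count = this->count + 1; }
};

// Two objects, one shared Increment function: each call only touches
// the variables of the object whose address was passed in as "this".
int CountAfterIncrementingOnlyFirst(Counter& first, Counter& second)
{
    first.Increment();
    first.Increment();
    return second.count; // unchanged by the calls on `first`
}
```

On typical compilers `sizeof(Counter)` comes out the same as `sizeof(int)`, which fits the picture above: the object stores only its variables, not a copy of the function.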

There are a LOT of things pointers are used for other than this, but now that you have a good idea of what it is, and some of the applications it is used in, it might help you a little more understand why we use them in the code.
Again, pointers are a really advanced topic and are extremely useful, but usually they are used for really advanced topics and are kind of hard to explain using basics.

Sandeep_Kumar_P · August 22, 2017, 12:48am

That was a really thorough explanation. I had an understanding of pointers before but your answer cleared it a lot more. Thanks @BKISHDROID

(Someone pin this answer in the forum)

T_PAAMAYIM_NEKUDATA · August 22, 2017, 2:21am

Thank you for this!

This was very helpful! Particularly the copying of the data part!

So is it safe to say that pointer can be synonymous with reference ?

I.e using php to assign an array value:

foreach($arr as &$not_the_copy){$not_the_copy['existing_index'] = 'new_value';}; 
// if you leave the '&' off, the assignment only changes a copy and $arr stays the same

Does this occur because the iterator creates a copy?

BKISHDROID · August 22, 2017, 2:40am

To be honest, I don’t deal with PHP at all. So I couldn’t relate it. I am an embedded systems engineer, so I deal with very low level software, i.e. understanding how the code actually affects the hardware at a binary level. The only high level language experience I have is C#.

But even the high level languages do this same process, they just do it behind the scenes at a much lower level. While I can’t say for certain in regards to PHP, I am fairly confident that PHP would do the same in regards to arrays where it keeps track of the addresses in memory of where the data is stored.

There is another aspect of pointers I didn’t touch on because it starts getting down to the binary language level of coding, which is the pointer type and how the data is stored. I.e. a float is actually 4 data slots (or bytes) total, not just one. A 16 bit int on a microprocessor is actually 2 data slots (or bytes). So you have to tell the pointer the type of data you are pointing to, so that the pointer knows how many spaces to move along the block of data to get the next value.
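A small sketch of how the pointer type decides the step size. The template name `BytesPerStep` is made up; it measures how far the address moves, in bytes, when a pointer of a given type is advanced by one.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many bytes the address moves when a pointer of type T* is advanced by one.
template <typename T>
std::ptrdiff_t BytesPerStep()
{
    T slots[2] = {};
    T* current = &slots[0];
    T* next = current + 1; // "+ 1" means "one whole T further", not "one byte further"
    return reinterpret_cast<char*>(next) - reinterpret_cast<char*>(current);
}
```

`BytesPerStep<char>()` is 1, while `BytesPerStep<float>()` equals `sizeof(float)`, which is 4 on most platforms.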

But don’t worry about that part. That is even more advanced than pointers themselves. As long as you think of it as I explained above, you will be fine.

Wish I could help more with that.

BKISHDROID · July 5, 2019, 4:34am

So I took a quick look at the explanation for “references” for PHP and this is what was returned:

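So it looks like what is called a “reference” in PHP does the same job as a pointer here, although the PHP manual is careful to say that references are not actual memory addresses like C pointers; they are two names for the same variable content. Different languages handle this idea differently. And when you start getting higher up in the higher level languages, they start doing some crazy things with pointers, but C and C++ are fairly low level languages. In fact C++ is basically just an “Addon” for C really…

But in short, yes it looks like references in PHP give you the same “edit the original, not a copy” behaviour as pointers.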

I then found this:

So just from what I can put together in the last few minutes looking into your question, yes it looks like what is happening, is that the foreach loop is cycling through the array, but instead of storing the ‘value’ of the array into the placeholder variable it is storing the ‘address’ to each of the data in the array, which allows you to edit the array’s data directly, instead of making a copy of it, similar to what I explained in the above example, except it is doing it for the full length of the array.
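For comparison, here is roughly the same situation sketched in C++ rather than PHP. The function names are made up; the only difference between the two loops is whether the loop variable is a copy of each element or a reference to it.

```cpp
#include <cassert>
#include <vector>

// Loop variable is a COPY of each element: the vector is left unchanged.
void SetAllByCopy(std::vector<int>& values, int newValue)
{
    for (int element : values)
    {
        element = newValue; // changes only the copy
        (void)element;
    }
}

// Loop variable is a REFERENCE to each element: the vector itself is edited.
void SetAllByReference(std::vector<int>& values, int newValue)
{
    for (int& element : values)
    {
        element = newValue; // writes straight into the vector
    }
}
```

After `SetAllByCopy` the vector still holds its old values; after `SetAllByReference` every element holds the new value.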

Hope that helps a little more.

BKISHDROID · August 22, 2017, 4:49am

Since we are on the topic of pointers and arrays, I figured I would throw out what a “buffer overflow” or an “overflow” is in programming. I know no one really asked about this, but this is a real issue you can run across when you program in any language and it is common enough to have a slang term for. AND it has everything to do with pointers…

All a buffer overflow is, is where the pointer value goes beyond the block of memory slots that was set aside for the array.

So if you try to store a block of data larger than the size of your array in the memory, and the way you do it is by increasing the pointer for every piece of data, your pointer will eventually point to a memory address beyond your array.

Let’s say you have a section of memory that is 3 bytes long (3 data slots long) and your memory slots start at the address 0x0006.

unsigned char * myMemoryPtr = (unsigned char *)0x0006; // made-up address; a one-byte type so that ++ moves exactly one slot

And you write 3 bytes of data to that memory block… Ok so you set the pointer to 0x0006 and you write your first piece of data:

*myMemoryPtr = data1;

Now you increment the pointer to the next memory slot, so you say:

myMemoryPtr++;

Now your pointer is pointing to address 0x0007! So you write the second piece of your data…

*myMemoryPtr = data2;

Now you increment your pointer:

myMemoryPtr++;

Your pointer is now pointing to address 0x0008, so you write your third piece of data:

*myMemoryPtr = data3;

Ok all is good. Our memory addresses look something like this:

-----Start of our data block in memory-----
0x0006 = data1
0x0007 = data2
0x0008 = data3
-----End of our data block in memory-----
0x0009 = ‘null’
0x000A = ‘null’ <we start using letters because addresses use hexadecimal, so we use A instead of 10
0x000B = ‘null’
0x000C = Program Code
0x000D = Program Code
0x000E = Program Code
0x000F = Program Code
0x0010 = ‘null’

Note that memory addresses 0x000C through 0x000F have some important code that is required for your program to run correctly!

NOW… What if we try to save data that is 10 bytes long?

So we start like we did before…

You want to write 10 bytes of data to that memory block… Ok so you set the pointer to 0x0006 and you write your first piece of data:

*myMemoryPtr = data1;

Now you increment the pointer to the next memory slot, so you say:

myMemoryPtr++;

Now your pointer is pointing to address 0x0007! So you write the second piece of your data…

*myMemoryPtr = data2;

Now you increment your pointer:

myMemoryPtr++;

Your pointer is now pointing to address 0x0008, so you write your third piece of data:

*myMemoryPtr = data3;

Ok we are at the end of our data block, BUT, the code doesn’t know that… It just knows we have 10 bytes of data to write… So the code just continues as it always has…

Now you increment your pointer:

myMemoryPtr++;

Your pointer is now pointing to address 0x0009, so you write your fourth piece of data:

*myMemoryPtr = data4;

Now you increment your pointer:

myMemoryPtr++;

Your pointer is now pointing to address 0x000A, so you write your fifth piece of data:

*myMemoryPtr = data5;

Now you increment your pointer:

myMemoryPtr++;

Your pointer is now pointing to address 0x000B, so you write your sixth piece of data:

*myMemoryPtr = data6;

etc until it writes all 10 bytes of data…

-----Start of our data block in memory-----
0x0006 = data1
0x0007 = data2
0x0008 = data3
-----End of our data block in memory-----
0x0009 = data4
0x000A = data5 <we start using letters because addresses use hexadecimal, so we use A instead of 10
0x000B = data6
0x000C = data7
0x000D = data8
0x000E = data9
0x000F = data10
0x0010 = ‘null’

Look what happened! Remember those memory addresses that had important program code? Memory addresses 0x000C through 0x000F? That code that was crucial for your program to not crash!? WHAT HAPPENED TO IT?

Well you overwrote it with some arbitrary data… you didn’t keep track of your pointer well enough and your program started to overwrite itself! (It is kind of like a surgeon trying to perform heart surgery on themselves…)

This is what is known as an “overflow” or “buffer overflow”… You had a very specific amount of memory you could write data to, but you tried to store too much data in it, so your pointer just kept moving up through the memory addresses and started overwriting other parts of your code.

This causes computers to crash, hard drives to become corrupt, corrupted data in the ram, black holes open up… It is a nasty bug, and one that is not so easily identifiable. SO be aware that you have to be mindful of what memory you point to with pointers, because if you point to the wrong thing, you could end up performing open heart surgery on yourself…
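One common way to stay out of trouble is to carry the size of the block along with the pointer and stop writing when the block is full. The sketch below assumes a made-up helper called `WriteBounded`; it is one possible guard, not the only way to do it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies up to `capacity` bytes into `block` and
// returns how many were actually written. Extra input is dropped instead
// of being written past the end of the block.
std::size_t WriteBounded(unsigned char* block, std::size_t capacity,
                         const unsigned char* data, std::size_t dataLength)
{
    std::size_t written = 0;
    unsigned char* myMemoryPtr = block;
    while (written < dataLength && written < capacity)
    {
        *myMemoryPtr = data[written]; // store one byte in the current slot
        myMemoryPtr++;                // move on to the next slot
        written++;
    }
    return written;
}
```

With a 3 byte block and 10 bytes of input, `WriteBounded` writes the first 3 bytes and returns 3, so the caller can tell that the rest did not fit.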

P.S. That is why arrays have to be given an explicit size, so that the compiler can reserve memory for your array and keep it free for you. That is also why arrays take up large amounts of memory even if they are not being utilized!

But arrays often come with safe guards to prevent this issue. This is why, in a language like C#, you get yelled at when you do “For” loops and try to go one index above the last index of the array because you forgot to account for the index being zero based. It is preventing you from pushing the pointer past the specific memory it has set aside for you, so you don’t start overwriting stuff… Plain C and C++ arrays do not check this for you, though, and when you manually manage pointers… all bets are off… Evil Laugh MU HA HA HA HA HA Evil Laugh… Ok… I am tired… that is tired humor… I am going to bed…

Hope this helps…

noise · August 22, 2017, 5:01am

Excellent information! Thanks!

Tom_Franklin · August 24, 2017, 7:40am

This causes computers to crash, hard drives to become corrupt, corrupted data in the ram, black holes open up…

Don’t forget that it could also make demons fly out of your nose XD

T_PAAMAYIM_NEKUDATA · August 24, 2017, 10:49pm

Very good information!

Thanks for taking your time to write that up!

surge914 · September 4, 2017, 6:22pm

This is a lot to take in but thanks for the explanations. It does help to clarify what pointers are and what they do. Have a great day.

Rob · November 20, 2017, 10:24am