Memory Management and Performance Techniques
In this article we give an introduction to memory management and to performance techniques for dynamic memory usage. Although the examples throughout the article are written in C/C++, they usually have equivalents in other languages. The main topic is really a standard matter of CPU, RAM and operating systems, independent of any particular language.
Memory management covers everything involved in how software represents and processes data, from the simplest variable definitions up to the most complex data structures.
In fact, every time we define and use a variable while coding, we are already performing the most basic memory management: allocating space and releasing it again.
These operations are provided as built-in behaviour regardless of the language. In short, we do not have to worry about the allocation, size or deletion of a variable once we are finished with it. However, more specialised data structures, or data representations unique to the problem at hand, may require dynamic memory allocation.
With dynamic memory we can allocate space beyond our ordinary variables. The allocated region, commonly called a buffer, is simply a byte array with a specific address and a specific length.
Using dynamic memory is a three-stage process: allocate the region, use it, and release it when finished.
A piece of data is represented in memory at a specific address and with a specific length. You can allocate a single byte of dynamic space, or 10 MB, in exactly the same way.
Regardless of the operating system, an application is loaded into a memory region and executed from there. The running application, called a process, actually consists of separate sections called segments.
These segments are known as the Code Segment, Data Segment, Stack Segment and Extra Segment. When an application (process) is loaded into memory, memory is allocated and each section of the executable is loaded into the corresponding segment. Not every segment receives data at load time, because some segments are only populated, or mostly used, at runtime. For example, the Code Segment contains the machine code and is filled when the process is first loaded into memory, whereas the Stack Segment only starts to be used once the CPU begins executing that machine code.
While developing software, our variables are normally allocated from the areas we call the Data Segment and the Stack Segment of the process. The Data Segment is, in this sense, also called the heap, and it typically holds global variables and dynamic allocations. Function parameters and local (formal) variables are allocated on each function call and released when the function returns; the Stack Segment is used for these, and it works in LIFO order. The size of the Stack Segment is fixed when the program is loaded into memory, although it can usually be configured. As functions are called, the stack position advances far enough to hold the relevant variables, and moves back again when they are released, like an invisible cursor moving through the segment. Software developers take no explicit action to manage the Stack Segment: it is driven by machine instructions such as PUSH and POP found in the Code Segment.
We said that a process has segments within itself. Modern operating systems can run many applications at the same time through their multi-process capabilities.
Can a process read the data of another process, or access and write its internal segments? In theory, yes. In general use, however, this is blocked by a CPU-level access barrier known as Protected Mode. For dynamic memory usage, the developer is therefore left with the part we call external memory: the region outside the Data Segment and outside the use of all other processes. Software developers use this area through dynamic allocations.
Apart from these, Shared Memory deserves a brief mention. Shared memory is the name given to a region, allocated in the part known as external memory, that more than one application can access at the same time. Of course, a region allocated in the external area does not have to be shared with other processes; but other processes may nevertheless be able to read and write it. This matters for security: a developer who knows this will avoid keeping security-critical data in memory allocated in the external area. For example, if you are writing a web browser, keeping session information in external memory may create a security weakness.
Virtual Memory, another capability worth mentioning, is the name given to the addressing mechanism that lets normal memory access operations spill over to disk when physical memory is not sufficient. Translation between physical and virtual memory addresses happens here: the MMU (Memory Management Unit) converts virtual addresses for us, which in effect allows us to allocate much larger memory areas than physically exist, and to continue allocating even after physical memory is exhausted. Virtual memory access is not the only reason the hard disk activity light flashes constantly on computers with little RAM, but it is largely responsible.
What we can do for performance
What we have explained so far is only the simplest outline of a subject with far more internal detail. Beyond that overview, a few techniques for dynamic memory usage can keep the processor less busy and gain performance.
1 - Firstly, if dynamic memory allocation needs to happen continuously, we can buffer the process. For example, when building a dynamic array, instead of allocating memory every time an element is added, we can organise the algorithm to allocate room for 100 elements at once and allocate a new 100-element block only when that space is full. Although this increases memory usage a little, it lets the program run noticeably faster at runtime.
2 - We can use functions such as memcpy and memmove to copy data from one memory region to another; they work much faster than copying with a loop. You can use memcmp to compare two memory regions (buffers), or memset to fill every byte of a region with a given value. These functions are fast because their library implementations use wide moves and the SIMD instruction set extensions of the processor (such as MMX, SSE and AVX) rather than byte-by-byte memory accesses.
#include <stdlib.h>
#include <string.h>

char *pBufferSource = (char *)malloc(1000); /* 1000-byte memory allocation */
if (pBufferSource != NULL) {
    char *pBufferTarget = (char *)malloc(1000); /* 1000-byte memory allocation */
    if (pBufferTarget != NULL) {
        /* Fill 900 bytes of the source buffer with the character 'A' */
        memset(pBufferSource, 'A', 900);
        /* Copy 900 bytes of the source buffer to the target buffer */
        memcpy(pBufferTarget, pBufferSource, 900);
        /* Since we are done, release the target buffer */
        free(pBufferTarget);
    }
    /* Release the source buffer because we are done */
    free(pBufferSource);
}
3 - Apart from these, by hinting that loop counters should live in CPU registers instead of memory, you eliminate a bus access to memory on every increment of the counter variable, letting the loop spin faster. Note that the register keyword is only a hint: modern compilers usually place hot loop variables in registers automatically, and C++17 removed the keyword entirely.
//With the register keyword we suggest that the variable use a CPU register instead of memory
register int i;
for (i = 0; i < 5000000; i++) {
    printf("%d\n", i);
}
4 - Avoiding variable definitions inside the loop body can also help. For a plain int in C the compiler reserves the stack space once per function, so the saving there is small; but for objects with constructors in C++, or for buffers allocated dynamically on each iteration, creating them once before the loop starts and reusing them inside it avoids repeating that work on every pass and saves real performance.
int i = 0;
int s = 0;
while (i < 1000000) {
    //We could have defined int s = i * 3; here, but we allocated it before the loop
    s = i * 3;
    printf("%d\n", s);
    i++;
}
With the few methods above you can make your algorithms more efficient. Applied to the most intensive loops in a program, such improvements can sometimes reduce real running costs.
For example, if a backend service currently needs 100 servers, a performance improvement may let you provide the same capacity with perhaps 80 servers, reducing your operating costs at the same time.
In this article we have touched on the basic issues of memory management in general terms. The subject contains far too many technical sub-topics to fit into a single article, so it is best read as preliminary information only. Even so, it would be more accurate not to treat memory management as independent of data structures and algorithms.
Here are a few resources that may be useful for those who want to learn more about these topics:
https://en.wikipedia.org/wiki/Memory_management
https://en.wikipedia.org/wiki/Intel_8086
https://www.nxp.com/docs/en/supporting-information/TPQ2CH09.pdf

