.NET memory management

This is a summary of highlights from the book "Under the Hood of .NET Memory Management", which I enjoyed reading. Some of the content may be outdated, but I think it gives a good understanding of how memory management works in .NET.

When a .NET application runs, four sections of memory (heaps) are created to be used for storage:

The Code Heap stores the actual native code instructions after they have been Just in Time compiled (JITed).
The Small Object Heap (SOH) stores allocated objects that are smaller than 85 KB.
The Large Object Heap (LOH) stores allocated objects that are 85 KB or larger.
Process Heap

When a method is called, .NET creates a container (a stack frame) that holds all of the data necessary to complete the call, including parameters, locally declared variables and the address of the line of code to execute after the method finishes. When a method completes, its container is removed from the top of the stack and execution returns to the next line.

If your application has multiple threads, then each thread will have its own stack.

When an instance of a reference type is created (usually involving the new keyword), only an object reference is stored on the stack. The actual instance itself is created on the heap, and its address is held on the stack.

When you pass a value type as a parameter, all you actually pass to the called method is a copy of the variable. Any changes made to the passed variable within the method call are isolated to the method.
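To make the difference concrete, here is a small sketch. The types PointValue and PointReference and the PassingDemo class are made-up names for illustration, not something from the book.

```csharp
public struct PointValue      // value type: copied when passed to a method
{
    public int X;
}

public class PointReference   // reference type: only the reference is copied
{
    public int X;
}

public static class PassingDemo
{
    // Changes the method's private copy; the caller's struct is untouched.
    public static void Mutate(PointValue p) { p.X = 99; }

    // Changes the single heap instance that both caller and callee refer to.
    public static void Mutate(PointReference p) { p.X = 99; }

    public static int[] Run()
    {
        var v = new PointValue { X = 1 };
        var r = new PointReference { X = 1 };
        Mutate(v);
        Mutate(r);
        return new[] { v.X, r.X };   // v.X is still 1, r.X is now 99
    }
}
```

Calling PassingDemo.Run() shows that the struct keeps its original value while the class instance sees the change.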

All the GC does is look for allocated objects on the heap that aren't being referenced by anything. The most obvious sources of references are the stack and global/static object references; these are called root references or GC roots. As well as root references, an object can also be referenced by other objects.

Garbage collection of the Small Object Heap (SOH) involves compaction. When compaction occurs, marked objects are copied over the space taken up by unmarked objects, overwriting those objects, removing any gaps, and keeping the heap contiguous. The advantage of this is that heap fragmentation (i.e. unusable memory gaps) is kept to a minimum. The main disadvantage is that compaction involves copying chunks of memory around, which requires CPU cycles and so, depending on frequency, can cause performance problems.

The Large Object Heap (LOH) isn't compacted by default, simply because of the time it would take to copy large objects over the top of unused ones.

When you mark a member static, the runtime creates a single global instance soon after the code referencing it is loaded and used. Static members are accessible by all threads in an app domain (unless they are marked with the ThreadStatic attribute) and are never garbage collected, because they are essentially root references in themselves.

The GC runs on a separate thread when certain memory conditions are reached or when the application begins to run out of memory. It is rarely a good idea to run the GC manually, because doing so can cause performance and scalability problems.

In unmanaged C/C++ applications, objects are allocated onto the heap wherever space can be found to accommodate them. When an object is destroyed by the programmer, the space that that object used on the heap is then available to allocate other objects onto. The problem is that, over time, the heap can become fragmented. As a result, the heap becomes larger than necessary. Another problem is that whenever an allocation takes place (which is often), it can take time to find a suitable gap in memory to use.

To minimize allocation time and almost eliminate heap fragmentation, .NET allocates objects consecutively, one on top of another, and keeps track of where to allocate the next object.

When an object has just been created, it is classified as a Gen 0 object, which just means that it's new and hasn't yet been inspected by the GC. Gen 1 objects have been inspected by the GC once and survived, and Gen 2 objects have survived two or more such inspections (don't forget that the GC only lets an object survive if it ultimately has a root reference).
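You can watch promotion happen with GC.GetGeneration. The sketch below is only a demo under the assumption of a standard desktop or server runtime; GenerationDemo is a made-up name, and the forced GC.Collect() is there purely to trigger a collection on demand.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one object before and after a collection.
    public static int[] Run()
    {
        var data = new object();
        int before = GC.GetGeneration(data);   // a new small object starts in Gen 0
        GC.Collect();                           // demo only; avoid in production code
        int after = GC.GetGeneration(data);    // usually 1: it survived one collection
        GC.KeepAlive(data);                     // keep 'data' rooted until this point
        return new[] { before, after };
    }
}
```

The object survives only because a local variable still references it, which is exactly the "root reference" condition described above.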

The GC runs automatically when the size of objects in a generation reaches a generation-specific threshold, and in a couple of other situations. To be precise, it runs when:

Gen 0 hits ~256 KB.
Gen 1 hits ~2 MB (at which point the GC collects Gen 1 and 0).
Gen 2 hits ~10 MB (at which point the GC collects Gen 2, 1 and 0).
GC.Collect() is called in code.
The OS sends a low memory notification.

Objects larger than 85 KB are allocated onto the Large Object Heap (LOH). Unlike the SOH, the LOH isn't compacted by default, because of the overhead of copying large chunks of memory. When a full (Gen 2) GC takes place, the address ranges of any LOH objects not in use are recorded in a "free space" allocation table. Free chunks smaller than 85 KB can never be reused, as objects of that size never make it onto the LOH. In fact, for performance reasons, .NET preferentially allocates large objects at the end of the heap (i.e. after the last allocated object).

Normally, you would probably expect that an array of doubles would only be allocated onto the LOH when it reached a length of about 10,600. However, for performance reasons, arrays of doubles with 999 elements or fewer are allocated onto the SOH, and arrays with 1,000 elements or more go onto the LOH.

LOH threshold = 85,000 B
Double = 8 B => 85,000 / 8 = 10,625 ~ 10,600
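The same arithmetic can be checked in code. This sketch assumes the documented LOH threshold of 85,000 bytes and relies on the runtime reporting LOH objects as generation 2; LohDemo and its members are made-up names.

```csharp
using System;

public static class LohDemo
{
    public const int LohThresholdBytes = 85000;   // assumed threshold: 85,000 bytes, not 85 * 1024

    // Number of 8-byte doubles that fit under the threshold, ignoring the array header.
    public static int DoublesAtThreshold()
    {
        return LohThresholdBytes / sizeof(double);
    }

    // LOH objects are reported as the highest generation, so this returns true
    // when the runtime has placed a byte array of the given length on the LOH.
    public static bool IsOnLoh(int byteCount)
    {
        return GC.GetGeneration(new byte[byteCount]) == GC.MaxGeneration;
    }
}
```

LohDemo.DoublesAtThreshold() gives 10,625, which is where the "about 10,600" figure comes from.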

The GC has two modes: Workstation mode, which is tuned to give maximum UI responsiveness, and Server mode, which is tuned to give maximum request throughput.

Weak object references allow you to keep hold of objects, but still allow them to be collected if the GC needs to. The ideal candidate for a weak reference is easy to recalculate and takes up a substantial amount of memory.
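As a rough illustration, a weak reference can back a simple cache. ReportCache and its one-megabyte buffer are hypothetical; the point is only that the cached data can be rebuilt if the GC reclaims it.

```csharp
using System;

public sealed class ReportCache
{
    // Holds the buffer weakly, so the cache alone never keeps it alive.
    private WeakReference<byte[]> _cached;

    public int BuildCount { get; private set; }

    public byte[] GetReport()
    {
        // Reuse the buffer if the GC has not collected it yet.
        if (_cached != null && _cached.TryGetTarget(out byte[] report))
            return report;

        report = Build();
        _cached = new WeakReference<byte[]>(report);
        return report;
    }

    private byte[] Build()
    {
        BuildCount++;                  // stands in for an expensive calculation
        return new byte[1000000];
    }
}
```

While a caller still holds the returned array, a second GetReport() call returns the same instance; once nobody holds it, the GC is free to collect it and the next call rebuilds it.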

Use the IDisposable interface with the Dispose pattern to release unmanaged resources, and suppress finalization in the Dispose method if no other finalization is required.

Use the using statement to define the scope of a disposable object, unless doing so could cause an exception.
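A minimal sketch of the Dispose pattern and the using statement together. NativeBuffer is a made-up class that owns a block of unmanaged memory; it is one common shape of the pattern, not the only one.

```csharp
using System;
using System.Runtime.InteropServices;

public class NativeBuffer : IDisposable
{
    private IntPtr _handle;            // unmanaged memory owned by this instance

    public bool IsDisposed { get; private set; }

    public NativeBuffer(int bytes)
    {
        _handle = Marshal.AllocHGlobal(bytes);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);     // cleanup is done, so the finalizer can be skipped
    }

    protected virtual void Dispose(bool disposing)
    {
        if (IsDisposed) return;        // safe to call more than once
        // 'disposing' is true when called from Dispose(); only then would it be
        // safe to touch other managed objects. This class has none to release.
        Marshal.FreeHGlobal(_handle);
        _handle = IntPtr.Zero;
        IsDisposed = true;
    }

    ~NativeBuffer()                    // safety net if Dispose() was never called
    {
        Dispose(false);
    }
}

public static class DisposeDemo
{
    public static bool Run()
    {
        var buffer = new NativeBuffer(1024);
        using (buffer)
        {
            // work with the buffer here
        }
        return buffer.IsDisposed;      // 'using' called Dispose() on exit
    }
}
```

The using block guarantees that Dispose() runs even if the code inside it throws.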

Use StringBuilder when the number of string concatenations is unknown or contained within a loop.

Initialize the capacity of a StringBuilder to a reasonable value, if possible.
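For example, something like the following. CsvDemo is a made-up helper, and the capacity is just a rough guess of about four characters per entry.

```csharp
using System.Text;

public static class CsvDemo
{
    public static string JoinNumbers(int count)
    {
        // The capacity is an estimate, not a hard limit; the builder grows if needed.
        var sb = new StringBuilder(count * 4);
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);              // separate Append calls, no concatenation inside them
        }
        return sb.ToString();
    }
}
```

CsvDemo.JoinNumbers(4) returns "0,1,2,3" while building only one final string.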

Use a struct to represent any immutable, single value whose instance size is 16 bytes or smaller and which will be infrequently boxed.

Override the GetHashCode and Equals methods of a struct.

Mark all fields in a struct as readonly.
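Putting the three struct tips together might look like this. Money is a hypothetical type; its fields add up to 12 bytes, which I assume pads to 16.

```csharp
using System;

public struct Money : IEquatable<Money>
{
    public readonly long Cents;        // 8 bytes
    public readonly int CurrencyCode;  // 4 bytes

    public Money(long cents, int currencyCode)
    {
        Cents = cents;
        CurrencyCode = currencyCode;
    }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(Money other)
    {
        return Cents == other.Cents && CurrencyCode == other.CurrencyCode;
    }

    public override bool Equals(object obj)
    {
        return obj is Money && Equals((Money)obj);
    }

    public override int GetHashCode()
    {
        return (Cents.GetHashCode() * 397) ^ CurrencyCode;
    }
}
```

Overriding Equals and GetHashCode avoids the slower default implementation that ValueType provides.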

Use the readonly keyword on class fields that should be immutable.

Use static methods when it is unnecessary to access or mutate state.

Remove event handlers before disposing of an object.
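The reason is that the publisher's event holds a reference to every subscriber. A small sketch, with Publisher and Subscriber as made-up classes:

```csharp
using System;

public class Publisher
{
    public event EventHandler Changed;

    public void Raise()
    {
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    public int SubscriberCount
    {
        get { return Changed == null ? 0 : Changed.GetInvocationList().Length; }
    }
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public int Received { get; private set; }

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Changed += OnChanged;   // the publisher now references this subscriber
    }

    private void OnChanged(object sender, EventArgs e)
    {
        Received++;
    }

    public void Dispose()
    {
        _publisher.Changed -= OnChanged;   // without this, the publisher keeps the subscriber alive
    }
}
```

If the publisher lives longer than the subscriber, forgetting the -= line is one of the few ways to get something that behaves like a memory leak in managed code.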

ref tells the compiler that the variable is initialised before entering the function, while out tells the compiler that the variable will be initialised inside the function.
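A short example of both keywords. RefOutDemo and its methods are made up for illustration.

```csharp
public static class RefOutDemo
{
    // 'ref': the caller must assign the variable before the call.
    public static void AddTen(ref int value)
    {
        value += 10;
    }

    // 'out': the method must assign the variable before it returns.
    public static bool TryHalve(int input, out int half)
    {
        half = input / 2;
        return input % 2 == 0;
    }

    public static int Run()
    {
        int total = 5;                       // must be initialised for 'ref'
        AddTen(ref total);                   // total is now 15

        int half;                            // no initial value needed for 'out'
        TryHalve(total - 1, out half);       // half is 7
        return total + half;                 // 22
    }
}
```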

Value types consist of simple types, enum types, struct types, and nullable types. Reference types include class types, interface types, array types, and delegate types.

Value types can be converted to reference types through a process known as boxing, and back into value types through unboxing. Boxing is sometimes necessary, but it should be avoided where possible, because it slows down performance and increases memory requirements. Boxing can be avoided by using parameterized classes and methods, which are implemented using generics; in fact, this was the motivation for adding generics.
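Here is a sketch comparing a non-generic collection, which boxes every int, with its generic counterpart. BoxingDemo is a made-up name; both methods compute the same sum.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int SumBoxed(int count)
    {
        var list = new ArrayList();                    // stores object references
        for (int i = 0; i < count; i++) list.Add(i);   // each int is boxed onto the heap
        int sum = 0;
        foreach (object o in list) sum += (int)o;      // each element is unboxed
        return sum;
    }

    public static int SumGeneric(int count)
    {
        var list = new List<int>();                    // stores the ints directly
        for (int i = 0; i < count; i++) list.Add(i);   // no boxing
        int sum = 0;
        foreach (int n in list) sum += n;
        return sum;
    }
}
```

The results are identical, but the ArrayList version allocates one extra heap object per element.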

Thanks to the GC, a true memory leak is rare in managed code.

When a Dispose method is available, call it. Task is an exception: although it implements IDisposable, you don't need to dispose it.

Since strings are immutable, methods of the string class return a new string rather than modifying the original string's memory. The value of a string is stored on the heap. This is important because the stack can only hold 1 MB by default, and it would not be very difficult to imagine a single string using all of that space.

As a good rule of thumb, you should probably never use a StringBuilder outside of a loop. Also don't do any string concatenation inside of the calls to the Append method.

Yield return has a couple of advantages. The most important, from a memory perspective, is that only one item at a time has to be in memory, which can be a dramatic improvement if only one object is used at a time.
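A small sketch of that laziness. YieldDemo is a made-up class, and the Produced counter exists only to show how many items were actually generated.

```csharp
using System.Collections.Generic;

public static class YieldDemo
{
    public static int Produced;   // how many items have been generated so far

    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Produced++;
            yield return i * i;   // hands back one item, then pauses until the next is requested
        }
    }

    public static int FirstSquareOver(int limit)
    {
        foreach (int square in Squares(10000))
        {
            if (square > limit) return square;   // stops early; the rest is never generated
        }
        return -1;
    }
}
```

Asking for the first square over 50 generates only eight items (1 through 64), even though the sequence could produce 10,000.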

If an object is created before a slow method call and won't be needed after it returns, release the object before calling the method. If the GC runs while the long-running method is executing, the object will be promoted to a later generation, and objects in later generations take longer to be collected.

In a debug build, assemblies are compiled with debug symbols, resulting in poorer performance and meaning that the GC will not work as effectively as in a release build. Essentially, the GC is less aggressive in reclaiming memory when debug symbols are included. Because a debugger could be attached, many of the rules for identifying unreachable references may not apply, and a lot more objects may be considered reachable.

LINQ uses the yield statement to avoid having to return all of the data until the IEnumerable<T> or IQueryable<T> is expanded. That's why IEnumerable is a better choice for bigger datasets when we want to iterate through them one by one: we can read items one at a time, and the whole list doesn't need to be loaded into memory.
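Deferred execution is easy to see with a small experiment. DeferredDemo is a made-up name; the query is defined once and only runs when something enumerates it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int[] Run()
    {
        var source = new List<int> { 1, 2, 3 };
        IEnumerable<int> evens = source.Where(n => n % 2 == 0);   // nothing is evaluated yet

        source.Add(4);                          // added after the query was defined
        int lazyCount = evens.Count();          // the query runs now and sees 2 and 4

        List<int> snapshot = evens.ToList();    // materialises the current results
        source.Add(6);

        // The snapshot keeps its 2 items; the deferred query now sees 2, 4 and 6.
        return new[] { lazyCount, snapshot.Count, evens.Count() };
    }
}
```

ToList() is the point where the whole result is pulled into memory, so it is worth calling only when you really need a materialised copy.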

A 32-bit application is capped (without going into deeper details) at 2 GB of Virtual Address Space, whereas a 64-bit application is capped at 8 TB. In 64-bit systems pointers and references have doubled in size and any given program compiled as a 64-bit application will use more memory than the 32-bit version of the same program.

It is important to remember that the allocated memory is virtual, which means it doesn't directly refer to physical memory. Instead, it refers to a virtual memory address that is translated by the CPU to a physical memory address on demand. The reason for using virtual addresses is to allow for a much more flexible memory space to be constructed; one which is far larger than what is actually available in physical RAM. To make this possible, a machine's hard drive is used as a kind of secondary memory store to increase the total amount of memory available to our applications, and we will now explore that virtual/physical relationship.

Physical memory is simply the amount of physical RAM installed on the machine. By definition, it has a fixed size that can only be increased by opening the back of the machine and slotting more inside. Any resource with a fixed limit causes a problem; once that limit is reached – bang – there is nowhere to go. Virtual memory was devised as a way around the problem: using a machine's hard drive as a kind of alternative memory store of last resort.

Each 32-bit Windows OS process has a maximum address space of 4 GB, which is calculated as 2^32. This is split between a private space of 2 GB for each process, and a space of 2 GB for the OS's dedicated use.

For 64-bit Windows processes, the available address space depends on the processor architecture. It would be theoretically possible for a 64-bit system to address up to 18 exabytes (2^64). However, in reality, current 64-bit processors use 44 bits for addressing virtual memory, allowing for 16 terabytes (TB) of memory, which is equally split between user mode (application processes) and kernel mode (OS processes and drivers). 64-bit Windows applications therefore have a private space of up to 8 TB.
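The figures above follow directly from the number of address bits, which is quick to verify. AddressSpaceDemo is a made-up helper, and it is only valid for fewer than 64 bits because of how the shift operator works on a 64-bit integer.

```csharp
public static class AddressSpaceDemo
{
    private const ulong GB = 1UL << 30;
    private const ulong TB = 1UL << 40;

    // Addressable bytes for a given number of address bits (bits must be below 64).
    public static ulong AddressableBytes(int bits)
    {
        return 1UL << bits;
    }

    public static ulong AddressableGB(int bits)
    {
        return AddressableBytes(bits) / GB;
    }

    public static ulong AddressableTB(int bits)
    {
        return AddressableBytes(bits) / TB;
    }
}
```

AddressableGB(32) gives 4 and AddressableTB(44) gives 16; halving each for the user-mode share gives the 2 GB and 8 TB limits mentioned above.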