CLR via C# 3rd - 05 - Primitive, Reference, and Value Types

1. Primitive Types

Any data types the compiler directly supports are called primitive types.

Primitive types map directly to types existing in the Framework Class Library (FCL).

For the types that are compliant with the Common Language Specification (CLS), other languages will offer similar primitive types. However, languages aren’t required to offer any support for the non–CLS-compliant types.

Primitives with Corresponding FCL Types

Another way to think of this is that the C# compiler automatically assumes that you have the following using directives in all of your source code files.

using sbyte = System.SByte;

using byte = System.Byte;

using short = System.Int16;

using ushort = System.UInt16;

using int = System.Int32;

using uint = System.UInt32;

...
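As a small sketch of what this mapping means in practice, the following program checks that a C# keyword and its FCL type name denote the same type. The class and method names (AliasDemo, IntIsInt32, UShortIsUInt16) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class AliasDemo {
    // True when the C# keyword and the FCL name denote the same type.
    public static Boolean IntIsInt32() { return typeof(int) == typeof(Int32); }
    public static Boolean UShortIsUInt16() { return typeof(ushort) == typeof(UInt16); }

    public static void Main() {
        int a = 0;                // declared with the C# keyword
        Int32 b = new Int32();    // declared with the FCL type name; also 0
        Console.WriteLine(a.GetType().FullName);            // System.Int32
        Console.WriteLine(a == b);                          // True
        Console.WriteLine(IntIsInt32() && UShortIsUInt16()); // True
    }
}
```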

About the compiler:

First, the compiler is able to perform implicit or explicit casts between primitive types. C# allows implicit casts if the conversion is “safe,” that is, no loss of data is possible. C# requires explicit casts if the conversion is potentially unsafe. For numeric types, “unsafe” means that you could lose precision or magnitude as a result of the conversion.

Be aware that different compilers can generate different code to handle these cast operations. For example, when casting a Single with a value of 6.8 to an Int32, some compilers could generate code to put a 6 in the Int32, and others could perform the cast by rounding the result up to 7. By the way, C# always truncates the result.
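The cast rules above can be sketched as follows. The names CastDemo, Widen, and Narrow are made up for this example; the conversions themselves are ordinary C# casts.

```csharp
using System;

public static class CastDemo {
    // Implicit: every Int32 value fits in an Int64, so no cast syntax is needed.
    public static Int64 Widen(Int32 value) { return value; }

    // Explicit: a Single may hold a fraction or a magnitude that an Int32 cannot.
    public static Int32 Narrow(Single value) { return (Int32) value; }

    public static void Main() {
        Console.WriteLine(Widen(5));       // 5
        Console.WriteLine(Narrow(6.8f));   // 6 (truncated, not rounded)
        Console.WriteLine(Narrow(-6.8f));  // -6 (truncation is toward zero)
    }
}
```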

In addition to casting, primitive types can be written as literals.

If you have an expression consisting of literals, the compiler is able to evaluate the expression at compile time, improving the application’s performance.

2. Checked and Unchecked Primitive Type Operations

The CLR offers IL instructions that allow the compiler to choose the desired behavior. The CLR has an instruction called add that adds two values together. The add instruction performs no overflow checking. The CLR also has an instruction called add.ovf that also adds two values together. However, add.ovf throws a System.OverflowException if an overflow occurs. In addition to these two IL instructions for the add operation, the CLR also has similar IL instructions for subtraction (sub/sub.ovf), multiplication (mul/mul.ovf), and data conversions (conv/conv.ovf).

One way to get the C# compiler to control overflows is to use the /checked+ compiler switch. This switch tells the compiler to generate code that has the overflow-checking versions of the add, subtract, multiply, and conversion IL instructions. The code executes a little slower because the CLR is checking these operations to determine whether an overflow occurred. If an overflow occurs, the CLR throws an OverflowException.

In addition to having overflow checking turned on or off globally, programmers can control overflow checking in specific regions of their code. C# allows this flexibility by offering checked and unchecked operators.

e.g.

     UInt32 invalid = unchecked((UInt32) (-1)); // OK

     Byte b = 100;

     b = checked((Byte) (b + 200)); // OverflowException is thrown

     b = (Byte) checked(b + 200); // b contains 44; no OverflowException (the cast is outside checked)

In addition to the checked and unchecked operators, C# also offers checked and unchecked statements. The statements cause all expressions within a block to be checked or unchecked.

e.g.

checked { // Start of checked block

Byte b = 100;

b = (Byte) (b + 200); // This expression is checked for overflow.

}

In fact, if you use a checked statement block, you can now use the += operator with the Byte, which simplifies the code a bit:

e.g.

checked { // Start of checked block

Byte b = 100;

b += 200; // This expression is checked for overflow.

}
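The checked and unchecked forms above can be exercised in a small runnable program. This is only a sketch; OverflowDemo, AddChecked, and AddUnchecked are hypothetical names chosen for the illustration.

```csharp
using System;

public static class OverflowDemo {
    // Adds in a checked context; throws OverflowException when the sum
    // does not fit in a Byte.
    public static Byte AddChecked(Byte a, Byte b) {
        checked { return (Byte) (a + b); }
    }

    // Adds in an unchecked context; the result wraps around modulo 256.
    public static Byte AddUnchecked(Byte a, Byte b) {
        unchecked { return (Byte) (a + b); }
    }

    public static void Main() {
        Console.WriteLine(AddUnchecked(100, 200)); // 44 (300 - 256)
        try {
            AddChecked(100, 200);
        }
        catch (OverflowException) {
            Console.WriteLine("OverflowException");
        }
    }
}
```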

Important: Because the only effect that the checked operator and statement have is to determine which versions of the add, subtract, multiply, and data conversion IL instructions are produced, calling a method within a checked operator or statement has no impact on that method, as the following code demonstrates:

checked {

     // Assume SomeMethod tries to load 400 into a Byte.

     SomeMethod(400);

     // SomeMethod might or might not throw an OverflowException.

     // It would if SomeMethod were compiled with checked instructions.

}
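To make this concrete, here is a sketch in which a stand-in for SomeMethod pins its own behavior with an explicit unchecked operator (an assumption made so the result does not depend on compiler switches). The caller's checked block does not change what the method does.

```csharp
using System;

public static class CheckedCallDemo {
    // Hypothetical stand-in for SomeMethod. The explicit unchecked operator
    // fixes its behavior regardless of compiler switches.
    public static Byte SomeMethod(Int32 value) {
        return unchecked((Byte) value);
    }

    public static Byte CallInsideChecked() {
        checked {
            // The checked block affects only the IL emitted for this block,
            // not the IL already emitted for SomeMethod's body.
            return SomeMethod(400);
        }
    }

    public static void Main() {
        Console.WriteLine(CallInsideChecked()); // 144 (400 - 256); no exception
    }
}
```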

Some recommended rules to programmers

(1) Use signed data types (such as Int32 and Int64) instead of unsigned numeric types (such as UInt32 and UInt64) wherever possible.

(2) As you write your code, explicitly use checked around blocks where an unwanted overflow might occur due to invalid input data.

(3) As you write your code, explicitly use unchecked around blocks where an overflow is OK, such as calculating a checksum.

(4) For any code that doesn’t use checked or unchecked, the assumption is that you do want an exception to occur on overflow.

Important: The System.Decimal type is a very special type. Although many programming languages (C# and Visual Basic included) consider Decimal a primitive type, the CLR does not. This means that the CLR doesn’t have IL instructions that know how to manipulate a Decimal value. If you look up the Decimal type in the .NET Framework SDK documentation, you’ll see that it has public static methods called Add, Subtract, Multiply, Divide, and so on. In addition, the Decimal type provides operator overload methods for +, -, *, /, and so on.

When you compile code that uses Decimal values, the compiler generates code to call Decimal’s members to perform the actual operation. This means that manipulating Decimal values is slower than manipulating CLR primitive values. Also, because there are no IL instructions for manipulating Decimal values, the checked and unchecked operators, statements, and compiler switches have no effect. Operations on Decimal values always throw an OverflowException if the operation can’t be performed safely.

Similarly, the System.Numerics.BigInteger type is also special in that it internally uses an array of UInt32s to represent an arbitrarily large integer whose value has no upper or lower bound. Therefore, operations on a BigInteger never result in an OverflowException. However, a BigInteger operation may throw an OutOfMemoryException if the value gets too large and there is insufficient available memory to resize the array.
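A brief sketch of both special cases follows. The helper names (SpecialTypesDemo, DecimalOverflowThrows, TwoToThe200) are hypothetical. The Decimal value is passed as a parameter so that the compiler cannot evaluate the addition at compile time.

```csharp
using System;
using System.Numerics;

public static class SpecialTypesDemo {
    // Returns true if adding one to the given Decimal throws, even though
    // the addition is wrapped in the unchecked operator.
    public static Boolean DecimalOverflowThrows(Decimal value) {
        try {
            Decimal result = unchecked(value + 1m);
            return false;
        }
        catch (OverflowException) {
            return true;
        }
    }

    // 2^200 is far beyond UInt64.MaxValue, yet no overflow exception occurs.
    public static BigInteger TwoToThe200() {
        return BigInteger.Pow(2, 200);
    }

    public static void Main() {
        Console.WriteLine(DecimalOverflowThrows(Decimal.MaxValue)); // True
        Console.WriteLine(DecimalOverflowThrows(1m));               // False
        Console.WriteLine(TwoToThe200() > UInt64.MaxValue);         // True
    }
}
```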

3. Reference Types and Value Types

The CLR supports two kinds of types: reference types and value types.

In C#, types declared using struct are value types, and types declared using class are reference types.

Value type instances are usually allocated on a thread’s stack (although they can also be embedded as a field in a reference type object). The variable representing the instance doesn’t contain a pointer to an instance; the variable contains the fields of the instance itself.

Reference types are always allocated from the managed heap, and the C# new operator returns the memory address of the object—the memory address refers to the object’s bits.

All of the structures are immediately derived from the System.ValueType abstract type. System.ValueType is itself immediately derived from the System.Object type. By definition, all value types must be derived from System.ValueType. All enumerations are derived from the System.Enum abstract type, which is itself derived from System.ValueType. The CLR and all programming languages give enumerations special treatment.

In addition, all value types are sealed, which prevents a value type from being used as a base type for any other reference type or value type.
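The type relationships described above can be inspected with reflection. The sketch below assumes three hypothetical types (SampleStruct, SampleClass, SampleColor) declared purely for illustration.

```csharp
using System;

public struct SampleStruct { public Int32 X, Y; }       // hypothetical value type
public sealed class SampleClass { public Int32 X, Y; }  // hypothetical reference type
public enum SampleColor { Red, Green }                  // hypothetical enumeration

public static class TypeKindDemo {
    public static void Main() {
        Console.WriteLine(typeof(SampleStruct).IsValueType);                   // True
        Console.WriteLine(typeof(SampleStruct).BaseType == typeof(ValueType)); // True
        Console.WriteLine(typeof(SampleStruct).IsSealed);                      // True
        Console.WriteLine(typeof(SampleColor).BaseType == typeof(Enum));       // True
        Console.WriteLine(typeof(Enum).BaseType == typeof(ValueType));         // True
        Console.WriteLine(typeof(ValueType).BaseType == typeof(Object));       // True
        Console.WriteLine(typeof(SampleClass).IsValueType);                    // False
    }
}
```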

Important For many developers (such as unmanaged C/C++ developers), reference types and value types will seem strange at first. In unmanaged C/C++, you declare a type, and then the code that uses the type gets to decide if an instance of the type should be allocated on the thread’s stack or in the application’s heap.

In managed code, the developer defining the type indicates where instances of the type are allocated; the developer using the type has no control over this.

How the CLR Controls the Layout of a Type's Fields

To improve performance, the CLR is capable of arranging the fields of a type any way it chooses.

You tell the CLR what to do by applying the System.Runtime.InteropServices.StructLayoutAttribute attribute on the class or structure you’re defining. To this attribute’s constructor, you can pass LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to have the CLR preserve your field layout, or LayoutKind.Explicit to explicitly arrange the fields in memory by using offsets. If you don’t explicitly specify the StructLayoutAttribute on a type that you’re defining, your compiler selects whatever layout it determines is best.

You should be aware that Microsoft’s C# compiler selects LayoutKind.Auto for reference types (classes) and LayoutKind.Sequential for value types (structures).

The StructLayoutAttribute also allows you to explicitly indicate the offset of each field by passing LayoutKind.Explicit to its constructor. Then you apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute attribute to each field, passing to this attribute’s constructor an Int32 indicating the offset (in bytes) of the field’s first byte from the beginning of the instance. Explicit layout is typically used to simulate what would be a union in unmanaged C/C++ because you can have multiple fields starting at the same offset in memory.
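A minimal sketch of a union-like type built with explicit layout follows. SignedOrUnsigned and ReinterpretAsUnsigned are hypothetical names; both fields are the same size and start at offset 0, so the result does not depend on byte order.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union-like type: both fields start at offset 0,
// so they share the same four bytes.
[StructLayout(LayoutKind.Explicit)]
public struct SignedOrUnsigned {
    [FieldOffset(0)] public Int32 Signed;
    [FieldOffset(0)] public UInt32 Unsigned;
}

public static class LayoutDemo {
    public static UInt32 ReinterpretAsUnsigned(Int32 value) {
        SignedOrUnsigned u = new SignedOrUnsigned();
        u.Signed = value;     // write the bits through one field...
        return u.Unsigned;    // ...and read the same bits through the other
    }

    public static void Main() {
        Console.WriteLine(ReinterpretAsUnsigned(-1)); // 4294967295
        Console.WriteLine(ReinterpretAsUnsigned(7));  // 7
    }
}
```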

The Differences between Value Type and Reference Type:

(1) Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.

(2) Value types are derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects’ fields match. In addition, System.ValueType overrides the GetHashCode method to produce a hash code value by using an algorithm that takes into account the values in the object’s instance fields.

(3) Because you can’t define a new value type or a new reference type by using a value type as a base class, you shouldn’t introduce any new virtual methods into a value type. No methods can be abstract, and all methods are implicitly sealed (can’t be overridden).

(4) Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn’t currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException to be thrown. By contrast, value type variables always contain a value of the underlying type, and all members of the value type are initialized to 0. Since a value type variable isn’t a pointer, it’s not possible to generate a NullReferenceException when accessing a value type. The CLR does offer a special feature, called nullable types, that adds the notion of nullability to a value type.

(5) When you assign a value type variable to another value type variable, a field-by-field copy is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.

(6) Two or more reference type variables can refer to a single object in the heap, allowing operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables are distinct objects, and it’s not possible for operations on one value type variable to affect another.

(7) Because unboxed value types aren’t allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This means that a value type instance doesn’t receive a notification (via a Finalize method) when its memory is reclaimed.
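The copy semantics in points (5) and (6) can be sketched like this. ValPoint and RefPoint are hypothetical types declared only to contrast the two behaviors.

```csharp
using System;

public struct ValPoint { public Int32 X; }        // hypothetical value type
public sealed class RefPoint { public Int32 X; }  // hypothetical reference type

public static class CopyDemo {
    public static Int32 ValueCopy() {
        ValPoint v1 = new ValPoint();  // fields start at 0
        v1.X = 5;
        ValPoint v2 = v1;              // field-by-field copy
        v2.X = 8;                      // changes v2 only
        return v1.X;
    }

    public static Int32 ReferenceCopy() {
        RefPoint r1 = new RefPoint();
        r1.X = 5;
        RefPoint r2 = r1;              // copies the address only
        r2.X = 8;                      // changes the one shared object
        return r1.X;
    }

    public static void Main() {
        Console.WriteLine(ValueCopy());      // 5
        Console.WriteLine(ReferenceCopy());  // 8
    }
}
```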

4. Boxing and Unboxing Value Types

It’s possible to convert a value type to a reference type by using a mechanism called boxing.

Internally, here’s what happens when an instance of a value type is boxed:

1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.

2. The value type’s fields are copied to the newly allocated heap memory.

3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.

Converting a boxed instance (a reference type) back to a value type takes two steps:

First, the address of the value type's fields in the boxed value type's object is obtained. This process is called unboxing.

Then, the values of these fields are copied from the heap to the stack-based value type instance.

Unboxing is not the exact opposite of boxing. The unboxing operation is much less costly than boxing.

Unboxing is really just the operation of obtaining a pointer to the raw value type (data fields) contained within an object. In effect, the pointer refers to the unboxed portion in the boxed instance. So, unlike boxing, unboxing doesn’t involve the copying of any bytes in memory. Having made this important clarification, it is important to note that an unboxing operation is typically followed by copying the fields.
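The following sketch illustrates that boxing copies the value into a separate heap object and that a cast back unboxes and then copies the value out. BoxingDemo and its method names are hypothetical.

```csharp
using System;

public static class BoxingDemo {
    public static Int32 BoxThenChangeOriginal() {
        Int32 v = 5;
        Object o = v;        // boxing: v's value is copied into a new heap object
        v = 123;             // changes the local variable, not the boxed copy
        return (Int32) o;    // unboxing, followed by a copy of the value
    }

    public static Boolean TwoBoxesAreSameObject() {
        Int32 v = 5;
        Object a = v;        // first boxed copy
        Object b = v;        // second, independent boxed copy
        return Object.ReferenceEquals(a, b);
    }

    public static void Main() {
        Console.WriteLine(BoxThenChangeOriginal());  // 5
        Console.WriteLine(TwoBoxesAreSameObject());  // False
    }
}
```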

Unboxed value types are lighter-weight types than reference types for two reasons:

(1) They are not allocated on the managed heap.

(2) They don’t have the additional overhead members that every object on the heap has: a type object pointer and a sync block index.

Because unboxed value types don’t have a sync block index, you can’t have multiple threads synchronize their access to the instance by using the methods of the System.Threading.Monitor type.

5. Changing Fields in a Boxed Value Type by Using Interfaces

6. Object Equality and Identity

The System.Object type offers a virtual method named Equals, whose purpose is to return true if two objects contain the same value. The implementation of Object’s Equals method looks like this:

public class Object {

     public virtual Boolean Equals(Object obj) {

          // If both references point to the same object,

          // they must have the same value.

          if (this == obj) return true;

          // Assume that the objects do not have the same value.

          return false;

     }

}

At first, this seems like a reasonable default implementation of Equals: it returns true if the this and obj arguments refer to the same exact object. This seems reasonable because Equals knows that an object must have the same value as itself. However, if the arguments refer to different objects, Equals can’t be certain if the objects contain the same values, and therefore, false is returned. In other words, the default implementation of Object’s Equals method really implements identity, not value equality.

Here is how to properly implement an Equals method internally:

1. If the obj argument is null, return false because the current object identified by this is obviously not null when the nonstatic Equals method is called.

2. If the this and obj arguments refer to the same object, return true. This step can improve performance when comparing objects with many fields.

3. If the this and obj arguments refer to objects of different types, return false. Obviously, checking if a String object is equal to a FileStream object should result in a false result.

4. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.

5. Call the base class’s Equals method so it can compare any fields defined by it. If the base class’s Equals method returns false, return false; otherwise, return true.

So Microsoft should have implemented Object’s Equals like this:

e.g.

public class Object {

     public virtual Boolean Equals(Object obj) {

          // The given object to compare to can't be null

          if (obj == null) return false;

          // If objects are different types, they can't be equal.

          if (this.GetType() != obj.GetType()) return false;

          // If objects are same type, return true if all of their fields match

          // Since System.Object defines no fields, the fields match

          return true;

     }

}

But, since Microsoft didn’t implement Equals this way, the rules for how to implement Equals are significantly more complicated than you would think. When a type overrides Equals, the override should call its base class’s implementation of Equals unless it would be calling Object’s implementation. This also means that since a type can override Object’s Equals method, this Equals method can no longer be called to test for identity. To fix this, Object offers a static ReferenceEquals method, which is implemented like this:

public class Object {

     public static Boolean ReferenceEquals(Object objA, Object objB) {

          return (objA == objB);

     }

}

You should always call ReferenceEquals if you want to check for identity (if two references point to the same object). You shouldn’t use the C# == operator (unless you cast both operands to Object first) because one of the operands’ types could overload the == operator, giving it semantics other than identity.
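As an illustration, String overloads the == operator to compare text, so == and ReferenceEquals can disagree. The sketch below uses hypothetical helper names; the second string is built from a character array so that it is a distinct object with the same text.

```csharp
using System;

public static class IdentityDemo {
    // Builds a new String object holding the same characters as the input.
    public static String CopyOf(String s) {
        return new String(s.ToCharArray());
    }

    public static void Main() {
        String s1 = "hello";
        String s2 = CopyOf(s1);

        Console.WriteLine(s1 == s2);                        // True: String overloads ==
        Console.WriteLine((Object) s1 == (Object) s2);      // False: identity comparison
        Console.WriteLine(Object.ReferenceEquals(s1, s2));  // False
        Console.WriteLine(Object.ReferenceEquals(s1, s1));  // True
    }
}
```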

As you can see, the .NET Framework has a very confusing story when it comes to object equality and identity. By the way, System.ValueType (the base class of all value types) does override Object’s Equals method and is correctly implemented to perform a value equality check (not an identity check). Internally, ValueType’s Equals is implemented this way:

1. If the obj argument is null, return false.

2. If the this and obj arguments refer to objects of different types, return false.

3. For each instance field defined by the type, compare the value in the this object with the value in the obj object by calling the field’s Equals method. If any fields are not equal, return false.

4. Return true. Object’s Equals method is not called by ValueType’s Equals method.

Internally, ValueType’s Equals method uses reflection in step #3.

The four properties of equality

.. Equals must be reflexive; that is, x.Equals(x) must return true.

.. Equals must be symmetric; that is, x.Equals(y) must return the same value as y.Equals(x).

.. Equals must be transitive; that is, if x.Equals(y) returns true and y.Equals(z) returns true, then x.Equals(z) must also return true.

.. Equals must be consistent. Provided that there are no changes in the two values being compared, Equals should consistently return true or false.

When overriding the Equals method, there are a few more things that you’ll probably want to do:

.. Have the type implement the System.IEquatable<T> interface’s Equals method.

This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.

.. Overload the == and != operator methods.

Usually, you’ll implement these operator methods to internally call the type-safe Equals method.
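One way these pieces can fit together is sketched below. The Money type and its fields are hypothetical. Because the class is sealed and derives directly from Object, the type check reduces to an "as" cast and the base Equals is not called.

```csharp
using System;

// Hypothetical type used only for illustration.
public sealed class Money : IEquatable<Money> {
    private readonly Int64 m_cents;
    private readonly String m_currency;

    public Money(Int64 cents, String currency) {
        m_cents = cents;
        m_currency = currency;
    }

    // Type-safe Equals: the real comparison lives here.
    public Boolean Equals(Money other) {
        if (ReferenceEquals(other, null)) return false;  // null is never equal
        if (ReferenceEquals(this, other)) return true;   // same object
        return m_cents == other.m_cents
            && String.Equals(m_currency, other.m_currency);
    }

    // The Object overload forwards to the type-safe version.
    public override Boolean Equals(Object obj) {
        return Equals(obj as Money);
    }

    // Kept consistent with Equals so that equal objects hash alike.
    public override Int32 GetHashCode() {
        return m_cents.GetHashCode()
            ^ (m_currency == null ? 0 : m_currency.GetHashCode());
    }

    public static Boolean operator ==(Money a, Money b) {
        if (ReferenceEquals(a, null)) return ReferenceEquals(b, null);
        return a.Equals(b);
    }

    public static Boolean operator !=(Money a, Money b) {
        return !(a == b);
    }
}

public static class EqualityDemo {
    public static void Main() {
        Money a = new Money(500, "USD");
        Money b = new Money(500, "USD");
        Console.WriteLine(a.Equals(b));                  // True
        Console.WriteLine(a == b);                       // True
        Console.WriteLine(Object.ReferenceEquals(a, b)); // False
        Console.WriteLine(a != new Money(501, "USD"));   // True
    }
}
```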

7. Object Hash Codes

The designers of the FCL decided that it would be incredibly useful if any instance of any object could be placed into a hash table collection. To this end, System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects.

If you define a type and override the Equals method, you should also override the GetHashCode method. In fact, Microsoft’s C# compiler emits a warning if you define a type that overrides Equals without also overriding GetHashCode.

The reason why a type that defines Equals must also define GetHashCode is that the implementation of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal must have the same hash code value. So if you override Equals, you should override GetHashCode to ensure that the algorithm you use for calculating equality corresponds to the algorithm you use for calculating the object’s hash code.

Defining a GetHashCode method can be easy and straightforward. But depending on your data types and the distribution of data, it can be tricky to come up with a hashing algorithm that returns a well-distributed range of values. Here’s a simple example that will probably work just fine for Point objects:

internal sealed class Point {

     private readonly Int32 m_x, m_y;

     public override Int32 GetHashCode() {

          return m_x ^ m_y; // m_x XOR'd with m_y

     }

     ...

}

When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:

.. Use an algorithm that gives a good random distribution for the best performance of the hash table.

.. Your algorithm can also call the base type’s GetHashCode method, including its return value. However, you don’t generally want to call Object’s or ValueType’s GetHashCode method, because the implementation in either method doesn’t lend itself to high-performance hashing algorithms.

.. Your algorithm should use at least one instance field.

.. Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object’s lifetime.

.. Your algorithm should execute as quickly as possible.

.. Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
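To show why Equals and GetHashCode must agree, the sketch below uses a hypothetical GridPoint type (a completed variant of a Point-like class, not the original author's code) as a Dictionary key. A second, equal instance finds the entry stored under the first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Point-like type; the readonly fields keep the hash code
// stable for the lifetime of the object.
public sealed class GridPoint {
    private readonly Int32 m_x, m_y;

    public GridPoint(Int32 x, Int32 y) { m_x = x; m_y = y; }

    public override Boolean Equals(Object obj) {
        GridPoint other = obj as GridPoint;
        return other != null && m_x == other.m_x && m_y == other.m_y;
    }

    public override Int32 GetHashCode() {
        return m_x ^ m_y;   // uses the same fields that Equals compares
    }
}

public static class HashDemo {
    public static String Lookup() {
        Dictionary<GridPoint, String> names = new Dictionary<GridPoint, String>();
        names[new GridPoint(1, 2)] = "A";
        // A different but equal key hashes the same way, and Equals
        // then confirms the match.
        return names[new GridPoint(1, 2)];
    }

    public static void Main() {
        Console.WriteLine(Lookup()); // A
    }
}
```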

System.Object’s implementation of the GetHashCode method doesn’t know anything about its derived type and any fields that are in the type. For this reason, Object’s GetHashCode method returns a number that is guaranteed to uniquely identify the object within the AppDomain; this number is guaranteed not to change for the lifetime of the object. After the object is garbage collected, however, its unique number can be reused as the hash code for a new object.

Note If a type overrides Object’s GetHashCode method, you can no longer call it to get a unique ID for the object. If you want to get a unique ID (within an AppDomain) for an object, the FCL provides a method that you can call. In the System.Runtime.CompilerServices namespace, see the RuntimeHelpers class’s public, static GetHashCode method that takes a reference to an Object as an argument. RuntimeHelpers’ GetHashCode method returns a unique ID for an object even if the object’s type overrides Object’s GetHashCode method. This method got its name because of its heritage, but it would have been better if Microsoft had named it something like GetUniqueObjectID.
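A short sketch of calling RuntimeHelpers.GetHashCode on an object whose type overrides GetHashCode follows. ConstantHash and ObjectIdDemo are hypothetical names. The example only relies on the value staying the same between calls for one object; the actual number varies from run to run, so no output is shown for it.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type whose override makes every instance hash to 42.
public sealed class ConstantHash {
    public override Int32 GetHashCode() { return 42; }
}

public static class ObjectIdDemo {
    // The value RuntimeHelpers returns for an object does not change between calls.
    public static Boolean IdIsStable(Object o) {
        return RuntimeHelpers.GetHashCode(o) == RuntimeHelpers.GetHashCode(o);
    }

    public static void Main() {
        ConstantHash c = new ConstantHash();
        Console.WriteLine(c.GetHashCode());                // 42 (the override)
        Console.WriteLine(RuntimeHelpers.GetHashCode(c));  // bypasses the override
        Console.WriteLine(IdIsStable(c));                  // True
    }
}
```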

System.ValueType’s implementation of GetHashCode uses reflection (which is slow) and XORs some of the type’s instance fields together. This is a naïve implementation that might be good for some value types, but I still recommend that you implement GetHashCode yourself because you’ll know exactly what it does, and your implementation will be faster than ValueType’s implementation.

8. Dynamic Primitive Type

Important Do not confuse dynamic and var. Declaring a local variable using var is just a syntactical shortcut that has the compiler infer the specific data type from an expression. The var keyword can be used only for declaring local variables inside a method while the dynamic keyword can be used for local variables, fields, and arguments. You cannot cast an expression to var but you can cast an expression to dynamic. You must explicitly initialize a variable declared using var while you do not have to initialize a variable declared with dynamic.

Important A dynamic expression is really the same type as System.Object. The compiler assumes that whatever operation you attempt on the expression is legal, so the compiler will not generate any warnings or errors. However, exceptions will be thrown at runtime if you attempt to execute an invalid operation. In addition, Visual Studio cannot offer any IntelliSense support to help you write code against a dynamic expression. You cannot define an extension method that extends dynamic, although you can define one that extends Object. And, you cannot pass a lambda expression or anonymous method as an argument to a dynamic method call since the compiler cannot infer the types being used.
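The difference between var and dynamic, and the run-time failure mode of dynamic, can be sketched as follows. DynamicDemo and its methods are hypothetical. The example assumes the project references the C# runtime binder (Microsoft.CSharp), which is what supplies RuntimeBinderException.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicDemo {
    // The Length member is looked up at run time against the actual object.
    public static Int32 LengthOf(dynamic value) {
        return value.Length;
    }

    // Compiles without warnings, but fails at run time because
    // Int32 has no member named NoSuchMethod.
    public static Boolean InvalidCallThrows() {
        dynamic d = 5;
        try {
            d.NoSuchMethod();
            return false;
        }
        catch (RuntimeBinderException) {
            return true;
        }
    }

    public static void Main() {
        var n = 5;                                        // inferred as Int32 at compile time
        Console.WriteLine(n.GetType() == typeof(Int32));  // True
        Console.WriteLine(LengthOf("hello"));             // 5
        Console.WriteLine(LengthOf(new Int32[3]));        // 3
        Console.WriteLine(InvalidCallThrows());           // True
    }
}
```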