Wednesday, June 17, 2009

High Performance Method to Copy C# KeyedCollection<,>, List<>, and Dictionary<,>

What do you do if you have a collection with 1,000,000 items in it and you want to make a quick copy?  C# says you should loop through the source collection and call .add() on the destination collection, but what if that is just waaaaaaaaaaaaay toooooooooooo sloooooooow?

You could write some unsafe code to get the job done.  What I did here was make some structures that contained all the instance variables of the aforementioned classes.  I then overlaid these structures on top of their classes so I can gain access to the private member variables.  Once inside a List<>, I just use Array.Copy to copy the internal array that makes up the List<>.  There are a couple more variables that have to be copied like the number of elements currently in use in the array, etc.  I do the same sort of thing with Dictionary<,> and KeyedCollection.  KeyedCollection actually calls the CopyList and CopyDict functions.

WARING!!!!!! If you don’t like unsafe code and hacking the C# host, then turn back now.

WARNING!!!!! This is for 32 bit only.  For 64 bit, we have to switch from int to Int64 when dealing with addresses.  I used a wacky way of getting around the C# casting complaints.  I used an integer pointer that I pointed to the stack.  By using array indexing [] on this pointer, I gained unchecked read/write access to the stack, which is where all the local variables live.   I can even change where managed objects point without C# complaining!!!  I don’t recommend changing them to point to just anything…  Hilarity will ensue.  Some reader out there might know a better way to do this.

WARNING!!!! This is for Windows Only.  Don’t try this on Mono.  Well you can, but expect to change the code a bit.

WARNING!!!! I tested this is for Net 3.5.  The layout of the instance variables of the classes might be different in 2.0 and 3.0.  Also, at any time, Redmond can release a patch and change the layout of these classes.  Hopefully they won’t, but don’t be surprised if they do. 

Warnings aside, if you have a performance problem related to copying collections, this will solve them.

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Collections.ObjectModel;
using System.Reflection;

namespace CopyCollection {

  // these are just some test classes that get copied in 
  // the main()
  class CFoo {
    public int Key;
    public string Name;
  }
  class MyKeyedCollection : KeyedCollection<int, CFoo> {
    public MyKeyedCollection() : base(null, 10) { }
    protected override int GetKeyForItem(CFoo foo) {
      return foo.Key;
    }
  }
  class MyObject {
    public MyKeyedCollection kc;
    // Copy constructor
    public MyObject(MyObject that) {
      this.kc = new MyKeyedCollection();
      if (that != null) {
        CollectionTools.CopyKeyedCollection<int, CFoo>(
         that.kc, this.kc);
      }
    }
  }
  class Program {

    static void Main(string[] args) {

      // here I create an object, containing a 
      // keyedcollection with 7 items
      MyObject mobj1 = new MyObject(null);
      for (int i = 0; i < 7; ++i)
        mobj1.kc.Add(
          new CFoo() { 
            Key = i, Name = i.ToString() 
          }
        );
      // quick copy it
      MyObject mobj2 = new MyObject(mobj1);
      // add a lot more items
      for (int i = 8; i < 712324; ++i)
        mobj2.kc.Add(
          new CFoo() { 
            Key = i, Name = i.ToString() 
          }
        );
      // quick copy this one.
      MyObject mobj3 = new MyObject(mobj2);
      // by looking in debugger, you can see that the first 
      // list has 7 items, the second has 712331 and so does 
      // the 3rd the first 7 items of all objects are the 
      // same the last 712324 items of the last 2 objects 
      // are the same!  hey this works!
    }
  }

  /* ---------------------------------------------------------
   * CollectionTools contains three functions that can be
   * used to copy a list, a dictionary, or a keyedcollection
   * ------------------------------------------------------ */
  public static class CollectionTools {

    public unsafe static KeyedCollection<TKey, TValue> 
     CopyKeyedCollection<TKey, TValue>(
     KeyedCollection<TKey, TValue> src, 
     KeyedCollection<TKey, TValue> dst) {

      object osrc = src;
      // TKeyedCollection* is pointer to a structure 
      // that is a template for the 
      // instance variables of A KeyedCollection<TKey, TValue>
      // What is going on here is that src is copied to osrc, 
      // which is on the stack exactly 4 bytes after psrc. 
      // hence the (int*)&psrc + 1.  osrc is a pointer to
      // the object on the heap, so we copy *osrc to &psrc
      // which causes our structure to overlay the class
      // if you change the order of these variables on the 
      // stack, then this trick wont work.
      TKeyedCollection* psrc = 
       (TKeyedCollection*)(*((int*)&psrc + 1));  
      object odst = dst;
      TKeyedCollection* pdst = 
       (TKeyedCollection*)(*((int*)&pdst + 1));
      object srcObj = null;  // i use i[2] to change this
      object dstObj = null;  // i use i[1] to change this
      int* i = (int*)&i;  // helps me find the stack

      // copy the internal list, by calling CopyList
      i[2] = (int)psrc->_01_items;  
      dstObj = CopyList<TValue>(srcObj as List<TValue>);
      pdst->_01_items = (uint)i[1];

      // copy the internal dictionary.  There is only a
      // dictionary once an internal count threshold has
      // been reached.  Sometimes there is no dictionary
      if (psrc->_04_dict != 0) {
        // copy the internal dictionary by calling CopyDict
        i[2] = (int)psrc->_04_dict;
        dstObj = CopyDict<TKey, TValue>(
         srcObj as Dictionary<TKey, TValue>);
        pdst->_04_dict = (uint)i[1];
      }

      // copy the rest of the values 
      pdst->_03_comparer = psrc->_03_comparer;
      pdst->_05_keyCount = psrc->_05_keyCount;
      pdst->_06_threshold = psrc->_06_threshold;
      return dst;
    }

    public unsafe static List<TValue> CopyList<TValue>(
     List<TValue> src) {

      // see previous comments for notes about these local
      // variables
      object osrc = src;
      TList* psrc = (TList*)(*((int*)&psrc + 1));  
      object srcArray = null;
      object dstArray = null;
      int* i = (int*)&i;  

      // a list has an array of items
      
      // I need c# to consider _01_items as an object,
      // so I use my handy dandy stack munger, i, to
      // set the srcArray and dstArray
      i[2] = (int)psrc->_01_items;  
      // its easy to create an empty array of the right
      // size using one of the List<> constructors
      int capacity = (srcArray as Array).Length;
      List<TValue> dst = new List<TValue>(capacity);
      // overlay a hack structure on the list we just
      // created, so we can copy stuff to it
      TList* pdst = (TList*)(*((int*)&pdst + 1));
      // now setup dstArray
      i[1] = (int)pdst->_01_items; 
      // now that srcArray and dstArray are set, we can  
      // let C# do the copying for us.  This saves us 
      // having to determine the size of each element
      Array.Copy(srcArray as Array, dstArray as Array, 
       capacity);

      // set the size, which is the # of used elements 
      // of the list
      pdst->_03_size = psrc->_03_size;

      return dst;
    }

    public unsafe static Dictionary<TKey, TValue> 
     CopyDict<TKey, TValue>(
     Dictionary<TKey, TValue> src) {

      // see previous comments for notes about these local
      // variables
      object osrc = src;
      TDictionary* psrc = 
       (TDictionary*)(*((int*)&psrc + 1)); 
      object srcArray = null;
      object dstArray = null;
      int* i = (int*)&i;  // helps me find the stack

      // a dictionary has a hash table called buckets

      // I need c# to consider _01_buckets as an object,
      // so I use my handy dandy stack munger, i, to
      // set the srcArray and dstArray
      i[2] = (int)psrc->_01_buckets;
      // create a dictionary the same size as the src 
      // dictionary
      int capacity = (srcArray as Array).Length;
      Dictionary<TKey, TValue> dst =
       new Dictionary<TKey, TValue>(capacity);
      // overlay the hack structure
      TDictionary* pdst = 
       (TDictionary*)(*((int*)&pdst + 1));
      // now setup dstArray
      i[1] = (int)pdst->_01_buckets;
      // now that srcArray and dstArray are set, we can  
      // let C# do the copying for us.  This saves us 
      // having to determine the size of each element
      Array.Copy(srcArray as Array, dstArray as Array,
       capacity);

      // a dictionary also has keeps the data stored 
      // in an array
      i[2] = (int)psrc->_02_entries;
      i[1] = (int)pdst->_02_entries;
      Array.Copy(srcArray as Array, dstArray as Array, 
       capacity);

      // copy the remaining simple variables
      pdst->_03_comparer = psrc->_03_comparer;
      pdst->_04_m_siInfo = psrc->_04_m_siInfo;
      pdst->_08_count = psrc->_08_count;
      pdst->_10_freeList = psrc->_10_freeList;
      pdst->_11_freeCount = psrc->_11_freeCount;

      return dst;
    }
    /* -------------------------------------------------------
     * These are the structs that map onto the classes which
     * I use to access the private member variables
     * Items marked with * I determined needed to be copied
     * -----------------------------------------------------*/
 
    struct TKeyedCollection {
      public uint _00_MethodInfo;   //
      // Collection
      public uint _01_items;      // * IList<T>
      public uint _02_syncRoot;   //   object
      // KeyedCollection
      public uint _03_comparer;   //   IEqualityComparer<TKey> 
      public uint _04_dict;       // * Dictionary<TKey, TItem> 
      public int _05_keyCount;    // *
      public int _06_threshold;   // *
    }

    struct TList {
      public uint _00_MethodInfo; //
      public uint _01_items;      // * T[] 
      public uint _02_syncRoot;   //   object
      public int _03_size;        // *
      public int _04_version;     //
    }

    struct TDictionary {
      public uint _00_MethodInfo; //
      public uint _01_buckets;    // * int[] 
      public uint _02_entries;    // * Entry<TKey, TValue>[] 
      public uint _03_comparer;   //   IEqualityComparer<TKey> 
      public uint _04_m_siInfo;   //   SerializationInfo
      public uint _05__syncRoot;  //   object 
      public uint _06_keys;       //   KeyCollection<,> 
      public uint _07_values;     //   ValueCollection<,> 
      public int _08_count;       // *
      public int _09_version;
      public int _10_freeList;    // * 
      public int _11_freeCount;   // *
    }

  }


}

This construct, “int* i = (int*)&i”, sets the pointer i (nice name for a pointer…) to point to itself.  “i” is located on the stack, so now I have a pointer to a spot on the stack.  Now if we use “i” as an array, we can access variables on the stack in ways that keep C# from complaining about casting.  You don’t get a “cannot cast a managed object to a void *” (or vice versa) error when you set an object this way: “i[2] = (int)psrc->_01_items”  “i[2]” means the int located 2 ints above “i” in the stack.  I assume everyone knows ints are 4 bytes.  I chose ints because almost everything I am messing with is a managed object, which is really a pointer, which is 4 bytes long.  Now if you look at the variables declared on the stack in front of “i”, just count up 2 ints.  This is the variable that is being set by “i[2]”.  Once I get better at this blogging thing I will draw a picture.

      object srcObj = null;  // i use i[2] to change this
      object dstObj = null;  // i use i[1] to change this
      int* i = (int*)&i;  // helps me find the stack

The same sort of thing is going on here

object osrc = src;
TDictionary* psrc = 
 (TDictionary*)(*((int*)&psrc + 1)); 

&psrc is on the stack, 4 bytes before it, “(int*)&psrc + 1”, is osrc.  The value of osrc, “*((int*)&psrc + 1)”, is the address of src in the heap.  Cast that as a TDictionary* and you have just overlaid a structure on a class.

The way I figured out what to copy was by using the RedGate .Net Relfector.  I just looked at the code for all the aforementioned collections to figure out what the instance variables were and then what was important and needed to be copied.

Unfortunately, the order of the instance variables in RedGate’s Reflector and in Type.GetFields() is not the order of the instance variables in memory.  To figure this out, I had to look at the objects in the memory dump of visual studio.  Techniques to figure out what random bytes in the heap actually are is a good topic for a future article.

Using the WIN32 API Functions CreateFile, ReadFile, WriteFile, SetFilePointer from C#

What if for some reason you need to use the WIN32 API to read and write files? I know FileStream can do a lot of what the WIN32 API can do, but this is a "what if".... a hypothetical situation, ok! (don't think for one second I did this because I thought I couldn't seek to a spot in a file and write some bytes...ok...not for one second did I not know about FileStream.Seek)

WinApiFile is my class that wraps the basic functionality of the 4 WIN32 API functions. Here is the class and a sample console app that uses the class to 1) read a file and close it and then 2) re-open the file and write a couple bytes in the middle of it. Not included is a little text file that needs to be located where the compiled executable is created. I just add a text file to the project and tell visual studio to copy it to the destination directory every time I run the app.

I got a little bit of the code, although its vastly changed now, from the MSDN website.

using System;
using Microsoft.Win32.SafeHandles;
using System.Runtime.InteropServices;
using System.ComponentModel;
using System.Windows.Forms;
using Path = System.IO.Path;

class Program {

static void Main() {

  string sTestFile = Path.Combine(Path.GetDirectoryName(
   Application.ExecutablePath), "Test.txt");

  // test reading a file
  WinApiFile File = new WinApiFile(sTestFile,
   WinApiFile.DesiredAccess.GENERIC_READ);
  byte[] aBuffer = new byte[1000];
  uint cbRead = File.Read(aBuffer, 1000);
  File.Close();

  // Test writing the file
  File.Open(WinApiFile.DesiredAccess.GENERIC_WRITE);
  // write the time to a 10 byte offset in the file
  // conver the date time to byte array
  int i = 0;
  foreach (var ch in DateTime.Now.ToString())
    aBuffer[i++] = (byte)ch;
  // now write it out
  File.MoveFilePointer(10);  // 10 byte offset
  File.Write(aBuffer, (uint)i);  // write the time
  File.Dispose();
}
}

public class WinApiFile : IDisposable {

/* ---------------------------------------------------------
 * private members
 * ------------------------------------------------------ */
private SafeFileHandle _hFile = null;
private string _sFileName = "";
private bool _fDisposed;

/* ---------------------------------------------------------
 * properties
 * ------------------------------------------------------ */
public bool IsOpen { get { return (_hFile != null); } }
public SafeFileHandle Handle { get { return _hFile; } }
public string FileName {
  get { return _sFileName; }
  set {
    _sFileName = (value ?? "").Trim();
    if (_sFileName.Length == 0)
      CloseHandle(_hFile);
  }
}
public int FileLength {
  get {
    return (_hFile != null) ? (int)GetFileSize(_hFile,
     IntPtr.Zero) : 0;
  }
  set {
    if (_hFile == null)
      return;
    MoveFilePointer(value, MoveMethod.FILE_BEGIN);
    if (!SetEndOfFile(_hFile))
      ThrowLastWin32Err();
  }
}

/* ---------------------------------------------------------
 * Constructors
 * ------------------------------------------------------ */

public WinApiFile(string sFileName,
 DesiredAccess fDesiredAccess) {
  FileName = sFileName;
  Open(fDesiredAccess);
}
public WinApiFile(string sFileName,
 DesiredAccess fDesiredAccess,
 CreationDisposition fCreationDisposition) {
  FileName = sFileName;
  Open(fDesiredAccess, fCreationDisposition);
}

/* ---------------------------------------------------------
 * Open/Close
 * ------------------------------------------------------ */

public void Open(
 DesiredAccess fDesiredAccess) {
  Open(fDesiredAccess, CreationDisposition.OPEN_EXISTING);
}

public void Open(
 DesiredAccess fDesiredAccess,
 CreationDisposition fCreationDisposition) {
  ShareMode fShareMode;
  if (fDesiredAccess == DesiredAccess.GENERIC_READ) {
    fShareMode = ShareMode.FILE_SHARE_READ;
  } else {
    fShareMode = ShareMode.FILE_SHARE_NONE;
  }
  Open(fDesiredAccess, fShareMode, fCreationDisposition, 0);
}

public void Open(
 DesiredAccess fDesiredAccess,
 ShareMode fShareMode,
 CreationDisposition fCreationDisposition,
 FlagsAndAttributes fFlagsAndAttributes) {

  if (_sFileName.Length == 0)
    throw new ArgumentNullException("FileName");
  _hFile = CreateFile(_sFileName, fDesiredAccess, fShareMode,
   IntPtr.Zero, fCreationDisposition, fFlagsAndAttributes,
   IntPtr.Zero);
  if (_hFile.IsInvalid) {
    _hFile = null;
    ThrowLastWin32Err();
  }
  _fDisposed = false;

}

public void Close() {
  if (_hFile == null)
    return;
  _hFile.Close();
  _hFile = null;
  _fDisposed = true;
}

/* ---------------------------------------------------------
 * Move file pointer
 * ------------------------------------------------------ */

public void MoveFilePointer(int cbToMove) {
  MoveFilePointer(cbToMove, MoveMethod.FILE_CURRENT);
}

public void MoveFilePointer(int cbToMove,
 MoveMethod fMoveMethod) {
  if (_hFile != null)
    if (SetFilePointer(_hFile, cbToMove, IntPtr.Zero,
     fMoveMethod) == INVALID_SET_FILE_POINTER)
      ThrowLastWin32Err();
}

public int FilePointer {
  get {
    return (_hFile != null) ? (int)SetFilePointer(_hFile, 0,
     IntPtr.Zero, MoveMethod.FILE_CURRENT) : 0;
  }
  set {
    MoveFilePointer(value);
  }
}

/* ---------------------------------------------------------
 * Read and Write
 * ------------------------------------------------------ */

public uint Read(byte[] buffer, uint cbToRead) {
  // returns bytes read
  uint cbThatWereRead = 0;
  if (!ReadFile(_hFile, buffer, cbToRead,
   ref cbThatWereRead, IntPtr.Zero))
    ThrowLastWin32Err();
  return cbThatWereRead;
}

public uint Write(byte[] buffer, uint cbToWrite) {
  // returns bytes read
  uint cbThatWereWritten = 0;
  if (!WriteFile(_hFile, buffer, cbToWrite,
   ref cbThatWereWritten, IntPtr.Zero))
    ThrowLastWin32Err();
  return cbThatWereWritten;
}

/* ---------------------------------------------------------
 * IDisposable Interface
 * ------------------------------------------------------ */
public void Dispose() {
  Dispose(true);
  GC.SuppressFinalize(this);
}

protected virtual void Dispose(bool fDisposing) {
  if (!_fDisposed) {
    if (fDisposing) {
      if (_hFile != null)
        _hFile.Dispose();
      _fDisposed = true;
    }
  }
}

~WinApiFile() {
  Dispose(false);
}

/* ---------------------------------------------------------
 * WINAPI STUFF
 * ------------------------------------------------------ */

private void ThrowLastWin32Err() {
  Marshal.ThrowExceptionForHR(
   Marshal.GetHRForLastWin32Error());
}

[Flags]
public enum DesiredAccess : uint {
  GENERIC_READ = 0x80000000,
  GENERIC_WRITE = 0x40000000
}
[Flags]
public enum ShareMode : uint {
  FILE_SHARE_NONE = 0x0,
  FILE_SHARE_READ = 0x1,
  FILE_SHARE_WRITE = 0x2,
  FILE_SHARE_DELETE = 0x4,

}
public enum MoveMethod : uint {
  FILE_BEGIN = 0,
  FILE_CURRENT = 1,
  FILE_END = 2
}
public enum CreationDisposition : uint {
  CREATE_NEW = 1,
  CREATE_ALWAYS = 2,
  OPEN_EXISTING = 3,
  OPEN_ALWAYS = 4,
  TRUNCATE_EXSTING = 5
}
[Flags]
public enum FlagsAndAttributes : uint {
  FILE_ATTRIBUTES_ARCHIVE = 0x20,
  FILE_ATTRIBUTE_HIDDEN = 0x2,
  FILE_ATTRIBUTE_NORMAL = 0x80,
  FILE_ATTRIBUTE_OFFLINE = 0x1000,
  FILE_ATTRIBUTE_READONLY = 0x1,
  FILE_ATTRIBUTE_SYSTEM = 0x4,
  FILE_ATTRIBUTE_TEMPORARY = 0x100,
  FILE_FLAG_WRITE_THROUGH = 0x80000000,
  FILE_FLAG_OVERLAPPED = 0x40000000,
  FILE_FLAG_NO_BUFFERING = 0x20000000,
  FILE_FLAG_RANDOM_ACCESS = 0x10000000,
  FILE_FLAG_SEQUENTIAL_SCAN = 0x8000000,
  FILE_FLAG_DELETE_ON = 0x4000000,
  FILE_FLAG_POSIX_SEMANTICS = 0x1000000,
  FILE_FLAG_OPEN_REPARSE_POINT = 0x200000,
  FILE_FLAG_OPEN_NO_CALL = 0x100000
}

public const uint INVALID_HANDLE_VALUE = 0xFFFFFFFF;
public const uint INVALID_SET_FILE_POINTER = 0xFFFFFFFF;
// Use interop to call the CreateFile function.
// For more information about CreateFile,
// see the unmanaged MSDN reference library.
[DllImport("kernel32.dll", SetLastError = true)]
internal static extern SafeFileHandle CreateFile(
 string lpFileName,
 DesiredAccess dwDesiredAccess,
 ShareMode dwShareMode,
 IntPtr lpSecurityAttributes,
 CreationDisposition dwCreationDisposition,
 FlagsAndAttributes dwFlagsAndAttributes,
 IntPtr hTemplateFile);

[DllImport("kernel32", SetLastError = true)]
internal static extern Int32 CloseHandle(
 SafeFileHandle hObject);

[DllImport("kernel32", SetLastError = true)]
internal static extern bool ReadFile(
 SafeFileHandle hFile,
 Byte[] aBuffer,
 UInt32 cbToRead,
 ref UInt32 cbThatWereRead,
 IntPtr pOverlapped);

[DllImport("kernel32.dll", SetLastError = true)]
internal static extern bool WriteFile(
 SafeFileHandle hFile,
 Byte[] aBuffer,
 UInt32 cbToWrite,
 ref UInt32 cbThatWereWritten,
 IntPtr pOverlapped);

[DllImport("kernel32.dll", SetLastError = true)]
internal static extern UInt32 SetFilePointer(
 SafeFileHandle hFile,
 Int32 cbDistanceToMove,
 IntPtr pDistanceToMoveHigh,
 MoveMethod fMoveMethod);

[DllImport("kernel32.dll", SetLastError = true)]
internal static extern bool SetEndOfFile(
 SafeFileHandle hFile);

[DllImport("kernel32.dll", SetLastError = true)]
internal static extern UInt32 GetFileSize(
 SafeFileHandle hFile,
 IntPtr pFileSizeHigh);

}

I use IntPtr wherever I intend to just pass a null. When I used void *, I had to make the class unsafe.

This construct “enum SomeEnumName : uint“ tells the compiler to make the enum a unsigned integer.

The flags attribute, “[Flags]”, treats an enumeration like a bitfield. The debugger prints, in clear English, all the names of the bits that have been set.

SafeFileHandle is a wrapper class for a WinAPI file handle. It will automatically dispose of the non managed handle upon garbage collection.

Friday, June 12, 2009

Hacking a C# Covariant Generic Cast Solution

Click here to skip the story of how I figured out the solution.

So I get onto the phone for my first job interview and everything is going great until they hit me with this question.  Could the following pseudo code compile?

class A { } 
class B: A { }
... 
List<A> a = new List<B>(); 

I had no idea what that would do...I am a recovering VB developer who was a recovering C/C++ developer who was a recovering assembly developer, so details about C# generics are not my strong suit.  I knew it wouldn’t compile in C++, and I guessed that you could probably write code in List<> that run into problems, so I muttered something along the lines "uh I don't think it will compile… that doesn't look very good."  The interviewer told me it would not compile and was an example of the lack of support for covariance/contravariance in generics in C#.  I didn’t get the job. 

Covariance... contravariance... what the heck are they?  If you are like I was and don’t know what that word means, try here  (courtesy of Eric Lippert.)  Read all the installments. 

The gist is that this situation occurs enough that making it work would be handy.  Imagine if you had a grid that displayed a List<DataRow> and you had defined your own MyDataRow class that inherited from DataRow.  Wouldn’t you want to pass your List<MyDataRow> to the grid? 

Back to the interview question.  Since class B inherits from class A at first glance it looks like it should work.  A little research shows that it can work in some other languages.  A neat aside is that  C# supports covariant arrays : 

A[] a = new B[1000];  // this compiles 

Although, it is at the expense of runtime checking:

A[] aa = new B[1000];
aa[0] = new B();  // looking good...
aa[1] = new A();  // runtime ArrayTypeMismatchException

Unfortunately the developers of C# decided to omit the support for covariance and contravariance with generic parameters for now.  Runtime checking is good enough reason for me (I like performance!  Faster please!)  In all fairness, C# isn’t meant to be the highest performing language in the world.  So why not do a quick runtime check on assignments and when passing parameters?  Why not allow inherited classes to pass the check?  The answer is they had more reasons than performance.  Consider this:

class B : A {
  int _SomeVar;  // B is larger than A now...  
}

Now B is larger than A.  If List<T> had a function that 1) created a new T and 2) added the new T to its list, you would need a runtime exception to prevent memory corruption even though B is an inherited class

public static class Extension { // runtime exception or memory corruption… you pick! public static void AddNew<T>(this List<T> list)
where T:new() { T item = new T(); list.Add(item); } } class Program { static void Main(string[] args) { List<A> a = new List<B>(); // if this compiled.. a.AddNew(); // boom here! } }

What’s the use of covariance if it causes runtime exceptions all the time, even for inherited classes?  Needless to say covariant/contravariant generics didn’t make it into the language yet. 

But what if we know we have a situation where we need to make the cast and we know its safe.  All is not lost!  I won’t bore you with all the work I did to try to hack the exact layout of the internal MethodTable structure that is used to hold type information.  I got off track assuming that since there were definitions for covariant generic parameters that all I had to do was use a memory hack to turn on the the GenericParameterAttributes.Covariant flag in the T parameter for List<T>.  I almost got to the point where I could do that, except there is an internal handle based system for storing certain things and the attributes were stored in there.  I was unable to break into the handle storage…yet   But that is for another day. 

I ended up with this.  Note, unsafe code follows:

delegate List<A> CCastDelegate(List<B> b);

...

DynamicMethod dynamicMethod = new DynamicMethod(
"foo1",
typeof(List<A>),
new[] { typeof(List<B>) },
typeof(void));

ILGenerator il = dynamicMethod.GetILGenerator(); il.Emit(OpCodes.Ldarg_0); // copy first argument to stack il.Emit(OpCodes.Ret); // return the item on the stack CCastDelegate HopeThisWorks = (CCastDelegate)
  dynamicMethod.CreateDelegate(typeof(CCastDelegate));

...

List<A> = HopeThisWorks(new List<B>());

Basically, I created a very simple function using IL that takes a single parameter and returns it.  The parameter is define as a List<B> and the return is defined as a List<A>.  This simply bypasses the type safe checking that C# works so hard to maintain.

My full test application code is here:

using System; using System.Collections.Generic; using System.Text; using System.Reflection.Emit; namespace Covariant { class A { public virtual string Name() { return "A"; } } class B : A { public override string Name() { return "B"; } } delegate List<A> CCastDelegate(List<B> b); class Program { static unsafe List<A> CastBasAIL(List<B> bIn) { // This creates a simple IL function that takes a
//
parameter and returns it. // Since it takes it as one type and returns it as
// another, it bypasses the type checking
DynamicMethod dynamicMethod = new DynamicMethod(
"foo1",
typeof(List<A>),
new[] { typeof(List<B>) },
typeof(void)); ILGenerator il = dynamicMethod.GetILGenerator(); il.Emit(OpCodes.Ldarg_0); // copy first arg to stack il.Emit(OpCodes.Ret); // return item on the stack CCastDelegate HopeThisWorks = (CCastDelegate)
dynamicMethod.CreateDelegate(typeof(CCastDelegate)); return HopeThisWorks(bIn); } static void Main(string[] args) { // make a list<B> List<B> b = new List<B>(); b.Add(new B()); b.Add(new B()); // set list<A> = the list b using the work around List<A> a = CastBasAIL(b); // at this point the debugger is miffed with a, but
// code executing methods of a work just fine. // It may be that the debugger simply checks that type
// of the generic argument matches the // signature of the type, or it may be that something
// is really screwed up. Nothing ever crashes. // prove the cast really worked
TestA(a); // add some more elements to B b.Add(new B()); // element added to B shows up in A like we expected TestA(a); return; } static void TestA(List<A> a) { Console.WriteLine("Input type: {0}",
typeof(List<A>).ToString()); Console.WriteLine("Passed in type: {0}\n",
a.GetType().ToString()); // Prove that A is B Console.WriteLine("Count = {0}", a.Count); Console.WriteLine("Item.Name = {0}", a[0].Name()); // see if more complicated methods of List<A> still work int i = a.FindIndex(
delegate(A item) {
return item.Name() == "A";
}
); Console.WriteLine(
"Index of 1st A in List<A> = {0}", i); i = a.FindIndex(
delegate(A item) {
return item.Name() == "B"; }
); Console.WriteLine(
"Index of 1st B in List<A> = {0}\n", i); // can we convert a to an array still? Console.WriteLine(
"Iterate through a, after converting a to an array"); foreach (var x in a.ToArray()) Console.WriteLine("{0}", x.Name()); } } }

Too bad I figured this all out after the interview! 

Bear in mind, I developed and checked this on framework 3.5, under 32 bit windows XP.  I think it will work on the other OS’s and address sizes, but I leave that exercise up to the reader.  Since this is an unsafe cast, the compiler is not protecting you from yourself.  You will get runtime errors and or crashes if you are not careful.

Friday, June 05, 2009

Blogging for food

I decided to change my career during an economic downturn. Thus, I am finding myself with a lot of free time on my hands. I haven't looked for a job since college! I didn't realize how much work this would be. I have found a bug in email.... All email clients and servers. Every one of em! Whenever you attach a resume to an email, the chances of the email getting delivered drop to near zero. At least I tell myself that this is why nearly all of resumes I have sent out receive no response!