Sunday, November 21, 2010

Building a Reflective Object System in C++

Every game engine I've worked with for the last several years has had some sort of reflection system.  Reflection, and its cousin introspection, are features supported by some languages that allow runtime inspection of object structure and types. My most recent game, Replica Island, as a Java application makes use of Java's Class object, as well as useful type-related tests like instanceof.  Many other languages support some sort of reflection, especially those that are modern and VM-based (like C#).  Reflective objects can provide the answers to all sorts of interesting questions at runtime, like "is this anonymous pointer derived from class X?" or "does this object have a field named Y?"  It's also common to be able to allocate and construct objects by string in reflective systems.

These traits make it very easy to serialize and deserialize reflective objects: entire object trees can be written to disk and then read back again without the serialization code having to know anything about the objects it is writing.  Reflection also makes it very easy to write tools; if the tools and the runtime have access to the same type data, the tools can output data in the same format that the runtime will access it in memory (see my previous post about loading object hierarchies).  Reflection systems can also make for some pretty powerful tools: for example, it's easy to build a Property Editor dialog that can edit any object in the code base, and automatically adds support for new classes as they are written over the course of development.

But C++ doesn't support reflection.  There's RTTI, which gives a tiny little sample of the power that a real reflection system brings, but it's hardly worth the overhead on its own.  C++'s powerful template system is often used in places that reflection might be used in other languages (e.g. type independent operators), but that method is also restricted to a small subset of a fully reflective system.  If we want real reflection in C++, we'll need to build it ourselves.

Before we dive into the code, let me take a moment to describe the goals of my reflective object system.  Other reflection systems might choose a different approach based on different needs (compare, for example, Protocol Buffers to the method I'm about to describe).  For me, the goals are:
  1. Runtime type information of pointers.
  2. Ability to iterate over the fields in an object.
  3. Ability to query and set a field in an object.
  4. Ability to allocate and construct an object by string, or based on other meta data.
  5. No extra overhead for normal class use.
The Java Class object is a pretty good model to follow here: each class has a static object that describes the class in detail.  Using classes normally involves no extra overhead, but if we choose we can also load the Class descriptor and work with objects using reflection.  So my goal is to make something like the Java Class object in C++; my approach is to generate a class for each reflected class that holds metadata about the class it describes.  A "meta object."

Once I start generating meta objects for individual classes, servicing the first goal of anonymous pointer typing isn't very hard.  As long as we know that the pointer comes from some base type that implements a virtual accessor for returning the meta object, we can pull a static class definition out and compare it to other classes, or to a registry of meta objects (and, if we're tricky, we can even do this when we can't guarantee that the pointer is derived from some known base).  So let's assume we have a way to generate meta information for each class, wrap it up in a class called a MetaObject, and stuff it as a static field into the class it describes.

The next question is, what does this MetaObject class contain?  Well, per requirement #2, it must at least contain some information about fields.  In order to access fields within a class we'll need to know the offset of the field, its size (and, if it's an array, the size of each element), and probably the name of the field and the name of its type as strings.

Now might be a good time to think about what a C++ object actually looks like in memory.  Say we have the following object:
class Foo : public MetaBase
{
   public:
      static const MetaObject sMeta;  // The object describing Foo
      virtual const MetaObject* getMetaObject() { return &sMeta; }  // required by the abstract base
 
      void setup() { mBar = 10; mBaz = 267; }
   private:
      int mBar;
      int mBaz;
};

If we allocate an instance of Foo and then pop open our debugger to inspect the raw memory, it probably looks something like this (assuming MetaBase has no fields):

C0 A1 D0 F7     // Pointer to the virtual table; some random address
00 00 00 0A     // Value of mBar
00 00 01 0B     // Value of mBaz

I say "probably" because the actual format of a class in memory is basically undefined in C++; as long as it works the way the programmer expects it to, the compiler can do whatever it wants.  For example, the actual size of this object might very well be 16 bytes (rather than the twelve bytes shown above), with zero padding or junk in the last word; some architectures require such padding for alignment to byte boundaries in memory.  Or the vtable might be at the end of the object rather than the beginning (though, to be fair, all the compilers I've worked with have put it at the top).

Anyway, assuming this is what we see in the debugger, it's not hard to pull out the information we want for our meta object.  The value of mBar is at offset 4 (the first four bytes are the address of the virtual table), and the value of mBaz is at offset 8.  We know that sizeof(int) == 4.  And sMeta, because it is static, doesn't actually appear in the class instance at all--it's stored off somewhere else in the data segment.  If we had this information about every field in every class, we'd easily be able to access and modify fields in objects without knowing the type of the object, which satisfies most of the goals above.  And since this data is stored outside the object itself, there shouldn't be any overhead to standard class use.

Here's a abbreviated version of the object I use to describe individual fields in classes.  You can see the entire object here.

class MetaField : public Object
{
public:
  enum MetaType
  {
    TYPE_value,
    TYPE_pointer,
  };
  
  MetaField(const MetaType type, const char* pName, 
    const char* pTypeName, int offset, size_t fieldSize)
  : mType(type),
    mpName(pName),
    mpTypeName(pTypeName),
    mOffset(offset),
    mFieldSize(fieldSize),
  {};
  
  const char* getName() const;
  const char* getTypeName() const;
  const int getOffset() const;
  const size_t getFieldSize() const;
  const MetaType getStorageType() const;
  
  virtual void* get(const MetaBase* pObject) const;
  virtual void set(MetaBase* pObject, const void* pData) const;
  
private:
  const MetaType mType;
  const char* mpName;
  const char* mpTypeName;
  const int mOffset;
  const size_t mFieldSize; 
};


A static array of MetaFields is defined for each class and wrapped up in a container MetaObject, which also provides factory methods and some other utility functions.  You can see that object here.  These two objects, MetaField and MetaObject, make up the core of my C++ reflection implementation.

So we know what information that we need, and we have a class structure to describe it.  The hard part is finding a way to generate this information automatically in a compiler-independent way.  We could fill out MetaObjects by hand for every class, but that's error prone.  It might be possible to pull this information out of the debug symbols generated for a debug build, but symbol formats change across compilers and we don't want to compile a debug build for every release build.  We could probably contort C++ macro expansion to generate meta data, but in the interests of sanity let's not do that.  We could write a preprocessor that walks our header files and generates the necessary meta data, but that's actually a sort of annoying problem because of the idiosyncrasies of C++ syntax.

The solution I chose is to use a separate format, an interface definition language, to generate metadata-laden C++ header files using a preprocessor tool.  The idea is that you write your class headers in the IDL, which converts them to C++ and generates the necessary metadata objects in the output as it goes.  I leverage compiler intrinsics like sizeof() and offsetof() to let the compiler provide the appropriate field information (meaning I don't care where the vtable is stored, or what padding might be inserted).  My IDL looks like this:

metaclass PhysicsComponent
{
  base GameComponent

  function void update(const float timeDelta, GameObject* pParentObject) { public }
  function virtual bool runsInPhase(const GameObjectSystem::GameObjectUpdatePhase phase) { public }
  
  field mMass { type float, value 1.0f, private }
  field mStaticFrictionCoeffecient { type float, value 0.5f, private }
  field mDynamicFrictionCoeffecient { type float, value 0.1f, private }
  // mBounciness = coeffecient of restitution. 1.0 = super bouncy, 0.0 = no bounce.
  field mBounciness { type float, value 0.0f, private }
  field mInertia { type float, value 0.1f, private }
}

.. and the output of the preprocessor tool looks like this:

class PhysicsComponent : public GameComponent
{
 public:
  void update(const float timeDelta, GameObject* pParentObject);
  virtual bool runsInPhase(const GameObjectSystem::GameObjectUpdatePhase phase);
 private:
  float mMass;
  float mStaticFrictionCoeffecient;
  float mDynamicFrictionCoeffecient;
  // mBounciness = coeffecient of restitution. 1.0 = super bouncy, 0.0 = no bounce.
  float mBounciness;
  float mInertia;
 public:
  // AUTO-GENERATED CODE
  static void initialize(PhysicsComponent* pObject);
  static PhysicsComponent* factory(void* pAddress = 0);
  static void* factoryRaw(void* pAddress, bool initializeObject);
  static PhysicsComponent* arrayFactory(int elementCount);
  static const MetaObject* getClassMetaObject();
  virtual const MetaObject* getMetaObject() const;
  static bool registerMetaData();
  static PhysicsComponent* dynamicCast(MetaBase* pObject);
};

You can see that the IDL pretty much just spits C++ out exactly as it was written, but in the process it also records enough information to generate the functions at the bottom of the class.  The most interesting of those is getClassMetaObject(), which is a static method that defines the meta data itself:

inline const MetaObject* PhysicsComponent::getClassMetaObject()
{
  static MetaField field_mMass(MetaField::TYPE_value, "mMass", "float",
    offsetof(PhysicsComponent, mMass), sizeof(float));
  
  static MetaField field_mStaticFrictionCoeffecient(MetaField::TYPE_value, 
    "mStaticFrictionCoeffecient", "float", 
    offsetof(PhysicsComponent, mStaticFrictionCoeffecient), sizeof(float));
  
  static MetaField field_mDynamicFrictionCoeffecient(MetaField::TYPE_value, 
    "mDynamicFrictionCoeffecient", "float", 
    offsetof(PhysicsComponent, mDynamicFrictionCoeffecient), sizeof(float));
  
  static MetaField field_mBounciness(MetaField::TYPE_value, "mBounciness", "float",
    offsetof(PhysicsComponent, mBounciness), sizeof(float));
  
  static MetaField field_mInertia(MetaField::TYPE_value, "mInertia", "float",
    offsetof(PhysicsComponent, mInertia), sizeof(float));
  
  static const MetaField* fields[] =
  {
    &field_mMass,
    &field_mStaticFrictionCoeffecient,
    &field_mDynamicFrictionCoeffecient,
    &field_mBounciness,
    &field_mInertia,
  };
  
  static MetaObject meta(
    "PhysicsComponent", 
    MetaObject::generateTypeIDFromString("PhysicsComponent"),
    MetaObject::generateTypeIDFromString("GameComponent"), 
    sizeof(PhysicsComponent),
    static_cast(sizeof(fields) / sizeof(MetaField*)), 
    fields, 
    GameComponent::getClassMetaObject(), 
    &PhysicsComponent::factoryRaw);
  
  return &meta;
}

*note that, in more recent versions, I've replaced offsetof() with a macro that does the right thing for compilers that do not support that intrinsic.  Offsetof() isn't really kosher in C++, but for my purposes it works fine.  If you want to learn all about it, and why it's rough for "non-POD types," try Stack Overflow.

With this data, I now have a pretty complete reflection system in C++.  I can iterate over fields in a class, look them up by string, get and set their values given an anonymous pointer.  I can compare object types without knowing the type itself (I can implement my own dynamic_cast by walking up the MetaObject parent hierarchy and comparing MetaObject pointers until I find a match or reach the root).  It's very easy to construct objects from a file, or serialize a whole object tree.  I can, for example, make a memory manager that can output an entire dump of the heap, annotated with type and field information for every block.  And I can compile all of my objects into a DLL and load them into a tool and have full type information outside of the game engine environment.  Sweet!

There are, however, many caveats.  This implementation doesn't even attempt to support templates, multiple inheritance, or enums.  If we serialize and deserialize using only this data, some standard programming practices start to get screwed: what happens when we can construct objects without invoking the constructor?  How do we deal with invasive smart pointers or other data that weakly links to objects outside of the immediate pointer tree?  How do we mix objects that have this meta data and objects that do not?  How do we deal with complex types like std::vector?  If object structure is compiler dependent, how can we serialize class contents in a way that is safe across architectures?  These are all solvable problems, but the solutions are all pretty complicated.  They often involve dusty corners of the C++ language that I rarely visit, like placement new.  If you get into this stuff, Stanley Lippman is your new best friend.

But even with those caveats in mind, the power of reflection is absolutely worth the price of admission.  It's the first chunk of code I write or port whenever I start a new project in C++.  It was the first bits of my old game engine that I got running on Android, and is now the core of the engine I am building on that platform.  Reflection is not a simple bit of infrastructure to get up and running, but once you have it it's really hard to go back.

5 comments:

  1. Very nifty article there :) Not sure why you don't mention that Qt has a system built in which is very similar to what you describe there, but still, really great stuff none the less :)

    ReplyDelete
  2. I've never used Qt, but there are many similar systems out there.

    ReplyDelete
  3. Hi! I can't download this game by scanning the QR Code. I've used two different scan apps. Always not found messages I've got. It was really gone? Oh, I'm in Brazil, btw..

    ReplyDelete
  4. Cool! I wrote a C++ reflection system for a previous project, using another method you touch on. It required the user to manually add some stuff to the CPP file for the class, but via some macros that made it pretty easy. It was possible for the two to be out of sync, but in practice it rarely happened and was easy to detect (crash early!) and fix when it did.

    I think having reflection in there is really valuable, and the route you take is really a tertiary concern of taste, context, etc.

    Have you used similar preprocessor / code generation techniques to tackle NDK / JNI issues?

    ReplyDelete
  5. Hi,
    I guess this isn't really the best place to ask this, but I guessed you probably wouldn't see it if I posted it on an older post.

    Anyway, I'm in the process of developing my first game for android and thus is also learning OpenGL at the same time, I just watched your two Google I/O talks and was wondering more specifically how you're drawing the tile map thingy.

    In your first talk you had these 4 different ideas which are all pretty straightforward and easy to understand, except for the 3rd one, what do you mean with "Make "meta tiles" out of vertex arrays, uv them to the atlas texture, and project them using orthographic projection"?
    Isn't all 2D drawing with OpenGL using an orthographic projection?
    And what do you mean with "uv them to the atlas texture"? (I know what an atlas texture is, just not what "uv to it" mean)

    And also, in your second talk you said you changed to using a single large VBO and drawing 'scanlines' of tiles, could you explane that a little more?

    Thanks Chris, you're awsome!

    ReplyDelete