Introduction
We should start with a description of C++ virtual functions implementation; fortunately, there are many articles (particularly this one) which explain it well. Some advanced issues, for instance the multiple inheritance implementation, are described here .
Short summary: if a C++ class contains at least one virtual function, then for each object of this class, the memory chunk allocated for this object contains a pointer to this class virtual function table (vftable for short). On x86 architecture, if the ecx register points to the object variable (so, ecx equals "this" pointer), then a call to this object's third virtual function can be implemented like this:
mov eax, [ecx] ; load eax with a pointer to vftable
call [eax+8] ; call the third function in the table
Why bother to detect vftables?
There are a couple of reasons why detection of vftables can be useful for binary analysis:
- Because vftables can be stored within .text segment, a disassembler may try to treat it as code. Particularly, IDA sometimes does this; as a result, it produces functions containing weird opcodes, for instance:
sbb (byte_7D3939FF-7D393A7Dh)[ebp], bh
arpl [edx-79D682D4h], ax
If we knew what regions are occupied by vftables, we could instruct IDA not to disassemble them.
- Another usage is related to binary matching of different versions of the same code ( here you can learn more on what binary matching/binary diffing is about). From now on, we assume the debugging symbols are not available.Let's assume that we have already matched a certain number of functions from binary A with functions from binary B (say, we have matched functions with identical bodies, or with identical sets of called imported functions). If
- a certain function funcA from binary A is present in only one vftable vftA,
- a certain function funcB from binary B is present in only one vftable vftB,
- we have already matched funcA with funcB
then we may safely assume that vftA and vftB refer to the same class; therefore, we may match all members of vftA with respective members of vftB. Similarly, if we have matched class constructors, we can match all members of respective (referenced in the constructor) vftables.The above method has some advantages when compared with other matching algorithms. Particularly, it can reliably match functions which have few/none distinguishing features - all we need is its offset in vftable.
How to locate vftables?
In order to locate a vftable, we may use the fact that the vftable address is explicitely used in a constructor - as a part of object initialization, a constructor stores vftable address within the memory chunk allocated for an object. Therefore, the algorithm looks like this:
simple_vft_loc:
Does it really work?
In order to test the above algorithm, let's run it on a binary for which the debugging symbols are available: this way, we will be able to compare this algorithm's results with .pdb file contents. In case of VC compilers, C++ mangled names of vftables start with "??_7″ prefix, so we can easily extract all vftable entries from the output of any .pdb parser.We have chosen mshtml.dll for our test drive (I bet some of you share the idea that it makes sense to examine this particular binary in some detail). For mshtml.dll version 6.0.3790.2577, mshtml.pdb contains 886 vftable names; they point to 763 different vftables. Simple_vft_loc outputs 768 addresses which are supposed to be vftables. It turned out that 28 vftables were not detected ("false negatives"); mostly because some static objects variables contain a preinitialized vftable pointer (so, the vftable pointer is not set by a constructor, it is set by the linker). On the other hand, 33 addresses were "false positives": they pointed to variables which were not actually vftables, they just happened to start with a function pointer.
As we see, the false negative ratio is below 4%. Moreover, it is very probable that in a binary we would match our mshtml.dll with, the matching vftable would not be detected as well. Therefore, vftable detection false negatives should not impair the matching algorithm.
The false positive ratio is similarly low. Again, it should not lead to errors in binary matching - instead of matching vftable entries, we will match entries in other structures containing function pointers.
The simple_vft_loc algorithm was integrated in the "funcmatch", a binary matching tool, and so far, its performance is very satisfactory.
Other tables of functions?
Another common construction containing function pointers is a RPC dispatch table. An approach very similar to the above, using dispatch table detection, was implemented in the funcmatch tool as well.