Archive for May 15th, 2006

“Where Internet users go, attackers follow”

A new study is available on the SiteAdvisor Web site. Titled The Safety of Internet Search Engines, it was written by Ben Edelman and the SiteAdvisor team. The authors compared the safety of leading search engines using the company's automated Web site ratings. They found that most leading search engines are similar in the safety of the sites they link to, though MSN is the safest and Ask lags noticeably behind. The paper also demonstrates that sponsored results are significantly less safe than a search engine's organic results, and that risks are heightened for certain keywords, including those frequently searched by kids and novice users. The study started in January 2006, and the analysis uses search engine results as well as SiteAdvisor safety data from April 2006:

  • Overall, MSN search results had the lowest percentage (3.9%) of dangerous sites while Ask search results had the highest percentage (6.1%). Google was in between (5.3%). Click here for the chart.
  • Sponsored results contained two to four times as many dangerous sites as organic results. Click here for the chart.
  • Dangerous sites soared to as much as 72% of results for certain risky keywords (Click here for the chart). Particularly dangerous keywords include "free screensavers", "bearshare", "kazaa", "download music", and "free games."
  • The authors estimate that US consumers make 285 million clicks to hostile sites every month as a result of search engine results.

Binary code analysis: benefits of C++ virtual function tables detection

Introduction

We should start with a description of how C++ virtual functions are implemented; fortunately, there are many articles (particularly this one) which explain it well. Some advanced issues, such as the implementation of multiple inheritance, are described here.
Short summary: if a C++ class contains at least one virtual function, then the memory chunk allocated for each object of this class contains a pointer to the class's virtual function table (vftable for short). On the x86 architecture, if the ecx register points to the object (that is, ecx holds the "this" pointer), then a call to this object's third virtual function can be implemented like this:
mov eax, [ecx] ; load eax with a pointer to vftable
call [eax+8] ; call the third function in the table
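
To make the two instructions above concrete, here is a minimal Python sketch that models them on a toy flat memory. The addresses, the FUNCS table and the helper names are all hypothetical; the only things taken from the text are the two loads: first the vftable pointer from the object, then the function pointer from the table.

```python
import struct

# Toy flat memory holding 32-bit little-endian values (purely illustrative).
MEM = bytearray(0x100)

def write_dword(addr, value):
    struct.pack_into("<I", MEM, addr, value)

def read_dword(addr):
    return struct.unpack_from("<I", MEM, addr)[0]

# Hypothetical "code addresses" of three virtual functions.
FUNCS = {0x1000: "first", 0x1010: "second", 0x1020: "third"}

VFTABLE = 0x40   # where the vftable lives in the toy memory
OBJECT = 0x80    # where the object lives in the toy memory

# Fill the vftable: one 4-byte function pointer per slot.
for slot, func_addr in enumerate(sorted(FUNCS)):
    write_dword(VFTABLE + 4 * slot, func_addr)

# What a constructor does: store the vftable address inside the object.
write_dword(OBJECT, VFTABLE)

def call_virtual(this, slot):
    vft = read_dword(this)                # mov eax, [ecx]
    target = read_dword(vft + 4 * slot)   # call [eax + 4*slot]
    return FUNCS[target]

third = call_virtual(OBJECT, 2)           # slot 2 lives at offset 8
```

Slot 2 corresponds to offset 8 because each table entry is a 4-byte pointer on 32-bit x86.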

Why bother to detect vftables?

There are a couple of reasons why detection of vftables can be useful for binary analysis:

  • Because vftables can be stored within the .text segment, a disassembler may try to treat them as code. IDA in particular sometimes does this; as a result, it produces functions containing weird opcodes, for instance:
    sbb (byte_7D3939FF-7D393A7Dh)[ebp], bh
    arpl [edx-79D682D4h], ax
    If we knew which regions are occupied by vftables, we could instruct IDA not to disassemble them.
  • Another use is related to binary matching of different versions of the same code (here you can learn more about what binary matching/binary diffing is). From now on, we assume that debugging symbols are not available. Let's assume that we have already matched a certain number of functions from binary A with functions from binary B (say, we have matched functions with identical bodies, or with identical sets of called imported functions). If
    • a certain function funcA from binary A is present in only one vftable vftA,
    • a certain function funcB from binary B is present in only one vftable vftB,
    • we have already matched funcA with funcB

    then we may safely assume that vftA and vftB refer to the same class; therefore, we may match all members of vftA with the respective members of vftB. Similarly, if we have matched class constructors, we can match all members of the respective vftables (those referenced in the constructors). The above method has some advantages when compared with other matching algorithms. In particular, it can reliably match functions which have few or no distinguishing features; all we need is each function's offset in the vftable.
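
The propagation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any actual tool: the function name, the dictionary layout (vftable name mapped to its function names in slot order) and the sample names are all hypothetical, and tables of different lengths are simply paired up to the shorter one.

```python
def propagate_vftable_matches(vfts_a, vfts_b, matched):
    """Derive new function matches from already matched vftable members.

    vfts_a, vfts_b: dict mapping vftable name -> list of functions in slot order.
    matched: dict mapping funcA -> funcB for functions matched so far.
    """
    def owners(vfts):
        # Map each function to the set of vftables that contain it.
        result = {}
        for vft, funcs in vfts.items():
            for func in funcs:
                result.setdefault(func, set()).add(vft)
        return result

    own_a, own_b = owners(vfts_a), owners(vfts_b)
    new_matches = {}
    for func_a, func_b in matched.items():
        tabs_a = own_a.get(func_a, set())
        tabs_b = own_b.get(func_b, set())
        # Both functions must be present in exactly one vftable each.
        if len(tabs_a) != 1 or len(tabs_b) != 1:
            continue
        (vft_a,) = tabs_a
        (vft_b,) = tabs_b
        # Assume both tables describe the same class: pair members slot by slot.
        for member_a, member_b in zip(vfts_a[vft_a], vfts_b[vft_b]):
            if member_a not in matched:
                new_matches.setdefault(member_a, member_b)
    return new_matches

# Hypothetical example: one function pair is already matched.
vfts_a = {"vftA": ["a_dtor", "a_draw", "a_size"], "vftX": ["x_only"]}
vfts_b = {"vftB": ["b_dtor", "b_draw", "b_size"]}
new = propagate_vftable_matches(vfts_a, vfts_b, {"a_draw": "b_draw"})
```

In this example the single known pair is enough to match the two remaining slots, even though nothing else is known about those functions.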

How to locate vftables?

In order to locate a vftable, we may use the fact that the vftable address is explicitly used in a constructor: as a part of object initialization, a constructor stores the vftable address within the memory chunk allocated for the object. Therefore, the algorithm looks like this:
simple_vft_loc:

  • find all occurrences of "mov [reg+small_const_offset], some_const_val"
  • for each "some_const_val",
    • check whether it is a valid address within the binary's boundaries
    • if so, extract the DWORD pointed to by some_const_val; let's name it FPTR
    • check whether FPTR is a valid pointer into an executable segment, and whether it points to something resembling code rather than data

    If all of the above steps succeed, then assume that "some_const_val" is the beginning of a vftable, and that the "mov" instruction referencing it belongs to a constructor.
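
The steps above can be sketched in Python on a toy memory image. Several things here are assumptions made for illustration and are not taken from the original tool: the scan is a naive byte-pattern search for the C7 /0 encoding of "mov dword ptr [reg(+disp8)], imm32" rather than real disassembly, the "resembles code" test is a hypothetical prologue check, and the image layout is invented.

```python
import struct

def find_mov_imm32(code):
    """Yield (offset, imm32) for 'mov dword ptr [reg(+disp8)], imm32' patterns.

    Naive byte scan (assumption): opcode C7 /0 whose ModRM byte addresses
    [reg] (mod=00) or [reg+disp8] (mod=01) without a SIB byte.
    """
    for i in range(len(code) - 1):
        if code[i] != 0xC7:
            continue
        modrm = code[i + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if reg != 0 or rm == 4:          # not /0, or a SIB byte would follow
            continue
        if mod == 0 and rm != 5:         # [reg]
            imm_at = i + 2
        elif mod == 1:                   # [reg + disp8]
            imm_at = i + 3
        else:
            continue
        if imm_at + 4 <= len(code):
            yield i, struct.unpack_from("<I", code, imm_at)[0]

def looks_like_code(image, base, addr):
    # Hypothetical heuristic: accept two common x86 function prologue starts.
    off = addr - base
    return image[off] == 0x55 or image[off:off + 2] == b"\x8b\xff"

def simple_vft_loc(image, base, text_start, text_end):
    """image: loaded binary bytes; base: load address;
    [text_start, text_end): address range of the executable segment."""
    code = image[text_start - base:text_end - base]
    found = set()
    for _, const in find_mov_imm32(code):
        # Is the constant an address inside the binary?
        if not (base <= const <= base + len(image) - 4):
            continue
        # Extract the DWORD it points to (FPTR).
        fptr = struct.unpack_from("<I", image, const - base)[0]
        # FPTR must point into the executable segment, at something code-like.
        if text_start <= fptr < text_end and looks_like_code(image, base, fptr):
            found.add(const)
    return sorted(found)

# Toy image (invented layout): code in the first 0x20 bytes, data after it.
BASE = 0x400000
TEXT_START, TEXT_END = BASE, BASE + 0x20
image = bytearray(0x40)
image[0x00:0x04] = bytes.fromhex("558becc3")          # push ebp; mov ebp, esp; ret
image[0x08:0x0F] = bytes.fromhex("c70130004000c3")    # mov dword ptr [ecx], 0x400030; ret
image[0x10:0x18] = bytes.fromhex("c7410438004000c3")  # mov dword ptr [ecx+4], 0x400038; ret
struct.pack_into("<I", image, 0x30, BASE)             # vftable: one entry -> function at BASE
struct.pack_into("<I", image, 0x38, 0x12345678)       # plain data, not a function pointer

vftables = simple_vft_loc(image, BASE, TEXT_START, TEXT_END)
```

Only the constant stored by the first "mov" survives all checks; the second constant points at data whose first DWORD is not a pointer into the executable segment.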

Does it really work?

In order to test the above algorithm, let's run it on a binary for which debugging symbols are available: this way, we will be able to compare the algorithm's results with the .pdb file contents. In the case of VC compilers, C++ mangled names of vftables start with the "??_7" prefix, so we can easily extract all vftable entries from the output of any .pdb parser.

We have chosen mshtml.dll for our test drive (I bet some of you share the idea that it makes sense to examine this particular binary in some detail). For mshtml.dll version 6.0.3790.2577, mshtml.pdb contains 886 vftable names; they point to 763 different vftables. Simple_vft_loc outputs 768 addresses which are supposed to be vftables. It turned out that 28 vftables were not detected ("false negatives"), mostly because some static object variables contain a preinitialized vftable pointer (so the vftable pointer is not set by a constructor; it is set by the linker). On the other hand, 33 addresses were "false positives": they pointed to variables which were not actually vftables but just happened to start with a function pointer.
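
As a rough illustration of the extraction step, the sketch below filters a symbol list by the "??_7" prefix and counts both names and distinct addresses. The (name, address) input format and the sample symbols are hypothetical; a real .pdb parser will produce its own output format that has to be adapted first.

```python
def vftable_symbols(symbols):
    """Count vftable names and the distinct addresses they point to.

    symbols: iterable of (mangled_name, address) pairs (assumed format).
    """
    vft = [(name, addr) for name, addr in symbols if name.startswith("??_7")]
    distinct_addresses = {addr for _, addr in vft}
    return len(vft), len(distinct_addresses)

# Hypothetical sample: two vftable names share one address.
sample = [
    ("??_7CFoo@@6B@", 0x1000),
    ("??_7CBar@@6B@", 0x1000),
    ("??_7CBaz@@6B@", 0x2000),
    ("?Draw@CFoo@@UAEXXZ", 0x3000),   # an ordinary function, not a vftable
]
name_count, table_count = vftable_symbols(sample)
```

Counting distinct addresses separately matters because several names may refer to the same table, as the 886 names versus 763 tables above show.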

As we see, the false negative ratio is below 4% (28 of 763 vftables). Moreover, it is very probable that in the binary we are matching our mshtml.dll against, the corresponding vftable would not be detected either. Therefore, vftable detection false negatives should not impair the matching algorithm.

The false positive ratio is similarly low (33 of the 768 reported addresses). Again, it should not lead to errors in binary matching: instead of matching vftable entries, we will match entries in other structures containing function pointers.
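
The ratios can be checked directly from the counts reported above. The short Python calculation below uses only those numbers; the variable names are ours.

```python
pdb_vftables = 763      # distinct vftables according to mshtml.pdb
reported = 768          # addresses output by simple_vft_loc
false_negatives = 28    # real vftables that were not detected
false_positives = 33    # reported addresses that are not vftables

# Both views must agree on the number of correctly detected vftables.
true_positives = pdb_vftables - false_negatives
assert true_positives == reported - false_positives

fn_ratio = false_negatives / pdb_vftables   # share of real vftables missed
fp_ratio = false_positives / reported       # share of reported addresses that are wrong
summary = f"false negatives: {fn_ratio:.1%}, false positives: {fp_ratio:.1%}"
print(summary)  # false negatives: 3.7%, false positives: 4.3%
```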

The simple_vft_loc algorithm was integrated into "funcmatch", a binary matching tool, and so far its performance has been very satisfactory.

Other tables of functions?

Another common construct containing function pointers is an RPC dispatch table. An approach very similar to the above, using dispatch table detection, was implemented in the funcmatch tool as well.