This week in Paris, a friend asked me how things were going in the anti-virus world and how we would cope with the unexpected surge in malware. “In a single day, one of your competitors announced more than 1.7 million new detections. Its total detection count jumped from 74,000 to 1,800,000! If this keeps up, the 2 million mark will soon be passed,” he said. With a touch of humor, he concluded: “And you [McAfee], you still detect fewer than 400,000 threats?”

Counting malware can be quite a tricky business. At McAfee, with each new DAT release (the anti-virus definitions for VirusScan), we announce how many threats we detect. This figure, however, is a *family* count. Yesterday (June 17th, 2008), the counter read 407,125.

In September 2004, with DAT release 4391, we reached 100,000 threats detected. With DAT release 4800 in May 2006, the count reached 200,104 detections. The figure had doubled in less than two years, and the situation could be analyzed as follows:
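A quick sanity check of these two milestones, as a minimal Python sketch. It assumes smooth compound growth between the two DAT releases (a simplification, since real counts grow in steps with each release) and ignores the day of the month, which is not given here.

```python
from datetime import date
import math

def months_between(start: date, end: date) -> int:
    """Whole calendar months between two dates (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# Milestones: DAT 4391 (September 2004) and DAT 4800 (May 2006).
# The day of month (1) is an assumption; only year and month matter here.
start_count, end_count = 100_000, 200_104
months = months_between(date(2004, 9, 1), date(2006, 5, 1))

ratio = end_count / start_count
# Compound monthly growth rate that turns start_count into end_count over `months`.
monthly_rate = ratio ** (1 / months) - 1
# Months needed to double at that constant rate.
doubling_months = math.log(2) / math.log(1 + monthly_rate)

print(f"{months} months, x{ratio:.3f}, {monthly_rate:.2%} per month, "
      f"doubling in {doubling_months:.1f} months")
```

The two releases are 20 months apart, which corresponds to roughly 3.5% growth per month under this smooth-growth assumption.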

To explain how one could get from 74,000 to 400,000, or even to 1,800,000, pieces of malware, I told my friend that we had to take into account the “zoos”, or “collections”, kept by AV researchers: several million malware samples (sometimes called “unique samples”), with more collected every day. I explained that we had roughly 10,000,000 files on our high-security servers:

  • classified by family
  • often with a vast number of variants
  • sometimes with many infected files for a single malware variant, either because it is parasitic or polymorphic, or because its authors configure it to serve a binary-unique copy with each download. In that case, one zoo may contain 1 or 2 *versions* of the variant while another holds 10,000 and yet another 100,000!
  • not to mention the infamous “miscellaneous” subfolder for files that we cannot pigeonhole
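To make the list above concrete, here is a minimal Python sketch that counts one toy collection three ways. The `Sample` record, its fields and the placeholder hashes are hypothetical illustrations, not McAfee's actual zoo format; the point is only that the same files produce very different totals depending on what you count.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    family: str   # malware family name (hypothetical)
    variant: str  # variant identifier within the family
    sha256: str   # file hash; placeholder strings stand in for real SHA-256 values

def counts(zoo: list[Sample]) -> dict[str, int]:
    """Count the same collection as families, variants and unique files."""
    families = {s.family for s in zoo}
    variants = {(s.family, s.variant) for s in zoo}
    unique_files = {s.sha256 for s in zoo}
    return {"families": len(families),
            "variants": len(variants),
            "unique_samples": len(unique_files)}

# Hypothetical toy zoo: names and hashes are invented for illustration.
zoo = [
    Sample("FamilyA", "a", "h1"),
    Sample("FamilyA", "b", "h2"),
    # One polymorphic variant where every download is binary-unique.
    *[Sample("FamilyB", "a", f"poly{i}") for i in range(1000)],
    Sample("FamilyC", "a", "h3"),
    Sample("FamilyC", "a", "h3"),  # the same file collected twice counts once
]
print(counts(zoo))
```

Three families, four variants and 1,003 unique files: a single polymorphic variant is enough to inflate the sample count by three orders of magnitude while the family count barely moves.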

Of course, I added that almost all of these files were detected, so none of these numbers should be taken as gospel truth. They are useful only for establishing a long-term trend, and only if they are computed with the same rule over time.

To round off my demonstration, I looked for real figures. First I came across the AV-test.org statistics. On their site, they explain that they manage 60 terabytes of testing data, including several million malware samples and clean files. They test malware on all important desktop and server platforms, including all currently supported versions of Windows, Linux, Solaris, Unix, Lotus Domino/Notes and MS Exchange. Having recently received from Germany some figures summarizing their malware collection, I knew precisely how big it was: more than 11 million unique samples (11,002,741 in April 2008).

Armed with this number, I was fairly sure that we at McAfee had a similar volume, including the parasitic and polymorphic malware for which we have to keep multiple samples. I asked for confirmation and received some figures, which I entered in this other chart:

As I wrote this blog entry, I imagined the reader's surprise: in 3 months (from January 31 to April 30), the collections grew by 2,880,000 samples at McAfee and by 1,700,000 samples at AV-test.org, an average of roughly 760,000 new files per month for each collection. This is true, and it is why we constantly work on new technologies to meet this challenge.
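The monthly average above is each collection's three-month growth divided by three, then averaged across the two collections. This short Python sketch reproduces that arithmetic from the quoted figures; the variable names are only illustrative.

```python
# Growth of each collection between January 31 and April 30, 2008.
growth = {"McAfee": 2_880_000, "AV-test.org": 1_700_000}
months = 3  # February, March, April

# New samples per month for each collection.
per_month = {name: n / months for name, n in growth.items()}
# Simple average across the two collections.
average = sum(per_month.values()) / len(per_month)

for name, rate in per_month.items():
    print(f"{name}: {rate:,.0f} new samples per month")
print(f"Average per collection: {average:,.0f} per month")
```

McAfee's collection grew by about 960,000 files per month and AV-test.org's by about 567,000, which averages out to roughly 763,000.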

To conclude this blog entry, I propose the following: it demonstrates that we could announce that, at the end of 2007, we detected “between 357,820 (DAT-5196) and 8,600,000 pieces of malware”. And I predict that at the end of 2008 we will detect “between 450,000 and 22,000,000 pieces of malware”. OK, I am joking a bit, but I also want to show that there are many ways to count malware, and that you must not judge a product only by its announced number of detections.
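The gap between the two ends of each range can be expressed as a ratio. This small Python sketch uses only the figures quoted above, including the tongue-in-cheek 2008 prediction, so it says nothing beyond what those numbers imply.

```python
# Figures quoted above: end of 2007 (DAT 5196) and the joking end-of-2008 prediction.
ranges = {
    "end of 2007": (357_820, 8_600_000),
    "end of 2008 (prediction)": (450_000, 22_000_000),
}

for label, (families, samples) in ranges.items():
    # How many collected samples stand behind each counted family, on average.
    ratio = samples / families
    print(f"{label}: {ratio:.1f} samples per family")
```

That is roughly 24 samples per counted family at the end of 2007 and about 49 under the 2008 guess, which would widen the gap between the two headline numbers.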