Mystery of Duqu Programming Language Solved

samzenpus posted more than 2 years ago | from the solving-a-mystery dept.

Programming 97

wiredmikey writes "Earlier this month, researchers from Kaspersky Lab reached out to the security and programming community in an effort to help solve a mystery related to 'Duqu,' the Trojan often referred to as 'Son of Stuxnet,' which surfaced in October 2011. The mystery rested in a section of code written in an unknown programming language and used in the Duqu Framework, a portion of the Payload DLL used by the Trojan to interact with Command & Control (C&C) servers after the malware infects a system. Less than two weeks later, Kaspersky Lab experts now say with a high degree of certainty that the Duqu framework was written using a custom object-oriented extension to C, generally called 'OO C' and compiled with Microsoft Visual Studio Compiler 2008 (MSVC 2008) with special options for optimizing code size and inline expansion."

97 comments

compiled by (1)

XCDBFPL (846367) | more than 2 years ago | (#39403715)

Pretty sure this was written for SD-6 at the direction of Arvin Sloane.

Re:compiled by (0)

Ihmhi (1206036) | more than 2 years ago | (#39404267)

Quick, get Sydney Bristow on the case! I hear she's married to Daredevil now, so I guess he... well actually, that might be a hindrance to the whole thing.

OOO !! SPECIAL OPTIMIZATIONS !! (-1)

Anonymous Coward | more than 2 years ago | (#39403725)

Like what everyone already uses !! Pft !!

Well that was disappointing. (1)

spxZA (996757) | more than 2 years ago | (#39403781)

I guess allens don't exist.

Re:Well that was disappointing. (5, Funny)

jhoegl (638955) | more than 2 years ago | (#39403821)

Oh no, Allens do exist. Although he spells it Alan.

Re:Well that was disappointing. (1)

stevegee58 (1179505) | more than 2 years ago | (#39403863)

*sarcasm*
I was disappointed too. I figured it was written in Python or Ruby on Rails.
*/sarcasm*

Re:Well that was disappointing. (1)

Mitchell314 (1576581) | more than 2 years ago | (#39406015)

Would be even better if it was all done in javascript.

Re:Well that was disappointing. (1)

Kenja (541830) | more than 2 years ago | (#39404091)

Just means the Aliens made MSVC 2008.

H-1B (1)

tepples (727027) | more than 2 years ago | (#39404709)

Just means the Aliens made MSVC 2008.

Then what country were they from?

Re:H-1B (0)

Anonymous Coward | more than 2 years ago | (#39406445)

India

Re:Well that was disappointing. (1)

Oswald McWeany (2428506) | more than 2 years ago | (#39404679)

Of course they do. The government just doesn't want you to know- so they made this bit up about being written in C.

Sure they do, the Stonecutters hide them (2)

IwantToKeepAnon (411424) | more than 2 years ago | (#39406145)

Who keeps Atlantis off the maps?
Who keeps the Martians under wraps?
We Do, We Do...

Let's See It (1)

rsmith84 (2540216) | more than 2 years ago | (#39403795)

A link to the actual code snippet would've been nice; I'd love to see the structure and logic behind it.

Re:Let's See It (5, Informative)

Baloroth (2370816) | more than 2 years ago | (#39403829)

Re:Let's See It (1, Interesting)

Ihmhi (1206036) | more than 2 years ago | (#39404311)

You know, I wonder if the antivirus suites of the future will be able to see stuff like this being written. Like "oh no, he is using emacs/vi and writing a php injection script - perhaps this is something we should look into specifically". I don't think heuristics of this sort would be any more onerous than the deep sort of file scanning that antivirus suites already do.

As an aside, Kaspersky is fantastic and aside from a small hiccup a year or two ago where they lost some CC data (and handled it pretty well IMO) I recommend it to my friends - especially the ones who are less computer literate. The whole "red light / yellow light / green light" thing on their Windows Vista/7 widget is very intuitive for computer newbies. (For some of my customers, I tell them to immediately call me on a red light. Makes my job easier lol.)

Re:Let's See It (2)

s.petry (762400) | more than 2 years ago | (#39404547)

This is already done to a large degree, at least with what matters in binary code. The "Script kiddie" tools are extremely well documented. This goes way back in time to when a tool came out called (I hope I'm remembering the name right) VCL or Virus Creation Laboratory. It became pretty easy to identify VCL based code and the tool set pretty much evaporated.

What editor you use is really unimportant. The compiler is what counts, and the compiler never sees your editor.

Re:Let's See It (1)

ShaunC (203807) | more than 2 years ago | (#39405869)

You know, I wonder if the antivirus suites of the future will be able to see stuff like this being written. Like "oh no, he is using emacs/vi and writing a php injection script - perhaps this is something we should look into specifically"

I can't imagine that someone with enough technical ability to create the "mystery" Duqu code isn't already doing their development in a sandboxed VM with no AV apps installed. I doubt it's worth the time on the AV companies' part to attempt to detect the act of malware actually being written.

Re:Let's See It (1)

CSMoran (1577071) | more than 2 years ago | (#39411401)

You know, I wonder if the antivirus suites of the future will be able to see stuff like this being written. Like "oh no, he is using emacs/vi and writing a php injection script - perhaps this is something we should look into specifically".

Real programmers use cat anyway :).

Re:Let's See It (1)

tehcyder (746570) | more than 2 years ago | (#39411671)

(For some of my customers, I tell them to immediately call me on a red light. Makes my job easier lol.)

Does it play that old "red light spells danger - can't hold out much longer" song to reinforce the point?

Because that would be like a bucket full of awesome.

Re:Let's See It (0)

Anonymous Coward | more than 2 years ago | (#39405237)

"This means that a custom OO C framework is the most probable answer to our question.
We kept this (OO C) version as a "worst-case" explanation - because that would mean that the amount of time and effort invested in development of the Framework is enormous compared to other languages/toolkits."

Re:Let's See It (0)

Anonymous Coward | more than 2 years ago | (#39410639)

Unless someone took an existing language or toolkit from some corporate environment. I wrote similar frameworks for employers and myself, to ease various tasks. One was specifically designed to automate various tasks in conjunction with industrial controllers. It had a custom programming language even. No big deal ... I wrote the thing in 4 days.

Re:Let's See It (0)

Anonymous Coward | more than 2 years ago | (#39403849)

The 'mystery' involved has nothing to do with the actual code. The actual code itself is probably not really all that advanced.

Re:Let's See It (2)

Sulphur (1548251) | more than 2 years ago | (#39404719)

The 'mystery' involved has nothing to do with the actual code. The actual code itself is probably not really all that advanced.

The reverse compiler UnStux says the program is called Hello World.

Re:Let's See It (1)

God of Lemmings (455435) | more than 2 years ago | (#39408701)

That means nothing, All programs are derivations of Hello world.

If this keeps up.... (2)

ElmoGonzo (627753) | more than 2 years ago | (#39403815)

they may have to learn MASM to avoid detection.

Re:If this keeps up.... (0)

Anonymous Coward | more than 2 years ago | (#39406139)

Before the late '90s, the very idea of writing a virus in a high-level language was laughable.
There were a few proofs of concept, but come on--even a clueless user would notice his executable files growing by 20K or more!

Re:If this keeps up.... (0)

Anonymous Coward | more than 2 years ago | (#39406461)

Better yet, write the code in Perl so no one will be able to understand it.

Source Code? (3, Insightful)

deemen (1316945) | more than 2 years ago | (#39403843)

How did they deduce it was an unknown programming language? By looking at the compiled machine code? How could they tell this wasn't just regular C?

Re:Source Code? (5, Informative)

CaptainJeff (731782) | more than 2 years ago | (#39403923)

Different languages compile down very differently. Indeed, different compilers compile the same source code differently (try comparing GCC output to Visual Studio output and you'll see some obvious differences in how the assembly/machine code is crafted). In this case, there were clear signs of an object-oriented approach (data and functions were located around each other in memory, which is not likely to happen in non-OO languages, etc).

Re:Source Code? (1)

CSMoran (1577071) | more than 2 years ago | (#39411425)

In this case, there were clear signs of an object-oriented approach (data and functions were located around each other in memory, which is not likely to happen in non-OO languages, etc).

I agree with the gist of your statement, but I don't think the OOP source-level organization of "data close to methods" is reflected in the generated machine code or the intermediate assembly. I'd wager data would be placed in non-executable memory segments, far from where the code ('text') resides. When you print out values of pointers you can often recognize what lives on the stack, what is heap-based data and what is a function pointer just by looking at address ranges.

Re:Source Code? (5, Informative)

tomhath (637240) | more than 2 years ago | (#39403933)

It seems they recognized a sequence of instructions [securelist.com] that are typical of a class constructor, just not like any class constructor they were familiar with.

Re:Source Code? (3)

robi5 (1261542) | more than 2 years ago | (#39404103)

The GP's question was something else - how did they initially tell it was _not_ regular C (as that obviously lacks the fingerprint of OO techniques of C++). Or if it didn't look like regular C or anything else, why didn't they just assume it was written in assembly, or some other rare machine code generating language like Common Lisp?

Re:Source Code? (5, Informative)

djdanlib (732853) | more than 2 years ago | (#39404675)

They did open the lines up for suggestions, and some community members suggested that it looked like OO C. How did they know? They probably had experience using and debugging OO C, if I had to guess. There were also plenty of people who said that it definitely wasn't compiler X or language Y from their own experiences. The article links to this discussion: http://www.securelist.com/en/blog/677/The_mystery_of_Duqu_Framework_solved [securelist.com]

But about discovering the specifics of the truth? It's probably like you alluded to in your comment - fingerprinting the machine code. It would take a while, but you could come up with fingerprints for a great many various compilers and features. You could do that for Common Lisp, too. (In fact, someone DID suggest for them to look at various LISP dialects.) It has taken long enough that such a scenario - having a good library of fingerprints - is believable. Given a scanner with a dictionary of fingerprints, one could reasonably say that you either have hand-assembled machine code made to mimic another language, or that you have code generated by a very specific language and compiler. If nothing in your library of fingerprints matched, assuming you had a good handle on hand-assembling machine code, you could look and see if it smells like such a beast. It would be tremendously laborious to hand-assemble code to make it look like a specific compiler generated it, and why would you do that in the first place? I fail to see the benefit when you could just use that compiler. If you were trying to throw off the analysts with a false positive match, there would still be a ton of mysterious data that still needs examination.

Think about DNA analysis. We can look at our DNA and determine some chunks of it came from viruses, and that some of it is "junk" that serves no purpose.

Also think about image analysis like OCR or various captcha-breaking software. You can map images to characters with a program, and detect anomalies and known signatures.

Then there is heuristic antivirus scanning. It knows enough to find some previously unthought-of malicious code, even if it does sometimes generate false positives.

So why not apply those techniques to machine code, and see what you get? If multiple methods give you similar results, you would be onto something, I imagine.
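A toy version of the fingerprint idea, assuming signatures are plain byte patterns with wildcard positions. The two patterns are ordinary 32-bit x86 function prologues picked only as examples (the 8B EC encoding is commonly associated with MSVC output, the 89 E5 encoding with GCC output). The Signature type, the labels and the identify function are all invented for this sketch; real tools use far larger and smarter signature sets.

```c
#include <stddef.h>

/* Hypothetical types and names; a real signature library is far larger. */
#define WILD (-1)                 /* matches any byte */

typedef struct Signature {
    const char *label;            /* what a match suggests */
    const int  *pattern;          /* byte values 0..255, or WILD */
    size_t      len;
} Signature;

/* push ebp; mov ebp, esp; sub esp, imm8 (8B EC encoding of the mov) */
static const int sig_a[] = { 0x55, 0x8B, 0xEC, 0x83, 0xEC, WILD };
/* push ebp; mov ebp, esp (89 E5 encoding of the same mov) */
static const int sig_b[] = { 0x55, 0x89, 0xE5 };

static const Signature library[] = {
    { "prologue A (8B EC form)", sig_a, sizeof sig_a / sizeof sig_a[0] },
    { "prologue B (89 E5 form)", sig_b, sizeof sig_b / sizeof sig_b[0] },
};

/* Returns 1 if sig matches the code bytes starting at offset off. */
static int matches_at(const unsigned char *code, size_t n, size_t off,
                      const Signature *sig)
{
    size_t i;
    if (off > n || sig->len > n - off)
        return 0;                 /* pattern would run past the buffer */
    for (i = 0; i < sig->len; i++)
        if (sig->pattern[i] != WILD && sig->pattern[i] != code[off + i])
            return 0;
    return 1;
}

/* Scans the buffer and returns the label of the first signature found,
   or NULL when nothing in the library matches ("unrecognized output"). */
const char *identify(const unsigned char *code, size_t n)
{
    size_t off, s;
    for (off = 0; off < n; off++)
        for (s = 0; s < sizeof library / sizeof library[0]; s++)
            if (matches_at(code, n, off, &library[s]))
                return library[s].label;
    return NULL;
}
```

The NULL case is the interesting one for this story: when no known fingerprint matches, all the scanner can say is that the output is unrecognized, which is roughly where the analysts started.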

Re:Source Code? (1)

EnempE (709151) | more than 2 years ago | (#39409371)

Perhaps they got what I think they were hoping for, which was someone involved with creating the program giving them a tip.

By putting out a public call like that they created a forest of opinions from all over the internet, perfect for a tree that wanted to help but didn't want to get chopped down as a result.

Re:Source Code? (1)

drinkypoo (153816) | more than 2 years ago | (#39412245)

Think about DNA analysis. We can look at our DNA and determine some chunks of it came from viruses, and that some of it is "junk" that serves no purpose.

Except that it's recently been discovered that more of that "junk" has a purpose than we thought...

Re:Source Code? (1)

rudy_wayne (414635) | more than 2 years ago | (#39403965)

How did they deduce it was an unknown programming language? By looking at the compiled machine code? How could they tell this wasn't just regular C?

I suppose that you could possibly tell what compiler was used by the arrangement of the machine code, but I still don't see what the point is. Who cares if it was written in assembly language, C or Atari Basic?

Re:Source Code? (4, Insightful)

Sarten-X (1102295) | more than 2 years ago | (#39404027)

Knowing the language and techniques used can speed up analysis of future variants found, because they'll know what patterns to look for first.

Re:Source Code? (2)

tlhIngan (30335) | more than 2 years ago | (#39405743)

I suppose that you could possibly tell what compiler was used by the arrangement of the machine code, but I still don't see what the point is. Who cares if it was written in assembly language, C or Atari Basic?

Because knowing the compiler and version helps analysis - each compiler tends to emit code for the same statement very differently. Once you know the compiler, its idiosyncrasies in the way it emits code are understood, and that makes reversing the assembly back to C much easier.

Analyzing assembly code is difficult, but if you know how higher layer structures are translated into machine code by the compiler used, it's a lot easier to "decompile" the code.

Re:Source Code? (4, Interesting)

b4dc0d3r (1268512) | more than 2 years ago | (#39406271)

To tag along - it's hard to tell data from code, and it helps the decompiling app to detect what is code vs. data if it knows which compiler created it.

It looks like the original blog used IDA Pro, which has library signatures for different compilers. It can identify functions and auto-comment the code, making disassembly easier. It can also auto-identify stack variables and keep track of them through lots of PUSH, POP and RET X instructions; it's quite powerful.

In this case, IDA probably gave a lot of erroneous warnings or disassembled data or refused to disassemble code, requiring lots of manual work. The classes apparently were done inconsistently, making it hard to even write a plug-in to automatically detect them (scripts exist to identify MSVC objects through their RTTI properties, and do a decent job identifying non-RTTI classes, but this would not work with this code).

http://www.hex-rays.com/products/ida/index.shtml [hex-rays.com]

When reverse engineering, and your tool basically says "WTF do I do with this?" it's one of those moments where you want to know how the attacker made it.

Is it hand-rolled? Or a new attack creation kit that script kiddies can cobble something together using?

And "unknown language" was not a really good way to describe it. "Unrecognized output" would have been better. The assumption is that a language like C would compile to a C-like syntax, C++ would do things differently. But it could have been just C++ with an unknown compiler.

Re:Source Code? (2, Insightful)

UnknownSoldier (67820) | more than 2 years ago | (#39406511)

I can tell you have never taught another programmer nor learned the benefits of reverse engineering so you can write better code!  e.g. I used to work on a professional C/C++ compiler for consoles.  Customers would sometimes ONLY provide assembly code and it was your job to figure out why the compiler was generating invalid code.

Here is a perfect example -- a friend of mine was taking a CS course and the assembly code the prof provided was absolute shit -- a perfect example of how to NOT write code.  I cleaned up the assembly code into a properly commented assembly and then provided a mid-level source.  By having the 3 versions to compare against my friend was able to get a better handle on reading and writing assembly code, understanding how a compiler would translate a mid-level language to a low level language, learn some good commenting styles, etc.

First, the original crap assembly provided by the Prof:
0000        RD     R5    Inpt       // Read the no. of integers to be added from the input buffer
0004        MOVI   R6    0          // Set a counter to reg-6 and initialize to 0
0008        MOVI   R1    0          // Set the Zero register to its value
000C        MOVI   R0    0          // Clear Accumulator
0010        LDI    R10   Inpt       //  Load address of input buffer into reg 10
0014        LDI    R13   Temp       //  Load address of temp buffer into reg 13
0018 LOOP1: ADDI   R10   4          // Point to the next address of input buffer by adding 4
001C        RD     R11   (R10)      // Load  the content(data) of address in reg-10 in reg-11
0020        ST     (R13) R11        // Store the data in the address pointed to by reg-13
0024        ADDI   R13   4          //  Point to the next address of temp buffer
0028        ADD I  R6    1          // Increment the counter
002C        SLT    R8    R6  R5     // Set reg-8 to 1 if  reg-6 < reg-5, and 0 otherwise
0030        BNE    R8    R1  LOOP1  // Branch  if  content of Reg- 8 and Reg-1 is not equal
0034        MOVI   R6    0          // Reset the counter to Zero
0038        LDI    R9    Temp       // Loading the address  temp into reg 9
003C LOOP2: LW     R7    0(R9)      // Loads the content of the address in reg-9 in reg-7 , reg-9 is
                                    //  B-reg .  0 is the offset
0040        ADD     R0   R0  R7     //  Add  the content of  accumulator  with reg-7 and stored in acc.
0044        ADDI    R6   1          //  Incrementing the counter by 1
0048        ADDI    R9   4          //  Incrementing  the B-register  by 4 bytes
004C        SLT     R8   R6  R5     // Reg-8  is set to 1  ,if Reg6 < Reg5, and 0 otherwise
0050        BNE     R8   R1  LOOP2  // Branch  if  content of Reg- 8 and Reg-1 is not equal
0054        WR      R0   Oupt       //   Write the content of the aacumulator into output buffer
0058        HLT                     // Logical end of program

Notice the total useless comment of "Incrementing the counter by 1".  No shit sherlock, if I wanted to know WHAT the machine is doing, I could read the mnemonics.  I want to know WHY it is doing it.

Here is the cleaned up assembly comments:
0000:C050005C        RD     R5    Inpt         // nSize = ram[ INPUT ]
0004:4B060000        MOVI   R6    0            // iSize = 0
0008:4B010000        MOVI   R1    0            // zero = 0
000C:4B000000        MOVI   R0    0            // nSum = 0
0010:4F0A005C        LDI    R10   Inpt         // pSrc = &ram[ INPUT ]
0014:4F0D00DC        LDI    R13   Temp         // pTmp = &ram[ TEMP ]
0018:4C0A0004 LOOP1: ADDI   R10   4            // { pSrc++
001C:C0BA0000        RD     R11   (R10)        //   nVal = *pSrc
0020:42BD0000        ST     (R13) R11          //   *pTmp = nVal
0024:4C0D0004        ADDI   R13   4            //   pTmp++
0028:4C060001        ADDI   R6    1            //   iSize++
002C:10658000        SLT    R8    R6  R5       //   bMore = (iSize < nSize)
0030:56810018        BNE    R8    R1  LOOP1    //   if ( bMore ) goto LOOP1
0034:4B060000        MOVI   R6    0            // } iSize = 0
0038:4F0900DC        LDI    R9    Temp         // { pTmp = &ram[ TEMP ]
003C:43970000 LOOP2: LW     R7    0(R9)        //   nVal = pTmp[0]
0040:05070000        ADD    R0    R0   R7      //   nSum += nVal
0044:4C060001        ADDI   R6    1            //   iSize++
0048:4C090004        ADDI   R9    4            //   pTmp++
004C:10658000        SLT    R8    R6   R5      //   bMore = (iSize < nSize)
0050:5681003C        BNE    R8    R1   LOOP2   //   if ( bMore ) goto LOOP2
0054:C10000AC        WR     R0    Oupt         // ram[ OUTPUT ] = nSum
0058:92000000        HLT                       // return
005C:0000000A Inpt

And here is the corresponding C source code.
#include <stdio.h>

extern char data[];          // memory image of the simulated machine; assumed to be defined elsewhere
char * ram = (char*) data;
const int INPUT  = 0x5C;
const int OUTPUT = 0xAC;
const int TEMP   = 0xDC;

void SumArray()
{
    int iSize;
    int nSize = *(int*) &ram[ INPUT ];   // RD reads a whole word, not a single byte
    int nSum;

    int *pSrc;
    int *pTmp;

    printf( "Size: %d 0x%04X\n", nSize, nSize );

    // mem_copy( Inpt, Temp, nSize )
    pSrc = (int*) &ram[ INPUT ] + 1;     // the data starts one word past the count
    pTmp = (int*) &ram[ TEMP  ];
    for( iSize = 0; iSize < nSize; iSize++ )
        *pTmp++ = *pSrc++;

    // sum the copied words
    pTmp = (int*) &ram[ TEMP ];
    nSum = 0;
    for( iSize = 0; iSize < nSize; iSize++ )
        nSum += *pTmp++;

    *(int*) &ram[ OUTPUT ] = nSum;       // WR writes a whole word

    printf( "Sum: %d 0x%04X\n", nSum, nSum );
}

Just because _you_ don't see the value in reverse engineering doesn't mean someone else can't learn something from the process.

Re:Source Code? (0)

Anonymous Coward | more than 2 years ago | (#39407287)

This is +4, informative? Come on. This is a dump of assembly code for an unidentified CPU, a waste of space.

Likewise: (1)

shiftless (410350) | more than 2 years ago | (#39407699)

This is +1, Normal? Come on. This is a worthless comment, a waste of space.

Re:Likewise: (1)

UnknownSoldier (67820) | more than 2 years ago | (#39421047)

> This is a worthless comment, a waste of space.
Your tips for coding and what you learnt from reverse engineering are where again?

Mis-directed anger. (0)

Anonymous Coward | more than 2 years ago | (#39423823)

It appears he was rebuking the A.C. for wasting space with his comment and lack of meaningful contribution, so you either replied to the wrong comment or need to focus less on asm comprehension and more on English.

Re:Likewise: (1)

shiftless (410350) | more than 2 years ago | (#39435035)

The other guy said it. I enjoyed your comment and found it interesting, and thought it was lame that some stupid A.C. crapped all over it with a one line throwaway insult.

Re:Source Code? (3, Informative)

plover (150551) | more than 2 years ago | (#39410565)

It's only a clue, not an answer. But it's one data point more than they had before. And they need somewhere to start looking for the author.

OO C is very interesting. C++ developers are a dime a dozen (OK, it's 2012, we're four for a quarter.) And you can't swing a dead cat around here without hitting a C coder. But OO C developers are a subset of a subset of people. Nobody who sets out to write a virus for the first time says "I should download a four year old compiler for a language I know nothing about and start writing my virus." They don't read in their copy of "Virus Creation Lab for Dummies" book where it says to torrent a copy of Visual Studio 2008, then download some GNU OO C framework for it. This is a tool that a limited set of experts uses for their day jobs. Possibly it's something a laid off software engineer would still have on his home machine. It might be code generated by a custom library that some gaming house wrote for their own internal stuff, and that by pattern matching with commercial software products they might be able to find the company of origin. They can go back and figure out who they fired in the last three years, and who now is driving the Ferrari. Maybe there's an OO C Google Group this guy participates in. Maybe he published a bogus "please help me with my homework" question on stackoverflow, and they can match some source code to some object code.

Or maybe it doesn't help find the guy today, but tomorrow if they haul a potential perpetrator before a judge, they can provide as corroborating evidence to the jury that the person who wrote this code was very specialized in his knowledge of this esoteric tool, and the defendant worked with this tool every day.

Whatever that clue might be, it could be useful knowledge to someone hunting down the author. Either way, it certainly has value.

Re:Source Code? (4, Insightful)

Baloroth (2370816) | more than 2 years ago | (#39403973)

There are certain characteristics to the way C++ behaves (the manner in which you pass parameters, etc). Mainly, through having looked at lots and lots of code samples, they can say what they expect the compiled code to look like. If they know C++ compiled code looks like x, regular C looks like y, and this looked like z, it can't be C. Essentially, the code did things you simply can't do in C++ or C (even Objective C) by itself. The problem is, that method only allows you to compare to known languages. More details here [securelist.com] .

It's basically like identifying an animal by footprint. Once you know a deer leaves a certain kind of footprint, you can identify more deer by examining footprints. But you can't identify an unknown animal that way: if you haven't seen a given footprint before, you won't know what animal it is, only what general characteristics it has (weight, etc.)

Re:Source Code? (0)

Anonymous Coward | more than 2 years ago | (#39404291)

It's basically like identifying an animal by footprint. Once you know a deer leaves a certain kind of footprint, you can identify more deer by examining footprints. But you can't identify an unknown animal that way: if you haven't seen a given footprint before, you won't know what animal it is, only what general characteristics it has (weight, etc.)

so bigfoot is real... that's what you're trying to say?

Re:Source Code? (2)

plover (150551) | more than 2 years ago | (#39410585)

so bigfoot is real... that's what you're trying to say?

bigfoot is indeed real, unless declared integer.

Re:Source Code? (1)

CSMoran (1577071) | more than 2 years ago | (#39411867)

I use IMPLICIT NONE, you insensitive clod.

Re:Source Code? (1)

plover (150551) | more than 2 years ago | (#39412541)

Thank you for the laugh I desperately needed this morning!

Re:Source Code? (1)

X0563511 (793323) | more than 2 years ago | (#39405681)

You're talking about the ABI [wikipedia.org] between objects, correct?

Microsoft's Big Chance (4, Funny)

JoeCommodore (567479) | more than 2 years ago | (#39403913)

A well publicized article featuring Microsoft Development products of all things, I think they should use that PR in their Microsoft Visual Studio Ads...

Re:Microsoft's Big Chance (1)

X0563511 (793323) | more than 2 years ago | (#39405701)

Code it like you own it!

Re:Microsoft's Big Chance (0)

Anonymous Coward | more than 2 years ago | (#39407159)

They could even point out that they too have the -Os!

"Custom object-oriented extension to C" (0)

Anonymous Coward | more than 2 years ago | (#39403941)

In other words, macros from hell, invoking other macros and building function tables and so forth (MFC was a representative example).
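For anyone who has not seen the style, a minimal sketch of what "OO C" tends to mean in practice: a struct that carries a pointer to a table of function pointers, a constructor-like function that wires it up, and a macro to hide the indirect call. Every name here (Shape, ShapeVtbl, rect_init, SHAPE_CALL and so on) is made up for illustration; none of it is taken from the Duqu framework or from any particular OO C library.

```c
/* Hypothetical names throughout; this illustrates the general
   "objects in plain C" technique, not code from Duqu or a real framework. */

struct Shape;                                /* forward declaration */

/* Method table: one function pointer per "virtual method". */
typedef struct ShapeVtbl {
    int         (*area)(const struct Shape *self);
    const char *(*name)(const struct Shape *self);
} ShapeVtbl;

/* Base "class": each instance carries a pointer to its method table. */
typedef struct Shape {
    const ShapeVtbl *vtbl;
} Shape;

/* Derived "class": the base must be the first member so that a Rect*
   can be treated as a Shape*. */
typedef struct Rect {
    Shape base;
    int   w, h;
} Rect;

static int rect_area(const Shape *self)
{
    const Rect *r = (const Rect *)self;      /* valid: base is the first member */
    return r->w * r->h;
}

static const char *rect_name(const Shape *self)
{
    (void)self;                              /* unused */
    return "rect";
}

static const ShapeVtbl rect_vtbl = { rect_area, rect_name };

/* "Constructor": stores the table pointer, then initializes the fields. */
void rect_init(Rect *r, int w, int h)
{
    r->base.vtbl = &rect_vtbl;
    r->w = w;
    r->h = h;
}

/* Macro sugar for a virtual call, the sort of thing such frameworks
   bury under their macros. */
#define SHAPE_CALL(obj, method) ((obj)->vtbl->method(obj))

int         shape_area(const Shape *s) { return SHAPE_CALL(s, area); }
const char *shape_name(const Shape *s) { return SHAPE_CALL(s, name); }
```

With this in place, calling rect_init(&r, 3, 4) and then shape_area(&r.base) dispatches through the table and returns 12. In compiled output, a constructor like this typically shows up as a store of a table address into the first field of the object, which is the kind of regular pattern a reverse engineer can latch onto.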

Re:"Custom object-oriented extension to C" (1)

tomhath (637240) | more than 2 years ago | (#39404537)

More like a lightweight open source framework [sourceforge.net] .

Re:"Custom object-oriented extension to C" (0)

Anonymous Coward | more than 2 years ago | (#39404891)

It's far from clear from TFA that your link (ooc from "old-fashioned") is the framework that the attackers were using.

Re:"Custom object-oriented extension to C" (1)

tomhath (637240) | more than 2 years ago | (#39406005)

Agreed, but the article does seem to indicate ooc is an existing, lightweight object oriented extension to C that the programmers compiled themselves. I didn't get the impression they think the programmers threw something together on their own.

Re:"Custom object-oriented extension to C" (1)

plover (150551) | more than 2 years ago | (#39426149)

I don't think the article implied that this framework was pre-existing or not. They don't know. It could have been a custom written framework to help the author(s) specifically build this virus. Rudimentary OO C frameworks are easy enough to write from scratch.

What they saw was a regular pattern common across many different functions in the binary, which suggested that the same source was used to create it throughout the code. And if the source code is that regular, either the author is a serial copy-paster (which implies bad coder, which the rest of the code provides evidence that he isn't), or the more likely answer is that macros were used to produce it.

They are probably hoping it's a particular OO C framework that's been used elsewhere, but not far and wide. They might be able to fingerprint the libraries that are in use around the world, and further narrow down the list of suspects. They might be able to compare this to different published software package fingerprints to see if any are a potential match. Maybe they'll find out it was downloaded from sourceforge, which leaves a short list of only a few thousand people or IP addresses who've ever downloaded it before (depending on what kinds of logs the servers and ISPs kept.)

The chances are really good that it won't lead straight to a specific Joe Ocaml of 123 Main Street, Fairview, Connecticut. The chances are high that it won't give them anything useful at all. But if there's even a 1% chance it might tell them exactly who wrote it? That's totally worth it to pursue the leads, especially since it's so little work to run that small bit of investigation.

I don't understand the fuss about the language (1)

Viol8 (599362) | more than 2 years ago | (#39403961)

If you can disassemble it then who cares whether it was written in OO C , C++ or Logo? I don't see why it mattered so much. Just follow the assembler.

Re:I don't understand the fuss about the language (0)

Anonymous Coward | more than 2 years ago | (#39404007)

A major part of reverse engineering code is understanding its runtime environment. Also, it's interesting to note the level of sophistication it implies: they are presumably rolling their own runtime environment instead of using COTS tools like usual.

Re:I don't understand the fuss about the language (3, Informative)

Brett Buck (811747) | more than 2 years ago | (#39404009)

They are trying to do the forensics. If you know the tools used, you have a much better idea where to look for the people who did it. It was almost certainly NOT a matter of determining what it was doing, they wanted to figure something out that would help them track it back to the source.

Re:I don't understand the fuss about the language (0)

Anonymous Coward | more than 2 years ago | (#39405465)

Forensics that lead to whom? Clever people identified that the mysterious extra code was OO C. That could be anyone. I think they hid their code well, from my reading of things. They made sure to use professional mainstream tools, and the only extra sauce they used (OO C) could be, and probably is, used by any number of groups. From my point of view as a fairly novice programmer, I don't care too much about the forensics. It seems to me that MSVC 2008 with an open source OO C is the standard in programming these days lol. So that's the language to learn, but I won't learn it as that would mean I'm a small fish in a big pond. I'll stick to one of the lesser novice languages I think.

infective c? (1)

Anonymous Coward | more than 2 years ago | (#39403969)

Objective C but then for the MS platform?

Re:infective c? (2)

MadKeithV (102058) | more than 2 years ago | (#39404177)

I'd call it "subjugative C".

You FAIL iRt!? (-1)

Anonymous Coward | more than 2 years ago | (#39404115)

FreeBSD's would take aboUt 2

Old-school or new-school? (4, Insightful)

j33px0r (722130) | more than 2 years ago | (#39404405)

FTFA:

Why did the authors of Duqu use OO C? While there is no easy explanation why OO C was used instead of C++ for the Duqu Framework, Kaspersky experts say there are two reasonable causes that support its use [More control over the code & Extreme portability]. These two reasons indicate that the code was written by a team of experienced ‘old-school’ developers

Why OO C? Because it worked, because they knew how to use it, because they knew it would throw Kaspersky for a loop, because they thought it was cool. There are many, many reasons and they do not all have to be logical.

Kaspersky experts might want to consider that the programming wheel of life may have turned and that what was once old-school is now new-school. Who's to say that the underestimated script-kiddies cannot grow up to be formidable adults with a whole new bag of tricks?

Re:Old-school or new-school? (-1)

Anonymous Coward | more than 2 years ago | (#39404913)

Why OO C?

There is a much more logical reason that you are missing. After watching Independence Day, you see that you can pass malicious code into anything using a Mac. Having chosen a Mac, Objective-C is the logical choice. That way with less development effort, you can transmit malware to anything from an Iranian nuclear facility to an alien spacecraft and save development dollars too !

Re:Old-school or new-school? (2, Informative)

Anonymous Coward | more than 2 years ago | (#39405725)

OO C is not Objective-C.

Re:Old-school or new-school? (1)

plover (150551) | more than 2 years ago | (#39426297)

Occam's razor. The simplest answer is usually correct. That drives an awful lot of investigations.

Despite the twists and turns that you see in TV crime dramas, most real world bad guys aren't quite that clever at hiding all of their tracks. Sure, they're going to hide the obvious ones they know they're leaving. They will use hacked proxies to deliver their code. They'll use sophisticated command and control networks to make sure nobody can track them back to the actual box making the inputs. They'll have a l33t Fast DNS setup to avoid takedowns.

But this level of sophistication in some aspects of tracking prevention doesn't mean they covered every other attribute perfectly. The author might be reading this on Slashdot right now and saying "oh, crap, I never thought they'd reverse engineer it to figure out I used my boss' old OO C library! Dammit!"

Or not. This could be a total red herring, bait cleverly dragged across the trail to distract the investigators further. But as an investigator, you have to start somewhere, and this is as good as anyplace. They'll probably never find box zero, so there's not much else to go on at this stage.

What's the animal (4, Funny)

Ukab the Great (87152) | more than 2 years ago | (#39404737)

For O'Reilly's "Mastering Duqu"?

Re:What's the animal (4, Funny)

GodfatherofSoul (174979) | more than 2 years ago | (#39404957)

It's a picture of Palpatine holding onto his nutsack.

Re:What's the animal (-1)

Anonymous Coward | more than 2 years ago | (#39405627)

an eagle? it's the us govt. isn't it?

Re:What's the animal (1)

Megane (129182) | more than 2 years ago | (#39406949)

A rubber ducky.

I'm confused (1)

ILongForDarkness (1134931) | more than 2 years ago | (#39404881)

Why does this matter? If it is a compiled program it is just a bunch of instructions. If the OS lets the instructions run it doesn't much matter what compiler/language was used other than how efficiently it will do the crap it is told to.

Re:I'm confused (1)

Anonymous Coward | more than 2 years ago | (#39405083)

Because it gives hints about the _people_ that wrote it.

Re:I'm confused (1)

ILongForDarkness (1134931) | more than 2 years ago | (#39406235)

Hmm, that they had access to a Windows PC and knew how to code in C? That is pretty much every software developer.

Re:I'm confused (0)

Anonymous Coward | more than 2 years ago | (#39405093)

Why does this matter? If it is a compiled program it is just a bunch of instructions. If the OS lets the instructions run it doesn't much matter what compiler/language was used other than how efficiently it will do the crap it is told to.

If you can correctly deduce its development environment, you've learned something about its author. Maybe you haven't learned enough to identify its author, but you've probably learned more than its author intended you to know.

Re:I'm confused (1)

xanthos (73578) | more than 2 years ago | (#39405417)

As a couple of others have stated, it is important in identifying who may be behind the code. "Authors" in certain parts of the world tend to use a certain set of tools for financial fraud, another group uses a different set of tools for industrial espionage, yet others may use either set of tools to mimic these groups while they do plain old espionage for a nation state.

As a defender, you probably are more worried about one group than the others. A small startup data mining firm is probably more worried about somebody stealing their IP and less about giving away any government secrets.

Re:I'm confused (0)

Anonymous Coward | more than 2 years ago | (#39411239)

Seeing such a question here makes me sad. :(

Re:I'm confused (1)

jonwil (467024) | more than 2 years ago | (#39435845)

If you know the libraries that a program is using, it can make it easier to reverse engineer.

Re:I'm confused (1)

ILongForDarkness (1134931) | more than 2 years ago | (#39437245)

Ah, good point. I suppose too that if the skills are rare enough, just knowing the programming language might narrow it down enough to get to the few likely culprits.

Or the most obvious alternative... (0)

FlyingGuy (989135) | more than 2 years ago | (#39405363)

The code was written by someone with some very serious Assembler skills.

ANYTHING that can be written in any higher level language can be written in Assembler and that is an indisputable fact.

Re:Or the most obvious alternative... (-1)

Anonymous Coward | more than 2 years ago | (#39406065)

The code was written by someone with some very serious Assembler skills.

ANYTHING that can be written in any higher level language can be written in Assembler and that is an indisputable fact.

The only valuable comment here. Trust me, I know what I am talking about.

Re:Or the most obvious alternative... (2)

b4dc0d3r (1268512) | more than 2 years ago | (#39406309)

It was too consistent to be compiler intrinsics, but not consistent enough to be straight assembly. That's the impression I got from the original blog post.

No question it would have been possible, but given the rest of the code was compiled in MSVC it made sense that some sort of macro, framework, toolkit, or something was in between the source and the output.

height of dipshittery (0)

Anonymous Coward | more than 2 years ago | (#39405585)

The bizarre claims by Kaspersky about how Duqu's authors had invented their own language were patently idiotic, and bring a lot of doubt into their research process. Sure, everyone makes weird mistakes and gets rat-holed every now and then. But this... the claim was pretty dumb on its face (Occam, anyone?), AND it got all the way through their process to release without reasonable peer review, AND they did it publicly (the audience at CanSecWest kind of giggled when they presented it).

When the Wright brothers demonstrated a person flying in a winged machine, most stunned onlookers surely asked how the machine worked, but Kaspersky's ancestors must have exclaimed that the Wrights had invented the new science of genetics and engineered a weightless human being.

Goofballs.

Re:height of dipshittery (1)

Anonymous Coward | more than 2 years ago | (#39405903)

The bizarre claims by Kaspersky about how Duqu's authors had invented their own language were patently idiotic

It could well be that the Duqu authors wrote a macro language framework on top of C, for tighter code generation, greater control, the ability to easily add trace statements and/or experimental code during development runs, make it more difficult to trace to a specific commercial compiler, etc.
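
To make the idea concrete, here is a minimal sketch of what a macro layer on top of C could look like. It is not taken from Duqu or from any published analysis; `TRACE_BUILD`, `TRACE`, `METHOD`, and `Counter` are all hypothetical names made up for illustration.

```c
#include <stdio.h>

/* Hypothetical switch: define TRACE_BUILD only for development runs. */
#ifdef TRACE_BUILD
#define TRACE(msg) fprintf(stderr, "[trace] %s: %s\n", __func__, (msg))
#else
#define TRACE(msg) ((void)0)   /* expands to a no-op in release builds */
#endif

/* Hypothetical macro that stamps out a "method": a plain C function
 * named Class_method that takes an explicit self pointer. */
#define METHOD(ret, cls, name) ret cls##_##name(cls *self)

typedef struct {
    int count;
} Counter;

/* Expands to: int Counter_increment(Counter *self) */
METHOD(int, Counter, increment) {
    TRACE("increment called");
    return ++self->count;
}
```

Calling `Counter_increment(&c)` on a zero-initialized `Counter` bumps the count; with `TRACE_BUILD` undefined, the trace call leaves nothing behind in the compiled output, which is the kind of control the comment is describing.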

Re:height of dipshittery (0)

Anonymous Coward | more than 2 years ago | (#39407329)

Yes. But they didn't say that.

They asserted a technically infeasible and needlessly costly action had in fact taken place-- and then admitted they had little justification for making the claim aside from "dunno what it is, woudja have a look?" aimed at the crowd. It was silly.

Re:height of dipshittery (1)

Anonymous Coward | more than 2 years ago | (#39407421)

Anyone who uses C for a significant project writes their own framework on top of C. Sometimes those frameworks grow into gargantuan monstrosities, like glib, gtk, or libevent 2.0.

Smart programmers write minimalist frameworks that do simple, straight-forward things: simple object containers, simple event loops, etc, that can be reused without the cost of extra baggage. Extra baggage just gets in the way, because for highly specialized applications you will inevitably need to hack and refactor your libraries. A good library is one that can be hacked on, not one that tries to keep you from hacking on it by trying to do everything.
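
For readers who have not seen the style, a minimalist object framework in plain C can be as small as a struct whose first member points to a table of function pointers. The sketch below only illustrates that general technique; the names (`Shape`, `ShapeVtbl`, `Rect`, `Square`) are invented here and say nothing about how the Duqu framework is actually laid out.

```c
/* Hypothetical "base class": every object starts with a method table pointer. */
typedef struct Shape Shape;

typedef struct {
    int (*area)(const Shape *self);
} ShapeVtbl;

struct Shape {
    const ShapeVtbl *vtbl;
};

/* "Derived classes" embed the base as their first member, so a pointer
 * to the derived struct can be converted to a Shape pointer and back. */
typedef struct {
    Shape base;
    int w, h;
} Rect;

typedef struct {
    Shape base;
    int side;
} Square;

static int rect_area(const Shape *self) {
    const Rect *r = (const Rect *)self;
    return r->w * r->h;
}

static int square_area(const Shape *self) {
    const Square *s = (const Square *)self;
    return s->side * s->side;
}

static const ShapeVtbl RECT_VTBL = { rect_area };
static const ShapeVtbl SQUARE_VTBL = { square_area };

/* "Constructors" fill in the fields and attach the right method table. */
static void rect_init(Rect *r, int w, int h) {
    r->base.vtbl = &RECT_VTBL;
    r->w = w;
    r->h = h;
}

static void square_init(Square *s, int side) {
    s->base.vtbl = &SQUARE_VTBL;
    s->side = side;
}

/* Dynamic dispatch: the caller only needs a Shape pointer. */
static int shape_area(const Shape *s) {
    return s->vtbl->area(s);
}
```

Code that only holds a `Shape *` can call `shape_area` without knowing the concrete type, and there is nothing here beyond a few structs and pointers to hack on or refactor later.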

Does this mean (1)

SnarfQuest (469614) | more than 2 years ago | (#39405901)

Does this mean that Linux users need to run it under Wine? That would be inconvenient.

it's still binary code? (-1)

Anonymous Coward | more than 2 years ago | (#39405931)

why on the earth you would care what language it was written on?

It was smart to use a different language.. (2)

gorrepati (866378) | more than 2 years ago | (#39407449)

Smarter than you think. I remember reading somewhere that US radio operators in WWII used a Native American language to communicate with each other. No amount of analysis will give you any insight if the other party is careful not to leave any trails. To translate one language into another mechanically requires deep knowledge of both languages.

If you rolled your own language with its own grammar, you can be secure in the fact that *even* deep analysis will not yield any clues, at least not with current technology. I am not sure such a thing can even be done by a Turing machine. People with better knowledge of it are welcome to correct me if I am wrong. All the current technology is concentrated on modifying bits for security, but if you do it at a sufficiently high level (aka another language) there is no way to crack it.

This case, however, has an Achilles heel: you can still modify the binary and see what the results would be by running it. After a sufficient number of trials, you should be able to decode it.

Re:It was smart to use a different language.. (1)

Arrepiadd (688829) | more than 2 years ago | (#39411537)

Here go two links just in case you are interested in knowing more about this Navajo code talk [wikipedia.org] and the men [navajocodetalkers.org] that helped "secure" communications in the Pacific.

Re:It was smart to use a different language.. (1)

gorrepati (866378) | more than 2 years ago | (#39418553)

Awesome. It was more extensively used than I realized.

old news (0)

Anonymous Coward | more than 2 years ago | (#39407789)

Duqu just uses the Vala language to compile to C....
