Reverse Engineering Large Software Projects?

Cliff posted about 9 years ago | from the where-would-you-start dept.

Programming 104

stalebread queries: "Me and a team of other students have been tasked with reverse engineering a massive C/C++ (mostly C) computer game of about half a million lines. We have most of the source, but no clue of how to approach a task of this magnitude. Anyone have suggestions of programs, or techniques we could use to understand the structure of the game?"

Legal? (1, Interesting)

TheCarlMau (850437) | about 9 years ago | (#13761263)

Just curious... is this something legal? For example, isn't it illegal to reverse engineer Windows?

Re:Legal? (2, Interesting)

redelm (54142) | about 9 years ago | (#13761317)

I would presume that the code came from a liquidation/auction/takeover and the human capital that produced it is no longer available. First, I would try to hire one of the original sw architects to do some consulting. Who knows? They might have some email files that could be considered "part of the software".

Re:Legal? (3, Insightful)

Macphisto (62181) | about 9 years ago | (#13762184)

"Human capital"? What are you, an alien overlord of some sort?

Re:Legal? (2, Informative)

redelm (54142) | about 9 years ago | (#13762346)

Alien overlord? I love it!

"Human capital" is a rather common economics term to refer to those skills and knowledge that enable an employee to produce the desired works. Use the wiki, Luke. In this case, it is the experience and serenity which makes the Tao Master of programming worth several novice salaries :)

IMPORTANT!!! MOD THIS UP!!! (-1, Troll)

Anonymous Coward | about 9 years ago | (#13762951)

Me and a team of other students have been tasked

"I and a team of other students" or "I, along with a team of other students,".

Anyone have suggestions

"Does anyone have suggestions".

To the moderators: It is absolutely VITAL that you mod this post up, so that the editor and submitter can see it. (This is why I have posted it as close to the top as I can.) Now I know that, in the past, we have had our differences, and you have seen fit to mod my posts down for some inexplicable reason. Please note that this post is ON-TOPIC, because it refers SPECIFICALLY to the article summary. However, if you still feel that you want to mod this post down, rather than up, then please first consider the following:

If you love your country and don't wish to see Western civilization decline, you MUST mod this post up. Remember what President John F. Kennedy said when he corrected Nixon's grammar during the first televised Presidential debates in 1960: "I believe that this nation should commit itself to achieving the goal, before this decade is out, of 100% correct spelling and grammar among all of its citizens. [...] We choose to correct spelling and do the other things, not because they are easy, but because they are hard." And, during his inauguration: "Ask not what spell-checking your country can do for you; ask what spell-checking you can do for your country."

Are you a patriot? Do you love your country and everything for which it stands, one nation, indivisible, with liberty and justice for all? If you do, then MOD THIS POST UP, so that the editors and article submitter can learn from it, so that people everywhere can learn from it, so that those countless heroes of the past, who gave the last full measure of their devotion, shall not have died in vain. O say, does that Star Spangled Banner yet wave o'er the Land of the Free and the Home of the Brave? If you mod this post up, then the answer is "YES!"

Re:Legal? (1)

Fbelch (9658) | about 9 years ago | (#13761356)

He said... reverse a 'computer game' not something 'that makes a computer lame' :)

Re:Legal? (2, Informative)

jericho4.0 (565125) | about 9 years ago | (#13761462)

Why, yes! It is legal. In fact, the right to reverse engineer a piece of software or hardware for interoperability is protected in the US, IIRC. Hence Intel clones, PC clones, Samba, etc.

But the article poster has access to the source code, something not usually associated with 'reverse engineering'. Products are still protected by patents, copyright and trademarks, and writing Samba (for example) after seeing Microsoft's code would open one up to legal woes.

IANAL, or USian.

Flowcharting might help (3, Informative)

eric2hill (33085) | about 9 years ago | (#13761268)

Since you're probably proficient with C++, try a flowcharting [fatesoft.com] solution [aivosto.com] to give you a high-level map of all the classes. Maybe that will help.

best solution? (1)

RMH101 (636144) | about 9 years ago | (#13763428)

if you're trying to rev engineer code from a company that's gone bust - go hire their lead programmer for 6 months, and task him with documenting it. seriously.

Re:best solution? (1)

Da VinMan (7669) | about 9 years ago | (#13768183)

Yeah, that'll work. Because talented leads love to write documentation for 6 months at a time. No, really they do...

Re:best solution? (1)

RMH101 (636144) | about 9 years ago | (#13768805)

"talented leads"? oh, you precious little prima donna. are they keeping you in lattes and nerf guns?
pay big rates to skilled people if it's worth it to your business. pretty simple, really.

Re:Flowcharting might help (1)

computational super (740265) | about 9 years ago | (#13763750)

C-Scope [sourceforge.net] is a cool, free, class browsing tool that can make vi feel like a full-featured IDE. If you're an OS kind of person, take a look at this before you jump into the commercial tools.

Re:Flowcharting might help (0, Flamebait)

Hell O'World (88678) | about 9 years ago | (#13763976)

What, are you trying to imply that vi is not a full featured IDE?

Re:Flowcharting might help (1)

triso (67491) | about 9 years ago | (#13768994)

Sorry, but those types of products are bloody useless. The output is too complex when broken down into thousands of functions.

Perhaps the C++ products would be better since a diagram would be acceptable if broken down at the class level but the C portion of the program is useless if broken down at the function level.

Also useless is output which is a simple listing on a printer. The output must be in UML, or something similar, to transmit useful information to the reader.

oh boy (3, Informative)

QuantumG (50515) | about 9 years ago | (#13761271)

I presume you mean reverse engineering in the program understanding sense. In which case the way to go about it is to sit down and read the source code, taking notes as you go. You should then set yourself some maintenance tasks - modifying the source code is the best way to find out if you understand it or not.

I believe the instructor is assigning... (2, Informative)

Burz (138833) | about 9 years ago | (#13761378)

...a maintenance task, not a coding task. S/he is probably looking for a UML model, as I implied elsewhere in the thread. IBM Rational, Gentleware, Borland and some FOSS projects have software just for this sort of thing: Modeling all of the classes, structs, member variables and functions along with displayable relationships (using arrows, lines, and nesting).

What's more, some of these tools can be used to modify programs within the model, and then update the source code (forward-engineering). They can also create tables/databases from your persistent entity classes, represented with their own DBMS variety of UML icons...and can even update the actual database (sometimes directly, other times with DDL scripts) and track/display relationships between tables, and with the classes that use them.

UML tools will seldom be able to reverse-engineer information about procedural code (declarations, conditionals, etc.), although this can usually be modeled by hand when such detail is necessary.

Re:I believe the instructor is assigning... (2, Funny)

QuantumG (50515) | about 9 years ago | (#13761387)

Yep, lots of luck finding a single one of these tools that works on C code. Although making pretty pictures can certainly be a good way to get an overview of the software, and maybe students need that kind of assistance. Personally I think something like C-Scope is more than enough.

Re:I believe the instructor is assigning... (1)

Burz (138833) | about 9 years ago | (#13761532)

In theory, C should only be a problem if it was coded without regard for OOP. And even then, structs will likely abound... you can pull those into the model with the built-in reverse engineering and use that as the nucleus for modeling the rest of the program either by hand or with the help of scripts. For instance, in Rose you could write a script to represent .c files and functions as stereotyped components and classes...and maybe even show what sort of data gets passed between functions.

UML assumes OOP, but the UML tools are not tied to that concept hard-and-fast: You can reappropriate the symbols as you see fit as long as you don't have to generate code from them (and even then it's doable).

Re:I believe the instructor is assigning... (1)

Doctor Memory (6336) | about 9 years ago | (#13766786)

In theory, C should only be a problem if it was coded without regard for OOP

Yes, and we know that this practically never happens. Especially with performance-critical software like games.
<eyeroll/>

Re:oh boy (1)

AvitarX (172628) | about 9 years ago | (#13765549)

Don't know the exact quote, but Sun Tzu has some wisdom that applies.

(paraphrased)"commanding many is the same as commanding few, it is generally a matter of organization".

So I would read some general stuff on how to do this (Practical C Programming has a short chapter for example, but you probably want a book all about it). I would then do what they do with their few to few thousand line sample meticulously to the whole thing as the parent post suggests.

You need to flow chart the whole thing with notes, then it can be subdivided much more easily as someone working on a block of code will know its context, but everyone will need a fairly good grasp of the flowchart to analyze a segment properly.

In the quote replace commanding with reconstructing meaning of code, I am not giving management advice. I think this would be obvious, but once I said Avril Lavigne was punk with a sarcasm tag and still people thought it was serious, so I take nothing for granted.

If you already have the source... (-1, Troll)

Anonymous Coward | about 9 years ago | (#13761273)

...why are you trying to reverse engineer? Why not read the source as is? More importantly, you are a raging faggott brown nigger trying to steal american jobs. If you have to ask slashdot, you are NOT QUALIFIED for the job. Rot in your shit stained country, idiot.

Re:If you already have the source... (0, Flamebait)

Profane Motherfucker (564659) | about 9 years ago | (#13761341)

I don't agree with anything in the above comment except this: "If you have to ask slashdot, you are NOT QUALIFIED for the job."

Ditto on that. And we wonder what the fuck happened to higher education? Small rocks have been known to have more ingenuity than this.

Re:If you already have the source... (1, Interesting)

oopsdude (906146) | about 9 years ago | (#13761421)

If he already has the source, then this problem may be easy enough to make asking Slashdot unnecessary. However, there are instances in which asking Slashdot is necessary. If they didn't have most of the source, for example. Or, for example, in this article [slashdot.org] , where an IT guy was asked to make an infrastructure for over one million email accounts that must scale perfectly and have 99.9% uptime. Show me a university that trains students for that.

Reverse Engineer or Refactor/Port? (4, Interesting)

linuxtelephony (141049) | about 9 years ago | (#13761283)

It sounds like you are wanting to refactor the code, or port it to another platform. If you are missing some of the code, then you'll have to reverse engineer that portion of it.

As for how to approach it - I think it depends on the size of your team, and what goals you set for the effort. Are you just wanting to learn? Or do you want to improve performance? Or make it work on another platform? What are the goals for this project?

Once you know those details, they might give you an idea where to begin.

Re:Reverse Engineer or Refactor/Port? (5, Informative)

QuantumG (50515) | about 9 years ago | (#13761377)

Yeah, the "most of the source code" part is a bit scary. If they really are talking about reverse engineering from executables they are in for a hell of a time. The state of the art is a project I work on now and then, Boomerang [sourceforge.net] , and it isn't for the faint of heart. I've been hearing for years about people who are working on decompilation tools that are integrated into IDA Pro [datarescue.com] but I've yet to see it. The time where you can enter a binary, press a button and get back compilable, maintainable source code is still a long long way off. But that's good, friends of mine do commercial decompilation work.

Re:Reverse Engineer or Refactor/Port? (1)

undef24 (159451) | about 9 years ago | (#13762069)

Pretty impressive IDA integration here: http://pedram.redhive.com/research/process_stalking/ [redhive.com]

Re:Reverse Engineer or Refactor/Port? (1)

QuantumG (50515) | about 9 years ago | (#13762298)

There are lots of IDA Pro plugins.. there just ain't any that do decompilation. Which is what I said.

It could help... (2, Insightful)

itistoday (602304) | about 9 years ago | (#13761292)

To understand how games are made in the first place. What kind of a game is it? Is it a single-player game, or a multiplayer game? If it's multiplayer you'll have to watch out for code designed to keep the game logic at a fixed rate; all other code will be built on top of that. Single-player games on the other hand don't have to worry about all the intricacies of keeping the various game clients in sync.

So it really depends on the kind of game it is. Since I'm assuming you know this, I would suggest trying to first think how you would write the game yourself, and then see if you find any similarities between your ideas for the engine structure and the game's.

Re:It could help... (1)

UsualDosage (922364) | about 9 years ago | (#13774873)

Sounds like MUD/MUSH code to me. I can't think of many games with 500K lines of source code written exclusively in C/C++ that aren't MUDs. If that's the case, itistoday is right in that MUD code is built around a game timer (for fixed rate logic, MUDs use 'ticks', generally firing off around once every 30 seconds), and it's also built around clever use of sockets to allow multiple realtime connections, which is generally one of the harder things to accomplish (read, was). If I wanted to get to the nitty gritty of game code, I'd learn all about sockets first, game timing secondly, and then read into general game mechanics (balance, theme, style, story).
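When hunting for the fixed-rate logic mentioned above, it helps to know what such a timer tends to look like. The following is only a minimal sketch, not code from any particular game or MUD: the names `TickClock` and `run_frame` are hypothetical, and the 30-second interval is just the figure quoted in the comment. The pattern to look for is an accumulator that converts elapsed wall-clock time into a whole number of ticks.

```python
class TickClock:
    """Accumulates elapsed wall-clock time and releases whole ticks."""

    def __init__(self, interval):
        self.interval = interval      # seconds per tick (hypothetical value)
        self.accumulated = 0.0        # time not yet converted into ticks

    def advance(self, elapsed):
        """Add elapsed seconds and return how many ticks are now due."""
        self.accumulated += elapsed
        due = int(self.accumulated // self.interval)
        self.accumulated -= due * self.interval
        return due


def run_frame(clock, elapsed, update):
    """Call update() once per due tick, however long the frame took."""
    fired = clock.advance(elapsed)
    for _ in range(fired):
        update()
    return fired
```

In a real code base the equivalent of `update()` is usually the entry point to most of the game logic, so finding the loop that calls it is a good early landmark.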

A UML reverse-engineering tool (2, Informative)

Burz (138833) | about 9 years ago | (#13761296)

One like Rational Rose. It can create iconographic models of programs from source code.

Other UML tools exist, like Argo and Umbrello, but I'm not sure if they reverse engineer.

Re:A UML reverse-engineering tool (1)

superpulpsicle (533373) | about 9 years ago | (#13761790)

Rational Rose is a good suggestion; unfortunately it's just not something students will be able to afford.

I feel bad for students nowadays to have to deal with these gigantic assignments in schools who never provide enough resources. But that's a different story altogether.

Re:A UML reverse-engineering tool (1)

Iamthefallen (523816) | about 9 years ago | (#13765936)

I feel bad for students nowadays to have to deal with these gigantic assignments in schools who never provide enough resources.

Isn't that exactly what they'll do when they get out of school as well?

Re:A UML reverse-engineering tool (0)

Anonymous Coward | about 9 years ago | (#13762978)

I'll second that - go for a UML reverse engineering tool that can turn your C++ into UML class diagrams.

I've used IBM/Rational Rose and Enterprise Architect to do this and it works well. Rose is way too expensive for students unless your department has licenses.

I would go for Enterprise Architect from Sparx Systems
(http://www.sparxsystems.com.au/ [sparxsystems.com.au] ).

Have a look at http://sparxsystems.com.au/products/academic_pricing.html [sparxsystems.com.au]
which lists the academic pricing. You will be after the Professional edition which does the reverse engineering.
For what it can do, the academic edition is a bargain.

So what will a tool like this give you? A list of all the packages/directories/sub-packages/modules in your system.
A UML class diagram for each package showing all classes and relationships between classes, including aggregation and inheritance relationships. Each class will have the name, attributes/types and all operations/methods with their signatures.

You can then use Enterprise Architect to further document every single class/relationship/attribute/method.

Finally, from the reverse engineered model you can automatically produce API documentation for the entire system, either in an RTF document or in HTML somewhat similar to doxygen.

Your team can then use the class diagrams and the API documentation to help you understand the design of the system you are studying. Basically you can extract the detailed design document for the system using this technique. Having this information in addition to the source code is invaluable. Just having the source code, for a sufficiently large system is a nightmare.

Not Rational Rose unless... (1)

Doctor Memory (6336) | about 9 years ago | (#13766687)

...it's improved mightily since I last used it. Granted, it was reverse-engineering some Java code, but it wouldn't do squat unless it could compile the whole thing (I assume it created a symbol table/parse tree and based its analysis on that). Which made it useless for documenting portions of a product, or one that was in flux and not in a cleanly-compilable state. Sure, you could stub out everything, but if you're talking an entire package that isn't available, then it's more work than it's worth.

Personally, I'd try to break it down functionally first. Set off the graphics, the user interaction, the game core, any AI. Then break them down into their respective functions, and so on and so forth. Keep an eye out for "extern struct" declarations (esp. things like "extern struct _common" or "extern struct shared_vars", these will be communication vectors and be a general pain source).
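One rough way to find those communication vectors is to scan every source file for `extern struct` declarations and see which struct tags turn up in the most files. This is a minimal sketch under stated assumptions: `find_shared_structs` and `rank_by_spread` are hypothetical helper names, the input is a dict of file name to file text that you have already read from disk, and the regular expression is deliberately naive (it will also match declarations inside comments).

```python
import re
from collections import defaultdict

# Naive pattern: matches "extern struct <tag>" anywhere, comments included.
EXTERN_STRUCT = re.compile(r"\bextern\s+struct\s+(\w+)")


def find_shared_structs(sources):
    """Map each struct tag declared 'extern struct <tag>' to the files using it.

    sources: dict of file name -> file text.
    """
    users = defaultdict(set)
    for name, text in sources.items():
        for match in EXTERN_STRUCT.finditer(text):
            users[match.group(1)].add(name)
    return users


def rank_by_spread(users):
    """Struct tags sorted by how many files reference them, widest first."""
    return sorted(users, key=lambda tag: (-len(users[tag]), tag))
```

The tags at the top of the ranking are the shared state most likely to couple otherwise separate subsystems, so they are worth documenting before anything else.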

Source navigator (3, Informative)

Mr2cents (323101) | about 9 years ago | (#13761297)

http://sourcenav.sourceforge.net/ [sourceforge.net]

I like to use it when browsing through code, you can search and browse as much as you like. It will still take an effort though.

Re:Source navigator (1)

hughk (248126) | about 9 years ago | (#13763386)

I used it quite nicely under Windows (Cygwin, I believe) to hack some AIX code (a big QT/Motif app). The AIX version was so old that Source Nav wouldn't compile there, but it ran on cygwin and I used to go to the file-shares where the AIX source code was and it ran quite nicely and got me out of a lot of trouble.

Profiling! (2, Informative)

redelm (54142) | about 9 years ago | (#13761302)

First run the code under a profiler. This will give you some idea of where it spends its time. Running under a first-class debugger (SoftICE?) will also help because you can haul off stack-traces and see what's been called from where.

Re:Profiling! (1)

Vladimir (98464) | about 9 years ago | (#13770321)

I agree it's very useful. Callgrind && kcachegrind usually give a lot of insight into the code. I wonder if callgrind can be hacked to produce nice seq. UML diagrams (it knows when objects are created and what messages are sent -- should be a very nice addition to the analysis).

lots of mountain dew.... (3, Funny)

warpSpeed (67927) | about 9 years ago | (#13761322)

Oh, yeah, and hohos! Never underestimate the power of the hoho.

Me (-1, Offtopic)

Anonymous Coward | about 9 years ago | (#13761332)

"Me and a team of other students have been tasked with reverse engineering..."

though none but an infant or Tarzan would say

"Me have been tasked with reverse engineering..."

Tarzan and Tanto School of Communication Arts (0, Offtopic)

teknomage1 (854522) | about 9 years ago | (#13762635)

You, my friend, obviously haven't attended the World Famous Tarzan and Tanto School of Communication Arts.

  • Tired of wasting time and money on articles such as "a" and "the" ?
  • Worried about confusing listeners by switching between personal pronouns?
  • Think noun-verb agreement is inefficient and redundant?

Come on Down! TaT School of Comm. Arts am accepting applications now!

Re:Tarzan and Tanto School of Communication Arts (0)

Anonymous Coward | about 9 years ago | (#13765652)

Don't forget, TaT School of Comm. Arts offer graduation level course proctored by Professor Frankenstein.

  • Course 501 - "short good... conjunction bad!"
  • Course 502 - "verb tense... bad."
  • Course 503 - "fire bad!"
  • Course 504 - "fire BAD!"


Your modern ways confused and frightened me... until I went to the TaT School of Communication Arts! - Unfrozen Caveman Lawyer

Re:Me (0)

Anonymous Coward | about 9 years ago | (#13762757)

Being more precise, for non-native English speakers:

This isn't a plural versus singular problem, as you might initially think. "Jack and Jill have a dog named Spot," cannot be changed to "Jack have a dog named Spot", because it's plural versus singular. That's pretty obvious, but it doesn't apply here.

The real problem is that 'me' is a direct object, and 'I' is a subject. You don't say "Me is going to the store" or "Me would like ice cream", unless you're trying to be funny. And you wouldn't say, "Give that ice cream to I!"

Better phrasing for the original submission would have been "A team of students and I have been assigned....."



-- Grammar Nazi

WTF? (1, Insightful)

jericho4.0 (565125) | about 9 years ago | (#13761404)

It looks like I get to be the first one to call you on this. WTF are you talking about!? You don't 'reverse engineer' something you have the code for. Maybe you mean 'port' or 'compile'.

If you wish to start getting a handle on a chunk of code, start by reading main() along with a profiler's output. Grep is your friend.

Re:WTF? (1)

NextGaurd (844638) | about 9 years ago | (#13761849)

Maybe they are trying to duplicate functionality but can't use the exact code.

Re:WTF? (3, Insightful)

TheVoice900 (467327) | about 9 years ago | (#13762222)

Just because you have the code doesn't mean you know how the system is assembled and how all the components work together. "Reverse Engineering" is pretty loosely defined, but if you take it literally, it's just that.. reversing the engineering process. From the description of the question, the poster is looking to take the finished product (the source for this game..) and move back up to the high level design phase. This means analyzing the module interconnections, class hierarchy, and that sort of stuff. It doesn't necessarily mean they want to "port" or "compile" it.

Re:WTF? (1)

Scorchio (177053) | about 9 years ago | (#13766887)

Reminded me of a porting project I had a few years back. We had pretty much all the source code from the game, but only binaries for the proprietary libraries it relied heavily on. In fact, it was only when we started pulling the thing apart that we found out just how much stuff was hidden away in the libraries. For example, a lot of the code seemed very disjointed - most sections didn't seem to be called from anywhere. We found that all entities (player, enemies, effects, menus, sounds, and so on) were set up as tasks with common data structure headers containing function pointers and linkage, and the whole thing got built up as a tree. Re-implementing the task manager from the library code was a huge step towards getting the port to run.
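The task arrangement described above explains why code can look as if nothing calls it: every call goes through a pointer stored in a shared header. The following is only an illustrative sketch, not the library from that project. The original was C with function pointers; here Python callables stand in for them, and the names `Task`, `run_tree`, and `record` are all hypothetical.

```python
class Task:
    """Common header shared by every entity: a handler plus tree linkage."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler    # stands in for the C function pointer
        self.children = []        # linkage to child tasks

    def add(self, child):
        """Attach a child task and return it so trees can be built inline."""
        self.children.append(child)
        return child


def run_tree(task, log):
    """Depth-first walk: run this task's handler, then its children."""
    task.handler(task, log)
    for child in task.children:
        run_tree(child, log)


def record(task, log):
    """Hypothetical handler that only records which task ran."""
    log.append(task.name)
```

When reading a code base built this way, grepping for direct calls finds nothing; searching for the places where handler fields are assigned shows which functions are live and what they hang off.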

We had the game running on the original platform, with a debugger and disassembler, so disassembling the libraries wasn't too bad - actually it was quite fun. The libs were compiled c code, which made it easier to pull out a c implementation, especially as we knew from the library stub name roughly what it was supposed to do and what data was going in.

I'm currently pulling apart an old 8-bit game to see how it worked; it's only 26kb, but it's all hand optimized and self-modifying 6502 code. Takes a little longer to figure out, but it's still fun.

Re:WTF? (1)

ifdef (450739) | about 9 years ago | (#13768680)

That's right. Since the code is only half a million lines, it should be pretty straightforward simply to read it starting at main().

IDA (1)

segra (867730) | about 9 years ago | (#13761493)

IDA Pro, best disassembler around ;) http://www.datarescue.com/ [datarescue.com]

Re:IDA (1)

ifdef (450739) | about 9 years ago | (#13768707)

Because it will be much easier to read disassembled binary than to try to figure out what half a million lines of C or C++ code does.

Reverse engineering (1, Informative)

Anonymous Coward | about 9 years ago | (#13761552)

>We have most of the source, but no clue of how to approach a task of this magnitude.

Reverse engineering is generally thought of as a "cleanroom" technique that involves having the binary and/or specification but not the source. If you have the source, then you're just reading/rewriting it (or perhaps just copying it and doing s/Old Name/Our Cool New Game That's Nothing Like Old Name/).

>Anyone have suggestions of programs, or techniques we could use to understand the structure of the game?

If it's mostly C, then you definitely need to get cscope [sourceforge.net] , but that won't tell you where to start reading because it cannot resolve calls to function pointers. To get that information, you might also try running gprof [gnu.org] .

Another neat trick is to compile the program and use nm [hmug.org] to help map out object file dependencies. You'll want to use perl or something to create a database of where the symbols are defined and where they're imported. This can help you establish which files are the meat and which ones are the potatoes.
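The symbol database described above can be sketched in Python just as well as in perl. This is a minimal sketch under stated assumptions: the input is text captured from `nm -A *.o`, where `-A` prefixes each line with the object file name and a colon; the files are plain object files rather than archives; symbol names are left mangled so they contain no colons; and `parse_nm` and `file_dependencies` are hypothetical helper names.

```python
from collections import defaultdict

# Symbol type letters treated as "defined here": text, data, bss,
# read-only data, and common symbols. Lowercase (local) types are ignored.
DEFINED_TYPES = {"T", "D", "B", "R", "C"}


def parse_nm(nm_output):
    """Return (defined, imported) from the text output of `nm -A`.

    defined:  symbol name -> object file that defines it
    imported: object file -> set of undefined symbols it references
    """
    defined = {}
    imported = defaultdict(set)
    for line in nm_output.splitlines():
        if ":" not in line:
            continue
        obj, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 2:
            continue
        # Undefined symbols have no address column, so index from the end.
        sym_type, name = fields[-2], fields[-1]
        if sym_type == "U":
            imported[obj].add(name)
        elif sym_type in DEFINED_TYPES:
            defined[name] = obj
    return defined, imported


def file_dependencies(defined, imported):
    """Map each object file to the other object files it pulls symbols from."""
    deps = defaultdict(set)
    for obj, names in imported.items():
        for name in names:
            provider = defined.get(name)
            if provider and provider != obj:
                deps[obj].add(provider)
    return dict(deps)
```

Symbols with no provider in the set (libc functions, for example) are simply skipped. Files that many others depend on are candidates for "the meat"; files that nothing depends on are usually entry points or dead code.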

Re:Reverse engineering (1)

ifdef (450739) | about 9 years ago | (#13768772)

... We have most of the source, but no clue of how to approach a task of this magnitude.

Reverse engineering is generally thought of as a "cleanroom" technique that involves having the binary and/or specification but not the source. If you have the source, then you're just reading/rewriting it


If you have the source but not the spec, and you're working on recreating the spec, then you're reverse engineering.

Re:Reverse engineering (1)

try_anything (880404) | about 9 years ago | (#13770688)

You can't call it "reading" and "writing." Too many computer people think that "reading" and "writing" are fast, straightforward, linear techniques that result in a disorganized mess - which for them is probably the truth. They would never understand that putting serious mental effort into creating a coherent, useful work could be called "writing," or that a systematic, intelligent effort to understand something could be called "reading."

Graphviz and GNU Global (1)

nullspace (11532) | about 9 years ago | (#13761589)

Graphviz [graphviz.org] and GNU GLOBAL [gnu.org] used in combination give a graphical and web-accessible view of a large, unknown software system. This will give your team a high-level view of the modules and how they interact. This will make it easier to discern the system design.
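If you already have a module-to-module dependency list from some other tool, getting it into Graphviz takes very little work, because the DOT input format is plain text. The following is a minimal sketch: `to_dot` is a hypothetical helper, `deps` is assumed to be a mapping from a module name to the set of modules it depends on, and names are assumed not to contain double quotes.

```python
def to_dot(deps, graph_name="modules"):
    """Render a dependency mapping as Graphviz DOT text.

    deps: dict of module name -> iterable of module names it depends on.
    Edges are emitted in sorted order so the output is reproducible.
    """
    lines = ["digraph %s {" % graph_name]
    for src in sorted(deps):
        for dst in sorted(deps[src]):
            lines.append('    "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)
```

Saving the result to a file such as `modules.dot` and running `dot -Tpng modules.dot -o modules.png` should produce a picture of the graph. With thousands of nodes the drawing becomes unreadable, so it usually pays to graph at the directory or module level rather than per function.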

I know where I would start (0, Flamebait)

heinousjay (683506) | about 9 years ago | (#13761594)

I would start by trying to get someone on the internet to do my homework for me.

Oh, wait, you've already come that far. Well, I'm stumped now. Good luck!

Use our tool :) (2, Interesting)

mr_tenor (310787) | about 9 years ago | (#13761669)

www.cse.unsw.edu.au/~drt

Not that I'm biased or anything. The idea is to monitor the program while it's running and use the call graph to generate sequence diagrams and such. Feedback and ideas for further research welcome :)

Headfuck (0)

Anonymous Coward | about 9 years ago | (#13766875)

OK, these should not be juxtaposed:

"It can't be fully enjoyed if you're worried about your girlfriend walking in on you fucking a jar of spaghetti."

immediately followed by:

"Use our tool :)"

What's your goal? (1)

AuMatar (183847) | about 9 years ago | (#13761751)

What's the goal of your project? To figure out how they do one particular thing? To figure out how the code works in a general way? To understand 1 subsystem?

For the first, I'd try and find the functions called around when it occurs, and use a debugger to step through what happens.

For the second, I'd study the interface files and use cscope. Figure out what is calling what, and see how it's interlinked.

For the third, you need to do the same as above on a local level- between files of the module. Then dig into individual parts.

Really, if you want better advice, you need to tell us what you want to do.

use the tools that are available (1)

blackcoot (124938) | about 9 years ago | (#13761763)

let's see:

if you have access to (ir)rational rose, running your code through that will probably speed up a lot of this process. otherwise, a combination of cccc and doxygen with the appropriate config files will give you about the best start you can hope for. hopefully, the code has reasonable documentation. if not, you're basically screwed --- you'll have to work out the use cases and reconstruct your software from there.

here you go (1)

pmike_bauer (763028) | about 9 years ago | (#13761828)

reenigne

Have most of the code? (2, Insightful)

mnmn (145599) | about 9 years ago | (#13761848)

It is not 'reverse engineering' if you already have the code. So you'll be reverse engineering the part that you don't have code for, and making sense out of the code that you do have.

Draw flow charts. Then assign a separate person for each module to make sense out of it. Next you'll do what you plan to do....

Make mods for it? Make a clone? Rewrite the code and sell the code? Recompile and port to Linux?

There are some automatic UML generators (2, Interesting)

rgbe (310525) | about 9 years ago | (#13761921)

There are some automatic UML generators that will give you an overview of the code, or parts of the code:
http://droogs.org/autodia/ [droogs.org]

What language is C/C++? (0)

kupci (642531) | about 9 years ago | (#13762003)

Check the C++ FAQ [parashift.com] then post to comp.lang.c++ [google.com] . I thought this in particular was good:
Do not refer to "C/C++." Some people get testy about that, and will (unfortunately!) ignore everything else you say just to correct you with something like, "There is no such language." It borders on pathetic, but you'll probably be okay if you say "C or C++" instead of "C/C++." Sigh.

Re:What language is C/C++? (0)

samjam (256347) | about 9 years ago | (#13762688)

Excellent comment; but, those picky people would also be wrong.

C/C++ more correctly but rarely known as C++/C is C++ written in the style of C, and is a wicked waste of Bjarne's time.

The guy behind xapian.org/xapian.com, Olly Betts, knows how to write C++ with proper and repeated use of the base classes, iterators and templates and, to be frank, his C++ looks almost like perl, and it is a delight to read.

C/C++ is just C with objects and falls so far short.

Sam

Re:What language is C/C++? (1)

ACORN_USER (902686) | about 9 years ago | (#13764263)

my $my;

for ($you=('like');open(LY,read('ing',$_,('perl')));){ do{};you();}

Re:What language is C/C++? (1)

samjam (256347) | about 9 years ago | (#13764849)

Nice one; ...but that's not real perl.

Strictly speaking: Global symbol "$you" requires explicit package name

Maybe the perl parser doesn't throw up any errors, but it's no more perl than a lot of C/C++ is C++

Sam

Re:What language is C/C++? (1)

ACORN_USER (902686) | about 9 years ago | (#13772145)

Ok, so it doesn't run with strict. -w will give you a bollocking. But:

$you == $::you == $main::you

So the package name is really implicit.

Agreed though. Most perl poetry is to perl and Larry what C/C++ is to C++ and Bjarne. The problem is that many write perl code like poetry - bad poetry.

Re:What language is C/C++? (1)

ivan256 (17499) | about 9 years ago | (#13768474)

When somebody says C/C++ it can easily be something less evil, like C++ that uses some libraries written in C, or C++ with a C API binding. You know, kind of like when somebody says their app is written in Java/C. Many projects do use multiple languages, and C and C++ are both very popular.

Cross-reference first: Doxygen is your friend (4, Informative)

treerex (743007) | about 9 years ago | (#13762264)

It sounds like you are unable to build the complete system and run it, since you're missing functionality. This removes the possibility of using runtime tracing tools.

The first thing I would do is run something like Doxygen [doxygen.org] over it to generate a cross-referenced description of the structures. It won't give you a global view of things, but it will give you a decent browsable view of the code itself. Another response mentioned GNU GLOBAL [gnu.org] which may work better for you. Yet another possibility is LXR [linux.no] , though it may not work as well in C++. Regardless, a nice thing about Doxygen is that, when used with GraphViz, you can get useful diagrams generated showing class containment and file inclusion graphs.

After you have that, get out your paper and pencil, and start drawing and manually tracing things. That's how I go about coming up to speed on new code I can't execute and step through. Eventually transfer that knowledge into a text file (or, nowadays, a wiki) so that others can benefit from it.

Re:Cross-reference first: Doxygen is your friend (1)

maetenloch (181291) | about 9 years ago | (#13762655)

Doxygen is a great help in understanding someone's code. If you have Graphviz installed, and all the options turned on, it will generate call graphs, header dependencies, and even UML diagrams of your classes and structures, along with an html-ized view of the code. And best of all it's free.

Re:Cross-reference first: Doxygen is your friend (1)

aero6dof (415422) | about 9 years ago | (#13768837)

I would second this suggestion. I would also suggest you turn on the options to fully cross-reference and include the source code, as well as generate the Graphviz diagrams (as suggested by another reply to the parent post).

Can you compile? (1)

liquidsilver10 (921963) | about 9 years ago | (#13762479)

From the description it sounds like you are trying to understand how the program works (as you have the source code), rather than 'reverse engineering', which in the usual sense assumes you don't have it.

So my suggestion is start by getting it compiled, up and running ;) You can then use the debugger to breakpoint the code and follow it through. You say you have most of the source code. Is the rest available as libraries to link to? Otherwise you could create 'fake' libraries just to get it compiling and running.
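A 'fake' library can be as dumb as a file of stubs that log the call and return success. Rough sketch below; the snd_* names and signatures are made up for illustration (the real ones would come from whatever headers or call sites you do have), and returning 0 for success is an assumption too.

```c
#include <stdio.h>

/* Hypothetical API of a missing sound library. These names are invented
   for the example; use the declarations from the headers you actually have. */
int snd_init(int sample_rate);
int snd_play(const char *clip_name);
void snd_shutdown(void);

/* Counts stub hits so you can see which missing functions the game calls. */
static int stub_calls = 0;

int snd_init(int sample_rate)
{
    stub_calls++;
    fprintf(stderr, "STUB snd_init(%d)\n", sample_rate);
    return 0; /* assumed convention: 0 means success */
}

int snd_play(const char *clip_name)
{
    stub_calls++;
    fprintf(stderr, "STUB snd_play(\"%s\")\n", clip_name ? clip_name : "(null)");
    return 0;
}

void snd_shutdown(void)
{
    stub_calls++;
    fprintf(stderr, "STUB snd_shutdown()\n");
}

/* Total number of stub calls made so far. */
int stub_call_count(void)
{
    return stub_calls;
}
```

Link this in place of the missing library and the log on stderr tells you which of the missing calls matter for getting the game to start.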

Probably best to start with a top down approach. Games invariably have a 'game loop', so locate it and start there. It'll be something along the lines of

InitStuff();

while (running)
{
    DoStuff();
}

CloseStuffDown();

Whilst games are many and varied in their complexity, they more often than not follow the same kind of pattern. During DoStuff() they will update the user input, update the world state (objects in the world), and finally render stuff on the screen. Once you've got a basic handle on which bits do what, try changing or commenting out things to see what effect that has.
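To make the three phases concrete, here is a toy version of that loop. Everything in it (the Game struct, the scripted quit, the function names) is invented for the example and won't match your code base; it only shows the shape to look for.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical game state; a real game holds far more than this. */
typedef struct {
    bool running;
    int frame;
    int quit_after;      /* stand-in for "player pressed quit" */
    int player_x;
    int frames_rendered;
} Game;

static void init_stuff(Game *g, int quit_after)
{
    g->running = true;
    g->frame = 0;
    g->quit_after = quit_after;
    g->player_x = 0;
    g->frames_rendered = 0;
}

/* Phase 1: poll input. Here the only "input" is a scripted quit. */
static void update_input(Game *g)
{
    if (g->frame >= g->quit_after)
        g->running = false;
}

/* Phase 2: advance the world state by one tick. */
static void update_world(Game *g)
{
    g->player_x += 1;
}

/* Phase 3: draw the current state. */
static void render(Game *g)
{
    printf("frame %d: player_x=%d\n", g->frame, g->player_x);
    g->frames_rendered++;
}

static void do_stuff(Game *g)
{
    update_input(g);
    if (!g->running)
        return;
    update_world(g);
    render(g);
    g->frame++;
}

/* Runs the whole loop and returns how many frames were drawn. */
int run_game(int quit_after)
{
    Game g;
    init_stuff(&g, quit_after);
    while (g.running)
        do_stuff(&g);
    return g.frames_rendered;
}
```

In the real thing, commenting out the equivalent of update_world() or render() for a run is a cheap way to confirm you've found the right functions.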

You could try mapping out all the source code with various software tools and the like, but the best way I've found to understand the code is just to dive in and have a play around. You'll probably find there are one or two files that do most of the interesting stuff anyway. If you have the change information from source control (e.g. if it's included at the top of the file) then look for the files with the most changes ;)

Also - grep is your friend ;)

HTH & Good luck

Re:Can you compile? (1)

liquidsilver10 (921963) | about 9 years ago | (#13762808)

Another thing that might help is asking yourself what are the important events in the game and finding where they are handled (where is the score updated, where is damage worked out, spawn/death events handled). Sometimes games use global event handlers/listeners so different parts of the code can hook into the important stuff. Find where these events are handled or callbacks registered and it should give you a quick insight into the more interesting parts of the code ;)
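For reference, a global event/listener setup often looks something like the sketch below, so these are the kinds of names to grep for. The event types, event_register() and event_fire() are all hypothetical here; your game will have its own names and probably a fancier table.

```c
#include <stddef.h>

/* Hypothetical event types for the example. */
typedef enum { EV_SCORE, EV_DAMAGE, EV_SPAWN, EV_DEATH, EV_COUNT } EventType;

typedef void (*EventHandler)(EventType type, int value, void *user);

#define MAX_LISTENERS 8

typedef struct {
    EventHandler fn;
    void *user;   /* opaque pointer handed back to the handler */
} Listener;

static Listener listeners[EV_COUNT][MAX_LISTENERS];
static int listener_count[EV_COUNT];

/* Registers a callback for one event type. Returns 0 on success, -1 on error. */
int event_register(EventType type, EventHandler fn, void *user)
{
    int t = (int)type;
    if (t < 0 || t >= EV_COUNT || fn == NULL)
        return -1;
    if (listener_count[t] >= MAX_LISTENERS)
        return -1;
    listeners[t][listener_count[t]].fn = fn;
    listeners[t][listener_count[t]].user = user;
    listener_count[t]++;
    return 0;
}

/* Calls every handler registered for the event; returns how many were called. */
int event_fire(EventType type, int value)
{
    int t = (int)type;
    int i;
    if (t < 0 || t >= EV_COUNT)
        return 0;
    for (i = 0; i < listener_count[t]; i++)
        listeners[t][i].fn(type, value, listeners[t][i].user);
    return listener_count[t];
}

/* Example handler: adds the event value to an int score passed as user data. */
void add_to_score(EventType type, int value, void *user)
{
    (void)type;
    *(int *)user += value;
}
```

Once you find the real equivalent of event_register(), a grep for its callers gives you a list of every place that cares about each event.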

Resources For the Code Janitor (4, Informative)

sohp (22984) | about 9 years ago | (#13762523)

I applaud your professor or thesis advisor or whoever for this real-world task. Here's a few resources which I wouldn't do without:
Code Reading: The Open Source Perspective [spinellis.gr]
Object-Oriented Reengineering Patterns [unibe.ch]
Reading Computer Programs: Instructor's Guide and Exercise [deimel.org]
Tips for Reading Code [c2.com]

Re:Resources For the Code Janitor (1)

jdowland (764773) | about 9 years ago | (#13763801)

I second the Spinellis book. Great stuff.

Holy sh8t (-1, Redundant)

Anonymous Coward | about 9 years ago | (#13762542)

Stuck with a big ball of mud? Yikes! I hates those. Outsource understanding it to India or Russia.

A Couple of suggestions (2, Informative)

jschmerge (228731) | about 9 years ago | (#13762925)

I've been through this sort of exercise several times in my career so far. 500k LOC is too much for a small team to get a handle on in any reasonable amount of time, so don't feel too helpless... Your professor is throwing you guys to the wolves and seeing what you are able to accomplish.

As for the actual suggestions, read on:

First, you'll need a tool to generate some form of cross reference for the entire codebase... I'd recommend Doxygen (hack the config file to generate the inheritance and call graphs). This will speed up your ability to read the code; being able to look up the interface to any class with a couple of clicks in a web browser will make life a lot less painful.

Next, find a text editor/IDE that's good at navigating large projects. This is a must. I personally do this with vi and ctags (although many people will tell you that there are better alternatives). Being able to look at more than one source file at a time is a good thing (tm).

These are the two primary tools that you'll need. There are some other pointers that I can give too:

  • Become intimately acquainted with the project's build system. The separation of components into separate directories/libraries/modules will give you a great deal of insight into the overall program's structure. You'll be able to accomplish a lot of this by watching a complete build of the project progress. The other place to look is in the project's Makefile(s). I'd bank on the fact that most code stuck in bottom level subdirectories is code that you'll be able to treat as black boxes.
  • As you become more familiar with the codebase, you'll find that you keep coming back to certain source files to look something up. Understand that these files are the ones that are probably the most important. It may help you to keep a web browser pointed to the crossreference material for these files, or memorize their content.
  • Don't get bogged down in understanding every bit of the source. Probably 90 percent of the code in the project is used to do things that you really don't have to ever care about. A good example of this is a project I recently inherited, comprising about 20,000 LOC. Four thousand lines of code in this project were there just to read XML config files into very simple data structures.
  • If you are having a difficult time figuring out how a piece of the code works, you may want to try running it in a debugger and stepping through the execution. I'm not a huge fan of doing this, but I know people who swear by it.
  • Import the source for the project into some form of version control system. This will afford you the luxury of being able to modify the code without fear of breaking anything too badly.
  • If you have access to the developer's source code repository, sometimes commit histories can give you a lot of insight into why things in the code are the way they are.

Anyway, good luck!

Valve (0)

Anonymous Coward | about 9 years ago | (#13763004)

Hmmm ... from reading your post, I suddenly had to remember this email address "help_valve@valve.com". As the name implies, they offer help to folks trying to understand the unfinished engine they released a few years ago.

Just in case this "massive" game has something to do with Half Life 2.

Scripts and Configuration files (1)

hayriye (609198) | about 9 years ago | (#13763372)

Are there any other files containing scripts in some language? Maybe the original coders wrote the high level logic in some scripting language.

Are there any configuration files? If not, there may still be some code that reads conf files it expects to find.

Just a hint (-1, Redundant)

Anonymous Coward | about 9 years ago | (#13763680)

You should start at main().

I'm assuming that you have the source as a guide (2, Informative)

ACORN_USER (902686) | about 9 years ago | (#13763739)

My assumption is that you're to reverse engineer the software, but have been given fragments of the source as a guide, yet still have to show your methodologies so as to prove that you didn't just re-write the source.

I'd start by actually reading the source - building it if you can. Run profilers on it and try to get some kind of visual representation of the underlying code tree. If you have source, try using something like DOXYGEN [doxygen.org] to autogen some documentation (and structure) out of it. Someone mentioned Rational - you can get a trial license. Try to understand what the code does. For the most part games are straightforward, in that you have objects that have specific behaviours. You can try to establish the object hierarchies and see if you can redefine these to make more sense - or just be different.

For the fragments of source you don't have - try using tools such as truss to track flow of what is going on. GDB is your friend and you probably want to try running it through the debugger - especially if the extracts you were given were compiled without stripping the symbols. nm is also another useful one at trying to get an idea of the symbols in your binary and establishing 'from meaningful names' what on earth goes on inside.

Push your binaries through a disassembler like ldasm [feedface.com] or datarescue - win [datarescue.com] . NASM [sourceforge.net] also has a disassembler. Try and get a feel for what is going on.

Now comes the hard part - it's not called reverse 'engineering' for nothing. You've done the reverse bit. It's now time to engineer a solution which shows that you've gone through the 'reverse' bit. It can be your view on how the code should work. Don't be afraid to reuse resource files/bitmaps, etc. That's allowed. It's the code which counts. You'll probably find that the assignment gave you something which was sub-optimal, in either design or processing - or both. It's your turn to write it the way it should have been written. I'll leave the 'team dynamic' to you. Don't let one person have all the fun. Probably you - it's good to give others a chance. See what people are interested in and allocate the work load. Just be prepared to fix everyone's bugs the night before submission - it's not so bad - it's 'fun.'

or... (-1)

Anonymous Coward | about 9 years ago | (#13763741)

stalebread queries: "Me and a team of other students have been tasked with improving our poor grammar..."

Prolog (1)

ACORN_USER (902686) | about 9 years ago | (#13763952)

If that's the task, you probably want to use DCG rules in Prolog. :)

Ok - that should have been under the grammar flame (1)

ACORN_USER (902686) | about 9 years ago | (#13764001)

Hmm. I was sure I'd posted under the guy who joked that they had to improve their grammar..

Understand for C++ and Source-Navigator (1)

Phatmanotoo (719777) | about 9 years ago | (#13763780)

We are evaluating some tools along these lines. The ones we liked most are RedHat's Source-Navigator [sourceforge.net] (GPL) and Scitool's Understand for C++ [scitools.com] ($$$).

Source-Navigator seems to be slow compared to Understand C++; I'm sure this has to do with the way they index the DB. On the other hand, the Linux version of Understand C++ needs some polishing IMHO (too many crashes on Debian/sarge).

As for report-generating tools that just index and cross-reference the whole project, Gonzui [sourceforge.net] is a pretty good one.

Re:Understand for C++ and Source-Navigator (1)

ACORN_USER (902686) | about 9 years ago | (#13764069)

Wow. Understand for C++ looks awesome. Then I looked at the pricing. Hmm. Do you really think that a team of students will be able to afford this?

wow (-1, Offtopic)

Anonymous Coward | about 9 years ago | (#13764112)

It was easy to tell you are a student. Starting off a sentence with "Me and"? What if it was just you? Would you say "Me have been tasked"? What are you, freaking Cookie Monster?

The state of education for all of our youth is simply astonishing.

Massive? (2, Interesting)

idries (174087) | about 9 years ago | (#13767063)

First of all, this is not a massive code base for a commercial computer game; it's about average. Many games get into the 1-2 million lines of code. Having said that, most games also have teams that are probably much larger than your group of students.

I'm not exactly sure what you're trying to do here. As many ppl have said reverse engineering something that you already have the source for is not really reverse engineering at all. However if I make the (somewhat suspect) assumption that your objective is to examine the code and extract some kind of high-level understanding of the entire engine which you can then demonstrate in some way, I would advise you to think again. Most games (again, I am assuming that you have a commercially developed code base of some kind) are a giant mess with no overall design or direction in the code.

Generally you'll find that a few sub-systems have been implemented with some kind of clean design (although not necessarily in a coordinated manner) and then the rest of the game is just a mass of glue code that holds these pieces together. During the original implementation no-one will have had the kind of general overview that you're looking for, each member of the team will know their specific area or areas, and how that part interfaces to the next, but no-one will know how all of them work together. Trying to summarize how all the systems work together will either give you something very high-level (and essentially meaningless) or something so complex that it's almost as hard to understand as the source (and not suitable to give to your professor as 'proof of understanding').

My advice would be to choose one or more parts of the game and try to gain an understanding (in whatever manner you choose) of those areas. One of the best ways to choose these areas is to look at the USP (unique selling points) of the game itself. Some areas of the game will have been very important to the final product, while others will have been done just because they had to. For example, if the game is an RTS with a focus on the tactical aspect of the single player experience, then the scripting and ai systems will have been very important (and made as good as possible) while the sound engine will not have been very important (and made just good enough). The parts of code which are important to the actual gameplay will have had much more time and attention spent on them and will probably be far more interesting. Having said that the most important parts of the game will also have had more ppl working on them and they may well contain much less readable code.

Perhaps you should give us some more info on what exactly you want to do, so that we can give you more relevant advice?

One word (1)

JamesP (688957) | about 9 years ago | (#13767240)

Doxygen... Best thing I've to recommend to you

(and yes, I am in a project involving understanding several lines of code)

Doxygen beats the crap out of SourceNav and other stuff. It creates lots of graphics of major relationships throughout your code.

Forward Engineer instead... (1)

GiorgioG (225675) | about 9 years ago | (#13768007)

How about forward-engineering it? Try and add a feature to the game that doesn't already exist (and doesn't make use of any built-in scripting system/tools, etc.) That's the only way you'll really learn what's going on in there. Reading code in and of itself doesn't mean you understand what it does. But if you have to change the code, there's no doubt that you'll figure out how (at least a part of) it works!

Reversing Std C (3, Informative)

TheDracle (836105) | about 9 years ago | (#13769842)

It's pretty simple, just time consuming. I've seen a few reverse engineering books floating around: "Reversing," "Exploiting Software." Since it's mostly stdC, it shouldn't be nearly as difficult to reverse engineer. Other languages can make things more complicated (Multiple calling mechanisms, more dynamic memory allocation, etc..).

Tools:

OllyDbg - Awesome usermode debugger, probably better suited than softice for this particular task. You can add assembly wherever you want, and it will create patches for the exe that can be automagically applied. It's also FREE.

Numega Softice - Just in case you need to bring in the big guns.

IDA Pro - Best reverse engineering tool available. Lots of extension scripts to do anything imaginable..

TSearch - Can search memory at runtime, set breakpoints, disassemble code on the stack, and dynamically insert new assembly at runtime. Nice for understanding the flow of the software as it runs, and identifying interesting variables and structures.

REC Decompiler - Awesome decompiler that produces a high level representation of the code. Not a replacement for your brain, but can save a lot of time tracing over assembly code to understand the purpose of a function.

WinPCap & Ethereal - For reversing game protocols, and understanding client-server interaction. Sometimes it's nicer to just figure out where the host name/IP string is located in the binary and replace it with 127.0.0.1, then write a little proxy program to sit in between the client and the server.

HVIEW: Hex editor with the ability to disassemble.

(Use Cygwin or mingw for the following) strace: Traces signals, system calls, and spits them out to the screen.

nm: Dump binary symbol table and names.

I've definitely forgotten a plethora of other useful tools (especially the binutils ones), but the above consist of some of my favorites.

For a game, you'll probably be dealing mostly with OllyDbg, HVIEW, REC, and winpcap/proxy. I'd recommend using nm to get a list of all of the symbols in the program, and then maybe split up and assign each student some number of symbols to understand and rewrite in C. Then they can use HVIEW or OllyDbg to navigate to those symbols, and try translating them. If they have a difficult time, have them use REC to get a higher level representation they can cheat off of.

-Jason Thomas.

Clarifications (1)

stalebread (920322) | about 9 years ago | (#13771118)

I submitted this question. Sorry about any confusion. I have all the code I care about, I can compile and run it, and I'm (unfortunately) forced to use Visual Studio .Net.

Re:Clarifications (1)

idries (174087) | about 9 years ago | (#13771593)

So what are you trying to do? Do you want to modify this source to do something new? Do you want to document it, or represent it in some other way? You're still not telling us what you mean by 'reverse engineer'. What is your goal here?

Re:Clarifications (1)

stalebread (920322) | about 9 years ago | (#13773099)

So what are you trying to do? Do you want to modify this source to do something new? Do you want to document it, or represent it in some other way? You're still not telling us what you mean by 'reverse engineer'. What is your goal here?

We want to do both. Right now we're at the point where we're trying to document and understand the code. Eventually our goal will be to modify the source to add some features.

Re:Clarifications (1)

idries (174087) | about 9 years ago | (#13779871)

I see. Well my advice from the previous post still stands. Don't try to address the codebase as a whole, but confine yourselves to the functional (rather than architectural) areas that you're interested in.

Presumably you have some specific types of modification in mind so start by creating a list of each functional area that you think each modification will impact. You probably don't even need to look at the code for this part. For example, if you want to add a 'boost' feature to a racing game, then you'll probably have to change the following areas:

  + physics/simulation - to actually do the boosting
  + control - to allow the player to trigger the boost
  + ai - to decide when to boost

There may well be other areas (i.e. ui to indicate boost is available, fx to indicate to the player that a car is boosting, but let's ignore that for now).

Once you have done this, then you'll have a list of modifications and corresponding functional areas:

  + boost - simulation, control, ai
  + position indicator - ui, simulation
  + car damage - simulation, (rendering?)
  + manual transmission - simulation, control, ai

As you can see from my (rather contrived) example the areas you really need to look at are simulation, control and ai. You do not need to even look at the asset management system, memory manager (which is probably not a garbage collector [slashdot.org] ;), front end, sound system, networking, user settings, ui or rendering until you actually start coding.
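If the list of modifications gets long, the same tally can be done mechanically. A small sketch using the four example modifications from above (the bit-flag layout and the mods_touching() name are just my choice for the illustration):

```c
#include <stddef.h>

/* Functional areas from the example, as bit flags so one
   modification can touch several areas at once. */
enum {
    AREA_SIMULATION = 1 << 0,
    AREA_CONTROL    = 1 << 1,
    AREA_AI         = 1 << 2,
    AREA_UI         = 1 << 3,
    AREA_RENDERING  = 1 << 4
};

typedef struct {
    const char *name;
    unsigned areas;
} Modification;

/* The four planned modifications listed in the example. */
static const Modification planned[] = {
    { "boost",               AREA_SIMULATION | AREA_CONTROL | AREA_AI },
    { "position indicator",  AREA_UI | AREA_SIMULATION },
    { "car damage",          AREA_SIMULATION | AREA_RENDERING }, /* rendering was a "maybe" */
    { "manual transmission", AREA_SIMULATION | AREA_CONTROL | AREA_AI }
};

/* Returns how many planned modifications touch the given area. */
int mods_touching(unsigned area)
{
    int count = 0;
    size_t i;
    for (i = 0; i < sizeof planned / sizeof planned[0]; i++) {
        if (planned[i].areas & area)
            count++;
    }
    return count;
}
```

With this data simulation is touched by all four modifications, control and ai by two each, and ui and rendering by one each, which is why the first three are the ones worth reading up front.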

That's not to say that you won't need to touch on these systems and understand them in some way (can't do much without allocating *some* memory) but you won't be *changing* them, and you don't need to understand how they work, just how to use them. In fact, the calls that you need to make can probably be figured out by looking at the code that you've already examined. For example, if you want to add a position indicator to the ui which is driven by the simulation, you'll probably be able to find another indicator (i.e. race progress) which is already driven by the simulation. You can just mimic it without ever looking at the ui code. Likewise, once you understand how the simulation uses the asset manager, memory manager, sound system and networking you can modify the simulation using those systems in the same manner without ever examining them in great detail.

After a while you may find that you want to change the way that some of these other systems work. Although that may be necessary (depending on how good/bad the code is and how far your final requirements differ from the original functioning of that system) don't worry about this until it actually stops you from making a modification that you want to. There's probably a lot of bad code in the engine, but just focus on what you need to in order to make your changes.

Hope that helps.

Re:Clarifications (1)

SerpentMage (13390) | about 9 years ago | (#13785456)

If you are still wondering please send me an email. I have reworked very large codebases in the same order as you are talking about. I can't name the packages in public.

an autodiagrammer? (1)

mattr (78516) | about 9 years ago | (#13771387)

I was thinking about that tool for Perl that uses Devel::DProf and a diagramming program (GraphViz I think) to draw flow charts of the program as it runs, showing which routines are called. Something like that would be great.

See graphviz.org's resources section for some links to profilers

I wonder if something like that is available for C++. Found ROCASE [ubbcluj.ro] which looks like a CASE tool that can "reverse-engineer" (analyze) C++ files and automatically format diagrams for you to help understand the code structure. Post back here if you find it to work well!

It's easy (1)

CptPicard (680154) | about 9 years ago | (#13772149)

Start with "main" and go from there :-)

Use Data as the X-Ray (1)

oldCoder (172195) | about 9 years ago | (#13772342)

Usually you can find a tool that will dump out all the system calls with arguments. See what the program is doing, and maybe write some scripts to analyze the log files. It gives you a genuinely useful perspective that you'll never get from reading the code.
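As a starting point for that kind of log analysis, here is a rough tally of calls per name. It assumes each log line has the plain name(args) = result shape that strace prints by default; other tracers or extra options (timestamps, pids) would need the parsing adjusted. The function names are made up for the sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 64
#define NAME_LEN 32

typedef struct {
    char name[NAME_LEN];
    int count;
} CallCount;

static CallCount table[MAX_NAMES];
static int table_len = 0;

/* Copies the identifier before the first '(' into out.
   Returns 1 on success, 0 if the line does not look like a call. */
int call_name(const char *line, char *out, size_t out_size)
{
    size_t i = 0;
    while (isalnum((unsigned char)line[i]) || line[i] == '_')
        i++;
    if (i == 0 || line[i] != '(' || i >= out_size)
        return 0;
    memcpy(out, line, i);
    out[i] = '\0';
    return 1;
}

/* Adds one log line to the tally; lines that do not parse are skipped. */
void tally_line(const char *line)
{
    char name[NAME_LEN];
    int k;
    if (!call_name(line, name, sizeof name))
        return;
    for (k = 0; k < table_len; k++) {
        if (strcmp(table[k].name, name) == 0) {
            table[k].count++;
            return;
        }
    }
    if (table_len < MAX_NAMES) {
        strcpy(table[table_len].name, name);
        table[table_len].count = 1;
        table_len++;
    }
}

/* Returns how many times the named call has been seen so far. */
int tally_count(const char *name)
{
    int k;
    for (k = 0; k < table_len; k++) {
        if (strcmp(table[k].name, name) == 0)
            return table[k].count;
    }
    return 0;
}
```

Feed it the log one line at a time (fgets in a loop will do) and the calls with the biggest counts tell you where the program spends its time talking to the OS.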

For programs that primarily do file processing, you can get a similar understanding by analyzing the input files and the output files.

For database programs you often can get the DBMS to log the transactions or the SQL.

For embedded systems you would need a hardware device called a logic analyzer to get the data to make the analysis.

For communications software you'd need to get something to dump the packets to log files. Lots of these for the popular protocols.

From the above analysis you might be able to write a spec (that is, a doc that would tell somebody what the program is supposed to do, so they could code it from scratch just by reading the spec).

Later on, modify the code and see what happens.

Understand for C++ (1)

skeptictank (841287) | about 9 years ago | (#13780420)

You can get a free evaluation for 30 days. Google it to find their website. It's a generalized metrics collecting and reverse engineering tool.

Just in case you're still looking at this topic.
