Learning and Maintaining a Large Inherited Codebase?

timothy posted more than 4 years ago | from the bequeathed-and-devised dept.

Programming 532

An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"

30 to 40 thousand lines isn't large by any measure (0, Informative)

Anonymous Coward | more than 4 years ago | (#31122192)

Yes it's still a bitch to maintain it. But 30k to 40k is by no means large.

Re:30 to 40 thousand lines isn't large by any meas (0)

Anonymous Coward | more than 4 years ago | (#31122250)

Just out of curiosity, what is your opinion of a "Large" codebase then?

Re:30 to 40 thousand lines isn't large by any meas (3, Interesting)

etymxris (121288) | more than 4 years ago | (#31122434)

I inherited a code base of 1.5 million lines of code at the last job I was at. Thankfully I wasn't the only one responsible for it. My advice to the original poster is to add lots of logging information. Log statements should document what the code is doing at any point in time and tell you where it is doing it. If it's java you can get the stack trace from anywhere--this is very handy for logging.
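A minimal sketch of that kind of logging, in Python rather than Java (the thread never says which language the original poster is stuck with, so that is an assumption). The logger name and the `checkout` / `apply_discount` functions are made up for illustration; the point is that each record says what happened and where, and `stack_info=True` attaches the call stack much like grabbing a stack trace in Java.

```python
import io
import logging


def make_logger(stream):
    """Build a logger whose records say where they were emitted."""
    logger = logging.getLogger("legacy_trace")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # funcName and lineno answer "where"; the message answers "what".
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(funcName)s:%(lineno)d %(message)s")
    )
    logger.addHandler(handler)
    return logger


def apply_discount(logger, price, percent):
    """Hypothetical stand-in for a routine buried in the inherited code."""
    # stack_info=True appends the current call stack to the log record.
    logger.debug("price=%s percent=%s", price, percent, stack_info=True)
    return price * (100 - percent) / 100


def checkout(logger, price):
    """Hypothetical entry point that calls into the routine above."""
    logger.info("starting checkout")
    return apply_discount(logger, price, 10)


buffer = io.StringIO()
total = checkout(make_logger(buffer), 200)
log_text = buffer.getvalue()
print(log_text)
```

Reading the output top to bottom gives you the path the program actually took, which is usually what you are hunting for in unfamiliar code.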

Re:30 to 40 thousand lines isn't large by any meas (4, Informative)

Garridan (597129) | more than 4 years ago | (#31122642)

Oh yeah, well I just inherited a code base of 2.8 trillion lines of assembly code, and I have to read it over a 12.734 baud VAX connection! Why, back in my day...

Anyway... I've taken on a few large-scale software projects before, and my approach has always been "read twice, hack once". I agree with the parent, and I'll add a note: for the love of everything sacred and unholy, use revision control, and don't trust it -- that is, back up incessantly. Document the hell out of your process. Once you've really learned the system, you might want to back out some of the newbie mistakes that you're making right now.

And yes. Learning a big system takes a lot of time -- you should be reading much more than writing until you've learned it. I find it helpful to diagram dependencies / draw up finite state machines.
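If drawing the dependency diagram by hand gets tedious, part of it can be generated. A rough sketch, assuming a Python codebase (an assumption; the same idea applies to `#include` lines in C): it collects each module's imports and prints a Graphviz DOT graph. The `billing` and `db` modules are invented and inlined so the example runs on its own.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level modules that a source string imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" only; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return sorted(found)


def to_dot(sources):
    """Render {module_name: source_text} as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    for name in sorted(sources):
        for dep in imported_modules(sources[name]):
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-file project, inlined so the sketch is self-contained.
example = {
    "billing": "import db\nfrom reports.pdf import render\n",
    "db": "import sqlite3\n",
}
dot_text = to_dot(example)
print(dot_text)
```

Feed the printed text to Graphviz to get a picture, then redraw the interesting parts by hand; the redrawing is where the learning happens.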

Re:30 to 40 thousand lines isn't large by any meas (2, Interesting)

QRDeNameland (873957) | more than 4 years ago | (#31122818)

Just out of curiosity, what is your opinion of a "Large" codebase then?

My first programming job was on an enterprise system that was over 7 million lines of just C++ code by the time I left, not including SQL stored procedures, web server code for the reporting system, and surely other code stuff that I can't recall. The entire development team for the system was something like 45 programmers. So to many of us, 30-40 klocs does not seem like a large codebase at all.

That said, I've also inherited code in the 10-50 kloc area of magnitude that was far more of a challenge/nightmare to decipher and maintain than that 7 million line system was. Code maintainability has more to do with good system architecture and coding standards than it has to do with the size of the code base; without those your system will likely collapse under its own bloat long before it can grow to millions of lines.

Large? (2, Insightful)

VirginMary (123020) | more than 4 years ago | (#31122440)

Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.

Re:Large? (4, Insightful)

snowgirl (978879) | more than 4 years ago | (#31122662)

Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.

At my job at Microsoft, we were in the support end of the core os group. That meant that core os wrote WinXP, Server 2003, Vista, etc, and then it got completely moved over to us to maintain.

Unfortunately, Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]"" Once I learned those, that's a lot of what I used to find the code that I needed to fix.

All I can say is that it takes time, and effort to become familiar... and you're just stuck with it.
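For machines where neither grep nor findstr is handy but a Python interpreter is (an assumption), a recursive, case-insensitive search is only a few lines. This is a sketch, not a replacement for either tool: `grep_tree` is a made-up name, the file suffixes are arbitrary defaults, and unlike `findstr /c:` it treats the pattern as a regular expression.

```python
import os
import re
import tempfile


def grep_tree(root, pattern, suffixes=(".c", ".h")):
    """Yield (path, line_number, line) for every matching line under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


# Build a tiny throwaway tree so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "net"))
    with open(os.path.join(root, "net", "sock.c"), "w", encoding="utf-8") as f:
        f.write("int open_socket(void);\nint close_socket(void);\n")
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("open_socket is declared in sock.c\n")
    hits = [(os.path.relpath(p, root), n, text)
            for p, n, text in grep_tree(root, r"OPEN_SOCKET")]
print(hits)
```

The notes file is skipped because of its suffix, so only the declaration in `sock.c` is reported.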

Re:Large? (0, Insightful)

Anonymous Coward | more than 4 years ago | (#31122750)

Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?

Re:Large? (5, Interesting)

snowgirl (978879) | more than 4 years ago | (#31122954)

Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?

No, but to explain this, I need to give you some background.

When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games. After joining Microsoft, I never used Windows at home for any purpose other than logging into the VPN to work from home... and since I did not even have an x86 machine, this required using Virtual PC on my Mac OSX box.

Now, I know of all of these tools, and I even could install GVim on the machine as well. However, I was working in a Build Group. This required me to occasionally log into 100 different machines at once in order to start the build process for WinXP/Server 2003. Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

I learned to use the tools that were available with the environment that I was in. Thus, I did almost all of my programming at Microsoft in notepad.exe, and I'm not kidding you.

Were I in a different group? The results could have been different... but having 100 different machines, most of which I didn't have admin rights to, meant that even just installing Notepad++ or something like that would have been a waste of time.

Re:Large? (1)

Tawnos (1030370) | more than 4 years ago | (#31122768)

If you're here, then you should know that \\shindex\search has a fully indexed codebase for all branches.

As for getting acquainted with the code - find places that need improvement, learn them, learn how they interact with their immediate dependencies and neighbors, continue up and out. 30-40k lines is tiny in the grand scheme of code.

Re:Large? (0)

Anonymous Coward | more than 4 years ago | (#31122892)

Why are you divulging Microsoft's proprietary secrets? What is your employee ID?

Re:Large? (1)

snowgirl (978879) | more than 4 years ago | (#31123014)

If you're here, then you should know that \\shindex\search has a fully indexed codebase for all branches.

Oh, I knew about shindex... there was also an internal webpage that one could use to search all the codebases as well.

I however didn't have to deal with all the codebases, I had to deal with one and only one at a time in general, and typically the code was checked in last night, because if it were checked in the night before, it would have broken the build that previous night.

Actually, Product Studio provided tons of information (better than any code indexing service that was available) about what just changed, and helped out enormously.

I don't argue that had I been in a different group, that I would have had different tools at my finger tips, and many of them could have worked better... but I was stuck with what I had.

Re:Large? (0)

Anonymous Coward | more than 4 years ago | (#31122860)

> Windows doesn't really have find and grep

Um... cygwin?

Re:Large? (1)

snowgirl (978879) | more than 4 years ago | (#31122976)

> Windows doesn't really have find and grep

Um... cygwin?

Ok, again, this time with special emphasis for the retarded... WINDOWS ITSELF does not have find and grep.

Any GNU OS will, GNU/Linux and GNU/Hurd included, as does any BSD OS.

Re:Large? (0)

Anonymous Coward | more than 4 years ago | (#31122950)

gnutools has a windows bin install- one of the first things i install and put in my path-- diff & grep!

Re:30 to 40 thousand lines isn't large by any meas (5, Funny)

istartedi (132515) | more than 4 years ago | (#31122508)

Very well, sir. Here's your 40,000 lines of Perl from the late 90s. It's mostly regex to parse revisions 30 through 451 of our in-house provisioning system. Oh, and BTW don't screw up like the last guy who had this job. He provisioned 32767 customers with tier-1 service, and it was the director's job to explain why we either had to let them have it for the remainder of the year, or else deal with the CR issues.

Re:30 to 40 thousand lines isn't large by any meas (1)

abigor (540274) | more than 4 years ago | (#31122666)

That is indeed a heinous scenario, but don't conflate "obfuscated" with "large".

Re:30 to 40 thousand lines isn't large by any meas (2, Insightful)

GryMor (88799) | more than 4 years ago | (#31122822)

I currently maintain several million lines of perl. It's not hard, it mostly just works, and when it doesn't, it's not that hard to figure out where it's broken IFF there is a consistent repro case for the problem.

If you have a proper development/production divide, there shouldn't be any weird production issues unless you or your predecessor missed some test cases. If you don't have test cases, that's a problem, if you don't have a properly firewalled and complete development environment, that's a problem, the code itself? Shouldn't be a problem.

Re:30 to 40 thousand lines isn't large by any meas (0)

McNihil (612243) | more than 4 years ago | (#31122682)

Couldn't agree more. Even 4-6 million lines is probably fairly common and still not a big issue. One is more inclined to enter the "cut the cruft mode" sooner rather than later when it's at that point.

Re:30 to 40 thousand lines isn't large by any meas (1)

vsound1 (1739980) | more than 4 years ago | (#31122814)

I inherited 30k lines of code when I started work "wet behind the ears". It was ActionScript code (so no typing), spaghetti at its best. Probably not the best code to look at as a beginner. I had also inherited another 20k of clean Java code, which was probably the only thing I felt very happy about. I agree with the AC. 30 to 40k is no big deal. As a fresh programmer, I had inherited 50kloc.

Re:30 to 40 thousand lines isn't large by any meas (0)

Anonymous Coward | more than 4 years ago | (#31122826)

People are more likely to be awed by your programming skills if you can help with this person's problem, instead of trying to impress people with the size of the programs you've worked on.

Re:30 to 40 thousand lines isn't large by any meas (0, Redundant)

home-electro.com (1284676) | more than 4 years ago | (#31122834)

30-40K is nothing. One person should be able to handle that easily. Although I can imagine for an inexperienced programmer it can be too much. I remember the first 'large' program I wrote in school -- it was 400 lines.

10 years ago I had to port 1.5 million lines from one UNIX to another. Well that's a large project.

You are an idiot (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#31122198)

Learn to deal with your mental limitations, retard.

Re:You are an idiot (0, Funny)

Anonymous Coward | more than 4 years ago | (#31122706)

I am article submitter O.P. and not retard I am programmer with Master DEgree in Computer Science from Indian Institude of Technology and If I am retard why does IBM give me 40.000,00 lines of code? American IBM cannott do it so they give it to me because of my education in India

IBM paies me 2 Mexican paysos for every line of code I fix that American coder screw up and I need food and room like American does. If American wants money than American should do job correct the first time and not have to send it to INdia to get all the work done correct. As AMerican teenager say DONT HATE THE PLAYER HATE THE GAME

It's all your fault (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#31122202)

See subject line.

Have a stupid question, get a stupid answer.

Time (5, Interesting)

wmbetts (1306001) | more than 4 years ago | (#31122206)

If you don't have access to the original developers and they didn't document it you're going to just have to spend a lot of time reading the code. =\

Re:Time (1)

dintech (998802) | more than 4 years ago | (#31122626)

You might want to come up with a few good reasons (other than just the ones you stated above) for doing a clean-room re-write of the damn thing. This might give you a chance to give the users something better than they already have or that interfaces better with other systems in your enterprise. It's a long shot but doing the requirements gathering and developing it yourself might be more fun than just learning it through reading. Good luck!

A good starting point (3, Interesting)

RCL (891376) | more than 4 years ago | (#31122210)

Try to single-step it in debugger from the beginning up to main loop.

Re:A good starting point (3, Insightful)

robot256 (1635039) | more than 4 years ago | (#31122352)

I didn't get this one until I switched to my alter ego, the assembler programmer.

You never will (0)

Anonymous Coward | more than 4 years ago | (#31122216)

You are not them; your brain solves problems differently. I have found that by creating subs in areas where they have not used them, you can begin to re-write the code little by little. Other than that, poring over it or using a debugger to jump through the calls is your best bet for full understanding.

don't feel bad at all (5, Insightful)

iggymanz (596061) | more than 4 years ago | (#31122234)

So you have been handed the steamin' pile o' code, it is great that you are very cautious and deliberate when modifying it. Make a set of regression tests, that is, make a set of test data and procedures and expected results to ensure original functionality that is still desirable is still working and no other errors introduced. It is hard, much more tedious than just creating new code with few constraints.
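One way to start on those regression tests when nobody knows what the "right" answers are: record what the code does today and treat that as the expected result. A minimal sketch in Python; `legacy_shipping_cost` is an invented stand-in for whatever routine you inherited, and the test cases are arbitrary.

```python
import json


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical stand-in for an inherited routine nobody fully understands."""
    cost = 5 + 2 * weight_kg
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost += 10
    return round(cost, 2)


CASES = [(1, False), (1, True), (20, False), (21, False), (21, True)]


def record_baseline(func, cases):
    """Capture current behaviour as the expected results, right or wrong."""
    return {json.dumps(case): func(*case) for case in cases}


def check_against_baseline(func, baseline):
    """Return the cases whose output no longer matches the recorded baseline."""
    return [
        case for case, expected in baseline.items()
        if func(*json.loads(case)) != expected
    ]


baseline = record_baseline(legacy_shipping_cost, CASES)


def edited_shipping_cost(weight_kg, express):
    """The same routine after a careless edit: '>' became '>='."""
    cost = 5 + 2 * weight_kg
    if express:
        cost = cost * 1.5
    if weight_kg >= 20:
        cost += 10
    return round(cost, 2)


unchanged = check_against_baseline(legacy_shipping_cost, baseline)
regressions = check_against_baseline(edited_shipping_cost, baseline)
print(unchanged, regressions)
```

In practice the baseline would be saved to a file and checked into revision control, so every later change gets compared against the behaviour you started with.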

Re:don't feel bad at all (0)

Anonymous Coward | more than 4 years ago | (#31122470)

I just wanted to second this. The best thing you can do is get as many tests as possible. It is the only way you can have an ounce of confidence editing a new code base. Once you have tests though, start changing it, especially if the original authors are gone. The more you change, the more you'll know about it.

By the way, a "change" is different than committing things. It is good to try and rewrite a part of the code. In rewriting it you'll see different logic that may not have seemed obvious and potential problems with the current code.

In the end though it is really hard. Don't be discouraged though because it will still help you to be a better coder. The next time you write something from scratch you'll notice that you've continued to grow as a developer even though your lines of code has decreased quite a bit.

Use Doxygen (5, Insightful)

gbrandt (113294) | more than 4 years ago | (#31122238)

Doxygen is your friend. run it over the source code and keep the HTML handy for searches and cross references.

Re:Use Doxygen (0)

Anonymous Coward | more than 4 years ago | (#31122340)

Doxygen can make a class inheritance chart, which might be a useful place to start. Also, whenever I'm looking at a piece of code for the first time, I'll clean it up, and add comments, making it my own code.

Re:Use Doxygen (2, Informative)

eggy78 (1227698) | more than 4 years ago | (#31122668)

I have found that equally useful to Doxygen's standard documentation are the caller/callee graphs (and the source browser as well!). These features are invaluable but they don't get used when you generate documentation with a more-or-less default config.
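A Doxyfile fragment along these lines should switch those features on. The option names are standard Doxygen settings, but treat the selection as a sketch: the graphs need Graphviz's `dot` installed, and `EXTRACT_ALL` is added on the assumption that the inherited code has few or no Doxygen comments.

```
# Doxyfile fragment (sketch); merge into a file generated with `doxygen -g`
EXTRACT_ALL            = YES
RECURSIVE              = YES
SOURCE_BROWSER         = YES
REFERENCED_BY_RELATION = YES
REFERENCES_RELATION    = YES
HAVE_DOT               = YES
CALL_GRAPH             = YES
CALLER_GRAPH           = YES
```

On a big tree the call graphs can make the run noticeably slower, so it may be worth enabling them for one subsystem at a time.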

Comments! (0)

Anonymous Coward | more than 4 years ago | (#31122242)

Make it your personal mission to soak the code in comments, refactor it where appropriate, et cetera. Diagramming it can help, too. Do all the things they should have done before giving it up; this will help you find what all of the functions do, and discover the important ones.

It depends on the language (5, Funny)

$RANDOMLUSER (804576) | more than 4 years ago | (#31122256)

If it's Perl or VB, you might want to consider self-immolation as a first step.

Re:It depends on the language (1)

rocker_wannabe (673157) | more than 4 years ago | (#31122408)

Simply running out of the room screaming "No!!!!!!" should suffice. There IS life after programming, believe it or not.

Re:It depends on the language (5, Informative)

martin-boundary (547041) | more than 4 years ago | (#31122538)

No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/

Not lots of code (5, Insightful)

www.sorehands.com (142825) | more than 4 years ago | (#31122260)

First of all, 30-40,000 lines of code is not lots of code. Try 250,000 lines of code.

To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such. Run some sort of profiler or flowchart type program on it to get a high level view of the code and how it fits together. If you can, get the person(s) who worked on it before you to give you an idea of how it fits together.

Re:Not lots of code (4, Insightful)

Coryoth (254751) | more than 4 years ago | (#31122696)

First of all, 30-40,000 lines of code is not lots of code. Try 250,000 lines of code.

To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such.

30-40,000 lines can be lots of code, it really depends on how maintainably it is written. I've had to pick up codebases that were somewhat smaller but were still diabolical ... good programming environments don't buy you much when the code consists of functions that are many thousands of lines long making little or no use of typedefs or structs (arrays and lots of variables should be enough right?) and convenient variable names like 'e', 'ee', and 'eee'. Even small codebases can become practically incomprehensible if written with little thought given to long term maintenance.

Re:Not lots of code (1)

leoaloha (90485) | more than 4 years ago | (#31122842)

250000 is not a lot of code. Try over a million lines of C for train control of a transit authority. Purchased (read inherited) in escrow because management and the vendor got into a disagreement. The head software guru was upper class as far as I was concerned. I was the network guy. No documentation or very light. He had to live in the code but he was that kind of guy. My hat is off to him, I don't know how he did it

Hunt down the original developer (5, Funny)

Anonymous Coward | more than 4 years ago | (#31122276)

(And then shoot him.)

Re:Hunt down the original developer (1)

istartedi (132515) | more than 4 years ago | (#31122604)

(And then shoot him.)

With Lisp?

Re:Hunt down the original developer (1)

Omnifarious (11933) | more than 4 years ago | (#31122754)

Your comment is much funnier than the grandparent, though without the grandparent it couldn't have existed. :-)

Hunt down the original developer(s) (0)

Anonymous Coward | more than 4 years ago | (#31122702)

(And then shoot them.)

Good lord, you're not going to eat'em afterward, are you?

Re:Hunt down the original developer (0)

Anonymous Coward | more than 4 years ago | (#31122946)

(And then shoot him.)

Well, he inherited the code. Which means someone died and left it to him. So, he'd be shooting a dead person.

What's the point?

Contest the will?

Not at all. (5, Insightful)

hemorex (1013427) | more than 4 years ago | (#31122286)

I find that if the other programmer wrote it in such a way where it's too complex for me to follow, I'm not the one who's a moron.

Re:Not at all. (5, Insightful)

tsm_sf (545316) | more than 4 years ago | (#31122620)

Man, always when I run out of mod points.

Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.

Re:Not at all. (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31122734)

Being a genius means getting the right feature on time. The customers don't care about craftsmanship. It sucks, but deal with it.

Re:Not at all. (1)

Jane Q. Public (1010737) | more than 4 years ago | (#31122886)

To add to that:

What language is it in? That could make a big difference in our answers. But in general, if it is very old code it should at least contain comments. If it was written in the last few years, the code should be in discrete sections that are organized in a logical manner. If not, then they were either seriously old-school programmers, or hacks.

Visualisation (5, Informative)

gilleain (1310105) | more than 4 years ago | (#31122290)

Anything ranging from just sketching out some informal package diagrams on some paper (I quite like using an A3 sketchpad) to something more like Code City [inf.usi.ch] which can work with code in Smalltalk, Java, and C++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.

In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.

use a debugger (0)

Anonymous Coward | more than 4 years ago | (#31122304)

The best way to figure out how the code does action X is to run it under a debugger while it does the action, inspecting how the data structures in the program change, setting breakpoints where the decisions are made to see what happens, etc. You get to see dynamically what the program is doing step by step with the computer keeping track of it for you, instead of puzzling it out from a static listing. Running the code that way is a much faster way to gain understanding than simply reading the code.
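When an interactive debugger session is not practical, the same idea can be approximated by recording which functions run while the program performs action X. A sketch in Python using `sys.settrace`, the hook that Python's own debuggers are built on; `trace_calls` is a made-up helper and `handle_request`, `parse`, and `validate` stand in for real code.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return (result, indented list of functions entered)."""
    calls = []
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":
            depth -= 1
        return tracer  # keep tracing this frame so its return is seen

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


def parse(text):
    """Hypothetical low-level routine."""
    return text.split(",")


def validate(fields):
    """Hypothetical low-level routine."""
    return len(fields) == 2


def handle_request(text):
    """Hypothetical high-level action whose path we want to see."""
    fields = parse(text)
    return validate(fields)


result, calls = trace_calls(handle_request, "a,b")
print("\n".join(calls))
```

Only Python-level functions show up; calls into built-ins and C extensions are invisible to this hook, which is one reason a real debugger is still the better tool when you can use it.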

Use it (1)

mosb1000 (710161) | more than 4 years ago | (#31122314)

The only way to learn the code is to work with it. Simply reading through it won't help, you have to go try to change things and see what works and what doesn't.

The main thing that bothers me when working with other people's code is the sheer number of variables they use. I tend not to declare a new variable unless it is absolutely necessary (and in object oriented programming variables other than pointers are almost never necessary). It seems like code written this way is easier to read and understand (and significantly smaller). This is slashdot, so there are a lot of other programmers out there. Am I off base here? What do you think about intermediate variables that are not strictly necessary?

Re:Use it (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#31122546)

What the fuck are you talking about? Pointers are variables, as is what they point to, as are classes and structs and arrays... A little knowledge can be a dangerous thing. Step away from the compiler and hit the books, if you like a lack of variables I heard LOGO is good.

Re:Use it (1)

mosb1000 (710161) | more than 4 years ago | (#31122664)

I said pointers are variables. . .

variables other than pointers are almost never necessary

That's what "other than" means.

Re:Use it (2, Interesting)

EvanED (569694) | more than 4 years ago | (#31122710)

Am I off base here? What do you think about intermediate variables that are not strictly necessary?

I can't say you're off base per se (I don't have nearly enough production dev experience to make statements like that, and even if I did, I couldn't speak for everyone), but my personal style is not quite the complete opposite of yours.

I pretty heavily use intermediate variables. Why? A couple big reasons. One, if you give the temporary variables decent names, they serve as additional documentation. Two, if you're debugging, you can look at those intermediate values in a debugger (or log them) much easier than you could if they weren't explicitly stored somewhere. In most graphical debuggers you can just hover the mouse over a variable and see its value; if you didn't have that variable, you'd have to enter the expression in the immediate window or set up a watch or something like that.
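To make the trade-off concrete, here is the same calculation written both ways, sketched in Python. The order fields and function names are invented for the example; both functions perform the same arithmetic in the same order.

```python
def total_terse(order):
    """One expression, no intermediate names."""
    return (order["qty"] * order["unit_price"]) * (1 - order["discount"]) * (1 + order["tax_rate"])


def total_named(order):
    """Same arithmetic, but each step has a name you can log or inspect."""
    subtotal = order["qty"] * order["unit_price"]
    discounted = subtotal * (1 - order["discount"])
    with_tax = discounted * (1 + order["tax_rate"])
    return with_tax


order = {"qty": 3, "unit_price": 10.0, "discount": 0.5, "tax_rate": 0.25}
print(total_terse(order), total_named(order))
```

With a breakpoint on the last line of `total_named`, `subtotal` and `discounted` are sitting there to be read; in `total_terse` you would have to retype the sub-expressions to see them.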

Re:Use it (1)

aoteoroa (596031) | more than 4 years ago | (#31122782)

I'm not trying to troll here but how do you write anything without variables? Or are you suggesting that some people will use too many variables like: $FirstName $MiddleName $LastName $BirthDate $Gender when they could have simplified their code with a single class called Person?

Re:Use it (3, Informative)

mosb1000 (710161) | more than 4 years ago | (#31122924)

Not without variables, but without unnecessary ones. For example, someone might write:

int a;
int b;
int c;
int d;
int e;
int f;
int g;
a = dropBox1.Value;
b = dropBox2.Value;
c = dropBox3.Value;
d = dropBox4.Value;
e = a + b;
f = c + d;
g = e * f;
result.Value = g;

While I would write:

result.Value = ( dropBox1.Value + dropBox2.Value ) * ( dropBox3.Value + dropBox4.Value );

30-40kloc is not large (1)

aachrisg (899192) | more than 4 years ago | (#31122316)

I wouldn't try too hard with a codebase as small as 30-40k lines, but for an actually large codebase, there are a bunch of different things that can help:

- Examine a class or function hierarchy and call graph. If you have tools to do so and the codebase is set up for it, go ahead. If not, set up the tools and codebase to be processed for this - you'll learn stuff about the code just by hooking these tools up.
- Pick medium-level routines in the code base that you are interested in and run the application in the debugger with breakpoints set on them. Take a look at the call stacks, step through the callers, look at the arguments, etc.
- You can also get a bunch of knowledge of the structure of the app by single stepping in the debugger - "step over" to see the high level control flow, and "step into" subsystems you want to explore.
- Documenting the existing code using a tool such as doxygen can help you learn it while at the same time providing useful documentation for other team members.
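If no call-graph tool exists for your setup, a crude one is not much work to write. The sketch below assumes Python source (an assumption; for C or C++ a tool like doxygen is the more realistic route) and only sees direct calls by plain name, so method calls and anything dynamic are missed. `SAMPLE` is an invented three-function program.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the sorted names it calls directly."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                # Only plain-name calls such as foo(); obj.method() is skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph


def callers_of(graph, target):
    """Return the functions that call target directly."""
    return sorted(name for name, callees in graph.items() if target in callees)


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    words = parse(load(path))
    print(len(words))
'''

graph = call_graph(SAMPLE)
print(graph)
print(callers_of(graph, "parse"))
```

The "who calls this?" question answered by `callers_of` is usually the one that matters when you are about to change a routine.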

Trace sessions and time (4, Insightful)

oldhack (1037484) | more than 4 years ago | (#31122354)

I'll echo some earlier comments.

Set up an execution environment with debugger, and run several typical scenarios and trace them with debugger. Get the feel of the big-picture execution scenarios/paths.

It will take time for your brains to get comfortable with it, though. And the details, when you look into them, will throw odd stuff at you. But that's the nature of our work.

Tried and True (2, Insightful)

cosm (1072588) | more than 4 years ago | (#31122356)

For culinary folks...
The time and money you spend tracing and inserting noodles in the spaghetti will end up being larger than the time it takes to cook a new batch (no pun intended).

For auto folks...
The time and money you spend bondo-ing, welding, rewiring, duct-taping, and C'n'Cing parts for the car will end up being larger than the time it takes to design and build a new car. (Although restoring an old/vintage car for the sake of nostalgia is a much more pleasing experience than buying a new one).

Gain an understanding of the purpose of each pivotal region. Know what your desired result should be, then begin the rewriting endeavor.

Re:Tried and True (0)

Anonymous Coward | more than 4 years ago | (#31122708)

And break a bunch of stuff that you didn't realize was there along the way, causing your colleagues to have to wait for you to fix your broken "new shiny code cuz I'm not able to understand the old". Schedules slip because now part of or the whole team is blocked. And unless the module in question is very simple (which begs the question why can't you understand it in the first place) it will happen, maybe not every time, but most of the time.

You rewrite something because it really is fundamentally broken, or can't be reasonably extended in its current form to meet new goals, not because you haven't learned how to read other people's code or "get" their designs.

*adding this to my list of possible interview questions*

Some things I do to figure out code... (2, Interesting)

CFBMoo1 (157453) | more than 4 years ago | (#31122398)

PL/SQL or COBOL or whatever they throw at me, I poke, prod, and play with it in a test environment. Someone up above mentioned pencil and paper to draw out how everything relates and that is a very good practice I've found to just get to know things. It's not instant but it helps more than you initially think. Also I use Open Office Draw to map out things as well. :P

2000 lines can be enough (1)

sugarmotor (621907) | more than 4 years ago | (#31122404)

2000 lines can be enough to throw you off!

I think it is just like learning anything. Keep at it.

The most important thing is whether you have an efficient way to look at what effect any changes have that you may make. Any effort you put into that is probably not going to be wasted. (Might be unit tests? Sounds like they did not come with the code)


Re:2000 lines can be enough (1)

klogg_siebentag (652321) | more than 4 years ago | (#31122714)

2000 lines?! I've inherited codebases of 250k+ LOC (Visual J++), and there were numerous single methods that would dwarf that 2000 lines! I know that claim is sort of the new millennium's version of my grandfather's claim of "walking 64 furlongs to school with only 1 shoe in 16 inches of snow because he couldn't afford the 2 and thripence yearly fee for a bus pass", but being thrown into the deep-end of a 250k line project is not unusual. If you get one with documentation, it's like winning a small lottery. If you get one with up-to-date and informative documentation, then, well, do they exist?

I find that stepping through the code and trying to understand it is psychologically damaging. You end up just wanting to hurt the person who wrote it. But as everyone else has said, it's probably the best idea and the only way to achieve your goal.

Anyway, that's my 1.274p worth...

Little by Little does the trick (1)

cheesybagel (670288) | more than 4 years ago | (#31122410)

Getting something that allows you to browse code more efficiently certainly helps. There are tools for doing that.

Another trick is to compile in debug mode, run the code inside a debugger, then break and watch the function call stack. This can help understand deeply nested code some more.
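If attaching a debugger is awkward, a rough equivalent is to have the code report its own call stack at the point you care about. Here is a minimal sketch in Python using the standard traceback module; the function names are made up and just stand in for deeply nested inherited code.

```python
import traceback

def call_chain():
    """Return the names of the functions on the call stack, outermost first."""
    # extract_stack() lists this helper as the last frame, so drop it.
    return [frame.name for frame in traceback.extract_stack()[:-1]]

# Hypothetical nested functions standing in for the inherited codebase.
def save_invoice():
    return call_chain()

def process_order():
    return save_invoice()

def handle_request():
    return process_order()

chain = handle_request()
# Only the innermost three frames are shown; whatever launched the script
# appears further out in the list.
print(" -> ".join(chain[-3:]))
```

Dropping a call like this into a function you do not understand tells you how execution got there, which is the same information the debugger's stack view gives you.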

In the long run, however, nothing substitutes for practice using the codebase. Even an author can get lost if he spends some years away from the code... Either you just do not remember anymore, or the code was changed so much by someone else's edits that it gets hard to recognize. Or both.

If the code does not have consistent coding style standards, run it through an indenting program. You may muddy the revision control history (every reindented line shows up as changed), but you certainly get a more than reasonable return from it being easier to parse manually. If it does have a consistent coding style standard, even if it is something you are not used to, it is probably better to keep it that way.

Clean up code by refactoring common code blocks out, or doing other code refactoring that reduces line count and/or increases readability. Make sure the refactored version is functionally equivalent to the non-refactored version, unless you are fixing a bug. Even if you are fixing a bug, document the change, just in case something actually relies on bug-for-bug compatibility.

If you do not have time to do cleanups just keep adding the functionality you need. Eventually you will have read enough code that you will know the codebase. If you do not need to add any more functionality, who cares anyway?

Unit tests first (1)

Fenris Ulf (208159) | more than 4 years ago | (#31122414)

Get a copy of Working Effectively With Legacy Code. It'll help you get tests around the code base that give you the confidence to be able to change it without breaking anything.
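For anyone who hasn't read the book yet, the core idea is to pin down what the code does today before touching it (Feathers calls these characterization tests). A minimal sketch in Python, assuming a made-up legacy function; the function, its quirks, and the recorded values are all hypothetical.

```python
# Hypothetical stand-in for an inherited function nobody fully understands.
def legacy_shipping_cost(weight_kg, express):
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost = cost - 3  # undocumented discount; may or may not be intentional
    return cost

# Expected values are whatever the legacy code returned when it was run once,
# not what anyone thinks it "should" return.
RECORDED = {
    (1, False): 7,
    (1, True): 10.5,
    (25, False): 52,
    (25, True): 79.5,
}

def mismatches(func, recorded):
    """Return (args, expected, actual) for every case where behaviour changed."""
    return [(args, expected, func(*args))
            for args, expected in recorded.items()
            if func(*args) != expected]

print(mismatches(legacy_shipping_cost, RECORDED))
```

An empty list means the behaviour is unchanged. Rerun the check after every edit; anything that shows up is a behaviour change you either meant or need to undo.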

Re:Unit tests first (1)

ChrisLambrou (742881) | more than 4 years ago | (#31122840)

I concur. Working Effectively with Legacy Code, by Michael C. Feathers, should be considered the definitive guide book for working one's way through exactly the kind of scenario you've described.

You lucky bastard (0)

Anonymous Coward | more than 4 years ago | (#31122438)

30k-40k... I am working on a project with ~2 million lines of code spread across C#, SQL & HTML/Javascript/CSS. Mind you, there are 8 developers working on it, but each one of us has to pretty much know the entire thing.

Piker.... (-1, Troll)

sbeckstead (555647) | more than 4 years ago | (#31122476)

Why are you whining about that small of a code base? GCC is far larger, and I've ported it to several systems, some of which required rewriting core elements of the code generators. It's a few million lines of code. Thirty to forty thousand lines of code is a small program, and probably a single executable at that. Build yourself a picture of the major elements of the code, and from that map you can get anywhere you want. It's only computer science after all, it's not rocket... oh sorry, some of the code I've written is in a rocket, so I guess it could be rocket science too. Well, you get the picture. Just like any new city, you need a map. Once you have that it's cake. Oh and the cake is a lie...

Read the source! (1)

Deflatamouse! (132424) | more than 4 years ago | (#31122498)

Seriously... if there is a lack of documentation, then you just have to start reading the source code, starting at main(). Then look at each object and read its constructors.

And start documenting it. Add comments in the code, create inheritance diagrams and sequence diagrams.

It will be tedious but you will come out of it a better programmer.

*gasp* (0)

Anonymous Coward | more than 4 years ago | (#31122502)

You mean they didn't comment all their code? *gasp*

You don't. You find out what the software did (4, Funny)

Colin Smith (2679) | more than 4 years ago | (#31122520)

And then you re-implement it in the latest language.


Re:You don't. You find out what the software did (1)

mikelieman (35628) | more than 4 years ago | (#31122848)

Good luck with that. There are business rules implemented by people who aren't there anymore for people who aren't there anymore. And it's all tied to whether $variable_1 is an "A" or "B" and $variable_2 being 999.

Hope your management understands (3, Insightful)

syntap (242090) | more than 4 years ago | (#31122536)

I have inherited projects and do my best to convince management that a pause is needed to document the code. Personally I try to flowchart the functionality and cover a couple of office walls with Visio printouts. Later on I can use such work to add detail and further documentation.

I inherited some code where the developer used names of girlfriends in variable names, it was just dumb and completely unprofessional. I didn't worry so much about keeping track of those, I was more worried about a change in one spot having unintended (and perhaps unknown until too late) consequences. Rather than spend time fixing problems, I thought it best to do some up-front documenting to at least provide a path to successful maintenance.

When I left the project, the manager had a binder of documentation and almost cried.

Try to learn the structure (5, Insightful)

phantomfive (622387) | more than 4 years ago | (#31122558)

I had an English professor who always said, "Structure is the key to understanding." He was talking about literature, but I think the same is true for programs as well.

Try to understand the structure of the program. What is the basic flow? It should have an initialization routine, a main loop, and a shutdown routine. Find out roughly where they are, then focus on the main loop. Usually there will be one piece of code that is central, and it will occasionally pass control into other large pieces of the program. Sometimes there will be more than one main loop, and control switches back and forth between the various main loops. If the program is event driven, this will make a difference in the structure.

If you are just trying to make a small change, try to find the sequence of events that will lead up to where that change needs to be made. Follow the sequence of execution until you get to the line you need to change. If you are changing a single variable, sometimes it's helpful to do a search and find all the places that variable is used, to make sure your change won't have any side effects. This may seem time consuming, but it can save 10 times more in debugging.
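The variable search doesn't need anything fancy. As a rough sketch, assuming the sources are plain text, a few lines of Python will list every whole-word use of a name; the file names and contents below are invented for illustration, and in practice you would read them from disk.

```python
import re

def find_usages(sources, name):
    """Yield (path, line_number, line) for each whole-word match of `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for path, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, number, line.strip()

# Hypothetical file contents standing in for the real source tree.
sources = {
    "billing.c": "int tax_rate = 7;\ntotal = price * tax_rate;\n",
    "report.c": "int old_tax_rate = 5;\nprint(tax_rate);\n",
}

usages = list(find_usages(sources, "tax_rate"))
for path, number, line in usages:
    print(f"{path}:{number}: {line}")
```

The word-boundary match skips old_tax_rate, but this is still a purely textual search: it knows nothing about scope, so two unrelated variables with the same name will both show up.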

Learn to follow code execution with your eyes, without running a debugger. One thing that separates good coders from not so good coders is the ability to follow code that isn't being executed.

Re:Try to learn the structure (2, Interesting)

Trepidity (597) | more than 4 years ago | (#31122970)

Depending on the language and domain, one way to speed up learning the structure can be to see if you can match it to some set of programming idioms, and then read up on those idioms if it's not a style of programming you're familiar with. For example, if it's C++, can you figure out by looking at the code's layout whether it was written by someone big into C++ design patterns? If so, it might be easier to reverse-engineer what it's doing if you read a C++ design-patterns book, and then match large segments of the code to "oh it's just implementing [pattern]". In some languages there are 3-4 main styles of programming, and figuring out which of them the author adhered to, and then reading something up on that idiom, can really speed things up.

Software archeology (1)

geezerwhizard (1531429) | more than 4 years ago | (#31122622)

Consider yourself a new explorer in the developing field of Software Archeology. And if you're a programmer, consider that the task is listed under the heading of "jobs for programmers". Try to make it so that the next programmer to deal with the code has a few more advantages than you.

Doxygen (0)

Anonymous Coward | more than 4 years ago | (#31122688)

Run it and step through it. Also, use doxygen (http://www.stack.nl/~dimitri/doxygen/) to highlight keywords, create hyperlinks to follow functions, and describe the data structures.

Obviously... (0)

Anonymous Coward | more than 4 years ago | (#31122736)

You're not a kernel hacker.

Done that.. (2, Funny)

spasm (79260) | more than 4 years ago | (#31122746)

As someone who recently passed off a pile of code of about that size in poorly written and poorly documented php to someone.. All I can say is I'm very very sorry, and I had *no idea* my personal side project would work better than the original commercial offering and be declared 'mission critical' three months before I left for greener pastures..

Quit (1)

codepunk (167897) | more than 4 years ago | (#31122766)

I just took the easy way out and quit. I had inherited about 30K lines of PHP code that was written by my boss. Shortly after inheriting this spaghetti mess I ran a grep across the source: the word "function" did not occur a single time in the entire source tree. To top it all off, I was not to rework any of it, only maintain it, as it was going away. I did end up installing it on about 5 new machines, so going away anytime soon was not going to happen. On top of all that, I would run into about 20 blocks of if statements per file, and most database calls etc. had the error-suppressing @ in front of them. I found it much easier to just hand it back to the boss and quit.

Divide and Conquer (4, Informative)

Whomp-Ass (135351) | more than 4 years ago | (#31122788)

Identify each major portion of functionality. If you are working with a sales/billing system you would probably end up with : Orders, Invoices, Payments, Admin.

Go through each of those portions and identify the major portions. Orders: Order headers, Order details, business logic, ui logic, reports, datalayer, etc. Repeat until reduced into easily consumable units.

Pick and stick to an SDLC. Use whatever fits the situation and the resources. For a small project (under 100k lines of code) you should be good by yourself. Anything more and you'll have to involve at least 1 other person for testing. For medium (100k-500k lines) you'll need an additional dev...For large projects (500K-5M lines) you'll need a project manager, lead dev, 2 devs, 1 test, and a UAT team...For larger projects you'll have something unique and frightening to the specifics of the software project and corporation/agency making it...anyway, I digress...

Go through each subdivision line-by-line and re-write it yourself (even if you aren't going to put your re-written version into production); the only way you're going to truly understand what is going on is if you do it yourself. Use whatever language you are most comfortable with or is most appropriate to the task (or languages), it does not need to be the same as the original.

Verify that for a given input, your version produces exactly the same output as the original.
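One way to do that check, sketched in Python: feed both versions the same inputs (edge cases plus a pile of random ones) and stop at the first disagreement. Both functions here are hypothetical stand-ins, not anything from a real system.

```python
import random

# Hypothetical original routine.
def legacy_discount(total_cents):
    if total_cents > 10000:
        return total_cents - total_cents // 10
    return total_cents

# Hypothetical rewrite of the same routine.
def rewritten_discount(total_cents):
    return total_cents - total_cents // 10 if total_cents > 10000 else total_cents

def first_difference(old, new, inputs):
    """Return the first input on which the two versions disagree, or None."""
    for value in inputs:
        if old(value) != new(value):
            return value
    return None

rng = random.Random(0)  # fixed seed so any failure is reproducible
inputs = [0, 9999, 10000, 10001] + [rng.randrange(0, 50000) for _ in range(1000)]
print(first_difference(legacy_discount, rewritten_discount, inputs))
```

Agreement on a thousand inputs is evidence, not proof, so put the boundary values you find while reading the original at the front of the input list.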

Take a deep breath. It's not a race. It's a one-to-one functional mapping of your software (your mindspace) and the original software (the other developer(s) mindspace(s)). The code probably will not be straightforward. It has also been battle-scarred and will be warty. Changes of initial requirements through time and feature enhancements (feature creep) will have taken their toll on what may have originally been something simple or even elegant. It's something of a niche mindset and if it is not for you, there exist many other exciting things to be programming.

Ultimately, if you do as outlined above, you'll solve many problems, be able to make whatever changes you like, and in so doing have a way to present your design as a replacement if you want...Or not, if you don't; for 30-40k lines parallel development makes sense, in a way, for one person.

That's small (2, Interesting)

ameline (771895) | more than 4 years ago | (#31122846)

Medium size is 250 to 750 thousand lines of code (one person can still understand how it all works). Big is 1 to 10 million lines of code. Really big is >10 million.

I have worked on code bases of all of those sizes, and I like the medium size the best -- it's big enough to be interesting, and small enough that you can understand it all.

One that I've worked on (over 25 million lines) is just too big for my tastes -- over 3 hours to do a clean recompile is excessive.

Don't be discouraged, just keep at it (1)

rxan (1424721) | more than 4 years ago | (#31122902)

Don't be discouraged. It's not like English where everyone writes in a familiar way. Everyone writes code a little differently and it is hard to go through it. Even with good commenting it can be difficult. Just persist and hope that you can contact one of the original authors.

Legacy code == code without unit tests (1)

PatMcGee (710105) | more than 4 years ago | (#31122922)

Get a copy of Michael Feathers' book "Working Effectively with Legacy Code".

I taught a grad / undergrad course using this book. We took a real open-source program as the class project, and the teams made significant changes to it. I thought it worked well.


Another "How do I do my job?" ask slashdot. *sigh* (0)

Anonymous Coward | more than 4 years ago | (#31122928)

And the answer is obvious. UTSL. And since it's now mine anyway, I tend to walk around and see how things work, find places where things don't work so well, and refactor them. It's quite a lot of work, often meaning touching the same code several times to come up with something more modular, more compact, more efficient. Lots of work is "enabling" work. Clean up something, then see what that exposes or what larger change it enables. After a while change requests become simpler and faster.

If you want to see how this really works, take projects with lots of fresh graduate or even freshman code in them to poke through. It's not hard, it's just lots of work. But then, what are you being paid for, anyway?

Am I weird...? (1)

chewthreetimes (1740020) | more than 4 years ago | (#31122940)

...because I actually enjoy going through someone else's code? I roll up my sleeves and, using print statements and/or a debugger, I diagram object relationships, flow, data structures...anything I can think of. It's like figuring out a puzzle. Of course, I've had the luck of never inheriting a total pile of crap. But give me anything from not-perfect-but-serviceable on up, and I not only can deal, but I'll have a good time doing so.