Slashdot: News for Nerds

Visualizing Complex Data Sets?

kdawson posted more than 5 years ago | from the see-it-to-believe-it dept.

Graphics 180

markmcb writes "A year ago my company began using SAP as its ERP system, and there is still a great deal of focus on cleaning up the 'master data' that ultimately drives everything the system does. The issue we face is that the master data set is gigantic and not easy to wrap one's mind around. As powerful as SAP is, I find it does little to aid with useful visualization of data. I recently employed a custom solution using Ruby and Graphviz to help build graphs of master data flow from manual extracts, but I'm wondering what other people are doing to get similar results. Have you found good out-of-the-box solutions in things like data warehouses, or is this just one of those situations where customization has to fill a gap?"
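A minimal sketch of the kind of approach the submitter describes, written in Python rather than Ruby: it turns a manual extract of (source, target) pairs into Graphviz DOT text. The object names and sample rows are hypothetical and are not taken from any real SAP extract.

```python
def edges_to_dot(edges, graph_name="master_data"):
    """Render (source, target) pairs as Graphviz DOT text for a directed graph.

    Assumes node names contain no double quotes; real extracts may need escaping.
    """
    lines = [f"digraph {graph_name} {{"]
    # Sort and de-duplicate so repeated extract rows produce one edge each.
    for source, target in sorted(set(edges)):
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical extract rows: which master data object feeds which.
sample_edges = [
    ("Material", "BOM"),
    ("Material", "Routing"),
    ("BOM", "Production Order"),
    ("Routing", "Production Order"),
    ("Material", "BOM"),  # duplicate row from the extract
]

dot_text = edges_to_dot(sample_edges)
print(dot_text)
```

Saving the output to a file such as flow.dot and running `dot -Tpng flow.dot -o flow.png` should render it, assuming Graphviz is installed.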

180 comments

Reminiscent (0)

Anonymous Coward | more than 5 years ago | (#26524299)

What ever happened to the simple good old Novell days?

Re:Reminiscent (2, Insightful)

gravos (912628) | more than 5 years ago | (#26524403)

If you have access to a plotter, Graphviz gives you a great deal of flexibility with regard to how big these images can physically be. Maybe you could consider posting them up on the wall and having a roundtable session at your office.

perhaps worth looking at? (3, Informative)

Anonymous Coward | more than 5 years ago | (#26524337)

Portraits of complex networks [arxiv.org]

Abstract: We propose a method for characterizing large complex networks by introducing a new matrix structure, unique for a given network, which encodes structural information; provides useful visualization, even for very large networks; and allows for rigorous statistical comparison between networks. Dynamic processes such as percolation can be visualized using animations. Applications to graph theory are discussed, as are generalizations to weighted networks, real-world network similarity testing, and applicability to the graph isomorphism problem.

Re:perhaps worth looking at? (5, Funny)

Anonymous Coward | more than 5 years ago | (#26524819)

If you think that's worth looking at... how about this? http://www.phdcomics.com/comics/archive.php?comicid=1121 [phdcomics.com]

Re:perhaps worth looking at? (1)

jonaskoelker (922170) | more than 5 years ago | (#26525869)

Abstract: We propose a method for characterizing large complex networks by introducing a new matrix structure, unique for a given network, which encodes structural information

I should probably go follow your link, but on the face of it, this sounds like a 60's paper about the adjacency matrix :)
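For readers who have not met it, here is a minimal sketch of the plain adjacency matrix the poster is joking about. It is not the matrix structure proposed in the linked paper; the node names and edges are made up for illustration.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 adjacency matrix for an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected graph: mirror the entry
    return matrix


# Hypothetical four-node path graph A - B - C - D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

matrix = adjacency_matrix(nodes, edges)
for row in matrix:
    print(" ".join(str(cell) for cell in row))
```

Row i, column j is 1 exactly when nodes i and j share an edge, so the matrix of an undirected graph is symmetric.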

get rich slow (2, Insightful)

it_begins (1227690) | more than 5 years ago | (#26524363)

SAP's "German engineering" stems from the philosophy of the more efficient the better.

Unfortunately, this means that it is much too utilitarian (and ultimately, why products like Peoplesoft are making headway).

If you find that you have developed a good product to help with operating SAP, you can sell it as a third-party add-on. Many of the popular add-ons were created out of a sense of frustration with the "mother product".

Re:get rich slow (-1, Troll)

Anonymous Coward | more than 5 years ago | (#26524399)

SAP's German "engineering"

There, fixed that for ya.

Re:get rich slow (0)

MrNaz (730548) | more than 5 years ago | (#26524725)

What's hilarious is that parent is likely a Brit or American.

For laughs, let's compare German cars with American and British ones.

Re:get rich slow (1, Informative)

Anonymous Coward | more than 5 years ago | (#26524787)

American. But I did not intend to impugn German anything.

SAP's "engineering"

There, fixed that for me.

Re:get rich slow (0)

Anonymous Coward | more than 5 years ago | (#26524955)

"You know the Germans make good stuff" -- Vince from ShamWow!

Re:get rich slow (0)

Anonymous Coward | more than 5 years ago | (#26524963)

Have you ever tried to program against/in the SAP stack? You'll need a VC firm and a few million to get you through the first month's frustration.

Know ABAP? Better start.

But seriously, SAP is not integration friendly regardless of what is said.

PtolemyPlot (2, Informative)

technofix (588726) | more than 5 years ago | (#26524387)

PtolemyPlot and Java.

Great source of data visualization inspiration (3, Interesting)

Anonymous Coward | more than 5 years ago | (#26524393)

http://visualcomplexity.com

Have fun!

Visualizing via OpenGL (1)

Cutting_Crew (708624) | more than 5 years ago | (#26524407)

I have experience rendering massive datasets via OpenGL, and this is where a lot of visualization still happens in government and big business.

These can be incorporated into other general off-the-shelf visualization tools or just be used standalone on any major platform as long as the machine has the horsepower, including, not surprisingly, a powerful GPU.

The first computer I started doing visualization on was an SGI. Imagine that.

Re:Visualizing via OpenGL (0, Offtopic)

Anonymous Coward | more than 5 years ago | (#26524521)

cool story brah

got any useful information?

Re:Visualizing via OpenGL (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26524603)

mod parent up!

Re:Visualizing via OpenGL (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26526065)

mod parent the opposite of down!

Re:Visualizing via OpenGL (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26525373)

Don't tase me, bro!

I have a question for you (4, Insightful)

zappepcs (820751) | more than 5 years ago | (#26524421)

How are you supposed to handle the data if you do not understand it? Sure, there can be too much to see/think about at one time, but if you don't understand it, how can you visualize it usefully?

I am asking because I have a problem: Where I work, I understand the data and I make efforts to visualize it for others. The trouble starts when they don't understand the data and its sources and limitations, so what they see in my visualization is all they know of it, and they make assumptions about it. I've even had people worry that the network is down because there were holes in the collected data which then showed up in the visualizations.

If anyone has some good URLs for such thinking, I'd be grateful.

I simply do not understand how you can visualize data for people if you yourself do not understand it.
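One way to keep holes in collected data from being misread is to detect them before plotting and label them as "no data". The sketch below is only an illustration of that idea; the 60-second sampling interval and the sample timestamps are assumptions, not anything the poster described.

```python
def find_gaps(timestamps, expected_interval):
    """Return (start, end) pairs where consecutive samples are further apart than expected."""
    gaps = []
    ordered = sorted(timestamps)
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval:
            gaps.append((previous, current))
    return gaps


# Hypothetical collection times in seconds; a sample is expected every 60 s.
samples = [0, 60, 120, 300, 360, 600]

gaps = find_gaps(samples, expected_interval=60)
print(gaps)  # [(120, 300), (360, 600)]
```

A chart could then shade those ranges and annotate them as collection gaps, so viewers do not read them as an outage.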

Re:I have a question for you (2, Interesting)

Cutting_Crew (708624) | more than 5 years ago | (#26524491)

Here is a *sample* of some of my early work that I did long ago when I was just starting out. I don't have any mature, 100% working screenshots, but you get the idea.

The lat, lon and depth values are courtesy of NOAA, freely available. This is a screenshot of a real-time frame in OpenGL of the world with each vertex pair colored by depth. You can rotate it, probe it and a few other things.

link [rimfrost.org]

Re:I have a question for you (1)

coolsnowmen (695297) | more than 5 years ago | (#26524539)

If I give you a scatterplot of X vs Y, you would instantly be able to see what kind of relationship exists (if any). You might notice an exponential relation, a linear relation, or that they are exactly the same. For more statistical things, you might notice two distinct groupings, correlations, or completely uncorrelated data.

So your boss doesn't know what the relationship between X and Y is, but he wants to save $ and increase profits. You tell him that, based on this relationship that you proved exists (whatever it is) with your graph and hard work, and since we know that we want Y low, we can control X to influence Y and make a better widget to edge out the competition and make more $.
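As a rough numerical companion to eyeballing a scatterplot, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The data is invented for illustration. Pearson only measures linear association, which is why the exponential series scores below 1 until it is log-transformed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]      # exact linear relation
exponential = [2 ** x for x in xs]    # exact exponential relation

r_linear = pearson(xs, linear)
r_exponential = pearson(xs, exponential)
r_log = pearson(xs, [math.log(y) for y in exponential])

print(r_linear, r_exponential, r_log)
```

The linear and log-transformed cases come out at essentially 1, while the raw exponential case is noticeably lower, which is the kind of difference a scatterplot shows at a glance.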

Re:I have a question for you (3, Interesting)

TapeCutter (624760) | more than 5 years ago | (#26526301)

Sorry but I think the GP is spot on.

What you are doing in your post is investigating the data until you UNDERSTAND what is useful and then presenting (visualising) it for your boss, who probably adds another layer of "visualization" for his boss, etc. (i.e., you are acting as a human visualisation tool that the boss can use to visualise the output of silicon visualisation tools).

To scale up your simple X/Y plot of two variables to corporate size, you propose using a visualization tool that UNDERSTANDS database structures and UNDERSTANDS the fact that to plot strings against integers you need a default transform, etc, etc. You are handed a bunch of DBs with hundreds of tables, thousands of columns and countless transaction transforms ferrying data from one DB to the other.

So you start with all possible pairs to see if there is a nice easy curve that can relate them. You get 10,000 statistically significant relationships - the problem posed in TFS is how do you now visualize all those graphs to find the relevant relationships without UNDERSTANDING the data.

As to TFS, visualization relies on data mining, which will never be "solved" because given enough data you can always add one more level of UNDERSTANDING (see: Gödel [miskatonic.org] ). This is not to say that trying to solve it is pointless. On the contrary, Google News is an excellent and accessible example of how far things have progressed in the last couple of decades.

Simply presenting multiple known facts/relationships in an easily accessible format takes a deep UNDERSTANDING of the data. Even if you do UNDERSTAND the facts/relationships, creating the format is an art that has few masters [wikipedia.org] .

R language (4, Informative)

QuietLagoon (813062) | more than 5 years ago | (#26524423)

There was a thread about the R language [r-project.org] a couple of weeks ago. Look it up and read it....

Re:R language (5, Informative)

koutbo6 (1134545) | more than 5 years ago | (#26524503)

I second that. If you are visualizing graphs be sure to get the igraph package which can be used with R, Python, C, or Ruby.
http://cneurocvs.rmki.kfki.hu/igraph/ [rmki.kfki.hu]
Processing is another package that is geared towards data visualization, which Java developers might find easier to use
http://www.processing.org/ [processing.org]

Re:R language (1)

JumpDrive (1437895) | more than 5 years ago | (#26525143)

I third this. It is a great way to look at data and determine meaningfulness of said data.
Too many times I've seen people jump to conclusions about a small part of a data set, without looking at large sets of data.
Some people don't understand it until you put their fingers on the chart and make them follow the line.

www.opendx.org (0)

Anonymous Coward | more than 5 years ago | (#26524499)

An open source version of IBM's Data Explorer. The interface is a little clunky (openmotif-based) but it is _powerful_.

I wish someone would take it and throw a modern front end on it.

Re:www.opendx.org (0)

Anonymous Coward | more than 5 years ago | (#26524543)

Oh - the license is IBM Public License: http://www.research.ibm.com/dx/srcDownload/license.html

Re:www.opendx.org (1)

iamstuffed (764517) | more than 5 years ago | (#26524987)

OpenDX is so crappy. It has a 2GB memory limit and can do nothing better than newer alternatives, such as VTK, and VisIt. I would avoid OpenDX if possible. No point in locking yourself into an ancient dead end platform.

Colourmapping (1)

kramulous (977841) | more than 5 years ago | (#26526045)

The one thing that OpenDX still has, that for some reason the others have not, is an excellent interactive colourmap editor. I use OpenDX to get the colours correct and then export to use in other toolkits.

Try the InfoVis community (5, Informative)

Mithrandir (3459) | more than 5 years ago | (#26524509)

The infovis community has been dealing with these subjects for years. There are many different visualisation techniques around. Here's a list of the past conferences and the papers:

http://conferences.computer.org/Infovis/ [computer.org]

Plenty of good products out there, but the one that I like most is from Tableau Software (http://www.tableausoftware.com/).

more details on Tableau (2, Informative)

morton2002 (200597) | more than 5 years ago | (#26525259)

Tableau Desktop is an interactive analysis and visualization product that connects to relational and cube data sources to help people see and understand their data. There was a webinar [tableausoftware.com] (slides - PDF [tableausoftware.com] ) back in November 2008 covering Blastrac Global's success in using Tableau with their ERP system.

Disclaimer: I work at Tableau Software, so I encourage you to see for yourself with a free trial: http://www.tableausoftware.com/products/tour [tableausoftware.com]

Xgobi (1)

thegrassyknowl (762218) | more than 5 years ago | (#26524577)

I used Xgobi (http://www.research.att.com/areas/stat/xgobi/) for a lot of things back in the day. It gave me the ability to 'see' and understand high dimensional data sets quite easily when I was looking at computer vision research.

Spotfire (2, Informative)

DebateG (1001165) | more than 5 years ago | (#26524583)

I work in biology, and we use Spotfire DecisionSite [tibco.com] to visualize and analyze a lot of our massive genetic data. It's a very powerful program that I barely know how to use. It seems to have packages able to analyze pretty much anything you want, and you can even write your own scripts to help things along.

Re:Spotfire (0)

Anonymous Coward | more than 5 years ago | (#26525047)

Spotfire is the best, period.

Am I missing something or... (4, Informative)

Shados (741919) | more than 5 years ago | (#26524607)

Wouldn't any everyday cube browser along with any tool to detect base dimensions in a datawarehouse schema do the trick? You may have to add a few custom dimensions on your own depending on how shitty the master data is (I don't think that can be helped, no matter the solution: if a dimension is "these two fields multiplied together times a magic number appended to the value of another table", you need to know; no tool will guess), but aside from that?

That's usually what I do anyway. I dump my data in a datawarehouse, use whatever built-in wizard can auto-generate dimensions, then play with them in a cube browser. Works for even pretty archaic home-made multi-thousand-tables-without-normalization ERP systems I had to work with in the past anyhow.
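To make the "dimensions in a cube browser" idea concrete, here is a minimal sketch of rolling a measure up along chosen dimensions in plain Python. The column names (plant, material_type, quantity) and rows are hypothetical; a real cube browser does far more than this.

```python
from collections import defaultdict


def roll_up(rows, dimensions, measure):
    """Sum `measure` over rows, grouped by the given dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact rows as they might sit in a warehouse table.
rows = [
    {"plant": "P1", "material_type": "RAW", "quantity": 10.0},
    {"plant": "P1", "material_type": "FIN", "quantity": 4.0},
    {"plant": "P2", "material_type": "RAW", "quantity": 7.0},
    {"plant": "P2", "material_type": "RAW", "quantity": 3.0},
]

by_plant = roll_up(rows, ["plant"], "quantity")
by_plant_and_type = roll_up(rows, ["plant", "material_type"], "quantity")

print(by_plant)
print(by_plant_and_type)
```

Picking a different list of dimensions is the code equivalent of dragging a different field onto the rows of a cube browser.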

Re:Am I missing something or... (0)

Anonymous Coward | more than 5 years ago | (#26525995)

detect base dimensions in a datawarehouse schema do the trick?

I accidentally read that as "datawhorehouse".

IMG is your friend (1)

Guil Rarey (306566) | more than 5 years ago | (#26524615)

Not sure what you can use to create a visualization, but the information you need is in the IMG.

I don't have a need to develop a visualization of the whole of our SAP implementation, just my little FI-CO corner of it, and that's a big enough pain

Business Intelligence (3, Informative)

Anonymous Coward | more than 5 years ago | (#26524649)

Your ERP isn't supposed to directly analyze the data. You're supposed to use a Business Intelligence software package for that. This being SAP, I believe they'll try to sell you Hyperion.

Re:Business Intelligence (1)

killproc (518431) | more than 5 years ago | (#26524871)


They'll push you towards SAP BW. A beast at best. Very strong tool, but like anything SAP, difficult to master.

Re:Business Intelligence (1)

afidel (530433) | more than 5 years ago | (#26525085)

Not likely after Oracle bought Hyperion. We're a JDE shop and went with OBIEE, since our major modules aren't yet supported in Hyperion and, with the acquisition, no one knows what the timeline will be for adding them, so we are rolling our own with OBIEE. So far the framework seems to be good, just not sure how it will do when it hits production loads since we are still in early development.

Re:Business Intelligence (1)

elrusoloco (737386) | more than 5 years ago | (#26525381)

Disclosure - I work for Oracle, though not for the OBIEE team. The people I know that work with OBIEE repeatedly claim the best scaling BI architecture in the industry. They even went so far as to describe in fairly deep detail the specific technical reasons for it, which I believe added up to being able to horizontally scale the software components to meet your specific performance needs.

lots of great ideas being discussed here (0)

Anonymous Coward | more than 5 years ago | (#26524675)

I'm sure they all can be entirely useful. But it's good to keep in mind the unifying principle:

Real estate prices in America almost always increase over time; and if they fall, the decrease will be short-lived and most certainly will only affect a few scattered markets.

That's a trick I learned from studying the methods of top experts in the banking, insurance, and hedge fund industries.

Re:lots of great ideas being discussed here (1)

ckaminski (82854) | more than 5 years ago | (#26525189)

And what makes you think this recession is any different? Real estate prices are still higher than the recession of the early 1990's. I guess it depends on your definition of "short". :-)

Honest attempt at an answer... (1)

clinko (232501) | more than 5 years ago | (#26524719)

I'll try to answer your question without the key info needed: "What is the data your modeling?"

You're on the right track...

Either way, from experience i'd say you're answer is "this just one of those situations where customization has to fill a gap"

Be warned though, out of the box solutions do exactly what's on the box. Anything else is going to be modeled by you, or customized (usually at a high rate), by the vendor.

That being said, I've used Oracle's solution http://www.oracle.com/solutions/business_intelligence/index.html [oracle.com] for financial data (10 TB of data), and used my own solution for my music recommendation database http://www.egusta.com/ [egusta.com] (43 GB of data).

At the end of the day, I like using my own custom solution. And by the fact that you're familiar with Ruby (I checked out your site), I'd say you're on the right track already.

Re:Honest attempt at an answer... (1)

jedwidz (1399015) | more than 5 years ago | (#26525127)

Hey, you managed to use both "your" and "you're" both correctly and incorrectly in one post!

I think you must be taking the idea of stochastic grammar somewhere it doesn't belong...

Just take the first 65k rows (2, Funny)

Anonymous Coward | more than 5 years ago | (#26524751)

Just take the first 65k rows and dump them into excel and create a pivot table.

Re:Just take the first 65k rows (0)

Anonymous Coward | more than 5 years ago | (#26524943)

Just take the first 65k rows and dump them into excel and create a pivot table.

Mod parent sad but true.

Re:Just take the first 65k rows (0)

Anonymous Coward | more than 5 years ago | (#26525109)

Mod parent sad but true.

They already have that. It's called "Insightful"

Re:Just take the first 65k rows (1)

afidel (530433) | more than 5 years ago | (#26525091)

Upgrade to Excel 2007 and that's ~2M rows!

Re:Just take the first 65k rows (1, Funny)

Anonymous Coward | more than 5 years ago | (#26525673)

65k rows should be enough for everybody.

Re:Just take the first 65k rows (1)

jonaskoelker (922170) | more than 5 years ago | (#26525905)

[in Excel 2007 that's] ~2M rows!

I can't help but notice that 2M ~= 2^21. So the index fits perfectly into one word of three 7-bit bytes.

Are Microsoft programmers optimizing for weird 60's architectures? ;)

Re:Just take the first 65k rows (1)

jimdataguy (1457019) | more than 5 years ago | (#26526159)

Ick. I was an Excel junkie doing stuff like that until I started using Tableau.

Re:Just take the first 65k rows (1)

jimdataguy (1457019) | more than 5 years ago | (#26526219)

Ick. I used to do that kind of stuff - I was a major Excel junkie. But then I got Tableau. Way better - I mean leaps and bounds better. And it's actually fun. http://www.tableausoftware.com/ [tableausoftware.com]

Centruflow (1)

JoGiles (701171) | more than 5 years ago | (#26524777)

Can I suggest you look at Centruflow [centruflow.com] , which is an application designed to analyse dynamic data in a nice, user friendly way.

Project is on RubyForge... (1)

tcopeland (32225) | more than 5 years ago | (#26524783)

...which I just took offline for a quick database upgrade. Er, sorry, will be back online soon!

Re:Project is on RubyForge... (1)

tcopeland (32225) | more than 5 years ago | (#26525101)

...and, back online; here's the Ruby/ASP project [rubyforge.org] which also hosts ruby-graphviz.

Perhaps use the Cloud... (0)

Anonymous Coward | more than 5 years ago | (#26524807)

I've worked with a couple of architects who used OneData (http://www.datafoundations.com/index.shtml) to do this sort of thing. Although I haven't used it myself, the idea is simple & cool. It's essentially a Software as a Service implementation of OLAP reporting. The demos indicate that you can theoretically get up and running rather quickly. Not sure if that's true as I haven't done it myself.

Edward Tufte, anyone? (1)

rennys66 (1441117) | more than 5 years ago | (#26524813)

I'll bet Edward Tufte would have something to say on the topic... http://www.edwardtufte.com/ [edwardtufte.com] Has anyone been to one of the "Presenting Data and Information" seminars? Any feedback?

Re:Edward Tufte, anyone? (2, Interesting)

Mithrandir (3459) | more than 5 years ago | (#26525097)

Tufte's ideas are good for presenting simple information. He gets many things right (eg if the visualisation doesn't work in black and white, adding colour won't fix it). However, many in the infovis community are outright sceptical, if not dismissive of his ideas for analysing high dimensional datasets.

Where his ideas really work is once you have "the answer" that you want to present to someone else. However, the basic exploration of the data to find interesting keypoints is not what he specialises in. There are whole communities devoted to techniques for datamining and presentation, principally infovis/Visual Analytics.

Re:Edward Tufte, anyone? (1)

644bd346996 (1012333) | more than 5 years ago | (#26525241)

Tufte's work is a great foundation for people who don't know anything about presenting data effectively. I know several people who have been to his seminars, one of whom is a CIO who has since made Tufte's books required reading for most of the people who report to him.

Like the other poster here said, Tufte's stuff isn't that useful for really complex multidimensional data. His stuff is more oriented toward presenting relatively simple stuff in ways that are readable and aren't misleading. On those two subjects, his stuff is very good, and the basic principles all still apply to the more complex stuff the submitter's asking about. The basic principles Tufte presents just don't suffice for high dimensional visualization.

Take a look at Prefuse (1)

hkfczrqj (671146) | more than 5 years ago | (#26524827)

Take a look at Prefuse [prefuse.org] . I haven't used it myself (I considered it for a project), but it may have the right mix of a good Java API and flexibility/customizability that you're looking for. As a bonus, it's BSD licensed. YMMV. Good luck.

ActiveMetrics (0)

Anonymous Coward | more than 5 years ago | (#26524833)

http://www.pureshare.com/
Basically turns any data into a widget, without taxing your data every time you want information.

Just pipe it (2, Funny)

sleeponthemic (1253494) | more than 5 years ago | (#26524867)

Into a matrix screensaver.

General Purpose InfoViz Tool (1)

elblanco (132993) | more than 5 years ago | (#26524885)

I've had really good success using an information visualization tool called Starlight on a number of projects like this. Everything from process modeling to military intelligence. It's a commercial spin-out from the DOE PNL lab information visualization research in Washington State.

http://www.futurepointsystems.com/

OpenDX (0)

Anonymous Coward | more than 5 years ago | (#26524887)

take a look at OpenDX http://www.opendx.org/ [opendx.org]

Four dimensions (1)

Facetious (710885) | more than 5 years ago | (#26524929)

I'm sure it's my mathematics background, but when I saw the headline I assumed the author would be discussing something involving the square root of negative one, to which my response was, "Silly author, you can't visualize four dimensions. (Sober.)"

Re:Four dimensions (0)

Anonymous Coward | more than 5 years ago | (#26525009)

Visualize the real and imaginary components on separate plots. Either that, or magnitude and phase (depending on the data).
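A minimal sketch of the two decompositions suggested here, using Python's standard cmath module. The sample values are arbitrary; each pair of series could then go on its own plot.

```python
import cmath


def split_complex(values):
    """Return (real, imaginary) pairs and (magnitude, phase) pairs for complex values."""
    rectangular = [(z.real, z.imag) for z in values]
    polar = [cmath.polar(z) for z in values]  # (magnitude, phase in radians)
    return rectangular, polar


# Arbitrary sample values for illustration.
values = [complex(1, 0), complex(0, 2), complex(-3, 0)]

rectangular, polar = split_complex(values)
for z, (re, im), (magnitude, phase) in zip(values, rectangular, polar):
    print(z, re, im, magnitude, phase)
```

Real/imaginary plots suit data where the two parts mean something separately; magnitude/phase tends to read better for oscillating or rotating quantities.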

Re:Four dimensions (2, Insightful)

SillyPerson (920121) | more than 5 years ago | (#26525713)

I'm sure it's my mathematics background, but when I saw the headline I assumed the author would be discussing something involving the square root of negative one, to which my response was, "Silly author, you can't visualize four dimensions. (Sober.)"

You have a mathematical background and can not visualize four dimensions? Here is how you do it: Just visualize the problem in n dimensions and then set n=4.

TeX is Your Friend (0)

Anonymous Coward | more than 5 years ago | (#26524953)

Use TeX/LaTeX/MetaPost for the drawing and layout engine(s). Use your favorite language as a front end to turn the input data into source files for these programs. Plus, the resulting output is PDF, which means you can avoid the crap-fest that is Word. http://tug.org/ [tug.org]

Alternatively, you can use Asymptote, which is like a modern version of MetaPost. http://asymptote.sourceforge.net/ [sourceforge.net]

IBM data explorer (2, Interesting)

shish (588640) | more than 5 years ago | (#26524959)

I have no idea how I stumbled across this [ibm.com] , but it looks very pretty...

You need a Powerwall (1)

OSvsOS (964987) | more than 5 years ago | (#26524973)

Hello. Visualizing large data sets is readily solved if you have the following items available:

Both tools combined allow you to easily visualize large data sets and adjust the resolution of your data.

Pentaho (1)

mweather (1089505) | more than 5 years ago | (#26524975)

You could use Pentaho with one of the SAP plugins.

Traditionally... (3, Funny)

FurtiveGlancer (1274746) | more than 5 years ago | (#26525013)

Large, highly complex data sets are best described on the back of four cocktail napkins or on a fixed white board in a shared conference room. ~

Essbase (1)

hemp (36945) | more than 5 years ago | (#26525041)

Take a look at Essbase http://en.wikipedia.org/wiki/Essbase [wikipedia.org] . It is now owned by Oracle and is used by finance departments at most Fortune 100 companies.

As you have no doubt discovered, SAP is great for transaction-level detail, but kinda sucks at the big picture and doing "what ifs". Essbase's tight integration with MS Excel and very cool reporting tools make it much easier to analyze your data than looking at spending reports from SAP.

Mainly implemented by budgeting and finance groups, Essbase is not a favorite of IT departments though, as change management is a challenge and Essbase requires quite a bit of subject knowledge and is almost impossible to outsource to another continent.

Application? (1)

SandmanWAIX (674838) | more than 5 years ago | (#26525067)

It all depends on what output you need?

DAD software [dad.net.au] has the ability to customize data types, multiple inheritance of objects, and to define different relationship types.

You can then trace along object relationships bringing back a dynamic graphic depending on what you want to show (and spit out to PDF).

Try Business Objects, Xcelsius, Polestar (0)

Anonymous Coward | more than 5 years ago | (#26525111)

SAP realized they were missing this kind of stuff so they splashed out a few billion on some BI tools.

you insen5itive clod! (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#26525113)

surprise to the Disgust, or bben which don't use the which allows

Try Spotfire or Matlab (0)

Anonymous Coward | more than 5 years ago | (#26525117)

Matlab or Spotfire can do it, assuming you have enough RAM

Processing (1)

JuzzFunky (796384) | more than 5 years ago | (#26525135)

Have a look at Processing [processing.org] , and the book Visualizing Data [processing.org] by Ben Fry.

Business Intelligence (1)

Banekartr (1058752) | more than 5 years ago | (#26525137)

What kind of data is it? What are you trying to figure out by looking at the data? What type of people will be looking at it? Depending on these answers, I may recommend one of the leading BI tools on the market: IBM Cognos, SAP Business Objects, or MicroStrategy. These COTS solutions are focused on visualizing masses of data, usually for some type of pattern discovery or decision making.

Network X in python (1)

bongey (974911) | more than 5 years ago | (#26525147)

I just used NetworkX in Python: http://networkx.lanl.gov/ [lanl.gov]
I used NetworkX to visualize graphs. It is pretty simple, but it might be very similar to the Ruby solution you described.
It can interface with many other libraries, as listed here: http://networkx.lanl.gov/reference/index.html [lanl.gov]

Before analyzing that big data set.. step back and (0)

Anonymous Coward | more than 5 years ago | (#26525265)

ask - what do I need from it, and is the data accurate.

As you say, there is a lot of work on ensuring data correctness. Probably a lot of reports are printed for middle management to chase the working level people about each of their pieces to that data set and correcting it.

This will go on and on. Soon major decisions will be based on largely suspect data. Then the fun happens.

I've worked with ERPs and at Fortune 10 companies down to mom & pop places and do consulting on process improvement. Look long and hard at the data monster and then go out and pick up some good books on Lean Manufacturing and pull systems.
 
Doesn't matter how fancy the analysis is if the data is questionable. But then your plight is to make pretty pictures. Best of luck, but be sure and read up on Lean.

I used to work for (1)

Jane Q. Public (1010737) | more than 5 years ago | (#26525313)

a company that competed with SAP. This is a problem that is industry-wide.

The solution you probably want is to make sure your SAP is set up to use a common relational database, then use another tool (Crystal Reports, Seagate, etc.) to visualize your data in ways that are not already built-in to your ERP system.

Tableau All The Way (1)

bradinthehouse (1054328) | more than 5 years ago | (#26525351)

I first saw a video of Hans Rosling [youtube.com] , who had some very unique ways of visualizing data that would otherwise be useless to a simple mind such as mine.

After I watched that, I found a piece of software called Tableau [tableausoftware.com] . I downloaded the trial version, and really liked how easy it made visualizing data for me. I can take the data I have, and Tableau will see how it's connected and allow you to generate visual reports of the data. I'm not saying that it'll work for everything, but it certainly does what I need it to extremely well, especially for my business intelligence [tableausoftware.com] initiatives.

StarLight (1)

dohmp (13306) | more than 5 years ago | (#26525359)

depending upon the problem domain, a very useful (albeit expensive) set of tools is StarLight, written for the US Government: http://starlight.pnl.gov/ [pnl.gov]

highly recommended if you've got tough visualization problems. this tends to get used for the *really* interesting visualization challenges.

SAP? ERP? (0)

Anonymous Coward | more than 5 years ago | (#26525445)

I consider myself quite a geek, but I have to admit I have no idea what SAP and ERP are. I guess I'll google it since the summary just assumes the reader knows WTF these are.

Gone to SAP? (0)

Anonymous Coward | more than 5 years ago | (#26525471)

Where I work I am sometimes writing code to import/export data from one system to another, and it usually goes quite well until a company decides to switch to SAP. Now 'gone to SAP' is synonymous with 'going pear-shaped'. It's amazing how the bad technology is the most expensive.

Exeros? (0)

Anonymous Coward | more than 5 years ago | (#26525475)

Exeros makes software for that kind of thing. I haven't tried it myself, but you might take a look and see if it does what you need.

http://www.exeros.com

PivotLink (1)

sockonafish (228678) | more than 5 years ago | (#26525555)

I am pimping my own employer's product here, and I'm admittedly biased, but we've got a phenomenal web-based/SaaS solution to this exact problem. We've done work for clients with billions and billions of rows of data (like 50+GB) and we've got a unique database that can generate reports in seconds that could take upwards of fifteen minutes on a SQL-backed solution. You can take any report, drill down arbitrarily into the data below, flip through the datasets, arbitrarily flip axes, filter out unwanted data on the fly, all that.

It's not a FOSS solution, but it is very affordable -- the last time we had a company-wide meeting marketing/sales was going off about how it costs about the same as a daily latte. Being a web solution, we're platform independent.

It is pretty much ready out of its metaphorical box. The only thing you need to set up on your end is the data export. We'll accept most any data format, usually tab-delimited CSVs. After we have your data, all you have to do is create reports, and we've got a team of people that can help you with that.

I think that's about enough self-pimping. There's more on our website, http://pivotlink.com [pivotlink.com].

Business Intelligence Cubes (1)

Elvis77 (633162) | more than 5 years ago | (#26525653)

You haven't stated what you need this for; I assume it's not just for your own consumption. I work in Business Intelligence (Kimball-method dimensional modeling, etc.), and we use PeopleSoft ERP in our workplace. We have found that the best way of displaying and using this type of eclectic data is to model it in star schemas and put it in data cubes. That way the people who use the data can really use it for analytical purposes. Any other way just makes more work for us IT people, which is great for our pay packets but BAD for our work/life balance.
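For anyone who hasn't met a star schema before, here is a minimal sketch of the idea, assuming a toy sales data set in an in-memory SQLite database. The table names, column names, and figures are all made up for illustration; nothing here comes from PeopleSoft or SAP. One fact table holds the measurements, the dimension tables describe them, and a "cube slice" is just the fact table rolled up along one dimension attribute.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table keyed to two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
    """)
    # Hypothetical sample rows, purely for illustration.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "widgets"), (2, "gadgets")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2008, 3), (2, 2008, 4)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 40.0), (2, 2, 60.0)])


def sales_by(conn, dimension_column):
    """Roll the fact table up along one dimension attribute."""
    # Whitelist the column name because it is interpolated into the SQL text.
    allowed = {"category", "year", "quarter"}
    if dimension_column not in allowed:
        raise ValueError(f"unknown dimension column: {dimension_column}")
    query = f"""
        SELECT {dimension_column}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY {dimension_column}
        ORDER BY {dimension_column}
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by(conn, "category"))
print(sales_by(conn, "quarter"))
```

A real cube engine precomputes these roll-ups for many dimension combinations; the sketch only shows why the star layout makes each one a simple join and GROUP BY.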

Cytoscape (4, Informative)

adamkennedy (121032) | more than 5 years ago | (#26525705)

I had a similar situation to yours recently, except I was trying to detangle a horridly complex product substitution graph for a logistics company.

I used a bunch of Perl to crunch the raw databases into various abstract graph structures, but instead of graphviz or something created by/for developers, I found that the best software for graph visualisation is the stuff that the genetics and bio people use.

The standout for me was a program called Cytoscape [cytoscape.org], which can import enormous graph datasets and then gives you literally dozens of different automated layout algorithms to play with (most of which I'd never heard of, but it's easy to just go through them one at a time till something works).

It's got lots of plugins for talking to genetics databases and such, but if you ignore all that and use Perl/Ruby/whatever for the data production part of the problem, it's a great way to visualise it.
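As a rough sketch of the "data production" half, assuming you export edges in Cytoscape's simple interaction format (SIF), where each line is a source node, a relationship type, and a target node separated by tabs: the SKU names and the "substitutes" relationship below are hypothetical, and the helper name write_sif is my own, not part of any library.

```python
import io


def write_sif(edges, out):
    """Write (source, relation, target) triples as tab-delimited SIF lines."""
    for source, relation, target in edges:
        for field in (source, relation, target):
            # Tabs and newlines would break the delimited format, so reject them.
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        out.write(f"{source}\t{relation}\t{target}\n")


# Hypothetical product substitution edges, for illustration only.
edges = [
    ("SKU-100", "substitutes", "SKU-200"),
    ("SKU-200", "substitutes", "SKU-300"),
    ("SKU-100", "substitutes", "SKU-300"),
]

buf = io.StringIO()
write_sif(edges, buf)
print(buf.getvalue(), end="")
```

In practice you would pass an open file instead of the StringIO buffer and then import that file into Cytoscape to try the layouts.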

Looks like SAP tricked another sucker (1)

Tyrannicalposter (1347903) | more than 5 years ago | (#26525745)

Looks like SAP tricked another sucker.

A company I worked at several years ago migrated to SAP. It took several hundred million dollars, 6 years, AND the company's main branch was already using SAP. All to replace an MVS system that cost under $5M a year to run, did more, and was much faster.

SAP is NOT a business application. It's a programming environment where you get to build and customize your own. Then those German Wunderkids break your customizations every time there is an SAP change.

A "good" business software package allows you to customize "it" to match your business processes. Not the other way around as with SAP.

If all German engineering was this good, the Polish cavalry would have chased off the Nazi Blitzkrieg.

mathematica? (0)

Anonymous Coward | more than 5 years ago | (#26525845)

It's quite good, but expensive. You can do very complex things with very little code.

Sql-fairy! (1)

hughbar (579555) | more than 5 years ago | (#26525883)

How about sql-fairy http://sqlfairy.sourceforge.net/ [sourceforge.net] that's open source and very complete?

A good start (1)

duckett227 (1456963) | more than 5 years ago | (#26526089)

Of all the products out there, Business Objects strikes me as the best solution to quickly engage you and provide strictly the useful information you're looking for. They were also recently acquired by SAP, so I would recommend you ask someone at your company what the corporate availability of their products is. Maybe get in touch with the SAP account executive. If your company doesn't already have access to the product, you would probably qualify for some reduced-price incentive. The Business Objects GUI is very clean and user-friendly, and you basically just drag your various dimensions to create a request and qualify it. If you can't make sense of your data doing this, then you probably have a poor and heavily denormalized database design.

The other issue you address is the poor data quality present in the master data management system, be it the data warehouse or something else. These issues plague many of the largest companies out there and can often take many years to sort out. Unfortunately, they typically only get fixed when someone like you decides to actually use the system for the very reason it was put there. So... ask some good questions and start scratching your head when the answers don't make sense. I can pretty much guarantee you that you will find a goldmine of things that don't make sense if you look hard enough. It's obviously a serious concern when there are data quality issues; however, a company almost never knows how bad they are. To put it another way, when it comes to understanding this data, your company doesn't know what it doesn't know yet. It's a continuous process of improvement that is largely driven by the users and their efforts to ask new and innovative questions.

The questions to consider would be very industry specific; however, try asking some basic queries, like how many distinct US states are showing up in the system (if you get 70, that's a big problem), whether the same social security number is showing up in a lot of different stores all over the place (fraud detection, criminal activity), or whether the cost of any products is showing up as 0 or next to nothing (your company is potentially over-ordering products that it thinks are creating huge return percentages, but which are actually creating huge losses and inventory costs). To create truly informative strategic reports you're going to need a good deal of historical data and a well-tuned database system to provide a decent level of service.

I am a little concerned as to how you're accessing the data, though. You state that you're manually extracting the data to what I assume to be flat files. Even if you're very sparsely qualifying the data and preaggregating much of it, it won't be very useful unless you mean for it to answer a few particular and very similar queries. You need to be tying your front-end tool to the data warehouse or to an aggregate cube that is being built off this central system. The cube will likely perform faster unless the data warehouse is well tuned for your requests. Without going into the complicated solutions for such tuning, a quick solution is to create a couple of aggregate join indexes, or to denormalize part of the relational system as a last resort for performance.

The bottom line is that the greatest value of these multi-million-dollar database systems is found in the information that is uncovered through asking the right questions, which leads to making better corporate decisions. This may seem obvious; however, these systems are used in so many ways, from operationalizing complex triggered company processes to creating customer-targeted incentives and market basket analysis/product pricing. When the data quality is bad, this typically cascades through these processes and leads to poor decisions and inaccurate forecasting.

One very cool solution I have seen is to use MS Excel to access the database. I'm assuming this was using ODBC drivers. You can create some very cool pivot tables and charts/graphs and voila, you save $100,000 and have yourself a pretty cool BI solution! Granted, this is not a BI product; however, it can be very powerful, as I have seen it work well on Teradata.

So that gives you my ideal solution and a free one that can work as well. I really don't recommend looking for some middle ground unless you're just poking around and not concerned with creating a long-term solution. I don't believe any of the "out of the box" solutions from the major data warehouse vendors are good solutions. Look for a BI/analytics vendor that integrates well with your source/data warehouse system. I know SAS is in a major partnership with Teradata and there is a great deal of integration and R&D going on there. Good Luck :)
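Two of the sanity checks suggested above (counting distinct states and flagging products with zero or near-zero cost) can be sketched as plain SQL. This is only an illustration, assuming hypothetical customers and products tables in an in-memory SQLite database; the table names, column names, sample rows, and the 0.01 cost threshold are all assumptions, and a real warehouse would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, unit_cost REAL);
""")
# Hypothetical sample rows, including the kind of dirt the checks should catch.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "CA"), (2, "Calif."), (3, "NY"), (4, "CA")])
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "bolt", 0.12), (2, "nut", 0.0), (3, "washer", None)])


def distinct_state_count(conn):
    """Count distinct state values; far more than 50 suggests inconsistent entry."""
    return conn.execute("SELECT COUNT(DISTINCT state) FROM customers").fetchone()[0]


def suspicious_cost_products(conn, threshold=0.01):
    """Return names of products whose unit cost is missing or below the threshold."""
    rows = conn.execute(
        "SELECT name FROM products "
        "WHERE unit_cost IS NULL OR unit_cost < ? ORDER BY name",
        (threshold,),
    )
    return [row[0] for row in rows]


print(distinct_state_count(conn))
print(suspicious_cost_products(conn))
```

Here "CA" and "Calif." count as two different states, which is exactly the kind of inconsistency the distinct count is meant to surface.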

Data Mining? (1)

Bazman (4849) | more than 5 years ago | (#26526183)

Have you looked at data mining solutions? Someone mentioned Pentaho already, but there's also:

Rapid Miner [rapidminer.com]

Orange Data Miner [ailab.si]

all of which are packed with enterprisey features. But you may have to learn some stats. Once you get past what you can do with the pre-packaged stats methods, then head for R [r-project.org], or write a RapidMiner plugin in Python.

Check out Stephen Few and Tableau Software (1)

jimdataguy (1457019) | more than 5 years ago | (#26526189)

Check out Stephen Few's blog: http://www.perceptualedge.com/ [perceptualedge.com]. Good info there. For my money, Tableau is the way to go. It's cheap enough and easy to implement. And it reads practically everything. The visualizations are cool too.

MayaVI - Data visualizer (0)

Anonymous Coward | more than 5 years ago | (#26526417)

MayaVI - http://code.enthought.com/projects/mayavi/ (originally http://mayavi.sourceforge.net/) was developed by a friend of mine (actually one of the founding members of the local Linux User Group). Although it was initially designed for CFD (Computational Fluid Dynamics), I think it can now handle pretty much a lot of stuff....
