Python Gets a Big Data Boost From DARPA

Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

Python Gets a Big Data Boost From DARPA 180

Posted by Soulskill on Wednesday February 06, 2013 @03:20AM from the from-unclesam-import-money dept.

itwbennett writes "DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs. The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."

This discussion has been archived. No new comments can be posted.

Python Gets a Big Data Boost From DARPA

Load All Comments

Search 180 Comments Log In/Create an Account

Comments Filter:

Great. Just Great (Score:1, Insightful)

by Anonymous Coward writes:

The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data.
Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrentless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

For that matter anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah I would like to believe that too. No I won't ignore the mountains of evidence to the contrary. I won't treat all the cou
- Re:Great. Just Great (Score:5, Insightful)
  
  by Kwyj1b0 ( 2757125 ) writes: on Wednesday February 06, 2013 @04:07AM (#42806151)
  
  Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrentless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.
  For that matter anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah I would like to believe that too. No I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who just love power and control and a feeling that they are important and they get that feeling by imposing their will on us.
  So what you are saying is that DARPA funds will be used in a way to further the goals of DARPA/The government? Shocking. I haven't read anything that says which agencies will/won't have access to these tools - so I'd hazard a guess that any department that wants it can have it (including the famous three letter agencies).
  FYI, Continuum Analytics is a company that is based on providing high-performance python-based computing to clients. Any packages they might release will either be open source (and can be checked), or closed source (in which case you don't have to use it). They aren't hijacking the Numpy/Scipy libraries. They are developing libraries/tools for a client (who happens to be DARPA). (Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community). You do know that DARPA funds also go to improve robotics, they supported ARPANET, and a lot of their space programs later got transferred to NASA?
  Basically, I have no idea what you are ranting about. One government organization funded a project - it happens all the time. Do you rant about NSF/NIH/NASA money as well? If so, you'd better live in a cave - a lot of government sponsored research has gone into almost every modern convenience that we take for granted.
  
  Parent Share
  twitter facebook
  - Re:Great. Just Great (Score:5, Funny)
    
    by Anonymous Coward writes: on Wednesday February 06, 2013 @06:27AM (#42806621)
    
    What is this APRANET thing? It sounds like some useless crap loaded acronym to me.
    
    Parent Share
    twitter facebook
    - - Re: (Score:2)
        
        by luis_a_espinal ( 1810296 ) writes:
        
        Poe's Law. [wikipedia.org]
        In /., you never know.
  - Re:Great. Just Great (Score:5, Informative)
    
    by sdaug ( 681230 ) writes: on Wednesday February 06, 2013 @08:54AM (#42807227)
    
    Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community
    Open sourcing is a requirement of the XDATA program.
    
    Parent Share
    twitter facebook
  - - Re: (Score:2)
      
      by blueskies ( 525815 ) writes:
      
      You have no idea what he's talking about? It was pretty clear: factions within the US government wants these tools to datamine all the ISP data they have been snarfing up so they can spy on everyone in the world. Saying that you believe otherwise is a pretty extreme view
      He has no idea why there is ranting about open source code that everyone in the world can use for any purposes. Did you rant about git being open source? I'm betting the gov't can use that to manage code related to data mining. Do you rant about postgres or any of the databases used by the US gov't? Would postgres suddenly become evil because the gov't threw some money their way?
- Re: (Score:2)
  
  by blueskies ( 525815 ) writes:
  
  Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrentless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.
  Isn't this true of all useful open source projects?
I get the impression that (Score:5, Interesting)

by Chrisq ( 894406 ) writes: on Wednesday February 06, 2013 @03:35AM (#42806023)

I get the impression that in the Engineering and Scientific community Python is the new Fortran. I hope so, because it would be "Fortran done right".

Share
twitter facebook
- Re: (Score:2)
  
  by BlackPignouf ( 1017012 ) writes:
  
  I think you're right.
  I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python.
  Those languages are *very* similar, and it's a shame that so much effort is being divided between communities.
  I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.
  - Re:I get the impression that (Score:5, Interesting)
    
    by jma05 ( 897351 ) writes: on Wednesday February 06, 2013 @04:16AM (#42806185)
    
    > I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.
    I empathize since I conversely only barely use Ruby. Once someone learns one of these languages, there is not that much that the other offers. But happily, one need not learn advanced Python to benefit from these projects.
    > it's a shame that so much effort is being divided between communities
    AFAIK, all scientific funding from US and Europe is/was always directed to Python, not Ruby. So Python is firmly established as a research language and there is not much effort being divided with Ruby (which seems to have a much more spotted and amateur movement in this direction), at least as far as scientific stuff is concerned (Ruby is more popular on web app side). For me the tension for scientific use is not between Python and Ruby, but between Python and R. Python community is replicating a lot of R functionality these days but R still has a much better lead in science libraries. Happily, it is quite easy to call R from Python.
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by DragonWriter ( 970822 ) writes:
    
    I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python. Those languages are *very* similar, and it's a shame that so much effort is being divided between communities.
    I think I disagree. I think that its great that both communities exist and each can develop languages in ways unconstrained by the particular historical choices that shaped the other languages (and that, in both cases, each has subcommunities around part
  - - Re:I get the impression that (Score:4, Interesting)
      
      by lattyware ( 934246 ) writes: <gareth@lattyware.co.uk> on Wednesday February 06, 2013 @08:01AM (#42806995) Homepage Journal
      
      The GIL is an overblown issue. Threading is designed to get around issues with accessing slow resources, not for serious parallel computing. Just use multiprocessing if you want to do lots of computing in parallel, problem solved.
      
      Parent Share
      twitter facebook
      - Re: (Score:3)
        
        by lattyware ( 934246 ) writes:
        
        Oh, and Python without a GIL exists, it's called Jython.
      - Re: (Score:2)
        
        by lattyware ( 934246 ) writes:
        
        PyPy isn't there yet. [stackoverflow.com]
    - GIL is a non-issue. (Score:2)
      
      by luis_a_espinal ( 1810296 ) writes:
      
      I think you're right. I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python. Those languages are *very* similar, and it's a shame that so much effort is being divided between communities. I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.
      Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.
      Gee, because there are no distributed enterprise solutions written on Python or Ruby <rolls eyes/>
      It will take some very serious, dedicated, low level work and I just don't see it happening.
      It already has happened. The solutions aren't just in the mainstream versions, though. Take Jython. On a typical JVM, it is the fastest Python in-the-trenches implementation available. Throw that over specialized Java-focused hardware (like the Azul Vega 3), and you are on fire.
      Furthermore, a solution to the GIL problem is not necessary in the general case. In any modern system, the cost of communicatin
      - Re: (Score:2)
        
        by luis_a_espinal ( 1810296 ) writes:
        
        Java/JEE never shines. It is total crap.
        That's an invective, not an argument. Now go back and finish your programming homework.
    - Re: (Score:2)
      
      by mattr ( 78516 ) writes:
      
      It would be easier to get some of that Darpa money sent over to Pynie [bitbucket.org] and it will all run on Parrot (multithreaded stable as of last month apparently). Then you will be able to call Perl6 and Befunge when you get tired of indenting all the time (ducks)
    - Re: (Score:2)
      
      by blueskies ( 525815 ) writes:
      
      Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.
      You don't really understand big data, if you think it needs to run on ONE computer.
      
      This is only a problem if you think threading is the solution to scaling CPU computations across hundreds of computers. If you generalized your code to run on hundreds of computers, there is no reason you can't run a process per core for your multicore machines.
      - Re: (Score:2)
        
        by blueskies ( 525815 ) writes:
        
        I used to do "big data" and "cloud" computing when it was called clusters.
        Did you run one process with multiple threads across all of those machines, or was threading less of an issue once you started thinking about distributed computing?
        I can say this with a certainty: Anything other than a compiled language with low level facilities is a pure waste of time and money.
        Isn't that what Numba does? Compiling Python code using LLVM and being able to understand numpy data structures? I'm still not sure I understand what threading has to do with this. The OP said threading was an issue, but threading doesn't
        While with Java you at least get some safety for big projects
        Safety? Job security?
    - Neither LANGUAGE has a GIL (Score:2)
      
      by DragonWriter ( 970822 ) writes:
      
      Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.
      No, they don't. The CPython and MRI/YARV implementations of Python and Ruby, respectively, have global interpreter locks, but those are implementation quirks not language features. On the Python side, IronPython and Jython don't have a GIL, on the Ruby side neither JRuby, MacRuby, IronRuby nor Rubinius (the latter being particularly impo
- Re:I get the impression that (Score:5, Informative)
  
  by solidraven ( 1633185 ) writes: on Wednesday February 06, 2013 @04:18AM (#42806195)
  
  You're dead wrong, nothing quite beats Fortran in speed when it comes to number crunching. If you need to go through hundreds of gigabytes of data and performance is important there's only one realistic choice: Fortran. Python isn't fit to run on a large cluster to simulate things, too much overhead. And lets not forget what sort of efficiency you can get if you use a good compiler (Intel Composer). You won't find Fortran on the way out over here, it's here to stay!
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by ctid ( 449118 ) writes:
    
    Why would Fortran be any faster than any other compiled language?
    - Re:I get the impression that (Score:5, Informative)
      
      by Anonymous Coward writes: on Wednesday February 06, 2013 @04:46AM (#42806261)
      
      Short answer, Fortran has stricter aliasing rules so the compiler has more optimization opportunities. Long answer, see Stack Overflow [stackoverflow.com].
      
      Parent Share
      twitter facebook
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      Why would Fortran be any faster than any other compiled language?
      
      Because the language is simpler, so the compiler can make assumptions and generate better automatic optimizations. C/C++ are much harder to optimize (=generate optimal assembly instructions).
  - Re: (Score:3)
    
    by ssam ( 2723487 ) writes:
    
    FORTRAN does arrays in a way thats slightly easier for the compiler to optimise. But some modern techniques and data structures are much harder to do in FORTRAN compared to c++. It is also quite easy to call C, C++ or FORTRAN functions from python.
    Writing a loop in python is slow. You express that loop as a numpy array operation you get a substantial way towards c speed. if you use numexpr you will get something faster than a simple C version.
    Processing big data is as much about moving the data around, and
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      Processing big data is as much about moving the data around, and minimising latency in this movement as the raw processing speed. so a language that lets you express things efficiently will win in the end.
      
      If by expressing things efficiently you mean easy for the programmer to write, then you're wrong. What matters (doubly so for big data) is full control over the machine's resources, ie how data is laid out in memory, good control over i/o etc. While this has always been the key to fast performan
  - Re:I get the impression that (Score:5, Interesting)
    
    by Kwyj1b0 ( 2757125 ) writes: on Wednesday February 06, 2013 @05:14AM (#42806355)
    
    Compared to plain old Python, yes. But Cython offers a lot of capabilities that improve speed dramatically - just using a type for your data in Cython gives programs a wonderful boost in speed.
    As someone who uses Matlab for most of my programming, I have come to detest languages that do not force specifying a variable type and/or declaring variables. Matlab offers neither, but it is a standard in some circles.
    
    Parent Share
    twitter facebook
  - Re:I get the impression that (Score:5, Insightful)
    
    by LourensV ( 856614 ) writes: on Wednesday February 06, 2013 @06:06AM (#42806555)
    
    You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation. Just getting something that does what they want is hard enough for them, which is why they use high-level languages like Matlab and R. If things are too slow, they learn to rewrite their computations in matrix form, so that they get deferred to the built-in linear algebra function libraries (which are written in C or Fortran), which usually gets them to within an order of magnitude of these low-level languages.
    If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigations until things fit, 2) buy or rent a (virtual) machine with more CPU and more memory, or 3) hire a programmer to re-implement everything in a low-level language and so that it can run in parallel on a cluster. The third option is rarely chosen, because it's expensive, good programmers are difficult to find, and in the course of research the software will have to be updated often as the research question and hypotheses evolve (scientific programming is like rapid prototyping, not like software engineering), which makes option 3) even more expensive and time-consuming.
    So yes, operational weather forecasts and big well-funded projects that can afford to use it will continue to use Fortran and benefit from faster software. But for run-of-the-mill science, in which the data sets are currently growing rapidly, having a freely available "proper" programming language that is capable of relatively efficiently processing gigabytes of data while being easy enough to learn for an ordinary computer user is a godsend. R and Matlab and clones aren't it, but Python is pretty close, and this new library would be a welcome addition for many people.
    
    Parent Share
    twitter facebook
    - Re:I get the impression that (Score:5, Insightful)
      
      by nadaou ( 535365 ) writes: on Wednesday February 06, 2013 @07:41AM (#42806929) Homepage
      
      You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation.
      Which is exactly why FORTRAN is an excellent choice for them instead of something else fast (close to assembler) like C/C++, and why so many of the top fluid dynamics models continue to use it. It is simple (perhaps a function of its age) and because of that it is simple to do things like break up the calculation for MPI or tell the compiler to "vectorize this" or "automatically make it multi-threaded" in a way which is still a long from maturity for other languages.
      Can you guess which language MATLAB was originally written in? You know that funny row,column order on indexes? Any ideas on the history of that?
      R is great an all, and is brilliant in its niche, but how's that RAM limitation thing going? It's not a solution for everything.
      MATLAB is pretty good too, as is Octave and SciLab, and it has gotten a whole lot faster recently, but ever try much disk I/O or array resizing for something which couldn't be vectorized? Becomes slow as molasses.
      If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigations until things fit,
      heh. I don't think you know these people.
      2) buy or rent a (virtual) machine with more CPU and more memory,
      Many problems are I/O limited and require real machines with high speed low latency network traffic. VMs just don't cut it for many parallelized tasks which need to pass messages quickly.
      Forgive me if I'm wrong, but your post sounds a bit like you think you're pretty good on the old computers, but don't know the first thing about FORTRAN and are feeling a bit defensive about that, and attacking something out of ignorance.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by LourensV ( 856614 ) writes:
        
        You're not picking on me [slashdot.org], you're arguing your point. That's what this thing here is for, so no hard feelings at all.
        I'll readily admit to not knowing Fortran (or much Python! ;-)); I'm a C++ guy myself, having got there through GW-Basic, Turbo Pascal and C. I now teach an introductory programming course using Matlab (and know of its history as an easy-to-use Fortran-alike), and I use R because it's what's commonly used in my field of computational ecology. I greatly dislike R, and I'm not too hot on Matlab
      - Re: (Score:2)
        
        by gl4ss ( 559668 ) writes:
        
        buying vm's on the same farm is just another way of getting access to real machines on the cheap for limited time. it's just another way of saying of buying time on a supercomputer now.
      - Re: (Score:2)
        
        by Spacelem ( 189863 ) writes:
        
        That funny row/column order in matrix indices (aka column major order) is because it's the correct mathematical order.
        Consider that you can only multiply two matrices if matrix A is of size [i,j], and matrix B is of size [j,k], i.e. the number of rows in A must be equal to the number of columns in B. The product C=AB is then of size [i,k]. This works for any number of matrices, so, [i,j]*[j,k]*[k,l]*[l,m] is valid, and gives [i,m].
        This naturally leads to the indexing you see in Fortran and Matlab, because i
    - Re: (Score:2)
      
      by nadaou ( 535365 ) writes:
      
      So yes, operational weather forecasts and big well-funded projects that can afford to use it will continue to use Fortran and benefit from faster software.
      I don't mean for this to be pick on LourensV day, but I have another small nit to pick. You're presuming operational weather forecasting is well funded? I don't think funding has anything to do with it. Often it's what the original author knew which chose the language.
      And have you seen what's been done to NOAA's budget over they last decade?? Well funded.
      - Re: (Score:2)
        
        by csirac ( 574795 ) writes:
        
        Perhaps he means it's well funded in the sense that they have dedicated programmers at all. "Run of the mill" science is done by investigating scientists or their jack-of-all-trades research assistants, collaborators or grads/post-docs, etc. most of which are unlikely to have substanital software engineering experience or training in their background.
        Nonetheless, they write code - very useful, productive code - but it's in whatever tool or high-level language popular among their peers/discipline (matlab, R,
    - Re: (Score:2)
      
      by tyrione ( 134248 ) writes:
      
      You lost me at ``Most scientists are not programmers...'' schtick. Whether it was my Mechanical Engineering professors fluent in ADA, C, Fortran, C++ or Pascal or my EE professors in the same, to my Mathematics Professors all in the same, not a single CS Professor could hold a candle to them, unless we started dicking around with LISP, SmallTalk or VisualBasic for shits and giggles. In fact, they became proficient in these languages because they had to write custom software to model nonlinear-dynamic system
    - Re: (Score:2)
      
      by solidraven ( 1633185 ) writes:
      
      I disagree partially with what you said based on personal experience. As an EE student I had to learn to use Fortran for my thesis. I needed to run a large EM simulation and not a single affordable commercial program was able to run on a small cluster of computers that was available. So I resorted to using Fortran and MATLAB for visualisation. So I managed to learn basic Fortran over the weekend and then use it to write a working program for a cluster, all within 1 week time. I just don't think I could have
      - Re: (Score:2)
        
        by steveha ( 103154 ) writes:
        
        I resorted to using Fortran and MATLAB for visualisation. So I managed to learn basic Fortran over the weekend and then use it to write a working program for a cluster, all within 1 week time. I just don't think I could have done that with Python.
        Python with SciPy is a lot like MATLAB. Python, the language, is far superior to MATLAB's language; I hate 1-based array indexing, for example. MATLAB's language does have a few special features for matrices that Python lacks, but that is just syntactic sugar (th
        
        Re: (Score:2)
        
        by Spacelem ( 189863 ) writes:
        
        Sadly, with the exception of a few times where I get to sum an array, pretty much my whole model needs to be run in a fast language like C or Fortran (I use C, my supervisor uses F77). It's the kind of model (a spatial stochastic disease simulation) that doesn't really lend itself to coding up in Python. No matrices, just lots of little bits of data interrogation, calculating one event at a time, and so many loops (unavoidable) that it would just crawl in Python. If you try to start in Python and replace th
        
        Re: (Score:2)
        
        by Spacelem ( 189863 ) writes:
        
        Thanks for the link. My problem is that there isn't any one bit you can point to and say "that's the slow bit" (unless it's telling the code which parameters to use, varying the parameters, and then graphing the results when done -- I'm currently doing those parts with bash and Octave, and to be fair I would probably be better off doing both of those in Python).
        The main work is the simulation, and it's where I've got a trivially small amount of data (say a 20x20 lattice of sites containing the number of sus
        
        Re: (Score:2)
        
        by butalearner ( 1235200 ) writes:
        
        Python with SciPy is a lot like MATLAB. Python, the language, is far superior to MATLAB's language; I hate 1-based array indexing, for example. MATLAB's language does have a few special features for matrices that Python lacks, but that is just syntactic sugar (there are functions to do everything in Python).
        Even as a MATLAB user I agree, as long as we're strictly talking about the language. Many of GNU Octave's woes (though they're getting JIT now!) can be blamed on the poor language design.
        But there are many things that SciPy doesn't have. Yes, MATLAB is an unnecessarily expensive choice for data analysis, but my employer only uses it for that (not "big data," mind you) because it's already our design tool, so it's an ideal rapid prototyping environment. That's where it really shines: Simulink and code gen
        
        Re: (Score:2)
        
        by solidraven ( 1633185 ) writes:
        
        I doubt SciPy would have been as easily to expand for running on a cluster. These sort of things come of as natural to Fortran. Additionally if I write my code in Fortran the compiler can optimize it a lot further than Python will allow me to. Hence the speed advantage is still in Fortran's hands which is important if you don't have access to the latest hardware and time on large clusters.
  - Re: (Score:2)
    
    by Terrasque ( 796014 ) writes:
    
    PyPy [pypy.org] might change that in the future, especially with the Transitional Memory [pypy.org] branch.
    - Re: (Score:2)
      
      by K. S. Kyosuke ( 729550 ) writes:
      
      Wouldn't it be simply better for people to learn Haskell?
  - Re: (Score:2)
    
    by Dcnjoe60 ( 682885 ) writes:
    
    You're dead wrong, nothing quite beats Fortran in speed when it comes to number crunching. If you need to go through hundreds of gigabytes of data and performance is important there's only one realistic choice: Fortran. Python isn't fit to run on a large cluster to simulate things, too much overhead. And lets not forget what sort of efficiency you can get if you use a good compiler (Intel Composer). You won't find Fortran on the way out over here, it's here to stay!
    Isn't that the point of DARPA funding this project - to make it so Python is fit to run on a large cluster to simulate things? I do agree, though, that Fortran is here to stay. However, it is so specialized in what it does and that often a solution then requires multiple languages to get the task accomplished.
    Back in the day (1970s) I had a professor who would say that you can write anything in anything. For instance you could write a business app in Fortran and you can use COBOL for plotting trajectories
    - Re: (Score:2)
      
      by solidraven ( 1633185 ) writes:
      
      Sure you can, any language that has a full feature set can do any task that the system is capable off. But efficiency is also important, and Fortran simply has so much advantages over Python. Complex data structures aren't needed for most simulations while they make optimisation so much harder. Additionally interpretation is a serious bottleneck.
  - - Re: (Score:2)
      
      by solidraven ( 1633185 ) writes:
      
      Yes, but Python is still an interpreted language and very slow compared to Fortran.
      - Re: (Score:3)
        
        by pthisis ( 27352 ) writes:
        
        The core processing in SciPy/NumPy is done in compiled C or Fortran libraries (LAPACK is used extensively where available), not in Python.
        I'm unaware of a widely-used interpreted version of Python. Whether Python is byte-compiled (CPython), JIT'd (psyco, pypy, IronPython, many Jython stacks), or compiled ahead of time to machine code (Jython+gcj, ShedSkin) depends on which Python implementation you're talking about.
        
        Re: (Score:2)
        
        by solidraven ( 1633185 ) writes:
        
        Most of those are still interpreted. It's not because it's a bytecode that it's not interpreted. In fact even your CPU interprets complex instructions and executes them using a set of simple instructions in a lot of cases (yay for RISC/CISC hybrids). A pre-compiled generalised library will never reach the performance of real Fortran code. People often forget that a lot of Fortran's performance comes from the way it deals with memory, pre-compiled libraries can't do that. Not to mention what a few decades of
        
        Re: (Score:2)
        
        by pthisis ( 27352 ) writes:
        
        Most of those are still interpreted. It's not because it's a bytecode that it's not interpreted. In fact even your CPU interprets complex instructions and executes them using a set of simple instructions in a lot of cases (yay for RISC/CISC hybrids).
        Okay, then Fortran's an interpreted language too. What was the point of your original post, then?
        Moving the goalposts like this in the middle of a conversation is pointless--sure, there's a semi-rational definition under which x86 assembler is an interpreted l
        
        Re: (Score:2)
        
        by solidraven ( 1633185 ) writes:
        
        Fortran is compiled directly to machine code in most cases. The Fortran VMs and interpreters aren't used all that often as far as I'm aware. At least I haven't seen any of them used in production environments. Lets take a good example: Intel Composer actively seems to avoid microcode based instructions and goes for fast hardware implemented ones and uses all features of the hardware. Pretty interesting to see at times how much a good compiler can make a difference. Trying to achieve the same with compiled p
      - Re: (Score:2)
        
        by SolitaryMan ( 538416 ) writes:
        
        Yes, but Python is still an interpreted language and very slow compared to Fortran.
        Nope, it's not. Never was.
        
        Re: (Score:2)
        
        by solidraven ( 1633185 ) writes:
        
        Sorry, but you're wrong: it is. Or did you forget where the PYC files come from? You might want to read the official Python documentation on this one http://docs.python.org/3/glossary.html [python.org] . Go to "interpreted" in case you're too lazy to find it yourself. And by the definition we use over at the electronics department Python is an interpreted language no matter what you wish to claim.
        
        Re: (Score:2)
        
        by pthisis ( 27352 ) writes:
        
        Sorry, but you're wrong: it is. Or did you forget where the PYC files come from? You might want to read the official Python documentation on this one http://docs.python.org/3/glossary.html [python.org] [python.org] . Go to "interpreted" in case you're too lazy to find it yourself. And by the definition we use over at the electronics department Python is an interpreted language no matter what you wish to claim.
        You're conflating implementations with languages.
        Not every Python implementation even has .pyc files. When I co
        
        Re: (Score:2)
        
        by solidraven ( 1633185 ) writes:
        
        First of all, if you keep pulling different implementations out of your hat you can try to prove anything. And C and Fortran interpreters, lets just not go there before this turns into a complete comedy. You butcher both languages simply by doing that. On the other hand, Python wasn't designed to be compiled like that and is inefficient at it. So yes, our definition still holds. Compiling such a language will never lead to an optimal implementation in size, memory usage or performance. If you take the path
  - - Re: (Score:2)
      
      by solidraven ( 1633185 ) writes:
      
      Nah, Fortran was designed with number crunching for scientific and engineering applications in mind. It won't choke, it won't stop. Fortran compilers are far smarter than Python when dealing with memory. The language was designed to allow the compiler to make assumptions to speed up computation and make for efficient memory management. But I'll agree that you shouldn't write the entire application in Fortran. For visualisation other languages are better suited (MATLAB/Octave comes to mind). You can have a p
  - - Re: (Score:2)
      
      by solidraven ( 1633185 ) writes:
      
      You're comparing two very different tasks. A game and a large simulation are very different things. Lets compare two extremes: EVE online and the FDTD algorithm (EM field solver). EVE Online has a lot of conditionals. It's very unpredictable in memory usage. But the FDTD algorithm has very different properties. It needs a lot of data, but there are no conditional expressions. Additionally what's needed from the memory is known long before it's ever needed. It just goes over the data every pass without analy
  - - good lord no. (Score:2)
      
      by luis_a_espinal ( 1810296 ) writes:
      
      Well.. there's C, of course...
      I work with C and C++ on a daily basis, and I have to ask/answer: For parallelized scientific computation or data crunching? No thank you. You don't use a phillips screw driver to unscrew a hexagonal bolt, do you? Know your tools, their strenghts and limitations.
      - Re: (Score:2)
        
        by nr ( 27070 ) writes:
        
        I disagree, Fortran and C is very good for parallel scientific computations. If you are doing computations you care about speed, and the closer you are to the iron (and the os) the faster it runs and more work you can do in a time unit. You have nice tools like OpenMP, UPC, Cilk and MPI, etc. Posix SHM is the best for local IPC/RPC.
        Python may be a nice lang to work with, but it is a slow dog.
- Re: (Score:2)
  
  by SpzToid ( 869795 ) writes:
  
  No one seems to be pining away for Fortran programmers. At least not much ayways. A quick 'n dirty search on Dice.com yields 46 results, (and no doubt a few are doubles).
- - - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      I guess the problem is that people who speak about Fortran actually think about FORTRAN. The last FORTRAN standard was from 1977, and that shows. After that, there had been no new standard and little new development until the Fortran 90 standard (note the different capitalization). Fortran 90 got rid of the old punch card based restrictions by giving it completely new, much more reasonable code parsing rules (it still accepts old form code for backwards compatibility, but you cannot mix both forms in one fi
- - Re:I get the impression that (Score:4, Informative)
    
    by Chrisq ( 894406 ) writes: on Wednesday February 06, 2013 @07:09AM (#42806799)
    
    The entire point of Fortran is that it has difficult-to-deal-with aliasing rules that make the compiler more free to produce optimized code. That's why it is suitable for things that require every last bit of performance you can wring out of it. Today probably you can get the same thing with C or C++ provided you are prepared to use things like restrict, but it used to be you couldn't, so Fortran ruled certain topics.
    Python is an easy-to-use system with abysmal performance - expect 10-100x slowdown for code that runs in pure Python over a similar C version. If you can get things set up so Python is only gluing other C components together and the data never has to touch native Python data structures or loops, then performance will be fine, but now you aren't really coding in Python any more.
    The point is, the purpose of Fortran and the purpose of Python are entirely opposed. They are exactly the opposite of each other. So it boggles the mind how you can think that Python can be Fortran "done right". So much so that now I suspect I got trolled. Well done, sir.
    Yes I understand, and many people made the same point. However Fortran was for a lot of scientists and engineers the hammer to crack any nut. It was used for simple "try outs" where performance wasn't needed, simply because it was the language that Engineers knew. I think the same thing is happening with Python now, it is the first and sometimes only language that many engineers know. Now for the performance issue, it will not give the best performance but packages like SciPy and NumPy do give very good performance (arguably by using these libraries you are just using python to string c functions together, but it is properly integrated). Tests show that you are getting about a third of the performance of Fortran [nasa.gov], (with the exception of the Fortran DGEMM marix multiply which greatly outperforms Python and other Fortran variants). The typical engineering reaction to performance needs is to throw hardware at the problem, then optimise your algorithm, and only change language if absolutely necessary!
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by pjabardo ( 977600 ) writes:
    
    You are actually right but you are missing the point. Python doesn't compete with Fortran, it supplements it. With tools such as f2py, it is very easy to call fortran code from python (and there are tools that make it easy to call C/C++). This combination really potentializes both languages: bottlenecks use Fortran/C/C++ and the rest python. This combination is already popular: numpy/scipy is basically that.
    
    I don't think that being easy is python's main advantage. Using a dynamic environment were you can
Matlab (Score:1)

by Anonymous Coward writes:

Bye-bye Matlab. I liked your plotting capabilities, but that was about it.
- Re: (Score:2)
  
  by sophanes ( 837600 ) writes:
  
  matplotlib already does this in conjunction with Numpy and Scipy - its plotting quality and flexibility compares favourably to Matlab.
  Its biggest drawback is that it is pretty glacial even by Matlab's standards when rendering large datasets (think millions of points). I'm not sure whether matplotlib or the interactive backend is at fault, but anything DARPA can do to improve the situation would be welcome.
- Re: (Score:2)
  
  by 0100010001010011 ( 652467 ) writes:
  
  Still nothing for Simulink.
  - - Re: (Score:2)
      
      by 0100010001010011 ( 652467 ) writes:
      
      Sage doesn't do anything Simulink does.
Python 2 or 3? (Score:4, Interesting)

by toQDuj ( 806112 ) writes: on Wednesday February 06, 2013 @03:51AM (#42806095) Homepage Journal

So is this going to focus on Python 2 or 3? Might be a reason to upgrade..

Share
twitter facebook
- Re:Python 2 or 3? (Score:5, Informative)
  
  by SQL Error ( 16383 ) writes: on Wednesday February 06, 2013 @04:53AM (#42806279)
  
  Both. The prebuilt "Anaconda" distro defaults to Python 2.7, but it also works with 3.3 and 2.6.
  
  Parent Share
  twitter facebook
Wrong language (Score:5, Funny)

by Dishwasha ( 125561 ) writes: on Wednesday February 06, 2013 @04:03AM (#42806135)

The put the money in the wrong place. They should have put it in to R which very popularly interfaces with Python.

Share
twitter facebook
- Re:Wrong language (Score:4, Informative)
  
  by SQL Error ( 16383 ) writes: on Wednesday February 06, 2013 @05:09AM (#42806337)
  
  DARPA runs a lot of these research seed programs, putting a couple of million dollars into a bunch of different but related research projects. In this case the program budget is $100 million in total, and Continuum got $3 million for their Python work (Numba, Blaze, etc). Some of the program money may have gone to R as well; there's a couple of dozen research groups, but I don't have a full list.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by csirac ( 574795 ) writes:
  
  Wow, I hope not. As much as I am actually a Ruby fan at heart; and as much as I appreciate the R community and everything R has done, it always seems much easier to write slow and/or memory-intensive R code than in Python. Perhaps I never quite spent enough time with it but there are many corners to the language which seem unnecessarily tedious. And no references - variables are all copied around the place, which is expensive. I know, I know... worrying about pass-by-value and efficiency of assignment state
  - Re: (Score:3)
    
    by hyfe ( 641811 ) writes:
    
    http://en.wikipedia.org/wiki/R_(programming_language) [wikipedia.org]
    R is a statistical programming language. It has lots of neat methods and functions implemented, and is rules the world of statistical analysis.. which is kinda cool, since it's also open source.
    It sits pretty much halfway between Matlab and Python.. It's pretty usuable and convenient because of the huge library, but as a programming language it just, well, sucks ball. Building up the objects some of the methods there need, if you get data from an un
- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  Others have complained about limitations of R in this very thread, so it doesn't seem as cut-and-dried as you make it out to be. Python is the popular language of this particular fifteen-minute period, so it's the logical choice to put the effort into. Scientists would like to benefit from language popularity too.
- - Re: (Score:2)
    
    by toQDuj ( 806112 ) writes:
    
    Perhaps. After all, it is in the nature of companies to ask as much money as possible for as little work as possible.
Good news for the Python community (Score:3, Funny)

by kauaidiver ( 779239 ) writes: on Wednesday February 06, 2013 @04:04AM (#42806145)

As a full time Python developer for going on 6 years this is good to hear! Now if we can get a Python-lite to replace Javascript in the browser.

Share
twitter facebook
- Re: (Score:2)
  
  by lattyware ( 934246 ) writes:
  
  Yeah, the issue is that Python is pretty hard to sandbox, being the hugely dynamic language it is. I imagine it would take a lot to get the browsers to stop working on their JavaScript implementations that they have sunk insane amounts of time and effort into, and start something brand new.
  Trust me, I'd love to see it happen, but I don't think it will.
  - depends (Score:2)
    
    by luis_a_espinal ( 1810296 ) writes:
    
    Yeah, the issue is that Python is pretty hard to sandbox, being the hugely dynamic language it is.
    
    Forgive me but JavaScript is also hugely dynamic. How does this prevent effective sand boxing in the general sense?
    I imagine it would take a lot to get the browsers to stop working on their JavaScript implementations that they have sunk insane amounts of time and effort into, and start something brand new.
    Another solution is to program in a subset of Python that gets verified at compile time with additional restrictions, and then compiled into JavaScript (the way CoffeeScript does.) That way we re-capture the investment already made in browser-side JavaScript technology.
    Trust me, I'd love to see it happen, but I don't think it will.
    That sounds more like a solution looking for a problem. No need to reinvent the browser vm wheel. Reuse what's there to greate
  - Re: (Score:2)
    
    by thetoadwarrior ( 1268702 ) writes:
    
    It could easily be done but there are too many people who are heavily invested in JS being broken.
So... (Score:3)

by CAIMLAS ( 41445 ) writes: on Wednesday February 06, 2013 @04:37AM (#42806237)

So, they're porting R and Perl PDL to Python, then?

Share
twitter facebook
There is also Pandas (Score:2)

by siDDis ( 961791 ) writes:

Pandas http://pandas.pydata.org/ [pydata.org] is another great tool for data analysis. It use numpy and is highly optimized with critical code paths which is written in C.
There's more to XDATA (Score:2)

by seekthirst ( 1457205 ) writes:

It's strange that this article focused on Python and Continuum when there is a much bigger story to be had. The XDATA program is being run in a very open source manner, and there will be a multitude of open source tools created and delivered by the end of the contract. The program is focusing on two major tasks: the analytics/algorithmic tools to process big data; and the visualization/interaction tools that go along with them.
Python? (Score:3)

by Murdoch5 ( 1563847 ) writes: on Wednesday February 06, 2013 @08:42AM (#42807179) Homepage

Have they heard of Matlab?

Share
twitter facebook
- Re: (Score:2)
  
  by vgerclover ( 1186893 ) writes:
  
  Have you heard of Open Source?
  - Re: (Score:2)
    
    by Murdoch5 ( 1563847 ) writes:
    
    Fine then use Octave or one of the other mathematical open source packages. The issue is that they want to adapt a system instead of using an existing one.
    - Re: (Score:2, Insightful)
      
      by Anonymous Coward writes:
      
      Okay, look. I used Octave for a long time on Linux and on Windows. On Linux (Ubuntu) it generally worked rather well and I used it for classwork where possible. On Windows, it works well as long as you don't need to plot anything. I can't tell you the number of times I installed/uninstalled various versions of Octave on Windows to find out that the plotting was broken in some way. MATLAB is great until you run in to licensing issues.
      Then I found out about the combination of IPython/Numpy/Scipy/Matplotl
      - Re: (Score:2)
        
        by naroom ( 1560139 ) writes:
        
        Thanks! As a scientist looking to switch away from Matlab, this was really informative! Somebody get this guy some mod points :)
- 110 reasons to pick Python over Matlab (Score:2)
  
  by naroom ( 1560139 ) writes:
  
  Plenty of reasons. [hammerprinciple.com]
  - Re: (Score:2)
    
    by Murdoch5 ( 1563847 ) writes:
    
    Well it's group voted on so it's not like I can argue the list. How ever that being said, Matlab or any mathematical computing language is still better suited for big data, the lack of skill of a programmer should never be blamed on the language, it's an easy way out.
    - Re: (Score:2)
      
      by naroom ( 1560139 ) writes:
      
      You may not be familiar with SciPy / NumPy. They are the scientific computing side of Python. They support matrix operations and linear algebra at least as well as Matlab does. Underneath, both NumPy and Matlab are just LAPACK anyway. Here, have a relevant wiki article [wikipedia.org].
- Re: (Score:2)
  
  by thetoadwarrior ( 1268702 ) writes:
  
  Proprietary languages, lol.
- Re: (Score:2)
  
  by steveha ( 103154 ) writes:
  
  Have they heard of Matlab?
  Have you heard of SciPy?
  I predict that a tipping point is coming, and after we reach that tipping point, Matlab will become a legacy language and all the new projects will be SciPy.
  Right now Matlab is benefiting from network effect: everyone uses Matlab because everyone uses Matlab. It's the standard, you expect to see everyone using it in certain industries. But it's a proprietary product controlled by a single company that is doing its best to extract maximum revenue from it.
  Me
  - Re: (Score:2)
    
    by Murdoch5 ( 1563847 ) writes:
    
    Fine, then grab one of the other million and 1/2 great open source mathematical packages and run with it. Basically a ton of money is being sunk into something that can be solved by moving platforms. Open or not, there is software which fulfills the need and for $100 million you can get a lot of anything.
    - Re: (Score:2)
      
      by steveha ( 103154 ) writes:
      
      grab one of the other million and 1/2 great open source mathematical packages
      Okay, why?
      The scientific community is already coalescing around SciPy. You are arguing that DARPA should send money to anything but SciPy but you didn't give a reason.
      - Re: (Score:2)
        
        by Murdoch5 ( 1563847 ) writes:
        
        The scientific community is already coalescing around SciPy.
        Maybe some it is but I'm going to bet the vaste majority aren't even touching python. I would never use python for scientific computing, it's not designed for it, simply put. Sure you can do light scrientific computing in SciPy maybe even some more advanced functions but if Python has to go balls to the wall, it simply wont measure up!
        
        So if your looking for a system that can handle all your big data and your large storage why not look towards a system which can handle most of it out of the box, that's
YAY (Score:3)

by sproketboy ( 608031 ) writes: on Wednesday February 06, 2013 @09:05AM (#42807293)

Now China can win!

Share
twitter facebook
Big Data != Analytics (Score:3)

by michaelmalak ( 91262 ) writes: <michael@michaelmalak.com> on Wednesday February 06, 2013 @11:15AM (#42808527) Homepage

The summary and article seem to conflate Big Data with Analytics. These days the two often go together, but it's quite possible to have either one without the other. Big Data is "more data than can fit on one machine", and analytics means "applying statistics to data". E.g. many Big Data projects start out as "capture now, analyze a year or two from now," and maybe just do simple counts in the interim, which is not "analytics". And of course, many useful analytics take place in the sub-terabyte range.
The irony with this story is that Python is useful for in-memory processing, and not "Big Data" per se. To process "Big Data" typically requires (today, based on available tools, not inherent language advantages) JVM-based tools, namely Hadoop or GridGain, and distributed data processing tasks on those platforms require Java or Scala. Both of those platforms leverage the uniformity of the JVM to launch distributed processes across a heterogeneous set of computers.
The real use case here is one first reduces Big Data using the JVM platform, and only then once it can fit into the RAM of a single workstation, use Python, R, etc. to analyze the reduced data. So typically, yes, these Python libraries will be used in Big Data scenarios, but pedantically, analytics doesn't require Big Data and Python isn't even capable (generally, based on today's tools) of processing raw Big Data.

Share
twitter facebook
Imagine the research if we took all lobbying (Score:2)

by tyrione ( 134248 ) writes:

cash and put it to advancing applied sciences to better the nation. We piss billions down the drain marketing to morons and yet whine about spending billions on DARPA, DoE and whatnot. This county is truly too stupid for its own well-being.
- Re: (Score:2)
  
  by lattyware ( 934246 ) writes:
  
  Seriously? Sphinx makes beautiful documentation that is easy to find your way around. Compared to the ugly-ass JavaDocs that are painful to browse through, I wouldn't even give it a second thought.
- Re: (Score:2)
  
  by blueskies ( 525815 ) writes:
  
  Pandas is right under: "Scientific & Data Analysis Packages" http://docs.continuum.io/index.html [continuum.io]
- Re: (Score:2)
  
  by blueskies ( 525815 ) writes:
  
  Luckily all of the XData funded work is required to be open source. This looks like the numpy work: http://blaze.pydata.org/ [pydata.org]

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Great. Just Great (Score:1, Insightful)

Re:Great. Just Great (Score:5, Insightful)

Re:Great. Just Great (Score:5, Funny)

Re: (Score:2)

Re:Great. Just Great (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

I get the impression that (Score:5, Interesting)

Re: (Score:2)

Re:I get the impression that (Score:5, Interesting)

Re: (Score:2)

Re:I get the impression that (Score:4, Interesting)

Re: (Score:3)

Re: (Score:2)

GIL is a non-issue. (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Neither LANGUAGE has a GIL (Score:2)

Re:I get the impression that (Score:5, Informative)

Re: (Score:3)

Re:I get the impression that (Score:5, Informative)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re:I get the impression that (Score:5, Interesting)

Re:I get the impression that (Score:5, Insightful)

Re:I get the impression that (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

good lord no. (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Informative)

Re:I get the impression that (Score:4, Informative)

Re: (Score:2)

Matlab (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Python 2 or 3? (Score:4, Interesting)

Re:Python 2 or 3? (Score:5, Informative)

Wrong language (Score:5, Funny)

Re:Wrong language (Score:4, Informative)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Good news for the Python community (Score:3, Funny)

Re: (Score:2)

depends (Score:2)

Re: (Score:2)

So... (Score:3)

There is also Pandas (Score:2)