
Data Sorting World Record — 1 Terabyte, 1 Minute

kdawson posted more than 3 years ago | from the out-of-sorts dept.

Math

An anonymous reader writes "Computer scientists from the University of California, San Diego have broken the 'terabyte barrier' — and a world record — when they sorted more than a trillion bytes of data in 60 seconds. During this 2010 'Sort Benchmark' competition, a sort of 'World Cup of data sorting,' the UCSD team also tied a world record for fastest data sorting rate, sifting through one trillion data records in 172 minutes — and did so using just a quarter of the computing resources of the other record holder."

129 comments

How to block KDawson articles? (-1, Troll)

Anonymous Coward | more than 3 years ago | (#33053330)

Any way to filter out articles by KDawson without logging in?

Re:How to block KDawson articles? (-1, Offtopic)

Anonymous Coward | more than 3 years ago | (#33053652)

Just put a fucking bullet in your head and you'll never have to see another kdawson article again.

Re:How to block KDawson articles? (2, Funny)

Yacoob Al-Atawi (1192531) | more than 3 years ago | (#33054836)

Just put a fucking bullet in your head and you'll never have to see another kdawson article again.

Don't do it while reading one of his articles or it will be imprinted in your eyes forever.

Re:How to block KDawson articles? (0, Offtopic)

MobileTatsu-NJG (946591) | more than 3 years ago | (#33053992)

Any way to filter out articles by KDawson without logging in?

Yes but you have to install Flash.

What is this (2, Funny)

blai (1380673) | more than 3 years ago | (#33053348)

I have one record of size one terabyte.

Sort by size.
It is already sorted.
Sort finished. time: 0s! WTH!

Doesn't sound so hard... (5, Funny)

Minwee (522556) | more than 3 years ago | (#33053352)

As long as you use Intelligent Design Sort [dangermouse.net].

Re:Doesn't sound so hard... (1)

Thanshin (1188877) | more than 3 years ago | (#33054278)

It's actually pretty simple.

My 1TB Archos is sorted alphabetically at this very moment; in ascending and descending order at the same time!

I used to think it was great (4, Interesting)

MichaelSmith (789609) | more than 3 years ago | (#33053356)

I had a 6502 system with BASIC in ROM and a machine code monitor. The idea is to copy a page (256 bytes) from the BASIC ROM to the video card address space. This puts random characters into one quarter of the screen. Then bubble sort the 256 bytes. It took about one second.

For extra difficulty, do it again with the full 1K of video. That's harder on the 6502 because you have to use vectors in RAM for the addresses, so reads and writes are a two-step operation, as is incrementing the address; you have to test for carry. But the result was spectacular.
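For the curious, a minimal C sketch of the same trick (an illustration, not the original hand-assembled 6502 code; an ordinary array stands in for the 256-byte page of video memory, and the ROM bytes are faked):

    /* Full-sweep bubble sort over a 256-byte "page", as described above. */
    #include <stdio.h>

    static void bubble_sort(unsigned char *buf, int n) {
        for (int pass = 0; pass < n - 1; pass++)      /* one full sweep per pass */
            for (int i = 0; i < n - 1; i++)
                if (buf[i] > buf[i + 1]) {            /* swap adjacent out-of-order bytes */
                    unsigned char tmp = buf[i];
                    buf[i] = buf[i + 1];
                    buf[i + 1] = tmp;
                }
    }

    int main(void) {
        unsigned char page[256];
        for (int i = 0; i < 256; i++)
            page[i] = (unsigned char)(i * 97);        /* stand-in for the copied ROM bytes */
        bubble_sort(page, 256);
        printf("%d ... %d\n", page[0], page[255]);
        return 0;
    }

On the real hardware every swap lands directly in video memory, which is what made each pass visible on screen.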

Re:I used to think it was great (1, Insightful)

DerekLyons (302214) | more than 3 years ago | (#33053598)

The idea is to copy a page (256 bytes) from the BASIC ROM to the video card address space. This puts random characters into one quarter of the screen.

Well, no. It puts the decidedly non-random contents of the BASIC ROM on the screen.
 

Then bubble sort the 256 bytes. It took about one second.

Which is roughly as surprising and unexpected as the Sun coming up in the East in the morning. Since much of the (non-random, limited-character-set) data is repeated, many of the bytes don't have to move very far before being 'sorted'.

Re:I used to think it was great (1)

MichaelSmith (789609) | more than 3 years ago | (#33053646)

Which is roughly as surprising and unexpected as the Sun coming up in the East in the morning.

When you are 12, lots of things can be fun.

Re:I used to think it was great (0)

Anonymous Coward | more than 3 years ago | (#33054306)

Which is roughly as surprising and unexpected as the Sun coming up in the East in the morning.

While I see the value of your knowledge, I don't think that's what GP meant.

I so desperately require a "-1, Killjoy" mod point right now. Sorry. Unless you have something insightful to share that I know not.

- troll8901 (at work)

Re:I used to think it was great (-1, Offtopic)

Anonymous Coward | more than 3 years ago | (#33054764)

Well, no. It puts the decidedly non-random contents of the BASIC ROM on the screen.

Are you always such an asshole or is it just on Slashdot? Who gives a shit?

Re:I used to think it was great (1)

thesupraman (179040) | more than 3 years ago | (#33053654)

That must have been in BASIC, yes?

A 6502 in assembly language can sort 256 adjacent bytes faster than that, even with a trivial bubble sort, and 6502s are not fast.

Re:I used to think it was great (2, Informative)

Cold hard reality (1536175) | more than 3 years ago | (#33053716)

A 256-byte bubble sort requires about 33,000 operations. At 1 MHz, that means every operation (compare + possible swap) took about 30 cycles. While somewhat slow, this is definitely much faster than anything BASIC could have achieved.
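Spelling that estimate out (assuming the common bubble sort variant whose inner sweep shrinks by one each pass, giving n(n-1)/2 comparisons):

    comparisons = 256 × 255 / 2 = 32,640 ≈ 33,000
    1 MHz = 1,000,000 cycles per second
    1,000,000 cycles / 32,640 comparisons ≈ 30 cycles per compare-and-maybe-swap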

Re:I used to think it was great (1)

MichaelSmith (789609) | more than 3 years ago | (#33053730)

No, it was in hand-assembled machine code. The CPU ran at 1 MHz, but I think it was the bubble sort algorithm that made it slow: for each iteration it looped through the full 256 bytes.

Re:I used to think it was great (0)

Anonymous Coward | more than 3 years ago | (#33054810)

Dunno if it's true on the 6502, but on other architectures access to device memory (like your video framebuffer) is a lot slower than main memory.

Re:I used to think it was great (0)

Anonymous Coward | more than 3 years ago | (#33055264)

On the C64, the processor could access video memory at exactly the same speed as normal memory, because there was no real distinction between video and main memory.

The video chip and the CPU took turns accessing memory. Ever wondered why you lost your video when using your cassette recorder? That's because the video chip got turned off so the CPU could access the memory faster (effectively doubling the memory bandwidth). You could also POKE that setting yourself if you needed pure, raw speed.

These were the days... ;-)

Re:I used to think it was great (1)

noidentity (188756) | more than 3 years ago | (#33053952)

Yeah, vectors to RAM addresses, having to use Y for indexing. The cycle count was one greater than normal indexing, pretty good. Wait, what were we talking about again?

Re:I used to think it was great (2, Insightful)

hey! (33014) | more than 3 years ago | (#33054956)

I always preferred Shell Sort to Bubble Sort. It's just as easy to code as Bubble Sort, and although it's not asymptotically better than Bubble Sort, it makes fewer exchanges and so tends to run faster in the average case. In my experience, it can be dramatically faster.

Basically Shell Sort is Insertion Sort, but it starts by comparing keys that are far apart. The heuristic is that in an unsorted array, keys usually have pretty far to go before they are in their final position. The algorithm removes larger amounts of entropy in the early rounds, leading to fewer exchanges in the later iterations.

That's what I like about Shell Sort. It takes a very simple sort algorithm and improves it with a clever insight into the problem.
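For reference, a minimal C sketch of that idea (gapped insertion sort; the simple halving gap sequence is assumed here, though better sequences exist):

    #include <stdio.h>

    static void shell_sort(int *a, int n) {
        for (int gap = n / 2; gap > 0; gap /= 2)        /* compare far-apart keys first */
            for (int i = gap; i < n; i++) {
                int key = a[i], j = i;
                while (j >= gap && a[j - gap] > key) {  /* insertion sort with a stride of gap */
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = key;
            }
    }

    int main(void) {
        int a[] = { 42, 7, 19, 3, 88, 1, 56, 23 };
        int n = (int)(sizeof a / sizeof a[0]);
        shell_sort(a, n);
        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }

With gap = 1 the final round is plain insertion sort, but by then most keys are already near their final positions.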

Mechanical Turk (0)

Anonymous Coward | more than 3 years ago | (#33053374)

Data Sorting World Record -- 1 Terabyte, 1 Minute

The fact that they needed a computer should disqualify them.

1-byte records? (2, Insightful)

mike260 (224212) | more than 3 years ago | (#33053378)

TFA variously refers to 1 trillion records and 1 TB of data. So each record is 1 byte?
Doesn't seem like that requires any real computation - you just go through the data maintaining a count of each of the 256 possible values (an embarrassingly parallel problem), then write it back out with 256 memsets (likewise trivially parallelisable).
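A minimal C sketch of that counting approach for the hypothetical 1-byte records (the reply below notes the real benchmark actually uses 100-byte records):

    #include <stdio.h>
    #include <string.h>

    static void counting_sort_bytes(unsigned char *buf, size_t n) {
        size_t counts[256] = { 0 };
        for (size_t i = 0; i < n; i++)      /* one counting pass; trivially parallel */
            counts[buf[i]]++;
        size_t pos = 0;
        for (int v = 0; v < 256; v++) {     /* 256 memsets rebuild the sorted output */
            memset(buf + pos, v, counts[v]);
            pos += counts[v];
        }
    }

    int main(void) {
        unsigned char data[] = { 9, 3, 200, 3, 77, 9, 0 };
        counting_sort_bytes(data, sizeof data);
        for (size_t i = 0; i < sizeof data; i++) printf("%d ", data[i]);
        printf("\n");
        return 0;
    }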

Re:1-byte records? (4, Informative)

mike260 (224212) | more than 3 years ago | (#33053396)

Ah, my mistake - the "trillion data records" refers to a different benchmark. In the 1TB benchmark, records are 100 bytes long.

Re:1-byte records? (1)

cffrost (885375) | more than 3 years ago | (#33054144)

In the 1TB benchmark, records are 100 bytes long.

"100 bytes?" Why this arbitrary, ugly, sorry-ass excuse of a number? An elegant, round number like 64, 128... hell, even 96 would have been a sensible and far superior choice.

Re:1-byte records? (0)

Anonymous Coward | more than 3 years ago | (#33054208)

96, an elegant, round and sensible number? Have you been designing ATM networks by any chance?

Re:1-byte records? (1)

jesset77 (759149) | more than 3 years ago | (#33054540)

"100 bytes?" Why this arbitrary, ugly, sorry-ass excuse of a number? An elegant, round number like 64, 128... hell, even 96 would have been a sensible and far superior choice.

Because those are only round(ish) numbers in binary. Might make sense if you were sorting 1TiB of data, but they were not. Per TFA:

one terabyte of data (1,000 gigabytes or 1 million megabytes)

This clarifies that they were working with the SI prefix "tera", meaning that powers of ten divide much more evenly. To wit, 10^2-byte records in a 10^12-byte dataset = exactly 10^10 (ten billion) records. 64- or 128-byte records would divide evenly into a non-round number of records; 96-byte records would not divide evenly at all.

This of course also assumes precisely 1 TB of data; TFA says "more than" 1 TB, so your guess is as good as mine. It might even be 1 TiB, which is about 10% more.

Re:1-byte records? (0)

Anonymous Coward | more than 3 years ago | (#33054914)

Ah, yes, the confusion created by the 'new and improved' SI prefix, which makes it so you can never tell what someone is talking about since you can never be sure if they're aware that the meaning of KB, MB, etc. was 'changed'.

Retarded in so many ways.

Re:1-byte records? (1, Informative)

topcoder (1662257) | more than 3 years ago | (#33053684)

Doesn't seem like that requires any real computation - you just go through the data maintaining a count of each of the 256 possible values (an embarrassingly parallel problem), then write it back out with 256 memsets (likewise trivially parallelisable).

That technique is called bucket sort, BTW.

Great to see sorting research advance (3, Insightful)

SuperKendall (25149) | more than 3 years ago | (#33053388)

You'd almost think at this point, with super-fast systems, that attention to algorithmic complexity hardly matters. It's good to see research still advancing, and that there are areas where carefully designed algorithms make a huge difference. I look forward to seeing how they fare against the Daytona set.

Re:Great to see sorting research advance (2, Interesting)

Anonymous Coward | more than 3 years ago | (#33053690)

I work in the OLAP realm. Trust me, it matters. Being able to run an ad hoc query across terabytes of data with near real-time results is the holy grail of what we do. The industry has known for a while that parallel computing is the way to go, but only recently has the technology become cheap enough to consider deploying on a large scale. (Though Oracle will still happily take millions from you for Exadata if you want the expensive solution.)

One other area... (2, Interesting)

SuperKendall (25149) | more than 3 years ago | (#33053752)

Come to think of it, one other area where it currently matters is mobile development. If you aren't considering memory or processor usage, you can quickly land yourself in some really bad performance; thinking hard about how to make use of what little you have really matters in that space too.

So only desktop or smallish backend development can generally remain unconcerned with algorithmic performance these days...

I had to work with large datasets in my previous life as a backend IT guy, but nothing at the levels you are talking about. Even then I thought carefully about how any given approach would affect performance.

Re:One other area... (1)

raddan (519638) | more than 3 years ago | (#33055426)

Algorithmic efficiency also matters in the mobile space as designers get closer to building low-power systems that harvest energy directly from their environments. Just yesterday I was speaking with a researcher who had worked on animal tracking devices (e.g., for turtles) that collected power from small solar arrays. Any efficiencies that they could take advantage of, they used, from smart scheduling of their GPS units (don't madly run the unit if you don't have the power) to a delay-tolerant networking protocol (only phone home when you have the resources) to a programming language that informed the operating system which computations were power-hungry and should be performed later.

Re:Great to see sorting research advance (0)

Anonymous Coward | more than 3 years ago | (#33055492)

... I look forward to seeing how they fare against the Daytona set.

I don't think you need to allow for any great intellectual surprises from people who enjoy turning left for five hours while going 180 mph and chugging beer.

8)

World Cup of data sorting (3, Funny)

the_other_one (178565) | more than 3 years ago | (#33053394)

How many parallel predicting octopuses were required to predict their victory?

Re:World Cup of data sorting (0)

Anonymous Coward | more than 3 years ago | (#33053448)

According to Holt you only need one.

Re:World Cup of data sorting (2, Funny)

Thanshin (1188877) | more than 3 years ago | (#33054260)

How many parallel predicting octopuses were required to predict their victory?

Infinite, but one of them wrote some pretty good stories about a ghost.

Re:World Cup of data sorting (1)

pinkushun (1467193) | more than 3 years ago | (#33054478)

f = n / x

where n is the number of records, and x the number of tentacles per octopus.

So 1000 records only need 125 parallel 8-legged octopi. This is the optimal amount, adding any more will invoke Brooks's Law [wikipedia.org]

Only 52 nodes (4, Interesting)

Gr8Apes (679165) | more than 3 years ago | (#33053406)

You've got to be kidding me. Each node was only two quad-core processors with sixteen 500 GB drives (big potential disk I/O per node), but this system doesn't even begin to scratch the very bottom of the top 500 list.

I just can't imagine that if even the bottom rung of the top 500 were even slightly interested in this record, they wouldn't blow this team out of the water.

Re:Only 52 nodes (3, Informative)

rm999 (775449) | more than 3 years ago | (#33053426)

I don't know much about them, but perhaps most supercomputers break this rule: "The hardware used should be commercially available (off-the-shelf), and unmodified (e.g. no processor over or under clocking)"

Re:Only 52 nodes (2, Informative)

afidel (530433) | more than 3 years ago | (#33053526)

Many of the systems on the TOP500 list are simple COTS clusters with perhaps an interesting interconnect. Remember the Appleseed cluster that was ranked in the top 10 a few years ago? It was nothing more than a bunch of Apple desktops with an interesting gigabit mesh network. Regardless, they hardly used optimal hardware; hex-core Xeons with 96 GB of RAM per node and a better I/O subsystem could have crushed this result.

Re:Only 52 nodes (3, Insightful)

Anonymous Coward | more than 3 years ago | (#33053628)

You say that in a way that could lead someone to misinterpret the interconnect as not being a big deal. Poor interconnects are why EC2 wasn't useful for HPC (their new option sounds somewhat better, but is limited to 16 nodes, IIRC) and why iPod Touches or similar devices have very few possible applications for clustering. If good interconnects weren't important, then clustering portable systems would be inevitable -- e.g., if you wanted more computing power for your laptop, you could just plug in your phone or mp3 player or iPad.

Re:Only 52 nodes (0)

Anonymous Coward | more than 3 years ago | (#33053664)

Perhaps, but the quote "If you are idling your processors or not using all your RAM, you're burning energy and losing efficiency" could be the reason they chose this particular configuration; more cores or RAM would just mean more idling and less power efficiency.

Re:Only 52 nodes (4, Informative)

straponego (521991) | more than 3 years ago | (#33053572)

No, most Top500 machines are composed of commercially available, unmodified parts. More than half use GigE interconnects, though Infiniband has nearly caught up. I'm not sure if you'd count Infiniband as COTS-- at least a couple of years ago, it was fairly involved to configure, and it's not cheap. But anybody who has the money can buy it.

Re:Only 52 nodes (2, Insightful)

antifoidulus (807088) | more than 3 years ago | (#33053898)

Doubtful. The biggest bottlenecks for contests like this are I/O, network, and memory latency. While faster CPUs are always welcome, increases in CPU speed rarely make a HUGE difference, especially when you are talking about the relatively small gains you get from overclocking. Considering the drawbacks of overclocking, namely increased heat production and decreased reliability, it doesn't really seem all that compelling for supercomputer operators to employ it.

Re:Only 52 nodes (0)

Anonymous Coward | more than 3 years ago | (#33053460)

Yeah, I think there is just a lack of interest in doing it. Many of the top 500 could easily do the entire sort in memory, but I bet a lot of people would be pissed off to see research supercomputers being used to sort numbers :)

Besides, Google reported a 68-second terasort result (http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html) two years ago, and I'm positive they can easily beat that number today.

52 nodes? So many for 1 TB worth of data? (1)

c0lo (1497653) | more than 3 years ago | (#33053840)

when they sorted more than one terabyte of data (1,000 gigabytes or 1 million megabytes) in just 60 seconds.

Each node is a commodity server with two quad-core processors, 24 gigabytes (GB) memory and sixteen 500 GB disks

OK, let's do the math: 52 computers × 24 GB RAM each = 1,248 GB. Setting aside the RAM taken by the OS, that's roughly 1 TB of RAM to dedicate to the sorting.
Let me guess: the main difficulty was partitioning the data and then merging the in-memory sorted partitions?

(Of course I'm posting before reading the details of the method they used; it is /. after all. I guess I even merit a prize for the weirdness of RTFA before posting.)

Re:Only 52 nodes (0)

Anonymous Coward | more than 3 years ago | (#33054442)

You've got to be kidding me. Each node was only two quad-core processors with sixteen 500 GB drives (big potential disk I/O per node), but this system doesn't even begin to scratch the very bottom of the top 500 list.

I just can't imagine that if even the bottom rung of the top 500 were even slightly interested in this record, they wouldn't blow this team out of the water.

And that's why calling a cluster of separate computers a single "supercomputer" is highly misleading.

Lots of little computers can solve lots of separate little jobs quickly. So if your problem can be broken up into lots of little chunks where the results of each chunk are independent of the results of other chunks, clusters are great.

Sorting data is not such a problem.

Re:Only 52 nodes (0)

Anonymous Coward | more than 3 years ago | (#33055008)

Yes, it is. With an algorithm such as merge sort and n processors, you can easily divide your problem into n little chunks and sort them separately; you only stop using all your processors at the end, when you merge the sorted chunks.

With some creativity, other sorting algorithms can be parallelized too. Take quicksort: after the first partitioning pass you have two lists that can be sorted separately, and they won't even need to be merged when you're done.
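A minimal sketch of that divide-and-merge idea in C with POSIX threads (two workers and qsort for brevity; an illustration, not the benchmark code):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct chunk { int *data; size_t len; };

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    static void *sort_chunk(void *arg) {          /* each worker sorts its chunk independently */
        struct chunk *c = arg;
        qsort(c->data, c->len, sizeof(int), cmp_int);
        return NULL;
    }

    static void merge(const int *a, size_t na, const int *b, size_t nb, int *out) {
        size_t i = 0, j = 0, k = 0;               /* the final, single-threaded merge pass */
        while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
    }

    int main(void) {
        enum { N = 1000000 };
        int *v = malloc(N * sizeof(int)), *out = malloc(N * sizeof(int));
        for (size_t i = 0; i < N; i++) v[i] = rand();

        struct chunk halves[2] = { { v, N / 2 }, { v + N / 2, N - N / 2 } };
        pthread_t tid[2];
        for (int t = 0; t < 2; t++) pthread_create(&tid[t], NULL, sort_chunk, &halves[t]);
        for (int t = 0; t < 2; t++) pthread_join(tid[t], NULL);

        merge(halves[0].data, halves[0].len, halves[1].data, halves[1].len, out);
        printf("first=%d last=%d\n", out[0], out[N - 1]);
        free(v); free(out);
        return 0;
    }

Compile with -lpthread. The single-threaded merge at the end is exactly the part that stops scaling linearly, which is the poster's point.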

Amendments to the Geek Hierarchy (5, Funny)

Lord_of_the_nerf (895604) | more than 3 years ago | (#33053416)

LARPers > Fan-fiction writers > Professional Data Sorting Competitors > Furries

Re:Amendments to the Geek Hierarchy (0)

Anonymous Coward | more than 3 years ago | (#33053550)

Adding...

Solar racers > ... > Filking

Re:Amendments to the Geek Hierarchy (2, Funny)

MobileTatsu-NJG (946591) | more than 3 years ago | (#33054006)

LARPers > Fan-fiction writers > Professional Data Sorting Competitors > Furries

Now sort by income level!

Re:Amendments to the Geek Hierarchy (1)

jesset77 (759149) | more than 3 years ago | (#33054562)

LARPers > Fan-fiction writers > Professional Data Sorting Competitors > Furries

w8, you left out that a good portion of LARPers and a resounding majority of Fan-fiction writers are Furries to begin with.

So, are we just randomly bashing on imagination-have, or? ;D

Re:Amendments to the Geek Hierarchy (0)

Anonymous Coward | more than 3 years ago | (#33054628)

So, are we just randomly bashing on imagination-have, or? ;D

Of course we are.

And I've got to say, for people who've often been at the receiving end of bullying and harassment from the jocks, we're surprisingly eager to find people to look down on and belittle. :(

Re:Amendments to the Geek Hierarchy (1)

f3rret (1776822) | more than 3 years ago | (#33054758)

LARPers > Fan-fiction writers > Professional Data Sorting Competitors > Furries

What about a LARPer who is also a furry and enjoys writing fan-fiction and professionally sorting data in competitions?

Re:Amendments to the Geek Hierarchy (0)

Anonymous Coward | more than 3 years ago | (#33055214)

Gets a Darwin Award since it's outside of the gene pool?

The most important part (0)

Anonymous Coward | more than 3 years ago | (#33053434)

The most important part of any system, is its Intelligent Qube.

Isn't this a little slow? (0)

Anonymous Coward | more than 3 years ago | (#33053500)

I'm pretty sure that modern analytical DW systems like Greenplum or Teradata on the same number of nodes/RAM/disk/commodity servers could do this faster?

good choice (1)

Main Gauche (881147) | more than 3 years ago | (#33053512)

"a sort of 'World Cup of data sorting"

It ended in a 0-0 tie?

Re:good choice (0)

Anonymous Coward | more than 3 years ago | (#33053542)

"a sort of 'World Cup of data sorting"

It ended in a 0-0 tie?

Yep.

the UCSD team also tied a world record for fastest data sorting rate

Re:good choice (0)

Anonymous Coward | more than 3 years ago | (#33053778)

No it was 2-1: two sorts, one cup.

Been doing this for years... (3, Funny)

HockeyPuck (141947) | more than 3 years ago | (#33053532)

Why is sorting 1TB so hard?

I can just instruct my tape library to put the tapes in the library in alphabetical order in the slots... Y'know AA0000 AA0001 AA0002... moves a hell of a lot more than 1TB.

Re:Been doing this for years... (1, Informative)

Carthag (643047) | more than 3 years ago | (#33054466)

Why is sorting 1TB so hard?

I can just instruct my tape library to put the tapes in the library in alphabetical order in the slots... Y'know AA0000 AA0001 AA0002... moves a hell of a lot more than 1TB.

That's not sorting 1TB. That's sorting n records where n = the number of tapes.

Re:Been doing this for years... (2, Funny)

cc1984_ (1096355) | more than 3 years ago | (#33054548)

That WHOOOOOOSH noise is the sound of a backup tape flying over your head.

Library of Congress (0)

Anonymous Coward | more than 3 years ago | (#33053564)

Is it sorted?

Best case (1)

nikanth (1066242) | more than 3 years ago | (#33053588)

Did they sort a best-case input and claim the record? Or is it the average or worst case? Or does the algorithm's best-case time equal its worst-case time?

1 Trillion Records Sorted (4, Informative)

amirulbahr (1216502) | more than 3 years ago | (#33053620)

When I read the summary I thought: what's the big deal if the 1 TB of data only contained two records of 0.5 TB each? Then I saw that kdawson wrote the summary. So I browsed over here [sortbenchmark.org] and saw that the impressive thing is that they sorted 1,000,000,000,000 records of 100 bytes each, with 10-byte keys.

Anything is faster than Hadoop (1)

Tassach (137772) | more than 3 years ago | (#33053672)

Considering that the previous terabyte sort record was held by Hadoop, it's not surprising that another system can do the same job in a fraction of the time with a fraction of the resources.

Hadoop is a useful framework, but it's a horribly inefficient resource hog. Who was the f*ing genius who thought it was a good idea to write a high-performance computing application in Java?!?

Re:Anything is faster than Hadoop (1)

Trieuvan (789695) | more than 3 years ago | (#33053710)

Java was slow 15 years ago, not now!

Re:Anything is faster than Hadoop (0)

Anonymous Coward | more than 3 years ago | (#33053908)

It is still a resource hog. Try using millions of small objects.
"Don't do that," you say!
I don't, thank you. Emulating structs with huge arrays gets around that, clumsy as it is. Feels like C, almost as fast, too.

Re:Anything is faster than Hadoop (0)

Anonymous Coward | more than 3 years ago | (#33054030)

Java is still slow, it's just that now the hardware is fast enough that you don't notice it so much anymore.

Re:Anything is faster than Hadoop (1)

flanders123 (871781) | more than 3 years ago | (#33055456)

Who was the f*ing genius who thought it was a good idea to write a high-performance computing application in Java?!?

My Boss: "I agree. I personally would have copy-pasted the records into this slick little Excel sheet I made and clicked the 'sort A-Z' button. Done and done. This programming stuff is easy!"

Important Details.... (5, Interesting)

pdxp (1213906) | more than 3 years ago | (#33053776)

... to make it clear that you won't be doing it (TritonSort; thanks for leaving that out, kdawson) on your desktop at home:
  • 10 Gbps networking
  • 52 servers x 8 cores each = 416 cores
  • 24 GB RAM per server = 1,248 GB
  • ext4 filesystem on each, presumably with hardware RAID

I think this is cool, but... how fast is it in a more practical situation?

source [sortbenchmark.org]

Pretty strange contest... (0)

Anonymous Coward | more than 3 years ago | (#33053810)

Sorting (like most other algorithms) can be optimized for run time or for memory consumption. Assuming a box with infinite memory, the sorting algorithm itself doesn't take any significant run time. Given how little memory the test systems have compared to the data to be sorted, a quite significant portion of the time required is disk I/O. So the main challenge here seems to be minimizing disk access, and putting the fastest disks available into the test system. Why aren't they simply using SSDs? That would likely speed things up quite a lot...
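For what it's worth, the disk-bound pattern being described is classic external sorting. A minimal single-machine sketch in C (run generation plus k-way merge; tmpfile() and the tiny run size are stand-ins for real spill files and memory-sized runs):

    #include <stdio.h>
    #include <stdlib.h>

    #define RUN 4   /* records per in-memory run; absurdly small, for demonstration */

    static int cmp(const void *a, const void *b) {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        long input[] = { 42, 7, 99, 3, 18, 64, 5, 81, 23, 11 };
        size_t n = sizeof input / sizeof input[0];
        size_t nruns = (n + RUN - 1) / RUN;
        FILE **runs = malloc(nruns * sizeof *runs);

        /* Phase 1: sort each memory-sized chunk, spill it to disk as a sorted run. */
        for (size_t r = 0; r < nruns; r++) {
            size_t len = (r + 1) * RUN <= n ? RUN : n - r * RUN;
            qsort(input + r * RUN, len, sizeof(long), cmp);
            runs[r] = tmpfile();
            fwrite(input + r * RUN, sizeof(long), len, runs[r]);
            rewind(runs[r]);
        }

        /* Phase 2: k-way merge, buffering one record per run. */
        long *head = malloc(nruns * sizeof(long));
        int *live = calloc(nruns, sizeof(int));
        for (size_t r = 0; r < nruns; r++)
            live[r] = fread(&head[r], sizeof(long), 1, runs[r]) == 1;
        for (;;) {
            int best = -1;
            for (size_t r = 0; r < nruns; r++)   /* pick the smallest live head */
                if (live[r] && (best < 0 || head[r] < head[best])) best = (int)r;
            if (best < 0) break;
            printf("%ld ", head[best]);
            live[best] = fread(&head[best], sizeof(long), 1, runs[best]) == 1;
        }
        printf("\n");
        return 0;
    }

Every record is read and written once per phase, so minimizing the number of passes over the disks is the whole game.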

What makes it a barrier? (5, Insightful)

Patch86 (1465427) | more than 3 years ago | (#33053846)

Impressive and all, but I take umbrage at calling it a "1 TB barrier". Is it disproportionately more difficult than sorting 0.99 TB?

Breaking "the sound barrier" was hard because of the inherent difficulty of going faster that sound in an atmosphere (sonic booms and whatnot). It was harder than simply travelling that fast would have been.

If this is just a further evolution of sorting speed, that makes it a milestone, not a barrier.

Re:What makes it a barrier? (0)

Anonymous Coward | more than 3 years ago | (#33053930)

Absolutely nothing makes it a barrier. It's just a neat round number.

Re:What makes it a barrier? (5, Funny)

maxwell demon (590494) | more than 3 years ago | (#33054024)

Impressive and all, but I take umbrage at calling it a "1 TB barrier". Is it disproportionately more difficult than sorting 0.99 TB?

At 1TB the attraction of the 1-bits gets so large that if you are not careful, your data collapses into a black hole.

Re:What makes it a barrier? (1)

wvmarle (1070040) | more than 3 years ago | (#33054240)

It is psychological: it adds another digit to the size of the number.

Beyond that, it is as much of a barrier as "the stock market has broken through the 20,000-point barrier", that sort of idea.

Breaking the light-speed barrier, now that would be an achievement.

Re:What makes it a barrier? (1)

Sockatume (732728) | more than 3 years ago | (#33054560)

kdawson thinks in an esoteric language where numbers go "1, 2, 3... 999,999,999, many."

Truly Impressive! (1)

hyades1 (1149581) | more than 3 years ago | (#33054046)

I have to admit that I'm genuinely impressed. This rate of data processing is almost as efficient as my girlfriend when she hears, evaluates and demolishes my alibis.

Re:Truly Impressive! (1)

maxwell demon (590494) | more than 3 years ago | (#33054052)

Are you sure that demolishing your alibis is as hard as sorting 1TB of numbers?

Re:Truly Impressive! (1)

hyades1 (1149581) | more than 3 years ago | (#33054076)

Yes, I am. She is able to parallel process and arrive at what she deems an appropriate response simultaneously. That response frequently involves numbers...time, usually, if you catch my drift.

Pretty close to theoretical max (3, Interesting)

melted (227442) | more than 3 years ago | (#33054128)

Let's consider the 100 TB in 172 minutes run they also did. 52 nodes with 16 spindles per node is 832 spindles total, or 120 GB of data per spindle. 120 GB of data can be read in 20 minutes and transferred in another 15 to the target spindles (assuming a uniform distribution of keys). You can then break it down into 2 GB chunks locally (again by key) as you reduce. Then you spend another hour and a half reading individual chunks, sorting them in memory, concatenating, and writing.

Of course this only works well if the keys are uniformly distributed (which they often are) and if the data is already on the spindles (which it often isn't).
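As a rough sanity check on those read times (assuming roughly 100 MB/s of sequential throughput per drive, typical for 2010-era 7,200 rpm disks):

    100 TB / 832 spindles ≈ 120 GB per spindle
    120 GB at 100 MB/s ≈ 1,200 s ≈ 20 minutes per full pass over a spindle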

The results are in... (1)

DeanLearner (1639959) | more than 3 years ago | (#33054140)

In third place, with a time of 01:02.305 is Sort United.

In second place, with a time of 00:58.112 is Team Sorted.

And in first place, with a time of 00:59.836 is UCSD!

Re:The results are in... (1)

Terrasque (796014) | more than 3 years ago | (#33054488)

In third place, with a time of 01:02.305 is Sort United.

In second place, with a time of 00:58.112 is Team Sorted.

And in first place, with a time of 00:59.836 is UCSD! ...which also designed the winning sorting algorithm.

Fix'd

Hah! We'll find you now, ET! (1)

ibsteve2u (1184603) | more than 3 years ago | (#33054246)

Assuming UCSD gives the algorithm to SETI, anyway, and doesn't instead take it to Wall Street to be used in detecting market trends and major share movements as part of the never-ending quest to separate small investors from their money faster.

Re:Hah! We'll find you now, ET! (0)

Anonymous Coward | more than 3 years ago | (#33054472)

This isn't about a new algorithm, really, but about reducing the constants in existing algorithms. As far as I know, comparison sorting is still O(n log n), and it is reasonable to believe it always will be.

Oh brother.. (2, Funny)

jackd (64557) | more than 3 years ago | (#33054552)

Wow, I realize I am seriously a nerd, getting genuinely excited about someone sorting large amounts of data. Please don't tell my girlfriend, she'll realize I've been faking being a semi-normal cool guy for years, and leave me.

Re:Oh brother.. (3, Funny)

Dunbal (464142) | more than 3 years ago | (#33054856)

Please don't tell my girlfriend, she'll realize I've been faking being a semi-normal cool guy for years, and leave me.

Nah, she'll realize that you're more turned on by a new processor than by a pair of tits, and stay with you forever knowing you'll never cheat on her (except when you go to that site to download algorithms). You might have to give in and move out of the basement, though. But not to worry: a good set of blackout curtains and it's almost the same. Plus, as you get older, the lack of dampness is easier on your rheumatism...

When are we going to have access to this (1)

hesaigo999ca (786966) | more than 3 years ago | (#33054968)

When are we going to have access to this technology? It is always nice to hear about new breakthroughs, but governments always control access to these new technologies and make it 5 to 10 years before we see any of it, while somewhere like Japan you already have it available...

((Triton)|(Gray)|(Minute))Sort (4, Informative)

Lord Grey (463613) | more than 3 years ago | (#33054976)

A paper describing the test is here [sortbenchmark.org]. TritonSort is the abstract method; GraySort and MinuteSort are concrete, benchmarked variations of TritonSort.

As TFA states, this is more about balancing system resources than anything else. The actual sort method used was "a variant of radix sort" (page two of the linked PDF). Everything else was an exercise in how to break the data up into manageable chunks (850 MB), distribute the chunks to workers, then merge the results. That was the real breakthrough, I think. But I wonder how much is truly a breakthrough and how much was just taking advantage of modern hardware, since one of the major changes in MinuteSort is simply avoiding a couple of disk writes by keeping everything in RAM (a feat possible because that much RAM is readily available to a single worker).
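The paper's phrase is only "a variant of radix sort", so here is a minimal single-machine LSD radix sort over fixed-width byte keys as an illustration of the base technique (4-byte keys for brevity; the benchmark's keys are 10 bytes, and the team's distributed version is far more elaborate):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define KEY_LEN 4

    static void radix_sort(uint8_t (*keys)[KEY_LEN], size_t n) {
        uint8_t (*tmp)[KEY_LEN] = malloc(n * KEY_LEN);
        for (int byte = KEY_LEN - 1; byte >= 0; byte--) {  /* least-significant byte first */
            size_t count[257] = { 0 };
            for (size_t i = 0; i < n; i++)                 /* histogram this byte's values */
                count[keys[i][byte] + 1]++;
            for (int v = 0; v < 256; v++)                  /* prefix sums become bucket offsets */
                count[v + 1] += count[v];
            for (size_t i = 0; i < n; i++)                 /* stable scatter into buckets */
                memcpy(tmp[count[keys[i][byte]]++], keys[i], KEY_LEN);
            memcpy(keys, tmp, n * KEY_LEN);
        }
        free(tmp);
    }

    int main(void) {
        uint8_t keys[3][KEY_LEN] = { {0,0,2,1}, {0,0,1,9}, {0,0,2,0} };
        radix_sort(keys, 3);
        for (int i = 0; i < 3; i++)
            printf("%u %u %u %u\n", keys[i][0], keys[i][1], keys[i][2], keys[i][3]);
        return 0;
    }

Because each pass is a counting sort, the run time is linear in the data size rather than O(n log n), which is one reason radix variants show up in records like this.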

Regardless, this kind of work is very cool. If web framework developers (for example) paid greater attention to balancing data flow against the work performed, we could probably get away with half the hardware footprint in our data centers. As it stands now, too many COTS -- and hence often-used -- frameworks are out of balance: they read too much data for the task, don't allow processing until it's all read in, etc. This causes implementors to add hardware to scale up.

Trivial (0)

Anonymous Coward | more than 3 years ago | (#33055056)

Hell, I can sort a terabyte of data in less than a second. As long as the records are a terabyte long.

NSA has orgasm (0)

Anonymous Coward | more than 3 years ago | (#33055318)

upon hearing that news...
and so did
credit companies
banks
and other overlords ....
