
Researcher Shows How GPUs Make Terrific Network Monitors

samzenpus posted about a year ago | from the keeping-an-eye-on-things dept.


alphadogg writes "A network researcher at the U.S. Department of Energy's Fermi National Accelerator Laboratory has found a potential new use for graphics processing units — capturing data about network traffic in real time. GPU-based network monitors could be uniquely qualified to keep pace with all the traffic flowing through networks running at 10Gbps or more, said Fermilab's Wenji Wu. Wenji presented his work as part of a poster series of new research at the SC 2013 supercomputing conference this week in Denver."


Shut up! Just shut up! (0)

Anonymous Coward | about a year ago | (#45488251)

This is just what we needed, *awesome* discovery.

You totally do not have to say everything that pops into your mind...

That's it? (4, Informative)

drcheap (1897540) | about a year ago | (#45488261)

So, in violation of /. convention, I went ahead and read TFA in hopes that there would actually be something more than "we solved yet another parallel computing problem with GPUs." Nope, nothing. Not even some useless eye candy of a graph showing two columns of before/after processing times.

And the article just *had* to be split into two pages because it would have killed them to include that tiny boilerplate footer on page one. What a fail...at least it wasn't a blatant slashvertisement!

Re:That's it? (5, Funny)

timeOday (582209) | about a year ago | (#45488305)

They said it achieves a speedup of 17x, here is the graph:

CPU: X
GPU: XXXXXXXXXXXXXXXXX

Re:That's it? (-1)

Anonymous Coward | about a year ago | (#45488339)

A speedup of 17x over a single core. Using an 8 core Xeon (16 threads with hyperthreading) would give a similar speedup.

Re:That's it? (1)

Anonymous Coward | about a year ago | (#45488393)

I don't think you understand how hyperthreading works.

It doesn't let it do twice as many tasks simultaneously; it just lets it work on another task when it would otherwise be idle while waiting for some other hardware to do something.

Re: That's it? (0)

Anonymous Coward | about a year ago | (#45488567)

I do know how it works, and for some workloads it does effectively approach double the execution throughput. Sometimes, though, it causes cache thrashing and can actually reduce throughput.

Re: That's it? (1)

Anonymous Coward | about a year ago | (#45488815)

lol wut? name *one* (real-world) workload where you get double the execution throughput.

Re: That's it? (0)

Anonymous Coward | about a year ago | (#45489183)

One thread issues a memory I/O instruction, the other a computational SIMD instruction. Both of those will execute in parallel at full speed without blocking each other (assuming that you have already loaded your data into the registers).

Re: That's it? (1)

KingMotley (944240) | about a year ago | (#45491631)

That isn't a real world workload. I typically see 10-30% increase from hyperthreading, which isn't nothing, but it's not a 100% speed bump either.

Re: That's it? (0)

Anonymous Coward | about a year ago | (#45491995)

That isn't a real world workload. I typically see 10-30% increase from hyperthreading, which isn't nothing, but it's not a 100% speed bump either.

^ THIS

OP obviously decided to "baffle with bullshit" rather than honestly reply (doesn't work so good on this forum)

Re: That's it? (0)

Anonymous Coward | about a year ago | (#45489469)

MIPS CPUs that timeslice between the hyperthreads on a per-cycle basis anyway ...

Re: That's it? (0)

Anonymous Coward | about a year ago | (#45493117)

Mining Bitcoins, I see 3.8 MH/s using all 8 (virtual) cores, vs 2.2 MH/s using only the 4 physical cores. Is this not hyperthreading?

Re:That's it? (1)

VTBlue (600055) | about a year ago | (#45488649)

A speedup of 17x over a single core. Using an 8 core Xeon (16 threads with hyperthreading) would give a similar speedup.

From the two-page report: "When compared to a 6-core CPU (m-cpu), the speedup ratios range from 1.54 to 3.20."

so yeah, the 17x is misleading because what network monitoring load would run on a single core?

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45488813)

Yes, and they used a Tesla M2070, which has 448 cores on a 40nm process and offers less than 14% of the performance of NVIDIA's most recent offerings.
If you want a less misleading comparison, try more relevant metrics like power consumption or transistor count per unit of performance across both architectures.

Re:That's it? (5, Informative)

TeXMaster (593524) | about a year ago | (#45489127)

Yeah, but with this kind of application the real bottleneck is the fact that the discrete GPU needs to access data through the high-latency, low-bandwidth PCIe bus. For this kind of application an IGP, even with its lower core count, is often a much better solution, unless you manage to fully cover the host-device-host transfers with computations.

I'd be really curious to see this thing done in OpenCL on a recent AMD APU, exploiting all the CPU cores and the IGP cores concurrently.
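
For anyone who hasn't played with this, the usual way to cover those transfers is double buffering with CUDA streams: while the kernel works on one batch, the next batch is copied up in the other stream. A minimal sketch, assuming a made-up analyze_batch kernel, invented batch sizes, and a capture layer (fill_from_nic) that isn't shown; none of this is from the poster:

// Minimal copy/compute overlap sketch: two pinned host buffers, two streams.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void analyze_batch(const uint8_t *pkts, int n_pkts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pkts) {
        // per-packet header parsing / counter updates would go here
    }
}

int main() {
    const int BATCH_BYTES = 1 << 22;   // 4 MiB of raw packet data per batch (assumed)
    const int N_PKTS      = 4096;      // assumed packets per batch
    const int N_BUF       = 2;

    uint8_t *h_buf[N_BUF], *d_buf[N_BUF];
    cudaStream_t stream[N_BUF];
    for (int b = 0; b < N_BUF; ++b) {
        cudaHostAlloc((void **)&h_buf[b], BATCH_BYTES, cudaHostAllocDefault); // pinned, so async DMA works
        cudaMalloc((void **)&d_buf[b], BATCH_BYTES);
        cudaStreamCreate(&stream[b]);
    }

    for (int batch = 0; batch < 100; ++batch) {
        int b = batch % N_BUF;
        cudaStreamSynchronize(stream[b]);            // wait until buffer b is free again
        // fill_from_nic(h_buf[b]);                  // capture layer would refill it here (omitted)
        cudaMemcpyAsync(d_buf[b], h_buf[b], BATCH_BYTES,
                        cudaMemcpyHostToDevice, stream[b]);
        analyze_batch<<<(N_PKTS + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], N_PKTS);
        // a cudaMemcpyAsync back on stream[b] would fetch per-batch results
    }
    cudaDeviceSynchronize();
    return 0;
}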

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45489931)

I wish I hadn't spent my last mod points on the hyperthreading discussion above. What TeXMaster said.

Re:That's it? (2)

hairyfeet (841228) | about a year ago | (#45489985)

But at what cost? The power draw of high-end GPUs like the top Tesla units is frankly insane, while the power draw of both AMD and Intel CPUs has been going down. Last I checked, those top-of-the-line GPUs sucked so much juice you could probably run a 32-core Intel box and still use less juice and require less cooling.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45491817)

The power draw of the... top Tesla units is frankly insane...

And prone to catch fire if you look at them funny

Re: That's it? (1)

loufoque (1400831) | about a year ago | (#45489481)

A Xeon also has SIMD, which gives a speed-up of up to 8x.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45488399)

I was expecting something exploiting PCIe's full access to physical memory and CPU caches (yay for the lack of an IOMMU and thus non-virtualized DMA!). Your GPU can snoop your network data right off the CPU cache your NIC DMAs it into, and then put its analysis right back into the NIC to go out to some destination. It's fully possible to sniff traffic (and anything else in RAM) and broadcast the results without touching the CPU. FireWire and Thunderbolt externals can do this as well, which is perhaps much more worrying.

I was hoping for an interesting note about our horrible lack of security. I got nothing interesting. Why Intel disables the IOMMU in the K-series processors is a mystery to me, but that's another story.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45488789)

Because an IOMMU is desirable in server environments, and we want you to buy bigger, beefier CPUs for your server rather than using the huge overclocks you can achieve.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45489971)

Most OSes (Windows, Linux, *BSD) create a new memory allocation for each packet. It may be hard for your GPU to sniff traffic without help from the OS if each data packet could potentially be anywhere in memory. A better zero-copy network stack is coming to FreeBSD and is being ported to Linux. Because of the zero-copy design, the data is all in well-defined places.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45488693)

What a fail...at least it wasn't a blatant slashvertisement!

It was a blatant NSA-related-government-grant-advertisement. For crap of that kind, the gubmint always has some petty cash handy, because terrists and Nashville segwity.

Re:That's it? (1)

edibobb (113989) | about a year ago | (#45488697)

In defense of the researcher (and not the author of the article), it was just a poster presented at a conference, not a published paper.

Re:That's it? (4, Informative)

NothingMore (943591) | about a year ago | (#45488867)

I saw this poster at the conference and was not impressed; in fact, it was one of the weaker posters I saw there (it was light on details, and some of the information about GPUs in general was not entirely accurate). It is really a poster that should not have been at SC at all. While it is interesting in the networking sense, the amount of data they can process is nowhere close to the amount actually flowing through these large-scale machines (up to 10 GB/sec per node), and there was no information about scaling this data collection (which would be needed at extreme scales) to obtain meaningful information for tuning network performance.

This poster should have been at a networking conference, where the results would have been much more interesting to the crowd attending. Also of note, IIRC the author was using a traditional GPU programming model that is not efficient for this style of computation. The speedup numbers would have been greatly improved by using an RPC-style model of programming for the GPU (a persistent kernel with tasking from pinned pages). However, this is not something I totally fault the author for not using, since it is a rather obscure programming technique for GPUs at this time.
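
For reference, a persistent kernel is launched once and then polls for work the host drops into pinned, device-mapped memory, so you don't pay a kernel launch per batch. A rough sketch of the idea, with an invented WorkSlot layout and the actual packet processing left as a comment (it assumes a compute-dedicated GPU with no display watchdog, and has nothing to do with the poster's actual code):

// Persistent-kernel sketch: the device spins on flags in mapped pinned memory.
#include <cuda_runtime.h>
#include <cstdint>

struct WorkSlot {
    volatile int ready;   // host -> device: a batch is staged
    volatile int done;    // device -> host: batch finished
    volatile int quit;    // host -> device: shut down
    int n_pkts;
};

__global__ void persistent_worker(WorkSlot *slot, const uint8_t *pkts) {
    for (;;) {
        if (slot->quit) return;
        if (slot->ready) {
            // ... process slot->n_pkts packets from pkts here ...
            __threadfence_system();   // make results visible to the host
            slot->ready = 0;
            slot->done  = 1;
        }
    }
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);

    WorkSlot *h_slot; uint8_t *h_pkts;
    cudaHostAlloc((void **)&h_slot, sizeof(WorkSlot), cudaHostAllocMapped);
    cudaHostAlloc((void **)&h_pkts, 1 << 20, cudaHostAllocMapped);
    h_slot->ready = 0; h_slot->done = 0; h_slot->quit = 0; h_slot->n_pkts = 0;

    WorkSlot *d_slot; uint8_t *d_pkts;
    cudaHostGetDevicePointer((void **)&d_slot, h_slot, 0);
    cudaHostGetDevicePointer((void **)&d_pkts, h_pkts, 0);

    persistent_worker<<<1, 1>>>(d_slot, d_pkts);   // launched once, runs until told to quit

    h_slot->n_pkts = 0;                            // hand over one (empty) batch
    h_slot->done  = 0;
    h_slot->ready = 1;
    while (!h_slot->done) { }                      // a real capture loop would stage the next batch here

    h_slot->quit = 1;
    cudaDeviceSynchronize();
    return 0;
}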

Re:That's it? (1)

gentryx (759438) | about a year ago | (#45492145)

However, this is not something I totally fault the author for not using, since it is a rather obscure programming technique for GPUs at this time.

Good point. I guess this will change once Kepler GPUs are widely adopted and CUDA 6.0 is released: with Kepler you can spawn kernels from within the GPU, and unified virtual addressing will make it easier to push complex data structures into the GPU (according to the poster, there appears to be some preprocessing happening on the CPU).
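
To make that concrete, here's a toy illustration of both features, assuming a made-up FlowTable structure: managed memory shares a pointer-carrying struct with the device without explicit copies, and dynamic parallelism lets a parent kernel launch child kernels. It needs a compute 3.5 part and relocatable device code, and it is not the poster's code:

// Build roughly like: nvcc -arch=sm_35 -rdc=true demo.cu -lcudadevrt
#include <cuda_runtime.h>
#include <cstdio>

struct FlowTable {                     // hypothetical per-flow counters
    int n_flows;
    unsigned long long *packet_counts; // device-visible pointer inside a managed struct
};

__global__ void count_flow(FlowTable *t, int flow) {
    atomicAdd(&t->packet_counts[flow], 1ULL);
}

__global__ void dispatch(FlowTable *t) {
    for (int f = 0; f < t->n_flows; ++f)    // parent kernel launches one child grid per flow
        count_flow<<<1, 1>>>(t, f);
}

int main() {
    FlowTable *t;
    cudaMallocManaged((void **)&t, sizeof(FlowTable));
    t->n_flows = 4;
    cudaMallocManaged((void **)&t->packet_counts, t->n_flows * sizeof(unsigned long long));
    for (int f = 0; f < t->n_flows; ++f) t->packet_counts[f] = 0;

    dispatch<<<1, 1>>>(t);                  // no cudaMemcpy of the struct or the array
    cudaDeviceSynchronize();

    for (int f = 0; f < t->n_flows; ++f)
        printf("flow %d: %llu packets\n", f, t->packet_counts[f]);
    return 0;
}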

Re:That's it? (1)

Anonymous Coward | about a year ago | (#45492959)

As a PhD in computer networking, I'll tell you, it would have been easier to publish at SC than at other reputable networking conferences. This article, to me, is non-news.

Re:That's it? (0)

Anonymous Coward | about a year ago | (#45488923)

And the article just *had* to be split into two pages because it would have killed them to include that tiny boilerplate footer on page one. What a fail...at least it wasn't a blatant slashvertisement!

It's called browser real estate, and someone has got to pay for it. Not you, with that attitude, it seems.

Re:That's it? (1)

solidraven (1633185) | about a year ago | (#45489869)

He also has no clue about ASICs. Let's take a look at this line: "Nor do they offer the ability to split processing duties into parallel tasks."
If there is one thing you can do on an ASIC, it's parallelisation. Application-specific cores are small, very small; standard multi-project wafer run technologies have a good number of metal layers, so routing isn't too problematic; and so on. So you can actually fit a whole lot of cores into a small silicon area in a modern process. The main issue is the cost of the hardware designer: EEs sufficiently skilled in HDL to take on such a large project are an expensive commodity.

not new (2)

postmortem (906676) | about a year ago | (#45488357)

The NSA already does this; how else do you think they process all that data?

Re:not new (1)

Anonymous Coward | about a year ago | (#45488385)

millions of gnomes!

Re:not new (1)

Anonymous Coward | about a year ago | (#45488405)

Field programmable gnome arrays are going to be the next big thing in computing.

Re:not new (2)

JustOK (667959) | about a year ago | (#45489393)

Fuck that. If I have to go into a field to program the gnome arrays, I'm not doing it.

Re:not new (1)

VortexCortex (1117377) | about a year ago | (#45490449)

Thus the old system was abandoned, and Gnome3 was born.

Re:not new (1)

KingMotley (944240) | about a year ago | (#45491701)

Distributed cluster of lawn gnomes.

Re:not new (1)

jovius (974690) | about a year ago | (#45488409)

With a pinch of salt?

Re:not new (0)

Anonymous Coward | about a year ago | (#45488483)

NSA already does this, how else you think they process all that data?

Alien technology, acquired from a brief alliance with the Greys.

Re:not new (1)

bob_super (3391281) | about a year ago | (#45488745)

They don't "process" it.
"Processing" is bad, it's like "collecting".

They don't collect or process, they just "store" it. Nothing to worry about citizen. Move along, now.

Re:not new (0)

Anonymous Coward | about a year ago | (#45493147)

Principally, in software, with this: http://www.narus.com/

They used to use Cell processors, now they use normal CPUs, and for the fast paths they use GPUs, FPGAs and some custom ASICs.

For the "full-take feeds" they just flat-out grab everything and search through it later using Hadoop.

Re:not new (0)

Anonymous Coward | about a year ago | (#45504053)

[Citation needed]

Re:not new (0)

Anonymous Coward | about a year ago | (#45494129)

I have integrated Intel GPU. Take that, NSA! Ha Ha!

wishful thinking (1, Insightful)

Anonymous Coward | about a year ago | (#45488407)

"Compared to a single core CPU-based network monitor, the GPU-based system was able to speed performance by as much as 17 times"

Shouldn't "researchers" know better how to execute benchmarks in such a way that a comparison between a CPU and a GPU actually makes sense and is not misleading? Why didn't they compare it to a 12 or 16 core CPU to show that it is only marginally better and requires programming in OpenCL or CUDA? Why didn't they take a 2P system and show that it is actually performing worse? In that case they could have drawn the correct conclusion that it actually makes no sense to use GPUs for this purpose! It is sad that even among Fermilab researchers wishful thinking bends results.

Re:wishful thinking (1)

Anonymous Coward | about a year ago | (#45488619)

Shouldn't "researchers" know better how to execute benchmarks in such a way that a comparison between a CPU and a GPU actually makes sense and is not misleading?

If the goal is hard science, then that would make sense. But when the goal is to wow the press, grab attention, and whore in the media, then no... that would be the opposite of what you'd want.

Re:wishful thinking (1)

Guy Harris (3803) | about a year ago | (#45488835)

Why didn't they compare it to a 12 or 16 core CPU to show that it is only marginally better and requires programming in OpenCL or CUDA?

"Compared to a six core CPU, the speed up from using a GPU was threefold." If the 12-core CPU is twice as fast, that's 1.5x, and for a 16-core, that's 1.12x.

Re:wishful thinking (0)

Anonymous Coward | about a year ago | (#45489483)

That's not the "correct" conclusion. The correct conclusion relies on a cost-benefit analysis of not only coding time, but also the number of 16-core CPUs you have to buy (yeah, real low end of the market, those) compared to how much GPU metal you need, across however many nodes you want, and the consequent energy usage and cooling costs.

The result is: if you want to do this, this is how fast it is. If you already have a 16-core CPU in your desktop at home and you want to play network monitor then of course it makes more sense to code in C on your own machine, but that's not what "researchers" are usually interested in.

Re: wishful thinking (2)

loufoque (1400831) | about a year ago | (#45489531)

In practice, most people who publish results for a new algorithm ported to the GPU do not have a version well optimized for the CPU, or aren't that good at optimization in the first place. I've had several cases where I could make the CPU version faster than their GPU version, despite them having claimed a 200x speed-up with the GPU.
If you have a fairly normal algorithm in terms of data access and your speed-up is bigger than 4x, you're probably doing it wrong.

GPU replaces CPU (1)

harshal.tawade (3010845) | about a year ago | (#45488493)

Now I think I should buy only a GPU and not spend money on a CPU. Going to save some money.

Re:GPU replaces CPU (1)

gl4ss (559668) | about a year ago | (#45488631)

yeah so you'd be buying a cpu?
msg me when someone writes an os for it.

pyramid3d? blast from the past?

Talk URL (1)

TheSync (5291) | about a year ago | (#45488511)

Here is a URL to a presentation [fnal.gov] on the issue of GPU-Based Network Monitoring.

BTW, with PF_RING and a DMA-enabled NIC driver (PF_RING DNA [ntop.org]), one should have no problem capturing 10 Gbps on a modern single-CPU server. I can capture/play back 4.5 Gbps, no problem, using this with four 10kRPM HDDs; 8 drives should give you 10 Gbps capture/playback.

Re:Talk URL (0)

Anonymous Coward | about a year ago | (#45488685)

is it cheaper?

Re:Talk URL (1)

cez (539085) | about a year ago | (#45491055)

I just demo'd Fluke Networks' TruView system, which does 10 Gb/s stream-to-disk with a 24 TB array of 26 1 TB hard drives... very nice, not cheap though. Two 16-core Xeon CPUs, if memory serves, and a whole crapload of pretty analysis and correlations between NetFlow & SNMP data... scary cool with the VoIP module.

Network Processors ? (0)

Anonymous Coward | about a year ago | (#45488599)

What about network processors? They usually have an ARM CPU for the control plane and a large number of custom processing units for the data plane. That's what this solution should be compared to, not a standard x86 CPU!

Re:Network Processors ? (1)

Shatrat (855151) | about a year ago | (#45490837)

You can't buy a custom ASIC off the shelf at Fry's, but you can buy a CPU or a GPU. I don't think it's an apples to apples comparison if you throw in custom hardware.

Aaaand now we're buying Big Brother off-the-shelf. (0)

Anonymous Coward | about a year ago | (#45488633)

I have always told my friends and family that PC gamers and the drive for faster gaming rigs led to the computer and microprocessor revolution of the 90s and early 2000s. I told them gamers saved the world.

Now we have apparently doomed it as well.

Well..... it was fun while it lasted.

Re:Aaaand now we're buying Big Brother off-the-she (1)

jones_supa (887896) | about a year ago | (#45491319)

Not so fast, buddy-boy. We still have positive efforts like Folding@home which tap the power of GPUs.

Sorry, but ... (wrong tool for the job) (2)

Ihlosi (895663) | about a year ago | (#45488937)

... the main task of GPUs is floating point calculations, and I doubt you need many of those when monitoring networks. Wrong tool for the job.

It's like saying that GPUs are "terrific" for Bitcoin mining, until you realize that they require one or more orders of magnitude more power than specialized hardware for the same amount of processing. And network monitoring is probably a common enough task that it's worthwhile to use hardware tailored to this particular job.

I'd like a regex computing device (0)

Anonymous Coward | about a year ago | (#45488949)

I'd like specialized hardware for regex on large amounts of data.

Re:I'd like a regex computing device (2)

Ihlosi (895663) | about a year ago | (#45489037)

Get an FPGA development system and implement your hardware in the FPGA, then ask a chip manufacturer to turn it into an ASIC. Expect to pay bucketloads of money along the way, though. It's only feasible if either costs are not an issue or you expect the resulting device to be mass-produced (six- or, better yet, seven-digit quantities manufactured per year).

Re:I'd like a regex computing device (0)

Anonymous Coward | about a year ago | (#45489305)

I believe some Freescale QorIQ parts (Power-based) have this kind of engine running in parallel with the cores.

Re:Sorry, but ... (wrong tool for the job) (0)

Anonymous Coward | about a year ago | (#45489509)

No, they're probably using all those well-known statistical methods that work in the integer field.

Re:Sorry, but ... (wrong tool for the job) (0)

Anonymous Coward | about a year ago | (#45489757)

If you want to calculate SNMP statistics, then having floating point helps. This is the most compute-intensive part of network monitoring. Your packets come whizzing in as simple 1D arrays of bytes (stored by the network card in a ring buffer), but the actual statistics calculations for the source, destination, packet size, protocol, and packet type fields are random access and involve hash-bucket indexing and floating-point calculations for times, averages, and minimum and maximum values.

The RFC (Request For Comments) documents describe these in detail. They are the networking industry's equivalent of OpenGL extension specifications:

http://tools.ietf.org/html/rfc1757
http://tools.ietf.org/html/rfc4898

SNMP organises statistics into tables (like texture arrays) and you can have tables of tables (like arrays of arrays). These would be built up automatically as time progressed, and eventually you would have a whole hierarchical tree of data. This would be replicated across entire networks and be retrievable by any remote client if it had permission (much like GLX).
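
As a rough sketch of that kind of per-flow bookkeeping done on a GPU (entirely invented here, with a made-up PacketMeta record and hash; averages fall out afterwards as bytes divided by packets):

// Hash each packet into a bucket and update counters with atomics.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

#define N_BUCKETS 4096

struct PacketMeta {            // what a capture layer might hand over per packet
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
    uint16_t len;
};

struct FlowStats {
    unsigned long long packets;
    unsigned long long bytes;
    unsigned int       min_len;
    unsigned int       max_len;
};

__device__ unsigned int flow_hash(const PacketMeta &p) {
    unsigned int h = p.src * 2654435761u ^ p.dst * 40503u ^
                     ((unsigned int)p.sport << 16) ^ p.dport ^ p.proto;
    return h % N_BUCKETS;
}

__global__ void accumulate(const PacketMeta *pkts, int n, FlowStats *table) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int b = flow_hash(pkts[i]);
    atomicAdd(&table[b].packets, 1ULL);
    atomicAdd(&table[b].bytes, (unsigned long long)pkts[i].len);
    atomicMin(&table[b].min_len, (unsigned int)pkts[i].len);
    atomicMax(&table[b].max_len, (unsigned int)pkts[i].len);
}

int main() {
    const int N = 1024;

    PacketMeta *h_pkts = new PacketMeta[N];            // some dummy traffic
    for (int i = 0; i < N; ++i)
        h_pkts[i] = PacketMeta{1u, 2u, 80, uint16_t(1000 + i % 3), 6, uint16_t(64 + i % 1400)};
    FlowStats *h_table = new FlowStats[N_BUCKETS];
    for (int b = 0; b < N_BUCKETS; ++b)
        h_table[b] = FlowStats{0, 0, 0xFFFFFFFFu, 0};

    PacketMeta *d_pkts; FlowStats *d_table;
    cudaMalloc((void **)&d_pkts, N * sizeof(PacketMeta));
    cudaMalloc((void **)&d_table, N_BUCKETS * sizeof(FlowStats));
    cudaMemcpy(d_pkts, h_pkts, N * sizeof(PacketMeta), cudaMemcpyHostToDevice);
    cudaMemcpy(d_table, h_table, N_BUCKETS * sizeof(FlowStats), cudaMemcpyHostToDevice);

    accumulate<<<(N + 255) / 256, 256>>>(d_pkts, N, d_table);
    cudaMemcpy(h_table, d_table, N_BUCKETS * sizeof(FlowStats), cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (int b = 0; b < N_BUCKETS; ++b) total += h_table[b].packets;
    printf("accumulated %llu packets across %d buckets\n", total, N_BUCKETS);
    return 0;
}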

In Soviet Russia, your TV watches YOU! (1)

Thor Ablestar (321949) | about a year ago | (#45489011)

As I understand it, there are at least two purposes for monitoring a network: debugging and spying. I believe debugging support is already built in. But spying is a concern, especially since the Russian authorities have required ISPs to preserve ALL data traffic on their networks for 12 hours for further investigation. What about the NSA?

... and NICs will make great GPUs too (0)

Anonymous Coward | about a year ago | (#45489215)

In other news, a researcher shows how network gear makes a terrific GPU!
US vendors like Cisco (thanks to the NSA's cleverness) are seriously considering entering the gaming business now.
And all NICs from now on, in addition to the new (whistling) blower, will feature HDMI and DisplayPort as well.

Breaking news (1)

viperidaenz (2515578) | about a year ago | (#45502621)

massively parallel system is suited to massively parallel tasks.
