
The Double Life of Memory Exposed With Automata Processor

Unknown Lamer posted about a year ago | from the perl-accelerator dept.

Supercomputing 32

An anonymous reader writes "As Nicole Hemsoth over at HPCwire reports 'In a nutshell, the Automata processor is a programmable silicon device that lends itself to handing high speed search and analysis across massive, complex, unstructured data. As an alternate processing engine for targeted areas, it taps into the inner parallelism inherent to memory to provide a robust and absolutely remarkable, if early benchmarks are to be believed, option for certain types of processing.'" Basically, the chip is designed solely to process Nondeterministic Finite Automata and can explore all valid paths of an NFA in parallel, hiding the whole O(n^2) complexity thing. Micron has a stash of technical documents including a paper covering the design and development of the chip. Imagine how fast you can process regexes now.


post went up before i finished typing it (0)

Anonymous Coward | about a year ago | (#45506171)

that's fast

almost sure i was going to say something heartfelt (-1)

Anonymous Coward | about a year ago | (#45506195)

the new material for the age of momkind (our centerpeace) & beyond

no need to forget what we never know about (0)

Anonymous Coward | about a year ago | (#45506313)

motive always equals results http://www.youtube.com/watch?v=MLO3NmGJuHg

where do they get an unlimited supply of ordnance? (-1)

Anonymous Coward | about a year ago | (#45506343)

weapons factories? who pays for all the damage?

some still calling this 'weather' so we are slow learners by no fault of our own

Re: post went up before i finished typing it (0)

Anonymous Coward | about a year ago | (#45506263)

Oh joy. That means the NSA can use these to get our data before we finish it.

Exploring NFAs in parallel (4, Interesting)

supersat (639745) | about a year ago | (#45506177)

Well, strictly speaking, you can already explore all paths of an NFA simultaneously. You simply convert it to a DFA, with each state representing a subset of the states the NFA could be in. It requires exponentially more states, but the running time is equivalent.
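For illustration, a minimal Python sketch of that subset construction, using a toy NFA encoding (a dict mapping (state, symbol) to a set of next states) assumed here purely for the example, not taken from any real library:

from collections import deque

def nfa_to_dfa(nfa_trans, start, accepting, alphabet):
    """nfa_trans: dict mapping (state, symbol) -> set of next NFA states."""
    start_set = frozenset([start])
    dfa_trans = {}
    dfa_accepting = set()
    queue = deque([start_set])
    seen = {start_set}
    while queue:
        current = queue.popleft()           # each DFA state is a set of NFA states
        if current & accepting:
            dfa_accepting.add(current)
        for sym in alphabet:
            nxt = frozenset(s2 for s in current
                            for s2 in nfa_trans.get((s, sym), ()))
            dfa_trans[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa_trans, start_set, dfa_accepting

# Toy NFA: strings over {a, b} ending in "ab"
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
dfa, state, dfa_acc = nfa_to_dfa(nfa, 0, {2}, 'ab')
for ch in "aab":                            # matching is now one lookup per symbol
    state = dfa[(state, ch)]
print(state in dfa_acc)                     # True: "aab" ends in "ab"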

Re:Exploring NFAs in parallel (0)

Anonymous Coward | about a year ago | (#45506257)

The conversion costs the same as just solving the NFA in the first place.

Re:Exploring NFAs in parallel (5, Interesting)

Anonymous Coward | about a year ago | (#45506273)

In the real world (as opposed to the Big O world), the running time is waaayyyyyy faster.

We wrote a program to translate ~10,000 non-trivial Perl regular expressions to DFAs (specifically, into Ragel syntax). Actually, only about a quarter could be completely translated, because of various Perl extensions. The rest had to be programmatically analyzed for sub-expressions which could be turned into filters--only if the filters matched did the PCRE actually run.

Achieved a 20x (yes, 20x!) improvement in throughput compared to Perl or PCRE, and very nearly that compared to RE2 (RE2 doesn't scale well wrt memory when you try to union more than a few complex expressions together). However, as you pointed out, the state machine was enormous--over 1GB compiled, and over 2GB in source code. It was so big that neither GCC nor clang could compile it within an afternoon because their array and structure handling code has some algorithmic deficiencies. So we had to come up with some tricks there, too.
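A rough sketch of the filter trick described above, with made-up patterns, Python's re standing in for PCRE, and plain substring checks standing in for the big Ragel-generated DFA:

import re

patterns = [
    # (required literal "filter" extracted from the expression, full expression)
    ('ERROR',   re.compile(r'ERROR\s+\d+:\s+disk\s+(full|failure)')),
    ('login',   re.compile(r'user=(\w+);\s*login\s+failed')),
    ('/admin/', re.compile(r'GET\s+/admin/\S*\s+403')),
]

def match_line(line):
    hits = []
    for literal, pattern in patterns:
        # cheap literal check gates the expensive regex match
        if literal in line and pattern.search(line):
            hits.append(pattern.pattern)
    return hits

print(match_line("GET /admin/users 403"))    # ['GET\\s+/admin/\\S*\\s+403']
print(match_line("nothing to see here"))     # []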

You would think the DFA approach would run afoul of memory bandwidth and cache thrashing at that scale. And it probably does, although there appears to be a significant degree of locality involved for most inputs. But regardless, it's clearly better than running NFAs.

Tangent:
I was once on a conference call with Intel engineers who were hawking some fancy substring matching code. I think they implemented Aho–Corasick, utilizing new Xeon vector operations. Their goal was obviously to sell more expensive hardware by selling optimized software at a nominal cost and tying you to their platform. Their numbers were nice, but Ragel was still performance competitive, could scale better (in terms of number of patterns), is FOSS, and has an ingenious design which makes it trivial to integrate into your application without having to deal with an API that breaks your threading or I/O model.

I'm sure the Automata processor would do well if it can be productized properly. But there's plenty of room for improvement in the way people do regular expressions now.

Re:Exploring NFAs in parallel (1)

Wootery (1087023) | about a year ago | (#45506473)

Achieved a 20x (yes, 20x!) improvement in throughput compared to Perl or PCRE

You were using Ragel with C/C++, right?

However, as you pointed out, the state machine was enormous--over 1GB compiled, and over 2GB in source code

Good lord, what were you processing?

Re:Exploring NFAs in parallel (1)

wkcole (644783) | about a year ago | (#45507645)

However, as you pointed out, the state machine was enormous--over 1GB compiled, and over 2GB in source code

Good lord, what were you processing?

AC discussing massive complex unstructured data analysis? My bet is a PRISM LOVEINT project...

Re:Exploring NFAs in parallel (2, Interesting)

Anonymous Coward | about a year ago | (#45506871)

Hi, I'm a biochemist and I had a very nice conversation with these guys at SC13.

If it actually does appear on the market as a normal memory module, it is likely to find many uses that cannot be imagined now!

In the paper presentation, the phrase "non-von Neumann architecture" is a hint at the applications that may be useful.

And while we are on the subject, the D-Wave quantum talk was fascinating...!

I stopped reading TFA (1)

Anonymous Coward | about a year ago | (#45506181)

It's so full of scripts that keep phoning home that the annoyance couldn't possibly be justified by the maximum possible value gainable from reading the content.

O(n^2)? (3, Insightful)

Anonymous Coward | about a year ago | (#45506221)

I suppose that was meant to say O(2^n)? Emulating an NFA with a DFA requires either exponentially more memory or exponentially more time.
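A toy demonstration of that blowup in Python (a hypothetical example, nothing to do with TFA): the NFA for "the k-th symbol from the end is 'a'" has k+1 states, while subset construction yields 2^k reachable DFA states.

from collections import deque

def blowup(k):
    # NFA: state 0 loops on a/b; on 'a' it also guesses "this was the k-th
    # symbol from the end" and moves to state 1; states 1..k-1 advance on
    # anything; state k is accepting.
    trans = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
    for i in range(1, k):
        trans[(i, 'a')] = {i + 1}
        trans[(i, 'b')] = {i + 1}
    start = frozenset([0])
    seen, queue = {start}, deque([start])
    while queue:                              # subset construction, counting states
        cur = queue.popleft()
        for sym in 'ab':
            nxt = frozenset(t for s in cur for t in trans.get((s, sym), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return k + 1, len(seen)

for k in (2, 4, 8, 16):
    print(k, blowup(k))   # NFA states grow linearly, DFA states as 2^k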

Re:O(n^2)? (1)

deaf.seven (2669973) | about a year ago | (#45506397)

with n^2 we wouldn't have the NP = P problem^^

"handing" - handling [EOM] (0)

Anonymous Coward | about a year ago | (#45506265)

See subject.

I take it this was invented by an African (-1)

Anonymous Coward | about a year ago | (#45506281)

Oh, wait...

Africans are INCAPABLE of creating things like this, but we should just let them move into our countries by the MILLION, right?

Re:I take it this was invented by an African (-1, Troll)

Anonymous Coward | about a year ago | (#45506497)

Um, someone has to date our fat chicks.

Re:I take it this was invented by an African (-1)

Anonymous Coward | about a year ago | (#45506699)

Such as his mom. Her cuck husband can't satisfy her with his 1-inch erect baby cock.

Double life (0)

Anonymous Coward | about a year ago | (#45506291)

Every double-life must end in a double-death.

When CPU meets RAM. (0)

Anonymous Coward | about a year ago | (#45506431)

Re:When CPU meets RAM. (1)

rpresser (610529) | about a year ago | (#45507113)

Somewhat similar. Also resembling WISARD [emeraldinsight.com], which used RAM like a neural network.

The essential point to think about is that a 1GB RAM can act as a cellular automaton with a billion cells. There's no algorithmic complexity to defeat because there's no algorithm. Of course the harder the problem gets, the more cells you need....
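A toy illustration of that idea in Python (an elementary cellular automaton, nothing like Micron's actual design): each cell update is a single lookup into a small precomputed rule table, so a step over the cell array is essentially just memory reads.

RULE = 110  # elementary CA rule number
# rule_table[left*4 + center*2 + right] -> next value of the center cell
rule_table = [(RULE >> i) & 1 for i in range(8)]

def step(cells):
    n = len(cells)
    out = bytearray(n)
    for i in range(n):
        idx = cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n]
        out[i] = rule_table[idx]            # one table lookup per cell
    return out

cells = bytearray(64)
cells[32] = 1                               # single live cell in the middle
for _ in range(8):
    print(''.join('#' if c else '.' for c in cells))
    cells = step(cells)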

wonder how it compares to GPGPU (4, Interesting)

Trepidity (597) | about a year ago | (#45506463)

There is some [sigcomm.org] existing work [github.io] on the same basic idea of massively parallelizing regex matching by doing all the NFA branches in parallel, but using a GPU. Now a GPU is not necessarily perfectly suited to this problem, but it does have the advantage of being a mass-market consumer product, which produces economies of mass-market scale that let the average GPU have a ton more processing units and RAM than this Automata processor does. Would be interesting to see if an NFA-specialized processor gets enough of a speedup to overcome the manufacturing advantage of just throwing a beefy GPU at the problem.

Yes, but how many Hashes/s can it crack?!? (0)

Anonymous Coward | about a year ago | (#45506513)

Yes, but how many Hashes/s can it crack?!?

The NSA finally gets some good news! (0)

Anonymous Coward | about a year ago | (#45506537)

The final obstacle to complete surveillance falls.

Speeding up the wrong algorithm again? (4, Interesting)

Theovon (109752) | about a year ago | (#45506685)

I skimmed over the article, but I couldn’t tell: Is this another example of someone choosing an O(n^2) algorithm over an equivalent O(n log n) algorithm and then patting themselves on the back because they sped up the O(n^2) algorithm through parallelism, even though it’s still slower than the O(n log n) algorithm on a single processor?

I can’t tell because they go on about NFAs, which would be exponential in time compared to DFAs. That’s a totally different thing.

Re:Speeding up the wrong algorithm again? (0)

Anonymous Coward | about a year ago | (#45506735)

The n^2 comment is simply the incompetence of the Slashdot editors on display once again.

Re:Speeding up the wrong algorithm again? (0)

Anonymous Coward | about a year ago | (#45511341)

Yes, it is.

Re:Speeding up the wrong algorithm again? (0)

Anonymous Coward | about a year ago | (#45512977)

The O(n^2) thing is from the submitter, not the article. It's a complete facepalm because obviously the problem is NP-complete by definition.

Space for Time Trade-off (1)

Anonymous Coward | about a year ago | (#45506837)

From a brief scan, this is an interesting space-time tradeoff [wikipedia.org]. It definitely has physical limits to scaling that are close at hand, and we're not that strong algorithmically on what to do with this type of chip. Should be good Ph.D. material for a few years at least.

No, it cannot hide O(n^2).... (1)

gweihir (88907) | about a year ago | (#45508251)

O(n^2) is the complexity of the problem, not of the implementation. That means it is impossible to hide it. This thing just gives some level of parallelism, hence the performance looks a bit better. It is still O(n^2), and when the NFA expands the search tree beyond that parallelism, the thing is back to plain quadratic behavior.

running NFA in even LINEAR time is trivial (0)

Anonymous Coward | about a year ago | (#45509067)

Well, you can run any NFA in linear time (special hardware or no)... you don't have to determinize. Simply start in the initial state and carry along the set of all reachable states: if your word is "abbbab" then you check which states can be reached by 'a', then which states can be reached from those with a 'b', etc. None of those sets is larger than the number of states (a constant), and the input length sets a hard limit on the length of the computation. In effect you produce a run of the powerset automaton without constructing that automaton explicitly.
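A short Python sketch of exactly that simulation, with a toy dict-of-transitions NFA encoding assumed purely for illustration:

def nfa_accepts(nfa_trans, start, accepting, word):
    """nfa_trans: dict mapping (state, symbol) -> set of next states."""
    current = {start}
    for sym in word:
        # carry along the set of all states reachable after this symbol
        current = {s2 for s in current for s2 in nfa_trans.get((s, sym), ())}
        if not current:          # no path survives; reject early
            return False
    return bool(current & accepting)

# Toy NFA: strings over {a, b} ending in "ab"
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
print(nfa_accepts(nfa, 0, {2}, "abbbab"))  # True
print(nfa_accepts(nfa, 0, {2}, "abba"))    # False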

You can even check if the language is non-empty (simple depth-first search).

What you cannot do with an NFA (efficiently) is check whether there's an input that's rejected. THAT requires exponential time (it's PSPACE-complete).

My guess is that the hardware uses some sort of symbolic representation of the NFA to speed up that kind of thing. Or possibly to try to speed up some of the PSPACE-hard stuff I mentioned... like non-universality (there exists an input that is not accepted), intersection, and so forth.

Now They Have Two Problems (0)

Anonymous Coward | about a year ago | (#45514695)

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

-- Zawinski [wikiquote.org]
