Researching Searching Algorithms? 62
NiN3x asks: "I have recently written a sorting algorithm that can be close to four times as fast as quicksort and never much slower. I was wondering if there are lots of sorting algorithms like this out there. I don't think I could be the only one that has thought of something like this. I am only in my second year of computer science, so I don't know a lot about these things. I have tried searching the net and can't find anything. My algorithm is NOT an inplace sorting algorithm. Can anyone point me to some sources for this type of thing?"
Bible (Score:4, Informative)
Publish it? (Score:5, Interesting)
Is there some reason why you cannot publish it here for a review? A basic C reference cannot be very many lines? This way people could give real input for you. I don't believe you have much to lose. If you do, attach your favorite license to it :)
Copy and paste it here (Score:2, Insightful)
Check out Perl's source code for more info. (Score:3, Interesting)
If your algorithm is general-purpose and happens to be faster, why not GPL/LGPL/BSD it and contribute to Perl's, Python's, and PHP's sorting and searching code?
Re:Check out Perl's source code for more info. (Score:2, Informative)
I'm mainly referring to Perl's C libraries, BTW.
However, to rebut:
Firstly, Perl is an interpiled/pseudo-compiled language, so it's probably very comparable to Java or Python or PHP or VB for application performance, but YMMV. It also does garbage collection, memory allocation, etc.
Secondly, most of the *wait time* for most apps, whether they are written in Perl or another language, is based on hardware (such as disk I/O), the network, the database, etc.
Thirdly, hardware is very fast today, and programmer time is FAR more expensive than a few measly CPU cycles. I've read anecdotal evidence that a competent Perl programmer is about 50 times more efficient than a competent VB programmer, to solve a given programming task (not counting GUI's or MS-specific stuff)! Also, a programming contest that used to let you use any available language has banned Perl, since it makes solving problems too easy! Since programmer time is a "speed" metric, this point hurts your argument.
Fourthly, Perl's forte is string processing. Heavy string, search, and sort processing is supposedly *MUCH faster* in Perl than in C! This is because of Perl's much faster algorithms.
Fifthly, as a programmer, you're in the driver's seat. What algorithms do you use? How well do you program? Are you using good pre-built modules, or old ones?
Sixthly, there are high-speed math and other high-speed modules for Perl. Do you use those or let Perl convert numbers to/from strings, before/after mathematical operations?
Seventhly, this is a old, moot argument, most of the time, these days. Unless something is "too slow" to be of practical use, it isn't. Go back to the six points above this one, no matter what language you use, and think through your problem.
Ergo, nope, Perl is most definitely NOT SLOW, until you've exhausted the above, at least. The same is true for any language, including assembly.
Besides, execution time is no longer important to most people, for good reason. Other factors, these days are higher on the list, as they should be.
Do you have specific examples? Where was it slow? Did you profile the functions in your code and then ask for help, once you couldn't find a reasonable solution? Did you try somebody else's pre-built module? What version of Perl did you use? Did you have hardware that was too small to load up the environment? Did you try Perl CGI without mod-Perl or Fast CGI, etc.? How exhaustive were you? (This and the above are on the short list, actually.)
Maybe it's me, but as a person who programs almost daily in Perl, I find that it's *really, realy, really fast*, most of the time. When it's not, I'm waiting on some silly Access database (not Oracle, MySQL, or PostgreSQL), or some other resource, the rest of the time. For example, when I load millions of records, on inferior hardware, from a flat file, in Perl, it usually takes a fraction of a second, unless I print it to the screen, or load straight into a database. How much faster should I make it? (I could, but what a waste of time, unless I need to load billions of records? Of course, we're now talking about 64-bit equipment.)
Perl has perhaps the largest organized community of developers that are very willing to help you on this.
I don't intend to start a language war, but have some facts before trolling, please. Any language I could have chosen will have people for and against it. If I would have said "Python" or "PHP "or "Java" or "VB" or even "C++," I might have gotten this response.
Funny that this topic came up... (Score:5, Informative)
Art of Computer Programming, Volume 3: Sorting and Searching by Donald Knuth
http://www.amazon.com/exec/obidos/tg/detail/-/0
And since you are a student, go show it to a CS professor for immediate feedback. My guess will be that you haven't properly analyzed the O(n) performance.
Re:Funny that this topic came up... (Score:2)
Re:Funny that this topic came up... (Score:5, Interesting)
My guess, is that it really isn't. There are many algorithms that are faster than quicksort for smaller inputs. For less than 100 elements, insertion sort wins (hands down). For less than "really more than you usually need to sort, but not really extreme values", shell-sort usually wins.
Also, there are lots of algorithms that are faster than O(nlogn) when you have more information than that given to you by <:. One example would be bucket-sort.
Personally, I couldn't care less. The chances of a second-year student inventing a better sorting algorithm that isn't already published somewhere is nada. That doesn't mean that he is not smart, and that his algorithm is stupid. It *is* probably better than quicksort for the test-cases he has tested it on (and maybe for many more). What it means is that the field of sorting is so well-studied that it is very unlikely to get real contributions from somebody without at least two PhD's in mathematics these days.
Re:Funny that this topic came up... (Score:3, Interesting)
The interesting thing about O(nlogn) versus O(n) or O(whatever-you-want) is that an "operation" is not often the same between algorithms.
So an O(nlogn) implementation is still faster than an O(n) implementation when the input set is less than 1,000,000 items and the latter implementation requires 6 times the time per operation compares to the former.
So while it is unlikely that someone without a vast theoretical background will discover a better algorithm for all cases (or even the extreme cases), as you point out; there is a significantly greater liklihood that he could have discovered an algorithm which provides improvement for data sets which are commonly used in certain - even many - fields. Without having multiple doctorates.
Sorting in general is a well-studied field; but as the application of computers grows, the need for less general sorting grows as well. Many data structures and algorithms are not considered "sorted" because they are partitioned, or implicitly ordered, yet they make use of sorting theory.
Re:Funny that this topic came up... (Score:2)
To clarify even more, insertion sort is O(n^2), but really fast for smaller datasets, and for data already sorted (in that case it is O(n)).
Shell sort is like an advanced insertion sort, that makes it faster for larger datasets, but also adds a larger overhead. It is also O(n^2) in the worst case, but if I remember correctly, Sedgewick proves a much better average-time for (different variants of) it in his algorithm book (which I unfortunately not have handy).
In general, people should avoid quicksort. The obvious implementation (recursion) is not particularly efficient, either speed-wise or in terms of space. Selecting a good value to divide the dataset on is hard (if you want to avoid O(n^2) performance). And it's overkill for anything but quite large datasets. In general, insertion-sort or shellsort is much better.
In general, you should also avoid the C-library routine qsort(). It requires a function pointer for comparison, making it extremely slow. It's also hard to remember how to use it.
Re:Funny that this topic came up... (Score:2, Insightful)
The original poster mentions that his algorithm isn't a simple in-place sort - which, based on my best guess, means that he's probably sorting pointers quickly then re-arranging the elements in a single pass. Maybe, maybe not - but still no faster than O(n log n).
But at this point, I'd start worrying more about scalability, memory usage, etc. - one of the nice things about quicksort is that it is so general, it can quickly be adapted to low-memory or multi-processor implementations. Sure the algorithm is faster - but faster does not necessarily mean better. If my implementation guess above is correct, the algorithm would use significant additional memory, making it unsuitable for sorting simple numbers (where the pointers would dwarf the elements themselves) or for use in low-memory (i.e. embedded) applications.
The true test of whether the algorithm is innovative is: 1) have a professor look over it, and then 2) get a (respectable) journal to publish it. If it passes the peer review required for publication, then it's an innovative algorithm. If not, it's one of thousands of alternative implementations of the same idea. Clever, maybe even useful, but not spectacular.
Fast vs. Optimal (Score:2, Insightful)
I know the question raised was on the correct analysis of the algorithm in question, but several of these postings have diverged into languages and types of data. This is good and it is the whole point of a well grounded CS education -- knowing all of the strengths and weaknesses of available algorithms so you can make the correct choices based on the size and type of the data and the language (and libraries of choice).
Re:Fast vs. Optimal (Score:2)
That would depend on the data-size of course. Since GW-basic only supports 65k data, I would assume yes, even if the one for the fast machine was written in GW-basic...
Wrong title? (Score:2, Insightful)
Re:Wrong title? (Score:1)
Does it really take that long to proofread topics?
Re-Searching Searching? (Score:5, Funny)
Re:Re-Searching Searching? (Score:1)
Algorithm resources (Score:5, Informative)
The definitive online resource for algorithms is NISTS's Dictionary of Algorithms and Data Structures [nist.gov]. There is a list of algorithm resources [evergreen.edu], and you can also find some free e-books using The Assayer [theassayer.org].
In print you should be looking for "Introduction to Algorithms, 2nd edition". It is the bible of the field. Other excellent candidates are "Data Structures and Algorithms" ( / in Java / in C).
Google will also tell you to look here [monash.edu.au], here [uwa.edu.au] and here [uchile.cl].
This must Kunth's method G. (Score:1)
Apples and Oranges? (Score:5, Informative)
The easy way to show that faster sorts exist is to demonstrate absurd limit case of a tradeoff of space for speed. Consider you have an unlimited amount of memory available for your sort results, and that you are sorting a finite number of keys N for which a mapping M(n) exists to the positive integers. Then, since there can be at most N duplicates of any given key, scanning the data once and placing each key n(i) in memory address N*M(n(i))+i sorts all the data. This is O(N), and pretty much optimal.
If you know there are no duplicates and the keys fill a known range this can be practical.
A good place to start reading about other sorting algorithms is here: http://www.nist.gov/dads/HTML/sort.html
Re:Apples and Oranges? (Score:4, Interesting)
Yeah, the sorting is O(N) but reading the result is O(N^2) and without knowing the result the sorting itself doesn't do much good. That's because you need N^2 of memory and in worst case you have to do linear scan through the whole memory. If you think about sorting it should be pretty clear that O(n log n) is best you can get if you have no information about the data and it's uniformly distributed. IIRC worst case for heap sort is O(n log n) and worst case for quicksort is O(n^2) so if you don't need in-place sorting you should use for example heap sorting. If you don't have many items to sort just do insertion sort because it has least overhead.
Re:Apples and Oranges? (Score:2)
Sort of -- the information theory lower bound states that you cannot go faster than Omega(n log n) with a comparison-based sort. Note that "Big-Oh" notation is for stating upper bounds on function growth, not lower ones.
Your quote above is correct, but should be more clearly stated as "...Theta(n log n)" or "Omega(n log n)" because both imply lower bounds, whereas "Big-Oh" does not.
Re:Apples and Oranges? (Score:1)
Allocating/initializing/garbage collecting the array may take O(N*M), depending on the implementation. A different implementation could skip the array and use a list off of the M different buckets, reducing the memory usage to O(N+M) and improving performance similarly.
Happy Hacking,
Tupper
Re:Apples and Oranges? (Score:2)
> you have an unlimited amount of memory available for your sort results, and that you are sorting a finite number of keys N
> for which a mapping M(n) exists to the positive integers. Then, since there can be at most N duplicates of any given key,
> scanning the data once and placing each key n(i) in memory address N*M(n(i))+i sorts all the data. This is O(N), and pretty
> much optimal.
This isn't in O(N) unless your mapping meets certain criteria. Though your data will be "sorted", they will be spaced out in memory with large gaps between them -- gaps that you'd need to traverse to actually collect the result. In this case your algorithm is in O( N*(max(M(n(i)))-min(M(n(i)))) ). This is pretty good when the mapping is a small range (say, playing cards), but poor when the mapping could be large (say, strings).
In the abstract, it's not possible to sort data that only have a binary comparison operator (say, the real numbers) with fewer than n*log n comparisons.
Re:Apples and Oranges? (Score:2)
Er, oh yes it is. If the memory I'm writing to is the lights in the scoreboard at Yankee stadium - or directly into video memory - why on earth would I need to collect the results? Collection is not actually part of the sorting problem.
Anyway, I did say:
* that it was absurd
* that its
Its not bounded by O(N ln N) because it doesn't use comparisons between keys.
Read e.g. this paper [nada.kth.se] which mentions this
BTW, if you are the same Tom7 who makes the fonts, I bow humbly before you - they are brilliant
-Baz
PS as for the Anon comment on storage in memory taking O(ln N), this is true of quicksort as well, but is ignored - complexities for in-place sorts assume that memory access is instant. If you do take that into account you'll find quicksort has an O(N ln N ln N) term but the constant in front of it is tiny, and in practice the assumption of instant memory - on which the quicksort bound is based - is true. Consider that even if the N ln N term was just 100 times larger than the N ln N ln N term you'd need of the order of 2^100 items to sort before it dominated!
Re:Apples and Oranges? (Score:2)
Well, you can define the sorting problem however you like, but to me it means taking an array (or better, list) and returning an array (or list) with the sorted result. If I'm allowed to arrange the results anyhow I like, then I can do all sorts of crazy things
Yes, I am the fonts guy. Thanks!
But how do you get from one element to the next? (Score:1)
And, random access to memory should be considered O(log(N)) for a given memory size
Re:Apples and Oranges? (Score:2, Informative)
A more common and real example of this is sorting a specific range of consecutive integers [a,b]. The function M can simply be M(x)=x-a and is considered a simple hash function. Once all the elements have been hased according to M(x) into some array A, the array A will contain the sorted elements. This clearly takes only O(n) in the worst case.
I beleive it's been proven that any general sort algorithm that does not reley on special conditions on the data or memory has a lower bound of Omega(nlog). Look up decision tree proofs in you favorite algorithm book.
That's not to say that your algorithm is not usefull though. Often people use things like randomized algorithms as they have better average times. So, talk to your professors. They would be happy to explain the specifics of your algorithm to you.
Also, if your library has online journals, you can search those for sorting algorithms. And don't be afriad to show the algorithm to people. If your worried about IP, don't be. Chances are the university already has IP rights to it as your a student. That doesn't mean you can market your algorithm if it turns out to be somethings spectacular. Just talk to your professors, they are the experts!
Re:Apples and Oranges? (Score:2)
You'll find more techniques [nada.kth.se] like this in the literature if you look for integer sorting [google.com].
Re:Apples and Oranges? (Score:2)
Re:Apples and Oranges? (Score:1)
If you mean better than O(nlogn), then I don't know of one either. I just used random algorithms as an example of algorithms that may not have the best big oh runtime but can still be usefull.
The poster did not specify any runtime properties, average, worst case, or best case. It's not clear that his algorithm is any better than randomized quicksort. But depending on the properties of the poster's algorithm it could be useful so he should properly analyse it.
Re:Apples and Oranges? (Score:2)
Proper Analysis is Key (Score:5, Insightful)
There are lots of algorithms out there that seem to be multiples faster than quicksort in daily use. The problem is when you formally analyze them, you find out their not gauranteed to be faster, and could in fact be horribly slower. If you run it enough times against various wierd data, you'll probably hit one of these slow runs yourself. I can only guess because you haven't mentioned anything about the algorithm or shown any sample code. Read up on how to analyze your code for O() notation and figure out what the real worst case is mathematically.
Re:Proper Analysis is Key (Score:2)
There are lots of algorithms out there that seem to be multiples faster than quicksort in daily use.
For example: IIRC, A bubble sort appears to be efficient if the data being sorted is already mostly sorted. But on mostly random data, it sucks.
Re:Proper Analysis is Key (Score:2)
Nope, that would be insertion sort. Bubble sort is O(n^2) both on average and in worst case. It's the archetypical bad sorting algorithm (well, except for the "1: try a random permutation 2: return if result sorted 3: goto 1" algorithm which requires O(n!) time on average, but may not terminate at all :-)
Google? (Score:4, Informative)
http://directory.google.com/Top/Computers/Algorit
How about the teachers? (Score:2)
Why don't you go and talk to your professor, then? At least someone from the CompSci faculty should be able to help you. If not, then you're probably studying at the wrong university
Well, post it. (Score:4, Interesting)
Post your algorithm, then!
There are many implementations of quicksort-like sorts, but when done well, quicksort is pretty damn fast. For instance, if you're measuring against a version that doesn't special case arrays of size 3, 4, and maybe even bigger, then it will run many times slower than a tuned version. You won't be able to beat quicksort (with proper pivot-picking) asymptotically, so there won't be any ground-breaking result here, but it's probably possible to beat its constant factors (which may be practically useful)... and there are probably many people here who'd be willing to look at the algorithm if you posted it.
*sigh* (Score:3, Informative)
#2, if that gives you no results, post the code or at least pseudo-code so we can comment. Not "I have a new miracle development that could be revolutionary; please comment on it with no clue as to exactly what it is or how it works."
#3, talk to a CS professor. You should know plenty of them.
And the obligatory link - http://www.cs.ubc.ca/spider/harrison/Java/sorting
Re: animation pages (Score:2)
I used to have a set of pages up (currently dead) that launched 6 different implementations with a single click... and the animations had 50, 100, and 250 lines. Not isolated sorts
With 50 points, you think quick sort is faster, but think the simplicity of some of the other sorts may make them preferable.
But with just 250 points, you have no doubts about the relative performance of the various algorithms.
amazing (Score:5, Funny)
"I'm a 2nd year cs student, and I discovered an unknown searching algorithm. I won't provide any details, though"
"I'm not that good at math, but I just invented a new unbreakable multipad encryption. I won't provide any details, though"
"During my lunch break, I used a couple coat hangers, some aluminum foil, a 3 hole punch, a spare xeon-argon laser, and 32oz of diet pepsi to create my own home-brewed cold fusion reactor."
Can you identify the PhysicsGenius troll from the Ask Slashdot question?
And people complain about what gets through the US Patent Office!
Criteria for an Ask Slashdot submission? (Score:2)
Yeah, I have to agree that these Ask Slashdot questions are getting worse and worse. In a comment I made to a previous Ask Slashdot [slashdot.org], I suggested that the forum should be reserved for questions that stimulate a large amount of discussion on issues that do not have any obvious answers. I really don't understand what criteria the slashdot editors use to determine whether to post an Ask Slashdot submission or not. Every so often there is a really good Ask Slashdot question so I don't want to filter this out from my preferences. But part of me gets annoyed with people posting crap Ask Slashdot questions even if I have the sense not to read them.
I've seen journalists and students asking questions here along the lines of "please do my investigation for me" and this gets accepted. Then we have these people who think they've developed some new ingenious algorithm but haven't bothered to have ANYONE look at their work. I realize my post is starting to ramble at this point so I think I'll end with one request that I pray the slashdot editors will read and consider: please tighten up the criteria for what gets accepted as an Ask Slashdot question. So many of these questions are simply the result of laziness on the part of the submitter. My earlier post (that I linked to above) is one suggestion for what Ask Slashdot questions should be like. I welcome other people's ideas as well.
GMD
*DIET* Pepsi! (Score:2, Funny)
Re:amazing (Score:1)
smeat!
Re:amazing (Score:3, Funny)
I just finished Kindergarten, where we learned our ABC's, and I've invented an alphabet that can be read 7 times as fast, with only one third the effort. Why didn't some PhD come up with this before? Oh well. I don't remember what my question was, but just post this, ok?
Latin vs. Tengwar (Score:2)
I just finished Kindergarten, where we learned our ABC's, and I've invented an alphabet that can be read 7 times as fast, with only one third the effort. Why didn't some PhD come up with this before?
There are alphabets that are better than Latin in some respect. Take Tengwar [geocities.com] for instance. The script is designed along sound phonological principles (no pun intended): voiced consonants, fricatives, and nasals have a predictable change in shape from the basic voiceless stops (p, t, c, q). It's been adapted to at least English, Sindarin, Quenya [geocities.com], Polish [elvish.org], Lojban [tuxedo.org], Esperanto [tuxedo.org], and Toki Pona [angelfire.com]. On the other hand, some people have expressed increased dyslexia due to use of Tengwar [zompist.com].
Research searching algorithms? (Score:2)
do it right ;) (Score:2)
First, as others said.. profile it. What is the asymptotic upper bound (O-notation)? Read "Introduction to Algorithms",written by Cormen, Leiserson, and Rivest. Dispite the name, reading that book is not a simple task
Oh, and go to one of your professors for verification.
I have a better way... (Score:1)
SELECT * FROM Articles ORDER BY ArticleId
It works every time. Don't need any log() or sin() or any of that fancy stuff.
complete catagorization? (Score:1)
another catagory I know is:
Is it stable [everything2.com] or un-stable?
Why don't you talk to faculty? (Score:1)
Have you analyzed your algorithm yourself? Is it asymptotically equivalent to something like quicksort, but it just has a tighter inner-loop? Does it have a better average running time on random input? Does it work always (or is it flawed/probabilistic)? I think those are the questions you need to have answered.