
A Fictional Compression Metric Moves Into the Real World

Unknown Lamer posted about 3 months ago | from the best-thing-since-sliced-scatterplots dept.

Programming 133

Tekla Perry (3034735) writes "The 'Weissman Score' — created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm — creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."



Dupe (-1)

Anonymous Coward | about 3 months ago | (#47552835)

From 2 days ago [slashdot.org]

Re:Dupe (1)

Travis Mansbridge (830557) | about 3 months ago | (#47552975)

Aside from centering on Silicon Valley, I don't see how these stories are related. That one is about a fictional compression algorithm, while this one is about a method for rating compression algorithms, which is becoming nonfiction.

Bullshit.... (4, Interesting)

gweihir (88907) | about 3 months ago | (#47552863)

A "combined score" for speed and ratio is useless, as that relation is not linear.

Re:Bullshit.... (3, Insightful)

i kan reed (749298) | about 3 months ago | (#47552941)

Well then write a paper called "an improved single metric for video compression" and submit it to a compsci journal. Anyone can dump opinions on slashdot comments, but if you're right, then you can get it in writing that you're right.

Re:Bullshit.... (4, Insightful)

gweihir (88907) | about 3 months ago | (#47552985)

There is no possibility for a useful single metric. The question obviously does not apply to the problem. Unfortunately, most journals do not accept negative results, which is one of the reasons for the sad state of affairs in CS. For those that do, the reviewers would very likely call this one "trivially obvious", which it is.

Re:Bullshit.... (1)

buchner.johannes (1139593) | about 3 months ago | (#47555897)

This point comes up often in genetic algorithms, when more than one quantity should be optimized for. A common solution is to build a Pareto frontier [wikipedia.org] and declare the points on it the best.

A combination of two quantities is always a personal weighting. It may be useful, but it may also be limited in application. In the case here, the balance between compression speed and achieved size is too personal to be general-purpose, but perhaps the metric is useful for the use case of TV streaming content providers.
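
For what it's worth, the Pareto idea is easy to make concrete. Here's a minimal sketch in Java with made-up (ratio, time) numbers, not from any real benchmark: a candidate stays on the frontier if no other candidate both compresses better and runs faster.

import java.util.ArrayList;
import java.util.List;

public class ParetoDemo {

    // A candidate algorithm: compression ratio (higher is better) and time in seconds (lower is better).
    record Candidate(String name, double ratio, double seconds) {}

    // a dominates b if it is at least as good on both axes and strictly better on at least one.
    static boolean dominates(Candidate a, Candidate b) {
        return a.ratio() >= b.ratio() && a.seconds() <= b.seconds()
                && (a.ratio() > b.ratio() || a.seconds() < b.seconds());
    }

    // The Pareto frontier: every candidate not dominated by any other candidate.
    static List<Candidate> frontier(List<Candidate> all) {
        List<Candidate> front = new ArrayList<>();
        for (Candidate c : all) {
            boolean dominated = false;
            for (Candidate other : all) {
                if (other != c && dominates(other, c)) { dominated = true; break; }
            }
            if (!dominated) front.add(c);
        }
        return front;
    }

    public static void main(String[] args) {
        List<Candidate> algos = List.of(
                new Candidate("fast-but-weak", 1.1, 2.0),
                new Candidate("balanced", 2.8, 15.0),
                new Candidate("slow-but-tight", 3.4, 600.0),
                new Candidate("strictly-worse", 2.5, 30.0)); // dominated by "balanced"
        frontier(algos).forEach(c -> System.out.println(c.name() + " is on the frontier"));
    }
}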

Re:Bullshit.... (2)

Darinbob (1142669) | about 3 months ago | (#47553703)

I don't think this metric is really in any computer science journal; it's only in IEEE Spectrum.

Re:Bullshit.... (2)

Beck_Neard (3612467) | about 3 months ago | (#47554197)

Uhm, do you really think that something as important as assessing the performance of compression algorithms wouldn't have attracted the attention of thousands (or, more likely, hundreds of thousands) of computer scientists over the years? Open up any academic journal that deals with this stuff even tangentially and you'll find many examples of different metrics for assessing compression performance. And there's nothing new about this 'score'. Dividing ratio by the logarithm of the compression time is a very widely used theoretical scoring function; I can find references to it from the '90s. This particular form of that score may be new, but gweihir is right; such a score doesn't give much information and has very little use.

Re:Bullshit.... (4, Insightful)

nine-times (778537) | about 3 months ago | (#47553137)

Can you explain in more detail?

I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.

Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file by 15% in 20 seconds, and the third compressed it by 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect -- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance either. They just provide a method for estimating it.

But maybe you have something in mind that I'm not seeing.

Re:Bullshit.... (2)

jsepeta (412566) | about 3 months ago | (#47553583)

That's kind of like the Microsoft Windows Experience Index provided by Windows Vista / Windows 7, which gives a score based on CPU, RAM, GPU, and hard disk speed. Not entirely useful, but it gives beta-level nerds something to talk about at the water cooler.
http://windows.microsoft.com/e... [microsoft.com]

At work my desktop computer is a Pentium E6300 with a 6.3 rating on the CPU and an overall 4.8 rating due to the crappy graphics chipset.
At work my laptop computer is an i3-2010M with a 6.4 rating on the CPU and an overall 4.6 rating due to the crappy graphics chipset.

A compression algorithm rated by speed and compression ability would have to weight the speed vs. the compression, right?

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47554373)

Good comparison.

Re:Bullshit.... (5, Informative)

mrchaotica (681592) | about 3 months ago | (#47553795)

Can you explain in more detail?

If you have a multi-dimensional set of factors and you design a metric to collapse them down into a single dimension, what you're really measuring is a combination of the values of the factors and your weighting of them. Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful.

This goes for any situation where you're picking the "best" among a set of choices, not just for compression algorithms, by the way.

Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one.

User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best. User B is trying to shove the entire contents of Wikipedia into a disc to send on a space probe [wikipedia.org] , so for him, the third algorithm is the best.

You gave a really extreme[ly contrived] example, so in that case you might be able to say that "reasonable" use cases would prefer the middle algorithm. But differences between actual algorithms would not be nearly so extreme.
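
To make the weighting point concrete, here's a toy sketch with entirely hypothetical numbers: the same two algorithms, scored by a simple weighted sum of normalized ratio and normalized speed, swap places depending on which weight you happen to prefer.

public class WeightingDemo {

    // score = w * normalizedRatio + (1 - w) * normalizedSpeed, both scaled to [0, 1].
    static double score(double ratio, double seconds, double maxRatio, double minSeconds, double w) {
        double ratioNorm = ratio / maxRatio;      // 1.0 for the best ratio in the set
        double speedNorm = minSeconds / seconds;  // 1.0 for the fastest in the set
        return w * ratioNorm + (1 - w) * speedNorm;
    }

    public static void main(String[] args) {
        // Hypothetical algorithms: A is fast with modest compression, B is slow but tighter.
        double ratioA = 2.0, secondsA = 1.0;
        double ratioB = 3.0, secondsB = 100.0;
        double maxRatio = 3.0, minSeconds = 1.0;

        for (double w : new double[] {0.3, 0.9}) { // speed-heavy vs. ratio-heavy weighting
            double a = score(ratioA, secondsA, maxRatio, minSeconds, w);
            double b = score(ratioB, secondsB, maxRatio, minSeconds, w);
            System.out.printf("w=%.1f  A=%.3f  B=%.3f  winner=%s%n", w, a, b, a > b ? "A" : "B");
        }
    }
}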

Re:Bullshit.... (3, Insightful)

nine-times (778537) | about 3 months ago | (#47553917)

Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful...[snip] User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best.

And these are very good arguments why such a metric should not be taken as an end-all be-all. Isn't that generally the case with metrics and benchmarks?

For example, you might use a benchmark to gauge the relative performance between two video cards. I test Card A and it gets 700. I test Card B and it gets a 680. However, in running a specific game that I like, Card B gets slightly faster framerates. Meanwhile, some other guy wants to use the video cards to mine Bitcoin, and maybe these specific benchmarks test entirely the wrong thing, and Card C, which scores 300 on the benchmark, is the best choice. Is the benchmark therefore useless?

No, not necessarily. If the benchmark is supposed to test general game performance, and faster benchmark results generally correlate with faster game performance, then it helps shoppers figure out what to buy. If you want to shop based on a specific game or a specific use, then you use a different benchmark.

Re:Bullshit.... (1)

Ardyvee (2447206) | about 3 months ago | (#47554013)

Why generate a score in the first place, when you can just provide compression ratio and compression speed (or, in the case of the card: fps at given settings, energy used, consistency of the fps at those settings), along with any other characteristic you know or can test that doesn't combine two other things, and let the user decide which of those things matter, instead of trying to boil it all down to a single number?

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47554407)

The uses for that single number are as follows:

a) Some class of people like to claim "mine is bigger", which requires a single number. While that is stupid, most people "understand" this type of reasoning.
b) Anything beyond a single number is far too complicated for the average person watching TV.

In reality, things are even more complicated, as speed and compression ratio both depend on the data being compressed, and do so somewhat independently. That means some data may compress really well and do so fast, other data may compress exceedingly badly but also fast, a third data set may compress well but slowly, and a fourth badly and slowly. So in reality you need to state several numbers (speed, ratio, memory consumption) for benchmark data and, in addition, describe the benchmark data itself to get an idea of an algorithm's performance. If it is a lossy algorithm, it gets even murkier, as you then typically need several quality measures. For video, you may get things like color accuracy, sharpness of lines, accuracy of contrast, behavior for fast-moving parts, etc.

Re:Bullshit.... (1)

nine-times (778537) | about 3 months ago | (#47554857)

Depending on what you're talking about, providing a huge table of every possible test doesn't make for easy comparisons. In the case of graphics cards, I suppose you could provide a list of every single game, including framerates at every single setting in every single game. It would be hard to gather all that data, the result would be information overload, and it still wouldn't allow you to make a good comparison between cards. Even assuming you had such a table, it would probably be more helpful to add or average the results somehow, providing a cumulative score. Of course, then you might want to weight the scores, possibly based on how popular the game is, or how representative it is of the actual capabilities of the card. But if that's the result that's actually helpful, why not design a single benchmark that's representative of what games do, rather than having to test so many games?

Re:Bullshit.... (0)

Anonymous Coward | about 3 months ago | (#47553909)

Well, you sure could easily detect these extreme cases automatically... But other than that, seriously choosing a compression algorithm is generally a significantly more complex decision process...

You notably have to factor in:

- Decompression speed (which with some algorithms can be very different from the compression speed; an algorithm can, for example, be optimized for compression speed, notably for large, mostly write-only files/backups, but be comparatively very slow at decompressing the archives);

- CPU/GPU and memory usage (it can be very important for servers and large data sets);

- Possible data losses and their precise nature (a very fundamental and common subject for audio and image/video compression notably, with some subjective aspects);

- Implementation complexity and code quality (particularly if you rely on it for backups).

The first three items can have perfectly intended and useful variations, which you will have to select depending on your specific needs. Most algorithms provide options for various of these needs, and there are also more specialized algorithms.

Even for desktop use, needs may vary a lot, even for basic lossless compression of miscellaneous files... Some people might want maximum compression whatever the speed beside possibly some extremes... Some others might want a somewhat balanced result (which might depend on their current computer, and thus evolve with time...). Some might prefer algorithms optimized for specific file types (you won't see many desktop users zipping BMP and WAV files directly, for example... well, beside the few ignorance cases we probably all know about here... and the few far more technical usages with less concern for size... and even then, modern zip format implementations will use different algorithms depending on the file type anyway...), and some others a more generalized algorithm. Some might want integrated encryption. Some might want redundancy options (e.g., for newsgroups, or important backups). Some might want various other algorithm or UI options which might only be implemented by specific implementations of specific algorithms...

It's hard to summarize all this with a single number, even for common use cases... and it's really not needed at all... If you want a simple comparison base, just search for one of the numerous algorithm and software reviews on the web, and check the main points in comparison to your needs... And you'll need to check more than one, because of the different (and sometimes erroneous) testing methods, newer algorithm/implementation/UI versions, etc.

It is currently impossible to summarize all this with a single number for all use cases... Maybe one day, with more perfected algorithms (and even then, it will probably always be a set of perfected algorithms, at least for different file formats... but maybe they will have some common bases...), but by then most current concerns about speed, energy, cost, and size will probably not be valid anymore...

bandwidth is not constant (0)

Anonymous Coward | about 3 months ago | (#47554191)

The reason there's no single metric available is because bandwidth isn't constant.
I'll try to solve for a "best algorithm" given some different bandwidths, ignoring decompression time. Below, X is the time it would take to transfer the uncompressed file at the given bandwidth, so each F gives the total time to compress and transfer.

F1(X): 14 + X*(1- 0.00001%)
F2(X): 20 + X*(1-15%)
F3(X): 29*60*60 + X*(1-15.1%)

solving pairwise:
F1(40 seconds) = F2(40 seconds)
F1(8 days) = F3(8 days)
F2(3.31 years) = F3(3.31 years)

If the file can be transferred in 7 seconds, algorithm 1 is the clear winner (23.6% faster than algorithm 2, and nearly 5000x faster than algorithm 3).
If the file can be transferred in 7 days, algorithm 2 is the clear winner (17.6% faster than algorithm 1, and 20.2% faster than algorithm 3).
If the file can be transferred in 7 years, algorithm 3 is a marginal winner (0.062% faster than algorithm 2, and it's 17.8% faster than algorithm 1); also note that 0.062% is in the 30-40 hours range (you can get different answers depending on the number of seconds you use to compute 7 years).
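
The pairwise algebra above is easy to check mechanically. A small sketch using the parent's three hypothetical algorithms (total time = compression time + X times the fraction left after compression):

public class CrossoverDemo {

    // Total delivery time for algorithm i is F_i(X) = t_i + X * k_i, where t_i is the
    // compression time, k_i the fraction of the original size left after compression,
    // and X the time to transfer the *uncompressed* file at the given bandwidth.
    // F_i(X) = F_j(X) at X = (t_j - t_i) / (k_i - k_j).
    static double crossover(double ti, double ki, double tj, double kj) {
        return (tj - ti) / (ki - kj);
    }

    public static void main(String[] args) {
        double t1 = 14, k1 = 1 - 0.0000001;    // saves 0.00001% in 14 s
        double t2 = 20, k2 = 1 - 0.15;         // saves 15% in 20 s
        double t3 = 29 * 3600, k3 = 1 - 0.151; // saves 15.1% in 29 h

        System.out.printf("1 vs 2: %.0f seconds%n", crossover(t1, k1, t2, k2));               // ~40 s
        System.out.printf("1 vs 3: %.1f days%n", crossover(t1, k1, t3, k3) / 86400);          // ~8 days
        System.out.printf("2 vs 3: %.2f years%n", crossover(t2, k2, t3, k3) / (86400.0 * 365)); // ~3.3 years
    }
}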

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47554361)

It depends far too much on your boundary conditions. For example, LZO does not compress very well, but it is fast and has only a 64kB footprint. Hence it gets used in space probes, where the choice is to compress with this or throw the data away. On the other hand, if you distribute pre-compressed software or data to multiple targets, even the difference between 15.0% and 15.1% can matter, if it is, say, 15.0% in 20 seconds and 15.1% in 10 minutes.

Hence a single score is completely unsuitable to address the "quality" of the algorithm, because there is no single benchmark scenario.

Re:Bullshit.... (1)

nine-times (778537) | about 3 months ago | (#47554873)

Hence a single score is completely unsuitable to address the "quality" of the algorithm, because there is no single benchmark scenario.

So you're saying that no benchmark is meaningful because no single benchmark can be relied upon to be the final word under all circumstances? By that logic, measuring speed is not meaningful, because it's not the final word in all circumstances. Measuring the compression ratio is meaningless because it's not the final word in all circumstances. The footprint of the code is meaningless because it's not the final word in all circumstances.

Isn't it possible that a benchmark could be useful for some purposes other than being the final word in all circumstances?

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47555365)

Whether measuring speed is a meaningful benchmark depends on what you measure the speed of, relative to what, and under what circumstances. There are many situations where "speed" is not meaningful, and others that are limited enough that it is.

However, the metric under discussion will not be meaningful in any but the most bizarre and specific circumstances, hence it is generally useless. For the special situations where it could be useful, it is much saner to adapt another metric than to define a specific one, as this pollutes the terminology.

Re:Bullshit.... (1)

sootman (158191) | about 3 months ago | (#47554655)

I'd just say it's useless because no two people can agree on what's important, so what's the point of giving a single score? And even something as seemingly simple as a compression algorithm has more than just two characteristics:
1) speed of compression
2) file size
3) speed of decompression
4) does it handle corrupt files well? (or at all?)

Even just looking at 1 & 2, everyone has different needs. Some people value 1 above all others, some people value 2, and most people are somewhere in between, and "somewhere" is a pretty big area. Yes, your examples are pretty far apart and most people would agree that "best" is somewhere in the middle, but the middle is bigger than you think. Hence, there can simply never be a "best". So why bother trying to score one?

> So why can't you create some kind of rating system to give
> you at least a vague quantifiable score of that concept?

Because it would just be too vague to be useful. I mean, yeah, it can sort out the great ones from the horrible ones, but that's easy anyway, so if you're just trying to compare a few really good ones, the difference isn't enough.

A car that goes 200 mph is great, but not if it gets 2 mpg. Likewise, 100 mpg and a top speed of 30 mph isn't useful either. If you're comparing a bunch of cars that get 32-35 mpg and go 130-140 mph, there's not a meaningful way to pick the "best" in that group that everyone will agree on, unless one has the highest speed and the best mileage, but then, again, that's an obvious winner and you don't need an algorithm's help to pick it out of the pack.

Re:Bullshit.... (1)

nine-times (778537) | about 3 months ago | (#47554827)

there's not a meaningful way to pick the "best" in that group that everyone will agree on

Metrics often don't provide a definitive answer about what the best thing is, with universal agreement. If I tell you Apple scores highest in customer satisfaction for smartphones last year, does that mean everyone will agree that the iPhone is the best phone? If a bunch of people are working at a helpdesk, and one closes the most tickets per hour, does that necessarily mean that he's the best helpdesk tech?

It's true that a lot of people misuse metrics, thinking that they always provide an easy answer, without understanding what they actually mean. That doesn't mean that metrics are useless.

If you're comparing a bunch of cars that get 32-35 mpg and go 130-140 mph, there's not a meaningful way to pick the "best" in that group that everyone will agree on

Yeah, but that's a really dumb metric since most people don't actually care what the top speed of a car is. Or to be more truthful, only morons care about top speed unless it's below 80mph, since you basically shouldn't be driving your car that fast. So really, in a metric like this, the "top speed" isn't a metric of "faster is better". It's a metric of "fast enough is good enough".

But if you were in the habit of doing car reviews, it might make sense to take a bunch of assessments, qualitative and quantitative, like acceleration and handling, MPG, physical attractiveness, additional features, and price (lower is better), and then weigh and average each score. That would enable you to come up with a final score which, while subjective, makes some attempt to enable an overall ranking of the cars. In fact, this is the sort of thing that reviewers sometimes do.

Re: Bullshit.... (1)

jrumney (197329) | about 3 months ago | (#47554749)

It depends on the situation where it is used. If your data almost but not quite fits on your available media at 15%, and you're not pressed for time, you might still go for the 15.1%. And if you only have 15 seconds to compress it, strictly no more, you might settle for significantly less compression than would be possible in 20 seconds.

Re:Bullshit.... (1)

loufoque (1400831) | about 3 months ago | (#47555795)

very high compression has limited utility if it takes an extremely long time

I don't see how the utility is limited.
Most content is mastered once and viewed millions of times.

How much time it takes to compress is irrelevant, even if you get diminishing returns the longer you take. What's important is to save space when broadcasting the content.

Re:Bullshit.... (1)

ultranova (717540) | about 3 months ago | (#47554883)

A "combined score" for speed and ratio is useless, as that relation is not linear.

A combined score could be quite useful when implementing, for example, compressed swap. Obviously you'd need to calibrate it for the specifics of a case.

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47555385)

When you "calibrate" swap for specific uses, it becomes non-general. In that situation it is far better to let the application use on-disk storage, because _it_ knows the data profile. Sorry, but fail to understand swap.

Re:Bullshit.... (2)

sg_oneill (159032) | about 3 months ago | (#47555187)

A "combined score" for speed and ratio is useless, as that relation is not linear.

Typing at 70 words per minute, slashdot poster declares quantity over time measurements meaningless.

Re:Bullshit.... (1)

gweihir (88907) | about 3 months ago | (#47555373)

Other Slashdot poster adds meaningless posturing as that is the limit of what he can do.

Re:Bullshit.... (0)

Anonymous Coward | about 3 months ago | (#47555797)

Wrong. Only if they are statistically independent can you not replace them with a combined metric. If they have a linear relation, or any deterministic or statistical correlation for that matter, then one can be predicted from the other. Therefore only one of them will be useful and the other will be redundant. So if and only if they *have* a relation can you replace the two metrics with one.

Have you heard of the PCA technique? PCA stands for Principal Components Analysis. It is used in statistics and, in modern times, in machine learning. It replaces multiple correlated variables with a single combined variable that explains the statistical variation of all the original correlated variables. In machine learning this is called dimensionality reduction or feature extraction.
In general, if you have an original mix of correlated and uncorrelated variables, PCA removes the correlations. In other words, it replaces the original variables with a new set of variables that are uncorrelated. These new variables are basically weighted linear combinations of the original variables, and the weighting vectors are simply the eigenvectors of the correlation matrix of the original variables.

So in our compression problem, based on my knowledge of PCA, my suggestion is to use PCA on the compression speed and ratio to create two new metrics as two linear combinations of the speed and ratio. If the speed and ratio are correlated, then one of the new metrics will have small variance and can be discarded. The other, having higher variance, is the one to be used as the combined metric.
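
For the two-variable case the eigen-decomposition can be written down by hand: the correlation matrix is [[1, rho], [rho, 1]], its eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), and the component variances are 1 + rho and 1 - rho. A toy sketch along those lines, with made-up benchmark numbers:

public class PcaDemo {

    // z-score a column: subtract the mean, divide by the (population) standard deviation.
    static double[] standardize(double[] x) {
        double mean = 0, var = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) z[i] = (x[i] - mean) / sd;
        return z;
    }

    // Correlation of two already-standardized columns.
    static double correlation(double[] z1, double[] z2) {
        double r = 0;
        for (int i = 0; i < z1.length; i++) r += z1[i] * z2[i];
        return r / z1.length;
    }

    public static void main(String[] args) {
        // Hypothetical benchmark runs: compression speed (MB/s) and compression ratio.
        double[] speed = {120, 80, 40, 20, 10};
        double[] ratio = {1.8, 2.2, 2.9, 3.3, 3.6};

        double[] zs = standardize(speed), zr = standardize(ratio);
        double rho = correlation(zs, zr);

        // For a 2x2 correlation matrix [[1, rho], [rho, 1]] the eigenvectors are always
        // (1, 1)/sqrt(2) and (1, -1)/sqrt(2), with variances 1 + rho and 1 - rho.
        double varSum = 1 + rho, varDiff = 1 - rho;
        boolean keepSum = varSum >= varDiff;
        System.out.printf("rho = %.3f, var(sum) = %.3f, var(diff) = %.3f%n", rho, varSum, varDiff);

        // Keep the higher-variance component as the single combined speed/ratio metric;
        // the other component is the candidate for discarding, as the parent suggests.
        for (int i = 0; i < zs.length; i++) {
            double pc = (keepSum ? zs[i] + zr[i] : zs[i] - zr[i]) / Math.sqrt(2);
            System.out.printf("run %d: combined metric = %.3f%n", i + 1, pc);
        }
    }
}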

I thought it wasnt possible (1)

Anonymous Coward | about 3 months ago | (#47552873)

I thought I read an article the other day that said their algorithm seemed plausible on the surface but would eventually begin to fall apart?

Re:I thought it wasnt possible (2)

Travis Mansbridge (830557) | about 3 months ago | (#47553007)

The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (insofar as more compressed/faster algorithms achieve a better rating).

Re:I thought it wasnt possible (0)

silas_moeckel (234313) | about 3 months ago | (#47553117)

When talking about lossy compression for video it might technically work, but it's still worthless. For example, my highly proprietary, heavily patented postage stamp algorithm reduces all video down to '90s-era dialup-rate MPEG-2, aka a blurry postage stamp. This means it's massively compressed and very quick, so it scores high on both metrics. It also looks like crap. Output quality and ratio are generally the metrics that matter, and output quality is a subjective factor that needs to be determined by humans. How long it takes to encode is generally a non-factor, as outside of live encoding it's a one-time event. The other factor is how hard it is to decode, which is generally not an issue right now.

Re:I thought it wasnt possible (-1)

Anonymous Coward | about 3 months ago | (#47553207)

"Massively compressed" does not mean what you think it means.

Re:I thought it wasnt possible (0, Flamebait)

Anonymous Coward | about 3 months ago | (#47553253)

Maybe you can tell us why, champ.

Re:I thought it wasnt possible (-1)

Anonymous Coward | about 3 months ago | (#47553407)

Because it's lossy. If you remove data, it's not compression. e.g. Taking number 10231 and compressing it to 1 is not compression.

Re:I thought it wasnt possible (1)

Anonymous Coward | about 3 months ago | (#47553581)

Please tell us more about how compressing is not compression.

Re:I thought it wasnt possible (0)

Anonymous Coward | about 3 months ago | (#47553697)

compressing it to 1 is not compression.

LOL

Re:I thought it wasnt possible (3, Informative)

khellendros1984 (792761) | about 3 months ago | (#47553849)

FTA:

And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.

The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.

Re:I thought it wasnt possible (0)

Anonymous Coward | about 3 months ago | (#47555549)

FTA:

And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.

The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.

Posting AC because I've modded in this thread and because I work at UCSB.
We really need to stop hiring people who are so clueless and useless, we really need to start firing the ones who abuse their positions, we really need to stop illegally hiring people's spouses because they are people's spouses, we really need to stop paying people extra to not teach, etc.

freemasons run the country (4, Interesting)

retchdog (1319261) | about 3 months ago | (#47552879)

The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).

I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.

The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.
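
A minimal sketch of that figure of merit, with entirely made-up ratios and timings; the full score as described elsewhere in this thread also normalizes against a reference compressor on the same data and multiplies by a scaling constant, which only rescales the ranking:

public class WeissmanSketch {

    // Figure of merit proportional to the score described above: ratio / log(time).
    // Assumes the time is comfortably above 1 second (log(1 s) would be 0).
    static double merit(double ratio, double seconds) {
        return ratio / Math.log(seconds);
    }

    // Score as discussed in this thread: a scaling constant times the candidate's figure
    // of merit relative to that of a reference compressor run on the same data. The choice
    // of log base cancels out here, since it rescales numerator and denominator alike.
    static double score(double alpha, double r, double t, double rRef, double tRef) {
        return alpha * merit(r, t) / merit(rRef, tRef);
    }

    public static void main(String[] args) {
        double rRef = 2.7, tRef = 10.0; // hypothetical gzip-like reference on the same input
        System.out.printf("fast, looser : %.3f%n", score(1.0, 3.1, 12.0, rRef, tRef));
        System.out.printf("slow, tighter: %.3f%n", score(1.0, 4.2, 95.0, rRef, tRef));
        // The slower algorithm compresses better but is penalized only logarithmically for
        // time, and here still scores lower -- which is exactly the weighting being debated.
    }
}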

Re:freemasons run the country (2)

AsmCoder8088 (745645) | about 3 months ago | (#47553903)

The formula is not too bad, although I would suggest a minor tweak, namely that one should change it from:

(compression ratio)/log(time to compress)

to:

(compression ratio)/log(10+time to compress).

This will ensure that no divide-by-zero occurs; specifically, if the time to compress is 1 second, you would have been dividing by zero in the original formula, since log(1) = 0.
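
A quick sketch of the difference, assuming base-10 logs since the thread doesn't pin a base down:

public class TweakedScore {

    // Original form: ratio / log10(seconds). At exactly 1 second log10(1) == 0, so this
    // divides by zero, and for sub-second times the denominator goes negative.
    static double original(double ratio, double seconds) {
        return ratio / Math.log10(seconds);
    }

    // The tweak suggested above: ratio / log10(10 + seconds). The denominator is now
    // always at least log10(10) == 1, so the score stays finite and positive.
    static double tweaked(double ratio, double seconds) {
        return ratio / Math.log10(10.0 + seconds);
    }

    public static void main(String[] args) {
        System.out.println(original(2.5, 1.0)); // Infinity (division by zero in doubles)
        System.out.println(tweaked(2.5, 1.0));  // ~2.40
        System.out.println(tweaked(2.5, 0.25)); // still positive for a sub-second run
    }
}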

Re:freemasons run the country (1)

grep -v '.*' * (780312) | about 3 months ago | (#47554725)

(compression ratio)/log(time)

I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time.

Yeah, I guess I empirically decided this for myself way back with DOS PKZip v0.92: either FAST because I want it now, or MAXIMIZE because I'm somehow space limited and don't care how long it takes. The intermediate ones (and for WinZip, WinRAR, 7z, and the others) are useless for me; either SIZE or SPEED, there IS nothing else.

(Unless you can somehow delete or omit it; nothing's faster than not doing it to start with.)

And look -- they're using logs! Now when someone on the show talks about some curve being exponential, they're actually correct!

It really works? (-1, Offtopic)

Zero__Kelvin (151819) | about 3 months ago | (#47552925)

"While it was created for a TV show, it does really work, and it's quickly migrating into academia."

Somebody should explain that to Professor Tsachy Weissman and Ph.D. student Vinith Misra, who specifically stated it doesn't really work [ieee.org], and then school them on it.

Re:It really works? (1)

TheSunborn (68004) | about 3 months ago | (#47552969)

He said it did work; it's just not as effective as other existing compression solutions.

Re:It really works? (1)

fnj (64210) | about 3 months ago | (#47553293)

“We had to come up with an approach that isn’t possible today, but it isn’t immediately obvious that it isn’t possible,” says Misra.

Please explain why you think that means he said "it does work".

Re:It really works? (1)

martin-boundary (547041) | about 3 months ago | (#47554423)

Because. Everything is immediately obvious to slashdotters. QED.

Re:It really works? (5, Informative)

phoenix_rizzen (256998) | about 3 months ago | (#47552991)

They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.

Re:It really works? (1)

Tekla Perry (3034735) | about 3 months ago | (#47553013)

Exactly. The compression algorithm is fictional; the score, while created for the show, can actually be calculated. Whether it will catch on as a metric remains to be seen.

Re:It really works? (1)

Zero__Kelvin (151819) | about 3 months ago | (#47553467)

Holy shit! Math works! Somehow, I don't think you can have a discussion about whether a formula really returns a result or not. I now see that the idiot who wrote the summary was trying to say that the algorithm doesn't work, but math does. Alas, that idiot has no ability to write. ... oh wait, it was you! Never mind.

Re:It really works? (1)

Zero__Kelvin (151819) | about 3 months ago | (#47553477)

Yes. That's the point, isn't it. They didn't invent math for the show. Claiming that a score "works" has no meaning, other than to say that math "works". Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.

Re:It really works? (2)

vux984 (928602) | about 3 months ago | (#47553651)

Claiming that a score "works" has no meaning,

I could easily devise a CPU scoring methodology that scores CPUs based on chip area / cost * clock speed / register width.

Such a score "works" in the sense that the function can be evaluated, but it wouldn't tell you anything about whether to buy an i7 vs. a Xeon vs. a Pentium II.

The suggestion in the article is that the particular scoring methodology that was created for the show is useful for comparing compression algorithms, to the point that it may well be adopted by industry.

Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.

The writing was perfectly fine, your reading comprehension is what failed here.

Re: It really works? (1)

Anonymous Coward | about 3 months ago | (#47553741)

Yes. He failed to comprehend that the submitter was pointing out that math really works, and a ratio of compression over time really does express a ratio.

Re: It really works? (1)

vux984 (928602) | about 3 months ago | (#47554519)

No, he failed to comprehend that people have found that particular method of calculating a ratio of compression over time to be *useful*.

Re:It really works? (1)

Travis Mansbridge (830557) | about 3 months ago | (#47553001)

The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (inasmuch as more compressed/faster algorithms achieve a better rating).

Re:It really works? (0)

Anonymous Coward | about 3 months ago | (#47553507)

Yes. Math "works". News at 11!

score works not algorithm - Re:It really works? (0)

Anonymous Coward | about 3 months ago | (#47553019)

"While it was created for a TV show, it does really work, and it's quickly migrating into academia."

Somebody should explain that to Professor Tsachy Weissman and Ph.D. student Vinith Misra, who specifically stated it doesn't really work [ieee.org], and then school them on it.

The compression algorithm is fictional and does not work. That is what your linked article discusses.

This is about the Weissman Score.

Re:It really works? (0)

Anonymous Coward | about 3 months ago | (#47553075)

What doesn't "really work" is the fictitious compression algorithm
developed on the show.
The "Weissman Score" metric, however, does work in assigning
a compression algorithm a somewhat valid score.

Re:It really works? (0)

Anonymous Coward | about 3 months ago | (#47553435)

The compression algorithm doesn't work, the compression and speed metric does. It does give arbitrary amounts of importance to compression and to speed, but Americans are used to arbitrary metrics.

The Misra Score (1)

mfwitten (1906728) | about 3 months ago | (#47552927)

From the article:

Misra came up with a formula

Re:The Misra Score (0)

Anonymous Coward | about 3 months ago | (#47552977)

Except that the formula happens to work.

Re:The Misra Score (0)

Anonymous Coward | about 3 months ago | (#47553339)

And the results are often plotted on a Misra bell curve.

Re:The Misra Score (4, Funny)

DoofusOfDeath (636671) | about 3 months ago | (#47553869)

From the article:

Misra came up with a formula

So, now Jar Jar Binks does C.S.? Shit...

Useless without measure of lossiness/distortion (0)

Anonymous Coward | about 3 months ago | (#47552931)

An algorithm can compress data quickly and fit it into a small number of bytes, but that doesn't mean what comes out the other end is recognizable. Without adding a weighting for lossiness, this "Weissman Score" has no merit whatsoever. Using the "Weissman Score", MP3 is always better than FLAC, and that's completely untrue for anyone who cares about audio.

Additionally, new generations of video encoders would arguably be "worse" under this weighting system compared to older generations, as improvements in video encoding are currently rather incremental, generally with massive speed penalties as they require significantly higher numbers of CPU cycles to burn through the algorithms required to compress efficiently at low bitrates while maintaining very little distortion/lossiness.

Again, this score doesn't matter because in the end, a compression algorithm is only as good as what comes out the other side.

Re:Useless without measure of lossiness/distortion (2, Funny)

bill_mcgonigle (4333) | about 3 months ago | (#47552955)

hey, "print 0" runs in O(1)!

Re:Useless without measure of lossiness/distortion (4, Informative)

retchdog (1319261) | about 3 months ago | (#47553055)

it's for lossless compression only.

anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.

seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.

Re:Useless without measure of lossiness/distortion (4, Insightful)

viperidaenz (2515578) | about 3 months ago | (#47553071)

In the TV show only lossless compression was being considered, so MP3 would fail.

Re:Useless without measure of lossiness/distortion (1)

Jack9 (11421) | about 3 months ago | (#47553463)

> so MP3 would fail.

That's correct. So what?

MP3 was never a good compression algorithm. It's an audio format that uses a normalization that causes SOME audio data to be lost. It's a great demonstration of how a negligible loss across a wide range of audio can result in a more useful algorithm for sound (it's quite compact). MP3 is not a good compression algorithm and doesn't see a lot of use outside of commodity audio, where you can afford to throw away data.

Re:Useless without measure of lossiness/distortion (1)

vakuona (788200) | about 3 months ago | (#47553861)

MP3 was never a compression algorithm.

FTFY

Re:Useless without measure of lossiness/distortion (0)

Anonymous Coward | about 3 months ago | (#47554027)

> MP3 was never a compression algorithm.

I'm not sure that's true. If a standard mandates a compression, but there are many ways to do that compression, what's the distinction?

Re:Useless without measure of lossiness/distortion (1)

Paradise Pete (33184) | about 3 months ago | (#47554107)

"Algorithm" is the distinction. Otherwise you're basically saying "What's my algorithm for doing X? I just demand X be done." Perhaps you could call it The King's Algorithm.

Re:Useless without measure of lossiness/distortion (2)

viperidaenz (2515578) | about 3 months ago | (#47554169)

That's correct. So what?

So, comment I was replying to

Using the "Weissman Score", MP3 is always better than FLAC

MP3 wouldn't even have a "Weissman Score" because it's not a lossless compression algorithm.

Inadequate (2)

Are You Kidding (1734126) | about 3 months ago | (#47552973)

Not only does it fail to account for loss or distortion, but it also fails to consider the time to decompress. If a compression algorithm with a high Weissman score is applied to a video, it is useless if it cannot be decompressed fast enough to show the video at an appropriate frame rate.

Trivial observation (1)

osu-neko (2604) | about 3 months ago | (#47553063)

No metric is adequate for all purposes. This one is adequate for the task it was designed for, and is adequate for some other purposes as well. That's the best that can be expected of any tool. Always use the appropriate tools for the task at hand, of course.

Re:Trivial observation (1)

retchdog (1319261) | about 3 months ago | (#47553249)

It was designed as a background prop for a TV show. Not a very high bar.

It might be adequate as an artificial evaluation metric for homework in an "Intro to Data Compression" class. It might be, because it hasn't even been used for that yet.

I wouldn't exactly call this a tool. For example, it would be really easy to game this 'score' if there were any significant incentive for doing so. That's usually a bad thing.

Re:Trivial observation (3, Insightful)

fnj (64210) | about 3 months ago | (#47553429)

The reason the Score is utter bullshit is that the scale is completely arbitrary and useless. It says that 2:1 compression that takes 1 second should have the same score as 4:1 compression that takes log(2) seconds, or 1 million to 1 compression that takes log(1 million) seconds.

WHY? State why log time is a better measure than straight time, or time squared, or square root of time. And look at the units of the ratio: reciprocal log seconds. What the hell is the significance of that? It also conveniently sidesteps the variability with different architectures. Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).

In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

It's bullshit.

Re:Trivial observation (1)

Obfuscant (592200) | about 3 months ago | (#47554065)

And look at the units of the ratio: reciprocal log seconds.

The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.

It also conveniently sidesteps the variability with different architectures.

If one measures the compression ratios and times for the same data on different architectures, one is measuring the score of the different architecture, not "sidestepping" it.

Maybe SSE helps algorithm A much more than it does algorithm B.

Then algorithm A compared to B would have a higher Weissman score on a system with SSE.

Or B outperforms A on AMD, but not on Intel.

Then the score would favor B over A when comparing the two processors. That's what the score is supposed to do. It compares two things.

In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much.

Then for the former you would not care what the Weissman score is, and for the latter you would care.

Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

That's not what the score measures. It also doesn't measure price (for commercial implementations of code), executable size, or whether the software salesman has BO or not.

Re:Trivial observation (1)

Lehk228 (705449) | about 3 months ago | (#47554193)

Decompression speed is unimportant for general-purpose compression; it is either adequate or not adequate. If decompression speed is not adequate, it does not matter how well the algorithm scores on other metrics; it is unusable for your use case. If decompression speed is adequate, it really does not matter whether it's just barely adequate or insanely fast.

Re:Trivial observation (1)

fnj (64210) | about 3 months ago | (#47554591)

The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.

That is because it is presented as the ratio of the figure of merit of the candidate algorithm to the figure of merit of some bullshit "universal compressor", times a completely useless "scaling constant". To strip away the obscuration, note that for a fixed reference compressor the denominator is just a constant, so it drops out along with the scaling constant.

The underlying figure of merit once you cut through the bullshit is r / log t. r is the compression ratio (unitless) and log t is log seconds. So yes, the units of the underlying figure of merit are reciprocal log seconds.

You need to learn to cut through the hocus pocus and analyze the actual underlying equation before the Oz Sauce is ladled on. You can well imagine that those who actually understand programming metrics are holding their sides laughing at those who are taking it seriously.

Re:Trivial observation (1)

TubeSteak (669689) | about 3 months ago | (#47554853)

Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).

In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.

The things you mention have always been left as an exercise for the reader.
What benchmark isn't tagged with qualifiers that explain what it does and doesn't mean?

Marketing literature in computing has always been littered with metrics that are completely useless unless you know how to interpret them in the context of what you want to be doing.

Re:Inadequate (0)

Anonymous Coward | about 3 months ago | (#47553109)

Adding that would make the formula too complex.

Compression and decompression ratios would help (2)

JoeyRox (2711699) | about 3 months ago | (#47552989)

Two scores would be useful, one for compression_time:size and one for decompression_time:size, since the latter is more important in compress-once, consume-many applications.

Sounds like the Drake equation all over again. (2)

mmell (832646) | about 3 months ago | (#47553003)

IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

Re:Sounds like the Drake equation all over again. (0)

Anonymous Coward | about 3 months ago | (#47553079)

I'm not so sure it actually is entirely useless: the more of it you can fill out, the more you can deduce about the remaining factors, based on our current count of only 1 known intelligent, technologically capable species (as in ourselves).

Seeing which part of the equation the lack of further finds comes from would gain us a small bit of extra knowledge. It's perhaps not some amazing discovery, but it still narrows things down a little.

Of course, if someone can think of something that lets us glean even more knowledge from our sparse data, I'm all ears.

Re:Sounds like the Drake equation all over again. (1)

Rockoon (1252108) | about 3 months ago | (#47553925)

IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

At least the Drake equation attempts to count something. I think people are missing this important fact about this bullshit compression rating: it isn't counting anything.

circle jerk (2, Funny)

Anonymous Coward | about 3 months ago | (#47553015)

Show About Self-Absorbed Assholes Who Think Their Stupid Ideas Are The Bees Knees Gains Popularity By Making Their Stupid Idea Sound Like Its The Bees Knees

Re:circle jerk (1)

Anonymous Coward | about 3 months ago | (#47553273)

Or simply SASAAWTTSIATBKGPBMTSISLITBK for short. What, are you some kind of pompous jerk who tries to sound smart saying it in full when all of us know it by the acronym?

Not quite as useful as the Slashdot score (0)

Anonymous Coward | about 3 months ago | (#47553361)

Where's our TV show?

Re:Not quite as useful as the Slashdot score (0)

Anonymous Coward | about 3 months ago | (#47555453)

And what's next? A TV show to send common people to Mars?

Why not both? (0)

Anonymous Coward | about 3 months ago | (#47553555)

Why am I reminded of this Mexican ad when I read this?

https://www.youtube.com/watch?v=vqgSO8_cRio

/. as usual (-1)

Anonymous Coward | about 3 months ago | (#47553625)

The posts here reflect ZERO entrepreneurship. This apparent lack of respect, curiosity and ambition is why most of you will be relegated to the "workforce" for the rest of your lives. You may be happy, but there are some folks who have to create "your jobs".

I'd love to be in a community of open-minded peers, but of course this is /.. We have egos to protect. Failures to avoid. Etc.

Weak.

F1 score, precision and recall (1)

tommeke100 (755660) | about 3 months ago | (#47553643)

Sounds a bit like the F1 measure used in classification systems, where the F-score is the harmonic mean of precision and recall (and where pushing for higher precision yields lower recall and vice versa).
However, I'm wondering how stable this Weissman score is. Compression algorithms might not all perform in O(n), where n is the size of the data to compress.
Or it may actually give a very high score to something that doesn't compress at all:
public byte[] compress(byte[] input) { return input; }
I bet this gets a high Weissman score ;-)
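
For reference, the F1 measure mentioned here is just the harmonic mean of precision and recall; a minimal sketch:

public class F1Demo {

    // F1 is the harmonic mean of precision and recall; it punishes imbalance between
    // the two, unlike a plain arithmetic average.
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        System.out.println(f1(0.9, 0.9));  // ~0.90 -- balanced
        System.out.println(f1(0.99, 0.5)); // ~0.66 -- high precision can't hide poor recall
    }
}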

Idiots born every day. (1)

Chas (5144) | about 3 months ago | (#47553851)

Oh boy. A useless metric!

Compression ratio: Sure. But the problem is, it's possible to increase compression ratio by "losing" data. So you can obtain a high ratio, but the images as rendered will be blurry/damaged.

Compression Speed: This is just as dumb since compression speed is partially a function of the compression ratio, partially a function of the efficiency of the algorithm and partially a function of the amount of "grunt power" hardware you throw at it. So one portion of this is a nebulous "hardware norm" factor that can be gamed. The other is a function of the other factor (compression ratio) which can ALSO be gamed (and creates a bias towards lossy compression).

Basically something with a high Weissman number would be extremely lossy compression on high-power hardware. Which basically negates the point of high resolution viewing, as any idiot can reduce a 1920x1080 frame to 19px by 11px, and then compress it. I can already take precompressed (and lossy) JPEG files, resample down to 19x11, then back up to 1920x1080. I can wind up reducing a 930K file down to 40K (basically a 95+% savings). And the image is completely indecipherable.

Take a look at an original image versus the same image with the above-described UCCT (UltraCrappyCompressionTechnique).

http://cox-supergroups.com/The... [cox-supergroups.com]

The above image is a PNG to prevent further compression artifacts from creeping into the sample.

The top portion of the image is the original 930K JPEG file.
The bottom portion is the resampled 40K JPEG file.

Re:Idiots born every day. (1)

Chas (5144) | about 3 months ago | (#47554103)

Actually replaced with a better example.

Took an 8.1MB TGA file and did three things.

1: Saved the first off as a PNG file. Resulted in a 1.7MB file with lossless compression.
2: Saved the file off as a high-compression JPEG. Resulted in a 46K file that's noticeably blurry and indistinct.
3: Downsampled to 19x11 and back up to 1920x1080 and saved as a high compression JPEG (36K file) or a lossless compression PNG (114K file). Labelled this method UCCT (Ultra Crappy Compression Technique).

Amalgamated the three images into a single PNG file to eliminate/reduce further compression issues.

Re:Idiots born every day. (1)

Renozuken (3499899) | about 3 months ago | (#47555609)

This would be correct if the score weren't being used for lossless compression, where the only two variables that really matter are time and size.

Slashvertisement for HBO? (2)

Gothmolly (148874) | about 3 months ago | (#47554099)

Given that only a subset of Slashdot users are HBO subscribers, how is this relevant?

Re:Slashvertisement for HBO? (1)

wilson_c (322811) | about 3 months ago | (#47554633)

Because a much larger, non-overlapping subset also steal HBO services.

Is the show any good? (1)

ed1park (100777) | about 3 months ago | (#47554673)

I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?

Re:Is the show any good? (1)

lippydude (3635849) | about 3 months ago | (#47555837)

"I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?"

I stayed with it and watched a number of episodes; I thought it caught the techie zeitgeist brilliantly. There's even a semi-aspie tech tycoon in there, just like you-know-who.