A Fictional Compression Metric Moves Into the Real World

Follow Slashdot blog updates by subscribing to our blog RSS feed

A Fictional Compression Metric Moves Into the Real World 133

Posted by Unknown Lamer on Monday July 28, 2014 @04:31PM from the best-thing-since-sliced-scatterplots dept.

Tekla Perry (3034735) writes The 'Weissman Score' — created for HBO's "Silicon Valley" to add dramatic flair to the show's race to build the best compression algorithm — creates a single score by considering both the compression ratio and the compression speed. While it was created for a TV show, it does really work, and it's quickly migrating into academia. Computer science and engineering students will begin to encounter the Weissman Score in the classroom this fall."

This discussion has been archived. No new comments can be posted.

A Fictional Compression Metric Moves Into the Real World

Load All Comments

Search 133 Comments Log In/Create an Account

Comments Filter:

Bullshit.... (Score:5, Interesting)

by gweihir ( 88907 ) writes: on Monday July 28, 2014 @04:42PM (#47552863)

A "combined score" for speed and ratio is useless, as that relation is not linear.

Share
twitter facebook
- Re:Bullshit.... (Score:4, Insightful)
  
  by i kan reed ( 749298 ) writes: on Monday July 28, 2014 @04:54PM (#47552941) Homepage Journal
  
  Well then write a paper called "an improved single metric for video compression" and submit it to a compsci journal. Anyone can dump opinions on slashdot comments, but if you're right, then you can get it in writing that you're right.
  
  Parent Share
  twitter facebook
  - Re:Bullshit.... (Score:5, Insightful)
    
    by gweihir ( 88907 ) writes: on Monday July 28, 2014 @04:59PM (#47552985)
    
    There is no possibility for a useful single metric. The question does obviously not apply to the problem. Unfortunately, most journals do not accept negative results, which is one of the reasons for the sad state of affairs in CS. For those that do, the reviewers would call this one very likely "trivially obvious", which it is.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by buchner.johannes ( 1139593 ) writes:
      
      This point comes up often in genetic algorithms, when more than one quantity should be optimized for. A common solution is to build a Pareto frontier [wikipedia.org], and declare them the best.
      A combination between two quantities is always a personal weighting. It may be useful, but it may also be limited in application. In the case here, the balance between compression speed and achieved size is too personal to be general-purpose, but perhaps the metric is useful for the use case of TV streaming content providers.
  - Re: (Score:3)
    
    by Darinbob ( 1142669 ) writes:
    
    I don't think this metric is really in any computer science journal, it's only in IEEE Spectrum.
  - Re: (Score:3)
    
    by Beck_Neard ( 3612467 ) writes:
    
    Uhm, do you really think that something as important as assessing the performance of compression algorithms wouldn't have attracted the attention of thousands (or, more likely, hundreds of thousands) of computer scientists over the years? Open up any academic journal that deals with this stuff even tangentially and you find many examples of different metrics for assessing compression performance. And there's nothing new about this 'score'. Dividing ratio by the logarithm of the compression time is a very wi
- Re:Bullshit.... (Score:5, Insightful)
  
  by nine-times ( 778537 ) writes: <nine.times@gmail.com> on Monday July 28, 2014 @05:20PM (#47553137) Homepage
  
  Can you explain in more detail?
  I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.
  Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect-- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. It just provides a method for estimating performance.
  But maybe you have something in mind that I'm not seeing.
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by jsepeta ( 412566 ) writes:
    
    That's kind of like the Microsoft Windows Experience Index that is provided by Windows Vista / Windows 7 which gives a score based on CPU, RAM, GPU, and hard disk speed. Not entirely useful but gives beta-level nerds something to talk about at the water cooler.
    http://windows.microsoft.com/e... [microsoft.com]
    At work my desktop computer is a Pentium E6300 with a 6.3 rating on the CPU and an overall 4.8 rating due to the crappy graphics chipset.
    At work my laptop computer is an i3-2010M with a 6.4 rating on the CPU and an ove
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      Good comparison.
  - Re:Bullshit.... (Score:5, Informative)
    
    by mrchaotica ( 681592 ) * writes: on Monday July 28, 2014 @07:04PM (#47553795)
    
    Can you explain in more detail?
    If you have a multi-dimensional set of factors of things and you design a metric to collapse them down into a single dimension, what you're really measuring is a combination of the values of the factors and your weighting of them. Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful.
    This goes for any situation where you're picking the "best" among a set of choices, not just for compression algorithms, by the way.
    Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one.
    
    User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best. User B is trying to shove the entire contents of Wikipedia into a disc to send on a space probe [wikipedia.org], so for him, the third algorithm is the best.
    You gave a really extreme[ly contrived] example, so in that case you might be able to say that "reasonable" use cases would prefer the middle algorithm. But differences between actual algorithms would not be nearly so extreme.
    
    Parent Share
    twitter facebook
    - Re:Bullshit.... (Score:4, Insightful)
      
      by nine-times ( 778537 ) writes: <nine.times@gmail.com> on Monday July 28, 2014 @07:27PM (#47553917) Homepage
      
      Since the "correct" weighting is a matter of opinion and everybody's use-case is different, a single-dimension metric isn't very useful...[snip] User A is trying to stream stuff that has to have latency less than 15 seconds, so for him the first algorithm is the best.
      And these are very good arguments why such a metric should not be taken as an end-all be-all. Isn't that generally the case with metrics and benchmarks?
      For example, you might use a benchmark to gauge the relative performance between two video cards. I test Card A and it gets 700. I test Card B and it gets a 680. However, in running a specific game that I like, Card B gets slightly faster framerates. Meanwhile, some other guy wants to use the video cards to mine Bitcoin, and maybe these specific benchmarks test entirely the wrong thing, and Card C, which scores 300 on the benchmark, is the best choice. Is the benchmark therefore useless?
      No, not necessarily. if the benchmark is supposed to test general game performance, and generally faster benchmark tests correlate with faster game performance, then it helps shoppers figure out what to buy. If you want to shop based on a specific game or a specific use, then you use a different benchmark.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Ardyvee ( 2447206 ) writes:
        
        Why generate a score in the first place, when you can just provide compression ratio, compression speed, or in the case of the card: fps (at settings), energy used, consistency of the fps (at settings), along with any other characteristic you know or can test that doesn't combine two other things and let the user decide which of those things are better instead of trying to boil it all down to a single number?
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        The uses for that single number are as follows:
        a) Some class of people like to claim "mine is bigger", which requires a single number. While that is stupid, most people "understand" this type of reasoning.
        b) Anything beyond a single number is far to complicated for the average person watching TV.
        In reality, things are even more complicated, as speed and compression ratio depend both on the data being compressed, and do that independently to some degree. This means, some data may compress really well and do
        
        Re: (Score:2)
        
        by nine-times ( 778537 ) writes:
        
        Depending on what you're talking about, providing a huge table of every possible test doesn't make for easy comparisons. In the case of graphics cards, I suppose you could provide a list of every single game, including framerates on every single setting on every single game. It would be hard to gather all that data, and the result would be information overload, and it still wouldn't allow you to make a good comparison between cards. Even assuming you ad such a table, it would probably be more helpful to
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    It depends far too much on your border conditions. For example, LZO does compress not very well, but it is fast and has only a 64kB footprint. Hence it gets used in space-probes where the choice is to compress with this or throw the data away. On the other hand, if you distribute pre-compressed software or data to multiple targets, even the difference between 15.0% and 15.1% can matter, if it is, day 15.0% in 20 seconds and 15.1 in 10 Minutes.
    Hence a single score is completely unsuitable to address the "qua
    - Re: (Score:2)
      
      by nine-times ( 778537 ) writes:
      
      Hence a single score is completely unsuitable to address the "quality" of the algorithm, because there is no single benchmark scenario.
      So you're saying that no benchmark is meaningful because no single benchmark can be relied upon to be the final word under all circumstances? By that logic, measuring speed is not meaningful, because it's not the final word in all circumstances. Measuring the compression ratio is meaningless because it's not the final word in all circumstances. The footprint of the code is meaningless because it's not the final word in all circumstances.
      Isn't it possible that a benchmark could be useful for some purpose
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Whether measuring speed is a meaningful benchmark depends on what you measure the speed of, relatively to what and what the circumstances are. There are many situations where "speed" is not meaningful, and others that are limited enough that it is.
        However, the metric under discussion will not be meaningful in any but the most bizarre and specific circumstances, hence it is generally useless. For the special situations where it could be useful, it is much saner to adapt another metric than define a specific
        
        Re: (Score:3)
        
        by nine-times ( 778537 ) writes:
        
        I find it surprising and almost funny how much ire this has drawn from people with some kind of weird "purist" attitude about the whole thing.
        It doesn't seem "generally useless" to me, but it would be more appropriate to say that it's "useful only in general cases". I would say that in most circumstances, I'd want compression algorithms that balance speed and compression. I often don't zip my files to maximum compression, for example, because I don't want to sit around waiting for a long time in order to
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        The ire is because quite a few people cannot distinguish fake TV science and engineering from the real thing anymore. This "metric" is a high-quality fake and completely useless.
        
        Re: (Score:2)
        
        by nine-times ( 778537 ) writes:
        
        Well no, the metric is real. The question would be whether it's useful or meaningful. You originally implied that it wasn't because:
        A "combined score" for speed and ratio is useless, as that relation is not linear.
        It seems now that it's not about the relation being linear, but about something else that you won't say. I'm afraid I'm not closer to understanding.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You really do not get it, I agree. This metric is useless. It follows the definition of a metric, true, but it has no reasonable practical use, hence it does not deserve any special distinction, like being given a name. That is what is fake here.
        
        Re: (Score:2)
        
        by nine-times ( 778537 ) writes:
        
        Ok, I was giving you the benefit of the doubt, but it seems your argument boils down to "It's useless because I say it's useless. Nevermind that you earlier pointed out that it could be useful, because I decided that it's useless."
        Glad we got that sorted out.
        
        Re: (Score:1)
        
        by webmadman ( 1711978 ) writes:
        
        I suspect another variable may be at play here: "Ambiguity Tolerance".
        
        This Weissman Score may provide a great test to determine where someones tolerance for ambiguity is, based on how useful or useless they think a metric like this might be- but then, if the WS becomes useful for determining AT it will then become less useful for determining AT because it's perceived usefulness will have increased, which will then make it more useful... I think I feel my AT falling.
  - Re: (Score:2)
    
    by sootman ( 158191 ) writes:
    
    I'd just say it's useless because no two people can agree on what's important, so what's the point of giving a single score? And even something as seemingly simple as a compression algorithm has more than just two characteristics:
    1) speed of compression
    2) file size
    3) speed of decompression
    4) does it handle corrupt files well? (or at all?)
    Even just looking at 1 & 2, everyone has different needs. Some people value 1 above all others, some people value 2, and most people are somewhere in between, and "some
    - Re: (Score:2)
      
      by nine-times ( 778537 ) writes:
      
      there's not a meaningful way to pick the "best" in that group that everyone will agree on
      Metrics often don't provide a definitive answer about what the best thing is, with universal agreement. If I tell you Apple scores highest in customer satisfaction for smartphones last year, does that mean everyone will agree that the iPhone is the best phone? If a bunch of people are working at a helpdesk, and one closes the most tickets per hour, does that necessarily mean that he's the best helpdesk tech?
      It's true that a lot of people misuse metrics, thinking that they always provide an easy answer, w
  - Re: Bullshit.... (Score:2)
    
    by jrumney ( 197329 ) writes:
    
    It depends in the situation where it is used. If your data almost but not quite fits on your available media at 15%, and you're not pressed for time, you might still go for 15%. And if you only have 15 seconds to compress it, strictly no more, you might settle for significantly less compression than would be possible in 20 seconds.
  - Re: (Score:2)
    
    by loufoque ( 1400831 ) writes:
    
    very high compression has limited utility if it takes an extremely long time
    I don't see how the utility is limited.
    Most content is mastered once and viewed millions of time.
    How much time it takes to compress is irrelevant, even if you get diminishing returns the longer you take. What's important is to save space when broadcasting the content.
    - Re: (Score:2)
      
      by nine-times ( 778537 ) writes:
      
      How much time it takes to compress is irrelevant, even if you get diminishing returns the longer you take. What's important is to save space when broadcasting the content.
      Well, and also that it can be decompressed quickly and with little processing power, or else with enough hardware support that it doesn't matter. Otherwise, it'd take a long time to access and drain power on mobile devices.
      - Re: Bullshit.... (Score:2)
        
        by loufoque ( 1400831 ) writes:
        
        Compression and decompression are different things.
        
        Re: (Score:2)
        
        by nine-times ( 778537 ) writes:
        
        Ok, so let's start from where you're wrong that "What's important is to save space when broadcasting the content." There are other important things.
        Next, what would you like to do then? Change this benchmark to measure decompression speed rather than compression speed? Sure, fine. Let's do that.
        
        Re: (Score:2)
        
        by loufoque ( 1400831 ) writes:
        
        Decompression time is always real time. That's obvious.
        Compression is a whole different beast. Some applications need real-time encoding (such as video-conferencing), but most do not.
        Have you even ever written an encoder?
        
        Re: (Score:2)
        
        by nine-times ( 778537 ) writes:
        
        Decompression time is always real time? So it doesn't matter what computer, what processor, the size of the file, the complexity of the file, or even what kind of file it is? Or do you mean that it needs to be able to be done in real-time (or faster) for some particular use a a particular kind of file on a particular platform that you have in mind?
  - Re: (Score:2)
    
    by thunderclap ( 972782 ) writes:
    
    Can you explain in more detail?
    I'm not an expert here, but I think the idea is to come up with a single quantifying number that represents the idea that very fast compression has limited utility if it doesn't save much space, and very high compression has limited utility if it takes an extremely long time.
    Like, if you're trying to compress a given file, and one algorithm compressed the file by 0.00001% in 14 seconds, another compressed the file 15% in 20 seconds, and the third compressed it 15.1% in 29 hours, then the middle algorithm is probably going to be the most useful one. So why can't you create some kind of rating system to give you at least a vague quantifiable score of that concept? I understand that it might not be perfect-- different algorithms might score differently on different sized files, different types of files, etc. But then again, computer benchmarks generally don't give you a perfect assessment of performance. It just provides a method for estimating performance.
    But maybe you have something in mind that I'm not seeing.
    A compsci sacred cow being slaughtered. See there is nothing wrong with what you suggested. Thats the reason why the idea was inserted into Silicon valley to begin with. So why the bitching about its usefulness? People who spend time in computing as a whole are a fairly rigid lot. A lot of the have aspergers syndrome which gives them a leg up on coding while taking away their socialization skills. Others think its useless because they would prefer terms that dig deeper into the compression and its velocity.
- Re: (Score:2)
  
  by ultranova ( 717540 ) writes:
  
  A "combined score" for speed and ratio is useless, as that relation is not linear.
  
  A combined score could be quite useful when implementing, for example, compressed swap. Obviously you'd need to calibrate it for the specifics of a case.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    When you "calibrate" swap for specific uses, it becomes non-general. In that situation it is far better to let the application use on-disk storage, because _it_ knows the data profile. Sorry, but fail to understand swap.
    - Re: (Score:2)
      
      by ultranova ( 717540 ) writes:
      
      When you "calibrate" swap for specific uses, it becomes non-general.
      
      Metric, not swap. I'm talking about compressing memory pages before swapping out, possibly to another memory region, and calibrating the metric to balance between CPU cycles used vs. disk traffick saved, possibly dynamically.
      In that situation it is far better to let the application use on-disk storage, because _it_ knows the data profile.
      And the OS knows the general state of the system. Also, virtual memory systems are far from trivial to
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Really, you do not understand what makes swap slow or fast. Go play somewhere else.
- Re: (Score:3)
  
  by sg_oneill ( 159032 ) writes:
  
  A "combined score" for speed and ratio is useless, as that relation is not linear.
  Typing at 70 words per minute, slashdot poster declares quantity over time measurements meaningless.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Other Slashdot poster adds meaningless posturing as that is the limit of what he can do.
    - Re: (Score:1)
      
      by Samizdata ( 1093963 ) writes:
      
      Yet another Slashdot poster sits back with popcorn and watches the fracas.
- Re: (Score:2)
  
  by hey! ( 33014 ) writes:
  
  It doesn't have to be linear to be useful. It simply has to be able to sort a set of choices into order -- like movie reviews. Nobody thinks a four star movie is "twice as good" as a two star movie, but people generally find the rank ordering of movies by stars useful provided they don't read to much into the rating. In fact the ordering needn't be unique; there can be other equally useful metrics which order the choices in a slightly different way. *Over certain domains of values* minor differences in or
I thought it wasnt possible (Score:1)

by Anonymous Coward writes:

I thought I read an article the other day that said their algorithm seemed plausible on the surface but would eventually would begin to fall apart?
- Re: (Score:2)
  
  by Travis Mansbridge ( 830557 ) writes:
  
  The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (insofar as more compressed/faster algorithms achieve a better rating).
  - Re: (Score:1)
    
    by silas_moeckel ( 234313 ) writes:
    
    When talking about lossy compression for video it might technically work but it's still worthless. For example my highly proprietary heavily patented postage stamp algorithm reduces all video down to 90 era dialup rate mpeg 2 aka a blurry postage stamp. This means it's massively compressed and very quick so it scores high on both metrics. It also looks like crap. Output quality and ratio are generally the metrics that matter and output quality is a subjective factor that needs to be determined by humans
    - Re:I thought it wasnt possible (Score:4, Informative)
      
      by khellendros1984 ( 792761 ) writes: on Monday July 28, 2014 @07:12PM (#47553849) Journal
      
      FTA:
      And Jerry Gibson, a professor at the University of California at Santa Barbara, says he's going to introduce the metric into two classes this year. For a winter quarter class on information theory, he will ask students to use the score to evaluate lossless compression algorithms. In a spring quarter class on multimedia compression, he will use the score in a similar way, but in this case, because the Weissman Score doesn't consider distortion introduced in lossy compression, he will expect the students to weight that factor as well.
      The scoring method as stated is only useful for evaluating lossless compression. One could also take into account the resemblance of the output to the input to allow a modified version of the score to evaluate lossy compression.
      
      Parent Share
      twitter facebook
    - - Re: (Score:1)
        
        by Anonymous Coward writes:
        
        Please tell us more about how compressing is not compression.
freemasons run the country (Score:5, Interesting)

by retchdog ( 1319261 ) writes: on Monday July 28, 2014 @04:46PM (#47552879) Journal

The so-called Weissman score is just proportional to (compression ratio)/log(time to compress).
I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time. For example, taking a day to compress is much worse than taking an hour, but taking 24 days to compress is only somewhat worse than taking one day since you're talking offline/parallel processing anyway.
The log() seems kind of an arbitrary choice, but whatever. It's no better or worse than any other made-up metric, as long as you're not taking it too seriously.

Share
twitter facebook
- Re: (Score:3)
  
  by AsmCoder8088 ( 745645 ) writes:
  
  The formula is not too bad, although I would suggest a minor tweak, mainly that one should change it from:
  (compression ratio)/log(time to compress)
  to:
  (compression ratio)/log(10+time to compress).
  This will ensure that no divide by zero occurs, specifically if the time to compress is 1 second, then you would have been dividing by zero in the original formula.
- Re: (Score:1)
  
  by grep -v '.*' * ( 780312 ) writes:
  
  (compression ratio)/log(time)
  
  I guess the idea is that twice as much compression is always twice as good, while increases in time become less significant if you're already taking a long time.
  Yeah, I guess I empirically decided this for myself way back with DOS PKZip v0.92: either FAST because I want it now, or MAXIMIZE because I'm somehow space limited and don't care how long it takes. The intermediate ones (and for WinZip, WinRAR, 7z, and the others) are useless for me; either SIZE or SPEED, there IS nothing else.
  
  (Unless you can do somehow delete or omit it; nothing's faster than not doing it to start with.)
  
  And look -- they're using logs! Now when someone on the show talks about some cu
The Misra Score (Score:2)

by mfwitten ( 1906728 ) writes:

From the article:
Misra came up with a formula
- Re:The Misra Score (Score:5, Funny)
  
  by DoofusOfDeath ( 636671 ) writes: on Monday July 28, 2014 @07:17PM (#47553869)
  
  From the article:
  Misra came up with a formula
  So, now Jar Jar Binks does C.S.? Shit...
  
  Parent Share
  twitter facebook
- Re: (Score:3, Funny)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  hey, "print 0" runs in O(1)!
- Re:Useless without measure of lossiness/distortion (Score:5, Informative)
  
  by retchdog ( 1319261 ) writes: on Monday July 28, 2014 @05:09PM (#47553055) Journal
  
  it's for lossless compression only.
  anyway, you can just add a term representing the lost information and throw it into this "score". hey, why not? just figure out how important the lossiness is relative to compression rate. if it's very important, take the exp() of the loss metric; if it's unimportant (like time is), take the log(); finally, if it's just kind of important, leave it linear, or maybe square or square root. whatever.
  seriously, just make some shit up and throw it in. you won't compromise anything. it's already just made-up shit.
  
  Parent Share
  twitter facebook
- Re:Useless without measure of lossiness/distortion (Score:5, Insightful)
  
  by viperidaenz ( 2515578 ) writes: on Monday July 28, 2014 @05:12PM (#47553071)
  
  In the TV show only lossless compression was being considered, so MP3 would fail.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by Jack9 ( 11421 ) writes:
    
    > so MP3 would fail.
    That's correct. So what?
    MP3 was never a good compression algorithm. It's an audio format that uses a normalization that cause SOME audio to be lossy. It's a great demonstration on how a negligible loss across a wide range of audio could result in a more useful algorithm for sound (it's quite compact). MP3 is not a good compression algorithm and doesn't see a lot of use outside of commodity audio, where you can afford to throw away data.
    - Re: (Score:2)
      
      by vakuona ( 788200 ) writes:
      
      MP3 was never a compression algorithm.
      
      FTFY
      - Re: (Score:1)
        
        by Paradise Pete ( 33184 ) writes:
        
        "Algorithm" is the distinction. Otherwise you're basically saying "What's my algorithm for doing X? I just demand X be done." Perhaps you could call it The King's Algorithm.
    - Re: (Score:3)
      
      by viperidaenz ( 2515578 ) writes:
      
      That's correct. So what?
      So, comment I was replying to
      Using the "Weissman Score", MP3 is always better than FLAC
      MP3 wouldn't even have a "Weissman Score" because it's not a lossless compression algorithm.
Inadequate (Score:2)

by Are You Kidding ( 1734126 ) writes:

Not only does it fail to account for loss or distortion, but also fails to consider the time to decompress. If a compression algorithm with a high Weissman score is applied to a video, it is useless if it cannot be decompressed fast enough to show the video at an appropriate frame rate.
- Trivial observation (Score:1)
  
  by osu-neko ( 2604 ) writes:
  
  No metric is adequate for all purposes. This one is adequate for the task it was designed for, and is adequate for some other purposes as well. That's the best that can be expected of any tool. Always use the appropriate tools for the task at hand, of course.
  - Re: (Score:2)
    
    by retchdog ( 1319261 ) writes:
    
    It was designed as a background prop for a TV show. Not a very high bar.
    It might be adequate as an artificial evaluation metric for homework in an "Intro to Data Compression" class. It might be, because it hasn't even been used for that yet.
    I wouldn't exactly call this a tool. For example, it would be really easy to game this 'score' if there were any significant incentive for doing so. That's usually a bad thing.
    - - Re: (Score:2)
        
        by retchdog ( 1319261 ) writes:
        
        yes, the metric is obviously a different thing, but they were both designed for the show.
        a few mediocre professors are thinking of using the metric in their courses, but are pretty open about it mostly being a gimmick.
        in conclusion, shove it up your bloated ass.
  - Re:Trivial observation (Score:4, Insightful)
    
    by fnj ( 64210 ) writes: on Monday July 28, 2014 @06:01PM (#47553429)
    
    The reason the Score is utter bullshit is that the scale is completely arbitrary and useless. It says that 2:1 compression that takes 1 second should have the same score as 4:1 compression that takes log(2) seconds, or 1 million to 1 compression that takes log(1 million) seconds.
    WHY? State why log time is a better measure than straight time, or time squared, or square root of time. And look at the units of the ratio: reciprocal log seconds. What the hell is the significance of that? It also conveniently sidesteps the variability with different architectures. Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).
    In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.
    It's bullshit.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by Obfuscant ( 592200 ) writes:
      
      And look at the units of the ratio: reciprocal log seconds.
      
      The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.
      It also conveniently sidesteps the variability with different architectures.
      If one measures the compression ratios and times for the same data on different architectures, one is measuring the score of the different architecture, not "sidestepping" it.
      Maybe SSE helps algorithm A much more than it does algorithm B.
      Then algorithm A compared to B would have a higher Weissman score on a system with SSE.
      Or B outperforms A on AMD, but not on Intel.
      Then the score would favor B over A when comparing the two processors. That's what the score is supposed to do. It compares two things.
      In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much.
      Th
      - Re: (Score:2)
        
        by Lehk228 ( 705449 ) writes:
        
        decompression speed is unimportant for general purpose compression, it is either adequate or not adequate, if deompression speed is not adequate it does not matter how well it scores on other metrics it is unusable for your use case, if decompression speed id adequate, it really does not matter if it's just barely adequate or insanely fast.
      - Re: (Score:2)
        
        by fnj ( 64210 ) writes:
        
        The Weissman score is actually unitless. When one divides "log seconds" by "log seconds" the units cancel.
        That is because it is presented as the ratio of the figure of merit of the candidate algorithm to the figure of merit of some bullshit "universal compresser", times a completely useless "scaling constant". To strip away the obscuration, all you have to do is see that for a completely transparent effectless compresser, r is unity and log t is log 0, or unity. 1/1, and it drops out.
        The underlying figure o
        
        Re: (Score:2)
        
        by swillden ( 191260 ) writes:
        
        some bullshit "universal compresser"
        Not a universal compressor, a standard compressor, such as gzip. The metric is ultimately just a comparison between the compressor being evaluated and the compressor chosen as the standard, and it is unitless.
        That said, I agree with you that the scaling constant has no reason to be present. As for using the logs of times... I don't know. It's essentially a base change, expressing the time of the compressor being evaluated in the base of the standard compressor, which is then multiplied by the ratio of th
        
        Re: (Score:2)
        
        by Obfuscant ( 592200 ) writes:
        
        The underlying figure of merit once you cut through the bullshit is r / log t. r is the compression ratio (unitless) and log t is log seconds. So yes, the units of the underlying figure of merit are reciprocal log seconds.
        The fact that the actual equation is a ratio between a proposed compression implementation and a reference is a hint that it is not a "figure of merit" in absolute terms, but only with respect to some common standard. Yeah, you get to pick your standard, but simply reporting r/log(t) is meaningless. The actual measurement is unitless simply because, as you point out, units of 1/log(s) is meaningless.
        It's done that way so things can be repeatable. If I create a compressor and report a Weissman of 3, then y
        
        Re: (Score:2)
        
        by anarchyboy ( 720565 ) writes:
        
        You should never take the logarithm of a dimensionful quantity like seconds. Clearly some choice of units is implied and really we should have log(t/1s) or log(t/1ms) or something which would then make the score unitless.
        You need learn to cut through the hocus pocus and analyze the actual underlying equation before the Oz Sauce is ladeled on. You can well imagine that those who actually understand programming metrics are holding their sides laughing at those who are taking it seriously.
        and you need to go take some remedial math lessons if you think log(0) = 1.
    - Re: (Score:2)
      
      by TubeSteak ( 669689 ) writes:
      
      Maybe SSE helps algorithm A much more than it does algorithm B. Or B outperforms A on AMD, but not on Intel. Or maybe it is strongly dependent on size of source (there is an implicit assumption that all algorithms scale linearly with size of source; maybe in actual fact some are not linear and others are).
      In real life, for some compression jobs you don't CARE how long it takes, and for other jobs you care very much. Or imagine an algorithm that compresses half as fast but decompresses 1000 times faster. That doesn't even register in the score.
      The things you mention have always been left as an exercise for the reader.
      What benchmark isn't tagged with qualifiers that explain what it does and doesn't mean?
      Marketing literature in computing has always been littered with metrics that are completely useless unless you know how to interpret them in the context of what you want to be doing.
Compression and decompression ratios would help (Score:3)

by JoeyRox ( 2711699 ) writes: on Monday July 28, 2014 @05:00PM (#47552989)

Two scores would be useful, one for compression_time:size and decompression_time:size, since for many applications the latter is more important in compress-once consume-many applications.

Share
twitter facebook
Sounds like the Drake equation all over again. (Score:3)

by mmell ( 832646 ) writes: on Monday July 28, 2014 @05:02PM (#47553003)

IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.

Share
twitter facebook
- Re: (Score:2)
  
  by Rockoon ( 1252108 ) writes:
  
  IIRC, the Drake equation was also a 'spitball' solution whipped off the cuff to address an inconvenient interviewer question. Subsequent tweaks have made it as accurate and reliable as when it was first spat out upon the world - and about as useless.
  At least the Drake equation attempts to count something. I think people are missing this important fact about this bullshit compression rating: It isnt counting anything.
circle jerk (Score:2, Funny)

by Anonymous Coward writes:

Show About Self-Absorbed Assholes Who Think Their Stupid Ideas Are The Bees Knees Gains Popularity By Making Their Stupid Idea Sound Like Its The Bees Knees
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Or simply SASAAWTTSIATBKGPBMTSISLITBK for short. What, are you some kind of pompous jerk who tries to sound smart saying it in full when all of us know it by the acronym?
F1 score, precision and recall (Score:2)

by tommeke100 ( 755660 ) writes:

Sounds a bit like the f1 measure used in classification systems, where the F-score is the harmonic mean of precision and recall. (where trying to higher precision yields lower recall and vice-versa)
however, I'm wondering how stable this Weissman score is. Compression algorithms might not all perform O(n) where n is size of data to compress.
Or it may actually give a very high score to something that doesn't compress at all.
public byte[] compress( byte[] input) { return input;}
I bet this gets a high Weis
Idiots born every day. (Score:2)

by Chas ( 5144 ) writes:

Oh boy. A useless metric!
Compression ratio: Sure. But the problem is, it's possible to increase compression ratio by "losing" data. So you can obtain a high ratio, but the images as rendered will be blurry/damaged.
Compression Speed: This is just as dumb since compression speed is partially a function of the compression ratio, partially a function of the efficiency of the algorithm and partially a function of the amount of "grunt power" hardware you throw at it. So one portion of this is a nebulous "hard
- Re: (Score:2)
  
  by Chas ( 5144 ) writes:
  
  Actually replaced with a better example.
  Took an 8.1MB TGA file and did three things.
  1: Saved the first off as a PNG file. Resulted in a 1.7MB file with lossless compression.
  2: Saved the file off as a high-compression JPEG. Resulted in a 46K file that's noticeably blurry and indistinct.
  3: Downsampled to 19x11 and back up to 1920x1080 and saved as a high compression JPEG (36K file) or a lossless compression PNG (114K file). Labelled this method UCCT (Ultra Crappy Compression Technique).
  Amalgamated the thre
  - Re: (Score:1)
    
    by Renozuken ( 3499899 ) writes:
    
    This would be correct if the score wasn't being used for lossless compression where the only two variables that really matter are time/size.
Slashvertisement for HBO? (Score:2)

by Gothmolly ( 148874 ) writes:

Given that only a subset of Slashdot users are HBO subscribers, how is this relevant?
- Re: (Score:2)
  
  by wilson_c ( 322811 ) writes:
  
  Because a much larger, non-overlapping subset also steal HBO services.
Is the show any good? (Score:2)

by ed1park ( 100777 ) writes:

I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?
- Re: (Score:1)
  
  by lippydude ( 3635849 ) writes:
  
  "I couldn't watch the first episode. Quit maybe 10 minutes into it. Does anyone here actually enjoy the show and think it's any good?"
  
  I stayed with it and watched a number of episodes, I thought it caught the techie zeitgeist brilliantly. There's even a semi-aspie tech tycoon in there, just like you-know-who.
- Re: (Score:2)
  
  by TheSunborn ( 68004 ) writes:
  
  He said it did work, it's just not as effective as other existing compression solutions.
  - Re: (Score:2)
    
    by fnj ( 64210 ) writes:
    
    “We had to come up with an approach that isn’t possible today, but it isn’t immediately obvious that it isn’t possible,” says Misra.
    Please explain why you think that means he said "it does work".
    - Re: (Score:2)
      
      by martin-boundary ( 547041 ) writes:
      
      Because. Everything is immediately obvious to slashdotters. QED.
- Re:It really works? (Score:5, Informative)
  
  by phoenix_rizzen ( 256998 ) writes: on Monday July 28, 2014 @05:00PM (#47552991)
  
  They're talking about the Score, not the compression algorithm. And your link doesn't mention anything about the Score.
  
  Parent Share
  twitter facebook
  - Re: (Score:1)
    
    by Tekla Perry ( 3034735 ) writes:
    
    exactly. The compression algorithm is fictional; the score, while created for the show, can actually be calculated. Whether it will catch on as a metric remains to be seen.
    - Re: (Score:2)
      
      by Zero__Kelvin ( 151819 ) writes:
      
      Holy shit! Math works! Somehow, I don't think you can have a discussion about if a formula really returns a result or not. I now see that the idiot who wrote the summary was trying to say that the algorithm doesn't work, but math does. Alas that idiot has no ability to write. ... oh wait, it was you! Never mind.
  - Re: (Score:2)
    
    by Zero__Kelvin ( 151819 ) writes:
    
    Yes. That's the point, isn't it. They didn't invent math for the show. Claiming that a score "works" has no meaning, other than to say that math "works". Therefore, the only interpretation of the hideously poor writing is that the submitter is claiming the algorithm works.
    - Re: (Score:3)
      
      by vux984 ( 928602 ) writes:
      
      Claiming that a score "works" has no meaning,
      I could easily devise a cpu scoring methodology that scores CPU based on chip area / cost * clock speed / register width.
      Such a score "works" in the sense that the function can be evaluated, but it wouldn't tell you anything about whether to buy an i7 vs a xeon vs a pentium 2.
      The suggestion in the article is that the particular scoring methodology that was created for the show is useful for comparing compression algorithms, to the point that it may well be adopt
      - Re: It really works? (Score:1)
        
        by Anonymous Coward writes:
        
        Yes. He failed to comprehend that the submitter was pointing out that math really works, and a ratio of compression over time really does express a ratio.
        
        Re: (Score:2)
        
        by vux984 ( 928602 ) writes:
        
        No he failed to comprehend that people have found that particular method of calculating ratio of compression over time is proving to be *useful*.
      - Re: (Score:1)
        
        by Samizdata ( 1093963 ) writes:
        
        C'mon now, equal rights for AMD here.
- Re: (Score:1)
  
  by Travis Mansbridge ( 830557 ) writes:
  
  The fictional compression algorithm doesn't work. The metric for rating compression algorithms does work (inasmuch as more compressed/faster algorithms achieve a better rating).
- Re: (Score:1)
  
  by Travis Mansbridge ( 830557 ) writes:
  
  Aside from centering around Silicon Valley, I don't see how these stories are related. That one is about a fictional compression algorithm, while this one is about a method for rating compression algorithms which is becoming nonfiction.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Bullshit.... (Score:5, Interesting)

Re:Bullshit.... (Score:4, Insightful)

Re:Bullshit.... (Score:5, Insightful)

Re: (Score:2)

Re: (Score:3)

Re: (Score:3)

Re:Bullshit.... (Score:5, Insightful)

Re: (Score:3)

Re: (Score:2)

Re:Bullshit.... (Score:5, Informative)

Re:Bullshit.... (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: Bullshit.... (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: Bullshit.... (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

I thought it wasnt possible (Score:1)

Re: (Score:2)

Re: (Score:1)

Re:I thought it wasnt possible (Score:4, Informative)

Re: (Score:1)

freemasons run the country (Score:5, Interesting)

Re: (Score:3)

Re: (Score:1)

The Misra Score (Score:2)

Re:The Misra Score (Score:5, Funny)

Re: (Score:3, Funny)

Re:Useless without measure of lossiness/distortion (Score:5, Informative)

Re:Useless without measure of lossiness/distortion (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:3)

Inadequate (Score:2)

Trivial observation (Score:1)

Re: (Score:2)

Re: (Score:2)

Re:Trivial observation (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Compression and decompression ratios would help (Score:3)

Sounds like the Drake equation all over again. (Score:3)

Re: (Score:2)

circle jerk (Score:2, Funny)

Re: (Score:1)

F1 score, precision and recall (Score:2)

Idiots born every day. (Score:2)

Re: (Score:2)

Re: (Score:1)

Slashvertisement for HBO? (Score:2)