
Future Hack: New Cybersecurity Tool Predicts Breaches Before They Happen

Soulskill posted about a month ago | from the do-androids-dream-of-electric-wolves? dept.


An anonymous reader writes: A new research paper (PDF) outlines security software that scans and scrapes web sites (past and present) to identify patterns leading up to a security breach. It then accurately predicts what websites will be hacked in the future. The tool has an accuracy of up to 66%. Quoting: "The algorithm is designed to automatically detect whether a Web server is likely to become malicious in the future by analyzing a wide array of the site's characteristics: For example, what software does the server run? What keywords are present? How are the Web pages structured? If your website has a whole lot in common with another website that ended up hacked, the classifier will predict a gloomy future. The classifier itself always updates and evolves, the researchers wrote. It can 'quickly adapt to emerging threats.'"
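The paper's code isn't reproduced here, but the description above amounts to feature-based classification: extract a site's observable characteristics, weight them by how strongly they correlated with past compromises, and sum the result. A rough, purely illustrative sketch (the feature names and weights below are invented, not taken from the paper):

import java.util.Map;

// Purely illustrative: score a site's extracted features against weights
// "learned" from previously compromised sites. The feature names and
// weights below are invented and are not taken from the paper.
public class BreachRiskSketch {
    static double riskScore(Map<String, Double> siteFeatures, Map<String, Double> learnedWeights) {
        double score = 0.0;
        for (Map.Entry<String, Double> f : siteFeatures.entrySet()) {
            score += learnedWeights.getOrDefault(f.getKey(), 0.0) * f.getValue();
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> features = Map.of(
                "runs_wordpress", 1.0,
                "meta_generator_tag_present", 1.0,
                "outdated_server_banner", 1.0);
        Map<String, Double> weights = Map.of(
                "runs_wordpress", 0.9,
                "meta_generator_tag_present", 0.4,
                "outdated_server_banner", 0.3);
        System.out.printf("risk score: %.2f%n", riskScore(features, weights)); // 1.60
    }
}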


33 comments


Utter garbage (0)

Anonymous Coward | about a month ago | (#47724205)

Why is this on /.? The article is absolute crap!

This was elementary analysis in the '90s and 2000s (0)

Anonymous Coward | about a month ago | (#47724445)

You're absolutely right. That is seriously one of the shittiest sites I've seen Slashdot link to.

Come on, Slashdot editors. Please! Why are you doing this? Why are you systematically ruining Slashdot more than it already has been ruined by putting shitty submissions like this on the front page? Why?

Even if we ignore that the linked article is complete garbage, why is this even on Slashdot?

Five or ten years ago, before the software industry was ravaged by wave after wave of shitbrained PHPers, JavaScripters and Ruby on Railists, this sort of analysis was the first thing you'd do when setting up a new server!

In the 1990s and early 2000s, the first thing you'd do when setting up a server was make sure it was running Solaris, FreeBSD or OpenBSD. All three are known to be very secure by default. Then a secure web server like Apache would be used, if necessary. If any custom web app software was used, it had to be written in a secure language like Java, Tcl, Python or even Perl. PHP was not allowed. Ruby was not even considered.

Yeah, things are different today. People use shitty PHP and Ruby software, running on shitty Linux distributions like Ubuntu, using obscure web servers, and then wonder why their shit gets cracked and broken into. Come on. If you do stupid stuff, you're going to feel pain. If you use half-assed software, your server is going to get compromised! IT'S REALLY FUCKING OBVIOUS!

WordPress? (0)

Anonymous Coward | about a month ago | (#47724219)

Let's take WordPress sites out of this equation and see how accurate this tool is.

Re:WordPress? (1)

Penguinisto (415985) | about a month ago | (#47724441)

True - and how is it that they say they're not counting vulns, when that is precisely what they're doing (albeit counting past vulns and extrapolating...)?

"Accurately" (0)

Anonymous Coward | about a month ago | (#47724277)

It then accurately predicts what websites will be hacked in the future. The tool has an accuracy of up to 66%.

So... by "accurately," you mean "not really all that accurately at all."

-- CanHasDIY, am I really this lazy? Yes, apparently.

Nothing New Here (1)

sehlat (180760) | about a month ago | (#47724287)

Precrime Division has had it for years.

Isn't the correct answer: (1)

jmauro (32523) | about a month ago | (#47724331)

Given enough time all of the sites on the Internet will eventually be hacked?

Re:Isn't the correct answer: (1)

mark-t (151149) | about a month ago | (#47724353)

Not necessarily true... some sites on the internet are not of general interest to enough people to ever draw the attention of somebody who would even want to hack them.

Re:Isn't the correct answer: (1)

bloodhawk (813939) | about a month ago | (#47724629)

A large percentage of attacks are performed by automated tools searching for targets. They don't give a shit whether the site is of huge interest or just your Granny's blog about how cute her poodle is. Check your logs: even your home computers will be receiving regular port scans and knocks on various ports/protocols to see if there is anything to attack.

Re:Isn't the correct answer: (1)

Penguinisto (415985) | about a month ago | (#47724449)

Exception:
My ancient and long-dead first domain/site never got hacked, and it never will: I shuttered it in 2001(-ish) when I sold the domain name (spark.org). ;)

Re:Isn't the correct answer: (3, Insightful)

vux984 (928602) | about a month ago | (#47724645)

The premise was "given enough time...".

By taking the site down, you limited the time.

That's not an "exception", that's violating the premise.

Re:Isn't the correct answer: (1)

K. S. Kyosuke (729550) | about a month ago | (#47724541)

You seem to be assuming that being an HTTP server implies having security holes.

Mostly Wordpress, then. 50% accurate: all sites (5, Informative)

raymorris (2726007) | about a month ago | (#47724371)

Of the top "features" they identified, most are just various tags that mean Wordpress is in use. So they learned that Wordpress sites tend to get hacked. Duh. The Wordpress team isn't interested in security. I demonstrated an exploit for a serious vulnerability in Wordpress and submitted it to their bug tracker. For two years it sat, with one WP developer saying "it can't be exploited" - even though I attached an exploit directly to the tracker issue. Two years later, the vulnerability was added to a 'sploit kit and thousands of sites were compromised over the course of just a few days. That's when WP finally got around to patching the clear and significant vulnerability.

I see TFA claims "66% accuracy". "All sites will be hacked at some point" is about 50% accurate. I bet we could have 66% accuracy simply by saying "sites running PHP 5.2 or below will be hacked."

Re:Mostly Wordpress, then. 50% accurate: all sites (0)

Anonymous Coward | about a month ago | (#47724613)

Sounds like pre-crime profiling for websites.
Lucky software doesn't have human rights.

Re:Mostly Wordpress, then. 50% accurate: all sites (0)

Anonymous Coward | about three weeks ago | (#47728301)

When people use ROC curves in studies like this they are generally trying to hide inaccuracies. I mean, who plots two points--whether something was inaccurate or accurate--against an inferred third value (usually number of samples or time) rather than just putting out the accuracy data directly? The authors of this paper even go so far as to write a separate section justifying why they aren't doing this; shouldn't the graphs be enough to stand on their own without a separate section justifying their format?

It's a confidence score. Normal for binary decisions (1)

raymorris (2726007) | about three weeks ago | (#47729169)

The "inferred third value" is almost certainly the probability/score/confidence level, and it's normally included for machine-learning or any classifier algorithm, such as one that makes a yes/no decision based on a numeric value within a range. You'll see it a lot with spam filters. It's required because the USER choses at which threshold they wish to take certain actions.

I'm going to use the spam filter example because that's one many people are familiar with, specifically Spamassassin. It will score a message like this:
Body includes the word "free": 2 points
HTML and text parts are different: 1 point
Sent through an open relay: 2 points
Tiny font: 1 point
From address default whitelist: -3 points

Adding up the scores, the total score for that email is 3 points. The server admin can configure how many points are required before an email is placed in the spam box, and how many are required before the email is deleted outright. Note that the choice of how high the score needs to be to be considered spam is completely separate from the algorithm generating those scores. One admin might be very tough on spam and decide that anything over 2 points is treated as spam. Another admin might be more lenient and set it to 4, so anything 4 or higher is treated as spam. The ROC informs the admin as to the results of different settings. A threshold of 2 will obviously have more false positives than a threshold of 4.
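As a minimal sketch of that scoring model (this is not SpamAssassin's actual code; the rule names, weights, and thresholds are just the ones from the example above):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy additive scorer in the SpamAssassin style: every rule that fires adds
// its weight to the total, and the admin-chosen thresholds (not the scoring
// algorithm) decide what happens to the message.
public class ToyScorer {
    static final Map<String, Double> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BODY_CONTAINS_FREE", 2.0);
        RULES.put("HTML_TEXT_MISMATCH", 1.0);
        RULES.put("SENT_VIA_OPEN_RELAY", 2.0);
        RULES.put("TINY_FONT", 1.0);
        RULES.put("FROM_IN_DEFAULT_WHITELIST", -3.0);
    }

    // Sum the weights of every rule that fired for this message.
    static double score(Iterable<String> firedRules) {
        double total = 0.0;
        for (String rule : firedRules) {
            total += RULES.getOrDefault(rule, 0.0);
        }
        return total;
    }

    public static void main(String[] args) {
        double total = score(List.of("BODY_CONTAINS_FREE", "HTML_TEXT_MISMATCH",
                "SENT_VIA_OPEN_RELAY", "TINY_FONT", "FROM_IN_DEFAULT_WHITELIST")); // 2+1+2+1-3 = 3

        double spamThreshold = 4.0;   // one admin might set this to 2, another to 4
        double deleteThreshold = 8.0; // delete outright above this

        if (total >= deleteThreshold) {
            System.out.println("delete outright");
        } else if (total >= spamThreshold) {
            System.out.println("move to spam box");
        } else {
            System.out.println("deliver normally");
        }
    }
}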

Note again the choice of threshold to take some action is selected by the USER, not by the group who designed the algorithm. In the case of this predictive tool, a web hosting company might choose to have the following policies:

No site with a risk score over 80 can be hosted on our servers.
Any site with a score over 40 will be informed and our security team will offer assistance in making the site more secure.

Those policies of what to do at different score thresholds are completely separate from the algorithm; the team who wrote the paper doesn't choose the thresholds for specific actions. Instead, the graph informs the web hosting company: "at a risk score of 80, you can expect 5% false positives. At a risk score of 40, you can expect 15% false positives."
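A sketch of how an operator might read such a curve (the threshold/false-positive pairs are just the illustrative numbers above, not data from the paper):

import java.util.TreeMap;

// Toy lookup of the operating points an ROC-style curve gives the admin:
// each candidate risk-score threshold maps to the false-positive rate to
// expect at that setting. The numbers are the illustrative ones above.
public class RocLookup {
    public static void main(String[] args) {
        TreeMap<Integer, Double> fprAtThreshold = new TreeMap<>();
        fprAtThreshold.put(40, 0.15); // warn the customer, offer security help
        fprAtThreshold.put(80, 0.05); // refuse to host the site

        fprAtThreshold.forEach((threshold, fpr) ->
                System.out.printf("threshold %d -> expect about %.0f%% false positives%n",
                        threshold, fpr * 100));
    }
}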

16% Improvement! (2)

mythosaz (572040) | about a month ago | (#47724409)

That's like a 16% improvement over the quarter I flip...

Re:16% Improvement! (0)

dfsmith (960400) | about a month ago | (#47725195)

No, it's up to 16% of a quarter flip.

66%? Worthless trash... (3, Interesting)

gweihir (88907) | about a month ago | (#47724701)

I can predict for most sites that they will be hacked eventually, because they do not have anything resembling a secure set-up. But predicting when? That is impossible. Likely this tool gets even its pathetic 66% only due to cherry-picked test data (also known as "lying" in scientific circles).

Re:66%? Worthless trash... (0)

Anonymous Coward | about a month ago | (#47726355)

Very possible. They just have to perform the attack.

Re:66%? Worthless trash... (1)

iiii (541004) | about three weeks ago | (#47732049)

My algorithm does better than 66% and I'm open sourcing it right here...
(Predicts whether site will be hacked between now and the destruction of earth)

public boolean willSiteBeHacked(Vector whateverYouFeelLike) {
        return true;
}

You can't disprove my claim.

Re:66%? Worthless trash... (1)

ThatAblaze (1723456) | about three weeks ago | (#47732419)

I'm pretty sure your algorithm would be worse than 50%. It basically amounts to "which event comes first? a) the site gets hacked, or b) the site gets taken down."

I think more sites get taken down every day than get hacked.

... accurately predicts .. (1)

CaptainDork (3678879) | about a month ago | (#47724811)

66% = "could happen."

weIc0m (-1)

Anonymous Coward | about a month ago | (#47725275)

Your writing is good.
Thanks, this is my site http://thepc-pro4.blogspot.com/

Runs PHP? (1)

certain death (947081) | about a month ago | (#47725739)

100% chance it will be hacked and used as a launching point for EVARYTHANG!!!

Results? (1)

manu0601 (2221348) | about a month ago | (#47725771)

Is there a page somewhere where I can query the results and see how my own site fares?

What a coincidence. (1)

Kazoo the Clown (644526) | about a month ago | (#47726031)

66% of all websites get hacked. So if you predict EVERY website will get hacked, you'll be right 66% of the time.

Re:What a coincidence. (1)

aaronb1138 (2035478) | about three weeks ago | (#47726507)

Wouldn't it just be easier to aggregate information from social media sites using a weighted system? Just put 4Chan at the top of the weighting, with Facebook next, and use separate weighting scales for positive versus negative mention counts. Both are valid predictors, so it should work and get closer.

I'm glad one of my side jobs is setting up IPS / IDP and similar security on firewalls. I'll never be thirsting for work.

In totally unrelated news (1)

Mr. Freeman (933986) | about a month ago | (#47726267)

New cyber security tool doesn't work!

Meaningless tautology (0)

Anonymous Coward | about three weeks ago | (#47727249)

Oh, it predicts hacks before they happen. Wow. That's so much better than predicting hacks after they happen.

Illiterate fuckers.

66% Accuracy versus 90%+ using Similar Technique (1)

cs2501x (1979712) | about three weeks ago | (#47728481)

I was really surprised to read this article. It uses a similar approach to some research I am doing in self-healing systems. The central premise is that by monitoring feature behaviours and then autonomously classifying the state of the system/website using high-level operational validation tests, it's possible to identify the source of faults in front-end systems: http://cs203.host.cs.st-andrew... [st-andrews.ac.uk] . Our results show a much higher degree of accuracy than the one mentioned, though -- averaging 90%+ -- even in noisy data-sets. The trick is to set windows for the data ingests and to use high-level operational validation tests for classifying data before forecasting feature behaviours.

Studies in this area are new, though. Nobody yet knows which learning algorithms or primitives work best and under what conditions. This is further complicated because some of the learning algorithms themselves are not well understood -- take contrastive divergence as an example: http://arxiv.org/pdf/1405.0602... [arxiv.org] , http://www.ais.uni-bonn.de/pap... [uni-bonn.de] . It's widely used despite nobody fully understanding how it works -- supposedly Hinton's hiring at Google put it into place for a good deal of their search and advertising operations, but that's anecdotal.

Anyway, lots of people are trying to claim advances in this area for various reasons. I agree, though, that 66% is definitely not a fantastic result -- even our 90%+ probably isn't enough; you'd really need something more like 99.999% to get businesses to adopt the technology.
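For anyone curious what "windowed ingest plus validation-test classification" might look like, here is a rough, invented sketch (this is not the system from the paper linked above; the window size, the latency feature, and the pass/fail rule are all made up):

import java.util.ArrayDeque;
import java.util.Deque;

// Rough sketch only: slide a fixed-size window over one monitored feature
// (response latency here), and label each full window with a stand-in
// high-level validation test. The labelled windows would then feed a
// forecaster. Window size, feature, and the pass/fail rule are invented.
public class WindowedMonitorSketch {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();

    WindowedMonitorSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Ingest one observation; return the window's label once the window is full.
    String ingest(double observation) {
        window.addLast(observation);
        if (window.size() < windowSize) {
            return null;
        }
        double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        String label = mean < 200.0 ? "HEALTHY" : "DEGRADED"; // stand-in validation test
        window.removeFirst(); // slide the window forward
        return label;
    }

    public static void main(String[] args) {
        WindowedMonitorSketch monitor = new WindowedMonitorSketch(3);
        for (double latencyMs : new double[] {120, 150, 180, 400, 500}) {
            String label = monitor.ingest(latencyMs);
            if (label != null) {
                System.out.println(latencyMs + " ms -> window " + label);
            }
        }
    }
}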

Re:66% Accuracy versus 90%+ using Similar Technique (1)

cs2501x (1979712) | about three weeks ago | (#47728511)

Ah, and if anyone is interested the paper will be presented at the 11th IEEE International Conference and Workshops on the Engineering of Autonomic & Autonomous Systems in Laurel, Maryland. So it has been vetted, etc--it's open source, and the results are publicly available as well. Venue information is here: http://tab.computer.org/aas/ea... [computer.org]

The tool has an accuracy of up to 66% (1)

TemporalBeing (803363) | about three weeks ago | (#47730351)

So in other words it could be 0% accurate...