Spammers Using Soft Hyphen To Hide Malicious URLs

timothy posted about 4 years ago | from the conservative-in-what-you-accept dept.

Security 162

Trailrunner7 writes with this excerpt from ThreatPost illustrating the ongoing Spy-vs.-Spy battle between spammers and the rest of us: "Spammers have jumped on the little-used soft hyphen (or SHY character) to fool URL filtering devices. According to researchers, spammers are larding up URLs for sites they promote with the soft hyphen character, which many browsers ignore. Spammers aren't shy about jumping humans flexible cognitive abilities to slip past the notice of spam filters (H3rb41 V14gr4, anyone?). ... The latest trend involves the use of an obscure character called the soft hyphen or 'SHY' character to obscure malicious URLs in spam messages. Writing on the Symantec Connect blog, researcher Samir Patil said that the company has seen recent spam messages that insert the HTML symbol for the soft hyphen to obfuscate URLs for Web pages promoted by the spammers."

H3rb41 V14gr4? (4, Insightful)

MrEricSir (398214) | about 4 years ago | (#33830008)

I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

Re:H3rb41 V14gr4? (3, Insightful)

caffeinemessiah (918089) | about 4 years ago | (#33830136)

I never got the leet speak in spam thing. Sure, it might get past the filter, but who can read it? Are they trying to sell drugs to script kiddies?

I don't know about you, but I can't stop trying to figure out what word they're trying to represent with the symbols. For example, I know the second word in your subject means viagra, but what is "H3rb41"? Oh..."herbal". It's naturally (perhaps unknowingly) targeted towards geeks and puzzle-solvers, which perhaps isn't the worst market to target available-without-human-contact penis drugs towards.

Re:H3rb41 V14gr4? (3, Insightful)

maxwell demon (590494) | about 4 years ago | (#33830362)

I thought the only situation where you need Viagra is exactly human contact (in the most literal meaning of the word).

Re:H3rb41 V14gr4? (1)

clarkkent09 (1104833) | about 4 years ago | (#33830436)

Funny, I read it immediately as herbal viagra. I guess different people's brains may be handling the job of reading differently. Reminds me of Richard Feynman's "experiments" with reading and counting at the same time etc: http://www.youtube.com/watch?v=Cj4y0EUlU-Y [youtube.com]

Re:H3rb41 V14gr4? (3, Funny)

commodore64_love (1445365) | about 4 years ago | (#33830520)

I think this photograph is appropriate. And I'm happy to say: No I can't read it.

http://media.ebaumsworld.com/picture/strober/get_laid.jpg [ebaumsworld.com]

Re:H3rb41 V14gr4? (0)

Anonymous Coward | about 4 years ago | (#33831976)

You just blew my mind.

Re:H3rb41 V14gr4? (1)

froggymana (1896008) | about 4 years ago | (#33830544)

"wh47 7h3 h3|| d035 7h47 54y?" This might be the phrase you were looking for to describe your conundrum....

Re:H3rb41 V14gr4? (2, Interesting)

MysteriousPreacher (702266) | about 4 years ago | (#33830220)

I never understood how it actually worked, except as you suggested, the script kiddy crowd are heavily in to giving money to strangers in exchange for uber zomg epic sexual prowess.

Maybe I'm old fashioned, but I'm kind of reluctant to whip out my credit card to buy something from a company that employs mittens-wearing illiterates to write their adverts. Sure I'll eat at a Chinese restaurant with an amusingly translated menu, but that's a little different.

Re:H3rb41 V14gr4? (1)

Beale (676138) | about 4 years ago | (#33830778)

I hear you can get better by grinding.

Re:H3rb41 V14gr4? (1)

Obfuscant (592200) | about 4 years ago | (#33831048)

I never understood how it actually worked, except as you suggested, the script kiddy crowd are heavily in to giving money to strangers in exchange for uber zomg epic sexual prowess.

Never watched late night cable channels, have we? Does the word "Extenze" ring a bell? Those ads are taking the word "ubiquitous" to a whole new level, and proving that "skank hoes" ain't just on the street corner anymore. Well, ok, they DO go out and do "man on the street" interviews, where amazingly enough, every man they come across is a satisfied user and every woman is a satisfied usee, and they all crow about "the certain part of the male anatomy".

Too bad they ain't talking about their brains.

Re:H3rb41 V14gr4? (0)

Anonymous Coward | about 4 years ago | (#33830244)

Dude, you can't read that? Man, you missed out as a kid.
Calculators > leet speak.

Re:H3rb41 V14gr4? (0)

Anonymous Coward | about 4 years ago | (#33830390)

H3rb41 = Herbal

Re:H3rb41 V14gr4? (1)

LambdaWolf (1561517) | about 4 years ago | (#33831062)

You know how, even though only a tiny fraction of a percent of people actually respond to spam by buying the product, sending the spam is so cheap that it's still profitable to do so? I always assumed that the incomprehensible leetspeak just tacks on another factor of 0.1 or so but the resulting sales still justify the spamming. Or at least that's what the spammers think; who knows whether they're being economically rational.

Re:H3rb41 V14gr4? (1)

bill_kress (99356) | about 4 years ago | (#33831450)

Since I didn't see anyone mention it I'll take the chance you weren't just making a joke and give you the answer:

The point of the character substitutions / "leet speak" is exactly the same as the URL mangling they are talking about here: getting around spam filters. When the spam filters know to search for anything with "Viagra" in it, you just change that to V1agra, problem solved. The next week go with V1@gra.

The people who buy this stuff are likely not to mind.

Re:H3rb41 V14gr4? (2, Funny)

Anonymous Coward | about 4 years ago | (#33832124)

I like my hyphen hard, not soft. That's why I use H3rb41 V14gr4.

Can we use this ourselves? (0)

Anonymous Coward | about 4 years ago | (#33832238)

Does this work on, say, "Smart"Filter?

Would be lovely if there was a way to craft your own bookmarks to bypass the damn filters. Say, by using a Greasemonkey script or something...

Why (1)

KillaGouge (973562) | about 4 years ago | (#33830014)

Why don't modern browsers render this character?

Re:Why (0)

Anonymous Coward | about 4 years ago | (#33830072)

Why don't modern browsers render this character?

Agreed, I just made a test case and the current Firefox doesn't render it; it just showed the two words I had on either side of it with nothing in between like there should have been.

Re:Why (5, Informative)

TopSpin (753) | about 4 years ago | (#33830080)

Why don't modern browsers render this character?

The character isn't supposed to be rendered. Soft hyphen indicates where to break words if necessary. The hyphens are not rendered if the word doesn't need to be broken.

Re:Why (1, Informative)

KillaGouge (973562) | about 4 years ago | (#33830254)

according to here [cs.tut.fi] the ISO 8859-1 standard calls for that specific character to be rendered.

Re:Why (4, Insightful)

KillaGouge (973562) | about 4 years ago | (#33830292)

please ignore my parent post. It seems that GP is correct

Not always (4, Informative)

pavon (30274) | about 4 years ago | (#33830572)

It is only supposed to be rendered when the word is split across multiple lines.

For example if your text was "super­cali­fragilistic­expialidocious" then all of the following are valid rendering depending on where the render decides to start a new line:

supercalifragilisticexpialidocious

or

supercalifragilistic-
expialidocious

or

supercali-
fragilistic-
expialidocious
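
The three renderings above can be reproduced with a toy line-breaker. This is only a sketch in Python, not how any browser actually lays out text; the `render` function and its greedy rule are assumptions made for illustration.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def render(word, width):
    """Greedily break one word at its soft hyphens to fit `width` columns.

    A visible "-" is emitted only where a break is actually taken;
    soft hyphens that are not used as break points produce no output.
    """
    parts = word.split(SHY)
    lines = []
    current = ""
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        # Keep one column free for a visible hyphen unless this is the last part.
        reserve = 0 if is_last else 1
        if current and len(current) + len(part) + reserve > width:
            lines.append(current + "-")  # break taken: the hyphen becomes visible
            current = part
        else:
            current += part              # no break: the soft hyphen stays invisible
    lines.append(current)
    return lines


if __name__ == "__main__":
    word = "super" + SHY + "cali" + SHY + "fragilistic" + SHY + "expialidocious"
    for width in (80, 21, 12):
        print(width, render(word, width))
```

A part that is longer than the available width simply overflows here; a real layout engine has more options than this sketch.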

Re:Why (1)

John Hasler (414242) | about 4 years ago | (#33832410)

URLs are not words.

Re:Why (3, Informative)

maxwell demon (590494) | about 4 years ago | (#33830140)

Why don't modern browsers render this character?

From Wikipedia:

"Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."

So a soft hyphen marks a position where you can hyphenate a word. If you don't do it, you of course shouldn't print anything at that position.

Re:Why (1)

DrugCheese (266151) | about 4 years ago | (#33830754)

From Wikipedia:

"Since it is difficult for a computer program to automatically make good decisions on when to hyphenate a word, the concept of a soft hyphen was introduced to allow manual specification of a place where a hyphenated break was allowed without forcing a line break in an inconvenient place if the text was later re-flowed."

Exactly it's purpose. It's never supposed to be shown, only to give the browser client an easy way to break the word for dynamic width.

Re:Why (0, Funny)

Anonymous Coward | about 4 years ago | (#33831172)

"Exactly it's purpose. It's never supposed to be shown,"
Sort of like the useless apostrophe you hammered into a harmless possessive pronoun? Why do so many people not get it? They can master dozens of abstruse and recondite subjects, but it's means IT IS defeats them. Why?

Re:Why (1)

John Hasler (414242) | about 4 years ago | (#33832358)

Hyphenating a URL makes no sense. Ones containing this character should be invalid.

Re:Why (1)

war4peace (1628283) | about 4 years ago | (#33830182)

I'm a pretty IT-savvy guy, but WHAT IS that bloody character?
I understood pretty much everything from the summary. Everything BUT the character :) - Fail. As far as the summary is concerned.

Re:Why (-1, Troll)

Anonymous Coward | about 4 years ago | (#33830328)

O hai slashdot, kannot understand aftr reeding sumary. PLZ HALP1!!!1?

Re:Why (1)

war4peace (1628283) | about 4 years ago | (#33831722)

Let me rephrase.
You read a news entry about "the guy who committed the crime". Never in the summary do they mention the guy's name or what crime he committed, but they emphasize how dangerous the guy is and how horrible the crime was. Now let me know if that sort of approach doesn't, um, I don't know, miss something essential.
This is not about me being lazy and not reading the article (I did), but about the summary missing some essential information (it does).

Re:Why (5, Informative)

Man Eating Duck (534479) | about 4 years ago | (#33830432)

I'm a pretty IT-savvy guy, but WHAT IS that bloody character?

Say you're laying out a book. You have the word Sauerkraut at a line wrap, but it is broken into Sauerk-raut because your layout software doesn't know where to break it. You then put in a soft hyphen between r and k; this indicates to your software that this word should be broken there. It turns into Sauer-kraut, which is correct.

Later you get angry with the Sauerkraut and call it "bloody Sauerkraut". Now the whole word will be at the next line, and the soft hyphen won't show because your software doesn't need to break the word. Thus you can insert these freely without fretting about words containing a hyphen later on, they'll only be rendered when used as a hint.

HTH

Re:Why (1)

miro2 (222748) | about 4 years ago | (#33831388)

Bad design -- there is no reason to embed display-control strings in the character set. Is there a "start-italics" character? No, of course not. Software should keep track of hyphenation positions the same way it keeps track of other formatting positions.

Re:Why (1)

mattack2 (1165421) | about 4 years ago | (#33831666)

Italics isn't something that just 'happens' when laying out text. Hyphenation is. As one of the other replies said, this is used as a hint, as software doesn't know all of the syllables of words and how to break them.

(I hadn't heard of it before this article either.)

Re:Why (1)

Timmmm (636430) | about 4 years ago | (#33831696)

Well clearly you have to draw the line somewhere. I agree, stuff like \a, and \b obviously don't belong in the character set. But then newlines clearly do, and even spaces are 'display-control strings'. What about tabs?

I think you're probably right about this soft-hyphen though. It sounds like it is rarely used and creates more problems than it solves.

Re:Why (0)

Anonymous Coward | about 4 years ago | (#33831464)

So, in how many words do I have to add these characters because the layout software won't do it for me?

Between spam and layout issues, computers sure do make life easier.

Re:Why (3, Funny)

modecx (130548) | about 4 years ago | (#33831570)

Speaking of bloody sauerkraut, I think there was some sort of hyphen-depression when the inventors of the German language decided it would be fun to glue adjectives and nouns together. i.e. when I see something like: unabhaengigkeitserklaerungen, I have a nigh-irresistible urge to shout Gesundheit!

I'm still not sure why the nazis went to all of the trouble of building cipher-machines. The language looks sufficiently jumbled from the start.

Re:Why (2, Insightful)

JSG (82708) | about 4 years ago | (#33832486)

Where one in English might use a series of adjectives plus a noun a German would use a single agglomerative word - what is your problem?

Deutsch is a sufficiently sophisticated language without your assistance.

It doesn't work the same as your native tongue - get a life and stop trolling my forum - twat.

Re:Why (1)

JSG (82708) | about 4 years ago | (#33832402)

Beautifully put. YIDH

Cheers
Jon

Re:Why (1)

WidgetGuy (1233314) | about 4 years ago | (#33832026)

­ or ­

Re:Why (4, Informative)

Tynin (634655) | about 4 years ago | (#33830274)

Why don't modern browsers render this character?

Two reasons, the first being that HTML 4 specs [w3.org] call for it to not be rendered unless it meets the criteria. Here is the full blurb:

9.3.3 Hyphenation

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)

The other reason is that the current unicode standard basically says it doesn't support when and where it should be displayed as a hyphen and leaves it open to interpretation of whoever is coding for it. Here is the blurb from the unicode standard on it:

Hyphenation. U+00AD soft hyphen (SHY) indicates an intraword break point, where a line break is preferred if a word must be hyphenated or otherwise broken across lines. Such break points are generally determined by an automatic hyphenator. SHY can be used with any script, but its use is generally limited to situations where users need to override the behavior of such a hyphenator. The visible rendering of a line break at an intraword break point, whether automatically determined or indicated by a SHY, depends on the surrounding characters, the rules governing the script and language used, and, at times, the meaning of the word. The precise rules are outside the scope of this standard, but see Unicode Standard Annex #14, “Unicode Line Breaking Algorithm,” for additional information. A common default rendering is to insert a hyphen before the line break, but this is insufficient or even incorrect in many situations.

Contrast this usage with U+2027 hyphenation point, which is used for a visible indication of the place of hyphenation in dictionaries. For a complete list of dash characters in the Unicode Standard, including all the hyphens, see Table 6-3.

The Unicode Standard includes two nonbreaking hyphen characters: U+2011 non-breaking hyphen and U+0F0C tibetan mark delimiter tsheg bstar. See Section 10.2, Tibetan, for more discussion of the Tibetan-specific line breaking behavior.
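
The entity forms and code points quoted above can be checked with Python's standard library. This is a small sketch; the helper names `decodes_to_shy` and `describe` are made up for this example.

```python
import html
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def decodes_to_shy(reference):
    """True if an HTML character reference decodes to the soft hyphen."""
    return html.unescape(reference) == SHY


def describe(code_point):
    """Return (Unicode name, general category) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    # The named and numeric references from the HTML 4 excerpt.
    for ref in ("&shy;", "&#173;", "&#xAD;"):
        print(ref, decodes_to_shy(ref))
    # The code points mentioned in the Unicode excerpt.
    for cp in (0x00AD, 0x2027, 0x2011, 0x0F0C):
        print(f"U+{cp:04X}", *describe(cp))
```

The soft hyphen falls in general category Cf (format), which is consistent with it producing no glyph unless a break is taken.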

Shy Spammers (3, Funny)

biryokumaru (822262) | about 4 years ago | (#33830020)

Spammers are getting more shy? That's a relief!

What is it? (5, Funny)

iONiUM (530420) | about 4 years ago | (#33830024)

Why didn't they just put the friggin character in the summary so I didn't have to read the article?

Anyways, according to the article it's &shy, which looks "identical to a regular hyphen." Are you happy now slashdot? I had to read TFA to find that out.

Re:What is it? (0)

Anonymous Coward | about 4 years ago | (#33830088)

Rendering it would depend on your screen resolution, browser window size, and things like that... Why that is will be left as an exercise for the reader.

Re:What is it? (1, Insightful)

mclearn (86140) | about 4 years ago | (#33830144)

And, as TFA points out, this is a valid tactic because "modern browsers" (ambiguously non-committal) do not render the character. I assume, spammers are writing URLs as: http://m­i­crosoft.com/ (eg. m-i-crosoft.com, but rendered onscreen as microsoft.com). This, of course, tricks folks into thinking that they are clicking on a valid microsoft.com URL.

Re:What is it? (1)

mclearn (86140) | about 4 years ago | (#33830188)

Nope. My bad. Since the SHY character is used as a way to dictate line breaks, it obviously isn't used to forge domains or anything similar. Presumably then, the SHY is used to ensure that patterns such as "Viagra" can be written as Viagra and not be caught by simple pattern matchers? TFA was light on actual examples.

Re:What is it? (0)

Anonymous Coward | about 4 years ago | (#33830252)

They get you to click on a link.

You look at your address bar, and it says "chase.com"

But, you're really at "c-h-a-s-e.com" with unrendered soft hyphens.

Getting the picture yet?

Re:What is it? (3, Insightful)

maxwell demon (590494) | about 4 years ago | (#33830554)

Are registrars accepting domain names with soft hyphens? And if so, why? It's rather obvious that such domain names would only be used for fraud.
IMHO registrars should not accept any non-printable character in domain names.

Re:What is it? (3, Insightful)

sexconker (1179573) | about 4 years ago | (#33831432)

Yes, they are. Otherwise this story wouldn't exist.
Why? Because they like money, and don't give a fuck.
Of course they should not accept any non-printable characters.

Registrars are pretty much only half a step above the spammers in terms of ethics / shittiness.

Re:What is it? (1)

adtifyj (868717) | about 4 years ago | (#33832244)

Do you have any evidence that registrars are accepting soft hyphens in domain names?

Soft hyphens are supposed to be eliminated in the Name Preparation phase [verisign.com].

The soft hyphen is being used by spammers to obfuscate their URLs in order to get past anti-spam rules.

This slashdot story appears to be misinformation and a plug for Symantec.
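
The Name Preparation point can be checked locally. The sketch below assumes Python's built-in `stringprep` module and `idna` codec (which implement the older IDNA 2003 rules) are a fair stand-in for what a resolver or registrar does; the wrapper names are invented for this example.

```python
import stringprep

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def nameprep_drops(ch):
    """True if RFC 3454 table B.1 ("commonly mapped to nothing") contains ch."""
    return stringprep.in_table_b1(ch)


def to_ascii(hostname):
    """Run a hostname through Python's built-in IDNA (2003) codec."""
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(nameprep_drops(SHY))
    print(to_ascii("exa" + SHY + "mple.com"))
```

Under these rules the soft hyphen is removed before the lookup, so the hostname with the soft hyphen and the hostname without it refer to the same domain.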

Re:What is it? (1)

PhrostyMcByte (589271) | about 4 years ago | (#33830412)

If a word is wrapped to the next line, it shows a hyphen. Otherwise it's hidden. That's what a soft hyphen does.

Re:What is it? (1)

DrugCheese (266151) | about 4 years ago | (#33830670)

Luckily you read it and I garnered the information from your post.

Now we can all make informed opinions.

Re:What is it? (0)

Anonymous Coward | about 4 years ago | (#33831492)

OMFG, he read the article! Children, don't look at him, keep away!

So how often is it used legitimately? (4, Interesting)

JesseL (107722) | about 4 years ago | (#33830094)

Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

Re:So how often is it used legitimately? (2, Funny)

Cinder6 (894572) | about 4 years ago | (#33830158)

Well, I know I've certainly never seen it!

Re:So how often is it used legitimately? (1)

maxwell demon (590494) | about 4 years ago | (#33830718)

That's because it's so shy, it always hides.

Re:So how often is it used legitimately? (4, Informative)

Anonymous Coward | about 4 years ago | (#33830204)

Is there any good reason not to just call the presence of soft hyphens as a reliable indicator of spam and use it as the basis of a spam filter?

Yes, there is: languages other than English. In e.g. German, the use of soft hyphens, while not universal, is becoming more common, at least, and for a reason: longer words that can't automatically be hyphenated by the browser as necessary lead to ugly layout, especially when there's not a lot of horizontal space (e.g. on news sites, which often tend to emulate printed newspapers).

Re:So how often is it used legitimately? (1)

AvitarX (172628) | about 4 years ago | (#33830558)

Fair enough.

Just prevent visitors from going to a URL with one.

I assume that's the problem, something like pncbank.com looking like pncbank.com.

Simply throw up a big big warning (like that ever works) that says you are not visiting site "pncbank.com" you are visiting "p-ncbank.com".

Or simply just block them, I see no purpose of the character in a URL.

Re:So how often is it used legitimately? (1)

AvitarX (172628) | about 4 years ago | (#33830568)

ah crap, ate my markup ...something like p­ncbank.com looking like pncbank.com. ...

Re:So how often is it used legitimately? (3, Informative)

treeves (963993) | about 4 years ago | (#33830634)

So, when I get an email with a link to www.Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.de, should I avoid clicking the link, or what?

Re:So how often is it used legitimately? (0)

Anonymous Coward | about 4 years ago | (#33830770)

Yes but in a url?

Re:So how often is it used legitimately? (1, Interesting)

TheRaven64 (641858) | about 4 years ago | (#33831016)

Hyphenating long words in German is pretty easy. Long words are usually compound words and they are correctly broken at the word boundaries. Hyphenating English automatically is actually a harder problem than hyphenating German, and is made harder by the fact that English and American have different rules for when you are supposed to hyphenate.

Re:So how often is it used legitimately? (1)

mattack2 (1165421) | about 4 years ago | (#33831780)

is made harder by the fact that English and American have different rules for when you are supposed to hyphenate

Can you explain how this is relevant to this soft hyphen issue? That is, I read the relevant part of the wikipedia article (http://en.wikipedia.org/wiki/Hyphenation), and it does mention different rules (e.g. "co-worker" in British English, but "coworker" in American English). However, that is not related to the soft hyphen issue, which is related to hyphenation for justification reasons.

Re:So how often is it used legitimately? (0)

Anonymous Coward | about 4 years ago | (#33830242)

Yeah, since according to a poster above, the soft hyphen is used to indicate where to hyphenate words should they appear at the edge of the page, there's no reason to have them in a URL. seems an easy way to block.

Re:So how often is it used legitimately? (1)

Relic of the Future (118669) | about 4 years ago | (#33830346)

Seriously. It's not valid to use a space in a URL, why would it be valid to use a soft-hyphen? If I type "goo gle.com" in the address bar in (Firefox 3.6.10), it takes me to "google.com"; this should be handled the same way.

Re:So how often is it used legitimately? (1)

Whyte Panther (868438) | about 4 years ago | (#33830442)

Why is it possible to even register a domain with a soft hyphen? Oh wait, Domain registrars are greedy.

Re:So how often is it used legitimately? (0)

Anonymous Coward | about 4 years ago | (#33830980)

Why is it possible someone as stupid as you can operate a computer?

RTFA

Re:So how often is it used legitimately? (1)

maxwell demon (590494) | about 4 years ago | (#33830694)

I think this is a mistake. "goo gle.com" should lead to an error.
If there is anything which should be treated by the strictest rules possible, then it's URLs.

Re:So how often is it used legitimately? (1)

Pharmboy (216950) | about 4 years ago | (#33830822)

I think this is a mistake. "goo gle.com" should lead to an error.

If you use the DNS servers of most ISPs, instead of error, you end up either going to a custom search page to which they are getting paid for the ads, or an offer to buy the domain.

Re:So how often is it used legitimately? (1)

maxwell demon (590494) | about 4 years ago | (#33830920)

Given that it's not a valid domain name (as opposed to a valid, but unregistered domain name), it shouldn't even hit the DNS server. The browser should detect it as invalid, and give you an error straight away. It should not pass it on, neither literally, nor altered.

Re:So how often is it used legitimately? (1)

Pharmboy (216950) | about 4 years ago | (#33831158)

I agree, but ISPs are making money off of it and not likely to give up that extra revenue, which is one more reason I just point to one of my own DNS servers at the office instead.

Re:So how often is it used legitimately? (2, Interesting)

jthill (303417) | about 4 years ago | (#33831314)

DNS permits everything in domain names. You can implement any restrictions you want on names you issue on your own authority, but

Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

Re:So how often is it used legitimately? (2, Interesting)

ceoyoyo (59147) | about 4 years ago | (#33830394)

I would think most spam filters would do that automatically as they learn.

Symantec seems to think people still use character-for-character text matching spam filters that don't learn. Maybe Symantec products do.

Re:So how often is it used legitimately? (2, Insightful)

AltairDusk (1757788) | about 4 years ago | (#33830536)

Shouldn't be too hard for the spam filter to strip the soft hyphens then analyze the URL, I don't see this being useful to the spammers for too long unless I'm missing something.
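
A minimal sketch of that idea in Python: decode HTML references, drop soft hyphens, then match. The blocklist, the URL regex, and the function names are all hypothetical, and no real filter is claimed to work this way.

```python
import html
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

BLOCKED_HOSTS = {"spam.example"}  # hypothetical blocklist for the example

# Deliberately simple URL matcher; real filters use something more careful.
URL_RE = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def normalize(text):
    """Decode HTML references (&shy;, &#173;, ...) and drop soft hyphens."""
    return html.unescape(text).replace(SHY, "")


def find_blocked(text):
    """Return URL prefixes (scheme plus host) whose host is on the blocklist."""
    return [m.group(0) for m in URL_RE.finditer(text)
            if m.group(1).lower() in BLOCKED_HOSTS]


if __name__ == "__main__":
    raw = "Cheap pills: http://sp&shy;am.exa&shy;mple/buy"
    print(find_blocked(raw))             # matching on the raw text
    print(find_blocked(normalize(raw)))  # matching after normalization
```

Matching on the raw text misses the obfuscated host, while matching on the normalized text finds it.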

Obligatory Kajagoogoo (0, Interesting)

Anonymous Coward | about 4 years ago | (#33830178)

Tongue-tied, (I'm) short of breath, don't even try
Try a little harder
Something's wrong, you're not naive, you must be strong
Ooh, baby, try
Hey girl, move a little closer.
You're

CHORUS:
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy
Hush hush, eye to eye
Too shy shy

Re:Obligatory Kajagoogoo (0, Informative)

Anonymous Coward | about 4 years ago | (#33831004)

Good job. Came here to blame Kajagoogoo for this.

Offtopic mod needs to hush hush.

Good News! (2, Interesting)

hardburn (141468) | about 4 years ago | (#33830192)

So now spam filters will pick up on soft hyphens used in URIs inside emails (when was the last time you saw one used legitimately?), making the spam easier to spot.

shy (3, Funny)

Anonymous Coward | about 4 years ago | (#33830228)

No good shysters.

Re:shy (0)

Anonymous Coward | about 4 years ago | (#33830550)

No good shysters.

No good shylocks.

HTML 5 (0, Offtopic)

Dthief (1700318) | about 4 years ago | (#33830268)

The advent of HTML 5 within the next couple years - and browsers that support it - is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.

I bet 4pple is behind the spam trying to further promote 1-1TML-5.......$t3v3 J0b$ l0v3s v14gr4

SpamAssassin is not vulnerable to this (4, Informative)

Khopesh (112447) | about 4 years ago | (#33830356)

Just tested this in SpamAssassin with http ://exa ­ mple.com (spaced to evade slashdot's own obfuscation-eliminator) - Result: The URL domain (example.com) is properly extracted without the obfuscation.

That said, SA is fully capable of detecting the obfuscation attempt itself (using a rawbody rule)...
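
For illustration only, detecting the obfuscation attempt itself might look like the Python sketch below. This is not SpamAssassin rule syntax, and the patterns and function name are assumptions for this example.

```python
import re

# Soft hyphen as a raw character or as one of its HTML references.
SHY_FORMS = re.compile(r"\u00ad|&shy;|&#0*173;|&#[xX]0*[aA][dD];")

# Anything that starts like an http(s) URL, up to the next whitespace.
URL_START = re.compile(r"https?://\S+", re.IGNORECASE)


def has_shy_in_url(raw_body):
    """True if any URL-looking run in the raw body contains a soft hyphen."""
    return any(SHY_FORMS.search(m.group(0))
               for m in URL_START.finditer(raw_body))


if __name__ == "__main__":
    print(has_shy_in_url("see http://exa&shy;mple.com now"))
    print(has_shy_in_url("plain http://example.com and a hy\u00adphenated word"))
```

Flagging only soft hyphens inside URLs leaves ordinary hyphenation hints in body text alone.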

HTML 5 will save us (1)

rudy_wayne (414635) | about 4 years ago | (#33830404)

from the article:

The advent of HTML 5 within the next couple years is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.

Note the use of the phrase "should be". I see this a lot when reading about HTML 5. Are people really that stupid and/or naive that they think all browsers will follow the HTML 5 spec exactly? (yes Microsoft I'm looking at you)

Re:HTML 5 will save us (1)

blair1q (305137) | about 4 years ago | (#33830466)

standards can only tell you how things should be.

they may tell you how things will break for you if you try to do things in a non-standard way, but they have no power to force you not to try.

The Wrongest Part (5, Funny)

SuperKendall (25149) | about 4 years ago | (#33830418)

The thing that really grates on the nerves, is using a soft-hyphen to sell Viagra.

Re:The Wrongest Part (1)

blair1q (305137) | about 4 years ago | (#33830476)

That's why we have /.

Re:The Wrongest Part (0)

Anonymous Coward | about 4 years ago | (#33830528)

They started out using hard hyphens, but when the hard hyphen had been around for four hours they contacted their doctor who quickly helped convert it to a soft hyphen.

Re:The Wrongest Part (0)

Anonymous Coward | about 4 years ago | (#33831320)

The thing that really grates on the nerves is using a soft-hyphen to sell Viagra.

You're doing it wrong; graters shouldn't go down there regardless of whether you're hard or soft...

Re:The Wrongest Part (1)

amicusNYCL (1538833) | about 4 years ago | (#33831428)

I thought it was funny when I started receiving spams for "Viagra soft tabs". I thought that's what it was supposed to cure.

Jumping humans, eh? (1)

noidentity (188756) | about 4 years ago | (#33830440)

Spammers aren't shy about jumping humans flexible cognitive abilities

I'm not too worried about flexible cognitive abilities, but jumping humans do bother me too.

Journalism at its best (0, Offtopic)

T Murphy (1054674) | about 4 years ago | (#33830500)

Let's take the summary (copy+pasted from the article) and summarize each sentence (compare to the summary if you think I exaggerate):

Spammers are using the soft hyphen. Spammers are using the soft hyphen. ...Spammers are using the soft hyphen. "Spammers are using the soft hyphen."

Yes, each sentence says a little bit more, but it still repeats the same fact over and over. I usually don't complain about slashdot summaries, but this was honestly painful to read. Just because you copy+pasted what TFA says doesn't mean it's okay.

Re:Journalism at its best (1)

apoc.famine (621563) | about 4 years ago | (#33830784)

That's why slashdot has editors, instead of being just a user-submitted story aggregation site...

Why would soft-hyphen be legal in a URL? (1)

JSBiff (87824) | about 4 years ago | (#33830614)

I don't get how you can put a soft-hyphen in a URL and have it work? It's a formatting character, it shouldn't ever be legal to have a formatting character as part of a URL? Are they registering domain-names with soft-hyphens in the name? Or is this a case where the browser 'helpfully' replaces a soft hyphen with a regular hyphen when actually trying to connect to the web server, but for some reason does NOT render the hyphen when displaying it to the user? It seems like the browser should behave consistently - if it doesn't render a hyphen when displaying it, it shouldn't render a hyphen when making the DNS lookup.

Re:Why would soft-hyphen be legal in a URL? (0)

Anonymous Coward | about 4 years ago | (#33830982)

It doesn't, which is the whole point. To a filter, slash&shydot&hy.com doesn't look like slashdot.com, so it won't block it. To the browser, however, your URL passes through easily.

It seems like the simple solution is just to make filters ignore non-printable characters when looking for suspect URLs, but I don't write spam filters.
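
A sketch of that "ignore non-printable characters" step in Python, using Unicode general category Cf (format characters) as the definition of non-printable. That choice is an assumption: Cf also covers characters such as the zero-width joiner that are legitimate in some scripts, so a real filter would likely apply this only to URLs.

```python
import unicodedata


def strip_format_chars(text):
    """Drop every character whose Unicode general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    obfuscated = "slash\u00addot.org"       # soft hyphen inside the host
    print(strip_format_chars(obfuscated))
    print(strip_format_chars("exa\u200bmple.com"))  # zero-width space
```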

Re:Why would soft-hyphen be legal in a URL? (1)

John Hasler (414242) | about 4 years ago | (#33831134)

Mod parent up. Why in the hell is such a character allowed in URLs at all?

What is TFA talking about? (1)

z-j-y (1056250) | about 4 years ago | (#33830758)

It doesn't make any sense, probably just some nonsense to scare people into buying their product (Symantec).

By using softhyphen in IDN, a spammer can get a spoof domain that looks like an authentic domain on screen. But how can that fool any spam filters?

Re:What is TFA talking about? (1)

John Hasler (414242) | about 4 years ago | (#33831160)

Maybe it fools Symantic's spam filters.

"to solve many of these problems" - ? (0)

Anonymous Coward | about 4 years ago | (#33831652)

The advent of HTML 5 within the next couple years - and browsers that support it - is expected to solve many of these problems, because that specification finally standardizes how HTML code should be parsed by Web browsers, rather than leaving it up to individual platform vendors to develop their own interpretations of how the code should be parsed.

In other words, security flaws will be baked in so that vendors can't fix them without breaking standards compliance?

terrible summary (3, Funny)

Laxori666 (748529) | about 4 years ago | (#33832106)

Is it just me or is this summary terrible? Every sentence says the same thing, just slightly reworded. In the summary, it's as if each new sentence doesn't give any additional information, but it's worded as if it does. Researchers have found that this summary is repetitive. Some say this can indicate the repetitiveness of a summary.