
Question: Email links, encryption, and random.

Chacham (981) writes | more than 10 years ago

User Journal 33

On a permission-based list, the URLs have tracking numbers. The question is, what to use?

Generating a random number for each email is quick and easy, but guaranteeing uniqueness is not as quick. It gets increasingly longer as the space is used, and adding more digits can hurt URL length.
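A minimal sketch of the "random number plus uniqueness check" approach, assuming the already-issued ids fit in a Python set (in practice that would be a unique column in the database). The names `new_tracking_id` and `used_ids` are made up for illustration. The retry loop is where the cost grows as the id space fills up.

```python
import secrets

def new_tracking_id(used_ids, nbytes=8, max_tries=100):
    """Draw random URL-safe ids until one is not already in used_ids."""
    for _ in range(max_tries):
        candidate = secrets.token_urlsafe(nbytes)
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate
    # With a crowded id space most draws collide; more bytes are needed.
    raise RuntimeError("id space too crowded; increase nbytes")

used = set()
ids = [new_tracking_id(used) for _ in range(1000)]
```

With 8 random bytes the id is 11 URL-safe characters, so the trade-off between collision rate and link length is set by `nbytes`.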

Since every email sent is recorded, each already has a unique number. The problem is predictability.

Two solutions were provided. One, encrypting the unique number. Two, using the unique number and a non-verified random number.

Encryption is kind of kewl. But decryption takes time (especially when many links are hit at the same time) and, hard as it is to break, there is a single point of failure for any links that use the same encryption key. The neat upside is, length is lower, and nothing more needs to be stored.

The unique number with a random number seems odd. It exposes a counter from the table, it adds to what needs to be stored, and it takes time to generate the number.

What do you think?

33 comments

Hashing (2, Informative)

Cyberdyne (104305) | more than 10 years ago | (#7717568)

Solution: use a counter, coupled with a hash of that counter with a secret prefix. Checking the hash corresponds to that number is trivial, deriving the secret from the hash and number is non-feasible for a secure hash algorithm.

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7722426)

This is exactly how to do it :) Make sure your secret prefix (or suffix) isn't too short. An md5 hash is really hard to decode unless you know exactly how it was made and have all but one of the pieces of the puzzle that were put through the digest in the first place.

These examples assume you have a db or other storage medium you are storing the countervariable in, matched against the recipient.

in python:
import hashlib

def onewayid(countervariable):
    # hash the counter together with the server-side secret
    text = str(countervariable) + 'this is your secret string'
    return hashlib.md5(text.encode()).hexdigest()
---
in php:
function onewayid($countervariable)
{
    // . concatenates the counter and the secret before hashing
    return md5($countervariable . 'this is your secret string');
}
---
in perl:
use Digest::Perl::MD5 'md5_hex';

sub onewayid
{
    # arguments arrive in @_
    my ($countervariable) = @_;
    my $result = $countervariable . 'this is your secret string';
    return md5_hex($result);
}
---

Other examples available upon request.

And please don't use crypt - at least not the traditional two character salt version - it's horribly insecure, even for a simple one-way hash like this.
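For the checking side, here is a rough sketch of how the counter and its hash could travel together in the link and be verified on a hit. The `counter-hash` token layout and the names `make_token` and `verify_token` are assumptions for illustration, not something from the posts above. The secret is used as a prefix, as in the parent suggestion.

```python
import hashlib
import hmac

SECRET = b"this is your secret string"  # assumption: known only to the server

def make_token(counter):
    """Return 'counter-hash' for embedding in a link."""
    digest = hashlib.md5(SECRET + str(counter).encode()).hexdigest()
    return f"{counter}-{digest}"

def verify_token(token):
    """Return the counter if the hash matches it, else None."""
    counter_text, _, digest = token.partition("-")
    try:
        counter = int(counter_text)
    except ValueError:
        return None
    expected = hashlib.md5(SECRET + str(counter).encode()).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    if hmac.compare_digest(digest.encode(), expected.encode()):
        return counter
    return None

token = make_token(42)
```

Nothing extra has to be stored for verification: the server recomputes the hash from the counter in the link and its own secret.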

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7724696)

The only issue i'd have with hash is the single point of failure. If someone somehow does figure out the key, he then can figure out all subsequent hashes.

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7725966)

crypt is hashing, just a weaker version than what md5 uses.

When the user on the other end doesn't even know the length of the plaintext, even knowing what id was included with the plaintext is useless. It is _very_ hard to decompose an md5, as in very computationally intensive. In some ways, due to the unknown plaintext length issue, it's harder than (small key, 128) rsa encryption.

If you've read about issues with md5 hashes and, say, APOP authentication, then let me address that: APOP sends the prefix in cleartext over the socket to the client. Thus, to decode, you have a semi-known length (you know the password will realistically be between 2 and 10 characters) for the plaintext, and you know the majority of the plaintext. At that point, it's easy to write a script throwing random character sequences and known common passwords onto the end of the prefix and calculating the md5 for each combo.

Your situation isn't like this. Your secret can be of any length you desire, and any cracker trying to decode it will not have the countervariable.

If you are still worried, then store the hash with the required info to verify it - then you can use a new suffix/prefix each time. Adding the current timestamp to your plaintext should be fine - a unique id, a secret suffix/prefix, and a timestamp (semi-unique).

It is much much more likely (though still hard to do) that someone will set up random hexadecimal strings to simulate md5s and try to brute force their way in that way. But, you're only talking 340282366920938463463374607431768211456 possible md5 strings...
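That figure is 2 to the 128th power, one value per possible 16-byte digest. A quick check of the arithmetic, with a guessing rate of one billion tries per second picked purely as an assumption to give a sense of scale:

```python
md5_bits = 128
possible = 2 ** md5_bits          # number of distinct 128-bit digests

guesses_per_second = 10 ** 9      # assumption: an arbitrary, generous rate
seconds_per_year = 365 * 24 * 3600
years_to_try_all = possible / (guesses_per_second * seconds_per_year)

print(possible)
print(f"{years_to_try_all:.2e} years to try every value")
```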

Perhaps ask for authentication when the link is used at the end-page? Of course, over ssl.

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7727847)

Not to say that people could figure it out. Merely, the issue is that there is a single point of failure. Sure, it's extremely unlikely to be figured out, but wouldn't it be better to just not have the single point at all?

If you are still worried, then store the hash with the required info to verify it - then you can use a new suffix/prefix each time. Adding the current timestamp to your plaintext should be fine - a unique id, a secret suffix/prefix, and a timestamp (semi-unique).

But, then (storing it), is that any different than the unique id and a (unverified) random id?

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7728746)

The only way to avoid a single point is to do authentication on top :(

Storing a hash is more secure than just passing a unique id and a random id, unless your random id is going to have a very significant range; otherwise an md5 is harder to guess.

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7729676)

As long as it is the same size as an md5?

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7729939)

Same size or larger - remember that an md5 is always 16 bytes (displayed as 32 characters in hex), meanwhile a random id is only as big as the column type you declare... in MySQL the largest number column type is a DOUBLE or BIGINT, each of which is limited to 8 bytes. Numeric and Decimal columns are limited to a double's precision.

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7730038)

But, the random id can be letters, or numbers converted to base anything. Then, stored in a VARCHAR.

And, there is an attempt at keeping the number of characters to a minimum, to avoid wrapped links.
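To make the "base anything" point concrete, a small sketch that writes a number in base 62 (digits plus lower and upper case letters). The alphabet and the name `to_base62` are assumptions chosen for illustration; any URL-safe alphabet would work the same way.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe symbols

def to_base62(n):
    """Write a non-negative integer using the 62-symbol alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, remainder = divmod(n, 62)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))

# A full 128-bit value needs 22 characters here, versus 32 in hex.
longest = to_base62(2 ** 128 - 1)
```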

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7730465)

True, you can make it anything, and do base64, for example, on it to reduce display length while keeping it non-binary. What I'm trying to point out is several points about doing such a thing:

- Your random number will need to have a large range to be more effective than an md5, more than 16 bytes worth. That's big.
- finding a random number generator that will provide you with such precision can be done, but it will be slow.
- random number generation and hashing have a lot in common - no random number generator ever produces truly random numbers. Instead, such a function generates a sequence of pseudo-random numbers from a given seed using math tricks. If you use the same random number generator twice in a row, with the same seed value, you get the same sequence of numbers. Using a timestamp as a seed will give you a different sequence to use each time you generate a link - but unless you use a rare random number generating algorithm, it's just as weak and just as much a single point of failure as an md5. Some OSes attempt to provide truly random number generation at the kernel level, in which large amounts of entropy are gathered from various hard-to-reproduce system sources and used in the seed. My point is, however, that the random number generating algorithm will be the single point of failure at that point - just as an md5. Unlike an md5 however, most random number generators have a distinctive signature - in other words, given a sequence of generated numbers, one can eventually determine the algorithm used... and from there, predict the next number. Ouch.
This sort of prediction is actually easier (given a sequence of significant length) than reversing an md5 into plaintext, since the plaintext length is not known by the attacker. I am taking into account the idea that the md5 attacker would have access to just as many hashes as the random-id attacker would have ids.
Note that you can add in all sorts of goodies into the text being used for an md5 - the destination email address, the id of the user the link is for, your favorite pet's name - and make it as long as you like. Things like the pet's name, your favorite movie quote, etc, all add to the length of the text - which makes the attacker's life harder. Very short plaintexts are the only easy ones to decode. In fact, if you store the hash associated with a user id and particular link, you never even have to send the user id in your link in the first place. This is analogous to keeping the user id in a session object as opposed to just being in the url string or post data on your website where anyone can see it (and try to mess with it). The less your attacker knows about what your data looks like, the better.
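The seed behaviour described above is easy to see for yourself. A small sketch using Python's `random` module as a stand-in for whatever generator is actually in use (that substitution is an assumption), next to the `secrets` module, which draws from the OS entropy source instead of a reproducible seed:

```python
import random
import secrets

def sequence(seed, count=5):
    """First few 32-bit outputs of a generator started from seed."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]

same_a = sequence(1071446400)   # hypothetical timestamp used as seed
same_b = sequence(1071446400)   # same seed again: identical sequence
other = sequence(1071446401)    # one second later: a different sequence

# OS-backed randomness has no seed for an attacker to replay.
os_token = secrets.token_hex(16)
```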

As a single point of failure, both forms stink. Anyone could intercept the link sent to either user. Only authentication at the destination url can help prevent that sort of attack.

What is this for, anyway?

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7731073)

I just read all that you said. Interesting. I'll have to give myself time to chew on it.

[Also, I just re-read what i wrote. Please accept my confidence as a belief in my positions (and trust in my intuition) not as arrogance.]

Though, a couple immediate points.

the random number generating algorithm will be the single point of failure at that point - just as an md5. Unlike an md5 however, most random number generators have a distinctive signature - in other words, given a sequence of generated numbers, one can eventually determine the algorithm used... and from there, predict the next number. Ouch.

I disagree. If the random number is seeded each time by the current timestamp, the user would have to know the exact time that was generated, what other jobs the machine was running (affecting the time the seed is given), and the order, otherwise knowing what time the job was running doesn't help.

MD5, however, in the end is predictable. If one knows the method of hash used. The hash could then simply be redone by the attacker, as long as what is hashed is also sent.

IOW, for a single point of failure with a random number, the entire process must be known and replicated. With a hash, it merely must be known.

Good point though.

As for the length. Once the length is set, *any* method can choose a series of digits of that length. Using random, guarantees that all possible sequences could be used. A hash has no such guarantee.

The less your attacker knows about what your data looks like, the better.

Very, very true.

As a single point of failure, both forms stink. Anyone could intercept the link sent to either user. Only authentication at the destination url can help prevent that sort of attack.

True. But there is only so much the user can be expected to do. One can send to their email address, and hope for the best.

What is this for, anyway?

Permission-based email. For some things, the exact user must be distinguishable (bounces, deals, referrals, etc.)

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7732167)

I disagree. If the random number is seeded each time by the current timestamp, the user would have to know the exact time that was generated, what other jobs the machine was running (affecting the time the seed is given), and the order, otherwise knowing what time the job was running doesn't help

I had originally thought you would be sending out many mails at the same instant - a newsletter of some sort, in which one seed would be used, hence my talk of a sequence from a single initial seed. Using a different seed each time is much better; however, there are only so many algorithms to choose from - and knowing the approximate time the message was sent, an attacker could work backwards to try different generators on each millisecond in a 5 second period, rather easily... time consuming, but still not impossible.
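A rough sketch of that backwards search, under loud assumptions: the id is the first 64-bit output of a generator seeded with the send time in milliseconds, and Python's `random` module stands in for the real generator (it is not Oracle's, and the names `token_for` and `recover_seed` are made up). The attacker only needs the approximate send time and one observed id.

```python
import random

def token_for(seed_ms):
    """Hypothetical link id: first 64-bit output after seeding with a timestamp."""
    return random.Random(seed_ms).getrandbits(64)

def recover_seed(observed, approx_ms, window_ms=5000):
    """Try every millisecond around approx_ms until the id is reproduced."""
    for candidate in range(approx_ms - window_ms, approx_ms + window_ms + 1):
        if token_for(candidate) == observed:
            return candidate
    return None

true_seed = 1071446400123                 # known only to the sender
observed = token_for(true_seed)           # what the attacker sees in a link
found = recover_seed(observed, approx_ms=1071446402000)
```

The search covers about ten thousand candidates, so the timestamp alone adds very little unpredictability unless something else the attacker cannot estimate goes into the seed.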

MD5, however, in the end is predictable. If one knows the method of hash used. The hash could then simply be redone by the attacker, as long as what is hashed is also sent.

Ah, but you don't want to send the original text pre-hash to anyone, you keep it to yourself (just as you would a random id) and either know how to rebuild it, or have it stored. This is where the 'secret phrase' and such comes into play. The hash algorithm may be known, but the length and type of info used in the hash will not be known. Grab a webpage at random and feed it through an md5. Then change the first letter of the page and try again, and so on - simulating changing a single value (the countervariable) within a long string. Each md5 will be completely different from the others. This is why md5 is used as a 'checksum' for many downloaded files. Since md5s do not contain any information about the length of the original text, figuring out the secret, encoded string is near impossible. Only in the most trivial of cases (someone md5s the string '5' or 'the') is it computationally realistic that they will ever find the plaintext. md5ing the whole message, or the subject - that would be reproducible. md5ing the user id, a timestamp and a phrase only the server knows is near impossible to reproduce unless they've A) viewed your sourcecode and B) you didn't change the secret phrase when you compiled (versus what you release/download). If you're storing the hash, as you would a random number, you don't even have to know how to rebuild it, so it could be a hash of something completely odd that they wouldn't have access to unless they were on your server - such as the user-id, plus a timestamp, plus the text returned from a wget of slashdot (which, due to 'this page was built by a' and the # of comments, changes quite often).
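The "change one value and the whole md5 changes" experiment can be tried in a few lines. A sketch, assuming the counter-plus-secret layout from earlier in the thread; the helper `differing_hex_digits` is made up for illustration:

```python
import hashlib

SECRET = "this is your secret string"  # assumption: kept on the server

def onewayid(counter):
    """md5 of the counter followed by the secret, as hex."""
    return hashlib.md5((str(counter) + SECRET).encode()).hexdigest()

def differing_hex_digits(a, b):
    """Count the positions where two equal-length hex strings differ."""
    return sum(x != y for x, y in zip(a, b))

h1 = onewayid(1000)
h2 = onewayid(1001)   # counter changed by one
print(h1)
print(h2)
print(differing_hex_digits(h1, h2), "of 32 hex digits differ")
```

Consecutive counters give digests with no visible relationship, which is why seeing one link does not help in guessing its neighbour.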

md5 does not guarantee every possible combination will be used, as you pointed out, but a flat distribution across a range is not what you get (or want!) in a random number generator. All but the best (CPU expensive) generators will tend to streak if not reseeded, and will have attractors in the algorithm that will cause 'crowding' around certain values.

I accept and understand your confidence, but I do believe that either A) you were thinking of md5 as digesting the whole message or sent text or B) you are giving random number generators too much credit and would benefit from some research into cryptographic theory (as would 99.999% of all developers, so it's no big deal), which would cover random number generators and digest systems. Please try to realize that any/all arrogance I may seem to have is just how I'm expressing my experience and confidence in my own knowledge.

Re:Hashing (1)

huckda (398277) | more than 10 years ago | (#7732589)

The next time you guys hold a class send me an invite! This was a very worthwhile thread although I'm still trying to process it all!
(visual learner here)

--Huck

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7734355)

I had originally thought you would be sending out many mails at the same instant - a newsletter of some sort, in which one seed would be used,

Yes, a newsletter. However, a re-seed is more secure, and negligible timewise, especially since the generator would need to generate a new number anyway.

However, there are only so many algorithms to choose from - and knowing the approximate time the message was sent, an attacker could work backwards to try different generators on each millisecond in a 5 second period, rather easily... time consuming, but still not impossible.

The algorithm here would very specifically be Oracle's number generator. However, knowing the sequence is still hard. Here's why.

First, there is an assumption of order in the records returned from the database. So, if one record is known (and the seed and algorithm) the rest could be figured out. However, in reality, the query returns rows in different orders (depending on caching, and when rows move) making the order more chaotic. Second, because a new seed is used for each record, the load on the machine affects the seed. This is also mostly unpredictable, as even the temperature level in the room could affect performance.

So, i would venture to say, that the cracker would have to replicate the system, not just know it.

md5ing the user id, a timestamp and a phrase only the server knows is near impossible to reproduce unless they've A) viewed your sourcecode and

That's the point. If they see it, there would be that single point of failure. That alone should not disqualify the system. However, if another algorithm is equal to it, that point should then be considered. (Which is my main opposition argument here.)

If you're storing the hash, as you would a random number, you don't even have to know how to rebuild it, so it could be a hash of something completely odd that they wouldn't have access to unless they were on your server - such as the user-id, plus a timestamp, plus the text returned from a wget of slashdot (which, due to 'this page was built by a' and the # of comments, changes quite often).

Yeah, but grabbing a webpage takes a lot longer than a seed.

Either way, this now uses md5 as a random number generator!

but a flat distribution across a range is not what you get (or want!) in a random number generator.

I didn't think of that. Good point.

A) you were thinking of md5 as digesting the whole message or sent text

Yes. And here's why. If the text is not sent, it must be stored in the database. That text is either predictable (id) or unpredictable, and must be stored. If stored, the reverification later must retrieve the text and md5 it. Which takes longer than just retrieving a random number and matching it. Assuming a great number of hits in an initial response, processor time should be conserved.

you are giving random number generators too much credit and would benefit from some research into cryptographic theory (as would 99.999% of all developers, so it's no big deal)

Half and half. I put confidence in it to make it mostly random, assuming a new seed is given. I would be surprised to find that Oracle's random number generator was that bad.

I may seem to have is just how I'm expressing my experience and confidence in my own knowledge.

The confidence is very important to me. I want to fight for every point, and have someone fight back just as powerfully. (According to the MBTI (and Keirsey) that's an INTJ trait. :))

Re:Hashing (1)

Coventry (3779) | more than 10 years ago | (#7737574)

ENTP here :) (on most days)

Oracle's random number generator is much better than most OS generators - If I remember correctly, only OpenBSD was significantly better.

The webpage idea was just an example :)

This sounds like an interesting system you're working on - is this day-job related or pet-project?

Re:Hashing (1)

Chacham (981) | more than 10 years ago | (#7738169)

ENTP here :) (on most days)

All the ENTPs i know are very happy people, and a joy to be around.

Oracle's random number generator is much better than most OS generators - If I remember correctly, only OpenBSD was significantly better.

Ah, good. Thanx.

The webpage idea was just an example :)

Yeah, I know. But there it is. Other than a timestamp, retrieving unique data takes a bit of time. Whereas the web data is more random, it takes that much longer to get.

This sounds like an interesting system you're working on - is this day-job related or pet-project?

I used to work for this company. Now, i am a consultant. I work with the project a lot (i designed the original schema, and did a great deal of the original coding for the logic and generation), so i have an interest in it. This particular point is not something i am working on, but i wanted to hear debate on it. They came up with the encrypting idea, and i saw the problem, and decided to answer it my own way (unique id and a random number). I was hoping someone like you would do what you just did, so the ideas are challenged.

Thanx. :)

timestamp through crypt (0)

Zarf (5735) | more than 10 years ago | (#7719317)

What's wrong with just using the timestamp run through crypt? Unique number gets encrypted producing a unique hash that's reasonably unpredictable...

I was about to explain a system I devised that did something like this but I just realized it is probably still copyrighted by my old boss so I can't tell you about it. A real shame too, I rambled on about it for three paragraphs and now this is all you get for a post. Sorry.

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7719640)

What's wrong with just using the timestamp run through crypt? Unique number gets encrypted producing a unique hash that's reasonably unpredictable...

Why can't anyone else do the same thing? Besides, with many emails being sent every second, there would be some possible duplication.

A real shame too, I rambled on about it for three paragraphs and now this is all you get for a post. Sorry.

Heh. I also had a much larger post first. Got two paragraphs described in detail, and then realized that they probably wouldn't like me posting that much information. :)

So, thanx. I do understand. And, i'd love to hear any ideas you could tell me.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7719780)

Why can't anyone else do the same thing? Besides, with many emails being sent every second, there would be some possible duplication.

Ah, I missed that point. So I guess I should ask: What exactly is the purpose of the scheme? Is it to prevent users from getting at things they are not supposed to? (use a login system) Is it to keep them from asking for data outside its approved time-frame? (login with a timer?) What exactly is it you really need to do and why? Why can't you use a login? Why can't you use referrers? Why can't you use a CGI to barf back the binary bits from a file with the right mime headers on it so that they have to pass through validation first before getting data?

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7719917)

It's a link inside of an email.....

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7722691)

It's a link inside of an email.....

Yeah, so? They can't use a browser?

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7724679)

The link has to uniquely identify the email, in a pretty non-guessable manner.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7724884)

So, check-sum some part of the email that is different for each user.

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7727792)

But how would one know what email was sent and when?

Also, if more than one email are essentially the same, the only thing different per user would be minimal.

Finally, different people do have the same name.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7733047)

Checksum the header, since each TO address is different. The checksum for each header will be different. Keep the checksum as an ID for each e-mail. Each e-mail will have a different ID even if the differences are minimal. Checksum was designed to give radically different results for very similar inputs.

You don't have a problem unless there are two people with the same name and the same e-mail address. If that happens I'd say that those people have a really big problem. I'd even say that it wasn't your problem to worry about.

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7734183)

Interesting.

One issue is, though, that the to address does not always come back. For example, if the to address is a forwarder, and the actual address bounced it, the to address in the bounce will be incorrect.

You don't have a problem unless there are two people with the same name and the same e-mail address. If that happens I'd say that those people have a really big problem. I'd even say that it wasn't your problem to worry about.

Agreed.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7734677)

One issue is, though, that the to address does not always come back. For example, if the to address is a forwarder, and the actual address bounced it, the to address in the bounce will be incorrect.

So if you can id your e-mails by this scheme, you can validate the bounce as an authentic bounce and use the checksum embedded in the e-mail as an identifier... message the original e-mail telling them they're getting removed... and remove the e-mail addy associated with the identifier.

If you want to hire me as a consultant I used to hire out at $89 an hour... I'd probably go for a lot less now. ;)

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7735008)

So if you can id your e-mails by this scheme

Why is that better than a random id?

you can validate the bounce as an authentic bounce and use the checksum embedded in the e-mail as an identifier

Actually, that scheme won't work. Some bounces won't even return the text.

Anyway, that isn't the issue. They have bounces taken care of pretty well. It was just an example of why checksumming the email address won't work.

If you want to hire me as a consultant I used to hire out at $89 an hour... I'd probably go for a lot less now. ;)

Wish i could help. But i am a consultant as well. :) This problem was something they were dealing with, and there were different ideas. I wanted to see what others thought. That's how i can learn.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7739586)

Why is that better than a random id?

Huh? Did I say it was better than a random id? I'm suggesting how you get your random id... at least that's what I thought I was suggesting last week.

If you do a crypt on something you can challenge on that something and find out if they know it without having to know or reveal the secret. So you could do the crypt, checksum, md5 or what ever on some secret and then check the thing that comes back to see if they are the same.

There are three forms of security: Challenge based as in passwords and secret handshakes, token based as in keys and id cards, and bio-metric as in facial recognition or finger prints.

If you have a token based security system which is dependent on the token of an e-mail you must have a method of validation that validates the authenticity of the e-mail. By using the pure untainted e-mail complete with user name in it you have a token that will be unique by user and by e-mail ... and ... which you can recreate to test the validity of such. So in theory you could challenge based on what was in the link id and the link id should come out the same only when you get the same data to the challenge.

Are you defending against users sharing links? Are you defending against users using the same link twice? Users sharing a login id? What is valuable on this system and what are you defending?

For most clients a timestamp plus something else like a counter or an e-mail addy sent through crypt is enough to generate a "random number" then later you can do this:

User Alice just authenticated and accessed link Phil. A link was sent to Alice at timestamp Bob. Alice plus Bob through crypt is Joe NOT Phil ... Phil was sent to Malory ... what the hell!!!

Try that with a random number.
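A sketch of the check in the Alice example, with md5 plus a server-side secret swapped in for crypt (an assumption made so the example runs on a current Python; the names `link_id` and `link_belongs_to`, the addresses, and the timestamp are likewise made up). The point is that the id can be recomputed from who the mail was sent to and when, so a mismatch is detectable without storing the id itself.

```python
import hashlib
import hmac

SECRET = "this is your secret string"  # assumption: kept on the server

def link_id(recipient, sent_at):
    """Recreatable link id from recipient, send timestamp and secret."""
    text = f"{recipient}|{sent_at}|{SECRET}"
    return hashlib.md5(text.encode()).hexdigest()

def link_belongs_to(user, sent_at, accessed_link):
    """True only if the accessed link is the one generated for this user."""
    expected = link_id(user, sent_at)
    return hmac.compare_digest(expected.encode(), accessed_link.encode())

bob = 1071446400                                   # timestamp of the mailing
joe = link_id("alice@example.com", bob)            # link sent to Alice
phil = link_id("mallory@example.com", bob)         # link sent to Mallory

# Alice authenticated but hit Mallory's link: the recomputed id disagrees.
alice_used_phil = link_belongs_to("alice@example.com", bob, phil)
```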

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7741947)

The main issue here is to uniquely identify links, and protect against them being guessed. Knowing the correct link is not too bad, but it may point where it shouldn't, or allow the user to access something not otherwise available to him.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7743238)

The main issue here is to uniquely identify links, and protect against them being guessed. Knowing the correct link is not too bad, but it may point where it shouldn't, or allow the user to access something not otherwise available to him.

I've gone back and re-read my posts and some of the others in this JE. Really, a "random" number is fine for the id of the link in a non-guessable way. How you generate that random number is very important since you want it to be really random and really unique. There is no point at which you can escape the fact that you will have to validate users.

I say this meaning that you will have to verify the identity of a user to be certain that user "A" has not e-mailed link "A" to user "B". You must validate your user at some point independent of the link token. You don't have bio-metric validation as an option so you're stuck with usernames and passwords.

It's fundamental. The e-mail link is a token that can be easily forged so you must have a second form of validation. There is no way around this without special e-mail readers.

Re:timestamp through crypt (1)

Chacham (981) | more than 10 years ago | (#7748032)

OK, read it.

Thanx for your input. Seriously. I appreciate it.

Re:timestamp through crypt (1)

Zarf (5735) | more than 10 years ago | (#7748267)

Sure. It's good exercise.