New Peer-to-Peer Designs
We've received a lot of peer-to-peer submissions, including the one that follows and this one. Perhaps people will post links to those systems which they think have a decent chance of solving the known problems of p2p networks? PureFiction writes "Given the recent ruling against Napster and the various happenings at the O'Reilly P2P conference, this is a good time to mention a new type of decentralized searching network that can scale linearly and adapt to network conditions. This network is called the ALPINE Network and might be a future alternative to searching networks like Napster and Gnutella while remaining completely decentralized and responsive. You can find an overview of the system as well as a Frequently Asked Questions document on the site."
Re:Not there yet.. (Score:1)
Now, perhaps a T1 is a wide enough pipe for, say, 100,000 users. Maybe at some point the network will scale beyond this, and you'll need a T3, etc etc. The point is not to search the entire network, but to search a large enough segment of it to find what you are looking for.
If you are searching for something extremely rare (or nonexistent) and your bandwidth is small with respect to the scope of the network, you may be required to cycle your connections many times until you achieve hits. As intended, the network allows you to search at the maximum speed allowed by your bandwidth--but gives you the option of doing a long (but exhaustive) search regardless of whether you have a 14.4 or a gigabit connection.
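A quick back-of-the-envelope sketch of that trade-off (the 60-byte query packet size is a figure quoted elsewhere in this thread; the link speeds and peer counts are just illustrative):

```python
# Back-of-the-envelope search time: how long does it take to send one
# query packet to every peer in your group, given your upstream link?
# Assumes 60-byte query packets and ignores replies and latency.

PACKET_BYTES = 60

def exhaustive_search_seconds(peers, link_bits_per_sec):
    """Seconds to send one query packet to each of `peers` peers."""
    bytes_per_sec = link_bits_per_sec / 8
    packets_per_sec = bytes_per_sec / PACKET_BYTES
    return peers / packets_per_sec

# A 14.4k modem searching 10,000 peers vs. a T1 searching 100,000:
print(round(exhaustive_search_seconds(10_000, 14_400)))      # 333 seconds
print(round(exhaustive_search_seconds(100_000, 1_544_000)))  # 31 seconds
```

So even an exhaustive search stays bounded by your own pipe, which is the point being made above.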
You just described the Napster protocol. (Score:1)
split the search tasks up into hierarchies. you search for N inside a given range. if the result can't be found within that range, you propagate the request up the tree
Which is exactly how Napster works. And look what happened to the company that hosts the biggest nap network [napster.com].
All your hallucinogen [pineight.com] are belong to us.
Re:Taking P2P Too Far (Score:3)
Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network
Funny, I thought web servers acted this way...
Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe.
If you find the reply you're looking for, then there is no need to query the remaining peers. Also, you will not clog the incoming pipe; I've covered this quite a bit. You control how many queries you send out and when, and also to which peers they are sent. The adaptive nature of the protocol ensures that successive queries will be more likely to find what they are looking for sooner.
You would only query 10,000 in a worst case scenario.
The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims.
No, each of these 'victims' would only receive a single 60 byte packet. This is the opposite of a DoS attack, as you are sending a large number of packets, but each peer is only receiving one of them.
Omnifarious, your ideas are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature.
I understand what you're getting at, but you're missing the main purpose of this network. If you need to search a large number of peers for dynamic content in real time, you need to reach all of them to do it. Whether you do this using a tree/routing/forward approach, or a single peer using multiple unicast packets, you have to reach them to do it.
The design of this network is so that the resources you use are your own and that you can tailor the bandwidth, peers, and effectiveness of the search to your own preferences.
This is a highly specific network architecture with a very specific purpose using very small packets. This is why alpine can bend the common conceptions about scalability and performance and still remain efficient and scalable.
Re:Actual file transfer and anonymous usage though (Score:1)
Actual file transfer and anonymous usage thoughts. (Score:2)
What about actually getting the data? Is it transferred over DTCP as well?
ALPINE will be heavily dependent on alternate delivery systems to actually *transfer* the data located within the network. The entire ALPINE network is primarily used for information location. This is the big hole in most peer based networks, as it is probably one of the more complex tasks. Once a resource has been located, you may use OceanStore, Freenet, Swarmcast, FTP, etc., to actually retrieve the data. Trying to transfer anything of decent size over DTCP would be insane.
So it sounds like initially they require the user to utilize two different programs to achieve their goal: 1) Alpine to *find* the data, 2) something else to get it. I think in order for this system to reach widespread use (especially in the Windoze community), these two functions need to be combined into one interface. Isn't that what helped proliferate Napster: people who barely know how to turn on the computer can quickly find and download stuff from one program. Perhaps they will incorporate both 'features' into a final product... or did I miss that in the FAQ?
Secondly, doesn't this facilitate finding an end user's location? After finding the information, now I get to manually enter the IP address into FTP to connect and download. Does this not make it easier for a program to simply track down 'file X', log IP addresses to a file, and then resolve these IPs and hunt down the users? It would seem that in the early stages of the network's growth, it could be easily quashed by the corporate forces, as the number of users would be small and easy to track down/handle. OTOH, if it scales as easily as it says it does into the billions of connections... at that point it might become futile trying to track down and wrangle up everyone. Still, industries could start going after random individuals and will probably enact a new law dispensing severe penalties for those caught (probably from precedents set nowadays in the Napster case).
This is where having the same program find the files and transfer them could come in handy. Instead of ever presenting the final address, perhaps it could transfer this data amongst the network in an encrypted fashion. Then when the user sees a match has been found for the data/file being searched, he/she tells the program to get it. Keeping the addressing route encrypted within itself should help the issue of anonymous usage (I think this was mentioned earlier already as well).
Interesting system concept anyhow (what with the multiplexing schema).
Not your normal AC.
Re:Taking P2P Too Far (Score:2)
Oh ok. But that means I start all over again with the "adaptive" process each time I 'log on'. Probably ok, since statistically I'd be getting a different group each time. (Ever notice how people in your Napster Hotlist were never on again?)
I don't see your point. Each of the 10,000 'me's would have their own ISP, and would use their own bandwidth.
How do the 10,000 others query him without getting to him? They have to get to his computer somehow. Thus...10,000 searches (at a time) going through the 1 client's bandwidth. (replace 10,000 with whatever number we're working with here).
Yes, I've seen lights on when I was on Napster. But all the searching was one-directional: to Napster's server. You're bypassing that now. So that means more bandwidth coming to me.
Rader
Re:Taking P2P Too Far (Score:1)
I don't see your point. Each of the 10,000 'me's would have their own ISP,
WTF??? No more than one user per ISP?
and would use their own bandwidth.
My God, you are so fucking stupid. Bandwidth is never entirely "your own" - unless we're talking about an isolated home LAN - it is shared with many other users of your ISP (because the ISP has only a limited number of outgoing pipes) and from there on with the other users of the intervening networks. If 1,000 people on one ISP start clogging up that ISP's pipes with this crap, and that ISP has a clue, they will kick those people off.
You obviously have no clue about real world networking problems. You'd even make a bad marketroid. I suggest you become a management consultant.
Amazing new P2P Idea, our problems are solved (Score:3)
It's called Archie, and the meta-search tool is called Veronica. You should try it out; it's amazing.
Re:Not there yet.. (Score:2)
But if a modem user can only have 200 people to be connected to, then only those 200 people can be connected to him. That means even if I had an OC48 running and "could" handle everyone's connections, I'm still off-limits to the modem people that have already "picked" their 200 people.
If that's not the case, then maybe you're thinking that a modem person is only allowed to search 200 people's files, but can receive searches from EVERYONE. That seems backwards, since the network-hogging activity is the 1,000s of search requests you receive.
Rader
P2P = Push (Score:1)
I think these companies are soon going to experience "Marimba Syndrome" - if you recall, Marimba rode the push wave of '97 until it became painfully apparent that the push emperor had no clothes, then they tried to distance themselves from it as quickly as possible and never really recovered.
Re:Not there yet.. (Score:2)
Not to mention the problem of having the searcher send out an individual query to every client it wants to search. If I understand this correctly, if I want to search 3,000 hosts I have to send out 3,000 otherwise identical packets. This is not what is known as a scalable protocol. In fact, from a network point of view, it's a worst-case scenario.
Problems ahead for a Windows client (Score:3)
peer to peer will always survive (Score:3)
The main problem legally with Napster is that there is a central server. That problem is being solved by having multiple and/or moving servers. This makes it much harder to levy a lawsuit against anyone.
We all know Napster works, but it's illegal (or will be soon). Warez is illegal, but it will never go away because you just can't prosecute.
Re:Not there yet.. (Score:2)
Let's take your example... of searching only what you need. Or if it's rare, and you need to slowly search everyone. I'm just saying that a T3 guy who can handle everyone is still off-limits to the modem guy that can only handle 300 people, and he's maxed out with the 300 nodes already 'picked'.
Rader
Re:What about hotline? (Score:2)
Because Hotline is a pain. Yes, it's simple enough to use and has potential, but the reality is pretty unpleasant to use. You find that someone has something you're interested in. You have to see what particular hoops you have to jump through to get access "...go to this web site and sign up for this spam-bait..." To hell with that. I share over AudioGalaxy [audiogalaxy.com] and Napster [napster.com] and the BearShare [bearshare.com] Gnutella client because I like to share, not to try to make a nickel from people.
Probably, but if you're interested in MP3s, and are not looking for movies or warez, everything else is less of a pain.
P2P - not so great, but.... (Score:1)
obviously most of us are smart enough to realize that "p2p" and the "2-way web" are just rephrasings of the ideals, and regroupings of the same technology, that the internet is based on. for those of us that know what to do already, we just see a regrouping and proprietizing of services and a deviation from standards.
but then again, the web browser (and things spawned from it) is the interface Joe Blow knows well. *sure* he could run Apache, use an FTP client, use Gopher or WAIS, fiddle through IRC and newsgroups... but all that came and went, arguably, when Netscape made the web browser big.
my dad, a 48 year old man, doesn't like to juggle a different app for every service. however, my dad could easily be a non-technophile entrepreneur or a small business owner or an engineer of some sort running some sort of over-net collaboration...
p2p is "amazing" to these people because it funnels all these other "mysterious" services into one window that they're willing to pay attention to.
p2p == buzzwords. crap. silly. etc. but, then again... lots of ideas are recycled. very few things are "revolutionary" or "insanely great."
Vaporware? (Score:1)
However they are going slow as bananas - for obvious reasons. The people making p2p are legit, while most p2p users are not (pirates and the like - come on, how many legit people are on Napster?). The people that want to use p2p the most are the pirates - i.e. it's safer... however, ironically, they are the ones that are willing to put in the least amount of effort...
Don't get me wrong, I want p2p - I just think that it is funny every time someone starts complaining about how sloooooooowwww it is going when they are not willing to contribute that much.
What p2p needs is a set of rules/standards - or a company to make and release a freeware version of something and keep it updated. Ordinary programmers meeting over the internet is going to take forever - example: Freenet - to produce a working product.
Yawn... [I prefer IRC or HL anyway]
Re:Taking P2P Too Far (Score:2)
Perhaps you misinterpreted what I meant with that statement.
Regardless of any peering application or network you use, you will be using bandwidth. If this application is maxed out, you're using *all* of your allocated bandwidth, i.e. your pipe. This happens all the time.
ISPs continually operate at near peak usage. They don't leave lots of empty bandwidth lying around because someone *might* use it.
Also, 1,000 people on ALPINE would be no different than 1,000 people on Napster, 1,000 people on Freenet, etc. Show me a peering application that does not maximize use of your bandwidth in a large network.
And finally, you can tune the amount of bandwidth you use. If you want to use half of your DSL line, and leave the other half free for surfing, etc. you can. UDP gives you complete control over when and how large a packet is sent. TCP cannot do this, you can only send a buffer, and it may go out as one packet, maybe five, it may be delayed a fraction of a second, etc.
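As a sketch of the kind of sender-side pacing being described here, a simple token bucket over a UDP socket would do the job; the class name, rate figures, and bucket details are invented for illustration, not taken from DTCP:

```python
# Sketch of sender-side pacing with UDP: you decide exactly when each
# datagram leaves, so capping query traffic at, say, half your uplink
# is a simple token bucket. Rates here are illustrative only.
import socket
import time

class PacedUdpSender:
    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.tokens = float(bytes_per_sec)   # start with a full bucket
        self.last = time.monotonic()
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _refill(self):
        # Accrue budget proportional to elapsed time, capped at one second's worth.
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, payload, addr):
        """Send one datagram if the budget allows; return True if sent."""
        self._refill()
        if self.tokens >= len(payload):
            self.tokens -= len(payload)
            self.sock.sendto(payload, addr)
            return True
        return False

# Use half of a 256 kbit/s DSL uplink for 60-byte queries:
sender = PacedUdpSender(bytes_per_sec=256_000 // 8 // 2)
sent = sender.try_send(b"x" * 60, ("127.0.0.1", 9))
```

With TCP you only hand the stack a buffer and hope; here every `sendto` is an explicit, sized, timed decision, which is the control the post is talking about.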
Re:Color me stupid, but (Score:1)
But you're on the right track. The difficulty is building a client that acts as a server too, while also being able to perform a distributed search of other clients.
Cheers,
Chris
Re:Taking P2P Too Far (Score:2)
You don't have to. Part of DTCP provides persistent connections. You can resume a connection when you log on, even if your IP and port changed in the interim. So, you only need to start the adaptive process whenever you create a new connection.
Thus...10,000 searches (at a time) going through the 1 client's bandwidth. (replace 10,000 with whatever number we're working with here).
Yes, you are correct. And that is why slow users would have a smaller peering group that they are connected to, as well as throttling peers who query too aggressively. They can even outright ban peers who are abusing bandwidth.
They may also use a proxy, which would handle the replies.
And last, you control how many peers you query and when. If you find what you're looking for after querying 100 peers, then there is no need to query the rest.
Likewise, if you start getting a large number of responses, you can slow or halt the broadcast of additional queries.
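A minimal sketch of that iterative halt-early loop, with simulated peers standing in for real network replies (the function names and the every-5th-peer distribution are invented for illustration):

```python
# Minimal sketch of the iterative query loop described above: queries
# go out one peer at a time, and the sender halts as soon as it has
# enough hits. Peers and replies are simulated, not a real ALPINE API.

def search(peers, query_peer, enough=100):
    """Query peers in order; stop early once `enough` hits arrive."""
    hits, queried = [], 0
    for peer in peers:
        queried += 1
        reply = query_peer(peer)         # one small round-trip per peer
        if reply is not None:
            hits.append(reply)
        if len(hits) >= enough:
            break                        # halt the broadcast early
    return hits, queried

# Simulated network where every 5th peer has the file:
has_file = lambda p: f"result-from-{p}" if p % 5 == 0 else None
hits, queried = search(range(10_000), has_file, enough=100)
print(queried)  # 496 -- far fewer than the 10,000 peers in the group
```

The worst case (querying everyone) only happens when almost nobody has what you want; the common case stops long before that.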
Re:Actual file transfer and anonymous usage though (Score:2)
You are correct, and they are combined. Right now a simple TCP transfer ala FTP/HTTP will be used, with additional transfer types provided using pluggable modules.
Secondly, doesn't this facilitate in finding an end users location? After finding the information, now I get to manually enter the IP address into FTP to connect and download. Does this not make it easier for a program to simply track down 'file X', log IP addresses to file and then resolve these IP's and hunt down the users?
Only if the reference you provide for the content is on your machine. You may simply provide a Freenet key and the user can then obtain the file anonymously using Freenet. You may provide an FTP location on some offshore server that is outside the bounds of US jurisdiction. It could be anywhere. The majority may be on your machine, but this isn't a requirement.
Instead of ever presenting the final address,
perhaps it could transfer this data amongst the network in an encrypted fashion
The final address is only used during a reply. Where you actually get the data is another issue. So, for the paranoid, they may always upload their music into Freenet, but locate it using ALPINE.
This would be the best of both worlds for fast searching and anonymous downloading.
Re:Not there yet.. (Score:2)
This amount of bandwidth is not a constant. Each connection shares total bandwidth b and then the adaptive nature of the alpine protocol as well as filtering and throttling ensure the fair use of this limited bandwidth. You can use as much as you want, this is a configurable setting in the DTCP stack.
If you are searching for something extremely rare (or nonexistent) and your bandwidth is small with respect to the scope of the network, you may be required to cycle your connections many times until you achieve hits. As intended, the network allows you to search at the maximum speed allowed by your bandwidth--but gives you the option of doing a long (but exhaustive) search regardless of whether you have a 14.4 or a gigabit connection.
Yes, and this is a drawback, but no different than Napster for example. If you can't find your song in the 3,000 to 10,000 peers on the server you're at, you can keep searching, or try a different server.
The concept of per to per (Score:2)
Re:Taking P2P Too Far (Score:3)
Increasingly, in the era of second-gen content distribution networks, they don't. Where they do, they pay dearly for the privilege of sucking up so much bandwidth. I don't think you do yourself any favors by pushing a first-gen "solution" when the second gen is already out there and some people - such as myself - are already working on gen three.
You won't get the answer until you've already sent queries to the next batch. Net result: not only are you consuming all this bandwidth and creating all this congestion, then you turn around and drop those packets on the floor. That's just adding insult to injury, as far as your upstream is concerned.
Please describe how this adaptation occurs. The details are not on your website, it's a complex problem, and I think you're just handwaving about something you don't understand.
But the intervening routers are receiving them - and the replies - in huge clumps. That's just like a DDoS.
Re:Not there yet.. (Score:2)
That's correct. I haven't gone into connection cycling, but let's say one of those 200 is a lowly rated peer as far as quality is concerned (again, this ties back to the quality metrics).
This peer would probably decide to bump him off, and give you a chance. If you turned out to be a quality peer, you would migrate towards the top of his query list, and would be less likely to be bumped off in return.
If you're a rogue/leecher peer then you may end up in a situation where no one wants to allow you a connection, and your T1 goes to waste.
Re:Taking P2P Too Far (Score:2)
Yes, they do. Network traffic is notoriously bursty, so to accommodate peak usage a network provider does indeed overprovision, so that during non-peak there's a lot of unused bandwidth. There are actually some really neat opportunities there, for a heavy-data-transfer application that's smart enough to use time-shifting and caching to move traffic off-peak.
Wrong. Protocol affects bandwidth need/usage, and a brain-dead protocol will use more bandwidth than a cleverly-designed one to accomplish the same task. That's what I and apparently several others have been trying to tell you.
You're missing the point. There's a difference between effective bandwidth and total bandwidth (which includes protocol overhead). Maximizing effective bandwidth is good; maximizing total bandwidth is antisocial, and ultimately reduces the effective bandwidth left over for getting real work done. With your protocol, all of the capacity will be sucked into a black hole doing queries, and actual downloads will slow to a crawl.
Re:Peer to peer file sharing == piracy. You THIEFS (Score:2)
Sure, some apps might get stopped, but that just means we move back to something less crude. (until the next version of something nice comes out again)
Rader
Latency issues (Score:4)
What about latency? I don't want to wait 2 minutes for a reply!
Get a DSL Line! ;) Also, this is assuming you query the *entire* group. Part of the purpose of the ALPINE protocol is to adapt to the responses you receive during queries. The first query you make may take 2.5 seconds. The next, you may query the responsive peers first, and you may find what you are looking for in 1 second. The next query may be further refined, and your peers are organized so that you find what you're looking for in a fraction of a second.
You can only do this type of adaptive configuration, tailored to *each* peer and their use of the network, if you allow them to do the querying themselves, and order the queries themselves. This implies direct connections to the people they are querying.
You cannot perform this type of custom adaptive configuration without an extreme amount of overhead in a routed architecture, thus the need for DTCP.
I do not know about you, but an awful lot of users out there do not have high speed access yet. And I can think of many folks whose first action would be to search everything.
Remember, half the population is below average.
The ALPINE Network (Score:2)
I don't know what the technical merits are, but the marketing is solid! :)
Anonymous? (Score:1)
I'm not sure how important this is, but will the described "flat" structure of this system allow both source and destination to choose anonymity (assuming both ends agree to it) - If so, how can the end requesting anonymity guarantee that they really can't be traced?
Or perhaps this is just the imaginings of a madman?
--
Re:Problems ahead for a Windows client (Score:4)
How do you support 100,000 connections? Won't the host system crash long before then?
These connections are all multiplexed over a single UDP connection. This is one of the functions of DTCP, to provide a multiplexing protocol for use over UDP. The multiplexing value is 4 bytes, which allows for a theoretical maximum of over 4 billion connections.
This thing uses one single UDP socket, so I don't think porting it to Windows would be too hard, now would it?
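A rough sketch of what multiplexing logical connections over one UDP socket with a 4-byte connection ID might look like; the header layout here is a guess for illustration, not the actual DTCP wire format:

```python
# Sketch of UDP multiplexing: each datagram carries a 4-byte connection
# ID, so a single socket can address up to 2^32 logical connections.
# The header format is illustrative, not the real DTCP layout.
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian connection ID

def encode(conn_id, payload):
    """Prefix a payload with its logical connection ID."""
    return HEADER.pack(conn_id) + payload

def decode(datagram):
    """Split a datagram back into (connection ID, payload)."""
    (conn_id,) = HEADER.unpack_from(datagram)
    return conn_id, datagram[HEADER.size:]

# Demultiplex incoming datagrams to per-connection buffers:
connections = {}  # conn_id -> list of received payloads
for dgram in (encode(7, b"query"), encode(42, b"reply"), encode(7, b"more")):
    cid, data = decode(dgram)
    connections.setdefault(cid, []).append(data)

print(connections)  # {7: [b'query', b'more'], 42: [b'reply']}
```

Since the OS only ever sees one socket, the per-connection limits that would crash a host with 100,000 real sockets never come into play.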
Re:Taking P2P Too Far (Score:2)
No, queries only use half the bandwidth on a pipe at most, and can be configured to use less.
And I don't see what your implied difference is between effective bandwidth and total bandwidth.
I can log activity on my internet interface for napster, for freenet, for gnutella, and they all max out my available bandwidth.
I guess I don't understand what part of this you're implying is different? Bandwidth in the overall internet with regards to these services? Bandwidth through my ISP?
Re:Vague searches and definition of "a reply" (Score:2)
Yes, that was a poor response on my part...
What if your query isn't an exact match to one file? For instance, I'm looking for "songs by The Offspring, in
You can query as long as you want, however, if you received 100 replies you could automatically halt querying to see if they would suffice. If you want more, you can continue. This is not an atomic operation.
Also, the adaptive features would increase the likelihood that you would find those hundred hits faster with each successive query.
It's up to you how long, how far, and how fast you want to query.
From every single user who's searching. Say a user searches a 20,000 user network once every 10 minutes (this takes into account inactive users). You'll have to handle (on the average) 2,000 queries a minute, over 30 a second. That's not even counting peak use. Can your hardware and network connection keep up?
On a DSL line you could handle 40-60 queries per second; although, you would likely have a smaller network than this, or at least throttle or exclude the really noisy peers.
A DSL line can handle over a thousand a second. This shouldn't be a problem, and again, this is all configurable and adaptable. You get to make the rules.
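For what it's worth, the raw cost of those incoming queries is easy to sanity-check, assuming the 60-byte query packets mentioned elsewhere in the thread (the link speeds are typical figures, not from any spec):

```python
# Sanity-check: how many 60-byte query packets per second fit in a
# given link? Link speeds below are typical figures, not from a spec.

PACKET_BYTES = 60

def queries_per_sec(link_bits_per_sec):
    """Maximum 60-byte queries per second a link can carry."""
    return link_bits_per_sec / 8 / PACKET_BYTES

print(round(queries_per_sec(1_500_000)))  # 3125 on ~1.5 Mbit/s DSL
print(round(queries_per_sec(50_000)))     # 104 on 50 kbit/s dial-up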
But whenever I think of the obvious solution to this problem (proxies that cache search requests for a group of users), I realize that such a topology would be equivalent to that of the existing OpenNap network
True, and there are specific instances where a multitude of other systems would be more efficient and responsive than the ALPINE network.
This is not intended to be the be-all-end-all of peer searching. It is simply a usable completely decentralized searching network, and in some instances, this may be a very nice solution.
I can think of quite a few better ways to get certain things or locate certain things: Google, AltaVista, OpenNap, Napster, etc. etc... but none of these are completely decentralized searching networks. That is where ALPINE is intended to function.
Re:Taking P2P Too Far (Score:2)
Well, *if* I am the only one to query. In the general case, they would receive 60-byte packets for every query done in the network.
This is the major flaw of all gnutella-like systems. If only the client knows what is on its disks, then you can kiss scalability good bye, no matter how hard you try.
Cheers,
--fred
Are we focusing on the wrong problem? (Score:1)
My take on it (and I'm sure there are others that agree.. I spent a few nights up 'till 2am reading thesis papers on this stuff) is how you CATALOG the data. Filename keyword searches suck -- really.
I log onto Napster or another P2P network and I already know what I want because the radio has crammed some stuff my way I like. What I would like to see is mathematical representations of music "feeling", really. If I like a song I want to be able to grab its feeling signature, plop that into a search engine and say I want songs that deviate 5% or so from that. It's outside the realm of P2P technology... but it relates to it. I don't want to sift through artist after artist to find something I "jive" with.
This really applies to all media... take background pictures for my desktop. I like certain things, generally blue/green stuff (it's a personality thing). Discovering ways to mathematically represent this "mood" of something is, I think, totally essential for P2P to take off... and if it does take off, people now have a way of finding stuff they like in a concrete mathematical way.
It's far fetched... but it's very exciting to me. I've considered going back to college simply to expound my mathematical knowledge (instead of computer science) to get this kind of handle on things. It's "uber cool" in my mind... anybody else think so?
Justin Buist
Re:Latency issues (Score:2)
Below *median*
Cheers,
--fred
Re:Taking P2P Too Far (Score:2)
creating all this congestion, then you turn around and drop those packets on the floor. That's just adding insult to injury, as far as your upstream is concerned.
No, there is no batch. The query process is iterative, and can be halted or slowed at any point in time. While there may be a dozen to a few hundred packets in transit before you start receiving replies, you can slow or stop the process once you see that you have enough replies, or that you have found what you're looking for, or you just want to cancel.
Please describe how this adaptation occurs. The details are not on your website, it's a complex problem, and I think you're just handwaving
about something you don't understand.
Sure, there are various criteria that indicate a bad or good peer. These include, among other things:
- Did the peer respond to your query?
- Did the peer misrepresent the response?
- Is the file or resource valid?
- Is the peer sending you too many queries?
Etc. These properties, among others, control where in the list of peers to search an individual peer is located. A high quality peer, who often responds and has quality files, will be queried long before a peer that never responds will.
For negative behavior there are even ban lists and so forth to prevent them from bothering you further.
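A sketch of how criteria like the ones listed above could feed a score that orders the query list; the weights, field names, and scoring formula are invented for illustration, not taken from the ALPINE design:

```python
# Sketch of peer-quality scoring: responsive peers with valid files
# float to the top of the query list, misbehaving peers sink, banned
# peers are skipped. All weights and fields here are illustrative.
from dataclasses import dataclass

@dataclass
class Peer:
    addr: str
    responses: int = 0      # queries this peer answered
    queries_sent: int = 0   # queries we sent it
    bad_replies: int = 0    # misrepresented or invalid results
    banned: bool = False

    def score(self):
        if self.banned or self.queries_sent == 0:
            return 0.0
        hit_rate = self.responses / self.queries_sent
        return hit_rate - 0.5 * self.bad_replies  # penalize bad behavior

def query_order(peers):
    """Best peers first; banned peers are skipped entirely."""
    return sorted((p for p in peers if not p.banned),
                  key=lambda p: p.score(), reverse=True)

peers = [Peer("a", responses=8, queries_sent=10),
         Peer("b", responses=2, queries_sent=10, bad_replies=3),
         Peer("c", responses=5, queries_sent=10),
         Peer("d", responses=9, queries_sent=10, banned=True)]
print([p.addr for p in query_order(peers)])  # ['a', 'c', 'b']
```

Re-sorting the list after every round of replies is what makes successive searches find hits sooner, as claimed above.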
But the intervening routers are receiving them - and the replies - in huge clumps. That's just like a DDoS.
Only your initial upstream router is receiving them, and from there the packets fan out to their respective destinations. And any ISP that cannot handle the bandwidth generated by a customer has much more major problems.
Re:The concept of per to per (Score:1)
Re:Banner ad coincidence? (Score:2)
Rader
So ALPINE users should pack up and move? (Score:1)
if you received 100 replies you could automatically halt querying to see if they would suffice.
Sorry, I realized that as soon as I hit submit.
A DSL line can handle over a thousand a second. This shouldn't be a problem, and again, this is all configurable and adaptable. You get to make the rules.
But a good dial-up connection runs at only 50 kilobit/s. Are you saying that users who want to use ALPINE should pack up and move to an area where DSL is available?
This is not intended to be the be-all-end-all of peer searching. It is simply a usable completely decentralized searching network ... I can think of quite a few better ways to get certain things
Not to put down the ALPINE system, but I'm beginning to think a completely decentralized approach just won't work for dial-up users.
All your hallucinogen [pineight.com] are belong to us.
Re:So... (Score:1)
No way they'd resist...Kill someone, or have hot sex...hmmm..
Problem solved. Till 10 minutes later .
Rader
Re:Vaporware? (Score:1)
We released beta 1 at the p2p conference in SF today and things are moving at the speed of light in development. ESP Worldwide is funding the development and the program is free and open source (don't ask how we make money, it's too complex).
Re:Latency issues (Score:2)
Well, true, although with a genuine bell curve in a very large normal population, the results are identical.
(I gotta fix this caffeine deficiency problem I have from time to time.)
Peer to Peer networks are illegal. (Score:2)
Freedom of depress. The Linux Pimp [thelinuxpimp.com]
Re:Not there yet.. (Score:2)
They cannot query everybody on the network. They can only query everyone they are connected to. So, modem users would obviously have a smaller connection pool compared to a DSL user.
If a peer they are connected to is causing too much load, they can have them slow down, or drop them entirely.
Someone on a T1 connection may indeed be able to connect to just about everyone, but they would also have the bandwidth and memory to do so.
Not to mention the problem of having the searcher send out an individual query to every client it wants to search. If I understand this correctly, if I want to search 3,000 hosts I have to send out 3,000 otherwise identical packets. This is not what is known as a scalable protocol. In fact, from a network point of view, it's a worst-case scenario.
Worst case scenario is a forwarded broadcast. And at any rate, 3,000 queries to find what you're looking for is indeed a worst case search.
Part of the ALPINE protocol is the adaptive configuration of the query list so that quality peers are queried first, thus greatly increasing the chances that you don't need to query more than a few hundred to find what you're looking for.
Re:So ALPINE users should pack up and move? (Score:2)
area where DSL is available?
They would most likely try to locate a proxy peer. They may have to gain favor by providing quality content or resources, or simply ask nicely, to get a proxy connection, but they are not left out in the cold.
Also, the network is still usable for modem users. It may take a few minutes to locate something, assuming you have to search everyone in your peer group, but even that may be acceptable to most users.
The adaptive configuration of searching should reduce this as much as possible, so that you may only query a few hundred peers before you locate what you're looking for.
Napster is the New Internet (Score:1)
Re:Not there yet.. (Score:2)
I think I know what your answer is going to be from later posts, so let me see if I can guess. The answer is that clients will do the search, and if it fails, they will begin dropping the "lame" connections and opening new ones. The problem with this approach is that it will probably generate the same scaling issues that Gnutella faces. The overhead of clients constantly closing and opening connections might begin to consume significant amounts of bandwidth. Clients who are "full" can still have their bandwidth eaten up by other clients asking them to open new connections. Not to mention the problem of discovering clients to connect to in the first place. The addresses of clients need to be flooded through the network somehow. Nobody is going to type 200 IP addresses into their config! How well is this part of the protocol going to scale?
Re:Absolutely right (Score:2)
That said, there are a lot of college students with T3-quality connections. A lot of cable modems might be suitable also.
The hard part is designing the server-server protocol so that you don't waste any bandwidth with redundant copies of the same query. When a server sends out a query, it should get sent to every other server exactly once. This is a tricky problem. Fortunately, there is a solution. I hesitate to mention it for fear that some coder who isn't half as "l33t" as he thinks he is will screw things up.
The solution is: multicast. Most major universities are connected to I2 which provides native multicast. These same connections also tend to be very fast. This solution pushes all the complexity of getting the queries broadcast out to all the servers onto the network where it belongs. The problem that needs to be solved is a multicast problem. I.e., a single host sends a single packet to a subset of hosts on the network which want to receive that packet. All the servers need to do is join a given multicast group address. When they want to issue a query, they put it into a UDP packet and send it to the group address. The network automagically sends a copy to all the other servers listening to that group address. Any server who has an answer replies back to the issuing server directly. This functionality could probably be dropped into OpenNAP with about 100 lines of code.
Right now, most folks on the Internet don't have multicast connectivity. But, if you have a tiered network, only the servers need to be on the fast multicast connections. The clients just need to connect to one of the available servers via unicast. It's so simple I'm really amazed no one has done it already. Of course, now the scalability issue has been pushed into the network's multicast implementation. I'm guessing that you could build a network with a dozen servers without I2 even noticing. They probably wouldn't complain if you cranked it up to a hundred servers. When your network reached a thousand servers, you would start to get hate mail from the network administrators. But, would there be any need for a thousand servers? Probably not until you had over a million users. BTW, I'm guesstimating the number of servers you can have based on the experience of the ramen worm, which had a bug that created tables in the routers similar to a multi-thousand-server network.
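A rough sketch of the server side of this idea, in Python. The group address, port, and wire format (a two-byte length prefix) are all made up for illustration; an actual OpenNap patch would obviously differ.

```python
import socket
import struct

# Hypothetical multicast group and port for the server-to-server query channel.
GROUP, PORT = "239.1.2.3", 9999

def encode_query(text):
    """Length-prefixed UDP payload for a search query (assumed format)."""
    data = text.encode()
    return struct.pack("!H", len(data)) + data

def make_query_socket():
    """Join the multicast group so this server hears every query exactly once."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # IP_ADD_MEMBERSHIP takes the group address packed with the local interface.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_query(sock, text):
    """One sendto() to the group address reaches every listening server."""
    sock.sendto(encode_query(text), (GROUP, PORT))
```

The point of the sketch is the fan-out: the sender emits one packet, and the network, not the application, handles delivering a copy to every subscribed server.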
Re:peer to peer will always survive (Score:2)
They pose as undercover traders (heheh) and they trade with you. Under Policy Act 5.4.11.c they log your illegal activity, turn it in to a judge, and then prosecute you, with a $25,000 - $100,000 fine for each copyright violation.
However, I think the mild trading done over the internet would be small fries compared to the assholes like me who trade 100's of albums at a time through the mail.
Rader
Inappropriate! (Score:1)
BOOM!
Inappropriate!
------
You trade bandwidth use for decentralization... (Score:1)
There are already two systems in place that would work. The first is the oldest form of P2P on the net: IRC. IRC fileswapping channels have been around for a while; the problem is that they don't have the "critical mass" of users to make them really useful.

Someone, however, should write a script that reflects searches from one channel to another. For instance, if someone sends out a search on channel 1, the bot will send the same search to channel 2. If it is sent any results, it will echo them to the original searcher. This doesn't put any bandwidth burden on the original searcher, but extends his search radius considerably, especially if the reflectors are configured to examine multiple servers (for instance, #mp3 on DALnet reflecting to #mp3 on EFnet; I don't know how you'd write it as a mIRC script, but it could be done in C...).

The bots would probably be configured to keep a record of all the fileswapping channels they have heard of. (Reflector bots should mention all the channels they are reflecting to/for every 10 minutes or so.) When a bot logs on, it should join each channel on its list, and then determine which channel links are already being maintained. This is easy to do: send a query and see if someone reflects it to the other channels. If any channels do not already have a reflector linking them, the bot starts bouncing queries between them. It then leaves all channels in which it serves no function, to cut down on its bandwidth use. Any time a reflector hears another reflector give its periodic status report, including a line like "I am a member of channels #foo, #bar, #fnord, etc...", it should add any new channels to its list.
A system like this is halfway between Gnutella and Napster in terms of bandwidth use. (The regular users in the channels who aren't running servers could safely squelch reflectors to cut down on bandwidth.) It has no single legal point of failure: the IRC network is protected as a common carrier with a significant non-infringing use, and there are too many people running servers and reflectors to sue them.
note: I'm merely talking about technology, not endorsing or condemning its use for any specific purpose. -Entropius
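The reflector idea above can be modeled without touching a real IRC connection. Here is a toy in-memory version in Python; the channel names, the user model, and the substring matching rule are all invented for illustration:

```python
# Toy model of an IRC search reflector: queries sent in one channel are
# echoed into every linked channel, and hits are returned to the searcher.

class Reflector:
    def __init__(self, channels):
        # channel name -> list of (nick, files) pairs for users "in" it
        self.channels = {name: [] for name in channels}

    def join(self, channel, nick, files):
        """Record a user and their shared file list in a channel."""
        self.channels[channel].append((nick, files))

    def search(self, origin_channel, term):
        """Echo a query into every linked channel; collect (nick, file) hits."""
        hits = []
        for channel, users in self.channels.items():
            if channel == origin_channel:
                continue  # the origin channel already saw the query
            for nick, files in users:
                hits.extend((nick, f) for f in files if term in f)
        return hits
```

A usage sketch: a reflector linking two hypothetical channels lets a search in one reach the users of the other, which is exactly the "extended search radius" the comment describes.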
ALPINE as a DDOS amplifier network (Score:2)
Now since the query is udp based I can spoof the return address to www.ebay.com. Ooops
Also you can't throttle your bandwidth on ALPINE. Certainly you can send out fewer searches, but if you have 100,000 users online then there will be about 5,000 searches a minute, and every 33.6 user will have to download all those queries.
Mathematically it doesn't work out quite as badly as Gnutella, but it still sucks.
Re:ALPINE as a DDOS amplifier network (Score:2)
songs with a "3" in the filename. That should return every mp3 on the planet
Yes, you could, but only if they allowed the connection.
Second, they would not send an entire list of MP3s. They would send back a single packet that contains the number of hits found, like 1,234.
To get the list of MP3s you need to do some more work.
Now since the query is udp based I can spoof the return address to www.ebay.com.
No, the reply is sent to the originating connection. That is you. The handshake protocol for establishing a connection makes this as resistant to spoofing as TCP. (Which isn't perfect, but at least it's better than nothing.)
Also you can't throttle your bandwidth on ALPINE. Certainly you can send out fewer searches, but if you have 100,000 users online then there will be about 5,000 searches a minute, and every 33.6 user will have to download all those queries.
Again, a modem user will have fewer connections. And the response to broadcast queries is only a single packet with the number of hits found, if anything is sent at all.
The combined configuration of how many searches you perform and how many connections you have active controls how much bandwidth you use.
Re:Taking P2P Too Far (Score:2)
The more you throttle it down, the longer it takes to get past the overwhelming majority of negative responses to the few positive ones. So you can have slow response because you didn't throttle your traffic, or slow response because you did. Yippee.
Is a framework for collecting, collating, and using this information already thought out, or did you make up this list only in response to my query?
You need to look further than that. If you have a Napster-like number of users there will be thousands of routers out there connected to thousands of ALPINE users each generating queries. When you multiply things out to get total traffic, as was just done for Gnutella, you do get a level of traffic that will make the router owners sit up and take notice.
Piggyback IRC (Score:2)
A front end could be written so that no one even has to know that the info is being sent through IRC on the back end.
Rader
Re:Yet Another Reimplementation Of TCP Over UDP ? (Score:2)
In fact it is really a smart way of reusing the way IP provides peer to peer connections. This DTCP thing sounds like it just might work (unlike gnutella for instance).
Re:exactly. (Score:2)
A listing of what is online and offline. Mostly offline due to statistics.
How would you know what was available NOW? Not only that, but posting availability info isn't going to happen. Look at all the leeches that were on Napster. Not only that, but let's say people did try... they're still not going to reasonably post their changes all the time. Maybe we could automate it. But now you're never going to find free anonymous web pages that can handle that.
Rader
Color me stupid, but (Score:5)
"But how do you search," I hear you cry. How do you search NOW? Google, right? Same deal here, just use DynDNS (or whatever) to get the link to stay stable.
"P2P," sheesh--it's amazing what some people think is amazing.
--
Not there yet.. (Score:4)
It is necessary to revise the entire searching strategy, not simply linearly reduce the number of queries.
P2P can it be stopped? (Score:2)
I know, Napster this and Napster that, but we are talking about something that is much bigger. P2P sharing will always be around. Before Napster there were DCC bots on IRC and ratio FTP servers that were basically the predecessors of P2P. People upload, people download. There's just a layer between. People are getting more and more used to this type of sharing anyway.
There are hundreds of ftp server applications for Windows 98, or whatever. When a large group of people learn to put up their own ftp servers, there's nothing sponsoring this other than the end users. It's at their own risk. There may not be pretty interfaces and chat rooms anymore, but seriously, did any of you ever use that?
In the future, I see listserves with people sharing today's port and password to a community of millions.
Dissenter
Re:One problem... (Score:3)
A reply is then returned which has your masqueraded IP and port which the NAT router is using. From this point on, this masqueraded address is what you use to identify yourself.
Some systems may need to turn on loose UDP masquerade or the equivalent to allow reply packets from sources other than the initial destination to which you sent the discovery packet.
There are additional details, but the end result is NAT users are supported.
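The address-discovery step described above might look something like this sketch in Python. The DISCOVER message, port number, and reply format are assumptions for illustration, not the actual ALPINE wire protocol:

```python
import socket

# A well-connected peer echoes back the source address it observed on an
# incoming packet; for a NAT'd sender, that observed address is the
# masqueraded IP:port it should use to identify itself from then on.

def run_discovery_server(port=4000, once=True):
    """Answer DISCOVER packets with the sender's externally visible address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    while True:
        data, addr = sock.recvfrom(512)
        if data == b"DISCOVER":
            host, src_port = addr
            # Tell the sender what IP:port its packet appeared to come from.
            sock.sendto(f"{host}:{src_port}".encode(), addr)
        if once:
            break
    sock.close()
```

The point, as the comment says, is that the peer behind the NAT learns its masqueraded address from the reply rather than from its own interface configuration.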
FLIPR (Score:2)
Check it out.. The server is on linux too...
Ripple in a Pond (Score:2)
Re:Taking P2P Too Far (Score:2)
Funny, I thought web servers acted this way...
A web server only sends out to its 10,000 users. Those users aren't also web servers sending out 10,000 packets each. Web servers are getting away with murder compared to 10,000 searching ALPINE users.
Rader
Re:Taking P2P Too Far (Score:2)
Yeah, each victim only gets ONE single 60-byte packet. FROM ME. But we're talking about 10,000 users doing the same; then ALL of them will be getting 10,000 packets.
There is only one thing in the back of my mind that would support where you're going with this: if your research shows that 90% of the people connected are just connected to be nice (went to bed, etc.) and aren't active, leaving a rotating 10% of active users (active = searching).
Rader
Re:ELF dominated by Anime (Score:2)
My main concern about Project Elf is its scaling issues using "broadcast" packets for searches...what happens if there are a million clients out there?
Ho, hum (Score:2)
BTW, I've got this great idea for a round device. You put a stick thru the middle of it and you can easily move things around. Any ideas on how to improve it?
Re:Taking P2P Too Far (Score:3)
What is an appropriately sized batch? 200 queries at once? 100? Seems like searches will take forever if queries are stalled this way.
Sure, there are various criteria that indicate a bad or good peer. These include, among other things:
Wow, this seems like a lot of information to keep track of on the client side. Not only am I keeping track of every IP-node user out there, but I have to keep track of it over time. In a Napster-success scenario, I'd have 2 million entries to keep track of. Not only that, but it seems like a lot of wasted overhead: even if a user doesn't have what I want, I have to compute statistics into his/her record each time.
Um... look, I'm just one user. Any searching done by me, yes, is only one person's activity. But I'm logging into a group of 10,000 active users? The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests. That's like saying there'll be less pee in the world if we all just peed slower. (Yes, the only analogy I could think of. I'll brb, I gotta go P.)
Rader
Re:Absolutely right (Score:2)
However, once you start doing this, the popular servers will get pressured from the RIAA and be forced to shut down.
So what sized machine/bandwidth are we talking about being able to handle being on the wide "backbone" you spoke of? I'm curious to see how many people out there would be able to be part of the backbone of the system. From what I've seen, the bandwidth is more important than the speed of the computer (the ratio of computation vs. bandwidth being pretty small, so any decent computer could handle the computations). If only T1s were a requirement, then I'd see quite an inexhaustible supply of volunteers, but if it required more than a T3, then I see an easy target for the RIAA.
Rader
Re:Not there yet.. (Score:2)
This is not the case. You only have to search until you *find* what you're looking for. This is a big difference, and part of the ALPINE protocol is adapting to the responses and peers you're communicating with, to ensure that you search fewer peers each time you're looking for something.
This is covered in the documents, and is a major benefit. The network adapts to your preferences and optimizes accordingly.
Anybody want to help start a project? (Score:2)
I would like to start building a P2P system based on the ideas here [slashdot.org] and The StreamModule System [omnifarious.org]. I expect that it can be built fully decentralized and completely scalable. I also want a lot of careful protocol documentation along the way, so people can easily see how it works and holes can be poked before it gets too big.
Taking P2P Too Far (Score:5)
I have to admit that it's a little bit strange posting something with such a subject line from the conference hall at the O'Reilly P2P conference in SF, but I can't help myself.
Implementing a pseudo-broadcast by sending separately to all destinations is stupid. Real network designers have known this for years. First off, to send to N destinations you have to shove N packets down your local pipe, which may be narrow. Even at 60 bytes per packet, if you're trying to send to 10000 nodes that's 600K. Then the replies start coming in - in clumps - further clogging that pipe. That single UDP socket you're using does have a finite queue depth, so it will start dropping replies left and right after the first few. Well, maybe not, but only because your ISP's routers will have dropped them first because they overflowed their own queue depths.
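The arithmetic in that paragraph is easy to check:

```python
# A pseudo-broadcast pushes one copy of the query down the sender's own
# pipe per destination, so the cost is simply N * packet size.

def pseudo_broadcast_bytes(n_peers, packet_bytes=60):
    return n_peers * packet_bytes

# 10,000 nodes at 60 bytes each is the 600K figure cited above.
assert pseudo_broadcast_bytes(10_000) == 600_000
```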
Sending the same data to 10K hosts in separate packets not only doesn't scale, but it's an extremely antisocial abuse of the network. The traffic patterns ALPINE will generate are like nothing so much as a DDoS attack, with the query originator and their ISP as the victims. In the same Gnutella thread in which you started hyping ALPINE, some slightly clueful people were suggesting tree-based approaches. Those ideas, as stated e.g. by Omnifarious, are a little naive, but well-known technology in mesh routing and distributed broadcast can easily enough be applied to create and maintain self-organizing adaptive distributed broadcast trees (phew, that was a mouthful) for this purpose. Read the literature. The pitfalls in what you're suggesting are already so well known that they should be part of any computer-networking curriculum, and much more reasonable solutions to the same problems are only scarcely less well known. There is no need to reinvent the wheel, especially if your wheel is square.
As Clay Shirky mentioned in his talk here yesterday, "peer to peer" can be considered a little bit of a misnomer. It's a lot more about addressing and identity issues, and even more about scalability, and having N^2 connections in a network of N nodes is no route to scalability. ALPINE's scaling characteristics will be worse than Gnutella's. Pemdas made a good point [slashdot.org] that you seem to have a talent for marketing. Stick to it. Unlike Pemdas I can evaluate the technical merits of what you're proposing, and you are headed 180 degrees away from a solution.
Absolutely right (Score:3)
Just figure out how big a query is, then figure out how many queries per second have to be in the network before all of the client's bandwidth is consumed. If you estimate a query packet to be 1,000 bits, your modem users max out at 56 queries per second on the network. And that's an absolute best case which will never be achieved in practice.
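The ceiling worked out above is just link speed divided by query size; the 1,000-bit query is the comment's own estimate:

```python
# Maximum queries per second a link can absorb, before any other traffic.

def max_queries_per_sec(link_bps, query_bits=1000):
    return link_bps // query_bits

assert max_queries_per_sec(56_000) == 56        # 56k modem
assert max_queries_per_sec(1_544_000) == 1544   # T1
```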
Until this problem is addressed, these networks will never scale. You have to have some hierarchy of high-bandwidth servers which get the queries and low-bandwidth clients which don't. This can still be a truly distributed network, but you have to distinguish between the machines that have the resources to handle lots and lots of queries and those that don't.
Imagine a two-level network where you have a Gnutella-style network of OpenNap servers which the Napster-style clients connect to. The servers distribute the queries amongst themselves to perform the searches. Each server knows what files its clients are sharing Napster-style, and can answer for them. With this architecture, the well-connected hosts on cable networks and dorm subnets do the heavy lifting of the searches while the dial-up clients get good performance because they aren't being clogged with a bunch of queries. The network scales better because you aren't trying to do lots of work on really slow links. Your network is also more stable because you don't have the clients (which come and go quickly) changing the topology of your "backbone".
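A toy model of that two-level design, in Python. The class names and data structures are invented for illustration; the point is only that a search fans out across servers, never across dial-up clients:

```python
# Clients register their file lists with one server; a network-wide search
# queries the servers, and each server answers on behalf of its clients.

class Server:
    def __init__(self):
        self.files = {}  # filename -> list of client ids sharing it

    def register(self, client, filenames):
        """Napster-style: a client uploads its shared-file list on connect."""
        for name in filenames:
            self.files.setdefault(name, []).append(client)

    def local_search(self, name):
        return self.files.get(name, [])

def network_search(servers, name):
    """Fan the query out across the server backbone only."""
    hits = []
    for server in servers:
        hits.extend(server.local_search(name))
    return hits
```

In this model the clients pay the cost of one file-list upload at connect time, and zero cost per query made by anyone else, which is the asymmetry the comment is arguing for.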
Re:Not there yet.. (Score:2)
Similar peers that have similar content and quality service will gravitate towards the top of each other's query lists. Thus, these higher-quality peers will be queried before the others (if the others are queried at all).
The net result is that each successful query you make enhances the probability and speed with which the next query will be answered.
For example, Napster has grown to millions of users, but whenever you execute a Napster query, you are only searching among a group of 3,000-10,000! And these are randomly selected.
ALPINE will allow you to search 3,000 to 100,000+ *selected* peers, which you have tuned for optimal results.
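A minimal sketch of the tuned query list described above, in Python. The scoring rule (a simple hit count) is an assumption for illustration, not ALPINE's actual adaptation algorithm:

```python
# Peers that answered before are asked first, so successful searches touch
# fewer peers over time.

class QueryList:
    def __init__(self, peers):
        self.hits = {p: 0 for p in peers}  # peer -> successful answers

    def ordered(self):
        # Best-performing peers first.
        return sorted(self.hits, key=self.hits.get, reverse=True)

    def search(self, have, term):
        """have maps peer -> set of items it shares; stop at the first hit."""
        asked = 0
        for peer in self.ordered():
            asked += 1
            if term in have.get(peer, ()):
                self.hits[peer] += 1  # remember who answered
                return peer, asked
        return None, asked
```

The test below shows the claimed effect: the second search for the same item queries no more peers than the first, because the answering peer has moved up the list.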
Freenet *is* anonymous (Score:2)
Re:Not there yet.. (Score:2)
Someone on a T1 connection may indeed be able to connect to just about everyone, but they would also have the bandwidth and memory to do so.
You seem to be contradicting yourself. If a modem user can limit (or has to limit) the number of connections in his/her group, then how is it possible for a T1 user to have everyone in their group? Both cannot happen.
Rader
Re:Taking P2P Too Far (Score:2)
No, you only keep track of this information for the peers you are currently connected to. This may be 3,000 to 10,000 for a Napster-sized group (not all one million Napster users are on the same server!), or more if you have a beefy machine that can handle it.
It is entirely up to each user how many connections and how much bandwidth they wish to use.
The ISP will have to handle 10,000 user requests of ME. And you can't reiterate the B.S. about throttling search requests
I don't see your point. Each of those 10,000 MEs would have their own ISP and would use their own bandwidth.
Ever watch your modem/DSL lights when you're on Napster? This is no different, and the throttling does work, unlike TCP streaming where the bandwidth is always wide open (unless you explicitly throttle sending in your application).
Re:Not there yet.. (Score:3)
Re:Banner ad coincidence? (Score:2)
fnord.
Re:peer to peer will always survive (Score:2)
I hate to do this, because it paints P2P technologies with an unethical light, but if there is ever an official P2P war, it will have the same results as the war-on-drugs, or prohibition of alcohol, or trying to keep Marijuana illegal.
I truly believe that pot will (eventually) become legal to grow, and smoke, and the governments will tax it heavily (as they do tobacco) and profit from it. I'm not HOPEFUL that this will happen, nor am I opposed to it. I just believe it will happen.
The "war on drugs" is mildly successful, but if I wanted to go out and get a shot of heroin, or a cap of mescaline, tonight, I wouldn't have a whole lot of trouble finding someone to sell it to me.
And we all know how prohibition of alcohol turned out.
Warez is illegal, but it will never go away because you just can't prosecute.
You _can_ prosecute; it's just difficult. It's a losing battle. Prosecuting one person in one town isn't going to solve anything, and prosecuting too many people ultimately becomes more expensive than the projected "loss" from individuals 'pirating' your software.
It's like arresting one junkie for possession. It doesn't solve the problem. Our prisons just aren't big enough to hold everyone who violates the law, which is why we have varying levels of prosecution.
Re:Not there yet.. (Score:2)
It would be very unlikely, but all that would need to occur is that one of the 10,000 connections that every peer has would be to the T1 server. The rest of the connections may be to random peers, but the T1 user would still be connected to everyone, while everyone else maintains only 10,000 connections.
Re:Not there yet.. (Score:2)
1. There are no standard packet sizes; packets grow in transit, leading to real delivery failures and more.
2. Packets cannot interact with each other and cancel each other out, so only one packet can be sent to query the entire network. This is not bad in itself, but it drastically reduces searching speed, since the packet has to traverse the entire network and return to you. The packet also has to keep track of the entire route, with every traversed node (imagine the size of the packet by the 100th node), and it will probably be lost if a receiving node drops before the packet is redirected... how long are you willing to wait for a response to your search query?
3. The worst part is that there is no heuristic for the search: just because your packet is on node A, and nodes B and C are connected to node A, there is no way to predict which direction to go; there is no preference between B and C.
But there is still hope. It should be possible to build a network where the search is done on a number of self-proclaimed servers that index the rest of the network. These servers must have a number of clones, so that no info is lost once a server goes offline, and the distributed index should be able to update and redistribute itself. This will reduce the total number of search packets sent within the network. Primitive example: imagine 26 nodes on the network, each one holding info on all files stored on the net that start with a particular letter of the English alphabet. The servers are cloned a few times, and your queries go to the closest server that holds the partition for your query's first letter.
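The 26-node scheme sketched above fits in a few lines of Python. This is a toy, single-process version; the partitioning rule (first letter of the filename) is the example's own, and real servers, cloning, and redistribution are out of scope:

```python
# Each "server" indexes filenames by first letter, so a lookup visits
# exactly one partition instead of flooding the whole network.

def build_index(files):
    """files is a list of (filename, host) pairs."""
    index = {}
    for name, host in files:
        index.setdefault(name[0].lower(), []).append((name, host))
    return index

def lookup(index, name):
    """Go straight to the partition for the file's first letter."""
    return [h for n, h in index.get(name[0].lower(), []) if n == name]
```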
Boink!!! (Score:2)
Vague searches and definition of "a reply" (Score:2)
Funny, I thought web servers acted this way
And they're on high-speed T3 or OCx connections to the Internet, connections that are designed to handle such a load.
If you find the reply you're looking for, then there is no need to query the remaining peers
What if your query isn't an exact match to one file? For instance, I'm looking for "songs by The Offspring, in .ogg [xiph.org] or .mp3 format, at bitrate >= 160 kilobit/s," in whatever query language the system uses. (I picked a random P2P-friendly band.) I'm not "Feeling Lucky [google.com]"; I know my query is vague, but I want to survey the net around me and see what Offspring tracks are on hosts close to mine. The reply is the set of results I get back, not just the chronologically first element.
If, on the other hand, I typed in "artist contains Offspring, title contains Pretty Fly, length within +/- 3 s of [whatever the real length is], Ogg Vorbis format, bitrate 160-192 kbps, on a persistent connection," I would accept a "first reply" response.
No, each of these 'victims' would only receive a single 60 byte packet
From every single user who's searching. Say a user searches a 20,000-user network once every 10 minutes (this takes into account inactive users). You'll have to handle, on average, 2,000 queries a minute, over 30 a second. That's not even counting peak use. Can your hardware and network connection keep up?
But whenever I think of the obvious solution to this problem (proxies that cache search requests for a group of users), I realize that such a topology would be equivalent to that of the existing OpenNap network.
How about pier to pier (Score:2)
ELF dominated by Anime (Score:2)