How Facebook Stores Billions of Photos

CmdrTaco posted more than 5 years ago | from the laser-printer-and-a-warehouse-i-figure dept.

David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled Needle in a Haystack: Efficient Storage of Billions of Photos at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images in 4 or 5 sizes each, totaling ~30 billion files and 540 TB, and serves 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."

154 comments

You know... (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#23935045)

...sometime I wish you guys would make comments that are worth moderating.

You know... (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#23935243)

...sometimes I wish you'd quit hitting mommy.

Re:You know... (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#23935567)

Mummy likes it.

Photos? You mean people use FB for photos too? (5, Funny)

denzacar (181829) | more than 5 years ago | (#23935047)

I thought it was created just so that you could have all your spam and silly forwards in one place.

Re:Photos? You mean people use FB for photos too? (4, Insightful)

oskard (715652) | more than 5 years ago | (#23935831)

I think you're thinking of MySpace.

If you used the service, you'd know that Facebook privacy settings are actually implemented very well. For example, I set up an account for my mother so she can look at all her siblings' photos. She hasn't been bothered by anyone outside of the family, and is really enjoying the ability to communicate with everyone.

The best thing I can compare it to is AOL. It's got a built-in email clone, IM service, forums, groups, and of course, profiles. But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.

Re:Photos? You mean people use FB for photos too? (5, Funny)

snowraver1 (1052510) | more than 5 years ago | (#23936113)

I find it funny that you start by defending FaceBook from the following statement:
I thought it was created just so that you could have all your spam and silly forwards in one place.

Then proceed to further prove the GP post by saying:
The best thing I can compare it to is AOL

Re:Photos? You mean people use FB for photos too? (1)

maxume (22995) | more than 5 years ago | (#23936375)

Not everyone prides themselves on using a 'cool' isp.

Re:Photos? You mean people use FB for photos too? (4, Funny)

hostyle (773991) | more than 5 years ago | (#23938573)

Me too!

Not everyone prides themselves on using a 'cool' isp.

Re:Photos? You mean people use FB for photos too? (0)

Anonymous Coward | more than 5 years ago | (#23937353)

It's interesting how you say FB privacy settings are implemented very well.

Every few days, I find myself looking at photo album of someone I HAVE NOT friended, and SHOULD NOT be able to see under any circumstances.

Re:Photos? You mean people use FB for photos too? (1)

encoderer (1060616) | more than 5 years ago | (#23938135)

You mean, unless they specifically set that album to be open to everybody. Which they can do.

By default, profiles and pictures are hidden unless you grant access, which you can do both by friending somebody and by sending/replying to a message. (However, access granted to a recipient merely by sending them a message is temporary)

Re:Photos? You mean people use FB for photos too? (1)

hostyle (773991) | more than 5 years ago | (#23938633)

I get updates like "Friend A has left a comment on photo X" with a link to the photo and comment - where photo X is in an album of person B - somebody I do not know. I can go view all the photos in that particular album. I'm not very up on how things at facebook work, but has Person B allowed full public access to their photos for me to do this?

Re:Photos? You mean people use FB for photos too? (1)

virgil_disgr4ce (909068) | more than 5 years ago | (#23938845)

In a related story, it turns out the MySpace servers are powered by a train of mules turning a mill-wheel, and the IT staff consists of a pair of quadriplegic chimpanzees.

Seriously, MySpace is some of the worst software I've ever, ever seen :)

Re:Photos? You mean people use FB for photos too? (4, Insightful)

vux984 (928602) | more than 5 years ago | (#23939585)

If you used the service, you'd know that Facebook privacy settings are actually implemented very well.

Given that I can't look at my sister's photos without signing up for an account, I'd say her privacy is being 'protected' solely to induce all her friends and siblings to sacrifice theirs by joining Facebook.

I set up an account for my mother so she can look at all her siblings' photos.

You don't need facebook for that.

and is really enjoying the ability to communicate with everyone.

or that.

But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.

How exactly is requiring me to create and log in to a Facebook account to view content someone else wants me to be able to see not lock-in?

That's like requiring me to create a gmail account to receive email from people with gmail accounts. Or requiring me to sign up to AOL to see websites hosted by AOL. Facebook is pretty much the definition of lock-in.

Re:Photos? You mean people use FB for photos too? (5, Insightful)

0100010001010011 (652467) | more than 5 years ago | (#23936349)

This stuff is cool either way, even if it is just "childish spam." Many of us only dream to work on something that will become this large scale.

Facebook started off (stolen idea or not) as a site with some php and a database. In the early years there were no applications or photos. They've managed to scale PHP beyond what most slashdotters will say PHP can even do. They've even contributed some of their stuff back to the PHP community. [facebook.com]

Look at some other similar 'home grown' sites that have had to quickly scale and invent stuff just to stay afloat.
Archive.org has their Petabox [archive.org]
Google has their Google File System [google.com] and all of their own hardware design.

Hopefully the site will recover. 540TB of data and 500k images per second while at the same time being able to process photos near instantly in the background to 4-5 different sizes is nothing to ignore. Fortune 500 companies could probably learn a thing or two...

Already been done. (5, Informative)

sirrube (622137) | more than 5 years ago | (#23937089)

This stuff is cool either way, even if it is just "childish spam." Many of us only dream to work on something that will become this large scale.

...

Fortune 500 companies could probably learn a thing or two...

This Fortune 500 company could teach a thing or two on this subject. [datatree.com] DataTree has been doing this since before 1999. With over 40 billion land records online, and 600+TB of data, they deliver many millions of images daily. Not to put down Facebook's implementation, but DataTree does not need to run 10k webservers and 1800 SQL databases to provide images. It is nice to see the scalability factor of their design, but that does not mean it is the most efficient way to do things, or the one to follow and learn from.

Re:Already been done. (1)

ehrichweiss (706417) | more than 5 years ago | (#23937153)

My question is, does DataTree offer the photos in 4-5 different sizes? That is one of the key factors here apparently.

Re:Already been done. (1)

sirrube (622137) | more than 5 years ago | (#23937591)

No, it's a slightly different implementation: they are public land records, such as deeds, mortgages, and any other legally recorded documents. They offer them in the original size they were provided in by the county recorder. The file sizes range up to several megabytes, depending on whether it is an assessor map, etc.

Re:Already been done. (0)

Anonymous Coward | more than 5 years ago | (#23937343)

And what did that cost vs what facebook has spent?

Re:Already been done. (1)

davester666 (731373) | more than 5 years ago | (#23937501)

Obviously DataTree patented their setup, so Facebook decided to use an implementation that works significantly differently to avoid having to pay royalties...

Re:Already been done. (1)

x_MeRLiN_x (935994) | more than 5 years ago | (#23937727)

Serving images alone is very different from accepting 100 million uploads weekly.

Also, they were unable to withstand you linking to them.

Re:Photos? You mean people use FB for photos too? (1, Funny)

Anonymous Coward | more than 5 years ago | (#23937559)

Fortune 500 companies could probably learn a thing or two...

Hey now! I work at a Fortune 500 company and we resemble that remark!

I dunno. (5, Funny)

morgan_greywolf (835522) | more than 5 years ago | (#23935071)

But seeing as how this just got posted and already it's Slashdotted, I'll bet it's not the same way Flowgram stores its presentations.

Re:I dunno. (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#23935195)

how about not putting articles on the main page that link to something that requires registration

oye

Re:I dunno. (2, Informative)

OverlordQ (264228) | more than 5 years ago | (#23935235)

To view the slideshow . . err I mean 'flowgram' (whatever the fuck that's supposed to mean), you don't need to register.

Re:I dunno. (5, Insightful)

7 digits (986730) | more than 5 years ago | (#23939203)

In the late 90's we stopped using documents (with images and text), because they had the following disadvantages:

1) Printable
2) Searchable
3) You could look over them at a glance to find information

We replaced them by the fabulous presentation with voice-over.

It removed part of the ability to scan over information, to search, and to print.

Unfortunately, it still had the disadvantage of letting the user seek to some part of the presentation, so another iteration was needed.

Now, welcome to the 21st century. Thanks to flowgram, you don't have to worry about printing anymore (you can't), or searching (you can't), or even pausing, going forward, or doing anything (you can't).

If you get a phone call in the middle of the presentation, tough luck. And of course, you have no way of knowing how long it is, how long is left, or anything. And if you miss a word or a sentence, you can always restart the presentation and listen more carefully the next time.

I must congratulate the folks over at flowgram.com. It seems very hard to have an idea that could be less usable. I'm pretty sure there is someone somewhere working hard at this, and some VC will give him money for that, but, for now, if you want to put a shitty unusable presentation online, flowgram is the way to go.

Re:I dunno. (3, Informative)

aproposofwhat (1019098) | more than 5 years ago | (#23935497)

Not only that, but the UK Facebook site has been down most of the afternoon - some infrastructure, huh?

Re:I dunno. (4, Funny)

owlnation (858981) | more than 5 years ago | (#23936657)

And nothing of value was lost.

In other news, companies in the UK reported record productivity this afternoon.

How X Stores Billions of Photos (5, Funny)

Anonymous Coward | more than 5 years ago | (#23935099)

Ohhhh boy, queue the pr0n jokes in 3... 2... 1...

Re:How X Stores Billions of Photos (1)

vigmeister (1112659) | more than 5 years ago | (#23935693)

Ohhhh boy, queue the pr0n jokes in 3... 2... 1...

Cue the cuneiforms of cute girls with no acumen queuing up to watch Cusack's cucumber

Cheers!
--
Vig

This is AD SPAM (-1, Flamebait)

Bored MPA (1202335) | more than 5 years ago | (#23935219)

I clicked the site and was redirected to Alexa and some other site after 10-15 seconds. Wtf ftw!?

FLASH?! (3, Funny)

T-Bone-T (1048702) | more than 5 years ago | (#23935259)

"You either have javascript turned off or you have an older version of Adobe Flash."

That was an informative article but I didn't see anything about Facebook. At least there weren't ads and they kept it to one page!

Re:FLASH?! (0)

Anonymous Coward | more than 5 years ago | (#23935367)

"You either have javascript turned off or you have an older version of Adobe Flash."

I'm running Gnash (the Gnu flash player) and get the same message. :(

Re:FLASH?! (1)

legoman666 (1098377) | more than 5 years ago | (#23935665)

bah! I got the same message also.... probably because I don't have Flash installed. And am not going to install it.

Re:FLASH?! (1)

Skapare (16644) | more than 5 years ago | (#23936047)

They should teach Firefox and Opera how to play video directly. It's not much harder than displaying an image file.

Re:FLASH?! (0)

Anonymous Coward | more than 5 years ago | (#23936771)

Yeah. That's just what I was thinking too. "Wouldn't it be nice if I needed yet another plugin to stop annoying webpages from displaying videos?"

Re:FLASH?! (0)

Anonymous Coward | more than 5 years ago | (#23936783)

"They should teach Firefox and Opera"

I guess you didn't read the recent articles on Artificial Intelligence.

Re:FLASH?! (2, Insightful)

Firehed (942385) | more than 5 years ago | (#23936917)

I think it's becoming part of the HTML5 spec; however, it's tremendously more complicated due to the limitless plethora of video formats. With web-oriented images, it's almost all jpegs for photos and typically pngs for graphics, with plenty of gifs around. Tiff is a very established format but never sees use in websites since the files are stupidly large, and most other formats are specific to some editing program. With video, you've got half a dozen Quicktime formats, DivX, XviD, h.264, x264, WMV, Real, and a huge number of others (many of which are pro-oriented). Never mind the play/pause/scrubbing interface (which could become yet another CSS nightmare), the much bigger file size, the audio, auto-playing, etc.

Until there's a jpeg for video, I'd say we should leave it alone. Flash is currently fulfilling that role, and all things considered does it reasonably well given the ease of implementation.

Very interesting (3, Informative)

phase_9 (909592) | more than 5 years ago | (#23935331)

Fascinating Presentation for those of you who actually bother to watch the Hour or so of content.

Re:Very interesting (-1, Troll)

Anonymous Coward | more than 5 years ago | (#23936143)

If I'm going to watch an hour-long presentation, there better be titties.

Lots and lots of titties.

Re:Very interesting (0)

Anonymous Coward | more than 5 years ago | (#23938163)

Fascinating Use of random Capitalization in Your post.

Re:Very interesting (0)

Anonymous Coward | more than 5 years ago | (#23940069)

Fascinating Presentation for those of you who actually bother to watch the Hour or so of content.
Not really. Everything was done in the obvious way.

Hire the guy who asked about "re-inventing databases." They re-invented one layer of a B-Tree and don't realize it.

Slashdotted (5, Funny)

Rik Sweeney (471717) | more than 5 years ago | (#23935415)

Does anyone see the irony in Flowgram's demonstration?

Flowgram Guy 1: "OK, this is how Facebook stores billions of photos and serves thousands of them each second"
Flowgram Guy 2: "Cool, maybe we should implement that technology"
Flowgram Guy 1: "Why? It's not as if we're ever going to have our servers swamped with thousands of requests..."

Re:Slashdotted (1, Insightful)

elronxenu (117773) | more than 5 years ago | (#23936039)

That's for sure.

Plus, when their server (singular?) finally responded to me, it required a later version of Flash than I have. So I can't read the presentation at all. Way to not get the word out, folks.

Re:Slashdotted (5, Funny)

Anonymous Coward | more than 5 years ago | (#23936387)

Get the latest version. I'm a problem solver.

Group photos problem... (0)

Anonymous Coward | more than 5 years ago | (#23935485)

Well I guess it helps the storage problem by disabling the ability to add photos to groups, which seems to have occurred in the last few days... (for a few people at least)

I know /.'ers don't admit to having facebook accounts, but here's a link in case any lurker wants to see the comments about this issue. [facebook.com]

The peak is a paltry 0.45e6/s? (3, Funny)

vigmeister (1112659) | more than 5 years ago | (#23935503)

Let's all go look at pictures on fb from 12 noon EST to 12:05 EST. That ought to show them...

I <3 Myspace hunni!

Cheers!

flowgram sucks (0)

Anonymous Coward | more than 5 years ago | (#23935553)

alternate source?

Flowgram...? (-1)

Anonymous Coward | more than 5 years ago | (#23935699)

Flowgram, more like SLOWgram!

Transcript? (5, Insightful)

dstar (34869) | more than 5 years ago | (#23935715)

I don't suppose there's a transcript of this anywhere, is there? That + slides would be infinitely more useful....

Was Facebook stolen? (0, Offtopic)

commodoresloat (172735) | more than 5 years ago | (#23935767)

I don't know if this was posted on slashdot before and I'm too lazy to look, but this article [rollingstone.com] from Rolling Stone about the founder of Facebook seems far more interesting than a slashdotted hour long flash presentation.

Full sized images, please (5, Insightful)

bucky0 (229117) | more than 5 years ago | (#23935917)

I wish that facebook wouldn't resize its images on the backend. My friends all post pictures from parties/trips, etc.. there, and I'd love to be able to just download the full res version to send off to be printed, but facebook resizes the largest dimension to be ~600px, which is pretty worthless for printing.

Yeah yeah, there are other sites that don't, and I post my stuff there (to flickr, personally), but convincing that one person who took the nice photo of you to do it too is near impossible.

Re:Full sized images, please (1)

darthnoodles (831210) | more than 5 years ago | (#23936003)

It's a Social Networking site, not a photo management site. If you want a full res copy of the photo just ask your friend for one.

Re:Full sized images, please (1)

bucky0 (229117) | more than 5 years ago | (#23937033)

Trust me, I do ask my friends for them, and they usually get around to it. With all the drive space they have at facebook and as fat as internet connections are for my friends (either at home or at school), the extra space/time would be insignificant compared to the utility of just being able to go, "print". Besides, if facebook integrated it in the site, they could probably make a killing on letting people print directly from the interface (and take a cut along the way)

There is already a site where my friends post cool pictures from events, conveniently commented on and indexed by the people in the picture. Why should they post to another site just for a higher resolution image?

Besides, get in a big enough group, and the person who has the picture you want invariably is the laziest person ever, who just won't do it, regardless of prodding.

Re:Full sized images, please (1)

Firehed (942385) | more than 5 years ago | (#23937041)

Aside from the fact that most photos on facebook are blurry drunken crap, the copyright and privacy settings coming from that kind of thing would get VERY weird VERY fast. Facebook profiting from selling my photos? Nuh-uh, I don't think so. If they want to display an ad alongside them as my friends view them, I'm okay with that - it's understood as part of using the service; without at least some sort of profit sharing, that would be a big no-no. Maybe if they want to tie in SmugMug or something that I can optionally use, there would be something.

Of course, the implementation cost of that versus what they'd get out of it would make it worthless (from their perspective) anyways.

Re:Full sized images, please (1)

bucky0 (229117) | more than 5 years ago | (#23937151)

>> Aside from the fact that most photos on facebook are blurry drunken crap

If they were blurry and drunken, I wouldn't want them. I'm thinking things like graduation group photos or pictures from study abroad type stuff.

>> copyright issues
They could make it opt-in or add it to the privacy settings: "Allow [GROUP OF PEOPLE] to print photos". Or not even have the photo printing, just offer an 'original resolution' option. There are a number of ways they could work around that issue. The problem is that their upload APIs right now require that the images be prescaled before they're stored in the backend, so even if they did change the site, it would only work for photos uploaded using the new API.

A friend of mine made a killing just going around to parties on the weekend (I went to a heavily greek-centered university) with a nice camera, taking pictures of people and uploading them to (I think) snapfish. I would think it would be a market facebook would want to expand to.

Re:Full sized images, please (1)

bill_mcgonigle (4333) | more than 5 years ago | (#23939045)

Has nobody done this with a Facebook app? The notable hurdle is that people would have to opt-in before uploading pictures.

Re:Full sized images, please (1)

bucky0 (229117) | more than 5 years ago | (#23939113)

That's the big thing. I'd knock something together myself, but if they are storing full resolution images in the DB, they're not exposed to the API

Re:Full sized images, please (1)

bill_mcgonigle (4333) | more than 5 years ago | (#23939285)

That's the big thing. I'd knock something together myself, but if they are storing full resolution images in the DB, they're not exposed to the API

You'd need to intercept on upload and store them on your own server. I'm not familiar enough with the Facebook API (I just read the whitepaper when it came out, that's about it) to know if you can intercept core modules.

If you have to add your own image upload app, that raises the hurdle even higher. If your model has enough value, that might not be a problem. If your model has a high enough margin, the long tail may be sufficient.

Please send me your newsletter. :)

Re:Full sized images, please (0)

Anonymous Coward | more than 5 years ago | (#23937537)

I believe that Facebook (usually) doesn't resize on the backend. When you use their Java image uploader, it resizes and compresses the photos before uploading them, making the upload faster while saving them processing time.

But yes, I wish we could store hi-res photos on their servers.

Re:Full sized images, please (1)

virgil_disgr4ce (909068) | more than 5 years ago | (#23939191)

If their total storage for photos now is 540 TB, what would it be with print-worthy resolutions? a handful of petabytes? :p

Re:Full sized images, please (1)

bucky0 (229117) | more than 5 years ago | (#23939725)

Their current photos have a maximum dimension of 604(ish) pixels on its longest side. Maybe increase by a factor of 4 or 9? Even 4.5PB isn't out of the realm of feasibility, it's only 4,500 1TB drives :)

(I know I know about the drives and the difference between server drives and consumer drives, and how contention on individual drives could bring the thing to a grinding halt)

Re:Full sized images, please (2, Informative)

Kimos (859729) | more than 5 years ago | (#23939383)

Get the Big Photo [facebook.com] application.

It's not ideal, but it works quite well. A friend of mine is a professional photographer and she puts all her work up there. Works well for her.

Re:Full sized images, please (1)

bucky0 (229117) | more than 5 years ago | (#23940137)

It's not me that's the problem, it's my friends.

Does it integrate with the core photo app so that when you hit the 'photos of xxx' or 'photos of you and xxx' button it shows both the core photos and the big photos? Can you tag users that don't have the bigphoto app? If not, then it just won't fly.

Flowgram slashdotted? (1, Funny)

Anonymous Coward | more than 5 years ago | (#23935985)

Flowgram serving 475000 /. users flawlessly, now that would be impressive.

540TB / 30 billion images (3, Informative)

Ralph Spoilsport (673134) | more than 5 years ago | (#23936011)

equals about 18k per image?

RS

Re:540TB / 30 billion images (0)

Anonymous Coward | more than 5 years ago | (#23936469)

Makes you wonder, what's the size on disk of each picture.. Just adding 1k means a lot..

Re:540TB / 30 billion images (5, Informative)

JuanCarlosII (1086993) | more than 5 years ago | (#23936543)

A quick survey of the most recent images on my profile tells me a full size image comes in at 50-60k and a standard thumbnail at ~5k so given the other sizes of thumbnail as well I'd say 18k per image is about right.

Facecook? (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#23936137)

What about Facecook [b3ta.com] though?

Not hard (5, Insightful)

mlwmohawk (801821) | more than 5 years ago | (#23936377)

While the article is slashdotted, this is not a hard problem. It has an expense involved, but it is not difficult.

So, as another poster implied, 18K per photo on average, so about 8Gig per second, peak.

So, assuming that the pictures are evenly distributed, you'd need a bunch of machines and a good number of "tubes" and a way of directing requests to the correct image server or server cluster.

So, what's the problem? Why would you think this is difficult? It's all off the shelf technology, just a bunch of it.
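
For anyone who wants to check the arithmetic in this subthread, here is a minimal sketch. It uses only the figures from the story summary (540 TB, ~30 billion files, 475,000 images per second at peak) and assumes decimal terabytes and an average-sized file on every request, which real traffic will not match exactly.

```python
# Back-of-envelope numbers taken from the story summary.
TOTAL_BYTES = 540e12              # ~540 TB of photo data (decimal TB assumed)
TOTAL_FILES = 30e9                # ~30 billion stored files
PEAK_REQUESTS_PER_SEC = 475_000   # images served per second at peak

# Average stored file size, then peak throughput if every request
# returned an average-sized file (an assumption, not a measurement).
avg_bytes_per_file = TOTAL_BYTES / TOTAL_FILES
peak_bytes_per_sec = avg_bytes_per_file * PEAK_REQUESTS_PER_SEC

print(f"average file size: {avg_bytes_per_file / 1e3:.0f} KB")
print(f"peak throughput:   {peak_bytes_per_sec / 1e9:.2f} GB/s "
      f"({peak_bytes_per_sec * 8 / 1e9:.1f} Gbit/s)")
```

So "8Gig per second" is gigabytes, not gigabits, which matters when you start counting network links.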

Re:Not hard (1)

dstar (34869) | more than 5 years ago | (#23936755)

Because the access to the pictures is _not_ evenly distributed. Worse, it's also not consistent.

Now, the question is, is it evenly distributed _enough_, or consistent _enough_. My guess is that it is, at best, _barely_ so, to the point that each backend system needs to be able to handle 2-3 times what the peak would be if it was evenly distributed; that's just a WAG, though. Hopefully the presentation answers that question.

I find (5, Insightful)

msimm (580077) | more than 5 years ago | (#23937149)

That if you plan to do it (or hope to) it helps to read about the ups and downs of people who already have. And it's *nice* that some take the time out (as /. did and a number of other sites) to talk about it so that we can learn from their experience and mistakes.

But if you already know everything, by all means, shoot. But the outline that just got you modded as insightful isn't an application, didn't detail redundancy of any sort and would be a management nightmare (ie, all the interesting stuff).

I mean really, we could propose that solution to just about any web based application, but that's hardly the story, is it?

When you are talking 500Tb, you hit limits (5, Insightful)

Anonymous Coward | more than 5 years ago | (#23937195)

Limits, like: NetApp filers max out at 16TB (raw) per volume, so you have to start using multiple volumes and get creative with mount points and hope you don't hit some other limit (max files/inodes, addressing limits of the OS/FS, etc). The harder part is the "way of directing requests to the correct image server/cluster" you mention. It's not quite "off the shelf" technology, as you now have to implement something that can handle the 475000+ requests per second and point each of them in the right direction for a single entry in a pool of 30000000000. And that's just images; you still have to route and serve the rest of the content for the pages. At those levels, a simple F5 load balancer is not going to cut it. Stacking a bunch of F5s still won't do. This will probably be distributed across several DCs stretched across distant geographical areas, with some DNS magic to route traffic to locally close DCs. Keeping even the indexes in sync so the requests can be rerouted to the proper DC (if not stored locally) becomes an interesting problem to solve.

No, I don't work for them, but I do work for another company facing similar storage/distribution problems. When things get this big, it's not simply "take what works and just make it bigger or get more of them"; you have to start redesigning things. For a bad car analogy: it's like saying a passenger train is just a bunch of Greyhound buses.

tm
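
To make the "directing requests to the correct image server" part concrete, here is a minimal sketch of one common technique, consistent hashing. This is a swapped-in illustration, not what the presentation describes and not Facebook's method; the server names, the `HashRing` class and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Hypothetical consistent-hash ring mapping photo ids to image servers."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, photo_id):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around at the end of the ring.
        i = bisect.bisect(self._points, _hash(str(photo_id))) % len(self._ring)
        return self._ring[i][1]

# Made-up host names, for illustration only.
SERVERS = [f"img{n:02d}.example.net" for n in range(8)]
ring = HashRing(SERVERS)
print(ring.server_for(23935045))
```

The point of the ring over a plain `hash(id) % n` is that taking one server out only remaps the photos that lived on it. It does nothing for replication, hot spots, or keeping several data centers in sync, which is where the parent says the real work is.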

Re:When you are talking 500Tb, you hit limits (1)

mlwmohawk (801821) | more than 5 years ago | (#23937651)

Limits, like: NetApp filers max out at 16TB (raw) per volume,
Then use more than one.

Use multiple IP addresses and pipes. Balance the images based on popularity. Use redundant storage, hell even use rsync to keep images redundant.

None of this stuff is rocket science. It is all just an erector set.

I do this stuff for a living and there are much harder problems than this.

Now, if you were transcoding the images on the fly, that might be more fun.

Re:Not hard (4, Interesting)

funbobby (445204) | more than 5 years ago | (#23937671)

The issue isn't the number of bytes per second, it's the number of distinct requests. The data is _way_ bigger than will fit in memory, and hard disks can only do 100-150 seeks per second so you need a lot of them to serve from disk. A naive implementation will go to disk many times for a single file, because filesystems aren't designed for this many small files. So this is really an issue of getting exactly the right stuff in memory so you can serve hot content from memory, and if you go to disk you seek exactly once instead of several times.
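
A rough sketch of why the seek count dominates. The 475,000 requests per second comes from the summary and the 100-150 seeks per second from the post above (125 is used as a midpoint); the 90% cache hit ratio and the "3 seeks per naive read" figure are assumptions picked purely for illustration.

```python
import math

def disks_needed(requests_per_sec, cache_hit_ratio, seeks_per_read,
                 seeks_per_disk_per_sec):
    """Estimate how many spindles it takes to absorb the cache misses."""
    misses_per_sec = requests_per_sec * (1.0 - cache_hit_ratio)
    seeks_per_sec = misses_per_sec * seeks_per_read
    # round() guards against floating-point noise pushing ceil() up by one.
    return math.ceil(round(seeks_per_sec / seeks_per_disk_per_sec, 6))

PEAK = 475_000       # requests per second, from the story summary
HIT_RATIO = 0.90     # assumed share of requests served from memory
SEEK_RATE = 125      # seeks per second per disk, midpoint of 100-150

# Naive filesystem layout: several seeks (directory, inode, data) per photo.
naive = disks_needed(PEAK, HIT_RATIO, seeks_per_read=3,
                     seeks_per_disk_per_sec=SEEK_RATE)
# Tuned layout: exactly one seek per photo that misses the cache.
tuned = disks_needed(PEAK, HIT_RATIO, seeks_per_read=1,
                     seeks_per_disk_per_sec=SEEK_RATE)

print(f"naive layout: {naive} disks, one-seek layout: {tuned} disks")
```

Whatever the real hit ratio is, the disk count scales linearly with seeks per read, so going from three seeks to one cuts the spindle count to a third.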

Re:Not hard (1)

TheRaven64 (641858) | more than 5 years ago | (#23938495)

Most operating systems will do read-ahead caching. If you use something like the sendfile system call then they will pull the entire file in with a single read (if it's only 18KB then this takes a maximum of five seeks, and typically just one). It will then keep it in RAM for a bit, so repeated access to the same photo will be faster.

If they were clever, they would put related photos contiguously on disk and grab them with a single read. If they were really clever then they'd use progressive encoding so they could send smaller images by just sending the first n bytes of the larger one.
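
For reference, a minimal sketch of the sendfile approach in Python. `socket.sendfile()` uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` calls otherwise. The 18 KB payload, the temporary file and the local socket pair are stand-ins for a real photo and a real client connection; `serve_file` is a hypothetical helper name.

```python
import os
import socket
import tempfile

def serve_file(conn, path):
    """Send a whole file over a connected stream socket, return bytes sent."""
    with open(path, "rb") as f:
        return conn.sendfile(f)

# An 18 KB stand-in "photo" written to a temporary file.
payload = os.urandom(18 * 1024)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

# A connected local socket pair plays the role of server and client.
server, client = socket.socketpair()
try:
    with server:
        sent = serve_file(server, tmp.name)
    # The server side is closed now, so recv() returns b"" at end of stream.
    received = b""
    while chunk := client.recv(65536):
        received += chunk
finally:
    client.close()
    os.unlink(tmp.name)

print(f"sent {sent} bytes, received {len(received)} bytes")
```

This only shows the call itself. How many seeks the kernel needs underneath still depends on the filesystem layout.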

Akamai? (2, Insightful)

ruiner13 (527499) | more than 5 years ago | (#23936493)

Why don't they just use a 3rd party distributed storage system like Akamai NetStorage [akamai.com]? Then they don't have to worry about adding capacity, redundancy, etc. All they have to do is upload the picture there, and Akamai mirrors it all around the world.

Re:Akamai? (2, Informative)

Anonymous Coward | more than 5 years ago | (#23937481)

...you still have to do that part of uploading to Akamai. And if Akamai brings on a new node, it has to refresh most of the content from you anyway (yeah, it's a tiered caching network that usually uploads from other nodes, but sometimes it doesn't). Cache hit rates from them tend to be in the 97%+ range if done right, but still, a 97% hit rate on 8GB/s+ leaves 240MB/s+ you have to serve yourself. Akamai is a cache, not a content store. What you suggest is akin to saying it's OK to pull the RAID array once things are loaded into RAM, because the OS just keeps the data there. You still have to keep the storage, with redundancy and backups, and the bandwidth to serve cache refreshes to Akamai. It does greatly reduce the problem, but it is not a complete solution in itself. It also does not work for most dynamic content, since it doesn't store your DB for you; those requests still have to go home, so you still have to have the storage, capacity, DB horsepower, etc. to serve the requests, including the ones that actually point the requester at Akamai for the static bits.

I don't work for FB, but for a company that does make use of 3rd party caching networks for very large content distributions.

Tm

Re:Akamai? (1)

ruiner13 (527499) | more than 5 years ago | (#23938397)

You're thinking of their caching service. NetStorage is different in that they actually host the physical file, they don't just cache the file for you.

FaceBook photo viewing is SLLLOOOOOWWWWWW... (1)

Ang31us (1132361) | more than 5 years ago | (#23936775)

I use FaceBook every day and looking at photo albums and pictures is horribly slow on their site. I consider their implementation an example of something that still needs improvement.

Re:FaceBook photo viewing is SLLLOOOOOWWWWWW... (0)

Anonymous Coward | more than 5 years ago | (#23937983)

I use FaceBook every day and ...
Really? And you admit it? In public? I hope you at least wash your hands afterwards.

Re:FaceBook photo viewing is SLLLOOOOOWWWWWW... (1)

delt0r (999393) | more than 5 years ago | (#23938129)

Everything on the FB site is slow, even logging in. That's why I don't use it anymore. But my wife still uses it a lot.

Facebook needs to add more processing capacity (4, Interesting)

debest (471937) | more than 5 years ago | (#23936889)

I put some short video clips on Facebook's video application (just stuff of my daughter for my friends and family to see). These are AVI files generated by my digital camera, about 20-30MB in size, lasting about 1-1.5 minutes each.

They uploaded pretty quickly, but then they were put in a queue to be encoded for their flash player. It took over 3 days for them to be online in my profile! It seems they don't need to just have large capacity for storage, but a bunch more CPU for processing.

Re:Facebook needs to add more processing capacity (1)

lju (944654) | more than 5 years ago | (#23937907)

3 days is crazy. Fortunately for me, I've never had to wait that long. Maybe you did it at a bad time (i.e. maybe they were having technical problems.) I've posted a few videos on there ranging from a short 15 second clip to a 2 minute clip, and all had finished processing and were on my profile within an hour.

server load fixed (2, Informative)

gobaudd (897341) | more than 5 years ago | (#23937647)

we had some problems in the beginning but the server should be much better now.

User-mode GoogleFS (4, Informative)

Panaflex (13191) | more than 5 years ago | (#23939753)

(summarizing the big long presentation)

They basically want to make a user-mode GoogleFS. Their biggest problem is reducing reads - which are hampered by POSIX file standards (inodes, metadata, etc...)

Instead they use a database-like index/data file arrangement. The index stays in memory and files are stored together in large contiguous spaces in a single file. It's possible to utilize a LUN for storage - but they're not there yet.

There... where's my cookie?

(Oddly enough - I'm writing the exact same code they are... bazaar world, eh??)
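
A toy sketch of the index/data file arrangement summarized above: photos are appended back to back into one large file, and an in-memory index maps each photo id to its offset and size, so a read costs one seek. The `TinyHaystack` class and its method names are hypothetical. Anything else a real store would need (persistent index, checksums, deletes, replication) is left out.

```python
import os
import tempfile

class TinyHaystack:
    """Toy append-only photo store: one big data file plus an in-memory index."""

    def __init__(self, path):
        self._file = open(path, "w+b")   # truncates the file; fine for a demo
        self._index = {}                 # photo_id -> (offset, size)

    def put(self, photo_id, data):
        # Always append, so stored photos sit back to back in the file.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # The lookup happens in memory; the disk sees one seek and one read.
        offset, size = self._index[photo_id]
        self._file.seek(offset)
        return self._file.read(size)

    def close(self):
        self._file.close()

workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "haystack.dat")
store = TinyHaystack(data_path)
store.put(101, b"small thumbnail")
store.put(102, b"full size photo " * 1000)
store.put(103, b"medium")
thumb = store.get(101)
full = store.get(102)
medium = store.get(103)
index_entries = len(store._index)
data_file_size = os.path.getsize(data_path)
store.close()
print(f"{index_entries} photos in one {data_file_size}-byte file")
```

The trade-off is that the filesystem only ever sees one file and one set of metadata, at the price of doing your own bookkeeping for everything the filesystem used to handle.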
