Remote Data Access Solutions?

Cliff posted more than 7 years ago | from the your-favorite-protocol-alphabet-soup dept.


magoldfish asks: "Our company has several terabytes of data, typically chunked into 2-6 GB files, that we need to share among users at sites around the US. Our users range in technical skill from novice to guru, data access needs to be secure, and we require client access from multiple platforms (Linux, Mac, and Windows). These files will likely be accessed infrequently, and cached on local hard drives by our power users. We've been running WebDAV, primarily to accommodate non-savvy users and guarantee access from firewalled environments, but the users are really complaining about download speed — maybe 50 KB/sec serving from a Mac. Any suggestions for better alternatives, or comments on the pros and cons of alternative access techniques?"


frist p0st (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#16787741)

hai2u ... zomg i am first!!!!!11111oneoneone

Infrequent access = Send out dvds (3, Insightful)

LiquidCoooled (634315) | more than 7 years ago | (#16787753)

Save bandwidth, time and support headaches.

Re:Infrequent access = Send out dvds (1)

msobkow (48369) | more than 7 years ago | (#16811190)

While there is no denying the average bandwidth of a box of DVDs, there are alternatives for addressing the download speed. The 50 KB/sec they are getting from their server is horrible no matter how you slice it -- plenty of corporations serve well over 100 KB/sec per client.

The problem is that high-capacity solutions are not cheap, and I get the impression that there must not be the budget for those options, or no one would have deployed a 50 KB/sec server in the first place.

Aside from DVDs, you could look into a download service that can host encrypted file images. Use Serpent or Rijndael/AES-256 and you shouldn't have to worry too much about someone pilfering data files from a semi-public server.

Personally I'd encrypt the data even if it was being sent out on DVD. Mail and courier packages do get lost, especially if a poorly paid employee thinks they can sell the contents of a package.
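
As a rough illustration of the kind of pre-encryption being suggested -- a minimal sketch in Python using the third-party cryptography package and AES-256-GCM rather than Serpent; the file names and key handling are made up:

```python
# Sketch: encrypt a data chunk before handing it to a semi-public download
# service or burning it to DVD. Assumes the "cryptography" package is installed.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(src_path: str, dst_path: str, key: bytes) -> None:
    """Read src_path and write nonce + ciphertext to dst_path."""
    nonce = os.urandom(12)          # 96-bit nonce, must be unique per file
    with open(src_path, "rb") as f:
        plaintext = f.read()        # fine only if the chunk fits in RAM;
                                    # chunked encryption is omitted for brevity
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    with open(dst_path, "wb") as f:
        f.write(nonce + ciphertext)

if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)   # store the key somewhere safe, off-server
    encrypt_file("chunk_0001.dat", "chunk_0001.dat.enc", key)
```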

Best solution: (0)

Anonymous Coward | more than 7 years ago | (#16787763)

Livelink by Open Text [opentext.com]

How will the data be used? (4, Insightful)

kperrier (115199) | more than 7 years ago | (#16787769)

Without knowing how the data will be used, it's hard for anyone to give you meaningful recommendations.

You have questions. We have questions. (0)

Anonymous Coward | more than 7 years ago | (#16787777)

"Any suggestions for better alternatives, or comments on the pros and cons of alternative access techniques?""

What's the access patterns on this data? What's being hit the most? The least? Reads? Writes? Time of day?

Uh ... VPN? (1)

0racle (667029) | more than 7 years ago | (#16787781)

Seems perfectly obvious to me. VPNs between the sites, and you access the data as if it were on the local network. What am I missing that makes this not an option, or for that matter, why isn't this the way it's done already?

Re:Uh ... VPN? (1)

AshFan (879808) | more than 7 years ago | (#16798602)

On a Mac, isn't that called APN?

Re:Uh ... VPN? (0)

Anonymous Coward | more than 7 years ago | (#16828296)

No.

Alternative suggestion (2, Funny)

LiquidCoooled (634315) | more than 7 years ago | (#16787785)

If the data can be broken down into smaller, more logical chunks, put it in a database and let your users fetch just the data they require.

Make sure you use more than a 24-bit key though ;)

One word: (4, Informative)

Pig Hogger (10379) | more than 7 years ago | (#16787791)

rsync [google.ca] .

(As always, Google is your friend).
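
For what it's worth, a minimal sketch of driving rsync over SSH from Python -- the host name and paths are made up, and it assumes rsync and SSH access are already in place on both ends:

```python
# Sketch: pull only the changed parts of the shared data set with rsync over SSH.
# --partial lets interrupted multi-GB transfers resume instead of restarting.
import subprocess

cmd = [
    "rsync",
    "-avz",            # archive mode, verbose, compress in transit
    "--partial",       # keep partially transferred files so transfers can resume
    "--progress",
    "user@dataserver.example.com:/srv/data/",   # hypothetical central server
    "/local/cache/data/",                       # hypothetical local cache directory
]
subprocess.run(cmd, check=True)
```

The same command works from a nightly cron job on a per-site cache box.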

Look into a WAFS solution (0)

Anonymous Coward | more than 7 years ago | (#16787795)

that will help the performance

Wide Area File Services (3, Informative)

mac123 (25118) | more than 7 years ago | (#16787801)

Sounds like a job for Wide Area File Services (WAFS).

Here's Cisco's version: WAFS [cisco.com]

Secure? (2, Interesting)

ReidMaynard (161608) | more than 7 years ago | (#16787821)

Do you really mean encrypted? If so, what's wrong with https ?

dbms (1)

TheSHAD0W (258774) | more than 7 years ago | (#16787829)

SQL servers, with access only via an SSL tunnel. That way access will be both convenient and secure.
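
A minimal sketch of what the client side might look like, assuming a PostgreSQL server configured to require SSL and the psycopg2 driver -- neither of which is named above, and the host, credentials, and schema are made up:

```python
# Sketch: query only the records you need over an SSL-protected connection,
# instead of pulling whole multi-GB files across the WAN.
import psycopg2

conn = psycopg2.connect(
    host="data.example.com",
    dbname="measurements",
    user="analyst",
    password="secret",
    sslmode="require",      # refuse to connect without SSL
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT sample_id, payload FROM samples WHERE site = %s AND day = %s",
        ("boston", "2006-11-01"),
    )
    rows = cur.fetchall()
conn.close()
```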

well, (1)

thejrwr (1024073) | more than 7 years ago | (#16787855)

If you want long-term storage, I think there are BD-REs (Blu-ray rewritable discs), and since they hold 50 GB apiece, this might be better.
(This is just my view though)

Profiling? (4, Informative)

barnetda (42894) | more than 7 years ago | (#16787867)

I'm no network / data access guru, but this seems like a typical case of profile first, optimize later.

The idea is simple. Don't just go in and change stuff, first measure the pieces under typical load. Look where the bottle-neck is, address it, and move to the next bottle-neck. Repeat as often as needed.

Are you disk I/O bound? Buy faster disk / better controllers / spread the load over more machines / .....

Are you CPU bound? Is the CPU on your server spending so much time with I/O requests, that it has no cycles available to address additional requests? Buy more / faster / better CPUs.

Are you network bound? Which piece of the network is the hold-up? Your switch? Get a better / faster one. Your ISP? Get a fatter pipe.

Have you optimized all of these? What about setting up remote servers that are updated hourly/daily/weekly/whatever so the machine is close to the user network-wise for faster download speeds.

Some of the above adds complexity. Are you equipped to handle that complexity? Can you become equipped to handle it? If not, re-consider your options.
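
As a starting point for the measuring, here's a rough sketch that times a single download and reports the effective throughput; the URL is a placeholder and the script assumes plain HTTP(S) access to the file:

```python
# Sketch: measure effective download throughput from the file server to see
# whether the ~50 KB/s figure comes from the server, the pipe, or the client.
import time
import urllib.request

URL = "https://data.example.com/chunks/chunk_0001.dat"   # placeholder

start = time.time()
total = 0
with urllib.request.urlopen(URL) as resp:
    while True:
        block = resp.read(64 * 1024)
        if not block:
            break
        total += len(block)
elapsed = time.time() - start
print(f"{total / 1024:.0f} KB in {elapsed:.1f} s = {total / 1024 / elapsed:.1f} KB/s")
```

Run it from a few of the remote sites and from a machine on the server's own LAN; comparing the numbers tells you which hop to attack first.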

Hope this helps.

Cheers,
Dave

A really good point! (2, Insightful)

beaststwo (806402) | more than 7 years ago | (#16808822)

The questioner doesn't specifically have a data transfer problem, but instead a wide-area information processing problem (of which data transfer may be a part).

While the answer may reside with any of the main themes recommended by responders (improving transfer, reducing the amount of data to transfer, and eliminating the need to transfer via remote desktop solutions such as Citrix, MS Terminal Services, and VNC), the questioner really needs to define his needs. Does the data really need to be local at each site? If data needs to reside locally, does each site really need such large chunks? What OS platforms are used? How important are privacy, data integrity, and "chain of custody"? Answering about ten basic questions can shed enough light on the problem to determine which of the recommended solutions (if any) make sense.

It's tough to get a good answer without asking good questions. On the other hand, when you ask the right question, the answer is generally obvious...

Not enough information (0)

Anonymous Coward | more than 7 years ago | (#16787883)

It appears that your complaint is slow speeds, but there isn't really enough information to accurately recommend a solution.

You are serving this data from a Mac, so the obvious question is whether it is running OS X Server. How are the clients connected to it: via dial-up VPNs, dedicated WANs, wireless, LAN, etc.?

I've seen much faster WebDAV setups, so I think the problem may not necessarily be confined to software.

Send out DVDs == Security Risk (1)

parvenu74 (310712) | more than 7 years ago | (#16787897)

Sending out copies on physical media becomes a monumental risk if you're dealing with information that is confidential, especially if that data is covered by Sarbanes-Oxley or HIPAA. Remote sessions, whether by VPN or Apple Remote Desktop (the author said it's a Mac server, right? MS Remote Desktop if not), would side-step the distribution problem, plus the DBAs can lock down access via ACLs so users only get at the data they need/are allowed.

bittorrent. (0)

Anonymous Coward | more than 7 years ago | (#16787907)

Your Mac is obviously bottlenecking. Get BitTorrent and serve the files as torrents; this is what P2P was built for.

Blu-Ray (1)

Dewser (853519) | more than 7 years ago | (#16787923)

Hmm, probably not a great idea to move to something as unstable as that. The problem this guy has is remote users. He probably has a lot of them, and trying to serve down gigs of data is tough. The biggest issue is the location of this data and how fast the internet lines are; they need to have large upload capacities. So the bigger the data files, the longer it takes for the remote users to pull them down. Security is great, but the problem with any secure connection is overhead. Granted, it's not a lot, but it's still a factor to keep in mind: the more encryption used, the more overhead for securing each packet.

Now the next question is.. Are these users working from remote offices or from home offices? If they are in real offices then there are a few options ranging from bumping up the speed of their internet connections (WAN or otherwise) to dropping in a remote file server to keep up-to-date replicas of the data files. If these are home offices, well you will be out of luck since they are probably using some form of home broadband which has limitations. If anyone is using dial-up, just hang up the phone on them! :D

You could also always give them access to data using Citrix, unless the data is big like CAD files and such.

Hope this helps, but like others have said, its tough to give you suggestions without knowing more about what you are trying to do.

Remote Desktop Solution(s) (4, Interesting)

Gunfighter (1944) | more than 7 years ago | (#16787981)

If possible, write the app to run centrally and then use a remote desktop solution like LTSP, Citrix, or Windows Terminal Services to feed access to the app out to clients.

Re:Remote Desktop Solution(s) (1)

Degrees (220395) | more than 7 years ago | (#16793610)

Agreed - why move the data to the users when you can bring the users to the data?

Re:Remote Desktop Solution(s) (1)

UnrefinedLayman (185512) | more than 7 years ago | (#16820620)

Because that puts the load on the server, not on the client. If you have 10 users that are all analyzing 6 GB files and converting their contents to another format over Citrix, that server is dead.

It's a lot cheaper to have people download the files from a $4,000 server and crunch them locally than have them connect to a $50,000 server and crunch them remotely at 1/10th the speed.

Re:Remote Desktop Solution(s) (1)

Gunfighter (1944) | more than 7 years ago | (#16824862)

Depends on the application needs. Citrix servers can operate in load-balancing farms, so you don't need one big server when a bunch of little ones will do. Basically, you're taking the computing horsepower off the desktop and loading it up on a server farm. If the computing is so intensive that you have many people crunching many large files, they may need to re-evaluate their data collection and storage methods to better suit the intended end result.

Determining whether a remote desktop solution is the best fit will require some study, weighing the recurring increased bandwidth costs and the cost of time delays (idle workers waiting for file transfers to complete?) against the one-time costs of a remote desktop solution.

If we're to help the original poster any further, we'll need more information.

more variables than the machines (1)

way2trivial (601132) | more than 7 years ago | (#16852668)

how much are you paying the people?

who says the 50k server has to be slower?

Move everyone (1)

mattboston (537016) | more than 7 years ago | (#16788019)

into the same office and upgrade everyone (including servers) to gigabit.

Depends on the situation (2, Insightful)

jp10558 (748604) | more than 7 years ago | (#16788071)

I would expect you could use something like VNC or remote X Sessions over ssh/vpn (Hamachi, OpenVPN, etc) and keep the data local.

Or, if you need it spread out for some reason, iFolder or rsync seem like the best choices. However, you could also look at AFS.

Basically, you have to get the long haul data transfers down somehow, or else get faster connections.

I am reminded of the old saying... (1)

csoto (220540) | more than 7 years ago | (#16788077)

"Never underestimate the bandwidth of a station wagon full of tapes."

Of course, the latency kind of sucks, but that doesn't seem to affect your requirements. And, these days, you're just as likely to pop it in a FedEx canister, and they don't use station wagons. But the saying still holds...

More of a question (2, Insightful)

east coast (590680) | more than 7 years ago | (#16788211)

These files will likely be accessed infrequently, and cached on local hard drives by our power users.

But how often do these files need to be updated? Is the end user in a read-only situation? How infrequent is infrequent? How many users are you talking about, and what's the density of these users? Even though the access is "infrequent", is this access that modifies data which would have to be shared across your entire user base?

Your scenario needs some gaps filled in as far as the requirements go. I see a lot of people suggesting large-capacity media being shipped to the user, but if this data is being updated frequently, that is not a solution. If you have a large number of users and the data is not updated often, you would still have to weigh the frequency of sending out updates to X number of users against the cost of keeping the data centralized and upgrading the infrastructure to meet those users' needs. If you need to share changes in this data from the user end, then using physical media via postal services is going to cause problems coordinating which version of these files the other end users should be working from. And God forbid you have several users who need to update this data in a short timeframe, because you're going to have disks being mailed in by users who don't know what changes were made by other users. You'd doubtless fall into some kind of data integrity problem, and even if you're talking about users only updating their data every few months, you're still going to need to hire someone to coordinate these efforts and ensure that all the end users are getting every update in a timely fashion.

Without more information, it's hard to suggest something and still be confident that we're not leading you toward a solution that is completely inappropriate.

Your chunking appears to be a problem... (2, Insightful)

jnik (1733) | more than 7 years ago | (#16788215)

Your data are in 2-6GB chunks. When a user needs a chunk, do they really need the whole chunk, or just a few megabytes of it? Downloading 6GB when you need 6MB really sucks. The solutions mentioned here all work by breaking down the chunk size. Thin client reduces the data transfer to what you need to see on the screen right then. Putting in a filesystem layer allows transfer of bits of files. Using a SQL database reduces it all to queries and the records you actually need to hit. (An aside: I've had a friend beat me over the head with "The single most expensive thing your program can do is open that database connection. Once you have it, USE IT. Make the DBM on the big beefy server do the work...they've optimized the heck out of it.")

Figure out how the data can be broken into smaller chunks and managed...that will probably indicate what sort of tech will enable things for you.
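
If the files do have to stay on a plain HTTP/WebDAV server, byte-range requests are one way to fetch only the slice a user needs. A rough sketch, assuming the server honors Range headers; the URL and offsets are made up:

```python
# Sketch: fetch a 6 MB slice out of a multi-GB file with an HTTP Range request,
# instead of downloading the whole chunk.
import urllib.request

URL = "https://data.example.com/chunks/chunk_0001.dat"    # placeholder
start, length = 1_000_000_000, 6 * 1024 * 1024             # byte offset and slice size

req = urllib.request.Request(URL)
req.add_header("Range", f"bytes={start}-{start + length - 1}")
with urllib.request.urlopen(req) as resp:
    # 206 Partial Content means the server actually honored the range
    assert resp.status == 206, "server ignored the Range header"
    data = resp.read()
print(f"got {len(data)} bytes")
```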

Data specifics (1)

inKubus (199753) | more than 7 years ago | (#16788241)

Depends on what you're doing with the data. Are you reading it? Writing it? Updating it? Are only some people updating it? Do they need real-time access to the database or can it be a version system?

What you need to do depends a LOT on these things, more than on the size of the data. If it's TBs of customer data, you probably want it somewhere secure and centralized, with stored procedures to query it and return subsets to your users. If it's not private data, why not let Google crawl and cache it and get instant speed from anywhere? If it's a bunch of Access databases, you need to move to something more modern.

go with Novell (1, Informative)

Anonymous Coward | more than 7 years ago | (#16788389)

We run NetWare 6.5 (moving over to Suse Open Enterprise Servers). We use web access component of NetStorage. From Novell:"NetStorage provides simple Internet browser-based access to file storage on a Novell network. Users have secure file access from any Internet location with nothing to download or install on the user workstation. Through a browser interface, users can also access file properties and have the options of restoring recent versions and managing rights to files and folders. NetStorage lets users securely copy, move, rename, delete, read, and write files between any Internet-enabled machine and a NetWare server. In addition, NetStorage lets users access archived copies of files. NetStorage also supports Web-based access to files and folders on a NetWare server using Microsoft* Web Folders (Microsoft's implementation of WebDAV). And, with NetStorage installed on one NetWare 6.5 server, users can potentially have access to any NetWare 5 or later server anywhere on a geographically dispersed network. For mobile or remote users who need file access but don't need those files to reside on a local client, NetStorage provides an easy solution."

You can also look at iFolder (www.ifolder.com). Also look here: http://www.novell.com/coolsolutions/qna/1345.html [novell.com] .

Re:go with Novell (1)

pugdk (697845) | more than 7 years ago | (#16824222)

Correction: You *thought* you were going to move to SUSE servers. You will in reality be moving to M$ servers, hidden behind the SUSE name. So yeah, you will have *really* good interoperability with M$ products... =)

Look at Caymas (1)

jdehnert (84375) | more than 7 years ago | (#16788587)

Caymas systems (http://www.caymas.com) has a box that will allow a) simple and b) secure access. The speed is good, but bandwidth can be constricted at any point between you and the end user. For the record, I used to work at Caymas.

AOL? (2, Funny)

zcubed (916242) | more than 7 years ago | (#16788675)

Give them a call about sharing large files...oh, wait, never mind.

BitTorrent? (1)

Hyram Graff (962405) | more than 7 years ago | (#16788723)

If you're trying to send this data out to several places at once, BitTorrent might be a good solution. At the very least it will reduce the amount that needs to be pulled directly from the central source.

Amazon S3 (1)

whydna (9312) | more than 7 years ago | (#16788877)

This sounds like a reasonable use for Amazon's Simple Storage Service (S3). See http://aws.amazon.com/s3 [amazon.com] for more info. It's a web-service data storage solution that charges for usage ($0.15/GB/month for storage + $0.20/GB for transfer), is redundant and scalable, and allows you to store an "unlimited" amount of data. You can take advantage of Amazon's infrastructure and avoid needing to hire people to maintain a fleet of storage servers.
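
A minimal sketch of what pushing and pulling a chunk through S3 could look like, using the boto3 library (not mentioned above); the bucket and key names are made up and AWS credentials are assumed to be configured:

```python
# Sketch: store a data chunk in S3 and let remote users pull it over HTTPS.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-shared-data"     # hypothetical bucket name

# Upload from the central site; upload_file handles multipart uploads for big files.
s3.upload_file("chunk_0001.dat", BUCKET, "chunks/chunk_0001.dat")

# Download at a remote site.
s3.download_file(BUCKET, "chunks/chunk_0001.dat", "/local/cache/chunk_0001.dat")

# Or hand non-technical users a time-limited URL they can open in a browser.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": "chunks/chunk_0001.dat"},
    ExpiresIn=3600,   # one hour
)
print(url)
```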

One word (1)

C10H14N2 (640033) | more than 7 years ago | (#16789721)

Citrix.

The client works on Mac, Linux and Windows, can be installed from and run in a web browser, you need only about as much bandwidth as a VNC connection, and if your connection is interrupted for whatever reason, it will save your session state without borking whatever application you happened to be working in.

Warez (1, Funny)

Anonymous Coward | more than 7 years ago | (#16790399)

Sounds like he is sharing warez!!

Use an Open Source Distributed Storage Filesystem (1)

Cycnus (162186) | more than 7 years ago | (#16790617)

A possible approach that is fairly transparent is to use a Distributed Storage Filesystem.

Have a look at this article: http://www.linuxplanet.com/linuxplanet/reports/4361/1/ [linuxplanet.com] then choose amongst the more mature projects: Coda http://coda.cs.cmu.edu/ [cmu.edu] and OpenAFS http://www.openafs.org/ [openafs.org] . Intermezzo looked promising but hasn't been updated in a long while, so it's probably dead.

Hope this helps.

Compression? (1)

scdeimos (632778) | more than 7 years ago | (#16791066)

The 50 KB/s is probably a limit of that user's connection. There's not much you're going to be able to do to improve the situation for that individual aside from (1) smaller chunks and (2) compression.
A lot of people here have mentioned breaking up your data into smaller chunks, which is valid and the first priority.
Have you also considered serving up a compressed version of the data, say using a .gz'd version of the data file on your server with the HTTP/1.1 header "Content-Encoding: gzip"? There's probably going to be *lots* of redundant data in your data files, so they should compress really well - smaller chunks, quicker downloads. :)
I wouldn't have the server compressing the file on the fly, by the way, especially if it's likely to be getting a lot of requests (it will hammer the CPU to death).
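
A minimal sketch of the pre-compression step, assuming the chunks are gzip-friendly; the file name is made up:

```python
# Sketch: pre-compress chunks once on the server so clients download the smaller
# .gz files and the web server never has to compress anything on the fly.
import gzip
import shutil

def precompress(src_path: str) -> str:
    """Write src_path + '.gz' next to the original and return the new path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=6) as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)   # stream in 1 MB blocks
    return dst_path

if __name__ == "__main__":
    print(precompress("chunk_0001.dat"))
```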

Microsoft Server 2003 Enterprise + Terminal Server (1)

SilverThorn (133151) | more than 7 years ago | (#16791220)

Set up a Windows Server 2003, Enterprise Edition box, then install Terminal Services on it. You can have the remote sites connect to the terminal server, making the session local to where the data is found. I know there are clients for Windows and Mac readily available. As for Unix, setting up VNC on a couple of XP Pro workstations might be the best workaround for accessing the data at the local/HQ site.

Curious... just how many people are we talking about that need access to the data?

Re:Microsoft Server 2003 Enterprise + Terminal Ser (2, Informative)

pavera (320634) | more than 7 years ago | (#16803098)

For Unix there is a client called rdesktop that works great (better than the native Mac client; I use it on the Mac instead of the Microsoft RDP client).

Terminal Server / Citrix / etc. (1)

WoTG (610710) | more than 7 years ago | (#16791424)

Yeah, it depends on the needs, but Terminal Server sounds like a good idea to me. I get tired of waiting for a 20MB file over VPN, I can't imagine waiting for a GB sized file...

Constrained by file format? (1)

Bragi Ragnarson (949049) | more than 7 years ago | (#16792100)

To me it looks like you are dealing with some kind of media data (movies?). If you are constrained by the file format and unable/unwilling to split the files into smaller parts, use local cache servers.

In each location, provide a small caching server that rsyncs periodically against the main data source. Then tell users in each location to use the local server.

Remote Data Accesss (1)

fatalwall (873645) | more than 7 years ago | (#16792568)

I work for a company whose primary product line is a remote access strategy. http://remoteworkplace.com/ [remoteworkplace.com]

The approach you seem to want to follow makes for a large amount of redundant data as well as consuming a lot of bandwidth.

The package we provide allows users to securely log into a terminal server located at your main office (sometimes hosted by my company) and access a full desktop with nothing more than a web browser and Java installed on the computer.

This system is ideal as it removes the need to have copies of the data located at multiple sites other than backups. It also prevents the problem of a laptop holding sensitive files.

To learn more, please follow the above URL.

Thought about Citrix? (1)

Nausea (772152) | more than 7 years ago | (#16793838)

Citrix would provide you with a place for multiple users to co-exist. Yes, there are also Linux and Mac clients for it that work just fine (including printing and the ability to do file transfers).
Unfortunately, implementing Citrix can be a bit pricey, especially if the apps your users will run are 'heavy' on resources - it just means you'd have to build a bigger, meaner server ($$$!). Plus it will cost you recurring yearly licensing fees... But for accessibility for remote users, you'll be hard pressed to find a more manageable solution.
It's just a suggestion so that the very large files your users manipulate would remain on your server LAN only. Needless to say, access to the Citrix box should still go over VPN for security reasons.

AFS away! (1)

bockelboy (824282) | more than 7 years ago | (#16802664)

This is a perfect job for AFS:

- Mature, well-known clients available for all platforms.
- You can control how much you cache on the local disk.
- Control access through Kerberos, which can be done transparently on Windows (use your Active Directory server as the domain controller).
- Built with global, distributed file systems in mind.
- Already scales to many terabytes at many different sites - not some half-cocked idea that someone posted on a webpage but never implemented.

WAN Optimization / Acceleration (1)

jwilhelm (238084) | more than 7 years ago | (#16804956)

We do this between a few sites. You didn't give much detail into how many users and how many sites, or into the file access method (CIFS, FTP, etc...?), but we do site to site VPNs between locations, and locate RiverBed boxes at each site. Depending on the site and number of users at each branch you could pick a box that works for you. We like their boxes because they are auto-sensing of other RiverBed boxes, very easy to do any additional configuration on, and have wonderful reporting.

Firewall friendly (1)

WillRobinson (159226) | more than 7 years ago | (#16810924)

You might check out Hamachi [hamachi.cc], which is a zero-config VPN. I discovered it through a friend and have been very happy using it to keep my desktop, server, and laptop info available; there are versions for Windows and Linux.

So like ... what are they doing with the data??? (1)

prhodeslegend (136997) | more than 7 years ago | (#16820684)

OK, so what are your users doing with the data? Are they updating it, or is it purely read-only?

If it's read-only, and it's statistics, why don't you implement an analysis system instead? Find out what the users are reading from this data, break it up into logical truth tables and, if possible, implement and update an OLAP cube. Let people reach the data via remote admin/Citrix/Terminal Services through a medium like Crystal Reports, Analysis Services, Excel, or many other pieces of software...

A VPN with AFS. (1)

rindeee (530084) | more than 7 years ago | (#16837932)

AFS has a bit (okay, a big) learning curve, but it's a great free wide-area file system.