
Subversion as Automatic Software Upgrade Service?

Cliff posted more than 8 years ago | from the thinking-out-of-the-box dept.

Software 41

angel'o'sphere asks: "I'm working on a contract where the customer wants an automated, Internet-based check-for-updates, update, and install system. So far we've considered a Subversion-based solution. The numbers: a typical upgrade is about 10MB in size. Usually it's about 30 to 50 new files (with an average size of about 200kB) and 2 database files (which can be anywhere from 500MB to 2GB) that change regularly. Upgrades are released about every 3 months, and this will probably become more frequent as the system matures. The big files are the problem, as we estimate about 100-300 changes in every file. The total user base is currently 2000 users, creeping up to probably 5000 over the next year, and it might finally end up at some 30,000 users. Any suggestions from the crowd about setting up a meaningful test environment? How about calculating the estimated throughput of our server farm? Does anyone know of projects that have tried something similar using an RCS or a configuration management system?

"We want to support as many concurrent users as possible (bandwidth is not an issue). We use an Apache front end as a load balancer and as many Subversion servers as necessary on the backend. My largest worry, from my calculations, is disk access on the Subversion server. We could not run meaningful tests, because a typical PC kills itself if you try to run more than 4 or 5 parallel Subversion clients doing an upgrade (due to insanely high disk IO and high seek times)."


41 comments


rsync (5, Insightful)

¡ (97947) | more than 8 years ago | (#13580373)

Why not use rsync instead of Subversion? Subversion wasn't really designed for this, whereas rsync is used for mirroring and syncing large repositories all over the place, all the time.
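For what it's worth, the one-way mirroring rsync does can be sketched in a few lines. The paths here are local stand-ins for illustration; a real client would pull from an rsync daemon or over ssh.

```shell
# Local sketch of an rsync mirror (illustrative paths; a real client would
# pull from something like rsync://updates.example.com/release/).
mkdir -p release client
echo "app v2" > release/app.bin
echo "app v1" > client/app.bin

# -a preserves attributes, --delete removes files dropped from the release;
# rsync's delta algorithm sends only the changed blocks of large files.
rsync -a --delete release/ client/
```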

Re:rsync (1)

davecb (6526) | more than 8 years ago | (#13580795)

Actually use both: distribute binaries as binaries, configuration and xml files as subversion files, both via rsync.

On the customer site, run a script that applies a visual file merge to any config files that have changed both places. The customer will have a good chance of recognizing changes they've made, and if there are clashes will tend to call you on the phone and ask what to do next.

--dave

Agreed, rsync rocks (2, Informative)

Anonymous Coward | more than 8 years ago | (#13580869)

I have several apps like this. One is deployed to more than a dozen locations around the country, each having roughly 5000 users. It's a mod_perl app on BSD.

My general routine: I have a "development server", and a staging farm (set up exactly like one of the customer's locations, right down to the network hardware). After changes are made and unit-tested, the changes are pushed to the staging servers using rsync. When all the various remaining tests pass, the software is pushed out to a customer's location (if they need to review the changes), or out to all locations.

Note that I use rsync to PUSH changes on a regular schedule. The apps do not ever "phone home".

My rsync script basically copies all the files except for unit tests, photoshop files, data, all that stuff, just the stuff it needs for run-time. It depends on an SSH key (which exists only on two machines and has a passphrase, so a key agent is required). It has a "fan-out" setting which allows up to N machines to be done in parallel.

Also, my app is completely relocatable and cross-platform. I can check it out in any directory on any Mac, BSD, or Linux box and get to work. I can then push my changes directly from that development area to the staging server if needed. I use CVS and Darcs but that's not important, except to note that the rsync script needs to skip those "CVS" or "_darcs" files.

Works great, very powerful. Of course I am leaving out details like choosing CVS tags, database schema migration, restarting/upgrading/installing daemons (hint, if you don't use daemontools, your apps will never be reliable), handling 3rd-party open source packages, pulling in changes that were made on the customer's machine (in an emergency for instance) etc., etc. But rsync is the core of it.

Re:rsync (1)

photon317 (208409) | more than 8 years ago | (#13580948)


I tend to agree with the parent. You might want to do version control on your software releases with Subversion, but ultimately you should check out the new stable copy you want everyone upgraded to and then distribute it via other means, like rsync. rsync is a particularly good choice because it will only send the minimum amount of data necessary to get the job done efficiently.

Re:rsync (3, Informative)

commanderfoxtrot (115784) | more than 8 years ago | (#13581548)

Subversion uses binary diffs in a similar way to rsync. The original poster pointed out that bandwidth is not an issue, so any bandwidth advantages rsync gives (and yes, there are plenty) are meaningless.

Subversion gives excellent control (tags, anyone?) of binary installations. We use it for things way beyond the usual source code storage.

I have also found disk IO is the main killer. I would suggest looking into caching. The Subversion client sends straightforward HTTP commands to the server. I have a custom PostgreSQL backend which does some caching; in your place, I would set up Squid to cache some basic data fetches. Obviously you need to be careful not to cache stale data, but that's not hard.

So yes, Subversion is excellent for this, and with a little thought, the heavy disk IO can be reduced. Cache, cache, cache.

Re:rsync (1)

mcclungsr (74737) | more than 8 years ago | (#13581934)

If it's economically feasible in this case, I would suggest a better disk subsystem. The more spindles, the better. Something fibre channel, if possible. A memory size large enough to get to a supercached state will certainly help, but disks are cheap in quantity and using more of them in a RAID configuration is an orthodox solution to high service times.

When all you have is a hammer.... (1, Redundant)

wowbagger (69688) | more than 8 years ago | (#13580420)

This sounds to me a bit like "All I Have Is A Hammer, So Everything Is A Nail".

You want to update large files over the 'net, files which have changes in the middle of the file.

Why use Subversion? Why not use rsync?

MOD PARENT UP! (0)

Anonymous Coward | more than 8 years ago | (#13580910)

Subversion needs a local working copy; in other words, every file is present twice on the client.

Transfer file to compare, then change file (2)

Hey, Retard... (915400) | more than 8 years ago | (#13580500)

Sounds like twice the work for thrice the price.

Re:Transfer file to compare, then change file (1)

ElGameR (815688) | more than 8 years ago | (#13597275)

Not really; to compare, all you need is a hash of the file, which is much smaller than the average file. And if the hashes match, there's no need to send the whole file.
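The check the parent describes might look like this sketch (file names are made up for illustration): the client sends only a digest, and the full file moves only when the digests differ.

```shell
# Hash-based update check (illustrative file names): compare digests first,
# transfer the file only on a mismatch.
printf 'db payload v1' > client.db
printf 'db payload v2' > server.db

if [ "$(sha256sum < client.db)" = "$(sha256sum < server.db)" ]; then
    echo "client is up to date"
else
    echo "digests differ; send the new file"
fi
```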

Rsync? (3, Informative)

Karora (214807) | more than 8 years ago | (#13580519)

Wouldn't Rsync be better for what you want? Why do you need to be able to choose different versions to fetch?

If the files contain parts that are constant along with parts that vary, then rsync will in many cases transfer only the partial file. With Subversion that won't apply for binary files, but rsync will still recognise partial matches even on those.

Re:Rsync? (1)

bran880 (84112) | more than 8 years ago | (#13580775)

also, in my experience svn is slow with large files.

Re:Rsync? (1)

halfnerd (553515) | more than 8 years ago | (#13583762)

Subversion does employ a binary delta algorithm, xdelta. Older versions used a different algorithm, but it was also capable of binary deltas.

Re:Rsync? (0)

Anonymous Coward | more than 8 years ago | (#13590276)

Subversion uses a binary delta algorithm (another poster mentioned xdelta); the issue you're thinking of applies to CVS, which only deltas text (using diff, essentially).

I agree that rsync is a better solution here, but in theory, Subversion could actually beat rsync: rsync has to work out which portions of the file are the same on every invocation, which it does using hashes, while Subversion can compute the delta from revision information already stored on the server.

This post actually hits most of the points I raised:
http://ask.slashdot.org/comments.pl?sid=162475&cid=13580773 [slashdot.org]

So while Subversion wasn't designed for this task, you could certainly improve on rsync by adding revision history, at least on the "push" side.

Incidentally, another problem with Subversion is that it stores local copies of the working files (to allow local revert), essentially doubling storage requirements. For a huge database, that's a really bad thing. :)

times two (3, Informative)

Lord Bitman (95493) | more than 8 years ago | (#13580551)

remember that svn always uses more than double the actual space required to hold the files for a "working copy". For "one-way" updates, svn is _NOT_ the answer.

Re:times two (1)

abartlett_219 (600259) | more than 8 years ago | (#13580874)

double the actual space required to hold the files for a "working copy"

True. However, using an export (svn export), you can just get a non-working copy of the code.

rsync is probably a better solution anyway. If you want to track what went into each release, maybe a Subversion backend, with a cronjob to update everything to an rsync server.

Re:times two (2, Insightful)

saurik (37804) | more than 8 years ago | (#13583600)

By non-working it should be noted that you also mean non-upgradable. Once you do an export, you can't do an update, which makes that feature useless for this purpose.

Re:times two (1)

commanderfoxtrot (115784) | more than 8 years ago | (#13585264)

They're also looking at using compression in upcoming versions for the local "hidden" originals.

Re:times two (1)

angel'o'sphere (80593) | more than 8 years ago | (#13585831)

We know that and we accept that.

Even worse, we make (as a client configuration option) a third copy, to allow a local rollback that reverses changes without needing to access the upgrade server over the Internet.
angel'o'sphere

If this was in java... (3, Insightful)

hexghost (444585) | more than 8 years ago | (#13580674)

You would use Java Web Start. Maybe you should consider writing something like it for this project?

Apt? (2, Funny)

cortana (588495) | more than 8 years ago | (#13580717)

Can the clients run dpkg and apt? A daily apt-get update && apt-get upgrade is very convenient. Server-side, you don't need anything more complicated than a web server.
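For the record, the whole client side of that could be a single crontab entry; the schedule here is purely illustrative.

```
# Illustrative crontab entry: check for and install updates nightly at 04:00.
0 4 * * * apt-get -qy update && apt-get -qy upgrade
```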

Re:Apt? (1)

Jussi K. Kojootti (646145) | more than 8 years ago | (#13594200)

Why is this funny?

It's not very convenient though: apt doesn't do binary diffs as far as I know, so the 2GB file would have to be downloaded every time it's changed... With 30000 users that would be 60 terabytes per update.

Not Subversion (2, Insightful)

the eric conspiracy (20178) | more than 8 years ago | (#13580745)

rsync is excellent at this, and rdist can have benefits too if you are updating a bunch of servers at once.

How about bsdiff/patch and some scripts? (5, Interesting)

Fweeky (41046) | more than 8 years ago | (#13580773)

This is the technique used by portsnap [daemonology.net]; basically you generate binary diffs from a known starting point, and the client keeps track of what new patches it needs to keep in sync. Since you're just serving static files, scaling it should be as easy and cheap as it gets.

rsync is highly general-purpose; your servers will end up generating hashes for every n bytes of every file for every client, which is a lot more heavyweight than just serving patches you generate once. Subversion may be more efficient, since it should know something about the files it's checked out previously, but it's still going to end up dynamically generating diffs between whatever version each client has and the latest; this likely gets worse if your clients aren't tracking HEAD.

Also note that a custom solution can likely get away with a single tag file detailing the latest patches; rsync and svn are going to be scanning their directory trees religiously. Both you and your users will probably prefer a single GET of a small file on a webserver to a load of CPU use and disk thrashing.
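A sketch of that scheme, with plain diff/patch standing in for bsdiff/bspatch (a real system would use binary patches for the database files; all names here are illustrative): the vendor generates each patch once, and clients fetch only the static patch files between their installed version and the latest.

```shell
# Server side (run once per release): generate the patch and a tiny tag file.
mkdir -p server client
printf 'version=1\n' > server/release-1
printf 'version=2\nhotfix=a\n' > server/release-2
diff -u server/release-1 server/release-2 > server/patch-1-2 || true
echo 2 > server/latest                        # the single small tag file

# Client side: one GET for the tag file, then apply only the patches
# between the installed version and the latest.
cp server/release-1 client/current
have=1
want=$(cat server/latest)
while [ "$have" -lt "$want" ]; do
    patch -s client/current < "server/patch-$have-$((have + 1))"
    have=$((have + 1))
done
```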

Re:How about bsdiff/patch and some scripts? (1)

cperciva (102828) | more than 8 years ago | (#13581612)

Yes, this might be the best approach; but it's hard to say without knowing more details.

I think the right solution for the submitter is "talk to someone with experience in this area" -- ideally, me. I'm no longer looking for a job, but I'd still be happy to hear details about a problem and offer my opinion on how best to attack it.

CVS (2, Insightful)

alexpach (807980) | more than 8 years ago | (#13580871)

I have been using CVS to manage many different websites and/or projects on various servers. It doesn't store more than it needs (just the CVS folders), and it adds, updates, patches, and removes files according to your repository.

Additionally you can use branches and sticky tags to keep track of files that don't need to be updated, or files that vary from client to client.

It is also easy to trigger an update over ssh or from cron.

One downside compared to SVN is the lack of a binary diff mechanism, but I have been able to get by fine without it managing projects up to a GB in size.

Alex

Re:CVS (1)

matheny (450016) | more than 8 years ago | (#13582583)

CVS updates are not atomic, unlike subversion. If integrity of data is important to your customers, don't consider CVS. As far as using Subversion is concerned, I would be wary of giving customers that type of access to my systems.

Re:CVS (1)

angel'o'sphere (80593) | more than 8 years ago | (#13585842)

In our eyes, CVS lacks easy access via HTTP, and with it an easy way through firewalls on the client site.

Second drawback is user management on the server.

Regarding binaries: CVS might not be able to merge binaries, and its default configuration probably doesn't even diff them, but it can do binary diffs!

Also, we can't work without diffs; if everything else failed us, we would likely diff the big files manually and distribute them as a "new release" of a patch file.

angel'o'sphere

Re:CVS (1)

NuShrike (561140) | more than 8 years ago | (#13588195)

CVS is better than SVN here because SVN lacks the 'obliterate', or 'admin -o' ability that Perforce and CVS have.

This is important because you DON'T need to be storing 100 large revisions of your software release in the repo with no way to ever remove it.

Of course CVS sucks when tagging a huge repo, and removing releases is a PITA, but you have no such options in SVN.

Disk Accesses (2, Informative)

Anonymous Coward | more than 8 years ago | (#13581219)

My largest worry, from my calculations, is disk access on the Subversion server.

Put enough ram in your server, and the changed portion will likely fit in cache. If that's not an option, use RAID to speed up disk accesses.

Others have mentioned rsync. You might also consider xdelta.

Disk I/O (2, Insightful)

pete-classic (75983) | more than 8 years ago | (#13581580)

Let's see. You have a ceiling of 2.01GB worth of updates. You have disk I/O problems.

Your problem is either that you don't have enough RAM in the system, or you have an OS that doesn't do a rational job of caching disk.

Or both.

-Peter

perhaps (2, Informative)

/dev/trash (182850) | more than 8 years ago | (#13581710)

rdiff-backup

cfengine (1, Informative)

Anonymous Coward | more than 8 years ago | (#13584083)

First of all, it's obvious you are not using enough RAM on the servers. Get 8 GB. Don't do the balancing with Apache. If you are using Linux, resort to IPVS instead. For the large database files you'll want to use rsync. After the transfer, though, most likely you'll still need to perform the actual update. That's where cfengine comes in. You set it up to run rsync every N hours, then perform operations (restarting programs, cleaning up, whatever) when there's new data. You can also use it to restart dead instances of your application, etc.

Some clarifications, especially about rsync (2, Informative)

angel'o'sphere (80593) | more than 8 years ago | (#13584157)

First of all, thanks for so many replies!

First, I'd like to clarify a bit; probably my original question was not clear enough!

The clients of the system are customers. They have Windows PCs, as the software runs on Windows. On the server side we need to be able to authenticate every client, as there are several region and user-level restrictions on who may access which file.

You can assume there are simply 5 to 10 user levels, where a user on level 10 may access everything and a user on level 5 only a subset.

So far SVN looks good:

* authentication via the Apache front end, probably via a LDAP server

* structuring the "download area" into directories with user-level-appropriate content

Regarding rsync:

* first of all, I did not know about it :D

* my first investigation indicates several drawbacks

It seems not to run on Windows (without Cygwin), users need to be Unix/Linux users on the server, and building a distribution seems "more complicated" than making a tag/version with SVN.

Please consider: from the point of view of the service provider, the system is just the same as hosting a huge pile of source code. The starting distribution probably has 3000 files and is about 2.5GB.

The users need the ability to fall back to an earlier revision in case of errors during distribution.

Users need to be able to upgrade to the latest HEAD (there is only one main trunk anyway).

Regarding performance of SVN: yes, we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO, it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they have either the previous or the second-oldest release installed).

However, alternatives to SVN are very welcome! I only wanted to make clear why we considered SVN in the first place.

angel'o'sphere

Re:Some clarifications, especially about rsync (2, Interesting)

NuShrike (561140) | more than 8 years ago | (#13588225)

Here's a combination of available strategies:

o DON'T use SVN (imo)
o check out your latest rev to a staging 'folder'
o rename your previous release 'folder' to backup name
o rsync the data from your staging 'folder' to all your clients one by one.

If you have issues with the release, just roll back to the previous release 'folder'.

The other thought is to rsync a .torrent file and use something like BitTornado to distribute from your 'staging' folder.

All this should let you get by with a master file server with 1GB of RAM or less, and crappy I/O too.

You figure out a security scheme to wrap around this.
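The staging and rollback steps above might look like this sketch (directory names are illustrative, and a real deployment would rsync "current" out to clients afterwards):

```shell
# Staged release with a rollback path (illustrative names).
mkdir -p staging
echo "release 2" > staging/VERSION           # freshly checked-out release

# Keep the previous release around for rollback, then promote the staged tree.
if [ -d current ]; then
    mv current previous
fi
cp -a staging current

# If the release misbehaves, rolling back is just swapping directories:
#   mv current broken && mv previous current
```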

Re:Some clarifications, especially about rsync (1)

angel'o'sphere (80593) | more than 8 years ago | (#13589684)

I can't rsync to my clients.
If at all, the clients can rsync from me, and as rsync does not run natively on Windows, we can't rely on rsync, IMHO.

Strange, did I use the wrong term? Does no one of you have a program with an automated "check for updates from vendor" option?

That's what we want to do. A client, over the Internet (not via LAN), has to be able to use HTTP(!), needs to be authenticated, and it's pull, not push, distribution.

BitTorrent is completely out of the question, as we have several different access rights and most customers have a firewall which very likely blocks torrents.

But we could probably figure out, like you suggest, a security scheme around this :D

angel'o'sphere

rsync on Windows (1)

Kaseijin (766041) | more than 8 years ago | (#13593249)

If at all, the clients can rsync from me, and as rsync does not run natively on Windows, we can't rely on rsync, IMHO.

All one needs to run a Cygwin binary in general is the cygwin1.dll library. rsync in particular requires cygpopt-0.dll from the libpopt0 package. It can be daemonized with srvany.exe and instsrv.exe from the Windows 2003 Resource Kit [microsoft.com]. You might have to adjust the timestamp window to account for client time zones or the two-second resolution of FAT32, but it doesn't require exceptional wizardry.

Re:Some clarifications, especially about rsync (3, Insightful)

jrockway (229604) | more than 8 years ago | (#13590488)

> Regarding performance of SVN: yes, we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO, it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they have either the previous or the second-oldest release installed)

Subversion doesn't need to cache requests -- the OS* does this itself. With plenty of RAM, whatever isn't being used by processes is used for cache. If you don't trust the disk caching algorithm, just make a 2.5G ramdisk and copy your files over to that when you want to release them. Then the disk won't be a problem.

* Assuming you're using a Real OS, and not Windows. Don't use Windows for anything that requires speed or reliability.
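A configuration sketch of the ramdisk suggestion (requires root; size and paths are illustrative):

```
# Keep the release tree on a tmpfs ramdisk so client reads never touch disk.
mount -t tmpfs -o size=2560m tmpfs /srv/release-cache
cp -a /srv/releases/current/. /srv/release-cache/
```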

Re:Some clarifications, especially about rsync (0)

Anonymous Coward | more than 8 years ago | (#13592344)

You may want to look into CVSup. It is used in FreeBSD for users to download the source and ports trees, which many, many users do on a regular basis. It is very efficient (it uses the rsync algorithm internally), and it understands CVS tags, so it would solve the multiple revision problem for you.

The downside for you is that it has not been ported to Windows. You may be able to get it to compile using cygwin (keep in mind that a user does NOT have to have cygwin installed to run a cygwin program -- you just have to distribute the cygwin DLL file).

Re:Some clarifications, especially about rsync (3, Informative)

eklitzke (873155) | more than 8 years ago | (#13594072)

You may be interested in the Unison project. More info can be found here: http://www.cis.upenn.edu/~bcpierce/unison/ [upenn.edu]

Consider CFEngine (1)

garyebickford (222422) | more than 8 years ago | (#13608667)

A previous poster mentioned cfengine [cfengine.org] briefly. If I understand cfengine correctly, it may be just what you're looking for.

Also, if you're the sort who can/does go to conferences, the LISA '05 [usenix.org] conference (Dec. 4-9, 2005) features several sessions on cfengine by Mark Burgess. (LISA is the "Large Installation System Administration Conference", put on by USENIX [usenix.org] and SAGE [sage.org]. There's also a conference BLOG [lisaconference.org], and this is the link to the tech program info [usenix.org].)