Slashdot: News for Nerds

On PHP and Scaling

CowboyNeal posted about 10 years ago | from the web-pages-served-fresh-all-day dept.

PHP 245

jpkunst writes "Chris Shiflett at oreillynet.com summarizes (with lots of links) a discussion about scalability, brought about by Friendster's move from Java to PHP. Chris argues that PHP scales well, because it fits into the Web's fundamental architecture. 'I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.' (The article is also available on Chris' own website.)"

245 comments

Frist Post !!!!!!! (-1, Offtopic)

Anonymous Coward | about 10 years ago | (#9599548)

Poop. Gnaa

~~~ INTRODUCING "REAL TROLL TALK" (-1, Offtopic)

Anonymous Coward | about 10 years ago | (#9599569)


~~~ I promised someone that I would post here today to introduce Real Troll Talk [slashdot.org] .

~~~ It's a frequently-updated webzine featuring popular Internet trolling personalities revealing their most intimate thoughts and feelings.

~~~ Stop by today to read the first issue, featuring pb [slashdot.org], and the second issue, featuring the one and only TRoLLaXoR [slashdot.org].

~~~ (C)opyright Real Troll Talk [slashdot.org] 2004

Re: ~~~ INTRODUCING "REAL TROLL TALK" (-1, Offtopic)

Anonymous Coward | about 10 years ago | (#9599708)

Read it. Very gay. Get a life, loser.

A few things that could lead to scalability (5, Interesting)

Dozix007 (690662) | about 10 years ago | (#9599558)

PHP inherently will not lead to scalability, however, if you ever try to create any applications that use a DFS-type algorithm, it can happen. PHP (I know it is web-based, shouldn't ask too much) does not allow for extremely simple solutions in DFS-type algorithms that are apparent to most users. Many will end up with too many "while()" statements and bring down script efficiency exponentially.

Re:A few things that could lead to scalability (1, Informative)

Dozix007 (690662) | about 10 years ago | (#9599565)

*Will Inherently Lead to Scalability* (Damn, can't type this early)

Re:A few things that could lead to scalability (3, Insightful)

julesh (229690) | about 10 years ago | (#9599644)

Sorry, your abbreviations are confusing me. DFS? I know Disk File System and Distributed File System, but neither of those seem a good fit for what you're talking about. So... what are you talking about?

Re:A few things that could lead to scalability (5, Interesting)

Dozix007 (690662) | about 10 years ago | (#9599655)

Depth-First-Search. You can use PHP to create a simple search engine by using arrays, fopen, fread, and while() loops. If done improperly, you can eventually loop your script into oblivion, creating big-time inefficiency.
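
To make the "loop into oblivion" risk concrete, here is a minimal sketch of an iterative depth-first crawl, written in Python rather than PHP. The `links` dictionary is a hypothetical in-memory stand-in for fetching and parsing pages; it is not anything from the original post. The visited set is what keeps a link cycle from running forever.

```python
def crawl_depth_first(start, links):
    """Return the pages reachable from `start`, in depth-first order.

    `links` maps a page name to the pages it links to (an assumed,
    in-memory replacement for real fetching and parsing).
    """
    visited = set()
    order = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue  # already crawled: this check is what breaks cycles
        visited.add(page)
        order.append(page)
        # Push neighbours in reverse so the first link is explored first.
        for neighbour in reversed(links.get(page, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


site = {
    "index": ["about", "news"],
    "about": ["index"],            # links back to index: a cycle
    "news": ["index", "archive"],
    "archive": [],
}
print(crawl_depth_first("index", site))
```

Without the visited check, the index/about cycle alone would keep the while loop busy indefinitely.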

Re:A few things that could lead to scalability (2, Informative)

Anonymous Coward | about 10 years ago | (#9599687)

You're not thinking in a PHP architecture.... thinking Java style J2EE does not apply to using PHP.

What is a PHP "server"... it is the combination of Apache and PHP and a request being served. Since the web is stateless with simple session IDs tying things together it's not really necessary to share memory or resources between requests... hence Rasmus Lerdorf's "share nothing architecture."

It doesn't make sense to do an olympic-sized web crawling script, and certainly not invoke it in the time of a web request. It makes more sense to write a script that is spawned by cron, with probably multiple instances that divvy up the task of doing the search and creating the index.
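
One way the "multiple instances that divvy up the task" idea could look, as a rough Python sketch: each cron-spawned instance is given its own index and only handles the URLs that hash to it. The function names and the use of CRC32 are assumptions made for illustration, not something the poster described.

```python
import zlib


def shard_for(url, worker_count):
    """Map a URL to a worker index that is stable across processes."""
    # zlib.crc32 gives the same value on every run, unlike the built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(url.encode("utf-8")) % worker_count


def urls_for_worker(urls, worker_index, worker_count):
    """Return only the URLs this worker instance is responsible for."""
    return [u for u in urls if shard_for(u, worker_count) == worker_index]
```

Each instance would be started with a different `worker_index`, so no coordination between instances is needed beyond agreeing on `worker_count`.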

Re:A few things that could lead to scalability (3, Interesting)

Dozix007 (690662) | about 10 years ago | (#9599892)

Actually... a web crawling script is quite small. I am not thinking in a Java mindset, but a CS one. Basic CS theory and knowledge can be applied most anywhere. PHP search scripts are quite useful for internal site search, or a small network of sites. I also think that many should stop downing PHP as an unviable possibility for large projects. It is possible, you just need to be dynamic and well organized when doing so. A well coded site can work quite well, you just need to know what you are doing.

Re:A few things that could lead to scalability (2, Interesting)

bigattichouse (527527) | about 10 years ago | (#9599690)

Why not try term vector space? I've recently completed a PHP/Mysql term vector space engine (see website).
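
For readers who have not met the term: a minimal sketch of term-vector-space ranking, in Python. Each text becomes a vector of word counts and documents are ranked by cosine similarity to the query. This is a generic illustration under simple assumptions (whitespace tokenizing, raw counts, no weighting) and says nothing about how the engine mentioned above is actually built.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercase words with counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return the documents that share terms with the query, best match first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]
```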

Re:dfs style algorithm! (0)

Anonymous Coward | about 10 years ago | (#9600023)

Double prices
Wait for Bank Holiday
Halve prices
Promote MASSIVE SALE!!!!!!
Profit

(sorry www.dfsonline.co.uk :)

~~~ INTRODUCING "REAL TROLL TALK" (-1, Troll)

Anonymous Coward | about 10 years ago | (#9599562)


~~~ I promised someone that I would post here today to introduce Real Troll Talk [slashdot.org] .

~~~ It's a frequently-updated webzine featuring popular Internet trolling personalities revealing their most intimate thoughts and feelings.

~~~ Stop by today to read the first issue, featuring pb [slashdot.org], and the second issue, featuring the one and only TRoLLaXoR [slashdot.org].

~~~ (C)opyright Real Troll Talk [slashdot.org] 2004

Gah, no! (4, Funny)

DrEldarion (114072) | about 10 years ago | (#9599563)

it simply tries to fit into the existing paradigm

All right, he used the word "paradigm", that makes his opinion automatically invalid.

Re:Gah, no! (5, Funny)

Prowl (554277) | about 10 years ago | (#9599581)

but he didn't use the phrase "paradigm shift", so we could give him the benefit of the doubt

Re:Gah, no! (0)

Anonymous Coward | about 10 years ago | (#9599869)

paradigm.. paridigmish? I know what that means.. I swear! I used it in high school.

nous paradigmons
vous paradigmez
ils paradigment

see? SEE?!

Re:Gah, no! (1)

azuretek (708981) | about 10 years ago | (#9600002)

must be a typo

Re:Gah, no! (1)

beebware (149208) | about 10 years ago | (#9600066)

But, basically, at the end of the day - we do need to achieve synergy to ensure that long term goals are made.

Author seems to live in a vacuum (5, Insightful)

Michalson (638911) | about 10 years ago | (#9599575)

The only real argument I could find was "Java doesn't do X well, therefore PHP must be great". The author seems to live in a universe with only two choices, his straw man Java, and his favorite web language, PHP. When he does try to argue PHP's merits on its own, it seems to collapse into a "PHP is good because it's good" argument. I don't see any part of the article addressing how PHP can benefit the developer facing real issues of large scale web development (such as the need for caching systems on high volume websites, or the maintenance challenge of larger code bases on complex sites). While good arguments may exist for PHP, they just don't seem to be here.

Re:Author seems to live in a vacuum (5, Informative)

lamz (60321) | about 10 years ago | (#9599625)

I don't see any part of the article addressing how PHP can benefit the developer facing real issues of large scale web development (such as the need for caching systems on high volume websites, or the maintenance challenge of larger code bases on complex sites).

The article doesn't mention it, but Smarty [php.net] is an excellent PHP library that implements, among other things, caching. I have used it extensively with excellent results.
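
As a rough idea of what output caching buys you, here is a small Python sketch of a page cache with a time-to-live. It is a generic illustration only: the class and method names are made up for this example and are not Smarty's API.

```python
import time


class PageCache:
    """Keep rendered pages for `ttl_seconds` so they are not rebuilt on every hit."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the expiry logic can be tested
        self._entries = {}          # key -> (expires_at, html)

    def get_or_render(self, key, render):
        """Return the cached page for `key`, calling `render()` only when needed."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # still fresh: skip the expensive render
        html = render()
        self._entries[key] = (now + self.ttl, html)
        return html
```

The expensive part (database queries, template rendering) lives inside `render`, so a busy page is rebuilt at most once per TTL window instead of once per request.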

Re:Author seems to live in a vacuum (1)

RAMMS+EIN (578166) | about 10 years ago | (#9599997)

"Java doesn't do X well, therefore PHP must be great"

But then, how can you conclude a language scales well, other than by comparing it? Java is supposedly used by so many sites that it can be used as a measuring standard. If PHP scales better, it must be great. I think that sounds quite reasonable.

For the record, I like neither language that much. I use PHP every day, but I would rather be using Python or LISP.

Re:Author seems to live in a vacuum (1)

smitty45 (657682) | about 10 years ago | (#9600003)

Did you read all of the author's opinions? I see many good reasons for right tool for the right job listed here, and Friendster is obviously one of them.

Re:Author seems to live in a vacuum (1)

Bluelive (608914) | about 10 years ago | (#9600162)

It's worse than that. Because in their implementation Java did share information on the server between connections and the PHP implementation did this less, the PHP version could be cloned over multiple servers. They could have done the same with the Java implementation.

PHP scales down, too (4, Insightful)

DavidNWelton (142216) | about 10 years ago | (#9599582)

Perhaps it's not mentioned very often because it's obvious, but I think it's an advantage for systems like PHP, or Rivet [apache.org] that they scale down very well.

What does this mean? That they don't consume too much in the way of resources, and are very easy to get started with. This puts a dynamic web site within reach of more people, which is a good thing, even if inevitably some of them will, yes, write crappy code. It is another example of the "worse is better" philosophy.

I just wish they had used Tcl or something else already out there instead of creating a language that in and of itself is nothing very exciting, and has been a bit slow.

Not always a good thing... (2, Insightful)

Dozix007 (690662) | about 10 years ago | (#9599628)

It is not a good thing that there is a short learning curve on PHP. While it does put the ability for dynamic web content at the fingers of most users, it also creates a crapflood of insecure sites. Not to mention when a user may get into more advanced PHP programming and know nothing of basic CS (I know, not a big CS language, but some things must be known). Inefficient scripts will bog down sites, improper loops and insecurity can wreak havoc on a network. I have received several emails in relation to a PHP security project [slashdot.org] that I run from university admins who have difficulty with insecure PHP coders and allowing them to have access to PHP servers and SQL databases that others use.

Re:Not always a good thing... (1, Troll)

aled (228417) | about 10 years ago | (#9599652)

Quick and dirty; PHP is the VB of the XXI century ;-)

Re:PHP scales down, too (-1, Offtopic)

Anonymous Coward | about 10 years ago | (#9599734)

george bush let 9/11 happen so the U.S could go to war on israel's behalf.

the anthrax attacks were designed to scare the bejesus out of the lawmakers so the USAPATRIOT act would get passed without even being read.

the Project for a new american century, who's members include most of Bush's (jewish) high cabal advocate the US's development of genotype-specific biological weapons.

Arabs are NOT the enemy. your goverment wants you to think they are.

PS:

BOO!

Re:PHP scales down, too (3, Funny)

Anonymous Coward | about 10 years ago | (#9599749)

You're right.

however, if you wrote the same thing about Visual Basic / ASP, you would have been modded a troll.

Re:PHP scales down, too (0)

Anonymous Coward | about 10 years ago | (#9600042)

I see your ASP and raise you an IIS worm. Visual Basic just sucks.

Another article (4, Informative)

Anonymous Coward | about 10 years ago | (#9599585)

Here's an article from Jack Herrington on PHP's scalability.

http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html

jsp is a bad idea, but Java is not (5, Informative)

ahmetaa (519568) | about 10 years ago | (#9599588)

If someone wants to produce a high performance web site in Java, JSP is a bad choice. Use Velocity, pure Java objects, and a decent DB abstraction mechanism (Hibernate, iBatis). Plus, I used PHP; OK, it is easy to use and can be preferred for small to medium size web sites. But call me biased, it is nowhere near the elegance of Java.

Re:jsp is a bad idea, but Java is not (0)

Anonymous Coward | about 10 years ago | (#9599646)

You're biased.

Re:jsp is a bad idea, but Java is not (2, Insightful)

Anonymous Coward | about 10 years ago | (#9599661)

You can do ugly things using any language/platform; it depends on the programmer.

Re:jsp is a bad idea, but Java is not (1)

julesh (229690) | about 10 years ago | (#9599672)

Correct me if I'm wrong, but isn't JSP just a simplified syntax for creating servlets by embedding Java code within an HTML document?

If so, why should this cause performance problems?

(As an aside, I've run a JSP server in the past on a 100MHz pentium, and after the first use of each page performance was OK, so I'm not sure what the big problem is...)

Re:jsp is a bad idea, but Java is not (2, Informative)

Decaff (42676) | about 10 years ago | (#9599711)

The problems with JSP are to do with writing maintainable code, not speed. There is a principle of software development that suggests that it is a bad idea to embed software logic in presentation code, as this does not allow for easy modification. If you support this principle, JSP (and some ways of using PHP) are not a good idea. However, JSP is not slow: the JSP pages are translated into Java Servlet source code and then compiled. This can result in very fast websites.

Re:jsp is a bad idea, but Java is not (2, Informative)

javab0y (708376) | about 10 years ago | (#9599739)

No...you are not correct. Your point about JSPs is only true under a model 1 implementation/design. A model 2 implementation (where your business logic is done in Java objects) utilizes JSPs exactly like Velocity...only as a template. See Struts and other MVC frameworks that embrace a model 2 implementation of JSPs.

Re:jsp is a bad idea, but Java is not (1)

Decaff (42676) | about 10 years ago | (#9599847)

No...you are not correct. Your point about JSPs is only true under a model 1 implementation/design.

You are right - I was talking only about model 1. JSPs can be used very effectively in MVC frameworks.

What I was trying to describe was JSP use when Java code is embedded, rather than tag libraries.

Re:jsp is a bad idea, but Java is not (3, Informative)

mabinogi (74033) | about 10 years ago | (#9599808)

JSP on its OWN is a bad idea.

Just as Velocity on its own would be a bad idea.
Write your business logic in plain Java, use servlets to manage the flow of control and to call your Java API to create value objects (beans) to place in the request, and then use JSP to format the data.

You only run into problems if you try to do everything with JSP, which is always a bad idea.

and JSP 2.0 is even better with the JSTL expression language built in.
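
The split described in this subthread (logic in plain code, templates only for formatting) is not specific to Java. A minimal Python sketch, with made-up function names and sample data, of keeping the counting and the markup in separate places:

```python
from collections import Counter
from html import escape


def top_posters(comments, limit):
    """Business logic: count comments per author. No markup in here."""
    counts = Counter(author for author, _text in comments)
    return counts.most_common(limit)


def render_top_posters(rows):
    """Presentation: turn plain (name, count) rows into HTML. No counting in here."""
    items = "\n".join(
        "<li>{} ({})</li>".format(escape(name), count) for name, count in rows
    )
    return "<ul>\n" + items + "\n</ul>"


comments = [
    ("julesh", "DFS?"), ("Dozix007", "Depth-First-Search."),
    ("julesh", "JSP?"), ("julesh", "JDBC."),
    ("Dozix007", "CS theory."), ("aled", "VB."),
]
print(render_top_posters(top_posters(comments, 2)))
```

Either half can be changed (a different ranking rule, a different page layout) without touching the other, which is the maintainability argument being made above.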

PHP frontend and Java Backend (2, Informative)

proudlyindian (781206) | about 10 years ago | (#9600083)

JSR 223: Scripting Pages in Java Web Applications
The specification will describe mechanisms allowing scripting language programs to access information developed in the Java Platform and allowing scripting language pages to be used in Java Server-side Applications. JSR 223 [jcp.org]

What's Really Going On Here... (4, Interesting)

TheNarrator (200498) | about 10 years ago | (#9599600)

I've seen a Friendster stack trace before, when the app was running slow at 5 am. For those of you who don't know what this is, it's when Java runs into an error and tells you where your program died. It was really funny. Basically there was a servlet and a call to Database.java and on line 8000 of database.java they were calling mysql directly. Real nice architecture, NOT!

Re:What's Really Going On Here... (1)

Morgahastu (522162) | about 10 years ago | (#9599616)

And?

Re:What's Really Going On Here... (1)

aled (228417) | about 10 years ago | (#9599645)

Only proves that:
- Friendster programmers don't know how to catch an error in Java, something that Java has plenty of ways to do.
- It is easy to find where the error is in Java. I've seen lots of "Warning: MySQL Connection Failed: Unknown MySQL error in /www/something.php on line nn" so it is a similar thing with PHP. The stack trace usually makes it easier for the programmer to find the problem, but it should not be shown to the user.
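
The "log it, don't show it" point applies in any language. A minimal Python sketch, where the `handle_request` wrapper and the error text are hypothetical: the full trace goes to the log for the programmer, and the visitor only gets a generic message.

```python
import logging

logger = logging.getLogger("app")


def handle_request(handler):
    """Run `handler`; on failure, log the stack trace and hide it from the user."""
    try:
        return 200, handler()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("request failed")
        return 500, "Sorry, something went wrong. Please try again later."


def broken_page():
    raise RuntimeError("MySQL connection failed in Database, line 8000")


status, body = handle_request(broken_page)
print(status, body)
```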

Re:What's Really Going On Here... (1)

fimbulvetr (598306) | about 10 years ago | (#9599960)

Hey, pay attention.

Warning: MySQL Connection Failed: Unknown MySQL error in /www/something.php on line nn

That's a mysql error. I've worked with php since early 3.x, and I've NEVER seen an unknown error.

Re:What's Really Going On Here... (1)

aled (228417) | about 10 years ago | (#9600094)

I just found 15000 hits [google.com] in Google of "MySQL Connection Failed: Unknown MySQL". Many of those are down sites. Perhaps you are doing what they should do: catch errors.

Re:What's Really Going On Here... (2, Interesting)

higginsm2000 (242840) | about 10 years ago | (#9600053)

It also means that there is a database access class with 8000 lines which is a scary thought...

Re:What's Really Going On Here... (2, Informative)

julesh (229690) | about 10 years ago | (#9599659)

What do you mean "calling mysql directly"? I can assure you that isn't actually possible in Java. MySQL is a C application, Java can't call C code without some kind of intermediate layer.

Also, what's "Database.java" -- if it's part of the MySQL/Java interface layer, this would be perfectly appropriate behaviour.

Re:What's Really Going On Here... (0)

Anonymous Coward | about 10 years ago | (#9599683)

Basically there was a servlet and a call to Database.java and on line 8000 of database.java they were calling mysql directly

Does this mean they were invoking the mysql executable via system() or exec() or similar? Or are you saying that each page view hits the database whether it needs to or not, i.e. are you saying that there is no layer to cache commonly used info? Both are bad, but not caching is probably not too terrible a plan if mysql is fast enough. (It should do some caching of its own, although not at the speed you'd get if you cached within the application.)

Re:What's Really Going On Here... (0)

Anonymous Coward | about 10 years ago | (#9599686)

Oh no, they were calling mySQL directly! Who cares? They probably never intended to port to any other database?

If they had a "DB abstraction layer" the thing would've been even slower.

Re:What's Really Going On Here... (1)

Decaff (42676) | about 10 years ago | (#9599728)

If they had a "DB abstraction layer" the thing would've been even slower.

No, as such abstraction layers can cache results, store compiled queries etc.

A few more method calls between your code and the database is going to make no difference whatsoever on modern hardware.
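
A small Python sketch of the kind of caching an abstraction layer can add, using the standard-library sqlite3 module as a stand-in database. The `CachingDatabase` class is invented for this example, and its "clear everything on any write" policy is a deliberately crude assumption.

```python
import sqlite3


class CachingDatabase:
    """Thin layer over a DB-API connection that remembers read results."""

    def __init__(self, connection):
        self._connection = connection
        self._cache = {}
        self.queries_sent = 0       # reads that actually reached the database

    def query(self, sql, params=()):
        """Run a read query, reusing the earlier result for identical calls."""
        key = (sql, tuple(params))
        if key not in self._cache:
            self.queries_sent += 1
            self._cache[key] = self._connection.execute(sql, params).fetchall()
        return self._cache[key]

    def execute(self, sql, params=()):
        """Run a write, then drop the whole cache: crude but never stale."""
        self._connection.execute(sql, params)
        self._connection.commit()
        self._cache.clear()


db = CachingDatabase(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
first = db.query("SELECT name FROM users ORDER BY id")
second = db.query("SELECT name FROM users ORDER BY id")  # served from cache
```

The extra method call per query costs next to nothing; the saved round trip to the database is where the time goes.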

Re:What's Really Going On Here... (1)

julesh (229690) | about 10 years ago | (#9599750)

As far as I'm aware, Java enforces the use of a database abstraction layer. It's called 'JDBC' and is in the package java.sql. If they're using JDBC then switching to another database server would be a matter of changing the contents of two strings (and ensuring compatibility of their data schemas and queries, but that's a problem however you do it).

I'm not aware of any other way to access MySQL from a Java application.

Re:What's Really Going On Here... (1)

cygnus (17101) | about 10 years ago | (#9599765)

/me slaps forehead. no, that was the MYSQL JDBC driver you saw in the stack trace. JDBC is a database-agnostic plugin architecture for communicating with databases that abstracts away a lot of the stuff that makes talking to a database a "to the metal" task. it's like pear_db, only you don't have to hope that your ISP included it when it was compiling PHP. you just plop some JARs in your web application and chug.

Line 8000 - wtf (1)

panurge (573432) | about 10 years ago | (#9599775)

No-one seems to have picked this up, but it says it all. If any of your support classes has a Line 8000, especially one at which an exception can arise that is not being caught, you may need to go back and do a few basic classes in software design.

Confession time: the worst Swing based class I have ever committed has about 4000 lines, but about 2/3 of that is Swing.

Re:What's Really Going On Here... (1)

TheNarrator (200498) | about 10 years ago | (#9599974)

Just to clarify, IMHO their architecture appeared to be jsp model 1 [oracle.com] architecture which IMHO is not a very performance oriented architecture. They could have at least used jsp model 2 and used various caching layers for business objects,etc.

Sorry buddy... (1, Insightful)

Anonymous Coward | about 10 years ago | (#9599605)

... but scalable enterprise systems just AREN'T written in PHP. It's a great language, and I can see where it has a niche, but it doesn't offer the same kind of power over distributed objects/systems that Java does. It's like comparing MySQL to Oracle for enterprise systems.

Re:Sorry buddy... (1)

julesh (229690) | about 10 years ago | (#9599727)

That depends entirely on the complexity of the system you're trying to implement. Scalability has little to do with that. A very frequently used but simple application is probably better implemented in PHP than Java.

Re:Sorry buddy... (2, Insightful)

Zefram (49209) | about 10 years ago | (#9600098)

The mistake you're making is to think that the language is going to magically fix all sorts of problems, and without this magic you're up shit's creek.

JavaBeans are great in that they're an architecture to communicate through multiple levels and allow for separate tiers. But to think that the same thing can't be done in PHP is foolish. PHP is about keeping the language simple only giving the developer the tools he needs to get work done; making easy things easy, and hard things easier.

I've written a system (proprietary, sorry) that has a complete separation among the 3 (or more) tiers, that allows retrieval of remote objects and combining that with local objects. It allows a user's session to be shared amongst a round-robin server farm, abstracted data access, and my very own templating system.

The language is the lesser issue: it's the developers working on a piece of software and the design of the system that's important.

Zef

Definition of Scalable (4, Insightful)

Morgahastu (522162) | about 10 years ago | (#9599608)

I think the term is subjectable depending on the context in which it's used. Scalable does have many definitions but I don't think that they are all wrong except for one.

His definition suits him well but it might not be helpful for me.

I might use scalable just to say that an application can easily (with little or no modification) handle 100x more users. This doesn't necessarily mean that the difference in system load varies a minimal specific amount per each extra request. All that matters is that it will work with higher demand. Who cares how or why.

I think scalable can also mean that an app can handle 10,000 users when hosted on a single machine but when put on a cluster of computers it can handle exponentially more users. To me that is a scalable application.

Scalable has no set definition in the contexts of applications.

Agree on definition first (4, Interesting)

bangular (736791) | about 10 years ago | (#9599689)

The term "scalable" has become an industry buzzword. It is fruitless to argue whether something is scalable or not if there is no clear definition. It's like arguing whether you believe in freedom or not. Of course most people in the world will say they believe in freedom, but if you ask 100 people to define it you will get 100 different answers (the Bush administration has had a field day with this because the minute you oppose them, they accuse you of not believing in freedom; their definition of course).

It is impossible to say PHP is or is not scalable unless a definition can be agreed on. And with "scalable's" current buzzword status, I don't see that happening very soon.

Re:Definition of Scalable (1)

iamdrscience (541136) | about 10 years ago | (#9599823)

This is just me being a grammar nazi, but you use the word "subjectable" in the first sentence of your post, that's not a word. You're thinking of the word "subjective".

Re:Definition of Scalable (1)

Morgahastu (522162) | about 10 years ago | (#9599977)

Oops.

scalability is a dead issue (5, Insightful)

jenkin sear (28765) | about 10 years ago | (#9599611)

Scalability is rarely that much of an issue- any halfway decent architecture (php, java, even .net) will let you scale horizontally- and Moore's law will take care of any performance problems in time.

My big issue with PHP is maintainability- I see it (perhaps incorrectly) as a glorified templating language, which places it on the same evolutionary track as ASP and cold fusion; developers will tend to munge sql calls into the templates, blow off any MVC separation, and get a system that is very hard to keep going for more than a few revisions.

Re:scalability is a dead issue (3, Insightful)

Anonymous Coward | about 10 years ago | (#9599639)

you are correct that you are incorrect.... if anything, developers are moving towards MVC, like Mojavi [mojavi.org] - probably PHP's best MVC framework because it doesn't try to port struts to PHP, it writes a very flexible framework using PHP the way it was meant to be used.

Also, maintainability is not a feature of a language, it's the organization practices of the developer. Java developers are used to throwing files wherever, doing import statements wherever, and once it's compiled, it's organized! Well, you have to organize your files a little bit better in PHP for higher performing code. But hey, if you're sloppy then that's your fault, not PHP's fault; it's just one of the aspects of a scripting language, just as waiting around for compiling is an aspect of a compiled language.

Re:scalability is a dead issue (4, Interesting)

julesh (229690) | about 10 years ago | (#9599694)

developers will tend to munge sql calls into the templates, blow off any MVC separation, and get a system that is very hard to keep going for more than a few revisions.

Yes, that is tempting. But, conversely, it's a very useful capability for small projects. For larger projects, you just need to ensure you have the discipline not to use the capabilities.

For instance, here [covcen.org.uk] is a site I developed in PHP using a strict model-view separation. There is direct linkage between view and controller and controller and model -- I couldn't be bothered to sort that out for a project of limited size like that one. In a larger project, I'd probably devise some kind of mechanism for that.

You can write unmaintainable code in any language you choose. Discipline is the key.

Re:scalability is a dead issue (1, Interesting)

Anonymous Coward | about 10 years ago | (#9599731)

My main issue with PHP scalability is the lack of a global context for app-level caching.

Sure you can toss more database servers at the task, but a little caching would often (app dependent, of course) give a significantly more efficient solution.

Re:scalability is a dead issue (2, Informative)

julesh (229690) | about 10 years ago | (#9599925)

My main issue with PHP scalability is the lack of a global context for app-level caching.

http://www.php.net/manual/en/ref.sem.php [php.net] -- system V shared memory. See specifically the functions shm_put_var() and shm_get_var().

Re:scalability is a dead issue (1)

iamdrscience (541136) | about 10 years ago | (#9599858)

Any developer tackling a serious project will soon realize the same "problems" with maintainability you have, but fortunately, there are solutions to all of them. First and foremost, is the Smarty templating engine. Then there are the PEAR classes, PECL extensions and if you want to get all CS-ey, there are the OO features of PHP which are expanded and refined in PHP5 (and unlike Perl 6, PHP5 will actually be the release version in the very near future).

If PHP were as unmaintainable as it seems to all of you people that are unfamiliar with it then people wouldn't use it, but thousands of professional websites (Yahoo, Friendster, etc.) DO use it, and that alone speaks volumes about how competitive it is with Java, Perl, Python, etc.

Moore's Law won't scale O(n^3) or similar problems (0)

Anonymous Coward | about 10 years ago | (#9600021)

Unless your time scales are geologic.

In other words, scalability of an architecture will always be a factor for complex problems.
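
A back-of-the-envelope sketch of that claim, in Python. The 18-month doubling period is an assumption chosen for illustration; the function just counts how many doublings of hardware speed it takes to absorb a given growth in input size.

```python
import math


def years_to_absorb(growth_factor, exponent, doubling_period_years=1.5):
    """Years of hardware doubling needed to offset input growth for O(n^exponent).

    Assumes speed doubles every `doubling_period_years` (an assumed figure).
    """
    work_multiplier = growth_factor ** exponent
    doublings = math.log2(work_multiplier)
    return doublings * doubling_period_years


print(round(years_to_absorb(10, 1), 1))  # linear algorithm, 10x the input
print(round(years_to_absorb(10, 3), 1))  # cubic algorithm, 10x the input
```

Under that assumption, a tenfold increase in input costs a cubic algorithm three times as many years of waiting as a linear one, so the wait grows with every further increase in load.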

Who know what scales? (3, Interesting)

Lolaine (262966) | about 10 years ago | (#9599615)

First of all: every time I see the term "scalability", the narrator writes as if it only meant "bigger". We have to think not only of getting bigger, but also of getting smaller.

PHP has wide support for many RDBMSs, APIs and operating systems, but it is only a language. A language doesn't scale, it's the platform that scales.

That's why I see PHP/Apache/Unix scaling far better than (for example) ASP/IIS/NT: the first platform can run on anything from a PDA to a high-performance minicomputer; the second can run on anything from an i686 (pentium support was removed?) to the best PC-architecture based computer you can buy. That's the difference: a wide option platform versus a closed option platform.

Probably, the first platform will have performance leaks and will not squeeze every bit of performance out of the machine it runs on, but its scalability potential lies in the fact that it can run on whatever you throw it at. Maybe J2EE or other platforms will run faster on the same hardware than PHP, but PHP will scale there and will stand shoulder to shoulder with them.

That's why I don't like to evaluate scalability from the "speed" point of view, but from the "where it runs" point of view.

Re:Who know what scales? (1, Redundant)

julesh (229690) | about 10 years ago | (#9599700)

pentium support was removed?

I'm currently running PHP 4.2, a recent Apache 1.3 and Linux 2.4 on a Pentium 100MMX without difficulty.

Re:Who know what scales? (1)

julesh (229690) | about 10 years ago | (#9599733)

Ignore me. I misread the GP post.

Re:Who know what scales? (1)

badriram (699489) | about 10 years ago | (#9600186)

Maybe you should read a bit more about the mono project and ASP.NET. Because right now, I am running a bunch of ASP.NET/Apache2/UNIX.
There is also SunOne ASP which would let you run ASP on Apache/UNIX based systems.

Maintaining State (1)

goul (41924) | about 10 years ago | (#9599622)

From the article
"Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely. A typical Java application will make use of the fact that it is running under a JVM in which you can store session and state data very easily and you can effectively write a web application very much the same way you would write a desktop application. This is very convenient, but it doesn't scale. "

Storing and more importantly trying to replicate stored state via sessions in Java can be expensive, but saying Java scales badly because it makes it easy to do things that don't scale well is a poor argument. I don't know enough about the merits of PHP to comment on how it deals with this issue, but when you've done lots of server side Java programming you learn to be very judicious in the use of Session scope.

Re:Maintaining State (1)

julesh (229690) | about 10 years ago | (#9599716)

In fact, PHP sessions work in almost exactly the same way as a Java servlet session; they store variables in a memory block that is shared between all the PHP processes (as opposed to one that is shared between threads for Java). You can use disk or a database for storage, if you want, but exactly the same options ought to be achievable with Java, surely?

Re:Maintaining State (1)

the eric conspiracy (20178) | about 10 years ago | (#9599972)

they store variables in a memory block that is shared between all the PHP processes

The problem with PHP in this area is that it doesn't have the flexibility Java does in scoping server-side objects. Java has page, request, session and application scopes. PHP's limitations make it difficult to implement MVC frameworks that are as powerful as, say, Apache Struts.

Re:Maintaining State (3, Informative)

selkirk (175431) | about 10 years ago | (#9600020)

PHP sessions are NOT stored in a memory block shared between PHP processes. The default is to store session information in a file on disk. This is the point of the debate. "Shared nothing" means that there are no memory blocks shared between PHP processes.

As your application scales beyond one server, you then need to find a way to share your session between servers. This can be done in PHP via NFS with the default file-based session driver (I think SourceForge does this), or with a database session driver.

If you had stored sessions in memory, then you would encounter problems with having to route requests based on session, or migrate to a method for sharing session data between machines.
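
To make the shared-nothing point concrete, here is a minimal sketch of a database-backed session store. It is written in Python rather than PHP, and everything in it is an assumption for illustration: the `SessionStore` class, the `sessions` table, and the in-memory SQLite database standing in for a shared database server. It is not how PHP's session drivers are implemented; it only shows why a worker that keeps no state of its own can be swapped for any other worker.

```python
import json
import sqlite3


class SessionStore:
    """Hypothetical session store: state lives in a database, not in the worker."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)"
        )

    def load(self, session_id):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()


def handle_request(store, session_id):
    """A stateless worker: load the session, change it, write it back."""
    session = store.load(session_id)
    session["hits"] = session.get("hits", 0) + 1
    store.save(session_id, session)
    return session["hits"]


# One shared database stands in for the NFS directory or database server.
shared_db = sqlite3.connect(":memory:")
worker_a = SessionStore(shared_db)  # imagine this running on web server A
worker_b = SessionStore(shared_db)  # imagine this running on web server B

first = handle_request(worker_a, "abc123")
second = handle_request(worker_b, "abc123")  # a different worker sees the same session
```

Because neither worker holds anything between requests, a load balancer can send each request anywhere. The cost is a storage round trip on every request.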

Re:Maintaining State (1)

iamacat (583406) | about 10 years ago | (#9600109)

Storing and more importantly trying to replicate stored state via sessions in Java can be expensive

And why would you want to do such a thing? Just tie each session to a home server and redirect the user to s155.mysite.com when he/she logs in. For some tasks it's better to keep state in memory, for some database is the correct solution. I don't see how using a language with only one option is an advantage.
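
A rough sketch of the "home server" idea, in Python. The server names and the hash-based assignment are assumptions for illustration; the comment above only says to redirect the user to one machine at login, not how that machine is chosen.

```python
import hashlib

# Hypothetical server names; any stable list of hosts would do.
SERVERS = ["s1.mysite.example", "s2.mysite.example", "s3.mysite.example"]


def home_server(session_id, servers=SERVERS):
    """Map a session ID to one server, the same one every time."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


chosen = home_server("abc123")
```

The trade-off is that in-memory state is lost if the home server goes down, and changing the server list remaps sessions, which is where storing sessions in a database or on shared disk comes back in.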

Yahoo. (5, Interesting)

downbad (793562) | about 10 years ago | (#9599630)

Yahoo is a prime example of PHP's scalability. Although they still use some legacy C code, nearly all of their new developments use PHP and BSD.

I worked in a small shop developing web apps, and while it wasn't mission critical stuff like banking, it wasn't exactly brainless "dump data from MySQL" stuff either. I was lucky that my boss wasn't picky about languages. But if anyone I work with doubts the power and simplicity of PHP, I usually bring up Yahoo.

IMHO, PHP rocks. It's suitable for pretty much any and all web development. It can be used for quick hacks, or you can code it like a pro with objects and stuff.

Re:Yahoo. (5, Informative)

Anonymous Coward | about 10 years ago | (#9599710)

Actually that's only partially true. Yahoo uses C/C++ for almost all backend development. PHP is used mostly for what it's good at: Simple web frontends that call on extensions written in C and C++ to do most of the heavy lifting, or access backend systems written in C/C++.

Yahoo is very much a C/C++ shop first and foremost - PHP is used as a template system (alongside several proprietary systems) to allow easy modification of high level behaviour.

Re:Yahoo. (1, Informative)

Anonymous Coward | about 10 years ago | (#9599924)

That's only partially true as well -- Yahoo uses Perl for tons of their backend stuff. But yes, PHP is only the final delivery bit, not the actual applications at Yahoo.

Scalability and Maintainability go hand in hand (4, Insightful)

Christianfreak (100697) | about 10 years ago | (#9599658)

PHP's problem is that it quickly becomes unmaintainable in larger projects. That's why it doesn't scale, not because the platform isn't fast enough or Apache can/can't scale.

PHP will continue to have this problem until someone comes and tells the developers about a nifty invention called 'namespaces'

Some other things that could help: Standard templating for easier separation of design/content from code, a better module architecture that doesn't require me to recompile just to get some new functionality, some nice standard modules that go with that new architecture.

Of course if someone did all of that you'd have Perl and since we already have Perl, I'll stick with it.

Re:Scalability and Maintainability go hand in hand (1)

julesh (229690) | about 10 years ago | (#9599760)

PHP will continue to have this problem until someone comes and tells the developers about a nifty invention called 'namespaces'

Namespaces are handy, I'll agree, but I don't see them as a golden bullet that is impossible to live without.

Let's face it, they don't actually achieve anything that a consistent naming strategy couldn't also achieve.

Re:Scalability and Maintainability go hand in hand (4, Informative)

iamdrscience (541136) | about 10 years ago | (#9599795)

You sound like somebody who didn't use PHP long enough. Large PHP projects become plenty maintainable once you start using handy stuff like the Smarty templating engine (which IIRC is included by default now). There are also a myriad of great PEAR classes and PECL extensions. As for a module architecture that doesn't require you to recompile, that would be nice; however, I would bet that most PHP programmers have never recompiled their installation or needed to do so. You're right though, it would be nice.

For the most part though, I would say that PHP is slightly better equipped for web development, just like Perl is better equipped for general scripting tasks... I'm a python man myself though ;-)

Re:Scalability and Maintainability go hand in hand (5, Insightful)

mrandre (530920) | about 10 years ago | (#9599840)

PHP only becomes unmaintainable if you don't know what you're doing, or if you don't plan well at the outset. The thing about PHP is that it doesn't force you to do anything, which means it doesn't force you to do anything the right way. This is not a fault. I wouldn't be a PHP developer today were it not for the ease with which I learned to write some very, very bad code. Of course, there's room to grow. The result is that the onus is on the developer, and not the language. So you're right, PHP doesn't scale. Not its job. PHP provides the opportunity to scale, and the toolset, which are more than adequate, and improving over time.

This is particularly funny coming from a Perl developer. Perl can become unmaintainable on a small project.

Re:Scalability and Maintainability go hand in hand (1)

sweede (563231) | about 10 years ago | (#9599952)

PHP has a module architecture: if you need new functionality, you compile that extension and load it in php.ini or in your script using dl().
Templates? Use Smarty templates. www.php.net uses them and, as said elsewhere, they're included in the PHP PEAR package.

If you are talking about basic/core PHP functionality, then what do you expect? Maybe you should post to the kernel mailing list asking that all the new features/enhancements of the kernel be in a module so you don't need to recompile the kernel.

Friendster's No-Hacking Policy (0)

Anonymous Coward | about 10 years ago | (#9599667)

I'd like to point out this blog post [kottke.org] on Kottke:

Moore's buddy Matt Chisholm chimes in to tell me about a similar hack, a JavaScript app he wrote with Moore that works on Friendster. It mines for information about anyone who looks at his profile and clicks through to his Web site. "I get their user ID, email address, age, plus their full name. Neither their full name nor their email is ever supposed to be revealed," he says.

Notified of the security holes Moore and Chisholm exploit, Friendster rep Lisa Kopp insists, "We have a policy that we are not being hacked." When I explain that, policy or no, they are being hacked, she says, "Security isn't a priority for us. We're mostly focused on making the site go faster."

Implementing a site in PHP... (3, Insightful)

bigattichouse (527527) | about 10 years ago | (#9599701)

One of the great boons of PHP is the fact that you can build shell scripts with it. This allowed me to create a large distribution/inventory/control system in PHP, AND do all the back end processing in PHP as well. Sounds inefficient, sure, but it works like a champ - plus any new programmers get to learn the system quite quickly due to consistency.

dotNet / Java (0)

Anonymous Coward | about 10 years ago | (#9599895)

Yeah,

lord knows that Java or C# wouldn't allow you to write console or GUI apps.

Re:Implementing a site in PHP... (2, Informative)

C_Kode (102755) | about 10 years ago | (#9600151)

I don't think it's inefficient. I use it. I have an extensive CLI PHP scripting system set up that does it all. It connects to FTP systems to download data for updates, runs updates on several databases, generates plain-text reports and CSV (Excel-type) reports, and, most of all, combined with crontab'd calls from other systems, it allows me to share data between two systems that previously were unable to do so.

This also allows me to move code blocks between different platforms without issue. It also allows some of our beginning programmers to make changes and updates to these systems without having to know 5+ different languages. Most of them took C classes in school and the transition to PHP is fairly easy. We have an online documentation server (PHP/PostgreSQL) where we also keep a list of no-nos for programming in PHP, so a lot of those new to PHP don't make common mistakes. I have found PHP to be invaluable. Sure, it doesn't fit every job you come up with, but it makes system automation a snap.

Anyway, it's made my job much easier. Perl can do everything that CLI PHP can, but PHP is far less cryptic to those who are new to it, which means far less training time and far less debugging on my part after someone new to the language drops syntactic monkey wrenches or logical errors into our code.

The reason (2, Insightful)

Diclophis (203740) | about 10 years ago | (#9599715)

HTTP URL wrappers, file_get_contents, and serialize/unserialize. With these functions alone you can recreate any CORBA/SOAP/XML-RPC-type remoting. And remoting is good for scalability because it lets you 'outsource' the workload to another machine. Truly N-Tier design (N>3).
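
The pattern described above, fetch a URL and unserialize the body, can be sketched in Python. This is an analogy rather than PHP: `urllib` stands in for PHP's HTTP URL wrappers and JSON for `serialize`/`unserialize`, and the `/add` endpoint, `AddHandler`, and `remote_add` are made-up names for illustration only.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import ProxyHandler, build_opener


class AddHandler(BaseHTTPRequestHandler):
    """The 'remote tier': does the work and returns a serialized result."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        total = int(query["a"][0]) + int(query["b"][0])
        body = json.dumps({"sum": total}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Ignore any proxy settings, since this demo only talks to localhost.
opener = build_opener(ProxyHandler({}))


def remote_add(base_url, a, b):
    """The 'front tier': fetch a URL and unserialize the response."""
    with opener.open(f"{base_url}/add?a={a}&b={b}") as response:
        return json.loads(response.read().decode("utf-8"))["sum"]


# Port 0 asks the OS for a free port; a real back end would be another machine.
server = HTTPServer(("127.0.0.1", 0), AddHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

result = remote_add(base_url, 2, 3)

server.shutdown()
server.server_close()
```

Nothing here handles timeouts, retries, or authentication, which is most of what the heavier remoting stacks add on top.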

in a nutshell: (0)

Anonymous Coward | about 10 years ago | (#9599723)

Java: Wack
PHP: Dope

What's the deal with nutshells? (0)

Anonymous Coward | about 10 years ago | (#9599830)

Are they dope, or are they whack?

Re:What's the deal with nutshells? (0)

Anonymous Coward | about 10 years ago | (#9599915)

nutshells are AAAieeemftmftptangwalaueueueueueoouuuia!!

rebuttal (4, Informative)

Anonymous Coward | about 10 years ago | (#9599724)

I will start with mandatory links to the great series of articles that Ace's Hardware ran, describing their server scenario and their migration from PHP to Java/J2EE:

  1. Building a Better Webserver [aceshardware.com]
  2. Building a Better Webserver in the 21st Century [aceshardware.com]
  3. Scaling Server Performance [aceshardware.com]

The PHP Scalability Myth starts off by defining three types of server architectures. The first, two-tier, and the last, logical-three-tier, are the same conceptually (there is the slight distinction between whether display and business logic code is "mingled", but this is typically not a performance issue, just an aesthetic or design issue). This two-tier/logical-three-tier architecture is the only one PHP supports natively. The article then proceeds to compare a two-tier PHP architecture against the most elaborate full three-tier Java architecture, which is used rarely in practice, and extremely rarely in the same domain in which a PHP solution is feasible. Instead of comparing apples and oranges (if PHP supported a full three-tier architecture, I would imagine two-tier PHP vs. three-tier PHP would have the same performance discrepancies), let's simply compare the only architecture PHP supports natively, two-tier, against JSP talking directly to a database, as this scenario is the most analogous to the PHP one. Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state. And let's assume the database is the largest bottleneck.

The article states:

At the time when the first versions of the JSP and EJB standards were released, the prevalent web server was (and still is) Apache 1.x, which had a process model that was not compatible with Java's threading model. This meant that a small stub was required on the web server side to communicate with the servlet engine. This remains a non-trivial performance overhead for those that decide to pay it, and was a significant performance overhead when the first scalability comparisons were made.

I'm not sure what "stub" the article is referring to, but I will assume it means an Apache module which talks a "native" protocol to the servlet engine. The first such module was mod_jserv, which could run the servlet engine both in-process and over a compact protocol called AJP (Apache JServ Protocol), which represents essentially a pre-parsed HTTP request. This module, as well as the AJP protocol itself, has gone through several revisions, from mod_jk to mod_jk2. I cannot quite recall, but I think some version of mod_jk might have lost the ability to run in-process. Every other version, including the most current, can, if I recall correctly. This is beside the point, because as far as I know, AJP always has been a trivial performance overhead (I believe recent versions can run over Unix domain sockets). In fact, Apache is routinely used in production as the front-end web server, instead of the built-in servlet engine web server, simply because it is faster at serving static content and the overhead of the AJP protocol is negligible. If the "stub" referred to in the quote is not the AJP module, then this may not be relevant; nevertheless, AJP has always been highly efficient and typically negligible with regard to performance (the same typical connection min/max/idle count configurations apply as do to Apache itself).

The article goes on to proclaim the complexities of caching and data object persistence, which we have eliminated from our comparison. Let's move on to the real bottleneck: the database. The article says "PHP's connectivity to the database consists of either a thin layer on top of the C data access functions, or a database abstraction layer called PEAR::DB. There is nothing to suggest that there is any PHP-specific database access performance penalty." I will respond to this critical issue with an excerpt from one of the above Ace's articles:

Scaling with Larger Workloads Effectively

As you may have guessed by now, the key to our server performance does not lie in the physical hardware itself, but in the software. Specifically, our web application is designed from the ground up to serve even some of our most complex dynamic pages in milliseconds. To accomplish this, you have to isolate the bottlenecks in your software and eliminate them. Running complex SQL queries on every request, for instance, can be a severe limit to a server's capacity. Even simple queries can be a problem if the demand is high enough.

To take this example a little further, it's important to realize that the problem is not just the query, but all the steps required to even be able to run the query. This includes the time required to allocate a database connection, or create one if none are available from a connection pool, the time required to optimize and compile the query (unless it's a stored procedure), and the time required to generate the result and send it back to the application.

Aside from the query time, this added complexity also introduces some potential bottlenecks depending upon the situation. Since it is necessary to allocate a connection to communicate with the database, we have to consider how such connections are allocated in order to avoid possible limits on the number of concurrent connections and make sure the total number of open connections does not consume too much memory. Aside from this, there are also other query-dependent issues, such as table locks. If the application stalls under high load as a result of an exclusive table lock or a database connection limit, there may be serious consequences for performance.

Software Bottlenecks

In the webserver article I mentioned earlier, I described a situation where a server running a database-driven dynamic site on Apache 1.3.x and PHP could literally DoS itself when subjected to high loads (like those from Slashdot). In that situation, the database connections are poorly distributed amongst individual HTTP server processes. To improve performance, these connections can be made to persist between requests, but since HTTP is a stateless protocol and each individual process is unaware of the others, the persistent database connection is permanently associated with that specific HTTP process and can only be used for requests handled by it.

This differs from traditional connection pooling where all open connections are pooled and shared between all threads/processes. In the case of a connection pool, a connection can be allocated from the pool when needed and then later returned so that it may be used by other processes.
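
As a rough illustration of the pooling described above, here is a minimal Python sketch. The `ConnectionPool` class is hypothetical and SQLite is only a stand-in for a real database server; it is not the pooling code of any particular servlet container.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: borrow a connection, use it, give it back."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to the timeout) when every connection is already borrowed.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # returned for other threads to reuse

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(
    size=2,
    connect=lambda: sqlite3.connect(":memory:", check_same_thread=False),
)

with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
    idle_while_borrowed = pool.idle_count()

idle_after_return = pool.idle_count()
```

The pool size bounds the number of open database connections no matter how many request threads exist, which is the contrast with one persistent connection per HTTP process.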

Back to the Apache 1.3.x example, we have a large number of HTTP processes with open database connections that are very likely, at any given moment, not serving requests requiring interaction with the database. Consider the graph shown earlier, comparing requests for article pages with all other requests. If the server is serving a dynamic page with five images in it, then there will be five static requests for each dynamic one.

There's another specific issue with Apache 1.3.x regarding how keepalives are handled. This is a feature designed to improve performance by holding the connection to the client open following a request for a short time in the event that further requests are made. This makes sense, as individual pages often need multiple requests to be downloaded in the case of images and so forth. However, this also means that each HTTP process must wait idle for a certain period of time, perhaps 10 or 15 seconds, before accepting requests from a different client. When a server encounters a large number of concurrent requests from unique clients, this behavior can result in a large number of idle HTTP processes waiting to timeout.

The effect is cumulative. A machine serving dynamic database-driven pages is under high load, so it spawns more processes to handle all the concurrent requests. Since the persistent database connections in use by existing HTTP processes are not universally available, the newly created processes have to open more database connections, even if others are not currently in use. Additionally, since keepalives are enabled and there are a large number of unique clients, a significant portion of the HTTP process pool is idle and cannot serve new clients until they timeout. This means even more HTTP processes have to be created and even more database connections have to be opened. The real question in this situation is what resource will the server run out of first: connections or memory?

Again, let's eliminate the issues of poor Apache scalability, and just assume PHP is behind a modern Apache 2.0 instance with a multithreaded worker-flavor MPM. We are still left with each page bringing up and tearing down its own connections EACH TIME the page is called. But there is a mitigation strategy for this, right? Of course: it's called persistent connections. Let's see what the PHP manual says about persistent connections:

Persistent connections are SQL links that do not close when the execution of your script ends.

...

Persistent connections are good if the overhead to create a link to your SQL server is high. Whether or not this overhead is really high depends on many factors. Like, what kind of database it is, whether or not it sits on the same computer on which your web server sits, how loaded the machine the SQL server sits on is and so forth. The bottom line is that if that connection overhead is high, persistent connections help you considerably. They cause the child process to simply connect only once for its entire lifespan, instead of every time it processes a page that requires connecting to the SQL server. This means that for every child that opened a persistent connection will have its own open persistent connection to the server. For example, if you had 20 different child processes that ran a script that made a persistent connection to your SQL server, you'd have 20 different connections to the SQL server, one from each child.

Note, however, that this can have some drawbacks if you are using a database with connection limits that are exceeded by persistent child connections. If your database has a limit of 16 simultaneous connections, and in the course of a busy server session, 17 child threads attempt to connect, one will not be able to. If there are bugs in your scripts which do not allow the connections to shut down (such as infinite loops), a database with only 32 connections may be rapidly swamped. Check your database documentation for information on handling abandoned or idle connections.
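
The arithmetic in the manual's two examples can be written out as a tiny model. The function name is made up, and the model assumes the simplest case: every child opens exactly one persistent connection and never releases it.

```python
def open_persistent_connections(children, connection_limit):
    """Return (connected, refused) when each child holds one connection for life."""
    connected = min(children, connection_limit)
    refused = children - connected
    return connected, refused


# 20 children and a generous limit: 20 connections, one from each child.
twenty_children = open_persistent_connections(20, 100)

# A limit of 16 with 17 children trying to connect: one will not be able to.
over_limit = open_persistent_connections(17, 16)
```

The number of connections follows the number of child processes rather than the number of requests that actually need the database at a given moment.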

Ok, so now we have a bunch of "persistent" connections that hang around with the process. How long do they hang around? We don't know. What if two threads in the same process want to use a connection? Do they block, or is it up to the driver to determine transactional boundaries? Again we don't know, although it seems like the latter case. Anyway, you have no fine-grained control over it. In the worst case, persistent connections make your problem much, much worse, because now you have many more connections open to your database. In the best of cases, the database driver might actually expose configuration variables that allow you to tweak these options, and your performance might reach that of database connection pooling in Java. If you change databases it is up to you to rewrite your code or hope for equivalent settings for the new driver.

Re:rebuttal (4, Informative)

julesh (229690) | about 10 years ago | (#9599851)

This two-tier/logical-three-tier architecture is the only one PHP supports natively.

I'm not sure what you're on, but you can build however-many-tiers-you-like applications with PHP. In fact, PHP supports a number of technologies specifically designed to communicate with additional tiers, including CORBA, JavaBeans and SOAP.

Let's also discard any caching, as again this is something that Java handily accommodates but is not natively (or at least easily) available in PHP due to lack of state

PHP supports persistent state through shared memory blocks trivially. The implementation of data caching schemes that use this feature is not hard.

17 child threads attempt to connect, one will not be able to. If there are bugs in your scripts which do not allow the connections to shut down (such as infinite loops), a database with only 32 connections may be rapidly swamped

Why would you limit your database to serving fewer connections than you have limited your web server to?

PHP supports an option to kill runaway scripts and reclaim their resources after a time limit has elapsed, which handily prevents the infinite loop problems mentioned.

Ok, so now we have a bunch of "persistent" connections that hang around with the process. How long do they hang around?

Until the database closes them or the PHP server process is killed.

What if two threads in the same process want to use a connection?

The connection is locked from the moment a thread acquires it (using the *_pconnect function) until the script using it terminates.

In the worst case, persistent connections make your problem much much worse, because now you have many more connections open to your database.

What does an inactive open connection to the database cost? Not very much, in my experience.

Your arguments have a little merit, but please try to do your research before ranting about a system.

Re:rebuttal (0)

Anonymous Coward | about 10 years ago | (#9599866)

damn you have too much time on your hands....

Let's find out. (1)

CordMeyer (452485) | about 10 years ago | (#9599771)

What is the largest, or most heavily used, PHP-driven site?

Re:Let's find out. (1)

freezin fat guy (713417) | about 10 years ago | (#9599942)

The busiest website in the world. As was already mentioned, Yahoo! runs on PHP. I would also hazard a guess that their code does not look quite like the kind of quick and dirty scripts most of us are guilty of firing off for cheap low-to-moderate load applications.

Regardless of platform, application design is just about the most critical factor in ensuring high load performance. If you want your PHP driven site to scale well to heavy loads you will wind up using a few concepts more familiar to the J2EE crowd.

Real world examples? (4, Interesting)

javab0y (708376) | about 10 years ago | (#9599778)

I think what could settle this debate is a real-world example. Look at the story on the JBoss Nukes Project [onjava.com] . It explains the CPU utilization and speed of the PHP version and how moving to a J2EE implementation decreased the wait times dramatically.

It's difficult to argue with facts.

I can summarize it all (1, Insightful)

sfjoe (470510) | about 10 years ago | (#9600061)

1. PHP scales well.
2. Java scales well.
3. Friendster couldn't develop a scalable J2EE application, so they switched to PHP.
4. What will Friendster switch to when they can't develop a scalable PHP application?

Re:I can summarize it all (1)

smitty45 (657682) | about 10 years ago | (#9600121)

Seeing how they had pretty smart people writing the J2EE app they started with, I doubt #3. When you have the authors of "Tomcat: The Definitive Guide" and "Mastering Tomcat Development" writing your app, then I would assume at the very least that they are as good as any other Java team. I think people should consider the possibility that Friendster has/had scaling issues that other sites plainly don't have. All of their pages have to be dynamic, and I doubt that more than about 1% of them would be cacheable, besides images. I also think that people should consider the possibility that they don't have any idea why Java didn't work for Friendster, because frankly, they don't. And your app is only going to be as good as the stuff behind it.