Discussion:
[RevServer tips] Spreading the load or why wise developers use asynchronous workflows
Andre Garzia
2010-08-04 15:36:25 UTC
Hello Folks,

Spreading the load or why wise developers use asynchronous workflows

Introduction

Continuing the trend of posting RevServer tips here, today I decided to
talk about how to spread the load of your web application. Over the last
few days we've seen much discussion of the memory and execution-time
limits of RevServer (actually of On-Rev). The fact is that every web host
running shared servers needs some kind of resource policy to prevent
rogue processes from consuming all the RAM and/or CPU and rendering the
whole system unresponsive. Such a policy can be implemented at different
OS levels, and how each web hosting service does it is beyond the scope
of this email.

The On-Rev service has a policy of allowing a process to run for about 30
seconds and to take up to 64MB. While this seems small to the desktop
developers in here, who are used to swallowing big chunks of memory and
CPU (I've seen people try to insert 1GB of data into text fields), these
are actually very sensible values and should accommodate most users. Thirty
seconds is like forever in terms of web serving; a page usually takes
milliseconds to serve. 64MB of RAM is also a generous amount. Here where I
work we run the biggest database I've ever seen. It is spread among different
machines, but the one I am working on now has 83 million records totaling 9GB
of data, and that is our small database; the big one holds more than seven
thousand tables and millions upon millions of records. We're in the business
of being evil, I mean, we're in the business of sending email marketing, and
just one of our machines pushes 26 thousand emails per minute. Right now our
system is built with PHP (it was built before me) and we do all this with a
memory allocation of 120MB (of which we use about 70MB), so the 64MB allowed
by On-Rev looks quite good. Remember, these limits apply only to the On-Rev
service; I am running RevServer on my own VPS and have yet to hit such
limits (see my previous email). Other web hosts have different limits, and
you should not assume that your own hosting company has none.

Many web developers here are just beginning their path to total web
server domination. Most are coming from the safe lands of desktop
application design, where you are free to do basically anything: you have
as much memory as you can swap to disk, and you can display a progress bar
and let a handler run for minutes without a problem. Those developers are
sometimes unprepared for the constraints forced on them by server-side
programming. They are not unprepared because they are lazy or anything,
but because they are not used to the "design patterns" of server-side
programming; you can't think in terms of something you have never seen,
and without that knowledge base the climb up the steep hill of server-side
programming is a hard one. So today's topic is load distribution and
asynchronous workflows. Let us first detail the problem.

Well, this message is too big for the list server, which blocked it, so if
you want to carry on reading:

move to: http://andregarzia.com/async.irev
--
http://www.andregarzia.com All We Do Is Code.
Bob Sneidar
2010-08-04 16:45:03 UTC
Nice work Andre. This extraordinarily long bit of useful information ought to be in an article in the Run-Rev newsletter next month.

Bob
Post by Andre Garzia
Hello Folks,
Spreading the load or why wise developers use asynchronous workflows
Introduction
<snip extraordinarily long bit of useful information that ought to be in an article in the Run-Rev newsletter next month>
wayne durden
2010-08-04 16:49:36 UTC
Hi,

Just want to make sure I have a general understanding of the issues... On
a shared hosting setup with a process time limit such as 30 seconds, would
that mean that some other entity using the same server could latch
essentially all of the processing with an intensive process for up to
30 seconds? Or is there a finer-grained check that still swaps users
in and out below a certain process-priority claim? And if the
first assertion is the case, it wouldn't matter what tech one went with
(Rev, Ruby, PHP, etc.); you could still wait almost 30 seconds before
the server ended your sharer's processing and reached you, correct?

Thanks,

Wayne
Andre Garzia
2010-08-04 16:54:02 UTC
Wayne,

you got it wrong; it is a per-process limitation. The policies change from
one shared hosting company to another. At On-Rev it means that a single
process can only use 30 secs of processing time, precisely to prevent a
rogue process from using all the resources and making life a mess for the
other users. No one can hog the whole server for 30 seconds because there
is a CPU limit as well. It is not just time; the limits are set so that
every user can reach them without affecting the others. That's the idea.

Andre
_______________________________________________
use-revolution mailing list
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
--
http://www.andregarzia.com All We Do Is Code.
wayne durden
2010-08-04 16:59:58 UTC
Thanks Andre, and I am working through your article now as well. I get that
it is per process, but the part that still isn't clear to me is whether the
OS can run my intensive process for 30 seconds before closing it while also
attending to other users simultaneously. I am under the impression there is
still some resource slicing going on; I just don't have a concrete
understanding...

This is all very interesting to me because I am interested in moving a
desktop app that processes datafiles of up to 100,000 lines, which can mean
comparing each line against the remainder (in reality sorting cuts this down
a great deal). This can run for minutes as a desktop app, and I have got to
cut it down into asynchronous processing as per your article...

Thanks!

Wayne
Andre Garzia
2010-08-04 17:02:36 UTC
Glad the article is useful!

The OS will be able to attend to you and others with no problem, but it will
enforce the limitations, meaning that after about 30 secs of work your
process will be shut down. For your intensive task, the best idea is an
asynchronous workflow with some kind of map/reduce or queue processing
governed by the client browser.

Andre
--
http://www.andregarzia.com All We Do Is Code.
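The queue idea Andre points at can be sketched as a handler that does one bounded slice of work per request and checkpoints in between. Python stands in for an irev script here, and the state file name and chunk size are made up for illustration:

```python
import json
import os

STATE_FILE = "job_state.json"  # hypothetical; survives between requests
CHUNK_SIZE = 1000              # rows per request, sized to finish far
                               # inside the 30-second limit

def process(row):
    pass  # stand-in for the real per-row work

def handle_request(rows):
    """Do one chunk of work per HTTP request and report progress.

    The browser keeps re-requesting until it sees "done", so no
    single request lives long enough to hit the process limits.
    """
    # Pick up where the previous request left off.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            start = json.load(f)["next"]
    else:
        start = 0

    chunk = rows[start:start + CHUNK_SIZE]
    for row in chunk:
        process(row)

    start += len(chunk)
    if start >= len(rows):
        if os.path.exists(STATE_FILE):
            os.remove(STATE_FILE)
        return {"status": "done"}

    with open(STATE_FILE, "w") as f:
        json.dump({"next": start}, f)
    return {"status": "working", "done": start, "total": len(rows)}
```

The browser side is then just a polling loop: request, read the progress out of the response, and request again until the status says done.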
wayne durden
2010-08-04 17:17:03 UTC
Thanks Andre, I am coming to that conclusion as well, I believe. Wrestling
with how to do some processing, save the state of where it is, and restart
at the point it left off...

What still doesn't quite make sense to me is why, if the server is already
slicing its resources among users (I get x memory and x amount of the
processing time on the server), there needs to be any per-process time
limitation at all when the OS can already swap resources between users.
This isn't a question I need answered; it's just a matter of wanting to
understand more concretely, "all the way down," exactly how things work. I
accept that it is so, and I suspect that the OS's time-slice parceling
among users isn't nearly as easy if one user has a process continually
running.

Please don't spend any more time responding; I will do some side reading to
satisfy the curiosity until I reach my limit of effort to curiosity. Thanks
again for all you put out on this list! Great pointers for how I have to
rethink my app to turn it into a server service.

Wayne
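Wayne's save-the-state-and-restart idea, applied to his pairwise line comparison, might look like the following. This is Python for illustration; `dedupe_batch`, the checkpoint file name, and the batch size are all hypothetical:

```python
import os
import pickle

CHECKPOINT = "dedupe.ckpt"  # hypothetical state file between runs

def dedupe_batch(lines, batch=500):
    """Compare each line against the remainder, one batch per run.

    Each call resumes from the checkpointed outer index, processes
    `batch` outer rows, saves state, and returns (finished, matches).
    On a shared host you would call this once per request until done.
    """
    try:
        with open(CHECKPOINT, "rb") as f:
            i, matches = pickle.load(f)
    except FileNotFoundError:
        i, matches = 0, []

    stop = min(i + batch, len(lines))
    while i < stop:
        # Compare line i against everything after it.
        for j in range(i + 1, len(lines)):
            if lines[i] == lines[j]:
                matches.append((i, j))
        i += 1

    finished = i >= len(lines)
    if finished:
        if os.path.exists(CHECKPOINT):
            os.remove(CHECKPOINT)
    else:
        with open(CHECKPOINT, "wb") as f:
            pickle.dump((i, matches), f)
    return finished, matches
```

Each run costs at most `batch` outer rows of comparisons, so the batch size directly bounds how long any single request takes.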
Bob Sneidar
2010-08-04 17:22:05 UTC
Okay, so let's say I'm a script kiddie with a bug up my butt about your web server. I decide I am going to take it down. Now I'm smart enough to know that servers are multi-threaded, meaning they can host lots of connections and process threads to manage simultaneous connections. But what I am banking on is that your server does not have any limits on how long a process can stay open.

So what I do is craft an application that continuously opens processes that will take forever. All the well behaved processes from other users will eventually finish, leaving one more process thread for my malicious app to gobble up.

Eventually my malicious app gobbles up ALL the available processes, and bobs-yer-uncle I have your server by the short hairs. Oh but wait! Turns out you were not as dumb as moi hoped you were, and you set up policies on your web server that automatically terminated processes lasting longer than 30 seconds. Well I might be able to bog down your server, but I can't kill it.

Oh but wait! You turned out to be MUCH smarter than I thought, and after your server terminates x number of processes from a particular address, you lock me out of your server! Okay, well now I craft my program to create HUGE processes, as big as I can get them. Oh but wait again! Your server has limits on how big a process can be! Dang! Yer a genius and I am screwed!

Bob
Mark Wieder
2010-08-04 17:26:45 UTC
Bob-

Good use case for why this is necessary.
--
-Mark Wieder
***@ahsoftware.net
wayne durden
2010-08-04 17:37:22 UTC
Great, Bob, got it! It probably would have taken hours of searching before
this light bulb flashed on about the rationale for the limitation.

Thanks, Wayne
Chipp Walters
2010-08-04 21:23:25 UTC
Brilliant example, Bob! Thanks so much for sharing.
Andre Garzia
2010-08-04 21:27:19 UTC
Folks,

Let me tell you that I once did something like that by accident on my own
test server. I had a recursive process that started spawning itself and
would not quit... in the end I had to reboot the damn VPS. That's why those
limits are important.

:-/
--
http://www.andregarzia.com All We Do Is Code.
Mark Wieder
2010-08-05 21:26:32 UTC
Andre-
Awww... I've brought down bigger systems than that. Next time we sit
down over a beer I'll tell you about the time I found out what the
nohup command does...
--
-Mark Wieder
***@ahsoftware.net
Jeff Massung
2010-08-05 21:36:33 UTC
While we're bringing up old "war stories" and measuring lengths...

I have a friend who works for Boeing. Just to give a little physics
background, satellites are not in perfect orbit; they are continuously
falling to the Earth and need course correcting every so often.

This one particular satellite used a program to course correct, and the
thrust units were measured in thousands (2000, 3000, etc.). A new hire who
was working late got the call for a course correction, went to the machine,
typed in the appropriate thrust amount and which rockets, and hit enter. He
was prompted "are you sure?", hit yes, and bye-bye satellite. Unbeknownst
to the new guy, the program implicitly did the multiplication of units for
him (he was supposed to enter 2, 3, ...).

Nothing like firing a few hundred million dollars out into space your first
week on the job, eh? ;-)

Jeff M.
Bob Sneidar
2010-08-05 21:54:59 UTC
Eh heh. Sounds a lot like an urban legend to me. ;-) I'm going to have to look that up. Still, not unbelievable. My first day in the fleet I talked a second class petty officer into troubleshooting a particularly nasty problem with the control console for the "billion" dollar missile radar I was supposed to be trained to work on.

After looking at the signal flow and determining that the problem had to be on a particular card, I said, "We gotta put the card on an extender and check the signal here and here." So he got the scope and I pulled the card, inserted the extender, and plugged the card in.

We powered up the console, but after about 10 seconds I said, "What's burning?" You guessed it. The card I had just plugged into the extender... BACKWARDS! It seems that while the card slots were keyed to prevent this very thing, the extender cards were not. Oops didn't go over well the next day with the work center manager. I didn't get to touch the equipment for the next six months.

Bob
Jeff Massung
2010-08-05 22:05:35 UTC
Post by Bob Sneidar
Eh heh. Sounds a lot like urban legend to me. ;-)
Maybe, I'm just passing along the story as it was told to me (my friend does
work at Boeing, although I have to take his word on the story).

Regardless, it makes me chuckle every time I think about it. Yours does as
well. :-)

Jeff M.
Mark Wieder
2010-08-05 22:11:53 UTC
Bob-

LOL...

...and there's nothing like plugging a *chip* into a socket backwards
and watching it go flying across the room when you turn on the
power...
--
-Mark Wieder
***@ahsoftware.net
Andre Garzia
2010-08-05 22:12:33 UTC
Wow, I wish I could ballistically implant a satellite into my government's
presidential palace...

Not counting the whole earth-orbit thing, I have a similar story. As rookie
engineering students we were "stationed" at the robotics lab at the
university. We built a mechanical arm and hand that was supposed to pick a
cup of coffee off a table without spilling it and put it onto our robotic
waiter (aka a motorized skateboard).

This was in 1998; we used old 486s and 386s as spare parts. We did not have
enclosures for those machines, so they usually sat as naked motherboards on
our tables while we were DOING SCIENCE or something similar.

The arm control software was written in Pascal, and for the servo control we
would pass an integer to a function, and after a quick sum it would move the
little arm. One very wise guy wondered what would happen if he changed that
+ sign to a * sign... coffee ended up on top of the motherboard as the arm
made a complete turn and managed to hit its own video card with the cup...
not a million-dollar loss, but a waste of a 386 running Minix...

I learned that day never to fiddle with magic numbers in software... and I
was not the one doing the arm thing; I was just watching, and exploding
coffeemakers trying to hook them to my serial port.

As they say: "Robotics is the art of combining physics, computer science,
mechanics and large amounts of money into a machine that will gather huge
amounts of information regarding its surroundings and then ignore it and
drive into a wall."
--
http://www.andregarzia.com All We Do Is Code.
Thomas McGrath III
2010-08-05 18:42:14 UTC
Bob,

It's amazing how your scenario turned on the light bulb in my brain about this. Thanks...

And bobs-yer-uncle I got it...

Tom
Michael Kann
2010-08-04 16:56:46 UTC
Andre wrote:

If there's interest in this community I can craft some real RevServer
scripts showing this approach. This is the key to being able to serve
lots of request while doing intensive work and being a good shared
server citizen.

---------------------------------------------------------
I'm sure that everybody who has read your explanation would appreciate some scripts. Thanks again,

Mike

--- On Wed, 8/4/10, Andre Garzia <***@andregarzia.com> wrote:

From: Andre Garzia <***@andregarzia.com>
Subject: [RevServer tips] Spreading the load or why wise developers use asynchronous workflows
To: "How to use Revolution" <use-***@lists.runrev.com>
Date: Wednesday, August 4, 2010, 10:36 AM

Hello Folks,

Spreading the load or why wise developers use asynchronous workflows

Introduction

Continuing the trend of posting tips for RevServer here, today I decided to
talk about how to spread the load of your web application. During the last
days we've seen many talks about memory limitations and execution time
limitations regarding RevServer (actually regarding On-Rev). The fact is
that all web hosts that use shared servers need some kind of virtualization
policy to prevent rogue processes from taking all RAM and/or CPU and thus
rendering the whole system unresponsive. This kind of policy can be
implemented on different OS levels and how each web hosting service does it
is beyond the scope of this email.

On-Rev service has a policy of allowing a process to run for about 30
seconds and to take up to 64MB. While this seems small to all the desktop
developers in here, all of whom are used to swallowing big chunks of memory
and CPU (I've seen people trying to insert 1 GB of data into text fields), these
are actually very sensible values and should accommodate most users. Those 30
secs are like forever in terms of web serving; usually a page takes
milliseconds to be served. 64MB of RAM is also a big sum. Here where I work
we use the biggest database I've ever seen. It is spread among different
machines, but the one I am working on now has 83 million records totaling 9GB of
data, and this is our small database; the big one holds more than 7 thousand
tables and millions and millions of records. We're in the business of being
evil, I mean, we're in the business of sending email marketing, and just one
of our machines pushes 26 thousand emails per minute. Right now our system
is built with PHP (it was built before me) and we do all this stuff with a
memory allocation of 120MB (of which we use about 70MB), so the 64MB allowed by
On-Rev appears quite good. Remember these limits apply only to the On-Rev
service; I am running RevServer on my own VPS and have yet to face such
limits (see my previous email). Other web hosts have different limits, and
one should not assume that one's own hosting company has no limit.

Many web developers here are just beginning their path to total web
server domination; most are coming from the safe lands of desktop
application design, where you are free to do basically anything. You have as
much memory as you can swap pages to disk. You can display a progress bar
and have a handler execute for minutes without a problem. Those developers
are sometimes unprepared to deal with the constraints forced on them by
server-side programming. They are not unprepared because they are lazy,
but because they are not used to the "design patterns" of server-side
programming; you can't think in terms of something you have never
seen, and without some knowledge base the only way up the steep
hill of server-side programming is quite a hard track. So today's topic is
load distribution and asynchronous workflow. Let us first detail the
problem.

Well, this message is too big for the list server (it blocked me), so if you
want to carry on reading,

move to: http://andregarzia.com/async.irev
--
http://www.andregarzia.com All We Do Is Code.
wayne durden
2010-08-04 17:01:05 UTC
Permalink
I second Michael per the scripts Andre!

Wayne
Michael Kann
2010-08-04 17:09:35 UTC
Permalink
Wayne,

Someone on this forum might be able to find room for improvement in your data processing program. You might want to put it out as a challenge to see what others can do with it.

Mike

Wayne wrote:

This is all very interesting to me because I am interested in moving a
desktop app that processes datafiles up to 100,000 lines which can mean for
each line comparing against the remainder (in reality sorts cut this down a
great deal), but this can run for minutes on a desktop app and I have got to
cut it down into asynchronous processing as per your article...
wayne durden
2010-08-04 17:32:19 UTC
Permalink
Hi Mike,

Thanks for that encouragement and I have seen such efforts on this list in
the past. I don't think my situation is an ideal candidate because the
business logic of the processing that happens is quite convoluted and hard
to keep mentally in focus. The app in question takes lines of reported
equity trades, and matches opposite sides prorating as necessary. That part
is all rather straightforward and simple, but there is a particularly nasty
tax rule called the wash sale rule that then requires a lengthy series of
condition checks for other trades to see if it is triggered. It's not
rocket science, but it is tax law with a bunch of weird nested conditions.

I don't doubt that the members of this list could probably cut the compares
substantially, but I think the complexity of the rules needed to understand
is beyond the "interesting puzzle" level. Additionally, for the average
case of less than 4000 lines, trying to optimize the desktop app is not
necessary.

Right now RunRev is a secret weapon which allows me to do this very
effectively as a "touch" service, but where this is headed is in an
institutional setting handling the accounts automatically and that is beyond
optimizing the matching algorithms, to rethinking the breakdown of how the
processing actually needs to be handled.

I do appreciate the suggestion, and I have seen many cases in the past where
the members on this list would jump at the chance to optimize others' code.
This list is truly a fantastic thing!

Wayne
Post by Michael Kann
Wayne,
Someone on this forum might be able to find room for improvement in your
data processing program. You might want to put it out as a challenge to see
what others can do with it.
Mike
This is all very interesting to me because I am interested in moving a
desktop app that processes datafiles up to 100,000 lines which can mean for
each line comparing against the remainder (in reality sorts cut this down a
great deal), but this can run for minutes on a desktop app and I have got to
cut it down into asynchronous processing as per your article...
Phil Davis
2010-08-04 17:51:54 UTC
Permalink
Excellent piece, Andre! Thanks.
--
Phil Davis

PDS Labs
Professional Software Development
http://pdslabs.net
Andre Garzia
2010-08-04 17:56:43 UTC
Permalink
Thanks Phil!

:-D
Post by Phil Davis
Excellent piece, Andre! Thanks.
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
http://www.andregarzia.com All We Do Is Code.
Phil Davis
2010-08-04 18:04:56 UTC
Permalink
You already give back a lot to this community, but along with others I
think it would be great if you could create an example like you
mentioned. There's nothing like the real thing!

Phil
Post by Andre Garzia
Thanks Phil!
:-D
Post by Phil Davis
Excellent piece, Andre! Thanks.
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
Phil Davis

PDS Labs
Professional Software Development
http://pdslabs.net
Andre Garzia
2010-08-04 18:15:28 UTC
Permalink
I will craft something as soon as I have the time :-D

Does anyone have a suggestion for something lengthy and memory intensive for us to
try?
Post by Phil Davis
You already give back a lot to this community, but along with others I
think it would be great if you could create an example like you mentioned.
There's nothing like the real thing!
Phil
Post by Andre Garzia
Thanks Phil!
:-D
Post by Phil Davis
Excellent piece, Andre! Thanks.
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
http://www.andregarzia.com All We Do Is Code.
wayne durden
2010-08-04 19:09:13 UTC
Permalink
Graphic manipulations, perhaps: multiple iterations over the same image
changing pixel values have proven to eat lots of time in my experience with
Rev in the past.
Post by Andre Garzia
I will craft something as soon as I have the time :-D
Does anyone have a suggestion for something lengthy and memory intensive for us to
try?
Post by Phil Davis
You already give back a lot to this community, but along with others I
think it would be great if you could create an example like you
mentioned.
There's nothing like the real thing!
Phil
Post by Andre Garzia
Thanks Phil!
:-D
Post by Phil Davis
Excellent piece, Andre! Thanks.
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
Phil Davis
PDS Labs
Professional Software Development
http://pdslabs.net
--
http://www.andregarzia.com All We Do Is Code.
Scott Rossi
2010-08-04 19:03:43 UTC
Permalink
Post by Andre Garzia
I will craft something as soon as I have the time :-D
Does anyone have a suggestion for something lengthy and memory intensive for us to
try?
Image processing?

Regards,

Scott Rossi
Creative Director
Tactile Media, UX Design
Andre Garzia
2010-08-04 19:32:13 UTC
Permalink
Post by Scott Rossi
Post by Andre Garzia
I will craft something as soon as I have the time :-D
Does anyone have a suggestion for something lengthy and memory intensive for us
to
try?
Image processing?
Scott,

Good idea, we'll try to craft something later.

:D
Post by Scott Rossi
Regards,
Scott Rossi
Creative Director
Tactile Media, UX Design
--
http://www.andregarzia.com All We Do Is Code.
Richard Gaskin
2010-08-04 19:28:44 UTC
Permalink
Post by wayne durden
This is all very interesting to me because I am interested in moving a
desktop app that processes datafiles up to 100,000 lines which can mean for
each line comparing against the remainder (in reality sorts cut this down a
great deal), but this can run for minutes on a desktop app and I have got to
cut it down into asynchronous processing as per your article...
I don't know the specifics of your data or your needs, but lately I've
been experimenting with a variety of different ways to store data, and
I've found that for many tasks using column-based storage over row-based
storage can speed up searches and comparisons by orders of magnitude.

This is where the old acronyms OLAP and OLTP come in, with the "A" being
"analytical" (analytics, data mining; mostly read operations) and "T" being
"transaction" (posting as well as reading). That's an
oversimplification, but spending some time following those links out in
Wikipedia from those can lead to all sorts of different ways to store
and index data for task-specific needs which can radically reduce CPU
and RAM consumption.

For example, if you had a data set in which you had 300,000 address
records stored in eight fields, you could store them in eight files in
which each stores only the values for a given column. Finding addresses
by zip code would then no longer need to traverse the whole data set and
parse each line, but merely pick up the one file for zip codes and
"repeat for each" with those. Any columns you're not interested in for
a given search are left on disk and take up zero RAM.
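Richard's column-per-file idea is language-agnostic; here is a minimal sketch in Python (the field layout, file names, and sample rows are invented for illustration, not from his actual setup):

```python
import os

# Split a row-based table into one file per column, so that a zip-code
# search only ever reads the zip column; all other columns stay on disk.
def write_columns(rows, colnames, outdir):
    os.makedirs(outdir, exist_ok=True)
    for i, name in enumerate(colnames):
        with open(os.path.join(outdir, name + ".col"), "w") as f:
            f.write("\n".join(r[i] for r in rows))

def find_by_zip(outdir, zipcode):
    # the "repeat for each" pass over just the one column
    with open(os.path.join(outdir, "zip.col")) as f:
        return [n for n, z in enumerate(f.read().splitlines()) if z == zipcode]

rows = [("Ann", "90031"), ("Bo", "10001"), ("Cy", "90031")]
write_columns(rows, ["name", "zip"], "cols")
print(find_by_zip("cols", "90031"))  # -> [0, 2]
```

The row indices returned can then be used to pull the matching values out of any other column file on demand.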

Then there are other things one can add in, like cardinal indexing of
column values for one-step searches across data sets of any size.

Quick example using the zip code exercise again: You write an indexer
that runs through the data set and produces a stack file in which each
of the custom property keys of the stack is a zip code, and the value of
each property is a list of the ID numbers of all the records that have
that zip code.

With that index you can now search in one step:

get the uZipCodes["90031"] of stack "ZipIndex.rev"

...and you have an instant list of the ID of every record with that zip
code.
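The same cardinal-index idea outside Rev, with a plain Python dict standing in for the stack's custom property set (sample data invented):

```python
# Cardinal index: one key per distinct zip code, value = list of record IDs.
# Built once over the data set; afterwards any zip lookup is a single step
# regardless of how many records there are.
records = {101: "90031", 102: "10001", 103: "90031", 104: "60614"}

zip_index = {}
for rec_id, zipcode in records.items():
    zip_index.setdefault(zipcode, []).append(rec_id)

# the equivalent of: get the uZipCodes["90031"] of stack "ZipIndex.rev"
print(zip_index.get("90031", []))  # -> [101, 103]
```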

How to get the data once you've found those IDs?

There are an infinite number of ways to store data, but if you used even
just simple tab-delimited files you'd be surprised how quickly you can
get to what you want using the seek command if you write an index first.

Such a master index could also be a simple list of properties in a stack
(by far the most efficient way to load persistent arrays in Rev, much
faster than arrayDecode), in which each element key is the ID number of
the record and each value is just two lines: the byte offset to the
start of the record, and the length of the record.

With that relatively small index you can get any record anywhere in even
a giant file in four lines:

  open file tMyDataFile for read
  seek to tRecordStart in file tMyDataFile
  read from file tMyDataFile for tRecordLength
  close file tMyDataFile

On my slow Mac here I can use that to pull a record out of a 500 MB file
containing 300,000 records in about 50 MICROseconds.
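For readers more at home outside Rev, the same open/seek/read pattern can be sketched in Python, with a toy master index mapping each record ID to its byte offset and length (file name and records invented):

```python
# Build a byte-offset index over a tab-delimited file, then fetch one
# record with a seek instead of reading the whole file into RAM.
def build_index(path):
    index, offset = {}, 0
    with open(path, "rb") as f:
        for line in f:
            rec_id = line.split(b"\t", 1)[0].decode()
            index[rec_id] = (offset, len(line.rstrip(b"\n")))
            offset += len(line)
    return index

def fetch(path, index, rec_id):
    start, length = index[rec_id]
    with open(path, "rb") as f:
        f.seek(start)                   # jump straight to the record...
        return f.read(length).decode()  # ...and read only its bytes

with open("data.tsv", "w") as f:
    f.write("A1\tAnn\t90031\nB2\tBo\t10001\n")
idx = build_index("data.tsv")
print(fetch("data.tsv", idx, "B2"))  # -> "B2\tBo\t10001"
```

Only the small index and the one requested record ever occupy memory, which is the point Richard is making about seek.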

Since an index for a file like that will take only a few MBs it can be
loaded in no time, and the seek command doesn't load the whole data file
into RAM so the only memory consumption for getting the record is just
the record itself + the index + the engine's normal overhead.

Combine this with the cardinal indexing described above and you can slice
and dice data any number of ways really quickly.

Of course this is only suited for OLAP-style tasks, dependent on the
data not changing frequently, so that indexing it is worthwhile and
doesn't add more overhead than it saves. FWIW, on my slow Mac
I can write the master index and two or three columnar cardinal indices
in well under a minute.

For all sorts of tasks in which data is read far more frequently than
written, you can use methods like this to get ultra-fast results with
minimal resource consumption.

If the data on the server is not modified there but merely used as a
data repository for your searches, you could do the indexing tasks on
your desktop and just upload the index stacks to your server along with
a copy of the file. The server load will always be minimal, and you can
do some relatively massive tasks well under even most shared hosting limits.

Of course you could also use MySQL, CouchDB, or any number of other
off-the-shelf solutions for much of this, but for some tasks you may
find you can write an indexer and retriever faster in Rev than you could
dig up the syntax to do it in another language. :)


WARNING: Once you start exploring indexing techniques you may become
addicted; you will find yourself daydreaming about new methods at odd
hours of the day, and time formerly spent with the family will suddenly
become spent on the web learning even better methods. You may find
yourself thinking about ways to use Rev's union and intersect commands
on results from index searches to implement even complex AND and OR
queries in one step. Turning data inside out can cause your mind to
cave in on itself, and worse, you may like it. You have been warned.
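The union/intersect trick mentioned above can be sketched with Python sets standing in for Rev's union and intersect commands (index contents invented):

```python
# Two cardinal indices over the same record IDs: one keyed by zip code,
# one keyed by state. AND/OR queries become single set operations on
# the ID lists the indices already hold.
zip_index   = {"90031": {101, 103, 107}, "10001": {102}}
state_index = {"CA": {101, 103, 104}, "NY": {102, 107}}

# AND: records in zip 90031 AND state CA -> one intersection
both = zip_index["90031"] & state_index["CA"]

# OR: records in zip 90031 OR state NY -> one union
either = zip_index["90031"] | state_index["NY"]

print(sorted(both))    # -> [101, 103]
print(sorted(either))  # -> [101, 102, 103, 107]
```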

--
Richard Gaskin
Fourth World
Rev training and consulting: http://www.fourthworld.com
Webzine for Rev developers: http://www.revjournal.com
revJournal blog: http://revjournal.com/blog.irv
Jan Schenkel
2010-08-04 19:59:51 UTC
Permalink
Excellent article, Andre - perhaps you should expand it with a stern warning for people who want to access their remote database directly, rather than going through a CGI? That's also one of those coming-from-the-desktop practices that needs to be taken care of once and for all, IMO :-)

Jan Schenkel.
=====
Quartam Reports & PDF Library for Revolution
<http://www.quartam.com>

=====
"As we grow older, we grow both wiser and more foolish at the same time." (La Rochefoucauld)
Post by Andre Garzia
Hello Folks,
Spreading the load or why wise developers use asynchronous
workflows
[snip]
move to: http://andregarzia.com/async.irev
Bob Sneidar
2010-08-04 20:10:17 UTC
Permalink
Hi Jan.

Is accessing your database directly from a remote location taboo? I'm developing an app that does exactly that!

Bob
Post by Jan Schenkel
Excellent article, Andre - perhaps you should expand it with a stern warning for people who want to access their remote database directly, rather than going througha cgi? That's also one of those coming-from-the-desktop practices that need to be taken care of once and for all, IMO :-)
Jan Schenkel.
Jeff Massung
2010-08-04 20:20:57 UTC
Permalink
Never, ever, ever do this. ;-)

It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.

For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
process is capable of doing a lot of security checks:

- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...

The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create action
commands that end up performing the correct SQL under-the-hood.

This has *many* advantages:

- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
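Jeff's "action command" layer might look roughly like this in Python with SQLite (action names, table, and columns are invented; a real service would also add the login, rate-limit, and DoS checks he lists):

```python
import sqlite3

# Whitelist of allowed actions mapped to parameterized SQL. Clients never
# send SQL; they send an action name plus arguments, which are validated
# before any statement runs.
ACTIONS = {
    "add_user": ("INSERT INTO users (login) VALUES (?)", 1),
    "del_user": ("DELETE FROM users WHERE login = ?", 1),
}

def handle(db, action, args):
    if action not in ACTIONS:            # ensure validity of actions
        raise ValueError("unknown action")
    sql, argc = ACTIONS[action]
    if len(args) != argc:                # reject malformed requests
        raise ValueError("bad argument count")
    db.execute(sql, args)                # placeholders, never string-built SQL
    db.commit()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (login TEXT)")
handle(db, "add_user", ["bob"])
print(db.execute("SELECT login FROM users").fetchall())  # -> [('bob',)]
```

Because the schema never appears in the protocol, it can change server-side without any client update, which is one of the advantages Jeff lists.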

Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
Andre Garzia
2010-08-04 20:26:18 UTC
Permalink
Jan,

Will write a piece on this shortly, this is a big no no no!

my lib RevSpark was created to serve exactly that situation where you need
to be able to create simple CGIs that do not require complex views and
stuff. I created it specifically to serve as an easy way to build RESTful
services for database interaction.

http://hg.andregarzia.com/revspark

:D
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...
The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create action
commands that end up performing the correct SQL under-the-hood.
- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
--
http://www.andregarzia.com All We Do Is Code.
Pierre Sahores
2010-08-04 21:29:41 UTC
Permalink
Please follow Andre's and Jeff's explanations as closely as possible. It's really important if you don't want to get your DBs and, secondly, your accounts hacked in just a few attempts.

Best, Pierre


RIA or web-served n-tier apps never need to provide public access to the SQL back-end. By setting the SQL servers to allow localhost or LAN access only, via application servers or CGIs, we are sure to get the best from our SQL DB without having to care about unneeded security glitches.
Post by Andre Garzia
Jan,
Will write a piece on this shortly, this is a big no no no!
my lib RevSpark was created to serve exactly that situation where you need
to be able to create simple CGIs that do not require complex views and
stuff. I created it specifically to serve as an easy way to build RESTful
services for database interaction.
http://hg.andregarzia.com/revspark
:D
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...
The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create action
commands that end up performing the correct SQL under-the-hood.
- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
--
http://www.andregarzia.com All We Do Is Code.
--
Pierre Sahores
mobile : (33) 6 03 95 77 70

www.woooooooords.com
www.sahores-conseil.com
Bob Sneidar
2010-08-05 16:52:03 UTC
Permalink
Problem is, I don't want to learn how to do web CGIs yet. I got the On-Rev account for 2 reasons: it was an AWESOME deal, and it had an SQL server I could use for my development wherever I go.

Bob
Post by Pierre Sahores
Please follow Andre's and Jeff's explanations as closely as possible. It's really important if you don't want to get your DBs and, secondly, your accounts hacked in just a few attempts.
Best, Pierre
Jeff Massung
2010-08-05 18:06:04 UTC
Permalink
Post by Bob Sneidar
Problem is, I don't want to learn how to do web CGI's yet. I got the On-Rev
account for 2 reasons: It was an AWESOME deal, and it had an SQL server I
could use for my development wherever I go.
Bob
Bob,

There's nothing to "CGI". The term has gone through many iterations. But,
think of it like this:

When someone makes an HTTP request to your web server (typically through a
browser, but not required), the web server accepts the incoming connection,
looks at the REST command (typically a GET or POST) and then attempts to
fulfill the request. Let's try an example:

GET /index.html HTTP/1.1

That would be the command sent by the socket (with more information, but
that's primarily the important part). Your web server (Apache w/ On-Rev)
looks at the file requested and says, ".HTML files are just sent verbatim
back." So it loads /index.html and sends all the data back over the
connection.

With CGI, all that's different is that there's a level of indirection added
to the process. Let's perform a similar command:

GET /register_user.irev HTTP/1.1

Now, the On-Rev Apache server is configured to understand that .IREV files
don't get sent verbatim back to the client. Instead, they are opened,
parsed, portions of them are executed, and the results are then sent on to
the client. That "executing" part of the story is a form of CGI.

In your register_user.irev script, you can then do something like this
(pseudo-code as I don't remember all of it correctly from memory):

<?rev
put $_GET["username"] into tLogin
put $_GET["password"] into tPasswd
put connectToDatabase(...) into tDB
revExecuteSQL tDB, "INSERT INTO ... VALUES (:1, :2)", "tLogin", "tPasswd"
?>

You've just executed a database action using CGI and a REST API (note: REST
is just a glorified way of saying "via HTTP").

There's a lot to begin thinking about (security-wise*) once you've gotten it
working, but you can use the above to do all sorts of things. And best of
all, you don't need a browser. You can just send commands through Rev if you
want:

get url "http://.../register_user.irev?username=bob&password=luggage12345"

Hope this helps,

Jeff M.

* I -highly- recommend that you take some time and look up DoS attacks on
Wikipedia and follow the links there to all the other kinds of attacks you
should worry about once a database is exposed to the world (DoS is just the
most common). Some key ones:

- Data validation
- Captcha
- IP validation
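For the data-validation item, a tiny Python illustration of rejecting input before any SQL is ever built (the rules are invented for the example):

```python
import re

# Accept only short alphanumeric usernames; anything else is rejected
# before a query is constructed, which blunts injection and junk-data
# attacks at the front door.
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,32}$")

def valid_username(name):
    return bool(USERNAME_RE.match(name))

print(valid_username("bob"))                      # -> True
print(valid_username("x'; DROP TABLE users;--"))  # -> False
```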
Devin Asay
2010-08-04 21:47:44 UTC
Permalink
Jan, Jeff, Andre,

So is it okay to have irev scripts that are on the same server as the DB make the requests? Or are you just saying you should only submit DB queries from localhost? (In MySQL terms, the difference between 'localhost' access and '%' access, for example.)

Of course, when doing DB access from Rev standalone apps, the only way it can be done is if the DB allows non-local access, through some port. If I understand you correctly, you're saying it is a Bad Idea to have an irev or php script query a DB from another server.

Just trying to make sure I understand the context. I'm a desktop guy who is doing more and more with revServer and the web environment, and I'd like to avoid having my server nuked.

Regards,

Devin
Post by Andre Garzia
Jan,
Will write a piece on this shortly, this is a big no no no!
my lib RevSpark was created to serve exactly that situation where you need
to be able to create simple CGIs that do not require complex views and
stuff. I created it specifically to serve as an easy way to built RESTful
services for database interaction.
http://hg.andregarzia.com/revspark
:D
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...
The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create action
commands that end up performing the correct SQL under-the-hood.
- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
--
http://www.andregarzia.com All We Do Is Code.
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
Andre Garzia
2010-08-04 21:53:53 UTC
Permalink
Devin,

Database communications such as SQL queries and logins should never cross
networks. If the database server is running at a given host, then use a cgi
at the same host as middleware to talk to it.

:D
Post by Devin Asay
Jan, Jeff, Andre,
So is it okay to have irev scripts that are on the same server as the DB
make the requests? Or are you just saying you should only submit DB queries
from localhost? (In MySQL terms, the difference between 'localhost' access
and '%' access, for example.)
Of course, when doing DB access from Rev standalone apps, the only way it
can be done is if the DB allows non-local access, through some port. If I
understand you correctly, you're saying it is a Bad Idea to have an irev or
php script query a DB from another server.
Just trying to make sure I understand the context. I'm a desktop guy who is
doing more and more with revServer and the web environment, and I'd like to
avoid having my server nuked.
Regards,
Devin
Post by Andre Garzia
Jan,
Will write a piece on this shortly, this is a big no no no!
my lib RevSpark was created to serve exactly that situation where you
need
to be able to create simple CGIs that do not require complex views and
stuff. I created it specifically to serve as an easy way to build RESTful
services for database interaction.
http://hg.andregarzia.com/revspark
:D
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...
The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create
action
commands that end up performing the correct SQL under-the-hood.
- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
_______________________________________________
use-revolution mailing list
Please visit this url to subscribe, unsubscribe and manage your subscription
http://lists.runrev.com/mailman/listinfo/use-revolution
--
http://www.andregarzia.com All We Do Is Code.
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
Devin Asay
2010-08-04 22:46:39 UTC
Permalink
Thanks for the reply, Andre. While I've been doing simple HTML and web stuff for years, I'm still relatively new to the world of server-side apps and server scripting.
Post by Andre Garzia
Database communications such as SQL queries and logins should never cross
networks. If the database server is running at a given host, then use a cgi
at the same host as middleware to talk to it.
So SQL queries to DB servers, such as you can easily do from Rev stacks, are inherently insecure? I've been doing this for years, so why am I even still alive!? ;-)

Don't get me wrong; I have no reason to doubt your judgment. I'm just surprised I've never heard this before. (Or maybe never paid attention.) The ability to access online DBs is touted as a major feature of the Rev desktop product, and I make heavy use of it.

What is the core issue--that when you send DB queries across network boundaries you're sending clear text? Does that mean if I use encryption or SSL in conjunction with DB calls I'm okay?

Sorry to belabor the question. I just want to make sure I understand so I can limit my exposure to risk. I know how to do DB calls from irev scripts on localhost, so I can easily avoid a potential security hole.

Thanks,

Devin


Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
Jeff Massung
2010-08-04 22:07:52 UTC
Permalink
This is a typical, safe setup:

1. Client web browser clicks a button on a web page.
2. Web server sends a command to a CGI script (running on the server).
3. CGI script makes a connection to the database and runs a SQL function.
4. The SQL function executes a transaction on the database.

#1 is executed at some random, remote, external machine.
#2 is the only socket action that actually takes place.
#3 is run through the localhost loopback device to gain DB access.
#4 is entirely done within the database server code.

Something you can usually do as a quick test is to set your database up so
that it only accepts connections from the localhost. Then run through all
your tests. If any of them fail, that's a point where you are trying to
access the database remotely and need to fix it.

HTH,

Jeff M.
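For anyone who wants a concrete picture of steps 2-4, here is a minimal sketch of such a middleware layer in Python. It combines Jeff's flow with the "action commands" idea from earlier in the thread: the client only ever sends an action name plus arguments, never raw SQL. The action names, schema, and the in-memory SQLite database (standing in for a localhost MySQL connection) are all invented for illustration:

```python
import sqlite3

# Hypothetical whitelist of action commands -> parameterized SQL.
# The remote client never sends SQL, only an action name and arguments.
ACTIONS = {
    "add_user": "INSERT INTO users (name) VALUES (?)",
    "get_user": "SELECT name FROM users WHERE id = ?",
}

def handle_request(conn, action, args):
    """Step 3: the CGI runs a known, pre-written query; unknown
    actions are rejected before they ever touch the database."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    cur = conn.execute(ACTIONS[action], args)  # Step 4: DB transaction
    conn.commit()
    return cur.fetchall()

# In production this connection would go through the localhost
# loopback/socket; an in-memory SQLite database stands in here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
handle_request(conn, "add_user", ("Devin",))
print(handle_request(conn, "get_user", (1,)))
```

The key property is that the only SQL that can ever run is the SQL the server author wrote; the client's input is confined to the placeholder slots.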
Mark Wieder
2010-08-04 22:19:31 UTC
Permalink
Jeff-
Post by Jeff Massung
Something you can usually do as a quick test is to set your database up so
that it only accepts connections from the localhost. Then run through all
your tests. If any of them fail, that's a point where you are trying to
access the database remotely and need to fix it.
If memory serves here, I believe both MySQL and PostgreSQL are locked
down to localhost only by default. You have to go out of your way to
do things in a non-secure manner.
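For MySQL specifically, the setting that controls this is `bind-address` in the server configuration; file location and shipped defaults vary by distribution and version, so treat this as a sketch rather than a universal default:

```ini
# e.g. /etc/mysql/my.cnf (path varies by platform)
[mysqld]
# Listen only on the loopback interface; remote TCP
# connections are refused at the network level.
bind-address = 127.0.0.1
# Or disable TCP networking entirely and allow only
# local socket connections:
# skip-networking
```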
--
-Mark Wieder
***@ahsoftware.net
Devin Asay
2010-08-04 22:53:04 UTC
Permalink
Post by Jeff Massung
1. Client web browser clicks a button on a web page.
2. Web server sends a command to a CGI script (running on the server).
3. CGI script makes a connection to the database and runs a SQL function.
4. The SQL function executes a transaction on the database.
#1 is executed at some random, remote, external machine.
#2 is the only socket action that actually takes place.
#3 is run through the localhost loopback device to gain DB access.
#4 is entirely done within the database server code.
Something you can usually do as a quick test is to set your database up so
that it only accepts connections from the localhost. Then run through all
your tests. If any of them fail, that's a point where you are trying to
access the database remotely and need to fix it.
Thanks, Jeff! This is a really useful outline that I can easily adapt to my work. I assume you'd want to follow this procedure even if you are making the db requests directly from a stack? So send a get or post http request to the CGI script from the stack and then process the returned data in the stack?

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
Jeff Massung
2010-08-04 23:03:39 UTC
Permalink
Post by Devin Asay
Thanks, Jeff! This is a really useful outline that I can easily adapt to my
work. I assume you'd want to follow this procedure even if you are making
the db requests directly from a stack? So send a get or post http request to
the CGI script from the stack and then process the returned data in the
stack?
Exactly. Then you get to begin worrying about network traffic security
(read: SSL). ;-)

Jeff M.
Bob Sneidar
2010-08-05 16:33:58 UTC
Permalink
I suppose on a server that was unrestricted, that would be true. But I set it up to only accept connections from the IPs I want, and I have strong passwords protecting it. Wouldn't that be enough?

Bob
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
Terry Judd
2010-08-04 22:50:38 UTC
Permalink
So, just to be clear, are you guys saying that even in a desktop app it's
not safe to use revDB calls to a networked mySQL server and that all
database calls should be done via PHP (or whatever)?

Regards,

Terry...
Post by Devin Asay
Jan, Jeff, Andre,
So is it okay to have irev scripts that are on the same server as the DB make
the requests? Or are you just saying you should only submit DB queries from
localhost? (In MySQL terms, the difference between 'localhost' access and '%'
access, for example.)
Of course, when doing DB access from Rev standalone apps, the only way it can
be done is if the DB allows non-local access, through some port. If I
understand you correctly, you're saying it is a Bad Idea to have an irev or
php script query a DB from another server.
Just trying to make sure I understand the context. I'm a desktop guy who is
doing more and more with revServer and the web environment, and I'd like to
avoid having my server nuked.
Regards,
Devin
Post by Andre Garzia
Jan,
Will write a piece on this shortly, this is a big no no no!
my lib RevSpark was created to serve exactly that situation where you need
to be able to create simple CGIs that do not require complex views and
stuff. I created it specifically to serve as an easy way to build RESTful
services for database interaction.
http://hg.andregarzia.com/revspark
:D
Post by Jeff Massung
Never, ever, ever do this. ;-)
It's basically asking for someone to hack - and nuke - your database out
from under you. You never want to connect to it remotely, and you never want
to send SQL commands to it directly. Use an intermediate layer.
For example, have a server process that accepts incoming connections and
[indirect] commands that will end up modifying the database. But that
server can also handle:
- Logins + permissions
- DOS attack checks
- Ensure validity of actions
- Much more...
The 3rd one there is probably most important. Instead of having a remote app
send direct SQL commands to a remotely hosted database, you create action
commands that end up performing the correct SQL under-the-hood.
- Clients have no direct access to the database (which may hold the data for
many clients)
- You can change your data schema without a client ever knowing, and no
application updates are required.
- The data storage method is hidden from potential hackers.
- Much more...
Jeff M.
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location taboo? I'm
developing an app that does exactly that!
Bob
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
--
Dr Terry Judd | Senior Lecturer in Medical Education
Medical Education Unit
Melbourne Medical School
The University of Melbourne
Jan Schenkel
2010-08-05 05:30:41 UTC
Permalink
Hi Bob et al,

Jeff and others have already given the most important reason: security. It is one less place for the hackers to try and crack open (I recall a huge problem with MS SQLServer back in 2000 where one of its open ports allowed a virus to spread and bring servers to a grinding halt).

Closely related is encapsulation: if someone can find his way into the database port and run arbitrary queries, that person can not only steal information, but can cripple everything by deleting or maiming data.

Compatibility between versions is also much easier to accomplish: updates to the database schema won't cause headaches because one person is using an older version of the client application. Bonus points for using "go stack url" to fetch the latest version of the client app from the same server.

Performance is another important factor: the closer your business logic is to the data, the faster things can run. Ideally the logic is on the same server, in which case some database drivers use shared memory to increase performance; if it is running on another server close by, you could improve performance even further by connecting the two servers directly.

Oh, and whatever you do, don't just make a cgi that simply executes whatever string comes over the internet as an SQL query. It's already bad enough that we have to deal with SQL injection into forms, so don't make it worse - here's a nice cartoon to explain: <http://xkcd.com/327/>
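To illustrate the point of that cartoon: parameterized queries keep user input from ever being parsed as SQL, because the placeholder treats the input strictly as data. A minimal Python sketch using SQLite (the same placeholder idea applies to MySQL and other drivers; table and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")

name = "Robert'); DROP TABLE students;--"  # little Bobby Tables

# Unsafe: building the query by string concatenation would let
# the input rewrite the statement itself. Never do this:
#   "INSERT INTO students VALUES ('%s')" % name

# Safe: the ? placeholder binds the input as a plain value.
conn.execute("INSERT INTO students VALUES (?)", (name,))

# The table survives, and the hostile string is stored literally.
print(conn.execute("SELECT count(*) FROM students").fetchone()[0])
```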

Now, it is tempting to just scale up your single-user SQLite database access to a MySQL server on your own local network. And if you get away with it, you may even want to use the same method to go from local network to the internet by simply moving the database.

But to handle concurrent data changes correctly, you're going to have to make changes to how you approach your database records (e.g. use optimistic locking with versioning). And at that point you ought to step back and see what else can be improved.
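As a sketch of the optimistic-locking idea: each row carries a version counter, and an update succeeds only if the version is still the one the writer originally read; a stale writer gets rejected and must re-read. (Python with SQLite; the schema and helper are invented for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT, version INTEGER)")
conn.execute("INSERT INTO records VALUES (1, 'old', 1)")

def save(conn, rec_id, new_data, version_read):
    """Update only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE records SET data = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_data, rec_id, version_read))
    conn.commit()
    return cur.rowcount == 1  # False means a concurrent edit won

# The first writer read version 1 and succeeds; a second writer
# still holding version 1 is rejected and must re-read the row.
print(save(conn, 1, "mine", 1))    # True
print(save(conn, 1, "theirs", 1))  # False
```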

Cheers,

Jan Schenkel
=====
Quartam Reports & PDF Library for Revolution
<http://www.quartam.com>

=====
"As we grow older, we grow both wiser and more foolish at the same time." (La Rochefoucauld)
Post by Bob Sneidar
Hi Jan.
Is accessing your database directly from a remote location
taboo? I'm developing an app that does exactly that!
Bob
Post by Jan Schenkel
Excellent article, Andre - perhaps you should expand
it with a stern warning for people who want to access their
remote database directly, rather than going through a cgi?
That's also one of those coming-from-the-desktop practices
that need to be taken care of once and for all, IMO :-)
Jan Schenkel.