Discussion:
relative performance of two-dimensional arrays and an in-memory sqlite database
Dr. Hawkins
2013-07-25 15:14:26 UTC
Permalink
Searching through the archives, I found what would have been my next
question. The answer being that using ":memory:" as the filename
opens an sqlite database in memory rather than disk.
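
A minimal sketch of what that looks like through the revDB library
(the variable name is just illustrative):

   -- open an SQLite database that lives only in RAM
   put revOpenDatabase("sqlite", ":memory:", , , ) into tConnID
   if tConnID is not an integer then
      answer error "Could not open in-memory database:" && tConnID
   end if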

That said, has anyone ever looked into the relative performance of
caching data into two-dimensional arrays, and into in-memory sqlite
when it comes time to search?

If I want to do something, say, for all the rows in which "squidget >
5", I could loop through ary[row][squidget] and check every one, or I
could "SELECT FROM table WHERE squidget>5 ORDER BY sqName".

It would seem to make sense that sqlite would be better optimized for
such things than a loop. (in fact, much of my data manipulation would
be easier in SQL with WHERE than in memory).

And, oh happy day, I actually wrote my code so that my data isn't
manipulated directly, but only through setVal() and getVal(), so that I
can easily rip out and replace . . .
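
Purely as a sketch, a pair of accessors like that might switch back
ends along these lines (gUseSQLite, gCache, gConnID and the table name
are imagined here, not the real internals):

   -- accessors that hide whether the cache is an array or SQLite
   on setVal pRow, pCol, pValue
      global gUseSQLite, gCache, gConnID
      if gUseSQLite then
         revExecuteSQL gConnID, \
               "UPDATE myTable SET " & pCol & " = :1 WHERE rowid = " & pRow, \
               "pValue"
      else
         put pValue into gCache[pRow][pCol]
      end if
   end setVal

   function getVal pRow, pCol
      global gUseSQLite, gCache, gConnID
      if gUseSQLite then
         return revDataFromQuery(, , gConnID, \
               "SELECT " & pCol & " FROM myTable WHERE rowid = " & pRow)
      else
         return gCache[pRow][pCol]
      end if
   end getVal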
--
Dr. Richard E. Hawkins, Esq.
(702) 508-8462
Richard Gaskin
2013-07-25 15:32:35 UTC
Permalink
Post by Dr. Hawkins
Searching through the archives, I found what would have been my next
question. The answer being that using ":memory:" as the filename
opens an sqlite database in memory rather than disk.
That said, has anyone ever looked into the relative performance of
caching data into two-dimensional arrays, and into in-memory sqlite
when it comes time to search?
If I want to do something, say, for all the rows in which "squidget >
5", I could loop through ary[row][squidget] and check every one, or I
could "SELECT FROM table WHERE squidget>5 ORDER BY sqName".
It would seem to make sense that sqlite would be better optimized for
such things than a loop. (in fact, much of my data manipulation would
be easier in SQL with WHERE than in memory).
I've benchmarked various LC-based disk stores (simple indexed files)
against SQLite on disk, but not in memory.

Benchmark fetishist that I am, I would of course be interested in any
definitive findings related to this.

My hunch is that you'll find a lot of "depends", in which certain types
of searches in certain sizes of columns across certain ranges of record
numbers are faster in SQLite, others faster with LC arrays, and some
faster with simple chunk expressions.

I'm frequently amazed at how well "repeat for each..." with collected
results collated with "put...after" performs relative to alternatives.
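
For anyone who hasn't leaned on that idiom, it's essentially the
pattern below, here filtering tab-delimited text (the data and the
column position are made up):

   -- collect matching lines from tab-delimited text
   set the itemDelimiter to tab
   repeat for each line tLine in tRecords
      if item 3 of tLine > 5 then
         put tLine & return after tResults
      end if
   end repeat
   if tResults is not empty then delete the last char of tResults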

When we consider the various hash table jumps and offset moves that
SQLite and arrays need to make to access data, this is perhaps less
surprising.

For single-access calls, arrays are hard to beat. But for aggregate
operations across an entire data set, like a search, "repeat for
each..." performs quite admirably.

That said, SQLite offers indexing options that can greatly reduce the
search space, so for columns with high cardinality I'd be surprised if
any scripted solution could beat it.
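
That is, something along the lines of the statement below, with the
table and column names made up:

   -- index a frequently searched column so SQLite can avoid a full scan
   revExecuteSQL gConnID, \
         "CREATE INDEX IF NOT EXISTS idx_squidget ON myTable (squidget)"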

But if you need free text searches, I'd wager chunk expressions would
perform roughly on par with SQLite, possibly besting it, in at least
some cases (where lines aren't particularly long and don't have more
than say a couple dozen columns).

--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
Follow me on Twitter: http://twitter.com/FourthWorldSys
Mike Kerner
2013-07-25 16:52:15 UTC
Permalink
I would pick the method that's the easiest to implement and try it, first.
I would rather just access the DB than try to load everything into
containers, so I indexed the relevant fields and have found that even
for tens of thousands of records, a compound LIKE (with wildcards)
query across multiple fields is giggle-fast. If you're on mobile,
everything is effectively in memory anyway, since the storage is
solid-state.
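
By way of example, the kind of query I mean looks roughly like this
(the table, column and variable names are placeholders):

   -- wildcard search across several columns in one query
   put "%" & tSearchText & "%" into tPattern
   put revDataFromQuery(tab, return, gConnID, \
         "SELECT * FROM contacts WHERE lastName LIKE :1" & \
         " OR firstName LIKE :2 OR city LIKE :3", \
         "tPattern", "tPattern", "tPattern") into tMatches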
Post by Richard Gaskin
Post by Dr. Hawkins
Searching through the archives, I found what would have been my next
question. The answer being that using ":memory:" as the filename
opens an sqlite database in memory rather than disk.
That said, has anyone ever looked into the relative performance of
caching data into two-dimensional arrays, and into in-memory sqlite
when it comes time to search?
If I want to do something, say, for all the rows in which "squidget >
5", I could loop through ary[row][squidget] and check every one, or I
could "SELECT FROM table WHERE squidget>5 ORDER BY sqName".
It would seem to make sense that sqlite would be better optimized for
such things than a loop. (in fact, much of my data manipulation would
be easier in SQL with WHERE than in memory).
I've benchmarked various LC-based disk stores (simple indexed files) against
SQLite on disk, but not in memory.
Benchmark fetishist that I am, I would of course be interested in any
definitive findings related to this.
My hunch is that you'll find a lot of "depends", in which certain types of
searches in certain sizes of columns across certain ranges of record
numbers are faster in SQLite, others faster with LC arrays, and some faster
with simple chunk expressions.
I'm frequently amazed at how well "repeat for each..." with collected
results collated with "put...after" performs relative to alternatives.
When we consider the various hash table jumps and offset moves that SQLite
and arrays need to make to access data, this is perhaps less surprising.
For single-access calls, arrays are hard to beat. But for aggregate
operations across an entire data set, like a search, "repeat for each..."
performs quite admirably.
That said, SQLite offers indexing options that can greatly reduce the
search space, so for columns with high cardinality I'd be surprised if any
scripted solution could beat it.
But if you need free text searches, I'd wager chunk expressions would
perform roughly on par with SQLite, possibly besting it, in at least some
cases (where lines aren't particularly long and don't have more than say a
couple dozen columns).
--
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
and did a little diving.
And God said, "This is good."
Dr. Hawkins
2013-07-25 17:43:39 UTC
Permalink
Post by Mike Kerner
I would pick the method that's the easiest to implement and try it, first.
I initially did that--then changed to the array, as remote db access
turned out to have expensive latency.

So I have the remote DB, and also need a local cache of that to
operate on (the dataset will always be small enough, unless someone
tries GM's next bankruptcy on this . . . )
Post by Mike Kerner
I would rather just access the DB than try to load everything into
containers, so I indexed the relevant fields and have found that even
for tens of thousands of records, a compound LIKE (with wildcards)
query across multiple fields is giggle-fast.
Thanks. The DB would be easier in so many ways, and my gut says it
will be faster if I go back to it, but . . .
--
Dr. Richard E. Hawkins, Esq.
(702) 508-8462
Mike Kerner
2013-07-25 18:43:31 UTC
Permalink
Then what you might want to do is either use ODBC or your own method to
mirror the remote DB locally. That way the rest of your code stays the
same, and if the remote latency issue is resolved, you're grinning.
Post by Dr. Hawkins
Post by Mike Kerner
I would pick the method that's the easiest to implement and try it,
first.
I initially did that--then changed to the array, as remote db access
turned out to have expensive latency.
So I have the remote DB, and also need a local cache of that to
operate on (the dataset will always be small enough, unless someone
tries GM's next bankruptcy on this . . . )
Post by Mike Kerner
I would rather just access the DB than try to load everything into
containers, so I indexed the relevant fields and have found that even
for tens of thousands of records, a compound LIKE (with wildcards)
query across multiple fields is giggle-fast.
Thanks. The DB would be easier in so many ways, and my gut says it
will be faster if I go back to it, but . . .
--
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
and did a little diving.
And God said, "This is good."
Dr. Hawkins
2013-07-25 23:10:21 UTC
Permalink
Post by Mike Kerner
Then what you might want to do is either use ODBC or your own method to
mirror the remote DB locally. That way the rest of your code stays the
same, and if the remote latency issue is resolved, you're grinning.
I'm actually developing my own, with queries to update to/from the
remote to keep them in sync.

They're small enough to load in from the remote when opened (right
now, into the array).
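
If the local cache were the in-memory SQLite copy rather than the
array, the load-on-open step might look roughly like this (the column
names and connection IDs are made up):

   -- pull the (small) remote table down and mirror it locally
   put revDataFromQuery(tab, return, gRemoteID, \
         "SELECT id, squidget, sqName FROM myTable") into tRows
   set the itemDelimiter to tab
   repeat for each line tRow in tRows
      put item 1 of tRow into tID
      put item 2 of tRow into tSquidget
      put item 3 of tRow into tSqName
      revExecuteSQL gLocalID, \
            "INSERT INTO myTable (id, squidget, sqName) VALUES (:1, :2, :3)", \
            "tID", "tSquidget", "tSqName"
   end repeat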
--
Dr. Richard E. Hawkins, Esq.
(702) 508-8462