Introducing the Factor database library
Saturday, October 18, 2008
Background
One of the first libraries I wrote in Factor was a binding to PostgreSQL
library in May 2005. Factor makes it incredibly easy to bind to C
functions – just add a FUNCTION:
declaration and call the C function.
For example,
( scratchpad ) FUNCTION: int getuid ( ) ;
( scratchpad ) getuid .
0
At about the same time, Chris Double wrote a SQLite binding and a high-level library he called tuple-db. It had some good ideas, such as query-by-example and prepared SQL statements for better security.
Earlier this year, I took both of these libraries and merged them to come up with the the current database library. The supported backends are PostgreSQL and SQLite and there are protocols for implementing other backends with relative ease. Contributions are welcome.
Overview
Briefly, Factor tuples correspond one-to-one to tables in the database
through a mapping defined with the define-persistent
word. Tuples are
then manipulated through insert, update, delete, and select words. Upon
calling the insert word, all of the filled-in slots in a tuple will be
saved to the database. The process is reversible, and by filling in
slots for a tuple and calling select-tuples
, the libary generates a
select statement from the filled-in slots (query-by-example) and returns
tuples in a sequence. More advanced queries and lower level raw SQL
statements are possible as well.
First, we’ll look at how to connect to a database.
Connecting to a database
Every database has its own setup which Factor encapsultes in a tuple.
SQLite just needs a path to a file on disk, but since PostgreSQL has a
server/client model, the setup is more complex. After making your
database tuple, connecting works the same for any database by using the
with-db
word. This word takes a quotation (a block of code) and your
tuple object, then calls the database open routine and your quotation.
After running your quotation, the database is closed, even if your code
throws an exception. Having an interface like this means you can simply
swap out your SQLite database for a networked PostgreSQL one if you
suddenly need more scalability or networked database access.
You should generally make a custom combinator with your project’s connection information. Here are a couple of examples.
SQLite example combinator:
USING: db.sqlite db io.files ;
: with-sqlite-db ( quot -- )
"my-database.db" temp-file <sqlite-db> swap with-db ; inline
PostgreSQL example combinator:
USING: db.postgresql db ;
: with-postgresql-db ( quot -- )
<postgresql-db>
"localhost" >>host
5432 >>port
"user" >>username
"seeecrets?" >>password
"factor-test" >>database
swap with-db ; inline
To make sure your database connection works, you can test it with an empty quotation:
[ ] with-postgresql-db
If that line of code doesn’t throw an error, you can safely assume connecting to the database worked.
Defining persistent tuples
The highest-level database abstraction Factor offers relates tuples directly to database tables. Tuples must map one-to-one to a database table, but foreign keys to other tables are allowed.
Primary keys
Each tuple needs a primary key for indexing by the database. The primary key can be an increasing integer assigned by the database (a +db-assigned-id+), an object assigned by the user (a +user-defined-id+), a random number, or even a compound key consisting of multiple values used together as a key.
Database defined primary keys are automatically set on the tuple after
insertion. While SQLite makes this feature easy to implement with the
sqlite3_last_insert_rowid
library call, PostgreSQL lacks such a
feature and instead, in the backend, the inserts are done through a SQL
function that queries the most recently inserted row.
User-assigned ids can be anything that the programmer knows will be unique for each object in the table. The same is true for compound keys, but only one of the values has to differ in this case for it to be unique.
Sometimes, a randomly generated id is useful, and the database library makes it easy to associate a tuple with its random key. By default, it generates a 64-bit random integer and in the unlikely event that it collides with an existing entry, it tries again up to ten times. The random integer is then set in the primary key slot of the tuple.
Database types
Data should have a type to allow the database to optimize its queries and storage. The supported data types are booleans, varchars, text, integers, big-integers, doubles, reals, timestamps, byte-arrays, URLs, and Factor blobs, which store arbitrary Factor objects. Types can have default values, unique qualifiers, and not-null restrictions. The database framework does all of the marshalling of objects for you transparently.
Creating and dropping tables
There are quite a few options when it comes to creating tables. The most
basic is the create-table
word. The problem is that it throws an
exception when the table exists, which happens often. So one alternative
is ensure-table
– we make sure a table exists and silently ignore
errors. However, if we’re just developing an application and we want to
make sure the table is always the latest version of the code, we might
use recreate-table
, which drops and creates the table without throwing
errors, and of course drops all the data with it. To simply drop a
table, use the drop-table
word.
Inserting tuples
Tuples are inserted one at a time with the insert-tuple
word. SQL
insert commands are generated directly from the Factor object and its
filled-in slots. An example will follow.
Selecting tuples
Select statements are generated from exemplar tuples, or tuples with
certain slots filled in with any value besides f
. Passing an empty
exemplar tuple will select all tuples from the table and return them as
a sequence. A useful feature we support is querying by ranges,
sequences, or intervals, as shown in following demonstration.
Exams demo
About the simplest example of interest is one that matches students names with their grades on a particular exam. Here’s aa exam tuple, its mapping to the database, and a utility word to generate random exam objects.
USING: math.ranges random db.types ;
TUPLE: exam id name score ;
exam "EXAM" {
{ "id" "ID" +db-assigned-id+ }
{ "name" "NAME" TEXT }
{ "score" "SCORE" INTEGER }
} define-persistent
: random-exam ( -- exam )
f
6 [ CHAR: a CHAR: z [a,b] random ] replicate >string
100 [0,b] random
exam boa ;
I’ll use a trick for opening a database to use it interactively without
putting a with-db
wrapper around every call.
"my-database.db" temp-file <sqlite-db> db-open db set
Note that the database handle should be cleaned up with db get dispose
if you don’t want to leak memory.
Let’s create the table and add some random exams to the database:
exam create-table
25 [ random-exam insert-tuple ] times
Now to demonstrate selects. You can select all the exams:
T{ exam } select-tuples .
{
T{ exam { id 1 } { name "qklkzk" } { score 38 } }
T{ exam { id 2 } { name "feeuwv" } { score 38 } }
T{ exam { id 3 } { name "hlzwuu" } { score 51 } }
T{ exam { id 4 } { name "liiptp" } { score 52 } }
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
...
}
Ranges can be used with any datatype that makes sense, like integers or timestamps.
A range of passing exams:
T{ exam } 70 100 [a,b] >>score select-tuples .
{
T{ exam { id 5 } { name "mwzlmv" } { score 74 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 11 } { name "rorbyd" } { score 83 } }
T{ exam { id 16 } { name "ttvkar" } { score 78 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
}
You can search for specific grades by using a sequence:
T{ exam } { 10 20 30 40 50 60 70 80 90 100 } >>score select-tuples .
{
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
T{ exam { id 20 } { name "lnmxfq" } { score 60 } }
}
An interval query. Nobody should have above a 100:
USE: math.intervals T{ exam } 100 1/0. (a,b) >>score select-tuples .
{ }
Let’s order by the grade:
<query> T{ exam } >>tuple "score" >>order select-tuples .
...
T{ exam { id 21 } { name "izoiqi" } { score 83 } }
T{ exam { id 6 } { name "ftxhuw" } { score 90 } }
T{ exam { id 17 } { name "nnzkvs" } { score 99 } }
}
Or by (random-generated) name:
<query> T{ exam } >>tuple "name" >>order select-tuples .
{
T{ exam { id 13 } { name "aagnof" } { score 61 } }
T{ exam { id 36 } { name "ailftz" } { score 41 } }
T{ exam { id 9 } { name "bdoztd" } { score 30 } }
...
Updates and deletes are also by example and may use sequences and
intervals in the ‘where’ clause in the same way as the selects with the
update-tuples
and delete-tuples
words.
Raw SQL
You can drop down into SQL if you want to do anything complex or things that are not supported by the library. A drawback is that the SQL you write may not work across SQL databases. Notice that, for a select, you have to convert the data to Factor tuples yourself since the data is returned as an array of strings.
"select * from exam where name like '%a%'" sql-query .
{
{ "8" "rxoyga" "53" }
{ "10" "fchhha" "1" }
{ "13" "aagnof" "61" }
{ "16" "ttvkar" "78" }
{ "22" "yhqkav" "28" }
}
Raw SQL that does not return results is possible too:
"delete from exam where name like '%a%'" sql-command
"select * from exam where name like '%a%'" sql-query length .
0
Another tutorial
Here is a tutorial from the database documentation that shows moref basic usage of the library. If you run this from within the Factor environment itself, you can click on the code blocks and run them one by one without copy/pasting or typing them in yourself.
Real-world usage
The Factor Wiki uses the database library, as does Factor’s web framework, Furnace. Expect to see more useful websites and applications using it in the future.
Lastly, if anyone can suggest a catchy name for the database library, please let me know.