Using Transactions In MySQL (part 2)

Learn to avoid data corruption with MySQL's various isolation levels, and see examples of transaction usage at the application level.

Into The Fire

In the first part of this article, I introduced you to transactions with MySQL 4.0, explaining the theory behind the transactional model and showing you how to implement a transactional environment with MySQL's InnoDB tables. I also showed you how MySQL handles COMMITs and ROLLBACKs in different situations, and explained how to control automatic commits with the AUTOCOMMIT variable.

Thus far, I've been explaining transactions under the assumption that the database is being accessed by a single user at a time. This was done for purposes of simplicity in explanation; however, in the real world, such an assumption is unlikely to hold true. Therefore, in this concluding segment, I'll be exploring the MySQL transactional model in a multi-user scenario, illustrating some of the data corruption problems that raise their ugly heads in this environment and showing you how you can use MySQL's isolation levels to reduce their likelihood.

I'll also show you how to leverage off the transaction model to build more robust SQL applications, with a sample Perl/DBI application, and - in case you're still stuck using an older version of MySQL - illustrate how you can simulate transactions with non-transactional MyISAM tables. Keep reading, this is gonna be one wild ride!

Isolating Yourself

Let's begin with a simple example. While a transaction is in progress, open up a new client session and see if the changes made by the transaction are visible. Consider the following example, which illustrates by using two clients, A and B. A begins a transaction; while it is in progress, B attempts to view the changes being made by it before it has been committed.

clientA> START TRANSACTION;
Query OK, 0 rows affected (0.37 sec)

clientA> INSERT INTO USERS (name, pass) VALUES ('paul', PASSWORD('paul'));
Query OK, 1 row affected (0.38 sec)

clientA> SELECT * FROM users;
+----+------+------------------+
| id | name | pass             |
+----+------+------------------+
|  1 | paul | 29bb48b07ee5526b |
+----+------+------------------+
1 row in set (0.10 sec)

clientB> SELECT * FROM users;
Empty set (0.08 sec)

What you've just seen is an example of the isolation property in action. As noted in the first section of this article, isolation implies that the changes made by a transaction become visible only after the transaction has been committed. Thus, as the example above demonstrates, client B cannot see the transaction being executed by client A while it is in progress. However, once client A commits the transaction, client B can view the results.

clientA> COMMIT;
Query OK, 0 rows affected (0.00 sec)

clientB> SELECT * FROM users;
+----+------+------------------+
| id | name | pass             |
+----+------+------------------+
|  1 | paul | 29bb48b07ee5526b |
+----+------+------------------+
1 row in set (0.00 sec)

Isolation between different transactions is of tremendous importance in a multi-user environment. If the changes made by a transaction are visible while still in progress, other transactions may mistakenly use this data for calculations, or as the basis for future operations. If the transaction making the changes then rolls them back, calculations made on the basis of the old data will be invalid, and much confusion will reign. It is, therefore, extremely important that transactions be insulated from each other while still in progress.

There is, however, a flip side to this. The level of insulation between transactions is inversely correlated to their performance. The higher the isolation level, the more work MySQL has to do to keep users from seeing each other's modifications and the slower things run. For simple transactions that expose a low risk of data corruption, a lower isolation can speed things up without any significant increase in risk; on the other hand, a high isolation level is necessary for mission-critical applications (like banking) even at the cost of some degradation in performance.

The Three Rs

In order to understand the various isolation levels, it is necessary to first understand some of the problems that a low isolation level can cause. The ANSI specification defines three potential problems:

Dirty reads: A dirty read is said to occur if a transaction can view the changes made by another transaction before they have been committed.

Unrepeatable reads: An unrepeatable read is said to occur if a transaction sees different result sets from the same query while it is in progress. Typically, this occurs due to table UPDATEs committed by other transactions.

Phantom reads: A phantom read is a variant of an unrepeatable read. It occurs when a transaction sees a new row while it is in progress. Typically, this occurs due to new records INSERTed and committed by other transactions.

I'm not going to get into the details of each of these problems at this stage, you should take a look at the links at the end of this article for more information. Suffice to say that they do exist, and in large RDBMS, steps need to be taken to prevent them from corrupting the system with misleading data while simultaneously not causing too great an impact on performance.

In order to help database administrators achieve this optimal balance between transaction isolation and security, four different isolation levels are available, ranging from very secure to insecure; these four levels are defined in terms of the three problems described above.

READ UNCOMMITTED: The least secure isolation level, this permits dirty, unrepeatable and phantom reads.

READ COMMITTED: Offering more security than the READ UNCOMMITTED level, this level does not allow a transaction to see the uncommitted data of other transactions; however, it still allows unrepeatable and phantom reads.

REPEATABLE READ: The default isolation level in MySQL, this level eliminates dirty reads and unrepeatable reads. Phantom reads are still theoretically possible, but in reality are almost impossible to reproduce.

SERIALIZABLE: SERIALIZABLE eliminates phantom reads as well to offer the most secure isolation between transactions; it's also the slowest.

MySQL makes it possible to control the isolation level via the special TRANSACTION ISOLATION LEVEL variable. Consider the following examples, which illustrate:

mysql> SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Query OK, 0 rows affected (0.51 sec)

mysql> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Query OK, 0 rows affected (0.51 sec)

You can obtain the current value of this variable at any time with a fast SELECT, as below:

mysql> SELECT @@tx_isolation;
+------------------+
| @@tx_isolation   |
+------------------+
| READ-UNCOMMITTED |
+------------------+
1 row in set (0.13 sec)

Peeping Tom

In order to illustrate, let's consider a simple example. Let's assume we have two separate transactions, started by two different clients, A and B. Client A is adding a new account to a table containing bank account balances, while client B is executing a query to obtain the sum of all account balances. At a high level of isolation, the record added by client A will not appear in the sum calculated by client B; at a lower level, it will.

Assuming an isolation level of REPEATABLE READ (the default), here's what things might look like:

As you can see in this example, even though client A's transaction has completed, client B cannot see the changes made by that transaction until its own transaction is complete. There is thus a high isolation level between transactions in this example, which ensures that clients do not read incorrect data.

Now, let's repeat the transaction with a lower isolation level - say, READ UNCOMMITTED.

The difference should now be glaringly obvious - client B can now see the changes made by client A even before client A commits its transaction. This is a classic dirty read, and it can seriously affect the accuracy of calculations due to the low isolation level between transactions.

The choice of isolation level is thus something that every developer working with transactions must give sufficient thought to, and an optimum level should be chosen for the specific application being developed.

Locks And Keys

Thus far, I've been working with the InnoDB and BDB table types, which natively support transactions. However, these two table types are relatively new to MySQL; in many situations, users are still limited to the older MyISAM table type, which does not support transactions and commits table changes immediately, with no mechanism for rollback in case of error. Implementing a transactional environment with such tables is, therefore, a challenging problem, and one which can only be solved - and that too partially - through the use of table locks.

In order to understand this, a little background is necessary. You've already seen that MySQL does not allow a session to view the changes made by other in-progress transactions to avoid data corruption and faulty calculations. MySQL accomplishes this by locking the rows being changed during a transaction. MySQL supports a number of different table types, and each one uses a different locking mechanism.

Table locks: Table locks apply to a table as a whole, and are used by MyISAM tables. While a table is locked by a client, other clients will not be able to write from it and (depending on the nature of the lock) may not even be able to read it.

Page locks: Page locks apply to a block of records in a table, and are used by BDB tables. When a client sets a page lock on a set of rows, other clients will not be able to access the locked rows, though they can still access other, unlocked rows within the same table.

Row locks: Row locks are set on a per-row basis, and are used by InnoDB tables. Though these locks are the most flexible, allowing maximum access to the rows in a table by multiple clients, they also require the maximum amount of system resources.

In BDB and InnoDB tables, the relatively more-precise locking mechanisms make it possible for multiple sessions to access the same tables without too many conflicts. Since MyISAM tables only support table locks, simulating a transactional environment with these tables usually implies the use of table locks to block more than one session from making changes to the table at a time.

Let's look at a quick example.

Nothing Like The Real Thing

MyISAM tables support two types of table locks: read locks and write locks. A read lock means that the table is available to all clients only for reads; a write lock means that the table is only available to the client creating the lock for reads and writes, while all other clients will be denied access.

Table locks are initiated through the use of the LOCK TABLES command, which may be followed by one or more table names and the type of lock needed. This is illustrated below:

mysql> LOCK TABLES users READ, groups WRITE;
Query OK, 0 rows affected (0.08 sec)

Tables are unlocked with the UNLOCK TABLES command.

mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.01 sec)

The UNLOCK TABLES command does not need to be given a list of tables to unlock, it automatically unlocks all tables locked with the previous LOCK TABLES command.

Let's look at a quick example of how you can use these two commands to simulate a transaction with MySQL:

mysql> LOCK TABLES users WRITE, groups WRITE, mailboxes WRITE;
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO users (name, pass) VALUES ('paul', PASSWORD('hitme'));
Query OK, 1 row affected (0.06 sec)

mysql> INSERT INTO groups (uid, grp) VALUES (LAST_INSERT_ID(), 'operations');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO mailboxes (uid, host, mboxname, mboxpass) VALUES (LAST_INSERT_ID(), 'some.host.com', 'pop3', 'hitmeagain');
Query OK, 1 row affected (0.01 sec)

mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

In this example, the LOCK TABLES command is used to lock all the tables needed by the "transaction" to successfully complete. Once the tables have been locked, SQL can be used to make changes to the tables without any fear that other sessions will interfere with the transaction. On successful completion, the tables are unlocked and other sessions can then view the changed data.

You'll notice that this type of simulated transaction does not include any recovery mechanism in case things go wrong - there is no way, for example, to ROLLBACK the changes made, and the durability of the transaction is also not as high as with InnoDB/BDB tables.

Holding Pattern

Obviously, this means that other users are blocked from the table while the transaction is in progress and the lock is active. This is clearly illustrated in the following snippet, which shows client B waiting until client A finishes with its tables and releases its locks.

This is the primary downside of using table locks to simulate a transactional environment with MyISAM tables - with long transactions, it can result in a significant speed reduction as different sessions "wait" for the session initiating the transaction to release its locks on various tables. Therefore, it's a good idea to always make sure that your transactions are small in size and execute quickly - or else you're gonna have one very annoyed DBA after you!

This type of table locking is typically used in legacy MySQL environments where the new transactional formats are not supported; new users who need transactions will typically head straight for the InnoDB format, which eliminates both the problems above.

Timberrrrrrrrrr!

Let's move on to durability, another of the ACID properties. As noted previously, durability implies that once a transaction has been completed, the changes it has made must continue to exist even in the event of a system failure. MySQL does this through the use of a "binary log", a log file that tracks the changes made to its tables and can be used to revert the system to an earlier state.

In order to activate MySQL binary logging, the MySQL daemon must be started with the special "--log-bin" parameter.

$ /usr/bin/safe_mysqld --log-bin &
Starting mysqld daemon with databases from /var/lib/mysql

Once binary logging is turned on, MySQL will create a log file (usually named with the machine's host name) in its data directory. Every change made to the various tables will be logged in this file, together with a timestamp. Here's an example of what the file might look like:

$ cat /var/lib/mysql/localhost-bin.001
þbinBl`?E3.23.54-logBl`?vl`?@testcreate table a (a int(10) not null)­®a?/create database masterË®a?xmastercreate table users (uid int(4), uname varchar (200) not null, pass varchar(200) not null)Ù®a?Smasterinsert into users values (1, 'joe'

Doesn't make much sense, huh? Keep reading...

In the event of a system crash, the binary log can be used to revert the system to the state it was in prior to the crash. This is accomplished by means of the "mysqlbinlog" utility that ships with MySQL, which is used to read the binary log and display its contents in a more-readable format. Take a look:

$ mysqlbinlog /var/lib/mysql/localhost-bin.001
# at 4
#030911 18:06:18 server id  1   Start: binlog v 1, server v 3.23.54-log created 030911 18:06:18
# at 73
#030911 18:07:10 server id  1   Query   thread_id=2     exec_time=0     error_code=0
use test;
SET TIMESTAMP=1063283830;
create table a (a int(10) not null);
# at 137
#030912 17:01:57 server id  1   Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1063366317;
create database master;
# at 184
#030912 17:02:27 server id  1   Query   thread_id=3     exec_time=0     error_code=0
use master;
SET TIMESTAMP=1063366347;
create table users (uid int(4), uname varchar (200) not null, pass varchar(200) not null);
# at 304
#030912 17:02:41 server id  1   Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1063366361;
insert into users values (1, 'joe', password('joe'));
# at 387
#030912 17:03:35 server id  1   Stop

This output, which is in the form of regular SQL commands, can be piped to the "mysql" command-line client to reproduce the exact state the system was in before it went down, as below:

$ mysqlbinlog /var/lib/mysql/localhost-bin.001 | mysql

Perl Of Wisdom

You should now know a little bit about how transactions work at the SQL command prompt. This is great for learning purposes; however, in the real world, it's unlikely that your application will be interacting with MySQL through a command prompt. With that in mind, let's look at a simple application, written in Perl, that implements a transaction using Perl DBI methods.

#!/usr/bin/perl

# load module
use DBI();

# variables for the transaction
# assume these have come from standard input
$name = "ian";
$pass = "storm";
@groups = ("hr", "admin");
$mhost = "my.pop.server";
$muser = "my.name";
$mpass = "my.pass";

# connect
my $dbh = DBI->connect("DBI:mysql:database=master;host=192.168.0.241", "root", "secret", {RaiseError => 1, AutoCommit => 0});

# place the transaction in an eval{} block
# so that errors can be trapped for rollback
eval
{
        # insert user record
        $dbh->do("INSERT INTO users (name, pass) VALUES ('$name', '$pass')");

        # get ID
        $id = $dbh->{'mysql_insertid'};

        # insert group memberships
        foreach $g (@groups)
        {
                $dbh->do("INSERT INTO groups (uid, grp) VALUES ('$id', '$g')");
        }

        # insert mailbox data
        $dbh->do("INSERT INTO mailboxes (uid, host, mboxname, mboxpass) VALUES ('$id', '$mhost', '$muser', '$mpass')");

        # got this far means no errors
        # commit
        $dbh->commit();
};

# if any errors
# rollback
if ($@)
{
        print "Transaction aborted: $@";
        $dbh->rollback();
}

# clean up
$dbh->disconnect();

The transaction in question here is the same as before - adding a user to the system - only the method differs. The first step is to connect to the database using the connect() method. Note the addition of the AutoCommit parameter to connect() - as explained in the previous segment of this tutorial, this tells MySQL to turn off automatic commits, and turns every query into a transaction that must be explicitly committed in order for it to be saved.

Once a connection has been opened, standard INSERT statements are used to insert the new user's data into the system. These statements are enclosed in an eval{} exception handling block, so that errors, if any, are trapped and escalated upwards to the main program. An error in the eval{} block will be stored in the special $@ variable, and caught by the following "if" block, which rolls the entire transaction back with rollback(). If there are no errors, the transaction will be committed with commit().

End Work

And that's about all we have time for today. Over the last few pages, I took you deeper into the world of MySQL transactions, explaining how the transaction isolation level can affect the integrity of your transactions in a multi-user environment. I showed you how to control the isolation level, and also demonstrated the impact it has with a simple (and very likely) example.

Next, I explained how MySQL implements isolation through the use of locks, and demonstrated how to use this knowledge to implement a transaction with non-transactional tables like the MyISAM format. I explained the difference between read and write locks, showed you how a simulated transaction works, and gave you a quick rundown on the drawbacks of this approach: long wait times, no rollback mechanism and no true durability.

Finally, after a brief detour into the MySQL binary log, I wrapped things up with a sample application that demonstrated how transactions can be used at the application level - a program to perform a transaction, detect errors if any and either roll it back or commit it to the system. This program was written in Perl; however, it's fairly easy to write equivalent code in PHP, Python or any other language.

There's a lot more to MySQL transactions than what you've just learnt - but this will suffice to get you going. In case you'd like to learn more, I'd recommend the following links:

MySQL transaction commands, at http://www.mysql.com/doc/en/Transactional_Commands.html

MySQL table types, at http://www.mysql.com/doc/en/Table_types.html

The InnoDB transaction model, at http://www.mysql.com/doc/en/InnoDB_transaction_model.html

MySQL table locking, at http://www.mysql.com/doc/en/Locking_methods.html

The MySQL binary log, at http://www.mysql.com/doc/en/Binary_log.html

Until next time...take care!

Note: All examples in this article have been tested on MySQL 4.0.14. Examples are illustrative only, and are not meant for a production environment. Melonfire provides no warranties or support for the source code described in this article. YMMV!

This article was first published on12 Sep 2003.