The objective was to select sessions from a table, that are older than two days.

Table setup

This is the definition:

CREATE TABLE `session` (
  `id` varchar(32) NOT NULL DEFAULT '',
  `data` text NOT NULL,
  `user` int(11) DEFAULT NULL,
  `created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `user_id` (`user`),
  KEY `rec_datemod` (`updated`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Right and wrong

Wrong query:

SELECT * FROM session
WHERE DATE_ADD(`updated`, INTERVAL 2 DAY)< NOW()

Correct query:

SELECT * FROM session
WHERE `updated` < DATE_SUB(NOW(), INTERVAL 2 DAY)

You wonder why?

Executing the first query, MySQL will scan the entire table and calculate the date from each row. Then it will continue and compare the value to NOW() and return the row if it matches. This is somewhat (Not really!) OK until a certain amount of traffic on the table. In my case, I have 500,000 (five-hundred-thousand) active sessions (aka rows) in the table, which makes it slower and slower and slower.

Because of the full table scan, this will also effectively lock the table (even though it’s INNODB) and block it from further updates.

The second query (obviously) works around that and uses the KEY on updated.

Conclusion

The first lesson is to always use EXPLAIN!

Further more, I know some of you will shiver but phpMyAdmin is actually a pretty useful tool for these circumstances. The website stalled, you log into phpMyAdmin and figure out what’s running (“Processes” tab, when you’re logged in as a privileged account). If you’re a shell-ninja, just execute SHOW PROCESSLIST (in mysql) and push whatever runs the longest to EXPLAIN.

The slow query-log is also something you should read up on.