Flags and Lollipops

Friday, January 11, 2008

LAMP performance for dummies (e.g. me)

I've spent the last six months combing Google for 'slow mysql' results. It was enlightening. Now in the spirit of giving back to the community I give you:

Euan's Top Tips For Medium Sized LAMP Powered Web Apps



Where 'medium sized' means you have four or five concurrent users on ten tables with around half a million rows each and 'top' means that you've already done all the basic stuff - picking table types, adding indexes, designing the database properly in the first place etc.

Some are common sense. Some are not appropriate for all situations. Use at your own risk.


  • If most operations are reads turn the MySQL query cache on.

  • Don't use mysqldump for backups. Users don't like five minutes of being unable to write anything to the database because the tables are locked. Use mysqlhotcopy or an LVM based system if you're feeling adventurous.

  • If you're retrieving large amounts of data with PHP but you don't need the entire rowset in memory at once (or mysql_num_rows()) then use mysql_unbuffered_query instead of mysql_query. Stops you running out of memory so fast.

  • xdebug is a PHP profiler. With kcachegrind it's like hot coder porn. It's incredibly useful to see how long your code takes to run and why.

  • in_array()s are innocent looking... and very slow. If the values are unique then use associative arrays instead - i.e. instead of


    $data = array("a", "b" .. large amount of data .. "zzz");

    for ($i=0; $i < 1000; $i++) {
    print "is c in array? ".in_array("c", $data);
    }


    use


    $data = array("a", "b" .. large amount of data .. "zzz");
    $a_data = array();
    foreach ($data as $point) {$a_data[$point] = true;}

    for ($i=0; $i < 1000; $i++) {
    print "is c is array? ".isset($a_data["c"]);
    }


  • key_buffer_size is the amount of memory that MySQL puts aside to keep table indexes in memory, which is a very good thing. As a rule of thumb it should be 25%-50% of your database server's total RAM (reduce this if you're also using it for other things, natch). The default is 8mb, so once you've got more than 8mb worth of indexes...

    How can you tell if you need a bigger key buffer size?


    mysql> SHOW STATUS LIKE '%key_read%';
    +-------------------+---------+
    | Variable_name | Value |
    +-------------------+---------+
    | Key_read_requests | 6375479 |
    | Key_reads | 130562 |
    +-------------------+---------+
    2 rows in set (0.00 sec)


    There should be at least 100 key_read_requests (from memory) to every key_reads (from disk), as above.

  • Do a lot of sorting? Increase sort_buffer_size but bear in mind that unlike key_buffer_size it is allocated per connection - i.e. increase it to 32mb and MySQL will gobble 32mb x 4 = 128mb the next time you have four connections open at once (also remember that your code will - probably - sometimes fail or forget to shut down the connection).

  • Do a lot of ORDER BYs? Increase read_rnd_buffer_size. Allocated per connection, as above.

  • Fetching frequently used data from disk can sometimes be faster than using memcached or similar (no, really. Far less overhead + the file is cached by OS and so is in memory anyway).

  • serialize() and unserialize() are slow. Use JSON where possible instead (nowhere near as flexible, sadly)

  • To paraphrase the best optimization is avoiding doing the work in the first place: do you really need to return ALL of the rows, or just the first few hundred? Do you really need to sort them? Do they need to be distinct? Is it faster to get more data than you need and process it in code?

  • Avoid ORDER BY RAND(). Do something clever in code (pick some randomly selected ids?)

  • SQL_CALC_FOUND_ROWS can be faster than rerunning a query with COUNT() - but not always.

Comments and trackbacks Feel free to post your comments Blogger armish Blogger Egon Willighagen Anonymous Stew Anonymous chielsen Anonymous Peter Goodman Anonymous Adam Anonymous Adam Anonymous Tiago Serafim Anonymous Anonymous Blogger jjsonp Anonymous Stew OpenID faisalferoz Blogger "C" Anonymous Anonymous . This post has trackbacks.

Trackbacks:

14 Comments:

At January 12, 2008 8:29 PM, Blogger armish said...

Fascinating and inspiring... thanks, +bookmark.

 
At January 13, 2008 8:16 AM, Blogger Egon Willighagen said...

Thanx, Chemical blogspace has serious performance issues, and I had to delete a lot of old blog items...

 
At January 13, 2008 12:01 PM, Anonymous Stew said...

Yeah, should really practice what I preach and check the openreview code... that's missing correct indexing, I think (didn't you find one such missing index already)?

 
At January 13, 2008 7:47 PM, Anonymous chielsen said...

This is a nice site about an alternatie to ORDER BY RAND():
http://jan.kneschke.de/projects/mysql/order-by-rand/

 
At January 13, 2008 7:51 PM, Anonymous Peter Goodman said...

I can't imagine this:
foreach ($data as $point) {$a_data[$point] = true;}
being faster than this:
$a_data = array_flip($data);

 
At January 13, 2008 8:21 PM, Anonymous Adam said...

Just a caveat about checking for the presence of a key in an associative array with isset():

$a = array('foo' => null)
assert(isset($a['foo'])); // fails

Usually, what you want to use is:

assert(array_key_exists($a['foo'])); // passes

 
At January 13, 2008 8:23 PM, Anonymous Adam said...

Woops! I meant:

assert(array_key_exists('foo', $a)); // passes

:)

 
At January 13, 2008 8:48 PM, Anonymous Tiago Serafim said...

Hi,

thanks for the tips. You should take a look on this: "Tuning Primer".

 
At January 13, 2008 11:21 PM, Anonymous Anonymous said...

The key reads is a rule of thumb and a fallacy. I could show you a really great performing MySQL DB that has bad key reads.

Stick with profiling MySQL and finding out where your time goes and tune that. Same thing with the PHP. If you don't know where your time goes, then how do you know where to tune? It's trial and error and a bunch of "rules of thumb". When those don't work or make the system worse, then what?

 
At January 13, 2008 11:45 PM, Blogger jjsonp said...

Excellent - thanks!

I am currently switching from a Microsoft platform (MSSQL, DTS, SSIS, and a 3rd-party data integration ETL tool called 'Data Junction'. Do you have any recommendations for a GUI-based open source ETL tool (and how to learn the ins and outs of that tool)?

Googling has turned up basically Pentaho Kettle and Clover.ETL, but I'd love to get some insight from someone actually using an application.

 
At January 14, 2008 10:16 AM, Anonymous Stew said...

Thanks Adam & Peter.

The rule of thumb thing: yeah, I agree. Being able to keep a query log for each page render is useful, as is the aforementioned xcachegrind. Would say that some rules of thumb are fairly safe, though - as long as you keep an eye on memory consumption key_buffer_size is never going to hurt?

jj - sorry, don't know much about any ETL packages, open source or otherwise. :(

 
At January 14, 2008 11:57 AM, OpenID faisalferoz said...

Never use SQL_CALC_FOUND_ROWS when you are using full text indexing. It will really slow down the performance as it turns out to be a full table scan.

 
At January 14, 2008 7:04 PM, Blogger "C" said...

I found that using json_encode/json_decode to be slower than PHP's internal serializer? Serialize is in fact very slow. You can get around that if your packing array data that does not require index's, by separating by a comma, then doing explode, its generally 6x faster. I ended up writing my own data format for arrays (key/value) to speed up the slowness in serialize/unserialize.

If you're storing data such as config data, apc_store is your best choice than using memcached for the mere fact your not hitting disk and its completely in memory. You can do some real interesting tricks with dumping APC memory to disk (requires a patch of course) for things like code pushes (minimize repopulating memory variables).

 
At January 18, 2008 8:54 AM, Anonymous Anonymous said...

Use

$data = array_flip($data);

instead of

foreach ($data as $point) {$a_data[$point] = true;}

then test isset($data["c"]) is faster but use

$a_data = array_values($data);

and test isset($a_data["c"]) is the fastest (both are hard-coded)

 

Post a Comment

<< Home


See all posts from: July 2005 August 2005 September 2005 October 2005 November 2005 December 2005 January 2006 February 2006 March 2006 April 2006 May 2006 June 2006 July 2006 September 2006 October 2006 November 2006 December 2006 January 2007 February 2007 March 2007 April 2007 May 2007 June 2007 July 2007 August 2007 October 2007 November 2007 December 2007 January 2008 February 2008 March 2008