Monthly Archives: September 2009

This week on twitter 2009-09-27

Powered by Twitter Tools

Faster wordpress page admin

In a recent post about WordPress I explained how you can create a lot of pages really fast. If that went well and you inserted a few thousand pages into your WordPress blog, the page admin has probably become useless. Displaying the list of pages took 3 minutes for 7000 pages on my test server.

It seems this is not a new problem: there is a bug report from 2007 about it. Although a patch was apparently proposed to fix it, the problem still exists in version 2.8.4.

Why is this so slow?

Short story: because WordPress is trying to display and sort pages hierarchically.

At first I thought the problem was caused by the SQL queries that fetch all the pages (even though only some of them are displayed on a page), but that was not the case.

After profiling the code with Xdebug and KCachegrind I found that a few parts of the code were taking most of the time.

The main problem is that WordPress finds the children of all the pages in an inefficient way. The function get_page_children in wp-includes/post.php was taking about two thirds of the total time (~2 minutes in my example).

The Solution

I rewrote that function to make it a lot more efficient. In my case it reduced the time from 2 minutes to 1-2 seconds, but with other page hierarchies it might take longer, the worst case being when every page is the parent of another page. The diff is here: [download id="15"]
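To illustrate the general idea (this is not the patch linked above, just a minimal sketch), one common way to make this kind of lookup cheap is to group the pages by parent once and then walk down from the requested page. The function name sketch_get_page_children is made up for this example, and it assumes each page is an object with ID and post_parent properties.

```php
<?php
/**
 * Hypothetical sketch: collect every descendant of $page_id.
 * $pages is assumed to be a flat array of objects that have
 * ID and post_parent properties.
 */
function sketch_get_page_children($page_id, array $pages)
{
    // One pass over the list: group pages by their parent ID.
    $children_of = array();
    foreach ($pages as $page) {
        $children_of[$page->post_parent][] = $page;
    }

    // Walk down from $page_id with an explicit stack, so every page
    // is looked at once instead of rescanning the whole list.
    $descendants = array();
    $seen = array($page_id => true); // guards against cyclic parent links
    $stack = array($page_id);
    while ($stack) {
        $parent_id = array_pop($stack);
        if (!isset($children_of[$parent_id])) {
            continue;
        }
        foreach ($children_of[$parent_id] as $child) {
            if (isset($seen[$child->ID])) {
                continue;
            }
            $seen[$child->ID] = true;
            $descendants[] = $child;
            $stack[] = $child->ID;
        }
    }
    return $descendants;
}
```

The work is proportional to the number of pages, whatever the shape of the hierarchy. The order of the returned pages is not the same as what WordPress produces, so treat it only as a demonstration of the indexing trick.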

The second problem is that WordPress updates the page cache every time you list pages. This was taking almost 1 minute to complete. I'm not sure whether it's the right thing to simply remove the call to update_page_cache in get_pages (wp-includes/post.php), but doing that made the page admin load in about 15 seconds.

Now this might still be annoying, but it's way better than 3 minutes. Hopefully at least the new get_page_children function will be included in the next WordPress release. Maybe you can help promote this ticket by giving it a positive vote, although I'm not sure those votes actually have any influence.

This week on twitter 2009-09-20


Fast page insert in wordpress

You want to create a lot of pages in WordPress, using a script to populate a blog with content you already have. To make sure your script stays compatible with future versions of WordPress, you want to use wp_insert_post.

The Problem

The more pages you add, the slower your script will be. With about 700 pages my script took 12 seconds to add a new page.
After digging through the code I found out that wp_insert_post calls $wp_rewrite->flush_rules() every time a new post or page is inserted, and that this is what takes the most time.
Now it makes sense: the more pages you have, the more rules (permalinks) you have, and the longer that function call takes.

The Solution

The call to $wp_rewrite->flush_rules() can be disabled by defining WP_IMPORTING. Now inserting a post takes just a second or less. But you still have to call $wp_rewrite->flush_rules() after you're done inserting all the posts. This call will take quite a while, depending on the total number of posts/pages you have, but it's a lot better to call it only once than a few hundred or a few thousand times.

The way WordPress updates its rules needs to change. Even if we can solve the bulk import problem by calling flush_rules at the end, we end up with a blog with thousands of pages where publishing a new post manually might take 30 seconds or more.
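The steps above could be sketched roughly like this. It is only a sketch: it assumes the code runs with WordPress already loaded (for example after including wp-load.php, whose path depends on your setup), and the helper name sketch_bulk_insert_pages and the 'title'/'content' keys are made up for the example.

```php
<?php
/**
 * Hypothetical helper: insert many pages and flush the rewrite rules once.
 * Assumes WordPress is already loaded, so wp_insert_post() and the
 * global $wp_rewrite exist. $pages is a list of arrays with 'title'
 * and 'content' keys (names invented for this sketch).
 */
function sketch_bulk_insert_pages(array $pages)
{
    global $wp_rewrite;

    // Tell WordPress an import is running so it skips the per-insert flush.
    if (!defined('WP_IMPORTING')) {
        define('WP_IMPORTING', true);
    }

    $ids = array();
    foreach ($pages as $page) {
        $ids[] = wp_insert_post(array(
            'post_type'    => 'page',
            'post_status'  => 'publish',
            'post_title'   => $page['title'],
            'post_content' => $page['content'],
        ));
    }

    // Rebuild the rewrite rules a single time, after all the inserts.
    $wp_rewrite->flush_rules();

    return $ids;
}
```

WP_IMPORTING has to be defined before the first insert, which is why the helper defines it at the top rather than leaving it to the caller.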

This week on twitter 2009-09-13


Qmail Big concurrency

Wanna send messages faster with your qmail server? Everyone will tell you to increase the remote concurrency, until you find out that it can only go as high as 255. If you want more than that, you have to apply the big concurrency patch.

The Problem

Applying the patch and setting the concurrency (conf-spawn) higher than 509 will break the compilation. I was hoping to get at least 1000 on that new quad core 🙁

Why 509? It seems the number depends on the maximum size of the fd_set used for "monitoring" file descriptors (connection sockets or open files) with select. This limit is set by the FD_SETSIZE constant, which is 1024. In case you're wondering, the formula that gets you from 1024 to 509 is (FD_SETSIZE-5)/2 (from chkspawn.c): (1024-5)/2 = 509 with integer division.

Trying to define FD_SETSIZE to a higher value in conf-cc, like this: -DFD_SETSIZE=4096, doesn't work, because FD_SETSIZE is redefined in sys/select.h like this:

  #define FD_SETSIZE __FD_SETSIZE

__FD_SETSIZE is defined somewhere in /usr/include/bits/types.h (actually typesizes.h) as 1024. Defining -D__FD_SETSIZE doesn't work either. I even tried both and still had no luck.

The Solution

After hours of digging through mail archives and sites I found this mailing list post that really helped:
Re: fd_setsize

If you just want to get it working, download my patch [download id="14"], apply it (after you apply the big concurrency patch), set conf-spawn to something big (but less than 65000), and you should be able to compile qmail.

If you want to know how it's done, read below...

It seems the solution is to include bits/types.h, undefine __FD_SETSIZE and then define it to a higher value. The author of that post says this is not a good idea from the portability point of view (I don't care about that), since programs should never include bits/types.h directly (true). But the alternative is to modify that system file, which is again not a good idea, since it could be overwritten by an update.

My first idea was to put the code from the mailing list post into select.h2, since this is the file used to generate select.h, and select.h is included in spawn.c and chkspawn.c. This didn't work, because spawn.c was including "select.h" after "sys/types.h". So even if select.h defined __FD_SETSIZE, it would be useless, since FD_SETSIZE (the one that really matters) would already have been defined via sys/types.h.

The solution I found at the time was to move "select.h" to the top of the file and remove "sys/types.h", since it was already included from select.h. Now I realize I could just as well have undefined and redefined FD_SETSIZE inside select.h too.

And that's the story of how I got to run 1000 concurrent connections in qmail.

The real problem

Now that we can have so much concurrency we hit another wall. Qmail, like most other MTAs, doesn't have any way of controlling the remote concurrency per destination domain.

At 1000 simultaneous connections it's very likely that it will open a few tens or hundreds of connections to the same domain at once.

When this happens, that domain will just ban your IP. So how do we fix this one?

PS: I have an answer but I want to see what you have for a solution 🙂 so hit the comments...

php tail

The Problem

You want the functionality of tail(1) in a PHP function.

The Solution

Below is a PHP function that gives you the last N lines from a file. It doesn't do everything tail(1) does (like following, retrying, etc.), just the basic stuff:

  [code listing not recoverable; surviving fragments: 'r', $ret="", "\n"]

It may not be the fastest solution but it works.
Have a better one? Please let me know about it.
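As an illustration of the basic approach (this is a minimal sketch, not the author's function), one simple way to get the last N lines is to read the file line by line and keep only a sliding window of the most recent N. The name sketch_tail and its parameters are made up for this example.

```php
<?php
/**
 * Hypothetical sketch: return the last $n lines of $path as one string,
 * or false if the file cannot be opened.
 */
function sketch_tail($path, $n = 10)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }

    // Keep at most $n lines in memory while reading forward.
    $window = array();
    while (($line = fgets($fh)) !== false) {
        $window[] = $line;
        if (count($window) > $n) {
            array_shift($window); // drop the oldest line
        }
    }
    fclose($fh);

    // Lines keep their own line endings, so join them as they are.
    return implode('', $window);
}
```

This reads the whole file, so it is slow on very large files. Seeking backwards from the end of the file would avoid that, at the cost of more code.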

This week on twitter 2009-09-06
